NoQL reference
Contents
Introduction
The Node Query Language, or NoQL for short, is inspired by traditional SQL. As with other NoSQL solutions, it has special capabilities and limitations.
Selectors, joins and update statements are essentially not supported. However, NoQL adds cool features like relevance sorting and aggregations.
Selectors are currently not supported, and the only result of a query will only be identifiers for the matching nodes. Developers must then get the desired nodes (with their data) through a separate request.
Query
Queries are just one of the ways to access data in the storage. Developers may also access nodes by ID, path or by fetching child items. A query normally targets a single repository, but may also query multiple repositories at once.
There are two types of query’s format available in XP: String format and DSL (Domain Specific Language).
Query DSL
Query DSL is an AST (Abstract Syntax Tree) expressed in JSON to define query expressions. See documentation for further explanation and examples.
String Query
String queries are built from traditional expressions. For instance, the following query would return all nodes in the repo, where the property weight
is greater than 10.
weight > 10
Expressions may be combined by using traditional logical operators such as AND, and OR. For instance, we could limit the result further:
weight > 10 AND fulltext('article', 'should have these words', 'AND')
In this case we are adding a so-called dynamic expression to the query. The fulltext() expression performs a free text search on the property article
for the specified search string.
For both the integer comparison and fulltext expression to work, the weight, and article properties need to be indexed properly.
If the query is empty, all nodes will be considered matches and returned.
The results matching a given query will be assigned a score. The scoring is for instance used to achieve relevance sorting.
Syntax
The query syntax is based on SQL and looks very similar:
queryExpr
queryExpr = [ constraintExpr ];
- Empty query
-
If no queryExpression is given, all documents will be in the result.
constraintExpr
constraintExpr = compareExpr
| logicalExpr
| functionExpr
| notExpr ;
compareExpr
compareExpr = fieldExpr operator valueExpr ;
fieldExpr = propertyPath ;
operator = '=', '!=', '>', '>', '<', '<=', 'LIKE', 'NOT LIKE', 'IN', 'NOT IN' ;
valueExpr = string | number | valueFunc ;
valueFunc = geoPoint | instant | time | date | dateTime | localDateTime ;
geoPoint = '"' lat ',' lon '"' ;
instant = 'instant(' string ')' ;
date = 'date(' string ')' ;
time = 'time(' string ')' ;
dateTime = 'dateTime(' string ')' ;
localDateTime = 'localDateTime(' string ')' ;
user.myCategory = "articles"
user.myCategory IN ("articles", "documents")
user.myCategory != "articles"
user.myCategory LIKE "*tic*"
myPriority < 10
myPriority <= 10
myPriority > 10
myPriority < 100
myPriority != 10
myInstant = instant('2014-02-26T14:52:30.00Z')
myInstant <= instant('2014-02-26T14:52:30.00Z')
myInstant <= dateTime('2014-02-26T14:52:30.00+02:00')
myTime = time('09:00')
myDate = date('2015-02-26')
myLocalDateTime = localDateTime('2015-02-26T15:00')
myLocation = '59.9127300,10.7460900'
myLocation IN ('59.9127300,10.7460900','59.2181000,10.9298000')
instant , dateTime , localDateTime , date functions will force search in datetime index, meanwhile for time function usual text index will be used. Using expressions without functions perform a search in text index for any time-related field. See more about indexing there. |
logicalExpr
logicalExpr = constraintExpr operator constraintExpr ;
operator = 'AND' | 'OR' ;
myCategory = "articles" AND myPriority > 10
myCategory IN ("articles", "documents") OR myPriority <= 10
functionExpr
functionExpr = functionName '(' arguments ')' ;
Examples
fulltext('_allText', 'fish spot', 'AND')
fulltext('data.*', 'fish spot', 'AND')
myCategory IN ('article', 'document') AND ngram('title', 'fish', 'AND')
propertyPath
propertyPath = pathElement ( '.' pathElement )* ;
pathElement = ( [ validJavaIdentifier - '.' ] )* ;
myProperty
data.myProperty
data.myCategory.myProperty
Wildcards in propertyPaths are only supported in
|
Query functions
Here’s a description of all functions that can be used in a query.
fulltext()
The fulltext function is searching for words in a field, and calculates relevance scores based on a set of rules (e.g number of occurences, field-length, etc).
Only analyzed properties are considered when applying the fulltext function. |
fulltext(<fields>, <search-string>, <operator>)
- fields
-
Comma-separated list of propertyPaths to include in the search.
'displayName' // Search in single field
'displayName,data.description,my.title' // Search in multiple fields
'data.*' // Wildcard usage
You can boost - thus increasing or decreasing hit-score per field basis. By providing more than one field to the query by appending a weight-factor: ^N
fulltext('displayName^5,data.description', 'my search string', 'AND')
- search-string
-
The search-string supports a set of operators
-
+
signifies AND operation. -
|
signifies OR operation. -
-
negates a single token. -
*
at the end of a term signifies a prefix query. -
(
and)
signify precedence. -
“` and `”
wraps a number of tokens to signify a phrase for searching -
~N
after a word signifies edit distance (fuzziness) with a number representing Levenshtein distance. -
~N
after a phrase signifies slop amount (how far apart terms in phrase are allowed)
-
- operator
-
Allowed operators are:
-
OR
Any of the words in the search-string matches. -
AND
All words in search-string matches.
-
Examples
fulltext("myField", "cheese fish cake onion", "OR")
fulltext("myData.myProperties.*", "cheese fish cake onion", "OR")
myCategory = '"soup" AND fulltext("myField", "cheese fish cake onion", "OR")'
fulltext("myField", "cheese fish cake onion", "AND")
fulltext("myField", "Levenshtein~2", "AND")
fulltext("myField", "fish -boat", "AND")
fulltext("data.*", "fish -boat", "AND")
fulltext('data.*', '"gone fishing today"', "AND")
fulltext('data.*', '"gone fishing today"~2', "AND")
stemmed()
The stemmed function is similar to fulltext() except that it searches language optimized tokens instead of a source text. E.g. source text The monkey loved bananas
will be transformed to the
, monkey
, love
, banana
tokens and they will be used for search.
Stemming is language-dependent, so language must be set either on the content or directly in the node indices via indexConfig. |
stemmed(<fields>, <search-string>, <operator>, <language>)
- fields
-
Comma-separated list of propertyPaths to include in the search.
Only _alltext field is currently indexed for stemming |
- search-string
-
The search-string supports a set of operators
-
+
signifies AND operation. -
|
signifies OR operation. -
-
negates a single token. -
*
at the end of a term signifies a prefix query. -
(
and)
signify precedence. -
“` and `”
wraps a number of tokens to signify a phrase for searching -
~N
after a word signifies edit distance (fuzziness) with a number representing Levenshtein distance. -
~N
after a phrase signifies slop amount (how far apart terms in phrase are allowed)
-
- operator
-
Allowed operators are:
-
OR
Any of the words in the search-string matches. -
AND
All words in search-string matches.
-
- language
-
Content language that was used for stemming. List of supported languages
Examples
stemmed("_alltext", "fish cake", "OR", "en")
nGram()
An n-gram is a sequence of n letters from a term. During ngram indexing, the term "foxy" is also indexed as: "f", "fo", and "fox".
When using the nGram search function, we are able get matches, even if the search only contains parts of a term. This is for instance useful when createing autocomplete functionality. The max limit of the ngram tokenizer is 25 characters, meaning that search strings over 25 characters will not match. As such, ngram queries may successfully be combined with the fulltext search function or other query expressions, to both match fragments of words as well as full phrases.
Only properties analyzed as text are considered when applying the ngram-function. This includes, by default, all text-based fields in the content domain. |
ngram(<field>, <search-string>, <operator>)
- fields
-
Comma-separated list of propertyPaths to include in the search.
- operator
-
Allowed operators are:
-
OR
Any of the words in the search-string matches (default) -
AND
All words in search-string matches.
-
Examples
ngram("myProp", "lev")
ngram("myProp", "lev alg", "AND")
ngram("myProp.*", "fish boat", "OR")
range()
The range functions test each value in the given property for a given range.
range(<field>, <from>, <to>, [<includeFrom>], [<includeTo>])
The from
and to
values must be of the same value type.
includeFrom
and includeTo
are optional with default value 'false', meaning that the actual values for the from
and to
are not included as matches.
Unbounded ranges can be queried by providing an empty string as argument.
Examples
range('version', '6.3.0', '6.4.0', 'true', 'false')
range('publishFrom', instant('2015-08-01T09:00:00Z'), instant('2015-08-01T11:00:00Z') )
2.0
and 3.0
, including 2.0
range('myValue', 2.0, 3.0, 'true', 'false' )
range('publishFrom', instant('2015-08-01T09:00:00Z'), '')
range('publishTo', '', instant('2015-08-01T09:00:00Z'))
pathMatch()
The path-match matches a path in a same branch, scoring the paths closest to the given query path first. Also, a number of minimum matching elements that must match could be set.
pathMatch(<field>, <path>, [<minimum_elements_must_match>])
If not given, the default minimum-must-match value will be 1.
Example
/content/mySite
/content/mySite/fish
/content/mySite/fish/onion
/content/mySite/cheese
/content/mySite/cheese/jam
/content/myOtherSite
pathMatch('_path', '/content/mySite/fish/onion/mayonnaise', 2)
_score
):
-
/content/mySite/fish/onion
-
/content/mySite/fish
-
/content/mySite/cheese/jam
-
/content/mySite/cheese
-
/content/mySite
Sort
Similar to traditional SQL databases, XP lets you sort the result by property in ascending or descending order. A basic sort statement is simply defined by property and sorting direction i.e.:
myproperty DESC
Additionally, similar to Google, text-based query results may be sorted by ranking. Ranking is done through an internal algorithm that scores each individual item based on how it matches with your search. To sort by ranking, use the following statement:
_score DESC
A sorting statement can consist of one or more comma separate expressions:
If sort specified for a query, results will contain a meta system property _sort
. Its exposes field values used for sorting.
To use DSL sort check documentation.
Syntax
sortExpr
sortExpr = ( | dynamicSortExpr ) ( ',' ( fieldSortExpr | dynamicSortExpr ) )* ;
fieldSortExpr
fieldSortExpr = propertyPath [ direction ] ;
direction = 'ASC' | 'DESC' ;
_name ASC
_timestamp DESC
title DESC
data.myProperty
dynamicSortExpr
dynamicSortExpr = functionExpr [ direction ] ;
direction = 'ASC' | 'DESC' ;
geoDistance('locationProperty', '59.9127300,10.746090')
Sort functions
Here’s a description of all functions that can be used in order-by clause.
geoDistance()
The geoDistance-function enables you to order the results according to distance to a given geo-point.
Documents with no geo-point property with the given path will be ordered last if matching the query. |
geoDistance(<field>, <location>, [<unit>])
- field
-
Any geoPoint property
- location
-
The location is a geoPoint from which the distance factor should be calculated, formatted as "latitude,longitude".
- unit
-
The string representation of distance unit to use. Defaults to "m" or "meters", the list of available values below:
-
"m" or "meters"
-
"in" or "inch"
-
"yd" or "yards"
-
"ft" or "feet"
-
"km" or "kilometers"
-
"NM" or "nmi" or "nauticalmiles"
-
"mm" or "millimeters"
-
"cm" or "centimeters"
-
"mi" or "miles"
-
geoDistance("shopLocation", "59.9127300,10.7460900", "km")
Relevance sorting
All properties with values automatically get a sort index entry. This means that any property can be used for sorting, either (default) DESC (descending) or ASC (ascending).
- _score
-
The dynamic system property _score is the default sort property
The score value of a node is dynamically calculated based on a number of factors: For instance, number of matching clauses in boolean expressions, how often the term appears in the documents when searching for text etc.
For more details, check out the Elasticsearch documentation.
Sort DSL
Sort DSL is an AST (Abstract Syntax Tree) expressed in JSON to define sort expressions. See documentation for further explanation and examples.
More examples
Querying paths
All nodes have three system-properties concerning the node placement in a branch, all of type String
:
-
_name
: The node name without path. -
_parentPath
: The parent node path. -
_path
: The full path of the node.
See the pathMatch() query function for advanced path matching
When working with the content API, all items are placed a special root node: |
Examples
/content/mySite/myCategory/myContent
.
_path = '/content/mySite/myCategory/myContent'
myContent
in a folder named myCategory
_name = 'myContent' AND _parentPath LIKE '*myCategory'
/content/mySite/myCategory
including children of children.
_path LIKE '/content/mySite/myCategory/*'
/content/mySite/myCategory
.
_parentPath = '/content/mySite/myCategory'
Querying missing values
From time to time, it is also interesting to search for nodes with or without a value:
myValue LIKE "*"
publish.first NOT LIKE "*"
Using exists and notExists filters is considered the most efficient way searching for nodes with missing values |
Querying references
An aggregation of all outbound references in a node are stored in the system property _references
. This property can then effectively be used also to find incoming references to a node.
_references = 'abc'
Date and time
Querying against date and time-fields may require some knowledge on how data is stored and indexed.
LocalDate
LocalDate represents a date without time-zone in the ISO-8601 calendar, e.g 2015-03-19
. LocalDate-properties are stored as a ISO LocalDate-formatted string in the index, thus all searches are done against string-values.
LocalDate string-format:
yyyy-MM-dd
Given a node with a property named 'myLocalDate' of type localDate
and value 2015-03-19
, all of the following queries will match:
myLocalDate = '2015-03-19' myLocalDate > '2015-03-18' myLocalDate <= '2015-03-19'
LocalTime
LocalTime represents a time without time-zone in the ISO-8601 calendar, e.g 11:39:49
. LocalTime-properties are stored as a ISO LocalTime-formatted string in the index, thus all searches are done against string-values.
LocalTime string-format:
HH:mm[:ss[.SSS]]
LocalTime string value examples:
09:30 10:00 10:00:30 10:00:30.142
Since the queries are matching string-values, the input time in query must either adhere the same string-format restrictions, or be wrapped in a function time
which accepts a time-formatted string as input.
Given a node with a property named 'myLocalTime' of type localTime
and value = 09:36:00
, all the following queries will match:
myLocalTime > '09:00' myLocalTime = '09:36' myLocalTime = '09:36:00' myLocalTime LIKE '09:*' myLocalTime < '09:36:01' myLocalTime < '09:36:00.1'
This must be wrapped in time-function since its not padded with a leading 0:
myLocalTime > time('9:00')
If optional fractions of seconds are given, the string format will also contain this even if 0, and expression will not match unless wrapped in time-function::
myLocalTime = time('09:36:00.0')
Even if the string-matching will do the job 99% of the time, the safest bet is to always go with the time-function when applicable.
LocalDateTime
LocalDateTime represents a date-time without time-zone in the ISO-8601 calendar, e.g 2015-03-19T11:39:49
. LocalDateTime-properties are stored as a ISO LocalDateTime-formatted string in the index, thus all searches are done against string-values.
LocalDateTime string-format:
yyyy-MM-ddTHH:mm[:ss[.SSS]]
Since the queries are matching string-values, the input dateTime in query must either adhere the same string-format restrictions, or be wrapped in a function dateTime
which accepts a dateTime-formatted string as input.
-
Given a node with a property named 'myLocalDateTime' of type
localDateTime
and value2015-03-19T10:30:00
, all of the following queries will match -
myLocalDateTime = '2015-03-19T10:30:00' myLocalDateTime = dateTime('2015-03-19T10:30') myLocalDateTime < dateTime('2015-03-19T10:30:00.001')
DateTime / Instant
DateTime represents a date-time with time-zone in the ISO-8601 calendar, e.g 2015-03-19T11:39:49+02:00
. Its possible to query properties of with value-type DateTime
both as an ISO instant and as ISO dateTime, using the provided built-in functions instant
and dateTime
.
Instant string-format (instant always given in UTC-time):
yyyy-MM-ddTHH:mm[:ss[.SSS]Z
Instant string value examples:
2015-03-19T16:30:20Z 2015-03-19T16:30:20.123Z
DateTime string-format (Z
for UTC, else offset in hours and minutes):
yyyy-MM-ddTHH:mm[:ss[.SSS](Z|+hh:mm|-hh:mm)
DateTime string value examples:
2015-03-19T16:30:20Z 2015-03-19T16:30:20+01:00 2015-03-19T16:30:20-01:30 2015-03-19T16:30:20.123-01:30
Given a node with a property named 'myDateTime' of type dateTime
and value 2015-03-19T10:25:00+02:00
, all of the following queries will match:
myDateTime = instant('2015-03-19T08:25:00Z') myDateTime = dateTime('2015-03-19T08:25:00Z') myDateTime = dateTime('2015-03-19T10:25:00+02:00') myDateTime = dateTime('2015-03-19T11:25:00+03:00')