Index configuration
Contents
Properties in a node can be indexed. Indexing is the process of extracting, processing and storing a search-optimized version of the property value.
Properties are mapped to specific indexes based on its valueType, and the node’s Index config - as illustrated below:
Index Mappings
When nodes are persisted, the property values are instantly indexed. A single property can be indexed multiple times - each index is referred to as an index mapping.
Each mapping enables specific query capabilities:
text
The default mapping used for all value types.
number
Effectively handles any numeric value
datetime
Handles any date value
geoPoint
Supports earth based geographical locations
ngram
nGram-mappings are accessed via queries, using the nGram-function. An nGram-analyzed field will index all substring values from 2 to 25 characters.
Consider a string property, with the value article
. ngram indexing will split the string into the following tokens when analyzed::
'ar', 'art', 'arti', 'artic', 'articl', 'article'
For more information about how the nGram-function works, check out the nGram-function.
analyzed
Splits the string into tokens for effective free text search. Used by the fulltext() query function.
Consider a string property, with the value This is test-driven development
When analyzed, it is split into the following tokens:
'this', 'is', 'test', 'driven', 'development'
stemmed
Language optimized version of the analyzed index mapping. Used by the stemmed() query function.
Tokens are trimmed based on language specific features such as plurals and gender specific endings. Consider the following sentence stemmed for English content: "The monkey loved bananas". When indexed, the result will be something like:
'the', 'monkey', 'love' 'banana'
The same stemming algorithm is added to queries, supporting hits for queries like: "banana love", even if the strings do not match
path
The path-elements (separated by default path-separator '/') are indexed as tokens.
orderby
Any indexed property automatically gets indexed with the orderby mapping as well. The "orderby" index mapping lets us sort text and numbers in a natural way across properties with different ValueTypes
ValueTypes
Every property in a node has a specific value type. The value type enables the data storage to interpret and handle each value specially - applying to both validation and indexing.
Below is the complete list of all supported value-types.
Value Type | Example | Default indexing | Comment |
---|---|---|---|
String |
My String |
text |
String of characters within UTF charset |
BinaryReference |
a-binary-reference |
text |
Handle for accessing a binary |
Boolean |
true |
text |
A value representing |
Double |
11.5 |
number, text |
Double-precision 64-bit IEEE 754 floating point. |
GeoPoint |
59.9090442,10.7423389 |
geoPoint, text |
Represents a geographical point on earth, given in latitude and longitude. |
Instant |
2015-03-16T10:00:02Z |
datetime, text |
A single point on the time-line (may include subsecond up to 9 digits). |
LocalTime |
10:00:03 |
text |
A time representation without date or timezone(nor subsecond). |
LocalDate |
2015-03-16 |
datetime, text |
A date representation. Will be indexed with UTC timezone offset. |
LocalDateTime |
2015-03-16T10:00:02 |
datetime, text |
A date-time representation without timezone. Will be indexed with UTC timezone offset. |
Long |
1234 |
number, text |
64-bit two’s complement integer. |
Reference |
0b7f7720-6ab1-4a37-8edc-731b7e4f439e |
text |
Holds a reference to other nodes in the same repository. |
Set |
Not indexed |
Holds sub properties as it’s value |
|
XML |
<some>xml</some> |
text |
Any valid XML |
_allText
Nodes that contain indexed String values, typically gets a generated system property called _allText. This property has the valueType String
, and by default get indexed as text
, ngram
, and analyzed
The property is commonly used in "search everything" approaches.
When defining custom index configurations, you may choose if a property will be included in _allText, or not. |
Index config
By default, properties are indexed based on their specific value type, according to the valueType table above. This strategy is known as decideByType
.
Every now and then, you may need more detailed control of how your properties are indexed. This is where the index config comes in.
The index config allows you to provide detailed instructions on how the properties of a node should be indexed.
The index config itself is stored as a property on the node. A basic index config might look something like this:
"_indexConfig": {
"default": { (1)
"enabled": true,
"decideByType": false,
"nGram": false,
"fulltext": false,
"includeInAllText": false,
"path": false,
"indexValueProcessors": [],
"languages": []
},
"configs": [ (2)
{
"path": "myProperty", (3)
"config": { (4)
"enabled": true,
"decideByType": false,
"nGram": true,
"fulltext": true,
"includeInAllText": true,
"path": false,
"languages": []
}
},
{
"path": "mySet.**", (5)
"config": {
"enabled": true,
"decideByType": false,
"nGram": false,
"fulltext": false,
"includeInAllText": false,
"path": false,
"languages": ['en','no'] (6)
}
}
]
}
1 | default is the default config for all properties (unless overridden) |
2 | configs overrides the default config for properties matching specified path |
3 | path specifies the propertyPath the config applies to |
4 | config is the specific overriding config |
5 | mySet.** applies to all sub properties of "mySet" |
6 | languages stemmed language indices will be generated for all matched properties |
Property paths
All config entires, with exception of default must specify a path. The path element defines the property scope within the node where this index configuration applies.
Paths follow the propertyPath format, optionally including double wildcard character **.
Examples:
myProperty**
myProperty.myName
Config options
The following options can be added to a configuration entry:
enabled
If false, indexing will be disabled for the affected properties
decideByType
If true, indexing is done based on valueType, according to the table above. I.e. numeric values are indexed as both string and numeric.
fulltext
Values are indexed as 'ngram', 'analyzed' and also added to the _allText system property
ngram
Values are indexed as 'ngram'
path
Values are indexed as 'path' and applicable for the pathMatch-function
includeInAllText
Affected values will be added to the _allText
property
languages
For each specified language, a stemmed index of the property will be created
Language codes are specified in the la[-co]`
format, where:
Code | Language |
---|---|
ar |
Arabic |
bg |
Bulgarian |
bn |
Bengali |
ca |
Catalan |
cs |
Czech |
da |
Danish |
de |
German |
el |
Greek |
en |
English |
eu |
Basque |
fa |
Persian |
fi |
Finnish |
fr |
French |
ga |
Irish |
gl |
Galician |
hi |
Hindi |
hu |
Hungarian |
hy |
Armenian |
id |
Indonesian |
it |
Italian |
ja |
Japanese |
ko |
Korean |
ku |
Sorani |
lt |
Lithuanian |
lv |
Latvian |
nl |
Dutch |
no |
Norwegian |
pt |
Portuguese |
pt-br |
Brazilian |
ro |
Romanian |
ru |
Russian |
es |
Spanish |
sv |
Swedish |
tr |
Turkish |
th |
Thai |
zh |
Chinese |
Use stemmed() function to query data based on these indices.
While setting the language for the content will only index the _allText field, setting the languages in the node config will create stemmed indices for all mapped properties. See node create function. |
repo.create({
_name: "fruits",
displayName: "Fruit basket",
description: "language indices usage example",
english_set: {
fruit_a: "Apple",
fruit_set: {
fruit_b: "Lemon",
fruit_c: "Orange"
}
},
norwegian_set: {
fruit_a: "Eple",
fruit_set: {
fruit_b: "Sitron",
fruit_c: "Oransje"
}
},
_indexConfig: {
default: {
enabled: true,
decideByType: true,
nGram: false,
fulltext: false,
includeInAllText: false,
path: false,
languages: ['en']
}, configs: [{
path: "norwegian_set.**",
config: {
enabled: true,
decideByType: false,
nGram: true,
fulltext: true,
includeInAllText: true,
path: false,
languages: ['no']
}
}]
}
});
Stemmed english indices will be generated for displayName
, description
and all strings inside english_set
. Norwegian indices will be created for strings inside norwegian_set
only.
Config templates
For simplicity, index configs may also be defined using a shorthand format. Rather than providing a full config object, you may instead reference a standard template.
"_indexConfig": {
"default": "byType", (1)
"configs": [
{
"path": "myProperty",
"config": "fulltext" (2)
}
]
}
1 | Referencing the template "byType" |
2 | Referencing the template "fulltext" |
The following templates are available:
none
Turns off indexing completely
"config": {
"enabled": false,
"decideByType": false,
"nGram": false,
"fulltext": false,
"includeInAllText": false,
"path": false
}
byType
Indexing based on valueType
"config": {
"enabled": true,
"decideByType": true,
"nGram": false,
"fulltext": false,
"includeInAllText": false,
"path": false
}
fulltext
Activates common text indexing options
"config": {
"enabled": true,
"decideByType": false,
"nGram": true,
"fulltext": true,
"includeInAllText": true,
"path": false
}
path
Turns on path specific indexing
"config": {
"enabled": true,
"decideByType": false,
"nGram": false,
"fulltext": false,
"includeInAllText": false,
"path": true
}
minimal
Will only create orderby indexes
"config": {
"enabled": true,
"decideByType": false,
"nGram": false,
"fulltext": false,
"includeInAllText": false,
"path": false
}