Index configuration

Properties in a node can be indexed. Indexing is the process of extracting, processing and storing a search-optimized version of the property value.

Properties are mapped to specific indexes based on their valueType and the node’s index config.

Index Mappings

When nodes are persisted, the property values are instantly indexed. A single property can be indexed multiple times - each index is referred to as an index mapping.

Each mapping enables specific query capabilities:

text

The default mapping used for all value types.

geoPoint

Supports Earth-based geographical locations.

ngram

nGram mappings are queried using the nGram function. An nGram-analyzed field indexes substrings of 2 to 25 characters, anchored at the start of each token.

Consider a string property with the value "article". When analyzed, ngram indexing splits the string into the following tokens:

'ar', 'art', 'arti', 'artic', 'articl', 'article'

For more information, see the documentation for the nGram function.
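
The tokenization shown above can be sketched in a few lines of Python. This is only an illustration of the prefix-style tokens from the example, assuming a minimum length of 2 and a maximum of 25 characters. The function name edge_ngrams is hypothetical and not part of any API.

```python
def edge_ngrams(value: str, min_len: int = 2, max_len: int = 25) -> list[str]:
    """Return the leading substrings of `value`, from min_len up to max_len characters."""
    upper = min(len(value), max_len)
    return [value[:n] for n in range(min_len, upper + 1)]


tokens = edge_ngrams("article")
print(tokens)
```

Values shorter than two characters produce no tokens in this sketch, and values longer than 25 characters are only tokenized up to the 25th character.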

analyzed

Splits the string into tokens for effective free text search. Used by the fulltext() query function.

Consider a string property with the value "This is test-driven development". When analyzed, it is split into the following tokens:

'this', 'is', 'test', 'driven', 'development'
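
As a rough approximation of this behaviour, the sketch below lowercases the text and splits it on anything that is not a letter or digit. The actual analyzer may apply further rules; the function name analyze is hypothetical.

```python
import re


def analyze(value: str) -> list[str]:
    """Lowercase the text and split it on characters that are not letters or digits."""
    return re.findall(r"[^\W_]+", value.lower())


tokens = analyze("This is test-driven development")
print(tokens)
```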

stemmed

Language optimized version of the analyzed index mapping. Used by the stemmed() query function.

Tokens are trimmed based on language specific features such as plurals and gender specific endings. Consider the following sentence stemmed for English content: "The monkey loved bananas". When indexed, the result will be something like:

'the', 'monkey', 'love', 'banana'

The same stemming algorithm is applied to queries, so a query like "banana love" will match even though the strings are not identical.
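
The key idea is that indexed text and query text pass through the same stemmer, so they meet on a common form. The sketch below demonstrates this with a deliberately naive suffix stripper. It is a stand-in for illustration only and does not reflect the real stemming algorithm; toy_stem and stem_text are hypothetical names.

```python
import re


def toy_stem(token: str) -> str:
    """Naive English suffix stripping; a stand-in, not a real stemmer."""
    if token.endswith("ed") and len(token) > 4:
        return token[:-1]  # "loved" -> "love"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]  # "bananas" -> "banana"
    return token


def stem_text(text: str) -> list[str]:
    """Tokenize on letters and digits, lowercase, then stem every token."""
    return [toy_stem(t) for t in re.findall(r"[^\W_]+", text.lower())]


indexed = stem_text("The monkey loved bananas")
query = stem_text("banana love")

# The query matches because every stemmed query token is among the indexed tokens.
matches = all(token in indexed for token in query)
print(indexed, query, matches)
```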

path

The path-elements (separated by default path-separator '/') are indexed as tokens.
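
A minimal sketch of this splitting, assuming empty elements (such as the one before a leading separator) are dropped. The sample path and the function name path_tokens are hypothetical.

```python
def path_tokens(path: str, separator: str = "/") -> list[str]:
    """Split a path into its elements, ignoring empty elements."""
    return [element for element in path.split(separator) if element]


tokens = path_tokens("/content/articles/2015")
print(tokens)
```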

orderby

Any indexed property is automatically indexed with the orderby mapping as well. The orderby index mapping makes it possible to sort text and numbers in a natural way across properties with different value types.

ValueTypes

Every property in a node has a specific value type. The value type enables the data storage to interpret and handle each value specially - applying to both validation and indexing.

Below is the complete list of all supported value-types.

Value Type	Example	Default indexing	Comment
String	My String	text	String of characters within UTF charset
BinaryReference	a-binary-reference	text	Handle for accessing a binary
Boolean	true	text	A value representing `true` or `false`
Double	11.5	number, text	Double-precision 64-bit IEEE 754 floating point.
GeoPoint	59.9090442,10.7423389	geoPoint, text	Represents a geographical point on earth, given in latitude and longitude.
Instant	2015-03-16T10:00:02Z	datetime, text	A single point on the time-line (may include subsecond up to 9 digits).
LocalTime	10:00:03	text	A time representation without date or timezone (nor subsecond).
LocalDate	2015-03-16	datetime, text	A date representation. Will be indexed with UTC timezone offset.
LocalDateTime	2015-03-16T10:00:02	datetime, text	A date-time representation without timezone. Will be indexed with UTC timezone offset.
Long	1234	number, text	64-bit two’s complement integer.
Reference	0b7f7720-6ab1-4a37-8edc-731b7e4f439e	text	Holds a reference to other nodes in the same repository.
Set		Not indexed	Holds sub properties as its value
XML	<some>xml</some>	text	Any valid XML
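
For quick reference in code, the "Default indexing" column can be captured as a lookup table. The sketch below simply transcribes the table above; the names DEFAULT_INDEXING and default_mappings are hypothetical.

```python
# Default index mappings per value type, transcribed from the table above.
DEFAULT_INDEXING: dict[str, list[str]] = {
    "String": ["text"],
    "BinaryReference": ["text"],
    "Boolean": ["text"],
    "Double": ["number", "text"],
    "GeoPoint": ["geoPoint", "text"],
    "Instant": ["datetime", "text"],
    "LocalTime": ["text"],
    "LocalDate": ["datetime", "text"],
    "LocalDateTime": ["datetime", "text"],
    "Long": ["number", "text"],
    "Reference": ["text"],
    "Set": [],  # not indexed
    "XML": ["text"],
}


def default_mappings(value_type: str) -> list[str]:
    """Look up the default mappings for a value type; raises KeyError if unknown."""
    return DEFAULT_INDEXING[value_type]


print(default_mappings("GeoPoint"))
```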

_allText

Nodes that contain indexed String values typically get a generated system property called _allText. This property has the valueType String and is by default indexed as text, ngram, and analyzed.

The property is commonly used in "search everything" approaches.

When defining custom index configurations, you may choose whether a property is included in _allText.

Index config

By default, properties are indexed based on their specific value type, according to the valueType table above. This strategy is known as decideByType.

Every now and then, you may need more detailed control of how your properties are indexed. This is where the index config comes in.

The index config allows you to provide detailed instructions on how the properties of a node should be indexed.

The index config itself is stored as a property on the node. A basic index config might look something like this:

Sample index config

"_indexConfig": {
    "default": {  (1)
        "enabled": true,
        "decideByType": false,
        "nGram": false,
        "fulltext": false,
        "includeInAllText": false,
        "path": false,
        "indexValueProcessors": [],
        "languages": []
    },
    "configs": [   (2)
        {
            "path": "myProperty",   (3)
            "config": {   (4)
                "enabled": true,
                "decideByType": false,
                "nGram": true,
                "fulltext": true,
                "includeInAllText": true,
                "path": false,
                "languages": []
            }
        },
        {
            "path": "mySet.**",   (5)
            "config": {
                "enabled": true,
                "decideByType": false,
                "nGram": false,
                "fulltext": false,
                "includeInAllText": false,
                "path": false,
                "languages": ["en", "no"]  (6)
            }
        }
    ]
}

1	default is the default config for all properties (unless overridden)
2	configs overrides the default config for properties matching specified `path`
3	path specifies the propertyPath the config applies to
4	config is the specific overriding config
5	mySet.** applies to all sub properties of "mySet"
6	languages lists the languages for which stemmed indices are generated for all matching properties

Property paths

All config entries, with the exception of default, must specify a path. The path element defines the property scope within the node where this index configuration applies.

Paths follow the propertyPath format, optionally including double wildcard character **.

Examples:

Applies to "myProperty" and all sub properties

myProperty**

Applies to "myProperty.myName" and all sub properties

myProperty.myName
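
To make the relationship between default, configs, and path concrete, here is a sketch of how a config could be resolved for a given property path. It rests on several assumptions that the text above does not confirm: that ** matches any remaining characters (including dots), that matching is case-sensitive, and that the first matching entry wins. The names path_matches and resolve_config are hypothetical.

```python
import re


def path_matches(pattern: str, property_path: str) -> bool:
    """Assumption: `**` matches any remaining characters, including dots."""
    regex = ".*".join(re.escape(part) for part in pattern.split("**"))
    return re.fullmatch(regex, property_path) is not None


def resolve_config(index_config: dict, property_path: str):
    """Assumption: the first matching entry wins; otherwise the default applies."""
    for entry in index_config.get("configs", []):
        if path_matches(entry["path"], property_path):
            return entry["config"]
    return index_config["default"]


index_config = {
    "default": "byType",
    "configs": [
        {"path": "myProperty", "config": "fulltext"},
        {"path": "mySet.**", "config": "minimal"},
    ],
}

for path in ("myProperty", "mySet.a.b", "somethingElse"):
    print(path, "->", resolve_config(index_config, path))
```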

la = two-letter language code as specified by ISO 639
co = optional two-letter country code as specified by ISO 3166

Table 1. Supported languages for stemming
Code	Language
ar	Arabic
bg	Bulgarian
bn	Bengali
ca	Catalan
cs	Czech
da	Danish
de	German
el	Greek
en	English
eu	Basque
fa	Persian
fi	Finnish
fr	French
ga	Irish
gl	Galician
hi	Hindi
hu	Hungarian
hy	Armenian
id	Indonesian
it	Italian
ja	Japanese
ko	Korean
ku	Sorani
lt	Lithuanian
lv	Latvian
nl	Dutch
no	Norwegian
pt	Portuguese
pt-br	Brazilian Portuguese
ro	Romanian
ru	Russian
es	Spanish
sv	Swedish
tr	Turkish
th	Thai
zh	Chinese

Use the stemmed() function to query data based on these indices.

Example of creating a node with properties indexed for multiple languages.
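
As a minimal sketch of what such a node could look like, the Python snippet below assembles node data with an index config that requests stemmed indices for English and Norwegian on one property. The node name and the properties displayName and body are hypothetical, and how the data is handed to the repository is outside the scope of this sketch.

```python
import json

# Hypothetical node data: only the "_indexConfig" structure follows the format described above.
node = {
    "_name": "my-article",
    "displayName": "Monkeys and bananas",
    "body": "The monkey loved bananas",
    "_indexConfig": {
        "default": "byType",
        "configs": [
            {
                "path": "body",
                "config": {
                    "enabled": True,
                    "decideByType": False,
                    "nGram": True,
                    "fulltext": True,
                    "includeInAllText": True,
                    "path": False,
                    "languages": ["en", "no"],  # stemmed indices for English and Norwegian
                },
            }
        ],
    },
}

print(json.dumps(node, indent=4))
```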

Config templates

For simplicity, index configs may also be defined using a shorthand format. Rather than providing a full config object, you may instead reference a standard template.

Sample use of templates

"_indexConfig": {
    "default": "byType",  (1)
    "configs": [
        {
            "path": "myProperty",
            "config": "fulltext"  (2)
        }
    ]
}

1	Referencing the template "byType"
2	Referencing the template "fulltext"

The following templates are available:

none

Turns off indexing completely.

None template output

"config": {
    "enabled": false,
    "decideByType": false,
    "nGram": false,
    "fulltext": false,
    "includeInAllText": false,
    "path": false
}

byType

Indexing based on valueType.

ByType template output

"config": {
    "enabled": true,
    "decideByType": true,
    "nGram": false,
    "fulltext": false,
    "includeInAllText": false,
    "path": false
}

fulltext

Activates common text indexing options.

Fulltext template output

"config": {
    "enabled": true,
    "decideByType": false,
    "nGram": true,
    "fulltext": true,
    "includeInAllText": true,
    "path": false
}

path

Turns on path-specific indexing.

Path template output

"config": {
    "enabled": true,
    "decideByType": false,
    "nGram": false,
    "fulltext": false,
    "includeInAllText": false,
    "path": true
}

minimal

Will only create orderby indexes.

Minimal template output

"config": {
    "enabled": true,
    "decideByType": false,
    "nGram": false,
    "fulltext": false,
    "includeInAllText": false,
    "path": false
}
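
The relationship between the shorthand and the full format can be sketched as a simple expansion step. The template contents below are copied from the outputs listed above; the function names expand and expand_index_config are hypothetical, and the sketch makes no claim about how the expansion is actually implemented.

```python
import copy

# Template outputs as listed above.
TEMPLATES = {
    "none": {"enabled": False, "decideByType": False, "nGram": False,
             "fulltext": False, "includeInAllText": False, "path": False},
    "byType": {"enabled": True, "decideByType": True, "nGram": False,
               "fulltext": False, "includeInAllText": False, "path": False},
    "fulltext": {"enabled": True, "decideByType": False, "nGram": True,
                 "fulltext": True, "includeInAllText": True, "path": False},
    "path": {"enabled": True, "decideByType": False, "nGram": False,
             "fulltext": False, "includeInAllText": False, "path": True},
    "minimal": {"enabled": True, "decideByType": False, "nGram": False,
                "fulltext": False, "includeInAllText": False, "path": False},
}


def expand(config):
    """Replace a template name with a copy of its full config; pass full configs through."""
    if isinstance(config, str):
        return copy.deepcopy(TEMPLATES[config])
    return config


def expand_index_config(index_config: dict) -> dict:
    """Expand template references in both the default and every path-specific entry."""
    return {
        "default": expand(index_config["default"]),
        "configs": [
            {"path": entry["path"], "config": expand(entry["config"])}
            for entry in index_config.get("configs", [])
        ],
    }


shorthand = {
    "default": "byType",
    "configs": [{"path": "myProperty", "config": "fulltext"}],
}
expanded = expand_index_config(shorthand)
print(expanded["configs"][0]["config"])
```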
