Index configuration
Indexing is the process of extracting, processing and storing a search-optimized version of the property value.
Properties are mapped to specific indexes based on their value type and the node’s Index config.
Index Mappings
When nodes are persisted, their property values are indexed immediately. A single property can be indexed multiple times; each such index is referred to as an index mapping.
Each mapping enables specific query capabilities:
text
The default mapping used for all value types.
number
Effectively handles any numeric value.
datetime
Handles any date value.
geoPoint
Supports earth-based geographical locations.
ngram
ngram mappings are accessed in queries via the nGram-function. An nGram-analyzed field indexes substrings of the value from 2 to 25 characters long. In the example below, each token is a leading substring (prefix) of the value.
Consider a String property with the value "article". ngram indexing splits the string into the following tokens when analyzed:
'ar', 'art', 'arti', 'artic', 'articl', 'article'
For more information, see the documentation for the nGram-function.
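The token list above can be reproduced with a short Python sketch. It assumes the tokens are the prefixes of the value, as in the example; the function name edge_ngrams and its parameters are hypothetical and only serve the illustration, they are not part of any API.

```python
def edge_ngrams(value: str, min_len: int = 2, max_len: int = 25) -> list[str]:
    """Return the prefixes of `value` that are min_len..max_len characters long."""
    longest = min(len(value), max_len)
    return [value[:n] for n in range(min_len, longest + 1)]


print(edge_ngrams("article"))
```

Values shorter than two characters produce no tokens in this sketch, and values longer than 25 characters are only covered up to their first 25 characters.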
analyzed
Splits the string into tokens for effective free text search. Used by the fulltext query expression (and the legacy fulltext() NoQL function).
Consider a String property with the value "This is test-driven development". When analyzed, it is split into the following tokens:
'this', 'is', 'test', 'driven', 'development'
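As a rough illustration, the split above can be approximated in Python by lowercasing the value and breaking it on anything that is not a letter or digit. This is only a sketch: the actual analyzer may apply further rules, and the function name analyze is hypothetical.

```python
import re


def analyze(text: str) -> list[str]:
    """Lowercase the text and split it on anything that is not a letter or digit."""
    return re.findall(r"[^\W_]+", text.lower())


print(analyze("This is test-driven development"))
```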
stemmed
Language optimized version of the analyzed index mapping. Used by the stemmed query expression (and the legacy stemmed() NoQL function).
Tokens are trimmed based on language-specific features such as plurals and gender-specific endings. Consider the following sentence stemmed for English content: "The monkey loved bananas". When indexed, the result will be something like:
'the', 'monkey', 'love', 'banana'
The same stemming algorithm is applied to queries, so a query like "banana love" produces a hit even though the strings do not match exactly.
path
The path elements (separated by the default path separator '/') are indexed as tokens.
orderby
Any indexed property automatically gets the orderby index mapping as well. The orderby index mapping lets us sort text and numbers in a natural way across properties and different value types.
Collation: By default, orderby uses UNICODE collation, which is fast but has limitations regarding locale-specific sorting and the ordering of uppercase versus lowercase letters. Locale-specific collation can optionally be enabled on a per-field basis via the Index config. This lets you force sorting for a specific language, such as sorting "ä" after "z" in Swedish.
Note: The CMS (Content API) automatically adds locale-aware indexing for the displayName property, should you need it in your queries.
Read more about collation in the orderby section.
_allText
Nodes that contain indexed String values typically get a generated system property called _allText. This property has the valueType String and is by default indexed as text, ngram, and analyzed.
The indexing of _allText is customizable via the node’s Index config.
The property is commonly used in "search everything" approaches.
Note: When defining custom index configurations, you may choose whether a property is included in _allText by setting includeInAllText to true or false.
Index config
By default, properties are indexed based on their specific value type. This strategy is known as decideByType.
Every now and then, you may need more detailed control of how your properties are indexed. This is where the index config comes in.
The index config allows you to provide detailed instructions on how the properties of a node should be indexed.
The index config itself is stored as a property on the node. A basic index config might look something like this:
"_indexConfig": {
  "default": { (1)
    "enabled": true,
    "decideByType": false,
    "nGram": false,
    "fulltext": false,
    "includeInAllText": false,
    "path": false,
    "indexValueProcessors": [],
    "languages": []
  },
  "configs": [ (2)
    {
      "path": "myProperty", (3)
      "config": { (4)
        "enabled": true,
        "decideByType": false,
        "nGram": true,
        "fulltext": true,
        "includeInAllText": true,
        "path": false,
        "languages": []
      }
    },
    {
      "path": "mySet.**", (5)
      "config": {
        "enabled": true,
        "decideByType": false,
        "nGram": false,
        "fulltext": false,
        "includeInAllText": false,
        "path": false,
        "languages": ["en", "no"] (6)
      }
    }
  ],
  "allText": { (7)
    "enabled": true,
    "nGram": true,
    "fulltext": true,
    "languages": []
  }
}
(1) default is the default config for all properties (unless overridden)
(2) configs overrides the default config for properties matching the specified path
(3) path specifies the Property paths the config applies to
(4) config is the specific overriding config
(5) mySet.** applies to all sub properties of "mySet"
(6) languages enables language-specific stemming and collation indices for matched properties. Read more in the Languages section below.
(7) allText customizes the indexing of the _allText property. Read more in the _allText section above.
Options
The following options can be added to a configuration entry:
enabled
If false, indexing is disabled for the affected properties.
decideByType
If true, indexing is based on the property’s value type. For example, numeric values are indexed as both text and number.
fulltext
Values are indexed as analyzed, enabling the fulltext query expression.
nGram
Values are indexed as ngram, enabling the nGram-function in queries.
path
Values are indexed as path, making them applicable for the pathMatch-function.
includeInAllText
Affected values are added to the _allText property.
languages
Generates a stemming index (where supported) and a collation index for each specified language. Read more in the Languages section below.
Property paths
All config entries, with the exception of default, must specify a path. The path element defines the property scope within the node where this index configuration applies.
Paths follow the propertyPath format, optionally including wildcards such as the double wildcard character **.
Examples:
myProperty.**
myProperty.*
myProperty*.**
myProperty.myName
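To make the matching of paths against properties more concrete, here is a minimal Python sketch. The source does not spell out the wildcard semantics or the precedence between entries, so the sketch assumes that * stays within one path element, that ** spans any number of elements, and that the first matching entry wins. The helper names pattern_to_regex and resolve_config are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a path pattern into a regex (assumed wildcard semantics)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # assumed: ** spans any number of path elements
            i += 2
        elif pattern[i] == "*":
            parts.append("[^.]*")   # assumed: * stays within one path element
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def resolve_config(index_config: dict, property_path: str):
    """Return the first matching override, else the default (assumed precedence)."""
    for entry in index_config.get("configs", []):
        if pattern_to_regex(entry["path"]).fullmatch(property_path):
            return entry["config"]
    return index_config["default"]


index_config = {
    "default": "byType",
    "configs": [
        {"path": "myProperty", "config": "fulltext"},
        {"path": "mySet.**", "config": "none"},
    ],
}

print(resolve_config(index_config, "myProperty"))  # fulltext
print(resolve_config(index_config, "mySet.a.b"))   # none
print(resolve_config(index_config, "other"))       # byType
```

The config values used here are the template names described in the Templates section; full config objects would be returned the same way.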
Languages
In general, indexing and querying are language-agnostic, but for string properties you can specify a language to enable Stemming and Collation.
Configuration
To activate language-specific indexing for specific properties, you must specify this explicitly in the index configuration. For each specified language, a stemming index (where supported) and a collation index will be created.
Note: Language-specific indexing slows down indexing and increases the index size, so it should only be used when explicitly needed.
"_indexConfig": {
  "default": "byType",
  "configs": [
    {
      "path": "title",
      "config": {
        "enabled": true,
        "decideByType": false,
        "fulltext": true,
        "includeInAllText": true,
        "languages": ["en"] (1)
      }
    },
    {
      "path": "description_no",
      "config": {
        "enabled": true,
        "decideByType": false,
        "fulltext": true,
        "includeInAllText": true,
        "languages": ["no"] (2)
      }
    }
  ]
}
(1) English indexing for the title property
(2) Norwegian indexing for the Norwegian description
"_indexConfig": {
  "allText": {
    "languages": ["en"] (1)
  }
}
(1) Enables English stemming and collation for the _allText property
Supported languages
Language codes are specified in the la[-co] format, where la is the language code and the optional co is a country code (for example, pt-br).
The columns of the table below mean:
Stemming
yes means a stemming index is generated; - means stemming is not supported for this language.
Collation
custom means language-specific sorting rules are applied; DUCET means the language-neutral fallback is used (also applied for any language not listed here).
Description
Summarizes the custom collation rules, where applicable.
| ISO Code | Language | Stemming | Collation | Description |
|---|---|---|---|---|
| ar | Arabic | yes | custom | Correctly handles script-specific shaping and ignores non-spacing vowels by default. |
| hy | Armenian | yes | custom | Follows classical Armenian alphabetical sequence rules. |
| az | Azerbaijani | - | custom | Correctly handles the distinction between dotted İ and dotless I. |
| eu | Basque | yes | custom | Tailored modern Spanish-adjacent layout excluding character combinations like ch. |
| be | Belarusian | - | custom | Custom Cyrillic sorting prioritizing the letter ў appropriately. |
| bn | Bengali | yes | custom | Groups and sorts based on traditional Indic script phonetic structure. |
| bs | Bosnian | - | custom | Supports both Latin and Cyrillic variants, accommodating specific digraphs (lj, nj, dž). |
| pt-br | Brazilian | yes | DUCET | - |
| bg | Bulgarian | yes | custom | Standard Cyrillic rules mapping ь and ъ distinctively. |
| ca | Catalan | yes | custom | Treats characters like l·l (ela geminada) according to Spanish/French-hybrid rules. |
| zh | Chinese | yes | custom | Supports multi-rule sets like Pinyin sorting or traditional radical/stroke count. |
| hr | Croatian | - | custom | Sorts custom Latin digraphs (č, ć, dž, đ, lj, nj, š, ž) as independent letters. |
| cs | Czech | yes | custom | Treats the two-letter combination ch as a distinct entity sorting after h. |
| da | Danish | yes | custom | Places specific vowels Æ, Ø, and Å sequentially at the absolute end of the alphabet. |
| nl | Dutch | yes | custom | Standardized Latin sorting; optionally handles IJ variations depending on context. |
| en | English | yes | custom | Standard Latin alphabet sorting with case/accent sensitivity levels. |
| et | Estonian | - | custom | Places modified characters õ, ä, ö, ü between v and x. |
| fi | Finnish | yes | custom | Treats V and W as primarily identical, sorting custom vowels Å, Ä, Ö at the end. |
| fr | French | yes | custom | Evaluates accent changes backwards (from right to left) for secondary string tie-breaking. |
| gl | Galician | yes | custom | Adapts traditional Spanish rules to match standard local dictionary frameworks. |
| ka | Georgian | - | custom | Maps the unique Mkhedruli script characters sequentially. |
| de | German | yes | custom | Supports standard sorting or Phonebook (ä expands to ae) via variants. |
| el | Greek | yes | custom | Correctly aligns modern Greek characters and ignores diacritics on uppercase variants. |
| he | Hebrew | - | custom | Properly sequences right-to-left Hebrew script characters, ignoring cantillation points. |
| hi | Hindi | yes | custom | Alphabetizes native Devanagari script according to strict phonetic vowel/consonant orders. |
| hu | Hungarian | yes | custom | Recognizes compound consonants (cs, dz, dzs, gy) as distinct, singular letters. |
| is | Icelandic | - | custom | Retains classical letters like Þ (thorn) and Ð (eth) in historical positions. |
| id | Indonesian | yes | custom | Standard Latin alphabet sorting order. |
| ga | Irish | yes | custom | Handles specialized Gaelic prefix sorting rules (e.g., ignoring mutations like t- or n-). |
| it | Italian | yes | custom | Standard Latin rules with typical accent-insensitive initial matching. |
| ja | Japanese | yes | custom | Supports intricate Kanji/Kana sorting structures including Hiragana vs. Katakana levels. |
| ko | Korean | yes | custom | Sorts according to the foundational structural order of Hangul Jamo syllables. |
| lv | Latvian | yes | custom | Sorts modified characters (like č, ģ, ķ) directly behind their unmodified base. |
| lt | Lithuanian | yes | custom | Places the character Y distinctly between I and J. |
| no | Norwegian | yes | custom | Aligns identically with Danish rules, grouping Æ, Ø, and Å at the end. |
| fa | Persian | yes | custom | Customizes standard Arabic script ordering to account for distinct Persian extensions (پ, چ, ژ, گ). |
| pl | Polish | - | custom | Standardized to place accented elements (ą, ć, ę, ł, ń, ó, ś, ź, ż) after base characters. |
| pt | Portuguese | yes | custom | Standard Latin alphabetic sorting matching Iberian/Brazilian requirements. |
| ro | Romanian | yes | custom | Groups specific characters (ă, â, î, ș, ț) sequentially behind their base counterparts. |
| ru | Russian | yes | custom | Standard modern Cyrillic ordering, treating Е and Ё distinctly based on context levels. |
| sk | Slovak | - | custom | Treats unique digraphs like ch and accented sets (ä, č, ď) with independent spacing. |
| ku | Sorani | yes | DUCET | - |
| es | Spanish | yes | custom | Supports standard layout or traditional option sets treating ch and ll as unique letters. |
| sv | Swedish | yes | custom | Places Å, Ä, and Ö at the end; traditionally treats W as a variant of V. |
| th | Thai | yes | custom | Automatically rearranges leading vowels when preceding consonants for accurate grouping. |
| tr | Turkish | yes | custom | Strictly separates I (uppercase dotless) to ı from İ (uppercase dotted) to i. |
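The following Python sketch shows how a languages list could be checked against the table to see which language-specific indexes to expect. It only copies a handful of rows from the table, the function name language_indexes is hypothetical, and it assumes that unlisted languages get no stemming index (the source only states that they fall back to DUCET collation).

```python
# Subset of the table above: code -> (stemming supported, collation type)
SUPPORTED = {
    "en": (True, "custom"),
    "no": (True, "custom"),
    "pl": (False, "custom"),
    "pt-br": (True, "DUCET"),
}


def language_indexes(languages: list[str]) -> dict[str, dict]:
    """Describe the stemming and collation indexes expected per language code."""
    result = {}
    for code in languages:
        # Unlisted languages fall back to DUCET; no stemming is assumed for them.
        stemming, collation = SUPPORTED.get(code.lower(), (False, "DUCET"))
        result[code] = {"stemming": stemming, "collation": collation}
    return result


print(language_indexes(["no", "pl", "pt-br"]))
```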
Stemming
Stemming is a powerful feature for improving search relevance in multilingual systems. It works by reducing words to their root form, making search results more flexible and user-friendly.
When a property is configured with a language-specific stemming index:
- During indexing: The text is tokenized and each token is reduced to its stem (root form)
- During search: The search query is also stemmed using the same algorithm
- Matching: The stemmed tokens from the query are compared against the stemmed index
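This means that searches for "running" or "runs" can match content containing "run", and vice versa. Irregular forms such as "ran" may not reduce to the same stem, depending on the stemming algorithm used for the language.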
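The three steps can be sketched in Python as follows. The stemmer here is deliberately naive and is not the algorithm used by the index; the point is only that the same function is applied to both the indexed text and the query. The names toy_stem, stems and matches are hypothetical, and the sketch assumes that every query token must be found in the index.

```python
import re


def toy_stem(token: str) -> str:
    """Deliberately naive English stemmer, for illustration only."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            token = token[: -len(suffix)]
            # Collapse a doubled final consonant left behind: "runn" -> "run"
            if token[-1] == token[-2] and token[-1] not in "aeiou":
                token = token[:-1]
            break
    return token


def stems(text: str) -> set[str]:
    """Tokenize, lowercase and stem; used for both indexing and querying."""
    return {toy_stem(t) for t in re.findall(r"[^\W_]+", text.lower())}


def matches(query: str, indexed_text: str) -> bool:
    """Assumed rule: every stemmed query token must be in the stemmed index."""
    return stems(query) <= stems(indexed_text)


print(matches("running", "She runs every morning"))
print(matches("runs", "I like to run"))
```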
Collation
The default UNICODE sorting works in most cases, but does not cover special scenarios such as sorting the Norwegian characters Æ, Ø, and Å correctly. This is where collation comes into play.
For each supported language specified, an additional orderby index is created, applying that language’s collation rules when sorting. Languages without specific rules fall back to DUCET, a language-neutral ordering.
Note: Sorting by a collation index is slower at query time than the default UNICODE sort.
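The Norwegian case can be illustrated with a small Python sketch. It assumes that the default sort behaves like plain code-point ordering, which the text above does not state explicitly, and it emulates Norwegian collation with a hand-written alphabet rather than a real collation library. The names NORWEGIAN_ALPHABET and norwegian_key are hypothetical.

```python
NORWEGIAN_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(NORWEGIAN_ALPHABET)}


def norwegian_key(word: str) -> list[int]:
    """Sort key ranking letters by their position in the Norwegian alphabet."""
    return [RANK.get(ch, len(RANK)) for ch in word.lower()]


names = ["Åse", "Ørjan", "Ærlig", "Zara"]
print(sorted(names))                     # code-point order puts Å before Æ and Ø
print(sorted(names, key=norwegian_key))  # Norwegian order: Æ, Ø, then Å
```

A real collation index also handles case, accents and multi-character rules, which this key function ignores.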
Templates
For simplicity, index configs may also be defined using a shorthand format. Rather than providing a full config object, you may instead reference a standard template.
"_indexConfig": {
  "default": "byType", (1)
  "configs": [
    {
      "path": "myProperty",
      "config": "fulltext" (2)
    }
  ]
}
(1) Referencing the template "byType"
(2) Referencing the template "fulltext"
The following templates are available:
none
Turns off indexing completely.
"config": {
  "enabled": false,
  "decideByType": false,
  "nGram": false,
  "fulltext": false,
  "includeInAllText": false,
  "path": false
}
byType
Indexing based on valueType.
"config": {
  "enabled": true,
  "decideByType": true,
  "nGram": false,
  "fulltext": false,
  "includeInAllText": false,
  "path": false
}
fulltext
Activates common text indexing options.
"config": {
  "enabled": true,
  "decideByType": false,
  "nGram": true,
  "fulltext": true,
  "includeInAllText": true,
  "path": false
}
path
Turns on path-specific indexing.
"config": {
  "enabled": true,
  "decideByType": false,
  "nGram": false,
  "fulltext": false,
  "includeInAllText": false,
  "path": true
}
minimal
Will only create orderby indexes.
"config": {
  "enabled": true,
  "decideByType": false,
  "nGram": false,
  "fulltext": false,
  "includeInAllText": false,
  "path": false
}
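As an illustration of how the shorthand relates to the full format, the Python sketch below expands template names into the config objects listed above. The template values are copied from this section; the helper names make_config, expand and expand_index_config are hypothetical and do not correspond to any API.

```python
import json


def make_config(enabled, decide_by_type=False, ngram=False,
                fulltext=False, all_text=False, path=False) -> dict:
    """Build a full config object with the option names used in this document."""
    return {
        "enabled": enabled,
        "decideByType": decide_by_type,
        "nGram": ngram,
        "fulltext": fulltext,
        "includeInAllText": all_text,
        "path": path,
    }


# The five templates listed above.
TEMPLATES = {
    "none": make_config(False),
    "byType": make_config(True, decide_by_type=True),
    "fulltext": make_config(True, ngram=True, fulltext=True, all_text=True),
    "path": make_config(True, path=True),
    "minimal": make_config(True),
}


def expand(config):
    """Return a full config object for a template name; pass objects through."""
    if isinstance(config, str):
        return dict(TEMPLATES[config])
    return config


def expand_index_config(index_config: dict) -> dict:
    """Expand the default and every configs entry of a shorthand index config."""
    expanded = dict(index_config)
    expanded["default"] = expand(index_config["default"])
    expanded["configs"] = [
        {"path": entry["path"], "config": expand(entry["config"])}
        for entry in index_config.get("configs", [])
    ]
    return expanded


shorthand = {
    "default": "byType",
    "configs": [{"path": "myProperty", "config": "fulltext"}],
}
print(json.dumps(expand_index_config(shorthand), indent=2))
```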