NoSQL Data Store

Contents

NoSQL Data Store

Enonic XP ships with a unique and powerful document-oriented data storage built on the shoulders of popular search engine Elasticsearch.

Background

When designing XP, we carefully evaluated existing and popular technologies from Git, file systems, object storage, SQL, the Java Content Repository, and various search and NoSQL solutions. However, we were unable to find any single solution that supported our requirements to functionality, performance, simplicity and technology. So, we decided to build our own solution.

The result is now a central part of Enonic XP, called the "Data store".

Features

The XP Data store enables you to persist, index, query and access data of virtually any kind, fast and efficiently. Below is a highlight of the capabilities provided:

  • Create multiple of Repositories within a single XP deployment

  • Store documents aka Nodes hierarchically, no schemas required

  • Nodes contain key-value entries called Properties

  • Query your data using an SQL inspired Query language

  • Use Aggregations to get powerful statistics on your data

  • Get highlighted query results based on scoring with Highlighting

  • Versions are created for every modification of a node

  • Supports Access Control down to a single document

  • Git inspired Branches enable you to stage and promote your data

  • Editors eliminate cumbersome update statements

  • Snapshot and restore feature to rollback entire system or per repository

  • Export and import nodes and tree structures in a human readable format

  • Dump and load your entire system

Repositories

Repositories are the fundamental concept that enable you to create and store data separately from one another. You may create any number of repositories within a single instance of Enonic XP.

Each repository must have a unique name. Best practice is to name or prefix the repository according to your application. i.e. a good repository name might be: com.company.myapp

Data stored in a repository will typically belong to a common domain. Fetches and searches are typically executed against a single repository, but may also be done across multiple repositories.

repos

System repository

Enonic XP ships with a system repository. The system repo is used for the following data:

  • Repo metadata, for other repos created within the instance

  • Id-providers, including users, groups and roles.

  • Application binaries, for apps installed through the API

Managing repositories

An API for managing repositories is available internally, for use by applications. Additionally, a remote API is available and can be accessed using the Enonic CLI. We also recommend checking out https://market.enonic.com/vendors/glenn-ricaud/data-toolbox.

Nodes

Each entry in a repository is called a Node.

A node essentially consist of:

When nodes are created in a repository, the following happens:

  • For new nodes, a unique identifier is created

  • For every modification, a unique version identifier is created

  • The data are stored the underlying Blobstore

  • The specific version of the node is then added to the specified Branch.

Properties

Properties inside a node hold the actual data values. Properties use a key-value format.

The key must be a unique name within the node, and the value must have a specific valueType, such as String, or GeoPoint. The valueType is used to index the property correctly, and provide basic validation.

Examples of what properties might look like:

mytext = "a string"
mynumber = 1

Each property has a specific ValueType (i.e. string or boolean) which again is used to determine indexing.

Some characters are illegal in a property key. Here’s a list of illegal characters:

  • _ is system reserved prefix

  • . is the path separator.

  • [ and ] are array index indicators.

Properties may also be nested, making the key a path. Elements in the path are separated by . (dot).

Here’s an example of properties with arrays and nested properties.

first-name = "Thomas"
cities = ["Oslo", "San Francisco"]
city.location = geoPoint('37.785146,-122.39758')
person.age = 39
person.birth-date = localDate("1975-17-10")

In the example above, the property person is of the ValueType Set. Sets are special in the way that they don’t hold actual values, but rather act as containers for other properties.

Properties are of a specific ValueType. ValueTypes are used for validation and securing correct indexing.

System Properties

In order to separate system properties from user defined properties, _ (underscore) has been reserved as a starting character for system standard properties.

The repository contains several standard metadata properties such as _id, _name, and _timestamp.

For more details on system properties, please consult the system properties section.

Query language

The Node Query Language, or NoQL for short, is inspired by traditional SQL. As with other NoSQL solutions, it has special capabilities and limitations.

Selectors, joins and update statements are not supported. However, NoQL adds cool features like relevance sorting and aggregations.

Selectors are currently not supported, and the only result of a query will only be identifiers for the matching nodes. Developers must then get the desired nodes (with their data) through a separate request.

A NoQL statement is essentially composed from three parts: Query, Sorting and Aggregations.

Queries

Queries represent an efficient way to accessing data stored in XP. Developers may also access data by Node IDs, path or child items. A query normally targets a single repository, but may also query multiple repositories at once.

Queries are built from traditional expressions. For instance, the following query would return all nodes in the repo, where the property weight is greater than 10.

weight > 10

Expressions may be combined by using traditional logical operators such as AND, and OR. For instance, we could limit the result further:

weight > 10 AND fulltext('article', 'should have these words', 'AND')

In this case we are adding a so-called dynamic expression to the query. The fulltext() expression performs a free text search on the property article for the specified search string.

For both the integer comparison and fulltext expression to work, the weight, and article properties need to be indexed properly.

For more insight check out the detailed documentation on queries.

Sorting

Like traditional SQL databases, XP lets you sort the result by property in ascending or descending order. A basic sort statement is simply defined by property and sorting direction i.e.:

myproperty DESC

Additionally, similar to Google, text-based query results may be sorted by ranking. Ranking is done through an internal algorithm that scores each individual item based on how it matches with your search. To sort by ranking, use the following statement:

_score DESC

For more insight check out the detailed documentation on sorting.

Aggregations

With Aggregations, developers may extract statistical results from your data blazingly fast. Aggregations can be used for anything from data visualization to creating navigational UI’s.

A common aggregation might be to determine the number of occurences of a "term" within a specific property. For instance, if you have 500 blog posts, that store a tag property where each tag is stored as a separate array entry. We might then perform a term aggregation to get the top 10 terms, and how many times they have occured.

We could define this aggregation as follows:

  {
    "aggregations": {
      "top-tags": {
        "terms": {
          "field": "tag",
          "order": "_count desc",
          "size": 10
        }
      }
    }
  }

And the result might look like this:

{
  "aggregations": {
    "top-tags": {
      "buckets": [
        {
          "docCount": 132,
          "key": "a tag"
        },
        {
          "docCount": 52,
          "key": "another tag"
        },
        {
          "docCount": 43,
          "key": "tag along"
        }
      ]
    }
  }
}

This may again be used to create a visualization, for instance as a Tag Cloud. XP supports several different kinds of aggregation types.

For more insight check out the detailed documentation on aggregations

Highlighting

Highlighters enable you to get highlighted snippets from one or more properties in your search results so you can show users where the query matches are. When you request highlights, the response contains an additional highlight element for each search hit that includes the highlighted properties and the highlighted fragments.

Highlighted query structure:

{
    "query" : {...},
    "highlight" : {
        ... global properties ...

        "properties" : {
            "<propertyName>" : {
                ... property settings...
            }
        }
    }
}

For more details check highlighting documentation

Branches

Inspired by Git, XP repos supports a concept called branches. All repos have a default branch called master. This means that the fully qualified location of a node consists of:

<repo> + <branch> + <path>

Any number of branches could be added to facilitate your data model. Branches are typically ideal for facilitating long running transactions.

As an example, XP’s CMS functionality makes use of two branches draft and master to support the editorial workflow, with previewing and bulk publishing of changes.

For more details, dive into the branches documentation.

Editors

Inspired by modern design patterns like Command Query Responsibility Segregation (CQRS), Enonic XP strongly separates accessing and querying data from writing.

Rather than using update statements, or sending pre-defined objects or structures for persisting, Enonic XP uses a concept called "Editors". An editor is typically a query, combined with a piece of code.

The query determines which nodes to modify, and the code is then executed for each single node.

Snapshot and restore

You may create snapshots of the storage at any time. Snapshots normally completes within seconds. Once you have a snapshot, you may rollback to the snapshot at a later time.

Snapshots will only restore the metadata and search index of your system, not the blobs. As such, you must have an intact set of blobs from when the snapshot was taken. Deleting blobs normally requires running a vacuum job on the storage.

Snapshotting and restoring can be performed by using Enonic CLI, the API or a management application from Enonic Market.

Export and import

You may export and import tree structures of nodes, or even single leaf nodes using the Export and Import feature. Exporting will produce a human-readable export of the selected repo/branch/node and it’s child items.

An export contains all metadata and blobs required to re-create the selected nodes. You may import any valid export, but pay attention as export/import will keep your existing node identifiers.

Exporting and importing can be performed by using Enonic CLI, the API or a management application from Enonic Market.

Dump and load

You may dump and load your entire system. Dumping will produce a machine readable file that can be used for loading at a later time.

The dump will include the complete set of all data in the storage, including repositories, branches, metadata, version history and binaries.

Dumping and loading can be performed by using Enonic CLI, the API or a management application from Enonic Market.

Dumping is not a recommended backup solution, rather use snapshots in combination with backing up your Blobstore.

Blobstore

Enonic XP currently uses a combination of file system and the embedded Elasticsearch for persistence of data. Segments of information is chunked into files and stored in so-called BlobStores. Files are written using an "append only" technique, meaning files are never locked or updated. BlobStores are organized by repository, and type, making it easy to identify which files belong to what repository. A small, but important set of metadata uses Elasticsearch as its primary data store.

For clustered deployments, Enonic XP by default relies on access to a shared file system.

You may tune configuration of both blobstore and Elasticsearch through the configuration files.

Contents