Aggregations
Introduction
An aggregation is a function that is executed on a collection of search results. The search results are defined by the query and query filter of the search request.
For instance, consider a query returning all nodes that have a property "price" less than, say, $100. Now, we want to divide the result nodes into ranges, say $0-$25, $25-$50 and so on. We would also like to know the average price for each range. This could be done by running multiple separate queries and calculating the averages manually, but that would be inefficient and cumbersome. Aggregations solve these types of problems easily.
Some API functions accept an aggregations expression object. This object is expressed either in Java or as JSON, like the following:
{
"aggregations" : {
"[name]" : {
"[type]" : {
... body ...
},
"aggregations": {
... subaggregations ...
}
}
}
}
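The nesting above can be sketched in Python as a small builder that produces the same JSON shape. The helper name `aggregation` is hypothetical and not part of any API described here; it only illustrates how a named aggregation wraps a typed body and optional sub-aggregations.

```python
import json


def aggregation(name, agg_type, body, sub_aggregations=None):
    """Build one named aggregation entry (hypothetical helper, for illustration only)."""
    entry = {agg_type: body}
    if sub_aggregations:
        # Sub-aggregations sit next to the typed body, under the "aggregations" key.
        entry["aggregations"] = sub_aggregations
    return {name: entry}


# A stats aggregation nested inside a terms aggregation.
price_stats = aggregation("priceStats", "stats", {"field": "data.product.price"})
products = aggregation(
    "products",
    "terms",
    {"field": "data.product.category", "order": "_count desc", "size": 10},
    sub_aggregations=price_stats,
)
expression = {"aggregations": products}

print(json.dumps(expression, indent=2))
```

The printed JSON has the `[name]` / `[type]` / `aggregations` layout of the template, so it could serve as a starting point for a request body.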
There are two different types of aggregations:
 Bucket aggregations

A bucket aggregation places documents matching the query in a collection, called a bucket. Each bucket has a key.
 Metric aggregations

A metric aggregation computes metrics over a set of documents.
Typically, you will divide data into buckets and then, if necessary, use metric aggregations to calculate e.g. average values, sums, etc. for each bucket.
terms
The terms aggregation places documents into buckets based on property values. Each unique value of a property gets its own bucket. Here’s a list of properties:
 field (string)

The property path.
 size (int)

The number of buckets to return, ordered by the given order type and direction. Defaults to 10.
 order (string)

How to order the results, given as type and direction. Defaults to _term ASC.
Types:

_term: Alphabetic ordering of bucket keys.
_count: Numeric ordering by the number of documents in each bucket.
{
"aggregations": {
"categories": {
"terms": {
"field": "myCategory",
"order": "_count desc",
"size": 10
}
}
}
}
{
"aggregations": {
"categories": {
"buckets": [
{
"docCount": 132,
"key": "articles"
},
{
"docCount": 101,
"key": "documents"
},
{
"docCount": 43,
"key": "casestudies"
}
]
}
}
}
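To make the bucketing rule concrete, here is a minimal Python sketch that imitates a terms aggregation over an in-memory list of documents. It is not how the server computes the result; the function name `terms_buckets` and the sample documents are assumptions made for illustration.

```python
from collections import Counter


def terms_buckets(docs, field, size=10, order="_term ASC"):
    """Group documents by the value of `field`, one bucket per unique value."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    order_type, direction = order.split()
    reverse = direction.upper() == "DESC"
    if order_type == "_count":
        # Order by the number of documents in each bucket.
        ordered = sorted(counts.items(), key=lambda item: item[1], reverse=reverse)
    else:
        # "_term": order alphabetically by bucket key.
        ordered = sorted(counts.items(), key=lambda item: item[0], reverse=reverse)
    return [{"key": key, "docCount": count} for key, count in ordered[:size]]


# Hypothetical sample documents.
docs = (
    [{"myCategory": "articles"}] * 3
    + [{"myCategory": "documents"}] * 2
    + [{"myCategory": "casestudies"}]
)
buckets = terms_buckets(docs, "myCategory", order="_count desc", size=10)
print(buckets)
```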
stats
The stats aggregation calculates the following statistics for the parent aggregation's buckets: avg, min, max, count, and sum.
Here’s a list of properties:
 field (string)

The property path.
{
"start": 0,
"count": 0,
"aggregations": {
"products": {
"terms": {
"field": "data.product.category",
"order": "_count desc",
"size": 10
},
"aggregations": {
"priceStats": {
"stats": {
"field": "data.product.price"
}
}
}
}
}
}
{
"products": {
"buckets": [
{
"key": "tv",
"docCount": 123,
"priceStats": {
"count": 123,
"min": 2599,
"max": 87944,
"avg": 7400,
"sum": 578100
}
},
{
"key": "bluray player",
"docCount": 42,
"priceStats": {
"count": 42,
"min": 699,
"max": 5999,
"avg": 1548,
"sum": 65016
}
},
{
"key": "reciever",
"docCount": 12,
"priceStats": {
"count": 12,
"min": 2999,
"max": 26950,
"avg": 5548,
"sum": 66756
}
}
]
}
}
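The combination of a terms aggregation with a nested stats aggregation can be sketched in Python as follows. This is an in-memory illustration under assumed sample data, not the server's implementation; `terms_with_stats` is a hypothetical name.

```python
from collections import defaultdict


def stats(values):
    """Compute the five statistics listed for the stats aggregation."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


def terms_with_stats(docs, key_field, value_field):
    """Bucket documents by `key_field`, then compute stats of `value_field` per bucket."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[key_field]].append(doc[value_field])
    return {key: stats(values) for key, values in groups.items()}


# Hypothetical sample documents.
docs = [
    {"category": "tv", "price": 2599},
    {"category": "tv", "price": 4999},
    {"category": "bluray player", "price": 699},
]
result = terms_with_stats(docs, "category", "price")
print(result)
```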
range
The range aggregation query defines a set of ranges, each of which represents a bucket. Here’s a list of properties:
 field (string)

The property path.
 ranges (range[])

The range buckets to create.
 range (from: number, to: number)

Defines a range to create a bucket for. The from value is included in the bucket; the to value is excluded.
{
"price_ranges": {
"range": {
"field": "price",
"ranges": [
{
"to": 50
},
{
"from": 50,
"to": 100
},
{
"from": 100
}
]
}
}
}
{
"price_ranges": {
"buckets": [
{
"docCount": 2,
"key": "a",
"to": 50
},
{
"docCount": 4,
"from": 50,
"key": "b",
"to": 100
},
{
"docCount": 4,
"from": 100,
"key": "c"
}
]
}
}
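The boundary rule (from is included, to is excluded) is easy to get wrong, so here is a small Python sketch that applies it to a list of values. The function name `range_buckets` and the sample prices are assumptions for illustration.

```python
def range_buckets(values, ranges):
    """Count values per range; `from` is inclusive, `to` is exclusive."""
    buckets = []
    for r in ranges:
        lower = r.get("from")  # None means open-ended downwards
        upper = r.get("to")    # None means open-ended upwards
        count = sum(
            1
            for v in values
            if (lower is None or v >= lower) and (upper is None or v < upper)
        )
        bucket = dict(r)
        bucket["docCount"] = count
        buckets.append(bucket)
    return buckets


# Hypothetical prices; note that 50 and 100 fall into the bucket they start.
prices = [10, 49.99, 50, 75, 100, 250]
ranges = [{"to": 50}, {"from": 50, "to": 100}, {"from": 100}]
buckets = range_buckets(prices, ranges)
print(buckets)
```

Because the upper bound is exclusive, adjacent ranges that share a boundary value never count the same document twice.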
geoDistance
The geoDistance aggregation needs a defined set of ranges to split the documents into buckets. Only documents with properties of type 'GeoPoint' will be considered in the geoDistance aggregation buckets.
Here’s a list of properties:
 field (string)

The property path.
 ranges (range[])

The range buckets to create.
 range (from: number, to: number)

Defines a range to create a bucket for. The from value is included in the bucket; the to value is excluded.
 unit (string)

The measurement unit to use for the ranges. Legal values are either the full name or the abbreviation of the following: km (kilometers), m (meters), cm (centimeters), mm (millimeters), mi (miles), yd (yards), ft (feet) or nmi (nauticalmiles).
 origin (lat: number, lon: number)

The GeoPoint from which the distance is measured.
{
"aggregations": {
"distance": {
"geoDistance": {
"field": "data.cityLocation",
"unit": "km",
"origin": {
"lat": "90.0",
"lon": "0.0"
},
"ranges": [
{
"from": 0,
"to": 1200
},
{
"from": 1200,
"to": 4000
},
{
"from": 4000,
"to": 12000
},
{
"from": 12000
}
]
}
}
}
}
{
"aggregations": {
"distance": {
"buckets": [
{
"key": "*-1200.0",
"doc_count": 3
},
{
"key": "1200.0-4000.0",
"doc_count": 4
},
{
"key": "4000.0-12000.0",
"doc_count": 5
},
{
"key": "12000.0-*",
"doc_count": 1
}
]
}
}
}
At the time of writing, there is only one way to find out which result belongs to which bucket: sort the results on geoDistance as well, and match the sorted order against the document count of each bucket. A future version will offer easier ways of doing this.
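As an illustration of how distance ranges partition points around an origin, the following Python sketch computes great-circle distances and looks up the matching range. The distance formula used by the server is not stated here; the haversine formula on a sphere with a mean Earth radius of 6371 km is an assumption of this sketch, as are the function names and the sample coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def bucket_index(distance, ranges):
    """Return the index of the range containing `distance` (from inclusive, to exclusive)."""
    for index, r in enumerate(ranges):
        if r.get("from", 0) <= distance < r.get("to", math.inf):
            return index
    return None


origin = (90.0, 0.0)  # same origin as in the sample request
ranges = [
    {"from": 0, "to": 1200},
    {"from": 1200, "to": 4000},
    {"from": 4000, "to": 12000},
    {"from": 12000},
]
# Hypothetical points: near the pole, roughly Oslo, and on the equator.
points = [(85.0, 20.0), (59.91, 10.75), (0.0, 0.0)]
indices = [bucket_index(haversine_km(*origin, lat, lon), ranges) for lat, lon in points]
print(indices)
```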
dateRange
The dateRange aggregation query defines a set of date ranges, each of which represents a bucket. Only documents with properties of type 'DateTime' will be considered in the dateRange aggregation buckets. Here’s a list of properties:
 field (string)

The property path.
 format (string)

The date format in which the buckets are formatted on return. Defaults to yyyy-MM-dd'T'HH:mm:ss.SSSZ.
 ranges (range[])

The range buckets to create.
 range (from: <number>, to: <number>)

Defines a range to create a bucket for. The from value is included in the bucket; the to value is excluded. The from and to values follow a special date-math syntax, explained below.
{
"my_date_range": {
"dateRange": {
"field": "date",
"format": "MM-yyy",
"ranges": [
{
"to": "now-10M"
},
{
"from": "now-10M"
}
]
}
}
}
{
"my_date_range": {
"buckets": [
{
"key": "*-12-2017",
"docCount": 2,
"to": "2017-12-01T00:00:00Z"
},
{
"key": "12-2017-*",
"docCount": 4,
"from": "2017-12-01T00:00:00Z"
}
]
}
}
Date-math expression
The range fields accept a date-math expression to calculate the time spans.

Now minus a day:
now-1d

The given date minus 3 hours plus one minute:
2014-12-10T10:00:00Z-3h+1m

Now plus one day and thirty minutes, rounded to minutes:
now+1d+30m/m
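To show how such expressions could be read, here is a simplified Python evaluator. It is a sketch under stated assumptions, not the grammar the server implements: it assumes an anchor of either `now` or a UTC timestamp such as `2014-12-10T10:00:00Z`, followed by `+`/`-` offsets and an optional `/unit` rounding, and it supports only the fixed-length units w, d, h, m and s (not M or y). The function name `eval_date_math` is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Fixed-length units only; months and years are left out of this sketch.
_UNITS = {"w": "weeks", "d": "days", "h": "hours", "m": "minutes", "s": "seconds"}
_EXPR = re.compile(
    r"^(now|\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)"  # anchor
    r"((?:[+-]\d+[wdhms])*)"                          # zero or more offsets
    r"(?:/([dhms]))?$"                                # optional rounding unit
)


def eval_date_math(expr, now=None):
    """Evaluate a simplified date-math expression to a timezone-aware datetime."""
    match = _EXPR.match(expr)
    if match is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    anchor, offsets, rounding = match.groups()
    if anchor == "now":
        result = now or datetime.now(timezone.utc)
    else:
        result = datetime.strptime(anchor, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    # Apply each offset in order, e.g. "-3h" then "+1m".
    for sign, amount, unit in re.findall(r"([+-])(\d+)([wdhms])", offsets):
        delta = timedelta(**{_UNITS[unit]: int(amount)})
        result = result + delta if sign == "+" else result - delta
    # Round down to the start of the requested unit.
    if rounding == "d":
        result = result.replace(hour=0, minute=0, second=0, microsecond=0)
    elif rounding == "h":
        result = result.replace(minute=0, second=0, microsecond=0)
    elif rounding == "m":
        result = result.replace(second=0, microsecond=0)
    elif rounding == "s":
        result = result.replace(microsecond=0)
    return result


fixed_now = datetime(2020, 1, 1, 12, 0, 45, tzinfo=timezone.utc)
print(eval_date_math("now-1d", now=fixed_now))
print(eval_date_math("2014-12-10T10:00:00Z-3h+1m"))
print(eval_date_math("now+1d+30m/m", now=fixed_now))
```

Passing a fixed `now` keeps the evaluation reproducible, which is convenient when checking range boundaries in tests.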
dateHistogram
The dateHistogram aggregation query defines a set of buckets based on a given time unit. For instance, when querying a set of log events, a dateHistogram aggregation query with interval h (hour) will place each log event into a bucket for each hour in the time span of the matching events. Here’s a list of properties:
 field (string)

The property path.
 interval (string)

The time-unit interval for creating buckets. Supported time-unit notations:

y = Year
M = Month
w = Week
d = Day
h = Hour
m = Minute
s = Second

 format (string)

Output format of the date string.
 minDocCount (int)

Only include a bucket in the result if its number of hits >= minDocCount.
{
"by_month": {
"dateHistogram": {
"field": "init_date",
"interval": "1M",
"minDocCount": 0,
"format": "MM-yyy"
}
}
}
{
"by_month": {
"buckets": [
{
"docCount": 8,
"key": "2014-01"
},
{
"docCount": 10,
"key": "2014-02"
},
{
"docCount": 12,
"key": "2014-03"
}
]
}
}
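A month-based histogram can be imitated in Python by grouping timestamps on a formatted key. This is a minimal in-memory sketch with assumed sample data; the function name `month_histogram` is hypothetical, and unlike a server-side histogram it does not emit empty buckets for months without documents.

```python
from collections import Counter
from datetime import datetime


def month_histogram(timestamps, min_doc_count=1):
    """Count timestamps per calendar month, keyed as "yyyy-MM", in chronological order."""
    counts = Counter(ts.strftime("%Y-%m") for ts in timestamps)
    return [
        {"docCount": count, "key": key}
        for key, count in sorted(counts.items())
        if count >= min_doc_count  # drop buckets below the threshold
    ]


# Hypothetical event timestamps.
events = [
    datetime(2014, 1, 5),
    datetime(2014, 1, 20),
    datetime(2014, 2, 1),
    datetime(2014, 3, 9),
    datetime(2014, 3, 10),
    datetime(2014, 3, 30),
]
buckets = month_histogram(events)
print(buckets)
```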