How to sort elasticsearch buckets by timestamp BEFORE results are filtered?
Posted By: Anonymous
my use case is as follows:
I have documents on elasticsearch, I save some telemetry data and i’m aggregating it before ingesting it to elastic search (each document can have a count field that tells how many times this event occurred).
Example of a json field:
{
"timestamp": "2021-06-05T06:53:25.275172"
"count": 5
"name": "myname",
"type: "mytype",
"sub_type": "mysubtype"
},
{
"timestamp": "2021-05-05T07:53:25.275172"
"count": 2
"name": "myname",
"type: "mytype",
"sub_type": "zzzzzmysubtype222"
}
{
"timestamp": "2021-04-05T06:22:25.275172"
"count": 1
"name": "myname",
"type: "mytype",
"sub_type": "mysubtype"
},
I’m doing a query to aggregate all "unique" documents (by name, type and sub_type), while counting the total number of responses, and i want to sort the buckets by timestamp.
The thing is, i have a "limit" parameter that limits the number of results received, and when i limit it by 1, i don’t get the latest event bucket (as i understand it sorts them by the 3 fields, then limits the size, and only then sorts it.
So, when running the below query – i would want to get the "myname, mytype, mysubtype", but i actually get the "myname, mytype, zzzzzmysubtype222" as the first bucket.
If i increase the limit – then it’s sorted correctly.
This is my query:
{
"query": {
"bool": {
"filter": [{"range": {"timestamp": {"lte": "now", "gte": "2020-06-05"}}}],
}
},
"aggs": {
"events": {
"composite": {
"size": 1, <----------- this doesn't give me latest bucket
"sources": [
{"name": {"terms": {"field": "name.keyword"}}},
{"type": {"terms": {"field": "type.keyword"}}},
{"sub_type": {"terms": {"field": "sub_type.keyword"}}},
],
},
"aggs": {
"events_count": {"sum": {"field": "count"}},
"latest_event": {"top_hits": {"sort": [{"timestamp": {"order": "desc"}}], "size": 1}},
"top_hit": {"max": {"field": "timestamp"}},
"events_bucket_sort": {"bucket_sort": {"sort": [{"top_hit": {"order": "desc"}}]}},
},
}
},
"size": 0,
}
So, what i’m trying to do is to sort the buckets by the latest event timestamp (top_hit) – any help would be appreciated.
I think it’s related to this: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html#_order
But timestamp shouldn’t be aggregated so how do i control the natural sort of the buckets? I need only the newest ones to be received first, no matter the size,
If it’s not possible, any way to get those unique documents aggregation that share several common fields?
Solution
As far as I can understand your question, you need to have the unique buckets based on the name
, type
, and sub_type
field, and then based on the buckets formed you need to get the latest bucket(based on the timestamp).
If you want to get unique buckets based on multiple fields, you can use multi terms aggregation along with the top_hits aggregation to get the latest bucket based on the timestamp
{
"size": 0,
"aggs": {
"genres_and_products": {
"multi_terms": {
"terms": [
{
"field": "name.keyword"
},
{
"field": "type.keyword"
},
{
"field": "sub_type.keyword"
}
]
}
},
"latest_event": {
"top_hits": {
"sort": [
{
"timestamp": {
"order": "desc"
}
}
],
"size": 1
}
}
}
}
Search Result:
"aggregations": {
"genres_and_products": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": [
"myname",
"mytype",
"mysubtype"
],
"key_as_string": "myname|mytype|mysubtype",
"doc_count": 2
},
{
"key": [
"myname",
"mytype",
"zzzzzmysubtype222"
],
"key_as_string": "myname|mytype|zzzzzmysubtype222",
"doc_count": 1
}
]
},
"latest_event": {
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "67775188",
"_type": "_doc",
"_id": "1",
"_score": null,
"_source": {
"timestamp": "2021-06-05T06:53:25.275172",
"count": 5,
"name": "myname",
"type": "mytype",
"sub_type": "mysubtype"
},
"sort": [
1622876005275
]
}
]
}
}
Answered By: Anonymous
Disclaimer: This content is shared under creative common license cc-by-sa 3.0. It is generated from StackExchange Website Network.