-
Notifications
You must be signed in to change notification settings - Fork 25.4k
[apm-data] Set event.dataset if empty for logs #129074
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
011910c
to
71b8459
Compare
Pinging @elastic/es-data-management (Team:Data Management) |
Hi @carsonip, I've created a changelog YAML for you. |
Testing notesTested with adding the processor to default pipeline.
Output: {
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": ".ds-logs-apm.app.foo-bar-2025.06.09-000001",
"_id": "_WCQVZcBbmWL1cSKlTSi",
"_score": 1,
"_source": {
"@timestamp": 1,
"data_stream": {
"namespace": "bar",
"type": "logs",
"dataset": "apm.app.foo"
},
"event": {
"ingested": "2025-06-09T16:40:28Z",
"dataset": "apm.app.foo"
}
}
},
{
"_index": ".ds-logs-apm.app.foo-bar-2025.06.09-000001",
"_id": "_mCQVZcBbmWL1cSKoTRm",
"_score": 1,
"_source": {
"@timestamp": 1,
"data_stream": {
"namespace": "bar",
"type": "logs",
"dataset": "apm.app.foo"
},
"event": {
"ingested": "2025-06-09T16:40:31Z",
"dataset": "ds"
}
}
}
]
}
} |
3a8135a
to
7add32d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a question about sending docs to logs-apm.*
without explicitly setting data_stream.dataset
. Not sure if that's a thing we need to cater for.
- set: | ||
if: "ctx.data_stream?.dataset != null" | ||
field: event.dataset | ||
value: "{{{data_stream.dataset}}}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: should we use copy_from
and ignore_empty_value
, and remove the if
and value
?
(See https://www.elastic.co/docs/reference/enrich-processor/set-processor)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will data_stream.dataset
be automatically set for the logs-apm.*
data stream documents, if they're not specified in _source
? If that's the case, I'm not sure they will be in ctx.*
, and we may need to have a fallback.
Can you add test cases for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will data_stream.dataset be automatically set for the logs-apm.* data stream documents, if they're not specified in _source?
No, at least not visible to the log ingest pipeline.
If that's the case, I'm not sure they will be in ctx.*
Correct, they will not be in ctx.
we may need to have a fallback.
Do you mean a fallback to a non-null event.dataset
in the case where both data_stream.dataset and event.dataset is missing? I'm inclined to not perform any special case handling to make things up because it sounds more like a misconfiguration issue. Unless the user explicitly remove the DS fields in an intermediate ingest pipeline, DS fields should always be present in docs from apm-server (even if reroute processor is used).
In any case, I've added test in 957d9ef to confirm this behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean a fallback to a non-null event.dataset in the case where both data_stream.dataset and event.dataset is missing? I'm inclined to not perform any special case handling to make things up because it sounds more like a misconfiguration issue.
I'm OK with this. I don't think it's a real thing we need to handle, even if it's technically possible.
Is there a reason why you switched away from Can this be considered a breaking change in behavior? |
The new |
Ahh okay, that makes sense. I missed the call out to the original |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks, and sorry for the delay
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
For APM logs, set event.dataset to data_stream.dataset if event.dataset is empty, to satisfy Anomaly Detection's requirement to have event.dataset in every logs-* data stream. (cherry picked from commit 466afba) # Conflicts: # test/framework/src/main/java/org/elasticsearch/test/rest/ESRestTestCase.java
For APM logs, set event.dataset to data_stream.dataset if event.dataset is empty, to satisfy Anomaly Detection's requirement to have event.dataset in every logs-* data stream. (cherry picked from commit 466afba) # Conflicts: # test/framework/src/main/java/org/elasticsearch/test/rest/ESRestTestCase.java
Tested manually with steps outlined in #129074 (comment) With version 9.0.3
With version 9.1.0
By inspecting the outputs above, version
|
For APM logs, set
event.dataset
todata_stream.dataset
ifevent.dataset
is empty, to satisfy Anomaly Detection's requirement to haveevent.dataset
in everylogs-*
data stream.Fixes elastic/apm-server#16397