Update ecs@mappings.json with new GenAI fields #129122

Merged

eyalkoren
Contributor

This follows the addition of the new gen_ai.* fields to ECS.
All added fields are of types that were not mapped in the ECS dynamic templates so far.
I am also not sure why gen_ai.request.model is text and not keyword...

@eyalkoren eyalkoren requested review from ruflin and felixbarny June 8, 2025 05:22
@eyalkoren eyalkoren self-assigned this Jun 8, 2025
@elasticsearchmachine elasticsearchmachine added v9.1.0 needs:triage Requires assignment of a team area label labels Jun 8, 2025
@eyalkoren eyalkoren added >feature Team:Data Management Meta label for data/management team and removed needs:triage Requires assignment of a team area label labels Jun 8, 2025
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label and removed Team:Data Management Meta label for data/management team labels Jun 8, 2025
@eyalkoren eyalkoren added the :Data Management/Data streams Data streams and their lifecycles label Jun 8, 2025
@elasticsearchmachine elasticsearchmachine added Team:Data Management Meta label for data/management team and removed needs:triage Requires assignment of a team area label labels Jun 8, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine
Collaborator

Hi @eyalkoren, I've created a changelog YAML for you.

{
"ecs_gen_ai_text": {
"mapping": {
"type": "text"
Member

I am also not sure why gen_ai.request.model is text and not keyword...

That's a bit weird indeed. Using text for metric dimensions is quite unusual. Especially since TSDB doesn't support text fields as dimensions.
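
To make the concern concrete, here is a minimal sketch of why `keyword` fits a dimension and `text` does not. The helper `dimension_mapping` and the `DIMENSION_CAPABLE_TYPES` set are hypothetical names introduced for illustration, and the set is a simplified subset of the field types that accept the `time_series_dimension` mapping parameter.

```python
import json

# Assumption: a simplified subset of the field types that accept
# "time_series_dimension". "text" is deliberately absent.
DIMENSION_CAPABLE_TYPES = {
    "keyword", "ip", "byte", "short", "integer", "long", "unsigned_long",
}


def dimension_mapping(field_type: str) -> dict:
    """Return a mapping body for a field used as a time series dimension."""
    if field_type not in DIMENSION_CAPABLE_TYPES:
        raise ValueError(f"{field_type!r} cannot be a time series dimension")
    return {"type": field_type, "time_series_dimension": True}


# Mapped as keyword, gen_ai.request.model can be declared as a dimension.
model_mapping = dimension_mapping("keyword")
print(json.dumps({"gen_ai.request.model": model_mapping}, indent=2))
```

Calling `dimension_mapping("text")` raises `ValueError` in this sketch, mirroring the point that a text field cannot serve as a TSDB dimension.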

Contributor Author

@susan-shu-c this PR is a followup for elastic/ecs#2475 - it updates our ECS dynamic templates (ecs@mappings) to include the new gen_ai.* fields.
Can you explain the rationale of using text for these two?

Member

@eyalkoren thanks for catching this. Looking through it, I do agree keyword would be better for gen_ai.request.model. I missed it earlier...

Member

What are the next steps?
Should I make an update on top of elastic/ecs#2475 with a new PR before you can update this PR?

Member

@susan-shu-c susan-shu-c Jun 10, 2025

My thought process on gen_ai.agent.description being text is that it might be 1-2 sentences depending on how verbose it is, say describing a lot of components. Would love to hear what you think as well as your team would be much more aware of best practices!

Contributor Author

What are the next steps?

Yes, it would make more sense if you change it first, so that we don't have to merge an "erroneous" state of the mappings. Please let me know how long you expect it to take to get approved and merged though, because until then we have failing test notifications. If it's going to take a while, we may prefer merging as is and then fixing. Our change should be very quick.

Would love to hear what you think as well as your team would be much more aware of best practices!

My thinking was similar to yours, but what @felixbarny says is that if gen_ai.agent.description is used as a metric dimension, it probably shouldn't be text, because it won't work as expected in TSDB. I see that this field has a semantic conventions counterpart, so the definition there may indicate what it's used for. Its description in your PR says: "Free-form description of the GenAI agent provided by the application", which doesn't sound like something intended to be used as a dimension. @felixbarny WDYT?

Member

My thinking was similar to yours, but what @felixbarny says is that if gen_ai.agent.description is used as a metric dimension, it probably shouldn't be text, because it won't work as expected in TSDB.

Thinking about it again, there's a dedicated version of the dynamic ECS mappings for TSDB, incorporating the limited field types available: ecs-tsdb@template. So maybe it's not as big of a deal, even if this field is used as a metric attribute. But is it really used as a metric attribute or is it something that would rather be attached to other signals, such as logs and traces? A free-form description does sound a bit curious to add as a metric attribute (aka dimension).

My thought process on gen_ai.agent.description being text is that it might be 1-2 sentences depending on how verbose it is, say describing a lot of components.

The text field type is useful for full-text search use cases where phrase queries are used a lot. It's powerful but also rather expensive from a storage and indexing perspective. Irrespective of the length of the field, I'd only use it if you have concrete use cases for doing full-text searches on it. If you just want to store the value to provide additional context, I'd recommend the keyword field type. While it has a limit on how many characters it can store in doc_values it falls back to storing the value in a stored field if ignore_above is set, which seems like a good compromise here. You could potentially disable doc_values and enable store; however, you wouldn't be able to benefit from the dictionary encoding (ordinals) and other optimizations, like run-length encoding for shorter descriptions. Also, if you don't intend to filter on that field (with an exact value), you may want to set index to false to save space and ingest time overhead.
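
The options weighed above can be laid out side by side as mapping bodies. This is only a sketch: the variable names are hypothetical, and the `ignore_above` value of 1024 is an assumed illustrative limit, not a value taken from this PR. The parameters themselves (`ignore_above`, `index`, `doc_values`, `store`) are standard keyword mapping parameters.

```python
import json

# Assumption: 1024 is only an illustrative ignore_above limit.
IGNORE_ABOVE = 1024

# Variant 1: full-text search on the description (the most expensive option).
full_text = {"type": "text"}

# Variant 2: exact-value filtering, with a cap on the indexed value length.
exact_match = {"type": "keyword", "ignore_above": IGNORE_ABOVE}

# Variant 3: context only, no filtering on the field, so skip the inverted index.
context_only = {"type": "keyword", "ignore_above": IGNORE_ABOVE, "index": False}

# Variant 4: no doc_values, with the value kept in a stored field instead.
stored_only = {"type": "keyword", "index": False, "doc_values": False, "store": True}

variants = {
    "full_text": full_text,
    "exact_match": exact_match,
    "context_only": context_only,
    "stored_only": stored_only,
}
for name, mapping in variants.items():
    print(f"{name}: {json.dumps(mapping)}")
```

Variant 3 corresponds to the recommendation for a field that only provides context; variant 4 gives up the doc_values optimizations in exchange for a plain stored field.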

Member

@susan-shu-c susan-shu-c Jun 11, 2025

I see that this field has a semantic conventions counterpart, so the definition there may indicate what it's used for.

The documentation for the OTel field is here. It's of their type "string", so we have to decide ourselves what's best in ECS. I ported the same definition: "Free-form description of the GenAI agent provided by the application."

While it has a limit on how many characters it can store in doc_values it falls back to storing the value in a stored field if ignore_above is set, which seems like a good compromise here.

I've created a new PR, could you check if this covers what you mentioned about ignore_above?

elastic/ecs#2489

@susan-shu-c
Member

Thanks both for the suggestions, the ECS PR has now been merged: elastic/ecs#2489

@eyalkoren
Contributor Author

Thanks @susan-shu-c 🙏

@felixbarny note that the mappings for gen_ai.agent.description in the generated ECS files contain doc_values: false as well. The dynamic template for this field in ecs@mappings enforces that accordingly.
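
For readers unfamiliar with dynamic templates, a rough sketch of what such an entry could look like follows. The template name is hypothetical and the real entry in ecs@mappings.json may differ; only the `index: false` and `doc_values: false` settings are taken from this discussion, and whether `store` is also set is not assumed here.

```python
import json

# Hypothetical template name; the real entry in ecs@mappings.json may differ.
TEMPLATE_NAME = "example_gen_ai_agent_description"

dynamic_template = {
    TEMPLATE_NAME: {
        # Apply this mapping only to the one field discussed in this thread.
        "path_match": "gen_ai.agent.description",
        "mapping": {
            "type": "keyword",
            "index": False,       # no inverted index, so no filtering on the field
            "doc_values": False,  # mirrors the generated ECS mapping
        },
    }
}

# Dynamic templates are listed in order under "dynamic_templates".
mappings = {"dynamic_templates": [dynamic_template]}
print(json.dumps(mappings, indent=2))
```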

@felixbarny
Member

That seems unintentional. See also elastic/ecs#2489 (review)

@eyalkoren
Contributor Author

eyalkoren commented Jun 16, 2025

That seems unintentional

Yeah, I saw it wasn't part of your suggested change, this is why I referred you to this.

The question is whether we need to insist on it. I assume you don't think we need to have doc_values for aggregations or sorting based on a free-text description.

Is it for the storage benefits related to the doc_values RLE codec? I see why RLE is very efficient when comparing to the inverted index, but since this field is also mapped with index: false, and assuming the stored values will be compressed with ZSTD, do you still think there's considerable value in it?

@felixbarny
Member

Yeah, it's probably not a big deal or something to insist on. I was indeed thinking that this could yield benefits related to run-length encoding for doc_values. If the combination of doc_values: true and index: false in ECS would be difficult, having this in a stored field should be just fine.
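
As a toy illustration of the idea behind that benefit, the sketch below dictionary-encodes repeated descriptions into ordinals and then run-length encodes the ordinals. It is not Lucene's doc_values codec; the function names and sample descriptions are invented for the example.

```python
from itertools import groupby


def dictionary_encode(values):
    """Map each distinct value to an ordinal based on its sorted position."""
    dictionary = sorted(set(values))
    ordinal_of = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [ordinal_of[value] for value in values]


def run_length_encode(ordinals):
    """Collapse consecutive repeats into (ordinal, run_length) pairs."""
    return [(ordinal, len(list(run))) for ordinal, run in groupby(ordinals)]


# Invented sample data: the same agent description repeated across documents.
descriptions = [
    "Answers billing questions",
    "Answers billing questions",
    "Answers billing questions",
    "Routes support tickets",
    "Routes support tickets",
]

dictionary, ordinals = dictionary_encode(descriptions)
runs = run_length_encode(ordinals)
print(dictionary)  # each distinct description is stored once
print(runs)        # [(0, 3), (1, 2)]
```

Five values shrink to two dictionary entries and two runs, which is why a column-oriented structure can be compact when consecutive documents share the same description.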

@eyalkoren
Contributor Author

If the combination of doc_values: true and index: false in ECS would be difficult, having this in a stored field should be just fine.

What I tried to say is that I think the combination of doc_values: true and index: false just doesn't have any benefit over store: true (assuming stored values are compressed), so maybe not required even if it is easy to do within ECS...
The benefit of keeping it as is in our narrow perspective is to avoid the additional dynamic template 🙂

@eyalkoren eyalkoren requested a review from felixbarny June 16, 2025 08:47
@susan-shu-c
Member

Hi @eyalkoren @felixbarny I took a look 🤔 everything in the generated directory was created from their make scripts, I only edit schemas/gen_ai.yml. It seems like there were some assumptions baked in. Is it possible to merge this based on the current stage of elastic/ecs#2489 and we can continue to update it?

@felixbarny
Member

Sure 👍

@eyalkoren eyalkoren merged commit 94c63ca into elastic:main Jun 16, 2025
18 checks passed
@eyalkoren eyalkoren deleted the support-double-EcsDynamicTemplatesIT branch June 16, 2025 14:11
Labels
:Data Management/Data streams Data streams and their lifecycles >feature Team:Data Management Meta label for data/management team v9.1.0
4 participants