Update ecs@mappings.json with new GenAI fields #129122
Pinging @elastic/es-data-management (Team:Data Management)
Hi @eyalkoren, I've created a changelog YAML for you.
{
  "ecs_gen_ai_text": {
    "mapping": {
      "type": "text"
I am also not sure why gen_ai.request.model is text and not keyword...
That's a bit weird indeed. Using text for metric dimensions is quite unusual, especially since TSDB doesn't support text fields as dimensions.
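As a rough illustration of the point about dimensions, the fragment below sketches how a field is typically declared as a time series dimension in an index mapping. It assumes gen_ai.request.model is mapped as keyword and actually used as a dimension; neither is confirmed in this thread, and the fragment is not taken from ecs@mappings.json.

```json
{
  "mappings": {
    "properties": {
      "gen_ai": {
        "properties": {
          "request": {
            "properties": {
              "model": {
                "type": "keyword",
                "time_series_dimension": true
              }
            }
          }
        }
      }
    }
  }
}
```

The time_series_dimension parameter is accepted on keyword fields but not on text fields, which is the constraint being referred to here.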
@susan-shu-c this PR is a follow-up to elastic/ecs#2475: it updates our ECS dynamic templates (ecs@mappings) to include the new gen_ai.* fields.
Can you explain the rationale for using text for these two?
@eyalkoren thanks for catching this. Looking through it, I do agree keyword would be better for gen_ai.request.model. I missed it earlier...
What are the next steps?
Should I make an update on top of elastic/ecs#2475 with a new PR before you can update this PR?
My thought process on gen_ai.agent.description being text is that it might be 1-2 sentences depending on how verbose it is, say describing a lot of components. Would love to hear what you think as well, since your team would be much more aware of best practices!
What are the next steps?
Yes, it would make more sense if you change it first and we won't have to merge an "erroneous" state of the mappings. Please let me know how long you expect it to get approved and merged though, because until then we have failing test notifications, so if it's going to take a while, we may prefer merging as is and then fixing. Our change should be very quick.
Would love to hear what you think as well as your team would be much more aware of best practices!
My thinking was similar to yours, but what @felixbarny says is that if gen_ai.agent.description is used as a metric dimension, it probably shouldn't be text, because it won't work as expected in TSDB. I see that this field has a semantic conventions counterpart, so the definition there may indicate what it's used for. Its description in your PR says: "Free-form description of the GenAI agent provided by the application", which doesn't sound like something intended to be used as a dimension. @felixbarny WDYT?
My thinking was similar to yours, but what @felixbarny says is that if gen_ai.agent.description is used as a metric dimension, it probably shouldn't be text, because it won't work as expected in TSDB.
Thinking about it again, there's a dedicated version of the dynamic ECS mappings for TSDB, incorporating the limited field types available: ecs-tsdb@template. So maybe it's not as big of a deal, even if this field is used as a metric attribute. But is it really used as a metric attribute, or is it something that would rather be attached to other signals, such as logs and traces? A free-form description does sound a bit curious to add as a metric attribute (aka dimension).
My thought process on gen_ai.agent.description being text is that it might be 1-2 sentences depending on how verbose it is, say describing a lot of components.
The text field type is useful for full-text search use cases where phrase queries are used a lot. It's powerful but also rather expensive from a storage and indexing perspective. Irrespective of the length of the field, I'd only use it if you have concrete use cases for doing full-text searches on it. If you just want to store the value to provide additional context, I'd recommend the keyword field type. While it has a limit on how many characters it can store in doc_values, it falls back to storing the value in a stored field if ignore_above is set, which seems like a good compromise here. You could potentially disable doc_values and enable store; however, you wouldn't be able to benefit from the dictionary encoding (ordinals) and other optimizations, like run-length encoding for shorter descriptions. Also, if you don't intend to filter on that field (with an exact value), you may want to set index to false to save space and ingest time overhead.
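A minimal sketch of the keyword-based alternative described above, written as a dynamic template entry in the same shape as the snippet under review. The template name, the path_match pattern and the ignore_above value of 1024 are illustrative assumptions, not what was merged in ECS or in this PR.

```json
{
  "example_gen_ai_description": {
    "path_match": "gen_ai.agent.description",
    "mapping": {
      "type": "keyword",
      "ignore_above": 1024,
      "index": false
    }
  }
}
```

Setting index to false only makes sense if nobody needs to filter on the exact value; doc_values is left at its default here so the ordinal-based encoding mentioned above still applies.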
I see that this field has a semantic conventions counterpart, so the definition there may indicate what it's used for.
The documentation for the OTel field is here, it's of their type "string" so we have to decide ourselves what's best in ECS. I ported the same definition "Free-form description of the GenAI agent provided by the application."
While it has a limit on how many characters it can store in doc_values it falls back to storing the value in a stored field if ignore_above is set, which seems like a good compromise here.
I've created a new PR, could you check if this covers what you mentioned about ignore_above?
Thanks both for the suggestions, the ECS PR has now been merged: elastic/ecs#2489
Thanks @susan-shu-c 🙏 @felixbarny note that the mappings for
That seems unintentional. See also elastic/ecs#2489 (review)
Yeah, I saw it wasn't part of your suggested change, this is why I referred you to this. The question is whether we need to insist on it. I assume you don't think we need to have Is it for the storage benefits related to the
Yeah, it's probably not a big deal or something to insist on. I was indeed thinking that this could yield benefits related to run-length encoding for doc_values. If the combination of doc_values: true and index: false in ECS would be difficult, having this in a stored field should be just fine.
What I tried to say is that I think the combination of
Hi @eyalkoren @felixbarny I took a look 🤔 everything in the
Sure 👍
Following the addition of new gen_ai* fields to ECS. All added fields are of types that were not mapped in the ECS dynamic templates so far.
I am also not sure why gen_ai.request.model is text and not keyword...