Skip to content

Aggs: Fix significant terms not finding background documents for nested fields #128472

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jun 4, 2025

Conversation

ivancea
Copy link
Contributor

@ivancea ivancea commented May 26, 2025

Closes #101163

Fixes the significant_terms aggregation not working correctly on nested fields.

It was returning buckets with bg_count: 0, meaning it wasn't detecting any background document.
The filter it used to check for background documents was a plain TermsQuery, which wouldn't work on Nested fields.

The PR adds an extra NestedQuery wrapping the Terms in case the field is inside a Nested parent.

SignificantText was also checked, but it's explicitly documented that it doesn't work on text fields, as it needs to access the source.

TBD: Backports

@ivancea ivancea requested a review from nik9000 May 26, 2025 15:26
@ivancea ivancea added >bug :Analytics/Aggregations Aggregations Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.1.0 labels May 26, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @ivancea, I've created a changelog YAML for you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To help a bit with the review, these tests have:

  • Setup: Index with a type ("normal" and "outlier"), and a value (1 and 2). That value is replicated 6 times (integer and keyword, and then as a nested and doubly nested field. Every "value" field has the same value in each document, so testing against each of them should render the same results.
  • A first "Checking" test to ensure all data ingested is correct
  • Test cases for the 3 kinds of values (normal, nested, doubly nested). Each of them do a sig_terms expecting the same results (except for the background_filter one)

@ivancea ivancea marked this pull request as ready for review May 27, 2025 12:07
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Comment on lines +225 to +226
var nestedParentField = context.nestedLookup().getNestedParent(fieldType.name());
if (nestedParentField != null) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is the best way to detect a nested field, but it worked well, and I didn't find any edge case

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martijnvg, do you know the most right way to do this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if buildQuery should take the nested context into account? That sounds like a bigger change than I'd like to make to aggs at the moment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has been some time since I worked on nested related functionality. Assuming that fieldType.name() returns a path, then I think this is the right way to figure out the parent field.

@elasticsearchmachine
Copy link
Collaborator

Hi @ivancea, I've updated the changelog YAML for you.

@ivancea ivancea requested a review from iverase May 27, 2025 13:05
@ivancea ivancea changed the title Aggs: Fix significant terms not finding background docuemnts for nested fields Aggs: Fix significant terms not finding background documents for nested fields May 28, 2025
@ivancea ivancea added auto-backport Automatically create backport pull requests when merged v8.19.0 v9.0.3 labels May 30, 2025
@ivancea ivancea enabled auto-merge (squash) June 3, 2025 15:04
@ivancea ivancea merged commit 36828e2 into elastic:main Jun 4, 2025
17 of 18 checks passed
@ivancea ivancea deleted the significant-terms-nested-investigation branch June 4, 2025 12:54
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.19
9.0

ivancea added a commit to ivancea/elasticsearch that referenced this pull request Jun 4, 2025
…ed fields (elastic#128472)

Closes elastic#101163

Fixes the `significant_terms` aggregation not working correctly on nested fields.
ivancea added a commit to ivancea/elasticsearch that referenced this pull request Jun 4, 2025
…ed fields (elastic#128472)

Closes elastic#101163

Fixes the `significant_terms` aggregation not working correctly on nested fields.
elasticsearchmachine pushed a commit that referenced this pull request Jun 4, 2025
…ed fields (#128472) (#128896)

Closes #101163

Fixes the `significant_terms` aggregation not working correctly on nested fields.
elasticsearchmachine pushed a commit that referenced this pull request Jun 4, 2025
…ed fields (#128472) (#128897)

Closes #101163

Fixes the `significant_terms` aggregation not working correctly on nested fields.
ivancea added a commit to ivancea/elasticsearch that referenced this pull request Jun 5, 2025
…ed fields (elastic#128472)

Closes elastic#101163

Fixes the `significant_terms` aggregation not working correctly on nested fields.
ivancea added a commit to ivancea/elasticsearch that referenced this pull request Jun 5, 2025
…ed fields (elastic#128472)

Closes elastic#101163

Fixes the `significant_terms` aggregation not working correctly on nested fields.
ivancea added a commit that referenced this pull request Jun 5, 2025
…ed fields (#128472) (#128970)

Closes #101163

Fixes the `significant_terms` aggregation not working correctly on nested fields.
ivancea added a commit that referenced this pull request Jun 5, 2025
…ed fields (#128472) (#128969)

Closes #101163

Fixes the `significant_terms` aggregation not working correctly on nested fields.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations auto-backport Automatically create backport pull requests when merged >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.17.8 v8.18.3 v8.19.0 v9.0.3 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Background Count (bg_count) Remains Zero in Nested and Filtered significant_terms Aggregation
4 participants