Avoid over collecting in Limit or Lucene Operator #123296

dnhatn · 2025-02-24T17:13:28Z

Currently, we rely on signal propagation for early termination. For example, FROM index | LIMIT 10 can be executed by multiple Drivers: several Drivers that read document IDs and extract fields, and a final Driver that selects at most 10 rows. In this scenario, each Lucene Driver can independently collect up to 10 rows before the final Driver has enough rows and signals the others to stop collecting. In most cases this model works fine, but when fields are extracted from indices in the warm/cold tier, the extra collection can hurt performance. This change introduces a Limiter shared between LimitOperator and LuceneSourceOperator to avoid over-collecting. We will also need a follow-up to ensure that we do not over-collect between multiple stages of query execution.
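To make the idea concrete, here is a minimal sketch of a shared row budget that several drivers draw from concurrently. It is not the Limiter class added by this PR: the class name `SharedLimitSketch` and the method `tryReserve` are hypothetical, and the compare-and-set loop is just one plausible way to keep the total at or below the limit.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a shared row budget; NOT the Limiter class from this PR. */
public class SharedLimitSketch {
    private final AtomicInteger remaining;

    public SharedLimitSketch(int limit) {
        this.remaining = new AtomicInteger(limit);
    }

    /** Reserves up to {@code wanted} rows and returns how many the caller may emit. */
    public int tryReserve(int wanted) {
        if (wanted <= 0) {
            return 0;
        }
        while (true) {
            int current = remaining.get();
            int granted = Math.min(current, wanted);
            if (granted == 0) {
                return 0; // budget exhausted: the caller should stop collecting
            }
            // Retry if another driver changed the budget in the meantime.
            if (remaining.compareAndSet(current, current - granted)) {
                return granted;
            }
        }
    }

    public int remaining() {
        return remaining.get();
    }

    public static void main(String[] args) throws InterruptedException {
        SharedLimitSketch limiter = new SharedLimitSketch(10); // LIMIT 10
        AtomicInteger emitted = new AtomicInteger();
        // Each "driver" asks for pages of 3 rows until the budget runs out.
        Runnable driver = () -> {
            int granted;
            while ((granted = limiter.tryReserve(3)) > 0) {
                emitted.addAndGet(granted);
            }
        };
        Thread[] drivers = new Thread[4];
        for (int i = 0; i < drivers.length; i++) {
            drivers[i] = new Thread(driver);
            drivers[i].start();
        }
        for (Thread t : drivers) {
            t.join();
        }
        System.out.println("rows emitted: " + emitted.get()); // prints "rows emitted: 10"
    }
}
```

With a budget like this, a driver learns that it should stop as soon as its reservation comes back smaller than requested, instead of waiting for a stop signal from the final Driver.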

elasticsearchmachine · 2025-02-24T19:33:25Z

Hi @dnhatn, I've created a changelog YAML for you.

elasticsearchmachine · 2025-02-24T23:56:25Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

costin

Great stuff. Sharing state in the pipeline execution (limit, min/max for filters) to trigger early execution is going to help significantly in high cardinality scenarios.

dnhatn · 2025-03-01T00:43:11Z

Thanks Costin!

elasticsearchmachine · 2025-03-01T00:45:41Z

Status	Branch	Result
✅	8.18
✅	8.x
✅	9.0

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 123296

…23784)

* Avoid over collecting in Limit or Lucene Operator (#123296)

Currently, we rely on signal propagation for early termination. For example, FROM index | LIMIT 10 can be executed by multiple Drivers: several Drivers to read document IDs and extract fields, and the final Driver to select at most 10 rows. In this scenario, each Lucene Driver can independently collect up to 10 rows until the final Driver has enough rows and signals them to stop collecting. In most cases, this model works fine, but when extracting fields from indices in the warm/cold tier, it can impact performance. This change introduces a Limiter used between LimitOperator and LuceneSourceOperator to avoid over-collecting. We will also need a follow-up to ensure that we do not over-collect between multiple stages of query execution.

* Fix compilation after #123784
* fix compile
* fix compile

A follow-up to #123296 to address a potential block leak that may occur when a circuit-breaking exception is triggered while truncating the docs or scores blocks. Relates #123296

idegtiarenko · 2025-03-03T12:50:33Z

x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/operator/Limiter.java

+/**
+ * A shared limiter used by multiple drivers to collect hits in parallel without exceeding the output limit.
+ * For example, if the query `FROM test-1,test-2 | LIMIT 100` is run with two drivers, and one driver (e.g., querying `test-1`)
+ * has collected 60 hits, then the other driver querying `test-2` should collect at most 40 hits.


Nice idea!

I wonder if we should make it explicit that this works as long as test-1 and test-2 are on the same node?
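The arithmetic in the Javadoc, and the same-node caveat raised here, can be illustrated with a small sketch. It assumes the limiter is a purely in-memory object that drivers share by reference, which the snippet above does not confirm; `PerNodeLimitSketch`, `Budget`, and `take` are hypothetical names, not part of the PR.

```java
/** Hypothetical illustration of a per-node budget; not code from this PR. */
public class PerNodeLimitSketch {

    /** In-memory budget, shared only by drivers that hold the same reference. */
    static final class Budget {
        private int remaining;

        Budget(int limit) {
            this.remaining = limit;
        }

        /** Grants at most {@code wanted} hits, never more than what is left. */
        synchronized int take(int wanted) {
            int granted = Math.max(0, Math.min(wanted, remaining));
            remaining -= granted;
            return granted;
        }
    }

    /** Two drivers on one node share a single Budget (the Javadoc example). */
    static int sameNode(int limit, int hitsTest1, int hitsTest2) {
        Budget shared = new Budget(limit);
        return shared.take(hitsTest1) + shared.take(hitsTest2);
    }

    /** Assumption: each node builds its own Budget, so nothing is coordinated across nodes. */
    static int separateNodes(int limit, int hitsTest1, int hitsTest2) {
        return new Budget(limit).take(hitsTest1) + new Budget(limit).take(hitsTest2);
    }

    public static void main(String[] args) {
        // test-1 collects 60 hits, test-2 has 80 matching documents, LIMIT 100.
        System.out.println(sameNode(100, 60, 80));      // prints 100 (60 + 40)
        System.out.println(separateNodes(100, 60, 80)); // prints 140 (60 + 80)
    }
}
```

Under that assumption, drivers on different nodes could still over-collect until a later stage applies the limit, which is consistent with the follow-up mentioned in the description.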

elasticsearchmachine added the v9.1.0 label Feb 24, 2025

dnhatn force-pushed the shared-limiter branch 2 times, most recently from 7754300 to 80a3936 Compare February 24, 2025 18:19

dnhatn changed the title ~~Share limiter between limit or lucene source operators~~ Avoid over collecting in Limit or Lucene Operator Feb 24, 2025

Share limiter between limit or lucene source operators

bbef844

dnhatn force-pushed the shared-limiter branch from 80a3936 to bbef844 Compare February 24, 2025 19:32

dnhatn added v9.0.1 v8.19.0 v8.18.1 v8.17.2 :Analytics/ES|QL AKA ESQL >bug labels Feb 24, 2025

Update docs/changelog/123296.yaml

cc2c838

dnhatn requested review from nik9000 and idegtiarenko February 24, 2025 23:55

dnhatn added the auto-backport Automatically create backport pull requests when merged label Feb 24, 2025

dnhatn marked this pull request as ready for review February 24, 2025 23:56

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Feb 24, 2025

costin approved these changes Mar 1, 2025

dnhatn merged commit 333e252 into elastic:main Mar 1, 2025
17 checks passed

dnhatn deleted the shared-limiter branch March 1, 2025 00:43

This was referenced Mar 1, 2025

[8.18] Avoid over collecting in Limit or Lucene Operator (#123296) #123783

Merged

[8.x] Avoid over collecting in Limit or Lucene Operator (#123296) #123784

Merged

[9.0] Avoid over collecting in Limit or Lucene Operator (#123296) #123785

Merged

elasticsearchmachine added the backport pending label Mar 1, 2025

dnhatn mentioned this pull request Mar 3, 2025

Fix potential block leak in LuceneSourceOperator #123835

Merged

dnhatn removed backport pending v8.17.2 labels Mar 3, 2025

idegtiarenko reviewed Mar 3, 2025
