-
Notifications
You must be signed in to change notification settings - Fork 25.4k
ESQL: Support ST_EXTENT_AGG (#117451) #118829
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
elasticsearchmachine
merged 1 commit into
elastic:8.x
from
GalLalouche:cherry_pick/st_extent
Dec 17, 2024
Merged
ESQL: Support ST_EXTENT_AGG (#117451) #118829
elasticsearchmachine
merged 1 commit into
elastic:8.x
from
GalLalouche:cherry_pick/st_extent
Dec 17, 2024
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Documentation preview: |
5bd6b61
to
9efa791
Compare
This PR adds support for ST_EXTENT_AGG aggregation, i.e., computing a bounding box over a set of points/shapes (Cartesian or geo). Note the difference between this aggregation and the already implemented scalar function ST_EXTENT. This isn't a very efficient implementation, and future PRs will attempt to read these extents directly from the doc values. We currently always use longitude wrapping, i.e., we may wrap around the dateline for a smaller bounding box. Future PRs will let the user control this behavior. Fixes elastic#104659.
9efa791
to
2029c26
Compare
craigtaverner
added a commit
that referenced
this pull request
Jan 16, 2025
Support for `ST_EXTENT_AGG` was added in #118829, and then partially optimized in #118829. This optimization worked only for cartesian_shape fields, and worked by extracting the Extent from the doc-values and re-encoding it as a WKB `BBOX` geometry. This does not work for geo_shape, where we need to retain all 6 integers stored in the doc-values, in order to perform the datelline choice only at reduce time during the final phase of the aggregation. Since both geo_shape and cartesian_shape perform the aggregations using integers, and the original Extent values in the doc-values are integers, this PR expands the previous optimization by: * Saving all Extent values into a multi-valued field in an IntBlock for both cartesian_shape and geo_shape * Simplifying the logic around merging intermediate states for all cases (geo/cartesian and grouped and non-grouped aggs) * Widening test cases for testing more combinations of aggregations and types, and fixing a few bugs found * Enhancing cartesian extent to convert from 6 ints to 4 ints at block loading time (for efficiency) * Fixing bugs in both cartesian and geo extents for generating intermediate state with missing groups (flaky tests in serverless) * Moved the int order to always match Rectangle for 4-int and Extent for 6-int cases (improved internal consistency) Since the PR already changed the meaning of the invalid/infinite values of the intermediate state integers, it was already not compatible with the previous cluster versions. We disabled mixed-cluster testing to prevent errors as a result of that. This leaves us the opportunity to make further changes that are mixed-cluster incompatible, hence the decision to perform this consistency update now.
rjernst
pushed a commit
to rjernst/elasticsearch
that referenced
this pull request
Jan 16, 2025
) Support for `ST_EXTENT_AGG` was added in elastic#118829, and then partially optimized in elastic#118829. This optimization worked only for cartesian_shape fields, and worked by extracting the Extent from the doc-values and re-encoding it as a WKB `BBOX` geometry. This does not work for geo_shape, where we need to retain all 6 integers stored in the doc-values, in order to perform the datelline choice only at reduce time during the final phase of the aggregation. Since both geo_shape and cartesian_shape perform the aggregations using integers, and the original Extent values in the doc-values are integers, this PR expands the previous optimization by: * Saving all Extent values into a multi-valued field in an IntBlock for both cartesian_shape and geo_shape * Simplifying the logic around merging intermediate states for all cases (geo/cartesian and grouped and non-grouped aggs) * Widening test cases for testing more combinations of aggregations and types, and fixing a few bugs found * Enhancing cartesian extent to convert from 6 ints to 4 ints at block loading time (for efficiency) * Fixing bugs in both cartesian and geo extents for generating intermediate state with missing groups (flaky tests in serverless) * Moved the int order to always match Rectangle for 4-int and Extent for 6-int cases (improved internal consistency) Since the PR already changed the meaning of the invalid/infinite values of the intermediate state integers, it was already not compatible with the previous cluster versions. We disabled mixed-cluster testing to prevent errors as a result of that. This leaves us the opportunity to make further changes that are mixed-cluster incompatible, hence the decision to perform this consistency update now.
craigtaverner
added a commit
to craigtaverner/elasticsearch
that referenced
this pull request
Feb 11, 2025
) Support for `ST_EXTENT_AGG` was added in elastic#118829, and then partially optimized in elastic#118829. This optimization worked only for cartesian_shape fields, and worked by extracting the Extent from the doc-values and re-encoding it as a WKB `BBOX` geometry. This does not work for geo_shape, where we need to retain all 6 integers stored in the doc-values, in order to perform the datelline choice only at reduce time during the final phase of the aggregation. Since both geo_shape and cartesian_shape perform the aggregations using integers, and the original Extent values in the doc-values are integers, this PR expands the previous optimization by: * Saving all Extent values into a multi-valued field in an IntBlock for both cartesian_shape and geo_shape * Simplifying the logic around merging intermediate states for all cases (geo/cartesian and grouped and non-grouped aggs) * Widening test cases for testing more combinations of aggregations and types, and fixing a few bugs found * Enhancing cartesian extent to convert from 6 ints to 4 ints at block loading time (for efficiency) * Fixing bugs in both cartesian and geo extents for generating intermediate state with missing groups (flaky tests in serverless) * Moved the int order to always match Rectangle for 4-int and Extent for 6-int cases (improved internal consistency) Since the PR already changed the meaning of the invalid/infinite values of the intermediate state integers, it was already not compatible with the previous cluster versions. We disabled mixed-cluster testing to prevent errors as a result of that. This leaves us the opportunity to make further changes that are mixed-cluster incompatible, hence the decision to perform this consistency update now.
elasticsearchmachine
pushed a commit
that referenced
this pull request
Feb 12, 2025
…) (#122276) * Optimize ST_EXTENT_AGG for geo_shape and cartesian_shape (#119889) Support for `ST_EXTENT_AGG` was added in #118829, and then partially optimized in #118829. This optimization worked only for cartesian_shape fields, and worked by extracting the Extent from the doc-values and re-encoding it as a WKB `BBOX` geometry. This does not work for geo_shape, where we need to retain all 6 integers stored in the doc-values, in order to perform the datelline choice only at reduce time during the final phase of the aggregation. Since both geo_shape and cartesian_shape perform the aggregations using integers, and the original Extent values in the doc-values are integers, this PR expands the previous optimization by: * Saving all Extent values into a multi-valued field in an IntBlock for both cartesian_shape and geo_shape * Simplifying the logic around merging intermediate states for all cases (geo/cartesian and grouped and non-grouped aggs) * Widening test cases for testing more combinations of aggregations and types, and fixing a few bugs found * Enhancing cartesian extent to convert from 6 ints to 4 ints at block loading time (for efficiency) * Fixing bugs in both cartesian and geo extents for generating intermediate state with missing groups (flaky tests in serverless) * Moved the int order to always match Rectangle for 4-int and Extent for 6-int cases (improved internal consistency) Since the PR already changed the meaning of the invalid/infinite values of the intermediate state integers, it was already not compatible with the previous cluster versions. We disabled mixed-cluster testing to prevent errors as a result of that. This leaves us the opportunity to make further changes that are mixed-cluster incompatible, hence the decision to perform this consistency update now. * Regenerate generated files
craigtaverner
added a commit
to craigtaverner/elasticsearch
that referenced
this pull request
Feb 12, 2025
…ic#119889) (elastic#122276) * Optimize ST_EXTENT_AGG for geo_shape and cartesian_shape (elastic#119889) Support for `ST_EXTENT_AGG` was added in elastic#118829, and then partially optimized in elastic#118829. This optimization worked only for cartesian_shape fields, and worked by extracting the Extent from the doc-values and re-encoding it as a WKB `BBOX` geometry. This does not work for geo_shape, where we need to retain all 6 integers stored in the doc-values, in order to perform the datelline choice only at reduce time during the final phase of the aggregation. Since both geo_shape and cartesian_shape perform the aggregations using integers, and the original Extent values in the doc-values are integers, this PR expands the previous optimization by: * Saving all Extent values into a multi-valued field in an IntBlock for both cartesian_shape and geo_shape * Simplifying the logic around merging intermediate states for all cases (geo/cartesian and grouped and non-grouped aggs) * Widening test cases for testing more combinations of aggregations and types, and fixing a few bugs found * Enhancing cartesian extent to convert from 6 ints to 4 ints at block loading time (for efficiency) * Fixing bugs in both cartesian and geo extents for generating intermediate state with missing groups (flaky tests in serverless) * Moved the int order to always match Rectangle for 4-int and Extent for 6-int cases (improved internal consistency) Since the PR already changed the meaning of the invalid/infinite values of the intermediate state integers, it was already not compatible with the previous cluster versions. We disabled mixed-cluster testing to prevent errors as a result of that. This leaves us the opportunity to make further changes that are mixed-cluster incompatible, hence the decision to perform this consistency update now. * Regenerate generated files
elasticsearchmachine
pushed a commit
that referenced
this pull request
Feb 13, 2025
…) (#122276) (#122420) * Optimize ST_EXTENT_AGG for geo_shape and cartesian_shape (#119889) Support for `ST_EXTENT_AGG` was added in #118829, and then partially optimized in #118829. This optimization worked only for cartesian_shape fields, and worked by extracting the Extent from the doc-values and re-encoding it as a WKB `BBOX` geometry. This does not work for geo_shape, where we need to retain all 6 integers stored in the doc-values, in order to perform the datelline choice only at reduce time during the final phase of the aggregation. Since both geo_shape and cartesian_shape perform the aggregations using integers, and the original Extent values in the doc-values are integers, this PR expands the previous optimization by: * Saving all Extent values into a multi-valued field in an IntBlock for both cartesian_shape and geo_shape * Simplifying the logic around merging intermediate states for all cases (geo/cartesian and grouped and non-grouped aggs) * Widening test cases for testing more combinations of aggregations and types, and fixing a few bugs found * Enhancing cartesian extent to convert from 6 ints to 4 ints at block loading time (for efficiency) * Fixing bugs in both cartesian and geo extents for generating intermediate state with missing groups (flaky tests in serverless) * Moved the int order to always match Rectangle for 4-int and Extent for 6-int cases (improved internal consistency) Since the PR already changed the meaning of the invalid/infinite values of the intermediate state integers, it was already not compatible with the previous cluster versions. We disabled mixed-cluster testing to prevent errors as a result of that. This leaves us the opportunity to make further changes that are mixed-cluster incompatible, hence the decision to perform this consistency update now. * Regenerate generated files
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
:Analytics/ES|QL
AKA ESQL
:Analytics/Geo
Indexing, search aggregations of geo points and shapes
auto-merge-without-approval
Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!)
backport
>feature
Team:Analytics
Meta label for analytical engine team (ESQL/Aggs/Geo)
v8.18.0
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Backporting #117451.
This PR adds support for ST_EXTENT_AGG aggregation, i.e., computing a bounding box over a set of points/shapes (Cartesian or geo). Note the difference between this aggregation and the already implemented scalar function ST_EXTENT.
This isn't a very efficient implementation, and future PRs will attempt to read these extents directly from the doc values. We currently always use longitude wrapping, i.e., we may wrap around the dateline for a smaller bounding box. Future PRs will let the user control this behavior. Fixes #104659.