Skip to content

ESQL: revive inlinestats #122257

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 12, 2025
Merged

ESQL: revive inlinestats #122257

merged 7 commits into from
Feb 12, 2025

Conversation

astefan
Copy link
Contributor

@astefan astefan commented Feb 11, 2025

First take at enabling inlinestats tests.

@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.1.0 labels Feb 11, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@astefan astefan requested a review from costin February 11, 2025 14:41
@elasticsearchmachine
Copy link
Collaborator

Hi @astefan, I've created a changelog YAML for you.

@@ -2,7 +2,7 @@
// TODO: re-enable the commented tests once the Join functionality stabilizes
//

maxOfInt-Ignore
maxOfInt
required_capability: join_planning_v1
Copy link
Contributor

@alex-spies alex-spies Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we'll need a new capability to avoid running this with old nodes on bwc tests.

@@ -330,7 +330,7 @@ required_capability: join_planning_v1

FROM airports
| RENAME scalerank AS int
| LOOKUP int_number_names ON int
| LOOKUP JOIN int_number_names ON int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is incorrect, this test is still disabled and used to use LOOKUP_ 🐔 in the past. (But we probably also want a version of this with LOOKUP JOIN.)

Comment on lines +182 to +184
if (join instanceof InlineJoin) {
return new FragmentExec(bp);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is odd. What do we need this for? If this is a quick hack, can we add a TODO comment?

I'd think this would prevent the if (right instanceof LocalSourceExec localData) { branch below from ever triggering, which used to be the branch that mapped a physical operation to INLINESTATS.

It seems like this, instead, pushes the execution of the inline join into the data nodes, which cannot always be true - e.g. there could be a regular STATS before the INLINESTATS, which requries execution on the data node. Or just a LIMIT/SORT, or any other pipeline breaker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the inlinestats has been rewritten to allow lookup join to work in a more correct way, what was left for inlinestats was an incomplete solution (inlinestats.csv-spec was ignored in its entirety). EsqlSession code flow expects a fragment for inlinestats that contains logical plans for its right and left branches. It's planning then the right branch, executes it, then it comes back to its left branch and uses the right results as constant blocks in the left logical plan tree.

There are still many queries muted in the inlinestats.csv-spec test and I don't want to go the rabbit hole and try to fix everything in one go. I want on purpose to limit the extent of this PR and I will look further into fixing the rest in followups. If the followups lead to changing the fragment approach, that's ok.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change is much smaller than I expected. LGTM

* Fixes a series of issues with inlinestats which had an incomplete implementation after lookup and inlinestats
* were refactored.
*/
INLINESTATS_STUBRELATION_PROPER_REPLACEMENT;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe pick a smaller name that we can version while iterating on the implementation, for example: INLINESTATS_V2 ?

@astefan astefan merged commit 4b3acd4 into elastic:main Feb 12, 2025
17 checks passed
astefan added a commit to astefan/elasticsearch that referenced this pull request Feb 13, 2025
(cherry picked from commit 4b3acd4)

Don't run the csv tests for inlinestats in non-snapshot tests (elastic#122407)
(cherry picked from commit f8fb3c3)
@astefan astefan added the v9.0.1 label Feb 13, 2025
elasticsearchmachine pushed a commit that referenced this pull request Feb 13, 2025
(cherry picked from commit 4b3acd4)

Don't run the csv tests for inlinestats in non-snapshot tests (#122407)
(cherry picked from commit f8fb3c3)
@astefan astefan deleted the inlinestats_pickup1 branch February 13, 2025 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.0.1 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants