Skip to content

ESQL: Enhanced DATE_TRUNC with arbitrary intervals #120302

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

kanoshiou
Copy link
Contributor

@kanoshiou kanoshiou commented Jan 16, 2025

Originally, DATE_TRUNC only supported 1-month and 3-month intervals for months, and 1-year interval for years, while arbitrary intervals were supported for weeks and days. This PR adds support for DATE_TRUNC with arbitrary month and year intervals.

Closes #120094

@kanoshiou kanoshiou requested a review from a team as a code owner January 16, 2025 16:12
@kanoshiou kanoshiou marked this pull request as draft January 16, 2025 16:12
@elasticsearchmachine elasticsearchmachine added external-contributor Pull request authored by a developer outside the Elasticsearch team v9.0.0 labels Jan 16, 2025
@kanoshiou
Copy link
Contributor Author

Although it is possible to specify an arbitrary interval of months or years in this PR, wouldn't it be clearer to allow users to specify a reference time from which to start calculating the interval? I'm not sure.

@kanoshiou kanoshiou marked this pull request as ready for review January 21, 2025 13:59
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Jan 21, 2025
@gbanasiak gbanasiak added the :Analytics/ES|QL AKA ESQL label Jan 23, 2025
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) and removed needs:triage Requires assignment of a team area label labels Jan 23, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@alex-spies
Copy link
Contributor

@bpintea , do you maybe want to have a look at this one? It's related to BUCKET, your favorite :)

@bpintea bpintea self-assigned this Feb 17, 2025
Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good. Left some minor comments.

  • Let's rename the PR and its description to reflect the impact on DATE_TRUNC. BUCKET is a side effect (despite the issue being about it).

  • Could we please add more CSV test with the new intervals, which should make it easy to check, in the lines of: ROW x = ["1970-04-10", "1970-04-11", "1970-04-12"]::DATETIME | MV_EXPAND x | EVAL y = DATE_TRUNC(2 days, x) and/or similar with ...| STATS COUNT(*) BY BUCKET(...) for the newly available intervals.

CC @not-napoleon this updates some utils used by the histogram aggs, should you want to have a look.

@bpintea
Copy link
Contributor

bpintea commented Feb 18, 2025

@elasticsearchmachine test this please

@kanoshiou kanoshiou changed the title ESQL: Support for arbitrary month and year intervals in BUCKET ESQL: Enhanced DATE_TRUNC with arbitrary intervals Feb 19, 2025
@kanoshiou
Copy link
Contributor Author

kanoshiou commented Mar 20, 2025

There are some failing tests in DateTruncTests, and I plan to address them later.

@kanoshiou
Copy link
Contributor Author

I believe this PR is now ready for review. It would be great if you could take a moment to look through it @bpintea !

@bpintea
Copy link
Contributor

bpintea commented Mar 26, 2025

buildkite test this

@kanoshiou
Copy link
Contributor Author

Sorry I forgot to run the CsvTests for another round of checks...

@bpintea
Copy link
Contributor

bpintea commented Mar 27, 2025

buildkite test this

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
It LGTM, but will ask for a review from other aggs-contributors, since the changes affect those too.

@not-napoleon
Copy link
Member

Hi, "other aggs-contributor" here. I am a little concerned about performance here. This adds a lot more math, especially in the month case, and the rounding logic often runs in very hot loops so we need to be careful about performance there. At a minimum, I'd like us to delegate the "multiplier" == 1 case to the original methods.

@bpintea we should make sure to keep an eye on the benchmarks after merging this too.

Also, it would be nice if you added some more comments about how the date math works. This stuff is always tricky, and any help to our future selves when we need to modify it would be nice. Thank you!

@kanoshiou
Copy link
Contributor Author

Thank you for reviewing, @bpintea @not-napoleon!

I've implemented some minor performance improvements and added detailed comments to clarify roundIntervalMonthOfYear and roundYearInterval.

@bpintea
Copy link
Contributor

bpintea commented Mar 31, 2025

buildkite test this

@bpintea
Copy link
Contributor

bpintea commented Mar 31, 2025

buildkite test this

@bpintea bpintea requested a review from not-napoleon March 31, 2025 15:20
Copy link
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This addresses my concerns from a classic aggregations perspective. And thank you for adding more explicit comments about the date math, future me will be very happy the next time I need to touch that code.

@bpintea
Copy link
Contributor

bpintea commented Apr 3, 2025

Thanks @kanoshiou for this contribution and your patience.

@bpintea bpintea merged commit 30b2a1f into elastic:main Apr 3, 2025
19 checks passed
andreidan pushed a commit to andreidan/elasticsearch that referenced this pull request Apr 9, 2025
Originally, `DATE_TRUNC` only supported 1-month and 3-month intervals for months, and 1-year interval for years, while arbitrary intervals were supported for weeks and days. This PR adds support for `DATE_TRUNC` with arbitrary month and year intervals. 

Closes elastic#120094
luigidellaquila pushed a commit to luigidellaquila/elasticsearch that referenced this pull request Jun 18, 2025
Originally, `DATE_TRUNC` only supported 1-month and 3-month intervals for months, and 1-year interval for years, while arbitrary intervals were supported for weeks and days. This PR adds support for `DATE_TRUNC` with arbitrary month and year intervals.

Closes elastic#120094
luigidellaquila added a commit that referenced this pull request Jun 19, 2025
Originally, `DATE_TRUNC` only supported 1-month and 3-month intervals for months, and 1-year interval for years, while arbitrary intervals were supported for weeks and days. This PR adds support for `DATE_TRUNC` with arbitrary month and year intervals.

Closes #120094

Co-authored-by: kanoshiou <73424326+kanoshiou@users.noreply.github.com>
@luigidellaquila
Copy link
Contributor

Manual backport: #129643

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ESQL: support for buckets larger than 1 year
7 participants