Skip to content

Precompute the BitsetCacheKey hashCode, and account for the size of the key #132418

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

joegallo
Copy link
Contributor

@joegallo joegallo commented Aug 4, 2025

If you trace the org.elasticsearch.common.cache.Cache, you can see that it uses the hashCode of the key object pretty extenstively (see some of the discussion on #96050). For example, in a cache.evictEntry(entry) call, we first hashCode to get the right segment, then within the segment we hashCode to do a get, then finally we do a hashCode in the remove. So that's three hashCode calls to remove an entry. And the latter two are done while holding the writeLock. This isn't the be-all-end-all of performance, but it's a tidy little speedup, and it's easy to write.

In the same area, there's a minor bug around the handling of the NULL_MARKER in the bitset cache. While the NULL_MARKER bitset is shared, and so it's fair to count the size of the object as zero, the entries in the cache still take up space because of the size of the keys. Since we're O(segments * queries), it could be the case that many many entries in the cache that could be pointing to the NULL_MARKER, so let's not consider them to be completely free in terms of memory consumption. (Note: I didn't bother to update the logic around logging a warning if a bitset is created that won't fit in the cache -- so technically we could be off by 24 bytes there. I can change that if you'd prefer we remain technically correct.)

As with #132416, I've added @tvernum as a reviewer just as an FYI to him that this PR exists, I don't actually need his +1 specifically (anybody the @elastic/es-security team is fine by me).

This actually matters, because while the hashCode for a BooleanQuery
is already cached, it's not eagerly computed. Pre-computing the
hashCode moves the initial calculation of the hashCode outside of
anywhere that we're holding a lock.
An entry in the cache can be small, but it's not fair to consider it
to be *zero* (in extremely bad luck scenarios, we can end up with a
lot of NULL_MARKER entries, so let's not treat them as *free*).
@joegallo joegallo added >bug :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC Team:Security Meta label for security team v9.2.0 v9.1.1 v8.19.1 v9.0.5 labels Aug 4, 2025
@joegallo joegallo requested review from tvernum and a team August 4, 2025 19:40
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)

@elasticsearchmachine
Copy link
Collaborator

Hi @joegallo, I've created a changelog YAML for you.

@joegallo joegallo added the auto-backport Automatically create backport pull requests when merged label Aug 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged >bug :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC Team:Security Meta label for security team v8.19.1 v9.0.5 v9.1.1 v9.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants