Skip to content

Improve base-32 hash decoding performance with reverse map #13680

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

avnik
Copy link

@avnik avnik commented Aug 3, 2025

Motivation

This PR replaces the linear search over nix32Chars with a constexpr-initialized reverse lookup table for decoding base-32 hashes. The changes include:

  • Defining nix32Chars as a constexpr char[].
  • Adding a constexpr std::array<unsigned char, 256> (reverseNix32Map) to map characters to their base-32 digit values at compile time.
  • Replacing the slow character search loop with a direct lookup using reverseNix32Map.
  • Removing std::once_flag/isBase32 logic in references.cc in favor of reverseNix32Map

This significantly improves performance and clarity of base-32 decoding logic, while preserving correctness.

@avnik avnik requested a review from edolstra as a code owner August 3, 2025 15:09
Copy link
Member

@Ericson2314 Ericson2314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! This is indeed much better: faster and clearer.

Just one small thing, then let's land it!

The changes include:

* Defining nix32Chars as a constexpr char[].
* Adding a constexpr std::array<unsigned char, 256> (reverseNix32Map) to map characters to their base-32 digit values at compile time.
* Replacing the slow character search loop with a direct lookup using reverseNix32Map.
* Removing std::once_flag/isBase32 logic in references.cc in favor of reverseNix32Map

Signed-off-by: Alexander V. Nikolaev <avn@avnik.info>
@avnik avnik force-pushed the avnik/fast-base32 branch from cad1d68 to 4bfc007 Compare August 3, 2025 16:20
@avnik avnik requested a review from Ericson2314 August 3, 2025 16:20
@avnik
Copy link
Author

avnik commented Aug 3, 2025

@Ericson2314 your suggestion applied

@xokdvium
Copy link
Contributor

xokdvium commented Aug 3, 2025

I wouldn't mind if we could add a benchmark for this, but this could be done in a follow-up.

@avnik
Copy link
Author

avnik commented Aug 3, 2025

@xokdvium also wouldn't be bad to add dedicated tests for both nix32 encoding/decoding

std::array<unsigned char, 256> map{};

for (size_t i = 0; i < map.size(); ++i)
map[i] = 0xFF; // invalid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this sentinel value could be given a name? Not a huge fan of hardcoded constants, since those don't tend to be self documenting.

Copy link
Contributor

@xokdvium xokdvium Aug 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively this could be an array mapping to std::optional and nullopt would be the sentinel, which should be also okay.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xokdvium any suggestiions for name of constant? my idea is invalid_hash_value

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the sentinel value just be zero?

@xokdvium
Copy link
Contributor

xokdvium commented Aug 3, 2025

wouldn't be bad to add dedicated tests for both nix32 encoding/decoding

Absolutely. If you'd like you could handle it in a separate PR. Any help with the test coverage would be greatly appreciated.

@Ericson2314
Copy link
Member

Note re #13682 that this PR is not supposed to affect how reference scanning works. It just affects how the parsing of hashes themselves (a Hash constructor) works, and then deduplicates the two.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants