-
-
Notifications
You must be signed in to change notification settings - Fork 32.5k
gh-137273: Fix debug assertion failure in locale.setlocale() on Windows #137300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-137273: Fix debug assertion failure in locale.setlocale() on Windows #137300
Conversation
… Windows It happened when there were at least 16 characters after dot in the locale name.
bbff466
to
c6dbeb8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fix should probably be limited to just a few Windows releases, unless Win11 also shows the same behavior.
size_t len = end ? (size_t)(end - locale) : strlen(locale); | ||
const char *dot = memchr(locale, '.', len); | ||
if (dot && locale + len - dot > 16) { | ||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did you get the "16" from ?
Please include some comments to explain where you got it from and why this check is necessary. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW: It's better to make this limit configurable via a constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comes from https://github.com/huangqinjin/ucrt/blob/d6e817a4cc90f6f1fe54f8a0aa4af4fff0bb647d/inc/corecrt_internal.h#L401, though I don't think that constant is available for us at compile time (the internal headers for the CRT are, well, internal).
A constant at least gives it a name though, and using MAX_CP_LEN
may help someone find an updated definition in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good now.
When you're done making the requested changes, leave the comment: |
How to check the Windows version? I do not have Windows 11, can you please test this PR on Windows 11? |
I wouldn't bother, this limit has been around forever. The bigger challenge would be how to handle |
Sorry, but I don't have a Windows 11 build system available at the moment (not even a working Win10 system, after the VM crashed some time ago). Let's not worry about this now. We can disable things again selectively, once the Windows SDK provides better support for handling locale names (perhaps in a few years). |
I am planning to handle this in Python code. Translate "zh_HK.UTF-8@hans" to "zh-Hans-HK.utf-8", etc. |
Is that how Windows handles modifiers ? |
I have made the requested changes; please review again. |
Thanks for making the requested changes! @malemburg: please review the changes made to this pull request. |
size_t len = end ? (size_t)(end - locale) : strlen(locale); | ||
const char *dot = memchr(locale, '.', len); | ||
if (dot && locale + len - dot > 16) { | ||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good now.
No - Windows UCRT treats them as part of the code page name and overruns its own buffers, leading to a crash 😉 I don't think there's any support for modifiers in locales at all. The OS itself has always used a different system for those AFAICT, it's just the POSIX emulation in the CRT that's trying to fudge over it. |
In that case, it's probably best to simply strip from them locale strings passed to |
Yes. It supports format I seen also |
Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14. |
… Windows (pythonGH-137300) It happened when there were at least 16 characters after dot in the locale name. (cherry picked from commit 718e0c8) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
… Windows (pythonGH-137300) It happened when there were at least 16 characters after dot in the locale name. (cherry picked from commit 718e0c8) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
GH-137305 is a backport of this pull request to the 3.14 branch. |
GH-137306 is a backport of this pull request to the 3.13 branch. |
Thank you for your review, @malemburg and @zooba. |
It happened when there were at least 16 characters after dot in the locale name.