gh-137273: Fix debug assertion failure in locale.setlocale() on Windows #137300

Merged: 3 commits merged into python:main on Aug 1, 2025

Conversation

serhiy-storchaka
Member

@serhiy-storchaka serhiy-storchaka commented Aug 1, 2025

The assertion failure occurred when the locale name had at least 16 characters after the dot.

Member

@malemburg malemburg left a comment
This fix should probably be limited to just a few Windows releases, unless Win11 also shows the same behavior.

size_t len = end ? (size_t)(end - locale) : strlen(locale);
const char *dot = memchr(locale, '.', len);
if (dot && locale + len - dot > 16) {
return -1;

Where did you get the "16" from ?
Please include some comments to explain where you got it from and why this check is necessary. Thanks.

BTW: It's better to make this limit configurable via a constant.

Comes from https://github.com/huangqinjin/ucrt/blob/d6e817a4cc90f6f1fe54f8a0aa4af4fff0bb647d/inc/corecrt_internal.h#L401, though I don't think that constant is available for us at compile time (the internal headers for the CRT are, well, internal).

A constant at least gives it a name though, and using MAX_CP_LEN may help someone find an updated definition in the future.
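
To illustrate the suggestion, here is a minimal sketch of the same check with the limit named. It assumes the value 16 mirrors the UCRT-internal `MAX_CP_LEN` from the linked header; because that header is not available at compile time, the value is restated locally. The function name `check_codepage_length` is hypothetical and is not the name used in the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: mirrors the UCRT-internal MAX_CP_LEN. The CRT's internal
   headers cannot be included, so the value is restated here. */
#define MAX_CP_LEN 16

/* Hypothetical helper: return 0 if the locale name is acceptable, or -1 if
   the part starting at the dot is too long for the CRT to handle safely.
   `end` may be NULL, in which case the whole string is checked. */
static int
check_codepage_length(const char *locale, const char *end)
{
    assert(locale != NULL);
    size_t len = end ? (size_t)(end - locale) : strlen(locale);
    const char *dot = memchr(locale, '.', len);
    /* (locale + len - dot) counts the dot plus every character after it. */
    if (dot && (size_t)(locale + len - dot) > MAX_CP_LEN) {
        return -1;
    }
    return 0;
}
```

With this sketch, a name such as "en_US.UTF-8" passes, while a name with 16 or more characters after the dot is rejected before it reaches the CRT.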

Looks good now.

@bedevere-app

bedevere-app bot commented Aug 1, 2025

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@serhiy-storchaka
Member Author

How to check the Windows version? I do not have Windows 11, can you please test this PR on Windows 11?

@zooba
Member

zooba commented Aug 1, 2025

> How to check the Windows version?

I wouldn't bother, this limit has been around forever. The bigger challenge would be how to handle @<extras> if/when support is added, because right now there's no support for them in UCRT at all.

@malemburg
Member

> How to check the Windows version? I do not have Windows 11, can you please test this PR on Windows 11?

Sorry, but I don't have a Windows 11 build system available at the moment (not even a working Win10 system, after the VM crashed some time ago).

Let's not worry about this now. We can disable things again selectively, once the Windows SDK provides better support for handling locale names (perhaps in a few years).

@serhiy-storchaka
Member Author

> The bigger challenge would be how to handle @<extras> if/when support is added, because right now there's no support for them in UCRT at all.

I am planning to handle this in Python code. Translate "zh_HK.UTF-8@hans" to "zh-Hans-HK.utf-8", etc.

@malemburg
Member

> I am planning to handle this in Python code. Translate "zh_HK.UTF-8@hans" to "zh-Hans-HK.utf-8", etc.

Is that how Windows handles modifiers ?

@serhiy-storchaka
Member Author

I have made the requested changes; please review again.

@bedevere-app

bedevere-app bot commented Aug 1, 2025

Thanks for making the requested changes!

@malemburg: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from malemburg August 1, 2025 13:34
size_t len = end ? (size_t)(end - locale) : strlen(locale);
const char *dot = memchr(locale, '.', len);
if (dot && locale + len - dot > 16) {
return -1;

Looks good now.

@zooba
Member

zooba commented Aug 1, 2025

> Is that how Windows handles modifiers ?

No - Windows UCRT treats them as part of the code page name and overruns its own buffers, leading to a crash 😉

I don't think there's any support for modifiers in locales at all. The OS itself has always used a different system for those AFAICT, it's just the POSIX emulation in the CRT that's trying to fudge over it.

@malemburg
Member

> I don't think there's any support for modifiers in locales at all.

In that case, it's probably best to simply strip them from locale strings passed to setlocale() on Windows, especially since the meaning of those modifiers is not standardized anywhere, AFAIK.
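
A minimal sketch of what stripping could look like, assuming the modifier is everything from the first `@` onward. The helper name `strip_locale_modifier` and its buffer-based interface are hypothetical; nothing like it is part of this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `locale` into `buf` without a trailing
   "@modifier" part. Returns 0 on success, -1 if `buf` is too small. */
static int
strip_locale_modifier(const char *locale, char *buf, size_t bufsize)
{
    assert(locale != NULL && buf != NULL);
    const char *at = strchr(locale, '@');
    /* Keep everything before '@', or the whole name if there is none. */
    size_t len = at ? (size_t)(at - locale) : strlen(locale);
    if (len >= bufsize) {
        return -1;
    }
    memcpy(buf, locale, len);
    buf[len] = '\0';
    return 0;
}
```

Under this sketch "sr_RS.UTF-8@latin" becomes "sr_RS.UTF-8", which silently drops the script information; that trade-off is what the following comments discuss.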

@serhiy-storchaka
Member Author

> Is that how Windows handles modifiers ?

Yes. It supports format language[-script][-region][.codepage] for BCP-47 locale names.

I have also seen sr_Latn_RS instead of sr_RS@latin on some POSIX platforms, so we should support this format not only on Windows. setlocale() should try several variants.
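
The translation described above could be sketched as follows. This is written in C only to match the snippet under review; the plan stated in this thread is to do it in Python code. The helper name `posix_to_bcp47`, the two-entry modifier table (taken from the two examples mentioned in this conversation), and the lowercasing of the code page part are all assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Illustrative modifier-to-script table. Only these two pairs appear in the
   discussion; a real table would need more entries. */
static const struct { const char *modifier; const char *script; } SCRIPTS[] = {
    {"hans", "Hans"},
    {"latin", "Latn"},
};

/* Hypothetical helper: rewrite "lang_REGION[.codepage]@modifier" as
   "lang-Script-REGION[.codepage]", lowercasing the code page part.
   Returns 0 on success, or -1 if the name has no region, has no known
   modifier, or does not fit in `buf`. */
static int
posix_to_bcp47(const char *name, char *buf, size_t bufsize)
{
    assert(name != NULL && buf != NULL);
    const char *at = strchr(name, '@');
    const char *us = strchr(name, '_');
    if (at == NULL || us == NULL || us > at) {
        return -1;
    }
    const char *script = NULL;
    for (size_t i = 0; i < sizeof(SCRIPTS) / sizeof(SCRIPTS[0]); i++) {
        if (strcmp(at + 1, SCRIPTS[i].modifier) == 0) {
            script = SCRIPTS[i].script;
            break;
        }
    }
    if (script == NULL) {
        return -1;
    }
    /* The region ends at the dot if there is a code page, else at '@'. */
    const char *dot = memchr(us, '.', (size_t)(at - us));
    const char *region_end = dot ? dot : at;
    int n = snprintf(buf, bufsize, "%.*s-%s-%.*s",
                     (int)(us - name), name, script,
                     (int)(region_end - us - 1), us + 1);
    if (n < 0 || (size_t)n >= bufsize) {
        return -1;
    }
    if (dot) {
        /* Append ".codepage" (dot included), lowercased. */
        size_t cplen = (size_t)(at - dot);
        if ((size_t)n + cplen >= bufsize) {
            return -1;
        }
        for (size_t i = 0; i < cplen; i++) {
            buf[n + i] = (char)tolower((unsigned char)dot[i]);
        }
        buf[n + cplen] = '\0';
    }
    return 0;
}
```

A caller that wants to "try several variants" could pass the original name to setlocale() first and fall back to the rewritten name only if that fails.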

@serhiy-storchaka serhiy-storchaka merged commit 718e0c8 into python:main Aug 1, 2025
46 of 49 checks passed
@miss-islington-app

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Aug 1, 2025
… Windows (pythonGH-137300)

It happened when there were at least 16 characters after dot in the
locale name.
(cherry picked from commit 718e0c8)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@serhiy-storchaka serhiy-storchaka deleted the locale-long-encoding branch August 1, 2025 14:44
@bedevere-app

bedevere-app bot commented Aug 1, 2025

GH-137305 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Aug 1, 2025
@bedevere-app

bedevere-app bot commented Aug 1, 2025

GH-137306 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Aug 1, 2025
serhiy-storchaka added a commit that referenced this pull request Aug 1, 2025
…n Windows (GH-137300) (GH-137306)

It happened when there were at least 16 characters after dot in the
locale name.
(cherry picked from commit 718e0c8)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@serhiy-storchaka
Member Author

Thank you for your review, @malemburg and @zooba.
