-
Notifications
You must be signed in to change notification settings - Fork 6.2k
Fix group offloading synchronization bug for parameter-only GroupModule's #12077
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
cc @seed93, this seems to resolve many different tests I tried that were previously causing outputs to be different. Could you verify on your end if everything works well? Thanks 🤗 |
Let's quickly merge the cleaning PR so that it's easier to review this one :) |
Sorry, I am on a long vacation. Maybe I will try at night.-------- 原始邮件 --------发件人: Aryan ***@***.***>日期: 2025年8月6日周三 06:43收件人: huggingface/diffusers ***@***.***>抄送: seed93 ***@***.***>, Mention ***@***.***>主 题: Re: [huggingface/diffusers] Fix group offloading synchronization bug for parameter-only GroupModule's (PR #12077)a-r-r-o-w left a comment (huggingface/diffusers#12077)
cc @seed93, this seems to resolve many different tests I tried that were previously causing outputs to be different. Could you verify on your end if everything works well? Thanks 🤗
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: ***@***.***>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ridiculously simple fix yet ridiculously critical heisenbug.
# If this group didn't onload itself, it means it was asynchronously onloaded by the | ||
# previous group. We need to synchronize the side stream to ensure parameters | ||
# are completely loaded to proceed with forward pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit): It would be beneficial to comment on the consequences of not performing this synchronization.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Sometimes the hardest of problems have the simplest solutions :) Failing tests are unrelated |
Fixes #11981.
Requires #11990 to be merged first.
code
Tested for 100 rounds with:
Testing with profiling is not helpful because the problem never shows up. See heisenbug thread: https://huggingface.slack.com/archives/C065E480NN9/p1754035222558869