enable compilation in qwen image. #12061

sayakpaul · 2025-08-04T10:18:04Z

What does this PR do?

Adds tests for the Qwen transformer model.
Enables compilation without triggering recompilations.

Timings (including compilation) gathered on an H100, as mean wall-clock seconds over the three runs in the code below:

PR branch: timings.mean()=tensor(42.8954)
tests/qwen-image: timings.mean()=tensor(75.9385)

Code

from diffusers import DiffusionPipeline
import torch
import time

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16).to("cuda")
# pipe.load_lora_weights("trained-qwen-image-lora")
pipe.transformer.compile(fullgraph=True)

timings = []
for _ in range(3):
    start = time.time()
    image = pipe(
        "a 3dicon, a llama with a signboard saying 'Qwen is awesome'",
        guidance_scale=1.0,
        num_inference_steps=50,
    ).images[0]
    end = time.time()
    timings.append(end - start)

timings = torch.tensor(timings)
print(f"{timings.mean()=}")
image.save("llama_pretrained.png")

sayakpaul · 2025-08-04T10:18:25Z

src/diffusers/models/transformers/transformer_qwenimage.py

-        if self.pos_freqs.device != device:
-            self.pos_freqs = self.pos_freqs.to(device)
-            self.neg_freqs = self.neg_freqs.to(device)
-


Recompilation trigger one.

sayakpaul · 2025-08-04T10:18:51Z

src/diffusers/models/transformers/transformer_qwenimage.py

        if isinstance(video_fhw, list):
            video_fhw = video_fhw[0]
        frame, height, width = video_fhw
        rope_key = f"{frame}_{height}_{width}"

-        if rope_key not in self.rope_cache:


Recompilation trigger two.

sayakpaul · 2025-08-04T10:19:15Z

tests/models/test_modeling_common.py

+        if self.model_class.__name__ == "QwenImageTransformer2DModel":
+            pytest.skip(
+                "QwenImageTransformer2DModel doesn't support group offloading with disk. Needs to be investigated."
+            )
+


Will investigate in a follow-up.

HuggingFaceDocBuilderDev · 2025-08-04T10:25:22Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

a-r-r-o-w · 2025-08-04T11:38:11Z

@sayakpaul Could you rebase onto main? Sorry, I didn't notice earlier that this was stacked on top of tests/qwen-image.

sayakpaul · 2025-08-04T11:40:06Z

Done!

a-r-r-o-w · 2025-08-04T11:41:29Z

src/diffusers/models/transformers/transformer_qwenimage.py

@@ -236,6 +223,25 @@ def forward(self, video_fhw, txt_seq_lens, device):

        return vid_freqs, txt_freqs

+    @functools.lru_cache(maxsize=None)
+    def _compute_video_freqs(self, frame, height, width):


TODO: we need to remove frame (can be done in future PR)

a-r-r-o-w · 2025-08-04T11:45:17Z

src/diffusers/models/transformers/transformer_qwenimage.py

@@ -236,6 +223,25 @@ def forward(self, video_fhw, txt_seq_lens, device):

        return vid_freqs, txt_freqs

+    @functools.lru_cache(maxsize=None)


Let's remove the self.rope_cache and just use the lru_cache implementation? WDYT @yiyixuxu?

WDYT about setting maxsize=128 or something similar here, so that long-running services that use diffusers don't accidentally die with OOM (probably very unlikely, though), @sayakpaul?

maxsize=128 sounds reasonable to me.
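To illustrate the bounded-cache suggestion above, here is a minimal standard-library sketch. `compute_angles` is a hypothetical stand-in for an expensive per-resolution computation, not the function from this PR, and the 200-resolution loop is an invented workload.

```python
import functools
import math


@functools.lru_cache(maxsize=128)
def compute_angles(frame: int, height: int, width: int) -> tuple:
    """Hypothetical stand-in for an expensive per-resolution RoPE table."""
    scale = frame * height * width
    return tuple(math.sin(i / scale) for i in range(width))


# Simulate a long-running service that sees 200 distinct resolutions,
# then receives the most recent resolution a second time.
for side in range(1, 201):
    compute_angles(1, side, side)
compute_angles(1, 200, 200)

info = compute_angles.cache_info()
print(info)  # the cache never holds more than maxsize entries
```

With `maxsize=None` the cache would keep all 200 entries; with `maxsize=128` the least recently used ones are evicted. One caveat when the decorator is applied to a method rather than a free function: `self` becomes part of the cache key, so the cache holds a reference to the instance.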

a-r-r-o-w · 2025-08-04T11:47:00Z

src/diffusers/models/transformers/transformer_qwenimage.py

@@ -179,6 +180,8 @@ def __init__(self, theta: int, axes_dim: List[int], scale_rope=False):
            dim=1,
        )
        self.rope_cache = {}
+        self.register_buffer("pos_freqs", pos_freqs, persistent=False)


This is most likely not equivalent. When registered as a buffer, if the model is loaded in bf16, the precision of these will be bf16 instead of fp32. Doing RoPE in bf16 may harm image quality, so we need to be careful here. Not sure what's best to do here -- maybe for now we can put the rope layer in _keep_modules_in_fp32?

This recompilation related problem seems to have become too common with RoPE. Maybe we need to rethink the design a bit.

Just for the record, sharing the recompilation error we get without the buffer implementation:

>       raise exc.RecompileError(message)
E       torch._dynamo.exc.RecompileError: Recompiling function forward in /fsx/sayak/diffusers/src/diffusers/models/transformers/transformer_qwenimage.py:529
E           triggered by the following guard failure(s):
E           - 0/0: tensor 'self._modules['pos_embed'].neg_freqs' dispatch key set mismatch. expected DispatchKeySet(CPU, BackendSelect, ADInplaceOrView, AutogradCPU), actual DispatchKeySet(CUDA, BackendSelect, ADInplaceOrView, AutogradCUDA)

../miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/_dynamo/guards.py:3822: RecompileError

But I agree with your first point.
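The precision concern raised in this thread can be reproduced with a small sketch. `ToyRope` is a hypothetical stand-in, not the diffusers RoPE class, and it uses a real-valued frequency tensor for simplicity. It compares a plain tensor attribute with a non-persistent buffer when the module is cast to bf16.

```python
import torch
from torch import nn


class ToyRope(nn.Module):
    """Hypothetical stand-in for a RoPE layer; not the diffusers class."""

    def __init__(self, dim: int = 8, theta: float = 10000.0):
        super().__init__()
        exponents = torch.arange(0, dim, 2, dtype=torch.float32) / dim
        freqs = 1.0 / (theta**exponents)
        # Plain attribute: Module.to() does not touch it.
        self.plain_freqs = freqs.clone()
        # Non-persistent buffer: follows Module.to(), but stays out of state_dict.
        self.register_buffer("buffer_freqs", freqs.clone(), persistent=False)


rope = ToyRope()
rope.to(torch.bfloat16)

print(rope.plain_freqs.dtype)   # plain attribute keeps its original dtype
print(rope.buffer_freqs.dtype)  # buffer is cast together with the module
print("buffer_freqs" in rope.state_dict())
```

The buffer follows the module's device and dtype, which is what avoids the in-forward `.to(device)` call, but for the same reason it loses fp32 precision unless the layer is kept in fp32 by some other mechanism.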

a-r-r-o-w and others added 4 commits August 4, 2025 06:39

update

ee89f79

update

5620f87

update

365b8c1

enable compilation in qwen image.

dfc6018

sayakpaul requested a review from a-r-r-o-w August 4, 2025 10:18

sayakpaul commented Aug 4, 2025

Base automatically changed from tests/qwen-image to main August 4, 2025 10:58

Merge branch 'main' into enable-compilation

431cd77

a-r-r-o-w reviewed Aug 4, 2025

sayakpaul requested a review from yiyixuxu August 4, 2025 13:50

add tests

6a5fcec
