IndexError: index 0 is out of bounds for dimension 0 with size 0 #12104

### Describe the bug


When I test the `mit-han-lab/nunchaku-flux.1-kontext-dev` model, it works fine with one request at a time, but it throws an error when I send concurrent requests to the same pipeline from multiple threads.

My GPU is a single RTX 4090D.

How can I serve concurrent requests with this model on a single GPU?
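One general way to let several threads share one pipeline on a single GPU is to serialize the pipeline calls with a lock. Requests can still arrive concurrently, but only one of them runs the denoising loop at a time. The sketch below assumes that pattern. `FakePipeline`, `pipeline_lock` and `generate` are hypothetical names, and `FakePipeline` stands in for the diffusers pipeline so the example runs without a GPU while counting how many calls overlap.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakePipeline:
    """Hypothetical stand-in for a shared, non-thread-safe diffusion pipeline."""

    def __init__(self):
        self.active = 0       # calls currently inside __call__
        self.max_active = 0   # highest number of overlapping calls observed
        self._counter_lock = threading.Lock()

    def __call__(self, prompt, **kwargs):
        with self._counter_lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        time.sleep(0.01)      # pretend to run the denoising loop
        with self._counter_lock:
            self.active -= 1
        return f"image for {prompt!r}"


pipeline_lock = threading.Lock()  # one lock guarding the one shared pipeline


def generate(pipeline, **kwargs):
    # Only one thread at a time may run the pipeline; the others wait here.
    with pipeline_lock:
        return pipeline(**kwargs)


pipe = FakePipeline()
prompts = [f"request {i}" for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(lambda p: generate(pipe, prompt=p), prompts))
print(results)
print("max overlapping calls:", pipe.max_active)  # 1 with the lock
```

With the real pipeline, `get_result` would call `generate(pipeline, image=..., prompt=..., ...)` instead of `pipeline(...)` directly. This avoids interference but does not make generations faster; running them truly in parallel would need a different approach (for example batching requests or loading extra pipeline copies if GPU memory allows), which is not shown here.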

Thank you in advance for your help.


Here is my error message:

```shell
[2025-08-08 17:14:50.242] [info] Initializing QuantizedFluxModel on device 0
[2025-08-08 17:14:50.382] [info] Loading partial weights from pytorch
[2025-08-08 17:14:51.445] [info] Done.
Injecting quantized module
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 99.47it/s]
Loading pipeline components...:  57%|█████▋    | 4/7 [00:00<00:00, 28.54it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00, 19.02it/s]

Generation `height` and `width` have been adjusted to 752 and 1360 to fit the model requirements.
Generation `height` and `width` have been adjusted to 880 and 1168 to fit the model requirements.
 43%|████▎     | 12/28 [00:17<00:23,  1.45s/it]
 57%|█████▋    | 16/28 [00:18<00:13,  1.17s/it]
Error while processing image: index 29 is out of bounds for dimension 0 with size 29
Error while processing image: index 29 is out of bounds for dimension 0 with size 29
```
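The failing index (29) equals the size of the dimension being indexed (29), which matches a 28-step schedule holding 28 + 1 sigma values whose step counter is advanced by two requests at once; the two progress bars stopping at 12/28 and 16/28 (28 steps combined) are consistent with this. The toy model below illustrates that hypothesis in plain Python. It assumes, without confirming it against the diffusers source, that the scheduler shared by both threads keeps its step index as mutable instance state. `ToyScheduler` and `run_round_robin` are hypothetical, and a deterministic round-robin loop stands in for two threads taking turns.

```python
class ToyScheduler:
    """Hypothetical, simplified scheduler: a sigma table plus a step counter.

    Not the diffusers implementation; it only mimics "read the next sigma,
    then advance a shared index".
    """

    def __init__(self):
        self.sigmas = []
        self.step_index = 0

    def set_timesteps(self, num_steps):
        # num_steps + 1 sigma values, from 1.0 down to 0.0.
        self.sigmas = [1.0 - i / num_steps for i in range(num_steps + 1)]
        self.step_index = 0

    def step(self):
        # Reads the *next* sigma, so it fails once step_index + 1 == len(sigmas).
        sigma_next = self.sigmas[self.step_index + 1]
        self.step_index += 1
        return sigma_next


def run_round_robin(schedulers, num_steps=28):
    """Advance each request one step at a time, like threads taking turns.

    schedulers[r] is the scheduler used by request r; passing the same object
    twice models two threads sharing one pipeline. Returns a tuple
    (steps completed per request, failing index or None).
    """
    for scheduler in schedulers:
        scheduler.set_timesteps(num_steps)  # each request (re)initializes its scheduler
    done = [0] * len(schedulers)
    for _ in range(num_steps):
        for r, scheduler in enumerate(schedulers):
            try:
                scheduler.step()
            except IndexError:
                return done, scheduler.step_index + 1
            done[r] += 1
    return done, None


shared = ToyScheduler()
print(run_round_robin([shared, shared]))                  # ([14, 14], 29)
print(run_round_robin([ToyScheduler(), ToyScheduler()]))  # ([28, 28], None)
```

With a shared scheduler, the two requests together exhaust the 29-entry table after 28 steps (14 each), and the next step asks for index 29. Giving each request its own scheduler avoids this particular failure in the toy model; whether the quantized transformer itself tolerates concurrent calls is a separate question this sketch does not answer.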



### Reproduction

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

from nunchaku import NunchakuFluxTransformer2dModel
from nunchaku.utils import get_precision


def get_result(image_path, pipeline):
    time_begin = time.time()
    image = load_image(image_path).convert("RGB")

    # Long side fixed at 1440 px; short side scaled by the aspect ratio and rounded to a multiple of 32.
    size = image.size
    large_now = 1440
    small_now = round(1440 * (min(size) / max(size)) / 32) * 32
    width, height = (large_now, small_now) if size[0] > size[1] else (small_now, large_now)

    prompt = "Remove the watermark from the picture"
    image = pipeline(
        image=image,
        prompt=prompt,
        guidance_scale=2.5,
        num_inference_steps=28,
        height=height,
        width=width,
    ).images[0]
    out_path = os.path.splitext(image_path)[0] + "_result.png"
    image.save(out_path)
    print(f"{image_path}: {time.time() - time_begin:.1f}s")
    return out_path


def nunchaku_test(concurrency, pipeline):
    # Test images ("floor-plan watermark" and "bedroom watermark"), repeated to fill `concurrency` slots.
    test_images = ["房型图水印.jpg", "卧室水印.png"] * concurrency
    test_images = test_images[:concurrency]

    overall_start = time.time()

    # Every worker thread calls the SAME pipeline object concurrently.
    with ThreadPoolExecutor(max_workers=concurrency) as executor:
        futures = [executor.submit(get_result, img_path, pipeline) for img_path in test_images]

        results = []
        for future in futures:
            try:
                results.append(future.result())
            except Exception as e:
                print(f"Error while processing image: {e}")

    overall_time = time.time() - overall_start
    print(f"concurrency={concurrency}: {overall_time:.1f}s total")


if __name__ == "__main__":
    transformer = NunchakuFluxTransformer2dModel.from_pretrained(
        f"/root/autodl-tmp/nunchaku-flux.1-kontext-dev/svdq-{get_precision()}_r32-flux.1-kontext-dev.safetensors"
    )

    pipeline = FluxKontextPipeline.from_pretrained(
        "/root/autodl-tmp/FLUX.1-Kontext-dev", transformer=transformer, torch_dtype=torch.bfloat16
    ).to("cuda")

    # Arguments are (concurrency, pipeline), matching the function signature.
    nunchaku_test(2, pipeline)
    nunchaku_test(4, pipeline)
```
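`get_result` requests a 1440-pixel long side, but the log shows the pipeline shrinking the sizes to 752×1360 and 880×1168 (height × width). The sketch below reproduces both logged pairs under two assumptions not stated in this report: that the pipeline rescales the requested size to roughly `1024 * 1024` pixels at the same aspect ratio, and that it then floors each side to a multiple of 16. `requested_size` copies the arithmetic from `get_result`; `adjust_size` is a hypothetical helper, not a diffusers function. The inputs 1800×1000, 1440×800 and 1440×1088 are illustrative sizes chosen to match the log, not sizes taken from the original images.

```python
def requested_size(image_width, image_height, long_side=1440):
    """Same arithmetic as get_result: fixed long side, short side rounded to a multiple of 32."""
    ratio = min(image_width, image_height) / max(image_width, image_height)
    short_side = round(long_side * ratio / 32) * 32
    if image_width > image_height:
        return long_side, short_side
    return short_side, long_side


def adjust_size(width, height, max_area=1024 * 1024, multiple_of=16):
    """Hypothetical model of the pipeline's adjustment (an assumption, not diffusers code):
    keep the aspect ratio, target about max_area pixels, then floor each side
    to a multiple of multiple_of. Returns (width, height)."""
    aspect_ratio = width / height
    new_width = round((max_area * aspect_ratio) ** 0.5)
    new_height = round((max_area / aspect_ratio) ** 0.5)
    new_width = new_width // multiple_of * multiple_of
    new_height = new_height // multiple_of * multiple_of
    return new_width, new_height


print(requested_size(1800, 1000))  # (1440, 800)
print(adjust_size(1440, 800))      # (1360, 752), logged height-first as "752 and 1360"
print(adjust_size(1440, 1088))     # (1168, 880), logged as "880 and 1168"
```

If this model is right, the 1440-pixel target never reaches the transformer. Whether the limit can be changed depends on the installed diffusers version, so check the `FluxKontextPipeline.__call__` signature before relying on any particular argument.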

### Logs

See the error output in the bug description above.

### System Info

Output of `diffusers-cli env`:


- 🤗 Diffusers version: 0.35.0.dev0
- Platform: Linux-5.15.0-94-generic-x86_64-with-glibc2.35
- Running on Google Colab?: No
- Python version: 3.12.3
- PyTorch version (GPU?): 2.6.0+cu124 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.33.1
- Transformers version: 4.53.0
- Accelerate version: 1.8.1
- PEFT version: not installed
- Bitsandbytes version: not installed
- Safetensors version: 0.5.3
- xFormers version: not installed
- Accelerator: NVIDIA GeForce RTX 4090 D, 24564 MiB
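- Using GPU in script?: Yes (`pipeline.to("cuda")` on the single RTX 4090 D)
- Using distributed or parallel set-up in script?: No distributed set-up; requests run concurrently in `ThreadPoolExecutor` threads that share one pipeline on one GPU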

### Who can help?

_No response_
