IndexError: index 0 is out of bounds for dimension 0 with size 0 #12104

@liushiton

Describe the bug

When I test the mit-han-lab/nunchaku-flux.1-kontext-dev model, it runs normally in a non-concurrent scenario, but throws an error when I try to run it with concurrent requests.

My GPU is a single RTX 4090D.

How can I serve concurrent requests with this model on a single GPU?

Thank you in advance for your help.

Here is my error message:

[2025-08-08 17:14:50.242] [info] Initializing QuantizedFluxModel on device 0
[2025-08-08 17:14:50.382] [info] Loading partial weights from pytorch
[2025-08-08 17:14:51.445] [info] Done.
Injecting quantized module
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 99.47it/s]
Loading pipeline components...: 57%|█████▋    | 4/7 [00:00<00:00, 28.54it/s]
You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00, 19.02it/s]

Generation height and width have been adjusted to 752 and 1360 to fit the model requirements.
Generation height and width have been adjusted to 880 and 1168 to fit the model requirements.
43%|████▎     | 12/28 [00:17<00:23, 1.45s/it]
57%|█████▋    | 16/28 [00:18<00:13, 1.17s/it]
Error processing image: index 29 is out of bounds for dimension 0 with size 29
Error processing image: index 29 is out of bounds for dimension 0 with size 29
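The numbers in the message above are consistent with two requests advancing one shared step counter: 28 inference steps would give a schedule of 29 entries, and a counter pushed past 28 would then index position 29. The sketch below only illustrates that arithmetic. `ToyScheduler` and `run_interleaved` are hypothetical stand-ins, not the diffusers or nunchaku implementation, and the assumption that the pipeline's scheduler keeps such a per-object counter is not confirmed by the logs.

```python
class ToyScheduler:
    """Hypothetical stand-in: a schedule with one shared step counter."""

    def __init__(self, num_steps):
        # Assumption: num_steps denoising steps need num_steps + 1 schedule entries.
        self.sigmas = list(range(num_steps + 1))
        self.step_index = 0

    def step(self):
        # Each step reads the *next* schedule entry, then advances the counter.
        sigma_next = self.sigmas[self.step_index + 1]
        self.step_index += 1
        return sigma_next


def run_interleaved(num_steps, num_requests):
    """Simulate num_requests denoising loops sharing one scheduler object."""
    scheduler = ToyScheduler(num_steps)
    try:
        for _ in range(num_steps):
            for _ in range(num_requests):
                scheduler.step()
    except IndexError:
        bad_index = scheduler.step_index + 1
        size = len(scheduler.sigmas)
        return f"index {bad_index} is out of bounds for dimension 0 with size {size}"
    return "ok"


print(run_interleaved(28, 1))
print(run_interleaved(28, 2))
```

With a single request the toy loop finishes. With two requests sharing the counter, it fails on the 29th call with the same index and size as in the log.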

Reproduction

import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image
from concurrent.futures import ThreadPoolExecutor

from nunchaku import NunchakuFluxTransformer2dModel
from nunchaku.utils import get_precision
import time

def get_result(image_path, pipeline):
    image = load_image(image_path).convert("RGB")
    size = image.size
    # Scale the longer side to 1440 and round the shorter side to a multiple of 32.
    large_now = 1440
    small_now = round(1440 * (min(size) / max(size)) / 32) * 32
    width, height = (large_now, small_now) \
        if size[0] > size[1] else (small_now, large_now)
    prompt = "Remove the watermark from the picture"
    image = pipeline(
        image=image,
        prompt=prompt,
        guidance_scale=2.5,
        num_inference_steps=28,
        height=height,
        width=width,
    ).images[0]
    # Save next to the input, replacing the 4-character extension (".jpg" / ".png").
    image.save(image_path[:-4] + "_result.png")

def nunchaku_test(concurrency, pipeline):

    # Cycle through the two test images until there is one per worker.
    test_images = ["房型图水印.jpg", "卧室水印.png"] * concurrency
    test_images = test_images[:concurrency]

    overall_start = time.time()

    # Every worker thread calls the same shared pipeline object.
    with ThreadPoolExecutor(max_workers=concurrency) as executor:
        futures = [executor.submit(get_result, img_path, pipeline) for img_path in test_images]

        results = []
        for future in futures:
            try:
                results.append(future.result())
            except Exception as e:
                print(f"Error processing image: {e}")

    overall_time = time.time() - overall_start


if __name__ == '__main__':


    transformer = NunchakuFluxTransformer2dModel.from_pretrained(
        f"/root/autodl-tmp/nunchaku-flux.1-kontext-dev/svdq-{get_precision()}_r32-flux.1-kontext-dev.safetensors"
    )

    pipeline = FluxKontextPipeline.from_pretrained(
        "/root/autodl-tmp/FLUX.1-Kontext-dev", transformer=transformer, torch_dtype=torch.bfloat16
    ).to("cuda")

    # Argument order matches the signature: nunchaku_test(concurrency, pipeline).
    nunchaku_test(2, pipeline)
    nunchaku_test(4, pipeline)
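One possible workaround, assuming the error comes from worker threads mutating state inside the single shared pipeline object, is to serialize the pipeline calls with a lock. The sketch below is self-contained and uses a hypothetical `ToyPipeline` in place of `FluxKontextPipeline`. `pipeline_lock`, `locked_call`, and `run_concurrently` are illustrative names, not part of diffusers or nunchaku.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyPipeline:
    """Hypothetical stand-in for a pipeline that keeps per-call state on itself."""

    def __init__(self):
        self.step_index = 0

    def __call__(self, num_inference_steps):
        # Reset, then advance shared state once per step (unsafe if interleaved).
        self.step_index = 0
        for _ in range(num_inference_steps):
            self.step_index += 1
        return self.step_index


# One lock per shared pipeline object; only one thread may run it at a time.
pipeline_lock = threading.Lock()


def locked_call(pipeline, num_inference_steps):
    """Run the shared pipeline while holding the lock."""
    with pipeline_lock:
        return pipeline(num_inference_steps)


def run_concurrently(pipeline, concurrency, num_inference_steps=28):
    """Submit `concurrency` requests; the lock serializes the actual calls."""
    with ThreadPoolExecutor(max_workers=concurrency) as executor:
        futures = [
            executor.submit(locked_call, pipeline, num_inference_steps)
            for _ in range(concurrency)
        ]
        return [future.result() for future in futures]


print(run_concurrently(ToyPipeline(), 4))
```

In the reproduction script, the equivalent change would be to wrap the `pipeline(...)` call inside `get_result` in the same kind of `with` block. This keeps requests from interleaving but does not make them run in parallel on the GPU, so total time would be roughly the sum of the individual runs. Loading a separate pipeline per worker is another option if GPU memory allows it.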

Logs

System Info

~/FLUX.1-Kontext-Dev-nunchaku# diffusers-cli env

Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

  • 🤗 Diffusers version: 0.35.0.dev0
  • Platform: Linux-5.15.0-94-generic-x86_64-with-glibc2.35
  • Running on Google Colab?: No
  • Python version: 3.12.3
  • PyTorch version (GPU?): 2.6.0+cu124 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.33.1
  • Transformers version: 4.53.0
  • Accelerate version: 1.8.1
  • PEFT version: not installed
  • Bitsandbytes version: not installed
  • Safetensors version: 0.5.3
  • xFormers version: not installed
  • Accelerator: NVIDIA GeForce RTX 4090 D, 24564 MiB
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response
