
Attention masking in Chroma pipeline #12116

@dxqbYD

Description


Describe the bug

There is an issue with attention masking in the Chroma pipeline. With the prompt from your example at https://huggingface.co/docs/diffusers/main/api/pipelines/chroma, the difference is not very large, probably because there are enough meaningful tokens carrying some weight.

But short prompts fail because of incorrect masking.

Below are 3 sample pairs: the first pair uses the prompt from your example, the second and third use just "man" as the positive prompt (negative prompt unchanged). In each pair, the first sample was generated with the current code and the second with correct masking.

The issue is the data type of the attention mask and how it is interpreted. It's created as a floating point mask, which is fine for its first use in T5: https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5ForSequenceClassification.forward.attention_mask

There, 1.0 is not masked and 0.0 is masked.

However, it is then passed to the transformer in the same dtype, and this attention mask eventually ends up at https://docs.pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html

There, 0.0 is not masked and -inf is masked.
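
To illustrate the mismatch, here is a small self-contained PyTorch sketch (not diffusers code): a 1.0/0.0 float mask passed to scaled_dot_product_attention is treated as an additive bias and masks nothing, while the same mask cast to bool masks the padded positions as intended.

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 1, 4, 8)
k = torch.randn(1, 1, 4, 8)
v = torch.randn(1, 1, 4, 8)

# Tokenizer-style float mask: 1.0 = keep, 0.0 = pad. SDPA treats a float mask
# as an additive bias, so 0.0 leaves the padded tokens fully visible and 1.0
# merely adds +1 to the attention logits, so nothing is actually masked.
float_mask = torch.tensor([1.0, 1.0, 0.0, 0.0]).view(1, 1, 1, 4)
out_float = F.scaled_dot_product_attention(q, k, v, attn_mask=float_mask)

# Boolean mask: True = attend, False = mask. This is what SDPA expects.
bool_mask = float_mask.bool()
out_bool = F.scaled_dot_product_attention(q, k, v, attn_mask=bool_mask)

print(torch.allclose(out_float, out_bool))  # False: the float mask masked nothing
```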

I generated the good samples below by changing the dtype at this line to torch.bool, i.e. replacing

attention_mask = attention_mask.to(dtype)

with

attention_mask = attention_mask.to(torch.bool)

But a proper fix should probably convert the tokenizer output directly to bool rather than taking a detour through a floating-point type.
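
As a rough sketch of that direction (the function and argument names below are illustrative, not the actual diffusers code): keep the tokenizer's 1/0 mask for the T5 call, and convert it directly to bool before it reaches the transformer's SDPA attention.

```python
import torch

def encode_prompt(tokenizer, text_encoder, prompt, max_sequence_length=512):
    # Illustrative sketch only, not the actual diffusers implementation.
    text_inputs = tokenizer(
        prompt,
        padding="max_length",
        max_length=max_sequence_length,
        truncation=True,
        return_tensors="pt",
    )
    attention_mask = text_inputs.attention_mask  # int64 tensor of 1s and 0s

    # T5 interprets 1 = keep, 0 = pad, so the tokenizer mask is fine here.
    prompt_embeds = text_encoder(
        text_inputs.input_ids, attention_mask=attention_mask
    ).last_hidden_state

    # The transformer's SDPA attention expects a bool mask (True = attend),
    # so convert the tokenizer output directly instead of casting it to the
    # embedding dtype.
    return prompt_embeds, attention_mask.to(torch.bool)
```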

[Three image pairs: current masking vs. corrected masking for each prompt]

Reproduction

Run this example, but with a short prompt:
https://huggingface.co/docs/diffusers/main/api/pipelines/chroma
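
A minimal reproduction sketch along those lines, assuming the ChromaPipeline class from diffusers; the checkpoint id and sampling parameters below are placeholders and should be replaced with the values from the linked docs example:

```python
import torch
from diffusers import ChromaPipeline

# Placeholder checkpoint id; use the model id from the linked docs example.
pipe = ChromaPipeline.from_pretrained("lodestones/Chroma", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# A very short positive prompt makes the incorrect masking clearly visible.
image = pipe(
    prompt="man",
    negative_prompt="low quality, blurry",
    num_inference_steps=26,
    guidance_scale=4.0,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("man.png")
```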

Logs

System Info

diffusers HEAD, Python 3.11.11

Who can help?

No response
