About the attention implementation with torch < 2.0 #3207

@tyshiwo1

Description

Describe the bug

I tried to run train_unconditional.py with torch 1.12.1, but it failed.
The bug seems to be in https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py#L94

I added the line `batch_size = batch_size // head_size` right after L90, and the program seems to work well for now. But I'm not sure whether there are other bugs related to these older torch versions.
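For reference, here is a minimal sketch of what the patched method could look like. The class name, the `self.heads` attribute, and the `(batch * heads, seq_len, dim)` input layout are assumed from the linked file; this is an illustration of the fix, not the exact upstream code:

```python
import torch

class CrossAttention(torch.nn.Module):  # hypothetical minimal container
    def __init__(self, heads: int):
        super().__init__()
        self.heads = heads

    def reshape_batch_dim_to_heads(self, tensor: torch.Tensor) -> torch.Tensor:
        head_size = self.heads
        batch_size, seq_len, dim = tensor.shape
        # The heads are packed into the batch dimension, so the real batch
        # size is batch_size // head_size -- the line the fix adds.
        batch_size = batch_size // head_size
        tensor = tensor.reshape(batch_size, head_size, seq_len, dim)
        # Without the division above, this reshape would ask for head_size
        # times more elements than the tensor holds, producing the
        # RuntimeError shown in the Logs section.
        tensor = tensor.permute(0, 2, 1, 3).reshape(batch_size, seq_len, dim * head_size)
        return tensor

# e.g. 8 heads, true batch 2, seq_len 16, head dim 64:
attn = CrossAttention(heads=8)
x = torch.randn(2 * 8, 16, 64)
print(attn.reshape_batch_dim_to_heads(x).shape)  # torch.Size([2, 16, 512])
```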

Reproduction

https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py#L94

Logs

File "/data/diffusers/src/diffusers/models/attention.py", line 97, in reshape_batch_dim_to_heads
    tensor = tensor.permute(0, 2, 1, 3).reshape(batch_size, seq_len, dim * head_size)
RuntimeError: shape '[1024, 16, 512]' is invalid for input of size 131072
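(For what it's worth, the requested shape would need 1024 × 16 × 512 = 8,388,608 elements, exactly 64× the 131,072 elements actually in the tensor; a constant factor like this is consistent with a head count that was never divided out of the batch dimension.)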

System Info

torch 1.12.1+cu113
