PAG variant for AnimateDiff#8789

Merged
DN6 merged 21 commits into
huggingface:mainfrom
a-r-r-o-w:animatediff/pag
Aug 1, 2024

Conversation

@a-r-r-o-w
Contributor

@a-r-r-o-w a-r-r-o-w commented Jul 4, 2024

What does this PR do?

Looking at #8710, I thought it might be interesting to apply PAG to video generation pipelines and see if there's interest in supporting this.

In addition to this, I would also like to propose the addition of AutoPipelineForTextToVideo since we support a few video models now, and this will continue to grow with ongoing research progress. WDYT?

Code
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter, DDIMScheduler
from diffusers.pipelines.pag.pipeline_pag_sd_animatediff import AnimateDiffPAGPipeline
from diffusers.utils import export_to_gif


# model_id = "runwayml/stable-diffusion-v1-5"
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
motion_adapter_id = "guoyww/animatediff-motion-adapter-v1-5-2"

prompt = "car, futuristic cityscape with neon lights, street, no human"
negative_prompt = "low quality, bad quality"
num_inference_steps = 25
guidance_scale = 6
pag_scale = 3.0

motion_adapter = MotionAdapter.from_pretrained(motion_adapter_id)
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler", beta_schedule="linear", steps_offset=1, clip_sample=False)
pipe = AnimateDiffPAGPipeline.from_pretrained(
    model_id,
    motion_adapter=motion_adapter,
    scheduler=scheduler,
    pag_applied_layers=[],
    torch_dtype=torch.float16,
).to("cuda")

configs = [
    dict(pag_scale=0.0, clip_skip=None, free_init=False),
    dict(pag_scale=0.0, clip_skip=2, free_init=False),
    dict(pag_scale=3.0, clip_skip=None, free_init=False),
    dict(pag_scale=3.0, clip_skip=2, free_init=False),
    dict(pag_scale=3.0, clip_skip=None, free_init=True),
    dict(pag_scale=3.0, clip_skip=2, free_init=True),
    dict(pag_scale=0.5, clip_skip=None, free_init=False),
    dict(pag_scale=0.5, clip_skip=2, free_init=False),
    dict(pag_scale=0.5, clip_skip=None, free_init=True),
    dict(pag_scale=0.5, clip_skip=2, free_init=True),
    dict(pag_scale=5.0, clip_skip=None, free_init=False),
    dict(pag_scale=5.0, clip_skip=2, free_init=False),
    dict(pag_scale=5.0, clip_skip=None, free_init=True),
    dict(pag_scale=5.0, clip_skip=2, free_init=True),
]

for config in configs:
    config = dict(config)  # copy so the shared configs list is not mutated and can be reused below
    free_init = config.pop("free_init", False)
    if free_init:
        pipe.enable_free_init(method="butterworth", use_fast_sampling=True)

    video = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        height=512,
        width=512,
        num_frames=16,
        guidance_scale=guidance_scale,
        num_inference_steps=num_inference_steps,
        generator=torch.Generator().manual_seed(42),
        **config,
    ).frames[0]

    if free_init:
        pipe.disable_free_init()

    # free_init was popped from config above, so use the local variable in the filename
    export_to_gif(video, f"animatediff_pag-{config['pag_scale']}_clipskip-{config['clip_skip']}_freeinit-{free_init}.gif")

motion_adapter = MotionAdapter.from_pretrained(motion_adapter_id)
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler", beta_schedule="linear", steps_offset=1, clip_sample=False)
pipe = AnimateDiffPAGPipeline.from_pretrained(
    model_id,
    motion_adapter=motion_adapter,
    scheduler=scheduler,
    pag_applied_layers=["mid"],
    torch_dtype=torch.float16,
).to("cuda")

for config in configs:
    config = dict(config)  # copy so the shared configs list is not mutated
    free_init = config.pop("free_init", False)
    if free_init:
        pipe.enable_free_init(method="butterworth", use_fast_sampling=True)

    video = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        height=512,
        width=512,
        num_frames=16,
        guidance_scale=guidance_scale,
        num_inference_steps=num_inference_steps,
        generator=torch.Generator().manual_seed(42),
        **config,
    ).frames[0]

    if free_init:
        pipe.disable_free_init()

    # free_init was popped from config above, so use the local variable in the filename
    export_to_gif(video, f"animatediff_pag-mid-{config['pag_scale']}_clipskip-{config['clip_skip']}_freeinit-{free_init}.gif")
[Result GIF grid, captions:]
pag 0, clip_skip None, free_init False | pag 0, clip_skip 2, free_init False
pag 3, clip_skip None, free_init False | pag 3, clip_skip 2, free_init False
pag 3, clip_skip None, free_init True | pag 3, clip_skip 2, free_init True

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@yiyixuxu @asomoza @DN6

@a-r-r-o-w a-r-r-o-w changed the title Experimenting with PAG variant for AnimateDiff PAG variant for AnimateDiff Jul 4, 2024
@asomoza
Member

asomoza commented Jul 4, 2024

Nice! I'll do some tests. It looks good.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@DN6 DN6 left a comment


It's looking good to me. Could we add some tests? Also, I think there are some issues with the "Copied from" statements; you'd just need to run make fix-copies.

@DN6 DN6 requested a review from yiyixuxu July 5, 2024 07:28
Collaborator

@yiyixuxu yiyixuxu left a comment


thanks! super cool!

Comment thread src/diffusers/models/attention_processor.py Outdated
in_channels=out_channels,
num_layers=temporal_transformer_layers_per_block[i],
norm_num_groups=temporal_norm_num_groups,
norm_num_groups=resnet_groups,
Contributor Author


Context for why we need this change: #7707 (comment). It is the correct thing to do here. cc @DN6

return f"attentions_{module_name.split('.')[3]}"
elif "attentions" in module_name.split(".")[1]:
return f"attentions_{module_name.split('.')[2]}"
# down_blocks.1.motion_modules.0.transformer_blocks.0.attn1 -> "motion_modules_0"
Contributor Author


Need to support motion modules self attention layers as well in PAGMixin. cc @yiyixuxu
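The identifier parsing discussed in the snippets above can be sketched as a small standalone helper. This is a simplified, hypothetical version for illustration only, not the actual PAGMixin implementation in diffusers:

```python
def get_layer_identifier(module_name: str) -> str:
    """Derive a short layer identifier from a dotted UNet module name.

    Simplified sketch of the parsing shown in the review snippets;
    the real PAGMixin logic handles more cases.
    """
    parts = module_name.split(".")
    if parts[0] in ("down_blocks", "up_blocks"):
        # down_blocks.1.attentions.0.transformer_blocks.0.attn1 -> "attentions_0"
        # down_blocks.1.motion_modules.0.transformer_blocks.0.attn1 -> "motion_modules_0"
        return f"{parts[2]}_{parts[3]}"
    # mid_block.attentions.0.transformer_blocks.0.attn1 -> "attentions_0"
    return f"{parts[1]}_{parts[2]}"
```

The point of the change being discussed is that `motion_modules` names must produce their own identifiers, so temporal self-attention layers can be targeted separately from spatial ones.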

Collaborator

# down_blocks.1.attentions.0.transformer_blocks.0.attn1 -> "block_1"
# mid_block.attentions.0.transformer_blocks.0.attn1 -> "block_0"
if "attentions" in module_name.split(".")[1]:
module_name_splits = module_name.split(".")
Contributor Author


Did a little bit of a refactor as well to make all functions look similar-ish to get_attn_index. Can open a separate PR if this is out of scope. cc @yiyixuxu

Collaborator

ok to include here!

}
return inputs

def test_from_pipe_consistent_config(self):
Contributor Author


Sorry :(

Collaborator

it is probably because we didn't handle the deprecated unet config in __init__
related here: #7564

because this pipeline shares the same checkpoints with sd1.5, technically you need to handle the deprecation too even though it is a new pipeline

if hasattr(scheduler.config, "steps_offset") and scheduler.config.steps_offset != 1:

however in practice, I think maybe no one will use these really old checkpoints with AnimateDiff, so I'm ok to skip the test here
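The deprecation handling being discussed can be sketched like this. The function name and exact behaviour are assumptions for illustration, not the actual diffusers code:

```python
def normalize_scheduler_config(config: dict) -> dict:
    """Sketch of handling a deprecated scheduler config (illustrative,
    not the diffusers implementation).

    Very old SD 1.x checkpoints shipped schedulers with steps_offset != 1;
    a new pipeline sharing those checkpoints would patch the value and
    warn about the deprecation.
    """
    config = dict(config)  # don't mutate the caller's dict
    if config.get("steps_offset", 1) != 1:
        # old checkpoints used steps_offset == 0; current pipelines expect 1
        config["steps_offset"] = 1
    return config
```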

Comment on lines +54 to +69
def get_dummy_components(self):
cross_attention_dim = 8
block_out_channels = (8, 8)

torch.manual_seed(0)
unet = UNet2DConditionModel(
block_out_channels=block_out_channels,
layers_per_block=1,
sample_size=8,
in_channels=4,
out_channels=4,
down_block_types=("CrossAttnDownBlock2D", "DownBlock2D"),
up_block_types=("CrossAttnUpBlock2D", "UpBlock2D"),
cross_attention_dim=cross_attention_dim,
norm_num_groups=2,
)
Contributor Author

@DN6 Might be of interest to you for AnimateDiff 👀 Makes the tests much faster!

For test_animatediff.py:

real    2m38,020s
user    17m37,497s
sys     1m59,702s

For PAG AnimateDiff with new dummy component sizes:

real    1m36,838s
user    1m43,263s
sys     0m29,261s

@a-r-r-o-w
Contributor Author

After giving it some more thought and observing the outputs, I feel the perturbed path for the motion model has either a lesser or a negative impact on the generations. PAG works great for the spatial self-attention layers of text-to-image models, as we've seen in many other pipelines; however, I think more experiments are needed when dealing with temporal layers like the attention in the motion model. Will report back with some experiments soon.

@a-r-r-o-w
Contributor Author

Btw if we're okay with merging this for the generation improvements with just the spatial attn PAG processors, we can go ahead with that. Can continue experimenting with the motion models separately and revert the pag_utils changes here.

Collaborator

@yiyixuxu yiyixuxu left a comment


thanks!


@yiyixuxu
Collaborator

Btw if we're okay with merging this for the generation improvements with just the spatial attn PAG processors, we can go ahead with that. Can continue experimenting with the motion models separately and revert the pag_utils changes here.

what do you mean here? do you mean applying PAG only on the motion modules does not generate good results, e.g. pag_applied_layers = ["down.block_1.motion_modules_0"]? if you do something like "mid", it will apply to both temporal and spatial, no? would the results be the same as spatial-only? can we see some results?

@a-r-r-o-w
Contributor Author

what do you mean here? do you mean applying PAG only on the motion modules does not generate good results, e.g. pag_applied_layers = ["down.block_1.motion_modules_0"]? if you do something like "mid", it will apply to both temporal and spatial, no? would the results be the same as spatial-only? can we see some results?

Thanks for the review! So, I added the changes for PAG to work with motion module self-attention (temporal layers) later, in this commit. The results posted in the PR description use PAG in only the spatial layers. After my latest changes, I prefer the outputs of spatial-only PAG over spatial-and-temporal PAG, from the few quick experiments I tried. I will look into it more thoroughly soon and report back. cc @asomoza

@a-r-r-o-w
Contributor Author

a-r-r-o-w commented Jul 16, 2024

I took a quick look at the Comfy implementation earlier today to see what Kosinkadink was using: https://github.com/Kosinkadink/ComfyUI-AnimateDiff-Evolved/blob/4dd592e9fce9ac59edadee40cf4d2069165dc226/animatediff/cfg_extras.py#L59

It is applied to both spatial and temporal attn1 layers, which is what we have at the moment too, so I'm okay to roll with this and prepare for merge. But I think we can eventually investigate the dynamics of PAG with temporal layers a bit further (perhaps a community issue with the advanced label, @yiyixuxu?). Also found unofficial implementations of the latest PAGMixin being used in different ways, which makes this interesting (example: https://github.com/pixeli99/Spatio-Temporal-Shuffle-Guidance; demos in README).

@yiyixuxu
Collaborator

The results posted in the PR description utilize PAG in only the spatial layers

I think in the PR description code, you used pag_applied_layers="mid" - that means it's applying to both spatial and temporal layers, no?

the commit you added here in this commit allows you to apply PAG to ONLY temporal layers, that does not yield good results - do I understand this correctly?

@a-r-r-o-w
Contributor Author

a-r-r-o-w commented Jul 16, 2024

I think in the PR description code, you used pag_applied_layers="mid" - that means it's applying to both spatial and temporal layers, no?

Correct. With the latest commits, using pag_applied_layers="mid" will apply it to both the spatial and temporal mid-block layers.

What I'm trying to point out is that the demos that you see in the description are the ones with only spatial PAG because I had not implemented it for motion models yet then.

the commit you added here in this commit allows you to apply PAG to ONLY temporal layers, that does not yield good results - do I understand this correctly?

Nope. Because of that commit, PAG applies to BOTH spatial and temporal layers. I preferred the output of what was before that commit (that is ONLY spatial). Hope it makes sense now 😅

TL;DR:

  • ONLY spatial PAG -> Good (the results posted in this PR's description)
  • Spatial and temporal PAG -> I prefer the quality of spatial-only a little better, but I've done limited testing so far. This is the current behaviour if you clone this branch. Okay to roll with it because the ComfyUI behaviour is similar.
  • ONLY temporal PAG -> Didn't try this and did not mention it.

@yiyixuxu
Collaborator

you can see that here

for name, module in self.unet.named_modules():

when you pass pag_applied_layers="mid", it applies PAG to all self-attentions whose names start with "mid"; that will include both spatial and temporal, with or without the commit
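That matching behaviour can be illustrated with a toy version. The module names and the containment-based rule below are simplified assumptions, not the actual diffusers logic:

```python
# Illustrative self-attention module names, as an AnimateDiff UNet might report
# them via named_modules()
module_names = [
    "down_blocks.1.attentions.0.transformer_blocks.0.attn1",
    "down_blocks.1.motion_modules.0.transformer_blocks.0.attn1",
    "mid_block.attentions.0.transformer_blocks.0.attn1",
    "mid_block.motion_modules.0.transformer_blocks.0.attn1",
    "up_blocks.0.attentions.1.transformer_blocks.0.attn1",
]

def pag_targets(layer_id: str, names: list[str]) -> list[str]:
    # A module is targeted when the identifier appears in its dotted name
    # and the module is a self-attention layer (attn1). Simplified sketch.
    return [n for n in names if layer_id in n and n.endswith("attn1")]

# "mid" matches both the spatial and the temporal mid-block self-attention,
# which is the behaviour being described above.
mid_targets = pag_targets("mid", module_names)
```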

@yiyixuxu
Collaborator

yiyixuxu commented Jul 16, 2024

ohh i see now, we needed the change applied to get_block_index

@a-r-r-o-w
Contributor Author

a-r-r-o-w commented Jul 16, 2024

reply to #8789 (comment)

maybe that is correct, but when I was debugging, I did not notice any of the motion module self-attn layers being set, and that was one reason for adding my changes. The other reason was more fine-grained control to do things like mid_block_0.motion_modules.... Let me try reverting my changes and verify why they were needed first thing when I'm next awake.

@yiyixuxu
Collaborator

ok, we can merge this PR as it is consistent with comfy

@a-r-r-o-w
Contributor Author

Looking good to merge from my end after we merge #8846 since this change is out of scope of the PR.

@a-r-r-o-w
Contributor Author

Fixed the broken tests here. This looks good to merge after #8846 which is now complete too

@a-r-r-o-w a-r-r-o-w requested a review from yiyixuxu July 28, 2024 11:14
@DN6 DN6 merged commit 05b706c into huggingface:main Aug 1, 2024
@a-r-r-o-w a-r-r-o-w deleted the animatediff/pag branch August 1, 2024 07:19
@yiyixuxu yiyixuxu added the PAG label Sep 4, 2024
sayakpaul pushed a commit that referenced this pull request Dec 23, 2024
* add animatediff pag pipeline

* remove unnecessary print

* make fix-copies

* fix ip-adapter bug

* update docs

* add fast tests and fix bugs

* update

* update

* address review comments

* update ip adapter single test expected slice

* implement test_from_pipe_consistent_config; fix expected slice values

* LoraLoaderMixin->StableDiffusionLoraLoaderMixin; add latest freeinit test
