PAG variant for AnimateDiff #8789
Conversation
Nice! I'll do some tests. It looks good.
DN6 left a comment:
It's looking good to me. Could we add some tests? I also think there are some issues with the copied-from statements; you'd just need to run `make fix-copies`.
```diff
  in_channels=out_channels,
  num_layers=temporal_transformer_layers_per_block[i],
- norm_num_groups=temporal_norm_num_groups,
+ norm_num_groups=resnet_groups,
```
Context for why we need this change: #7707 (comment). It is the correct thing to do here. cc @DN6
```python
        return f"attentions_{module_name.split('.')[3]}"
    elif "attentions" in module_name.split(".")[1]:
        return f"attentions_{module_name.split('.')[2]}"
# down_blocks.1.motion_modules.0.transformer_blocks.0.attn1 -> "motion_modules_0"
```
Need to support motion-module self-attention layers as well in PAGMixin. cc @yiyixuxu
nice, let's add a note here too https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pag/pag_utils.py#L129
```python
# down_blocks.1.attentions.0.transformer_blocks.0.attn1 -> "block_1"
# mid_block.attentions.0.transformer_blocks.0.attn1 -> "block_0"
if "attentions" in module_name.split(".")[1]:
    module_name_splits = module_name.split(".")
```
Did a little bit of a refactor as well to make all the functions look similar to `get_attn_index`. Can open a separate PR if this is out of scope. cc @yiyixuxu
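For context, the refactored helpers all reduce to the same pattern: split the fully qualified module name and build an identifier from the block type and its index. A minimal standalone sketch (the function name and exact return format here are illustrative, not the final PAGMixin API):

```python
def get_layer_identifier(module_name: str) -> str:
    # Derive a short identifier from a fully qualified attention module name,
    # e.g. "down_blocks.1.motion_modules.0.transformer_blocks.0.attn1".
    parts = module_name.split(".")
    # Check temporal (motion module) layers first, then spatial attentions.
    for block_type in ("motion_modules", "attentions"):
        if block_type in parts:
            index = parts[parts.index(block_type) + 1]
            return f"{block_type}_{index}"
    raise ValueError(f"Unrecognized module name: {module_name}")

print(get_layer_identifier("mid_block.attentions.0.transformer_blocks.0.attn1"))
# attentions_0
```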
```python
        }
        return inputs

    def test_from_pipe_consistent_config(self):
```
It is probably because we didn't handle the deprecated unet config in `__init__`; related: #7564.
Because this pipeline shares the same checkpoints with SD 1.5, technically you need to handle the deprecation too, even though it is a new pipeline.
However, in practice I think no one will use these really old checkpoints with AnimateDiff, so I'm OK with skipping the test here.
```python
    def get_dummy_components(self):
        cross_attention_dim = 8
        block_out_channels = (8, 8)

        torch.manual_seed(0)
        unet = UNet2DConditionModel(
            block_out_channels=block_out_channels,
            layers_per_block=1,
            sample_size=8,
            in_channels=4,
            out_channels=4,
            down_block_types=("CrossAttnDownBlock2D", "DownBlock2D"),
            up_block_types=("CrossAttnUpBlock2D", "UpBlock2D"),
            cross_attention_dim=cross_attention_dim,
            norm_num_groups=2,
        )
```
@DN6 Might be of interest to you for AnimateDiff 👀 Makes the tests much faster!
For `test_animatediff.py`:

```
real    2m38,020s
user    17m37,497s
sys     1m59,702s
```

For PAG AnimateDiff with the new dummy component sizes:

```
real    1m36,838s
user    1m43,263s
sys     0m29,261s
```
After giving it some more thought and observing the outputs, I feel the perturbed path for the motion model has either little or a negative impact on the generations. PAG works great for the spatial self-attention layers of text-to-image models, as we've seen in many other pipelines; however, I think more experiments are needed for temporal layers like the attention in the motion model. Will report back with some experiments soon.
Btw, if we're okay with merging this for the generation improvements with just the spatial-attention PAG processors, we can go ahead with that. I can continue experimenting with the motion modules separately and revert the pag_utils changes here.
What do you mean here? Do you mean applying PAG only on the motion modules does not generate good results?
Thanks for the review! So, I added the changes for PAG to work with motion-module self-attention (temporal layers) later, in this commit. The results posted in the PR description use PAG only in the spatial layers. After my latest changes, from the few quick experiments I tried, I prefer the outputs with spatial-only PAG over spatial-plus-temporal PAG. I will look into it more thoroughly soon and report back. cc @asomoza
I took a quick look at the Comfy implementation earlier today to see what Kosinkadink was using: https://github.com/Kosinkadink/ComfyUI-AnimateDiff-Evolved/blob/4dd592e9fce9ac59edadee40cf4d2069165dc226/animatediff/cfg_extras.py#L59. It is applied to both spatial and temporal attn1 layers, which is what we have at the moment too, so I'm okay with rolling with this and preparing for merge. But I think we should eventually investigate the dynamics of PAG with temporal layers further (perhaps a community issue with the advanced label, @yiyixuxu?). I also found unofficial implementations that use the latest PAGMixin in different ways, which makes this interesting (example: https://github.com/pixeli99/Spatio-Temporal-Shuffle-Guidance; demos in the README).
I think in the PR description code you used the earlier version, and the commit you added here allows you to apply PAG to ONLY temporal layers, which does not yield good results - do I understand this correctly?
Correct. What I'm trying to point out is that the demos you see in the description use only spatial PAG, because I had not yet implemented it for motion models at that point.
Nope. Because of that commit, PAG applies to BOTH spatial and temporal layers. I preferred the output from before that commit (that is, ONLY spatial). Hope it makes sense now 😅 TLDR: spatial-only PAG looked better to me than spatial-plus-temporal PAG.
You can see that here when it is passed.
Ohh, I see now - we needed the change applied to the motion modules as well.
Reply to #8789 (comment): maybe that is correct, but when I was debugging I did not notice any of the motion-module self-attention layers being set, and that was one reason for adding my changes. The other reason is more fine-grained control, to do things like applying PAG to only spatial or only temporal layers.
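For illustration, that fine-grained selection boils down to matching identifiers against module names. A standalone sketch (the module names are examples from this thread; the `select` helper is hypothetical, not the diffusers API):

```python
import re

# Example attention module names from an AnimateDiff UNet (illustrative).
module_names = [
    "mid_block.attentions.0.transformer_blocks.0.attn1",
    "down_blocks.1.attentions.0.transformer_blocks.0.attn1",
    "down_blocks.1.motion_modules.0.transformer_blocks.0.attn1",
]

def select(identifier: str) -> list:
    # Match an identifier (plain substring or regex) against module names,
    # distinguishing spatial ("attentions") from temporal ("motion_modules")
    # self-attention layers.
    return [name for name in module_names if re.search(identifier, name)]

print(select("motion_modules"))  # temporal attn1 layers only
print(select("attentions"))      # spatial attn1 layers only
```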
ok, we can merge this PR as it is consistent with comfy |
Looking good to merge from my end after we merge #8846, since that change is out of scope for this PR.
Fixed the broken tests here. This looks good to merge after #8846, which is now complete too.
* add animatediff pag pipeline
* remove unnecessary print
* make fix-copies
* fix ip-adapter bug
* update docs
* add fast tests and fix bugs
* update
* update
* address review comments
* update ip adapter single test expected slice
* implement test_from_pipe_consistent_config; fix expected slice values
* LoraLoaderMixin->StableDiffusionLoraLoaderMixin; add latest freeinit test
What does this PR do?
Looking at #8710, I thought it might be interesting to apply PAG to video generation pipelines and see if there's interest in supporting this.
In addition to this, I would also like to propose adding `AutoPipelineForTextToVideo`, since we support a few video models now and this will continue to grow with ongoing research progress. WDYT?
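To make the proposal concrete, here is a minimal sketch of how a text-to-video auto-pipeline mapping could look, following the pattern of the existing auto-pipeline classes; the mapping entries and helper name are placeholders, not an agreed-upon API:

```python
from collections import OrderedDict

# Hypothetical model-name -> pipeline-class mapping for a proposed
# AutoPipelineForTextToVideo; entries are illustrative placeholders.
TEXT2VIDEO_PIPELINES_MAPPING = OrderedDict(
    [
        ("animatediff", "AnimateDiffPipeline"),
        ("animatediff-pag", "AnimateDiffPAGPipeline"),
    ]
)

def get_text2video_pipeline_class(model_name: str) -> str:
    # Resolve a registered model name to its pipeline class name.
    try:
        return TEXT2VIDEO_PIPELINES_MAPPING[model_name]
    except KeyError:
        raise ValueError(f"No text-to-video pipeline registered for {model_name!r}")

print(get_text2video_pipeline_class("animatediff-pag"))
# AnimateDiffPAGPipeline
```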
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@yiyixuxu @asomoza @DN6