Model/Pipeline/Scheduler description
Hello!
There is a new 1.7B-parameter diffusion-based text-to-video synthesis model by ModelScope, as noted by AK (@_akhaliq): https://twitter.com/_akhaliq/status/1637321077553606657?s=20. Both the model implementation and the weights (downloaded through their pipeline) are openly available, and it can already be run via a Hugging Face Space. However, the model lacks many possible optimizations, especially a low-VRAM mode and other accessibility options, and I believe it would benefit greatly from the help of the Diffusers community.
Example: a monkey playing drums (attached video: tmp2tkrr492.mp4)
At the moment the model needs roughly 16 GB of VRAM, but since it is a combination of 4 GB, 6 GB, and 5 GB sub-models, I believe that with half precision and by running the sub-models sequentially (offloading the inactive ones) it should eventually be possible to launch it on modern consumer hardware; see the sketch below.
The license is Apache-2.0, so there should be no problem with using the code as a reference.
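For illustration, here is a minimal sketch of what a memory-friendly setup could look like once the model has a Diffusers port. This is purely hypothetical: the repo id is a placeholder and the pipeline class and outputs are assumptions; only the half-precision loading, sequential CPU offloading, and attention slicing calls are existing Diffusers APIs.

```python
import torch
from diffusers import DiffusionPipeline

# Placeholder repo id for a hypothetical Diffusers-format conversion of the weights.
pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-diffusers",  # hypothetical, does not exist yet
    torch_dtype=torch.float16,             # half precision roughly halves the weight footprint
)

# Keep only the currently active sub-model on the GPU, offload the rest to CPU RAM.
pipe.enable_sequential_cpu_offload()
# Compute attention in slices to lower peak memory at some speed cost.
pipe.enable_attention_slicing()

result = pipe("a monkey playing drums", num_inference_steps=25)
```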
Open source status
The model implementation is available
The model weights are available (Only relevant if addition is not a scheduler).
Provide useful links for the implementation
HuggingFace space:
https://huggingface.co/spaces/damo-vilab/modelscope-text-to-video-synthesis
All the parts of the model at HuggingFace:
https://huggingface.co/damo-vilab/modelscope-damo-text-to-video-synthesis/tree/main
The model PyTorch implementation (see the usage sketch at the end of this issue):
https://github.com/modelscope/modelscope/tree/master/modelscope/models/multi_modal/video_synthesis
Google Colab from the devs:
https://colab.research.google.com/drive/1uW1ZqswkQ9Z9bp5Nbo5z59cAn7I0hE6R?usp=sharing
License: Apache-2.0
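For reference, the current ModelScope implementation can be run roughly as below, based on the usage shown in the links above. The local weights directory and prompt are illustrative, and the exact pipeline API should be double-checked against the ModelScope documentation.

```python
import pathlib

from huggingface_hub import snapshot_download
from modelscope.pipelines import pipeline
from modelscope.outputs import OutputKeys

# Download all sub-model weights from the Hugging Face repo linked above.
model_dir = pathlib.Path("weights")
snapshot_download(
    "damo-vilab/modelscope-damo-text-to-video-synthesis",
    repo_type="model",
    local_dir=model_dir,
)

# Build the ModelScope text-to-video pipeline from the downloaded weights.
pipe = pipeline("text-to-video-synthesis", model_dir.as_posix())

# The pipeline takes a dict with the prompt and returns the path to the rendered video.
output_video_path = pipe({"text": "a monkey playing drums"})[OutputKeys.OUTPUT_VIDEO]
print(output_video_path)
```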