After reviewing the repository, we believe it would benefit from several additional features that extend its functionality and the set of modules available to the MONAI user community. The proposed enhancements are listed below.
- Develop a range of conditional encoder modules, following those in the original latent-diffusion repository, adapted for generating N-dimensional medical images. Candidate modules include:
  - ClassEmbedder
  - TransformerEmbedder
  - BERTTokenizer
  - BERTEmbedder
  - SpatialRescaler
  - FrozenCLIPTextEmbedder
  - FrozenClipImageEmbedder, etc.
- Provide a comprehensive tutorial for each newly implemented encoder.
- Extend the latent-diffusion implementation to support other conditioning types. It currently supports only cross-attention; we propose adding at least two more options:
  - concat (the condition is concatenated to the latent input along the channel dimension)
  - hybrid (concat combined with cross-attention), etc.
- GPU-related improvements:
  - Activation checkpointing to reduce memory usage, with associated tutorials.
  - Distributed model training, with associated tutorials.
  - PyTorch `ConvTranspose` support in the decoder, to avoid the `int32` limitation of `torch.nn.functional.interpolate` on large tensors.
- Provide pre-trained diffusion model weights, hosted in the cloud, that users can integrate into their own applications, together with a comprehensive demo or tutorial.
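
To make the encoder proposal more concrete, here is a minimal sketch of a class-label encoder, modelled loosely on the ClassEmbedder in the CompVis latent-diffusion repository. The class name and constructor arguments are hypothetical and are not an existing MONAI API. It maps integer labels to a learned embedding of shape (batch, 1, embed_dim), i.e. a context sequence of length 1 of the kind typically fed to cross-attention layers.

```python
import torch
from torch import nn


class ClassEmbedder(nn.Module):
    """Maps integer class labels to a learned embedding usable as cross-attention context.

    Hypothetical module for illustration only; not an existing MONAI API.
    """

    def __init__(self, n_classes: int, embed_dim: int) -> None:
        super().__init__()
        self.embedding = nn.Embedding(n_classes, embed_dim)

    def forward(self, labels: torch.Tensor) -> torch.Tensor:
        # labels: (B,) integer tensor -> (B, 1, embed_dim), a length-1 context sequence
        return self.embedding(labels[:, None])


embedder = ClassEmbedder(n_classes=4, embed_dim=8)
context = embedder(torch.tensor([0, 3]))
print(context.shape)  # torch.Size([2, 1, 8])
```

Because the encoder is independent of spatial dimensionality, the same module would serve 2D and 3D generators alike; only the embedding size has to match the denoising network's cross-attention dimension.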
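
The conditioning-type proposal could be realised with a small dispatcher in front of the denoising network. The sketch below mirrors the idea behind latent-diffusion's conditioning modes (cross-attention, concat, hybrid). The helper name `apply_conditioning` and the assumed network call signature `model(x, t, context=...)` are hypothetical, chosen for illustration.

```python
from typing import Callable, Optional

import torch


def apply_conditioning(
    model: Callable[..., torch.Tensor],
    x: torch.Tensor,
    t: torch.Tensor,
    mode: str,
    c_concat: Optional[torch.Tensor] = None,
    c_crossattn: Optional[torch.Tensor] = None,
) -> torch.Tensor:
    """Route conditioning into a denoising network according to `mode`.

    Hypothetical helper; `model(x, t, context=...)` is an assumed signature.
    """
    if mode == "crossattn":
        # Condition enters only through the cross-attention context.
        return model(x, t, context=c_crossattn)
    if mode in ("concat", "hybrid"):
        if c_concat is None:
            raise ValueError(f"mode {mode!r} requires c_concat")
        # Spatially aligned condition (e.g. a mask) stacked as extra input channels.
        xc = torch.cat([x, c_concat], dim=1)
        context = c_crossattn if mode == "hybrid" else None
        return model(xc, t, context=context)
    raise ValueError(f"unknown conditioning mode: {mode!r}")
```

With concat or hybrid, the network's input channel count must equal the latent channels plus the condition channels, and the condition must share the latent's spatial size; cross-attention has no such spatial constraint.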
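
For the activation-checkpointing item, one possible shape is a thin wrapper around PyTorch's `torch.utils.checkpoint.checkpoint`, which discards a block's intermediate activations after the forward pass and recomputes them during backward. The wrapper class `CheckpointedBlock` and its `enabled` flag are hypothetical; how MONAI would expose this (per block, per network, via a constructor argument) is left open.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint


class CheckpointedBlock(nn.Module):
    """Wraps a sub-module so its activations are recomputed in the backward pass.

    Hypothetical wrapper for illustration; not an existing MONAI class.
    """

    def __init__(self, block: nn.Module, enabled: bool = True) -> None:
        super().__init__()
        self.block = block
        self.enabled = enabled

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.enabled and self.training and torch.is_grad_enabled():
            # Trade extra compute for lower peak memory during training.
            return checkpoint(self.block, x, use_reentrant=False)
        # Inference or disabled: plain forward, nothing to save.
        return self.block(x)
```

Gradients are unchanged by checkpointing (for deterministic blocks); only peak activation memory and backward-pass compute differ, which is why it is usually exposed as an opt-in flag.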
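
The `int32` concern is about element counts above the largest signed 32-bit integer. The arithmetic below uses hypothetical decoder shapes (batch 1, 64 channels) to show how quickly a 3D decoder can cross that threshold: a 2x upsampling in three spatial dimensions multiplies the element count by 8.

```python
import math

INT32_MAX = 2**31 - 1  # largest signed 32-bit integer


def exceeds_int32(shape: tuple) -> bool:
    """True if a tensor of this shape has more elements than INT32_MAX."""
    return math.prod(shape) > INT32_MAX


# Hypothetical 3D decoder activation before and after 2x upsampling.
before = (1, 64, 256, 256, 256)
after = (1, 64, 512, 512, 512)
print(math.prod(before), exceeds_int32(before))  # 1073741824 False
print(math.prod(after), exceeds_int32(after))  # 8589934592 True
```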
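
A decoder upsampling layer with an optional transposed-convolution path might look like the sketch below. The class name `Upsample` and the `use_convtranspose` flag are hypothetical. With `kernel_size=2, stride=2` and no padding, a transposed convolution maps each spatial size n to (n - 1) * 2 + 2 = 2n, matching a 2x `interpolate`. Whether this fully sidesteps the `int32` limitation for a given PyTorch version and device would still need to be verified; the sketch only shows the structural swap.

```python
import torch
import torch.nn.functional as F
from torch import nn


class Upsample(nn.Module):
    """2x spatial upsampling for 1D/2D/3D feature maps.

    Hypothetical module: with use_convtranspose=True a learned transposed
    convolution replaces F.interpolate followed by a convolution.
    """

    def __init__(self, spatial_dims: int, channels: int, use_convtranspose: bool = False) -> None:
        super().__init__()
        conv_t = {1: nn.ConvTranspose1d, 2: nn.ConvTranspose2d, 3: nn.ConvTranspose3d}[spatial_dims]
        conv = {1: nn.Conv1d, 2: nn.Conv2d, 3: nn.Conv3d}[spatial_dims]
        self.use_convtranspose = use_convtranspose
        if use_convtranspose:
            # kernel_size=2, stride=2 exactly doubles every spatial dimension.
            self.op = conv_t(channels, channels, kernel_size=2, stride=2)
        else:
            # Size-preserving convolution applied after nearest-neighbour upsampling.
            self.op = conv(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.use_convtranspose:
            return self.op(x)
        x = F.interpolate(x, scale_factor=2.0, mode="nearest")
        return self.op(x)
```

The two paths are not weight-compatible: a decoder trained with the interpolate path cannot load its upsampling weights into the transposed-convolution path, so the choice would likely need to be fixed per trained model.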
We would welcome further discussion of any of the items outlined above.