Use `device_map="auto"` in single file tests to support large models on limited GPU memory by jiqing-feng · Pull Request #13816 · huggingface/diffusers

jiqing-feng · 2026-05-27T06:52:33Z

Problem

Single file loading tests (`SingleFileTesterMixin`) used `device=torch_device` or `device_map=torch_device`, which forces the entire model onto a single GPU. For large models like FLUX.1-dev (~12B params, ~24GB in bf16), this fails on a single 24GB GPU. The failure is most likely in `test_single_file_model_config`, which loads two models at the same time.
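The arithmetic behind the failure is easy to check by hand. The sketch below is a rough estimate only. It assumes that weights dominate memory, so activations, the CUDA context and allocator overhead are ignored. It uses decimal gigabytes (1 GB = 10**9 bytes) and the approximate ~12B parameter count quoted in the Problem description, not an exact count.

```python
# Bytes per element for the dtypes discussed in this PR.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weights_gb(num_params: float, dtype: str) -> float:
    """Approximate weight memory in decimal GB (ignores activations and overhead)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


FLUX_PARAMS = 12e9  # approximate FLUX.1-dev transformer size (~12B params)
GPU_GB = 24         # a single 24GB card

one_model_bf16 = weights_gb(FLUX_PARAMS, "bfloat16")  # ~24 GB
one_model_fp32 = weights_gb(FLUX_PARAMS, "float32")   # ~48 GB, twice the bf16 footprint
two_models_bf16 = 2 * one_model_bf16                  # test_single_file_model_config loads two models

for label, need in [
    ("1x bf16", one_model_bf16),
    ("1x fp32", one_model_fp32),
    ("2x bf16", two_models_bf16),
]:
    # Strictly less: weights that exactly fill the card leave no headroom at all.
    print(f"{label}: ~{need:.0f} GB needed, fits on {GPU_GB} GB GPU: {need < GPU_GB}")
```

Even a single bf16 copy leaves no room on a 24GB card once runtime overhead is counted. Placing everything on one device therefore cannot work, and the weights that do not fit have to spill to CPU RAM.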

Changes

`tests/models/testing_utils/single_file.py`

- `test_single_file_model_config`: `device=torch_device` → `device_map="auto"`
- `test_single_file_model_parameters`: `device_map=str(torch_device)` / `device=torch_device` → `device_map="auto"`
- `test_single_file_loading_with_device_map`: `device_map=torch_device` → `device_map="auto"`

`tests/models/transformers/test_models_transformer_flux.py`

- `TestFluxSingleFile`: added `torch_dtype = torch.bfloat16`, which halves memory usage compared with float32.

`tests/single_file/test_model_flux_transformer_single_file.py`

- `test_device_map_cuda` → `test_device_map_auto`: `device_map="cuda"` → `device_map="auto"`. Also added `torch_dtype=torch.bfloat16`.

Why `device_map="auto"` instead of CPU offload

`enable_model_cpu_offload()` is a pipeline-level API. It is not available for an individual model loaded with `from_single_file`. `device_map="auto"` is the model-level equivalent: accelerate automatically places weights on the GPU and offloads the rest to CPU RAM when GPU memory is insufficient.
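The "fill the GPU, spill the rest to CPU" behaviour can be illustrated with a deliberately simplified placement routine. This is a hypothetical sketch and not accelerate's actual `infer_auto_device_map` logic. The real routine also handles modules that must not be split, reserves headroom, and can use multiple GPUs and disk. The function name `greedy_device_map`, the block names, the device strings and the sizes are all illustrative assumptions.

```python
from typing import Dict, List, Tuple


def greedy_device_map(layers: List[Tuple[str, float]], gpu_budget_gb: float) -> Dict[str, str]:
    """Hypothetical, simplified placement: fill "cuda:0" in module order, then use "cpu".

    `layers` is an ordered list of (module_name, size_in_GB) pairs.
    """
    device_map: Dict[str, str] = {}
    used_gb = 0.0
    gpu_full = False
    for name, size_gb in layers:
        # Keep filling the GPU in order. Once one module does not fit,
        # send it and every later module to CPU RAM, so execution flows GPU -> CPU.
        if not gpu_full and used_gb + size_gb <= gpu_budget_gb:
            device_map[name] = "cuda:0"
            used_gb += size_gb
        else:
            gpu_full = True
            device_map[name] = "cpu"
    return device_map


# Illustrative numbers only: a ~24 GB bf16 model split into eight 3 GB blocks,
# with ~20 GB usable on a 24 GB card after leaving some headroom.
layers = [(f"blocks.{i}", 3.0) for i in range(8)]
dm = greedy_device_map(layers, gpu_budget_gb=20.0)
for name, device in dm.items():
    print(name, "->", device)
```

With these assumed sizes, the first six blocks (18 GB) land on the GPU and the last two are offloaded to CPU. A fixed `device_map="cuda"` would instead try to put all 24 GB on the card.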

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

jiqing-feng · 2026-05-27T07:40:00Z

Hi @sayakpaul. Would you please review this PR? Thanks!

jiqing-feng · 2026-05-28T01:16:33Z

The failed CI check ("LoRA tests with PEFT main") is unrelated to this PR. The same failure is happening on other PRs as well, and it appears to be a flaky or known issue with PEFT main. The other 12 checks all pass.

fix flux tests OOM on 24G GPU

9a5d226

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

github-actions Bot added the size/M (PR with diff < 200 LOC) and tests labels and removed the size/M label May 27, 2026

revert wrong change

2db10d5

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

github-actions Bot added the size/S (PR with diff < 50 LOC) label May 27, 2026

sayakpaul requested a review from DN6 May 27, 2026 09:20

DN6 approved these changes May 27, 2026

jiqing-feng wants to merge 2 commits into huggingface:main from jiqing-feng:flux
