
Use device_map="auto" in single file tests to support large models on limited GPU memory #13816

Open: jiqing-feng wants to merge 2 commits into huggingface:main from jiqing-feng:flux

@jiqing-feng (Contributor)

Problem

Single file loading tests (SingleFileTesterMixin) used device=torch_device or device_map=torch_device, which forces the entire model onto a single GPU. For large models such as FLUX.1-dev (~12B parameters, ~24GB in bf16), this fails on a single 24GB GPU, especially in test_single_file_model_config, which loads two models simultaneously.
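For a rough sense of scale, weight memory is approximately parameter count × bytes per element. The sketch below is a back-of-the-envelope estimate only: it assumes the ~12B figure quoted above, ignores activations, buffers, and CUDA context overhead, and `weight_memory_gb` is a hypothetical helper, not part of diffusers.

```python
# Bytes per element for common torch dtypes (weights only).
BYTES_PER_ELEMENT = {
    "float32": 4,
    "bfloat16": 2,
    "float16": 2,
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Hypothetical helper: approximate weight size in decimal gigabytes."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 1e9


if __name__ == "__main__":
    flux_params = 12e9  # ~12B parameters, the figure quoted for FLUX.1-dev
    one_bf16 = weight_memory_gb(flux_params, "bfloat16")
    one_fp32 = weight_memory_gb(flux_params, "float32")
    print(f"one copy, bf16: {one_bf16:.0f} GB")
    print(f"one copy, fp32: {one_fp32:.0f} GB")
    # test_single_file_model_config keeps two models in memory at once.
    print(f"two copies, bf16: {2 * one_bf16:.0f} GB")
```

Even in bf16, two copies need roughly twice a 24GB card's capacity, which is why a single-GPU placement cannot work for that test.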

Changes

tests/models/testing_utils/single_file.py

  • test_single_file_model_config: device=torch_device → device_map="auto"
  • test_single_file_model_parameters: device_map=str(torch_device) / device=torch_device → device_map="auto"
  • test_single_file_loading_with_device_map: device_map=torch_device → device_map="auto"

tests/models/transformers/test_models_transformer_flux.py

  • TestFluxSingleFile: added torch_dtype = torch.bfloat16, halving weight memory relative to float32

tests/single_file/test_model_flux_transformer_single_file.py

  • test_device_map_cuda → test_device_map_auto: device_map="cuda" → device_map="auto"; added torch_dtype=torch.bfloat16

Why device_map="auto" instead of CPU offload

enable_model_cpu_offload() is a pipeline-level API and is not available when loading an individual model with from_single_file. device_map="auto" is the model-level alternative: accelerate places as many weights as fit on the GPU and offloads the rest to CPU RAM when GPU memory is insufficient.
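To illustrate the idea behind device_map="auto", here is a simplified greedy placement: modules are assigned in order to the GPU until a memory budget is exhausted, and everything after that goes to the CPU. This is a hypothetical sketch, not accelerate's implementation (its `infer_auto_device_map` also handles multiple GPUs, tied parameters, and modules that must not be split); `greedy_device_map` and the module names and sizes below are made up.

```python
from typing import Dict, List, Tuple


def greedy_device_map(
    module_sizes: List[Tuple[str, int]],
    gpu_budget_bytes: int,
) -> Dict[str, str]:
    """Hypothetical, simplified placement: fill "cuda:0" in order, spill the rest to "cpu".

    Once one module does not fit, every later module also goes to the CPU, so
    the GPU holds a contiguous prefix of the model. Nothing else of accelerate's
    logic is modeled here.
    """
    device_map: Dict[str, str] = {}
    used = 0
    spilled = False
    for name, size in module_sizes:
        if not spilled and used + size <= gpu_budget_bytes:
            device_map[name] = "cuda:0"
            used += size
        else:
            spilled = True
            device_map[name] = "cpu"
    return device_map


if __name__ == "__main__":
    GB = 10**9
    # Made-up sizes: 24 blocks of 1 GB each against a 20 GB GPU budget.
    blocks = [(f"transformer_blocks.{i}", 1 * GB) for i in range(24)]
    placement = greedy_device_map(blocks, gpu_budget_bytes=20 * GB)
    on_gpu = sum(1 for dev in placement.values() if dev == "cuda:0")
    print(f"{on_gpu} blocks on cuda:0, {len(placement) - on_gpu} offloaded to cpu")
```

The point of the sketch is only that an "auto" placement degrades to partial CPU offload when the GPU is too small, whereas forcing the whole model onto one device fails outright.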

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@github-actions bot added the size/M (PR with diff < 200 LOC) and tests labels, then removed the size/M label, May 27, 2026
@github-actions bot added the size/S (PR with diff < 50 LOC) label, May 27, 2026
@jiqing-feng (Contributor, Author)

Hi @sayakpaul, would you please review this PR? Thanks!

@sayakpaul requested a review from DN6 on May 27, 2026 09:20
@jiqing-feng (Contributor, Author)

The failing CI check ("LoRA tests with PEFT main") is unrelated to this PR: the same failure occurs on other PRs and appears to be a flaky or known issue with PEFT main. All 12 other checks pass.

