[TE] Enable deterministic mode for fused attention#508
Conversation
Pull request overview
Enables deterministic mode propagation for ROCm fused-attention backward (CK backend) and adds JAX coverage to validate bitwise reproducibility and gradient correctness when non-deterministic algorithms are disallowed.
Changes:
- Forward the `deterministic` flag from NVTE ROCm fused-attn backward entrypoints into CK backend calls.
- Add JAX tests that (on HIP/AMD) verify backward gradients are bitwise reproducible across runs and match an unfused JAX reference.
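For context, the bitwise-reproducibility check in such a test amounts to running the backward pass twice on identical inputs and comparing raw bits rather than using a tolerance. A minimal sketch of the idea (helper names are illustrative, not the actual test utilities):

```python
import os
# Disallow non-deterministic algorithms; must be set before the fused-attn
# backend is selected (how the real tests configure this may differ).
os.environ["NVTE_ALLOW_NONDETERMINISTIC_ALGO"] = "0"

import jax
import jax.numpy as jnp


def _bits(x):
    # Reinterpret bf16/fp16 (2-byte) or fp32 (4-byte) values as unsigned ints
    # so equality is exact at the bit level.
    uint = {2: jnp.uint16, 4: jnp.uint32}[x.dtype.itemsize]
    return jax.lax.bitcast_convert_type(x, uint)


def assert_bitwise_reproducible(grad_fn, *inputs):
    # Run the backward pass twice on identical inputs and require identical bits.
    first = grad_fn(*inputs)
    second = grad_fn(*inputs)
    for g1, g2 in zip(first, second):
        assert bool(jnp.all(_bits(g1) == _bits(g2)))
```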
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| transformer_engine/common/fused_attn_rocm/fused_attn.cpp | Passes the deterministic argument into CK fused-attn backward implementations (qkvpacked/kvpacked/separate). |
| tests/jax/test_fused_attn.py | Adds HIP-only deterministic-backward tests and imports global_shard_guard to ensure mesh resource context is set. |
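For readers wondering about the new import: TE's JAX extensions read the mesh resource from a context manager, so even a single-device test needs to enter one. A minimal sketch of the usage (the test-body name is hypothetical):

```python
from transformer_engine.jax.sharding import MeshResource, global_shard_guard

# Provide a (trivial) mesh resource context so sharding-aware code paths have
# something to read, even without an actual device mesh.
with global_shard_guard(MeshResource()):
    run_deterministic_bwd_case()  # hypothetical stand-in for the test body
```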
Unless we want to support non-deterministic CK only for the JAX integration, we should probably also add some tests on the PyTorch integration side, since it'll be enabled there too. Also, I think you still need to adjust TransformerEngine/transformer_engine/pytorch/attention/dot_product_attention/utils.py (lines 1070 to 1078 at 82617fe).
wangye805 left a comment:
BTW, add some deterministic test cases on the PyTorch side as well.
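For reference, a PyTorch-side check could mirror the JAX tests. A rough sketch, assuming `transformer_engine.pytorch.DotProductAttention`; the shapes, constructor arguments, and qkv format are illustrative, not the configuration such tests would actually use:

```python
import os
os.environ["NVTE_ALLOW_NONDETERMINISTIC_ALGO"] = "0"  # set before TE picks a backend

import torch
import transformer_engine.pytorch as te


def run_backward(seed: int = 1234):
    torch.manual_seed(seed)
    # sbhd layout: (seq, batch, heads, head_dim); values are illustrative.
    q, k, v = (torch.randn(2048, 2, 12, 128, dtype=torch.bfloat16,
                           device="cuda", requires_grad=True) for _ in range(3))
    dpa = te.DotProductAttention(num_attention_heads=12, kv_channels=128)
    dpa(q, k, v).sum().backward()
    return q.grad.clone(), k.grad.clone(), v.grad.clone()


for ga, gb in zip(run_backward(), run_backward()):
    assert torch.equal(ga, gb)  # expect bitwise-identical gradients across runs
```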
if check_numerical is None:
    check_numerical = seq_len <= 256
Why do we skip the numerical check for cases with seq_len <= 256?
from transformer_engine.jax.cpp_extensions.misc import is_hip_extension
from transformer_engine.jax import autocast
-from transformer_engine.jax.sharding import MeshResource
+from transformer_engine.jax.sharding import MeshResource, global_shard_guard
if check_numerical is None:
    check_numerical = seq_len <= 256
s = seq_len
dtype = jnp.bfloat16
Let's check for both bf16 and fp16
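One way to cover both is to parametrize the dtype, assuming the helper can accept it; a sketch (the `dtype` parameter on `_run_deterministic_bwd_case` is an assumption):

```python
import jax.numpy as jnp
import pytest

@pytest.mark.parametrize("dtype", [jnp.bfloat16, jnp.float16], ids=["bf16", "fp16"])
def test_deterministic_bwd_gqa(attn_mask_type, dtype):
    _run_deterministic_bwd_case(
        qkv_layout=QKVLayout.BSHD_BSHD_BSHD,
        attn_mask_type=attn_mask_type,
        b=2, seq_len=2048, h_q=12, h_kv=4, d=128,
        dtype=dtype,  # assumed: the helper takes dtype instead of hard-coding bf16
    )
```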
backend = FusedAttnHelper(
    True, dtype, dtype, qkv_layout, AttnBiasType.NO_BIAS, attn_mask_type,
    0.0, h_q, h_kv, s, s, d, d, (-1, -1),
).get_fused_attn_backend()
if backend == NVTE_Fused_Attn_Backend.NVTE_No_Backend:
    pytest.skip("No fused attention backend available for this config")
assert backend == NVTE_Fused_Attn_Backend.NVTE_CK, (
    f"Expected CK backend but got {backend}."
)
Technically, if we specify NVTE_ALLOW_NONDETERMINISTIC_ALGO=0, the backend selection should take this env var into account and choose a deterministic backend for us, not restrict to CK. As I recall, aotriton is deterministic by nature @xinyazhang
Well, the backend selection API does not support a deterministic flag. And yes, TE considers AOTriton as deterministic. The question is whether we want to test it for AOTriton as well.
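If the test were to cover AOTriton as well, the backend assertion could accept any backend considered deterministic rather than pinning CK. A sketch; the exact enum member name for the AOTriton backend is an assumption and should be checked against `NVTE_Fused_Attn_Backend`:

```python
# Both CK (with the deterministic flag) and AOTriton are treated as deterministic.
deterministic_backends = (
    NVTE_Fused_Attn_Backend.NVTE_CK,
    NVTE_Fused_Attn_Backend.NVTE_AOTriton,  # assumed member name; verify against the enum
)
assert backend in deterministic_backends, f"Expected a deterministic backend but got {backend}."
```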
    ],
)
def test_deterministic_bwd_gqa(attn_mask_type):
    """GQA variant: BSHD_BSHD_BSHD with h_q != h_kv."""
Also, extend this to non-GQA cases as well.
_run_deterministic_bwd_case(
    qkv_layout=QKVLayout.BSHD_BSHD_BSHD,
    attn_mask_type=attn_mask_type,
    b=2, seq_len=2048, h_q=12, h_kv=4, d=128,
Also, check some sequence packing cases
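A packed variant could reuse the same helper, roughly as sketched below; the `THD_THD_THD` layout value follows the usual QKVLayout naming and the other arguments are illustrative:

```python
# Hypothetical sequence-packing case: packed THD layout instead of BSHD.
_run_deterministic_bwd_case(
    qkv_layout=QKVLayout.THD_THD_THD,   # packed (ragged) layout
    attn_mask_type=attn_mask_type,
    b=2, seq_len=2048, h_q=12, h_kv=12, d=128,  # h_q == h_kv also covers the non-GQA ask
)
```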
wangye805 left a comment:
My only concern right now is that the aotriton backend is also deterministic. But your current PR should be okay since the CK backend has higher priority than the aotriton backend.
:Type: ``int`` (0 or 1)
:Default: ``1``
-:Description: Allow non-deterministic algorithms for Transformer Engine execution. When set to ``0``, only deterministic algorithms are allowed. This is relevant for both PyTorch and JAX attention implementations.
+:Description: Allow non-deterministic algorithms for Transformer Engine execution. When set to ``0``, only deterministic algorithms are allowed. This is relevant for both PyTorch and JAX attention implementations. On AMD/HIP builds, setting this to ``0`` enables the deterministic backward pass of the CK FusedAttention backend (which uses a split-accumulator workspace for deterministic ``dQ``); on NVIDIA builds it disables FusedAttention paths that are known to be non-deterministic.
As far as I know, our aotriton backend is also deterministic. Why does setting this only enable the CK backend?
Since AOTriton is always deterministic, there is no action required when NVTE_ALLOW_NONDETERMINISTIC_ALGO=0, so this env var description is technically accurate: only CK needs that flag to change its behavior.
I think the wording is accurate, but confusing. The behavior really isn't different from NV. I don't think the description needs to be changed from the original.
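For completeness, opting in looks the same on both platforms, e.g. from Python, before Transformer Engine picks its backend:

```python
import os

# Request deterministic algorithms only (default is "1", i.e. non-deterministic allowed).
os.environ["NVTE_ALLOW_NONDETERMINISTIC_ALGO"] = "0"
```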
Currently it remains untested -- we still need to add it to the CI tests.
Two questions regarding tests:
If we don't make them mutually exclusive, I think they should be level 3. If we do, then level 1 is fine. cc: @wangye805
Previously Ilya and I agreed that level 1 should be running things with the default config, like hip cast transpose instead of triton cast transpose. So if this deterministic flow is not set to true by default, we can put it into level 3 CI.
:Type: ``int`` (0 or 1)
:Default: ``1``
-:Description: Allow non-deterministic algorithms for Transformer Engine execution. When set to ``0``, only deterministic algorithms are allowed. This is relevant for both PyTorch and JAX attention implementations.
+:Description: Allow non-deterministic algorithms for Transformer Engine execution. When set to ``0``, only deterministic algorithms are allowed.
Let's minimize the diff here
Let's make sure to wait for a green level 3 CI first.
CI failure is unrelated.
Description
Fixes https://github.com/ROCm/frameworks-internal/issues/15875