Remove masks and update gmm function in moe.py #4082

Merged
copybara-service[bot] merged 3 commits into main from chengnuojin-remove-mask
Jun 9, 2026

@NuojCheng NuojCheng commented Jun 5, 2026

Collaborator

Description

This PR cherry-picks the changes proposed by @RissyRan in here and does the following:

  • Updates the gmm function in moe.py and adds a group_offset input
  • Removes unnecessary masks to further improve performance
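
For readers unfamiliar with the `group_offset` input, the following is a minimal pure-Python sketch of a grouped matrix multiply, not the code in moe.py. It assumes that the rows of `lhs` are sorted by expert, that `group_sizes[g]` counts the rows routed to expert `g`, and that `rhs` holds only the local experts starting at global index `group_offset`. The names `gmm_reference` and `matvec` are hypothetical, and leaving rows of non-local experts at zero is an assumption of this sketch.

```python
# Pure-Python reference for a grouped matrix multiply (gmm).
# Assumption: rows owned by experts outside the local shard stay zero.

def matvec(row, weight):
    """Multiply a length-k row by a k x n weight matrix (nested lists)."""
    n = len(weight[0])
    return [sum(row[i] * weight[i][j] for i in range(len(row))) for j in range(n)]


def gmm_reference(lhs, rhs, group_sizes, group_offset=0):
    """Apply rhs[g - group_offset] to the rows of lhs that belong to group g."""
    n = len(rhs[0][0])
    out = [[0.0] * n for _ in lhs]
    start = 0
    for g, size in enumerate(group_sizes):
        local = g - group_offset  # index of expert g inside the local shard
        if 0 <= local < len(rhs):
            for r in range(start, start + size):
                out[r] = matvec(lhs[r], rhs[local])
        start += size  # always advance, even for non-local groups
    return out


lhs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 tokens, k = 2
experts = [
    [[1.0], [0.0]],  # expert 0 selects the first feature
    [[0.0], [1.0]],  # expert 1 selects the second feature
]
group_sizes = [1, 2]  # token 0 -> expert 0, tokens 1-2 -> expert 1

full = gmm_reference(lhs, experts, group_sizes)
shard = gmm_reference(lhs, experts[1:], group_sizes, group_offset=1)
```

Here `full` covers both experts, while `shard` models a device that holds only expert 1: it skips the first group's rows but still advances past them, which is the role an offset plays when the expert dimension of `rhs` is sharded.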

FIXES: b/519220305

Tests

Xprof

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@codecov

codecov Bot commented Jun 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@github-actions

github-actions Bot commented Jun 5, 2026

🤖 Hi @NuojCheng, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

@github-actions

github-actions Bot commented Jun 5, 2026

🤖 I'm sorry @NuojCheng, but I was unable to process your request. Please see the logs for more details.

@NuojCheng NuojCheng force-pushed the chengnuojin-remove-mask branch from ade8434 to 563a28a Compare June 5, 2026 22:30

@RissyRan RissyRan left a comment

Collaborator

Thanks for cleaning this up and adding one more removal!

Comment thread src/maxtext/kernels/ragged/ragged_sort.py
Update reference HLO from CI artifact
@NuojCheng NuojCheng force-pushed the chengnuojin-remove-mask branch from 80f6d4a to 2d109a6 Compare June 9, 2026 00:16
@copybara-service copybara-service Bot merged commit d0672d6 into main Jun 9, 2026
68 of 72 checks passed
@copybara-service copybara-service Bot deleted the chengnuojin-remove-mask branch June 9, 2026 01:58
copybara-service Bot pushed a commit that referenced this pull request Jun 11, 2026
# Description

1. Refactor `manual_axis_type` conditional specification.
- It was previously introduced by [PR#3770](#3770) and [PR#3869](#3869).
- It requires tokamax > 0.12.0 (not yet released) with this [commit](openxla/tokamax@cc374e3).
- It is used for gmm shard_map with check_vma=True, which is currently activated only for experimental runs (deepseek_batchsplit and use_manual_quantization), so tests won't be blocked by the dependency.

2. Add a smoke-train unit test for `use_tokamax_gmm=True`, covering both bf16 and fp8.
- Uses smoke train instead of AOT because AOT is blocked by b/489205940
- Runs in g3 only to save GitHub CI/CD time

3. Reset `group_offset` to None: [PR#4082](#4082) added `group_offset` to `tokamax.ragged_dot` and `tokamax.ragged_dot_general`.

However, `group_offset` is not yet supported by tokamax:
- see [tokamax code](https://github.com/openxla/tokamax/blob/cdde910bf925d834c0c9e6cee5a488095f0381d4/tokamax/_src/ops/ragged_dot/api.py#L177-L178)
- the added unit test hits this [error](http://shortn/_oy55Dkh22N)

# Tests

unit test `tokamax_gmm_test`

# Checklist

Before submitting this PR, please make sure (put X in square brackets):
- [x] I have performed a self-review of my code. For an optional AI review, add the `gemini-review` label.
- [x] I have necessary comments in my code, particularly in hard-to-understand areas.
- [x] I have run end-to-end tests and provided workload links above if applicable.
- [x] I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in [our documentation](https://maxtext.readthedocs.io/en/latest/development.html#adding-new-documentation-files).

PiperOrigin-RevId: 930601487
@shuningjin shuningjin mentioned this pull request Jun 11, 2026
4 tasks