Microbenchmarking, Torch+CSV-based #478
Performance Regression Report

MI325
PR commit: ddd17d4 | Base:
benchmark_attention (median 1.000x, min 0.650x, max 1.390x)
benchmark_casting (median 0.998x, min 0.920x, max 1.034x)
benchmark_gemm (median 1.001x, min 0.322x, max 2.361x)
benchmark_gemm_fp8 (median 0.986x, min 0.285x, max 1.610x)
benchmark_grouped_gemm (median 1.000x, min 0.439x, max 2.438x)
benchmark_normalization (median 1.006x, min 0.633x, max 2.066x)

MI355
PR commit: ddd17d4 | Base:
benchmark_attention (median 1.000x, min 0.969x, max 1.014x)
benchmark_casting (median 0.998x, min 0.898x, max 1.138x)
benchmark_gemm (median 1.001x, min 0.935x, max 1.094x)
benchmark_gemm_fp8 (median 0.988x, min 0.401x, max 1.038x)
benchmark_grouped_gemm (median 0.999x, min 0.912x, max 1.134x)
benchmark_normalization (median 0.993x, min 0.399x, max 1.490x)
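
A summary line in the style of this report could be computed from per-case timings roughly as follows. This is a minimal sketch, not the PR's actual reporting code: the names `summarize_speedups` and `format_summary`, the dict-of-milliseconds inputs, and the convention that the ratio is base time divided by PR time (so values above 1.0 mean the PR is faster) are all assumptions made for illustration.

```python
import statistics


def summarize_speedups(base_ms, pr_ms):
    """Return (median, min, max) of per-case speedup ratios.

    Assumption: speedup = base time / PR time, so > 1.0 means the PR is faster.
    Only cases present in both runs are compared.
    """
    common = sorted(set(base_ms) & set(pr_ms))
    if not common:
        raise ValueError("no benchmark cases in common")
    ratios = [base_ms[key] / pr_ms[key] for key in common]
    return statistics.median(ratios), min(ratios), max(ratios)


def format_summary(name, base_ms, pr_ms):
    """Format one report line for a benchmark (hypothetical helper)."""
    med, lo, hi = summarize_speedups(base_ms, pr_ms)
    return f"{name} (median {med:.3f}x, min {lo:.3f}x, max {hi:.3f}x)"
```

A median close to 1.000x next to a wide min/max spread, as in several lines above, would then point at a few outlier cases rather than a uniform shift.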
Micky774 left a comment
A few general comments in addition to the inline:
- Regarding copyright, some spots are 2026 only while others are 2025-2026 -- is there a specific reason, or can we be specific and only set 2026?
- It seems that `dtype=torch.bfloat16` is hard-coded -- can we generalize to allow for e.g. fp16 benchmarks?
- Can we document the `bench_fn` contract so that it's easier for new developers to contribute additional benchmarks?
- Can we have a more general `RECIPES` dict similar to NV (TransformerEngine/benchmarks/linear/benchmark_grouped_linear.py, lines 60 to 65 in a0b88f4)?
- Can we add a `README.md` to document?
Changed to 2026 in 284adda.
Done in 284adda.
Added documentation in utils.py and README.
Added in 284adda.
Added in ca1f442.
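
For readers following the dtype request above, one way to avoid a hard-coded `torch.bfloat16` is a command-line option that maps short names to torch dtypes. This is only a sketch under stated assumptions and is not necessarily what 284adda does: the `--dtype` flag, the `DTYPE_NAMES` table, and the helper names are hypothetical.

```python
import argparse

# Hypothetical mapping from CLI names to attribute names on the torch module.
DTYPE_NAMES = {"bf16": "bfloat16", "fp16": "float16", "fp32": "float32"}


def build_parser():
    """Build an argument parser with an illustrative --dtype option."""
    parser = argparse.ArgumentParser(description="benchmark options (illustrative)")
    parser.add_argument("--dtype", choices=sorted(DTYPE_NAMES), default="bf16")
    return parser


def resolve_dtype(name):
    """Translate a CLI name such as "fp16" into the matching torch dtype."""
    import torch  # imported lazily so argument parsing works without torch installed

    return getattr(torch, DTYPE_NAMES[name])
```

Keeping bf16 as the default would preserve the existing behavior while letting a caller pass `--dtype fp16`.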
def _generate_cast_test_cases():
Hmm, we are already in benchmark_casting.py, so _generate_test_cases() should suffice? More generally, I saw that each benchmark script uses different functions or names for test case setup. Is it possible to unify them, or just follow a unified pattern?
Renamed the cast and norm functions to _generate_test_cases in abefa95.
def time_func(fn, method="adaptive", min_run_time=DEFAULT_MIN_RUN_TIME_SECONDS):
    """Time *fn* and return elapsed milliseconds.

    method: "adaptive" uses adaptive_autorange (good for compute-bound),
            "blocked" uses blocked_autorange (good for memory-bound).
    """
    timer = benchmark.Timer(stmt="fn()", globals={"fn": fn})
    if method == "blocked":
        return timer.blocked_autorange(min_run_time=min_run_time).mean * 1e3
    return timer.adaptive_autorange(min_run_time=min_run_time).mean * 1e3
For the csv outputs you record only the means -- I think it would be useful to be able to save the underlying samples and individual runtimes for downstream analysis.
Thanks for the suggestion. What do you think of 39b5720, which adds a new argument (--csv-samples), and stores the samples into a separate csv file?
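
As an illustration of the idea, per-sample timings could be written to their own CSV file along the following lines. This is a sketch and not the code in 39b5720: the function name `write_samples_csv`, the column names, and the dict input are assumptions. In the PR's setting, the input could plausibly come from the `times` attribute of a `torch.utils.benchmark` measurement (per-run times in seconds), scaled to milliseconds.

```python
import csv


def write_samples_csv(path, samples_by_case):
    """Write one row per (case, sample index, time in ms).

    samples_by_case maps a case label to a list of per-run times in ms.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["case", "sample_idx", "time_ms"])
        for case, samples_ms in samples_by_case.items():
            for idx, time_ms in enumerate(samples_ms):
                writer.writerow([case, idx, f"{time_ms:.6f}"])
```

A long format with one row per sample keeps the file easy to load for downstream analysis, at the cost of a larger file than the means-only CSV.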
Micky774 left a comment
One final comment, otherwise LGTM.
Description
See also #487.
PyTorch benchmark timing: https://docs.pytorch.org/tutorials/recipes/recipes/benchmark.html
Open questions:
Do we need to rebuild the PR branch after perf testing is done?
Partly addresses https://github.com/ROCm/frameworks-internal/issues/15863
Microbenchmarking (not just) for CI.
TODOs:
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: