
Adds ignore_order for groupBy agg test that returns multiple rows [databricks] #14044

Merged
abellina merged 1 commit into NVIDIA:main from abellina:add_ignore_order_for_agg_test
Dec 21, 2025

Conversation

Collaborator

@abellina abellina commented Dec 21, 2025

Fixes #14043

Description

The test test_avg_divide_by_zero performs a groupBy("k") that returns multiple rows (k=0 and k=1), but it does NOT use @ignore_order, which means we have been getting lucky with the ordering so far. #14043 documents a failure where OSS Spark 3.3.0 and Databricks return the rows in a different order. Since the order of the group keys is not deterministic, I added @ignore_order.

The unit test was originally added in #13192.
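To illustrate the effect of the fix: @ignore_order makes the CPU/GPU comparison order-insensitive. A minimal sketch of that idea in plain Python, assuming simple (key, value) row tuples; the helper sorted_rows_equal is hypothetical and not part of the spark-rapids test framework:

```python
# Hypothetical illustration of what @ignore_order effectively does:
# compare two result sets after sorting, so row order does not matter.
# (sorted_rows_equal is an invented helper, not spark-rapids API.)

def sorted_rows_equal(cpu_rows, gpu_rows):
    """Return True when both results contain the same rows, ignoring order."""
    # None-safe sort key: None values sort after non-None values.
    key = lambda row: tuple((v is None, v) for v in row)
    return sorted(cpu_rows, key=key) == sorted(gpu_rows, key=key)

# The two orderings observed in #14043: same contents, different order.
cpu_result = [(0, None), (1, 63.0)]
gpu_result = [(1, 63.0), (0, None)]

assert sorted_rows_equal(cpu_result, gpu_result)
assert cpu_result != gpu_result  # a naive positional comparison would fail
```

Without the decorator the comparison is positional, so any change in which partition a key lands in (and hence which row comes out first) fails the test even though the aggregated values are identical.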

Checklists

  • This PR has added documentation for new or modified features or behaviors.
  • This PR has added new tests or modified existing tests to cover new code paths.
    (Please explain in the PR description how the new code paths are tested, such as names of the new/existing tests that cover them.)
  • Performance testing has been performed and its results are added in the PR description. Or, an issue has been filed with a link in the PR description.

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
@abellina
Collaborator Author

build

@greptile-apps
Contributor

greptile-apps bot commented Dec 21, 2025

Greptile Summary

This PR fixes a test flakiness issue by adding the @ignore_order decorator to test_avg_divide_by_zero. The test performs a groupBy("k") aggregation that returns multiple rows, but the ordering of grouped results is non-deterministic in Spark.

Key changes:

  • Added @ignore_order decorator to test_avg_divide_by_zero in hash_aggregate_test.py:2888
  • Fixes test failures on Databricks where GPU and CPU return different orderings for the groupBy key column (k)
  • Aligns with the pattern used by other groupBy aggregation tests in the same file

The fix is minimal, correct, and follows established conventions in the codebase. The aggregation logic and computed values (avg=63.0) are correct on both GPU and CPU; only the row ordering differed.

Confidence Score: 5/5

  • This PR is safe to merge with no risk
  • The change is a single-line addition of a test decorator that is widely used throughout the test file for the same purpose. The fix correctly addresses a real test failure caused by non-deterministic groupBy ordering, which is expected behavior in Spark. No logic changes, no new functionality - just proper test annotation.
  • No files require special attention

Important Files Changed

Filename | Overview
integration_tests/src/main/python/hash_aggregate_test.py | Added @ignore_order decorator to fix non-deterministic ordering in groupBy aggregation test

Sequence Diagram

sequenceDiagram
    participant Test as test_avg_divide_by_zero
    participant Spark as Spark Engine
    participant GPU as GPU Executor
    participant CPU as CPU Executor
    
    Test->>Spark: Create DataFrame (id % 2 as k, id as v)
    Test->>Spark: Execute groupBy("k").agg(avg(...))
    
    par GPU Execution
        Spark->>GPU: Execute query
        GPU->>GPU: Group by k (0, 1)
        GPU->>GPU: Compute avg(CASE WHEN k>0...)
        Note over GPU: Returns rows in order:<br/>[k=1, avg=63.0]<br/>[k=0, avg=None]
        GPU-->>Test: GPU Result
    and CPU Execution
        Spark->>CPU: Execute query
        CPU->>CPU: Group by k (0, 1)
        CPU->>CPU: Compute avg(CASE WHEN k>0...)
        Note over CPU: Returns rows in order:<br/>[k=0, avg=None]<br/>[k=1, avg=63.0]
        CPU-->>Test: CPU Result
    end
    
    Test->>Test: Compare results with @ignore_order
    Note over Test: Order differences ignored<br/>Test passes ✓

@sameerz sameerz added the test (Only impacts tests) label Dec 21, 2025
Collaborator

@sameerz sameerz left a comment


Would be good to know the root cause of the ordering change in our underlying systems. Approving given the change is in tests only, and the ordering should not matter.

@sameerz
Collaborator

sameerz commented Dec 21, 2025

Possibly related: an upgrade to CCCL 3.2 NVIDIA/spark-rapids-jni#4094

@abellina abellina merged commit e95ea4c into NVIDIA:main Dec 21, 2025
51 checks passed
@ttnghia
Collaborator

ttnghia commented Dec 22, 2025

Would be good to know the root cause of the ordering change in our underlying systems. Approving given the change is in tests only, and the ordering should not matter.

I've bisected the cudf commits and can confirm that this is due to rapidsai/cudf#20796, which changes the behavior of hash-partitioning.
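For intuition about how a hash-partitioning change reorders output without changing its contents: rows are bucketed by the hash of the group key, and output order follows partition order, so a different hash function yields the same rows in a different sequence. A minimal sketch, assuming two partitions and toy hash functions that are stand-ins for the actual cudf/CCCL implementations:

```python
# Sketch: a change to the hash function used for partitioning reorders
# groupBy output while the set of rows stays identical.
# (The lambda "hash functions" below are illustrative stand-ins only.)

NUM_PARTITIONS = 2

def partition(rows, hash_fn):
    """Bucket rows by hash of the group key (row[0]), then concatenate
    the buckets in partition order, mimicking partition-driven output order."""
    buckets = [[] for _ in range(NUM_PARTITIONS)]
    for row in rows:
        buckets[hash_fn(row[0]) % NUM_PARTITIONS].append(row)
    return [row for bucket in buckets for row in bucket]

rows = [(0, None), (1, 63.0)]

old_order = partition(rows, hash_fn=lambda k: k)      # k=0 -> partition 0
new_order = partition(rows, hash_fn=lambda k: k + 1)  # k=0 -> partition 1

assert set(old_order) == set(new_order)  # same rows...
assert old_order != new_order            # ...different order
```

This is why the ordering change is benign for correctness and why @ignore_order is the appropriate fix rather than pinning an expected order in the test.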


Labels

test (Only impacts tests)


Development

Successfully merging this pull request may close these issues.

[BUG] test_avg_divide_by_zero failed for OSS Spark 3.3.0 and across all Databricks runtime versions

4 participants