Skip to content

Fix a GBK test failure on DB 143+[databricks]#13803

Merged
pxLi merged 2 commits intoNVIDIA:release/25.12from
firestarman:fix-gbk-db-failure
Nov 19, 2025
Merged

Fix a GBK test failure on DB 143+[databricks]#13803
pxLi merged 2 commits intoNVIDIA:release/25.12from
firestarman:fix-gbk-db-failure

Conversation

@firestarman
Copy link
Copy Markdown
Collaborator

@firestarman firestarman commented Nov 17, 2025

close #13800

The root cause is missing the check for DB runtimes when specifying the legacy charset config for the CSV read, and should not cache the supported charset list, instead build it each time overriding the plan according to the current config.

@firestarman
Copy link
Copy Markdown
Collaborator Author

build

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Nov 17, 2025

Greptile Summary

  • Fixes caching issue where supportedCharsets was computed once and never re-evaluated when spark.sql.legacy.javaCharsets config changed at runtime
  • Updates test to correctly set legacy charset config for Databricks 14.3+ which introduced charset restrictions similar to Spark 4.0.0

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The changes correctly address the root cause by converting lazy vals to defs to prevent stale config caching, and the test fix properly targets DB 14.3+ which introduced charset restrictions
  • No files require special attention

Important Files Changed

Filename Overview
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuCSVScan.scala Converts supportedCharsets and legacyCharsetEnabled from cached values to methods that recompute on each call, fixing caching issue with runtime config changes

Sequence Diagram

sequenceDiagram
    participant Test as "CSV Test"
    participant Config as "Spark Config"
    participant Scan as "GpuCSVScan"
    participant Method as "supportedCharsets()"
    
    Test->>Config: "Set spark.sql.legacy.javaCharsets=true"
    Note over Config: Config updated at runtime
    Test->>Scan: "Request CSV scan with GBK charset"
    Scan->>Method: "Check if GBK supported"
    Method->>Config: "Read current config value"
    Config-->>Method: "Returns true (legacy enabled)"
    Method-->>Scan: "Returns [UTF-8, US-ASCII, GBK]"
    Scan-->>Test: "GBK is supported, scan proceeds on GPU"
Loading

Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, no comments

Edit Code Review Agent Settings | Greptile
React with 👍 or 👎 to share your feedback on this new summary format

@pxLi
Copy link
Copy Markdown
Member

pxLi commented Nov 18, 2025

still

[2025-11-17T14:02:29.491Z] =========================== short test summary info ============================

[2025-11-17T14:02:29.491Z] FAILED ../../src/main/python/csv_test.py::test_csv_read_gbk_encoded_data[DATAGEN_SEED=1763381021, TZ=UTC, IGNORE_ORDER({'local': True}), ALLOW_NON_GPU(CollectLimitExec)] - pyspark.errors.exceptions.captured.IllegalArgumentException: Part of the plan is not columnar class org.apache.spark.sql.execution.FileSourceScanExec

[2025-11-17T14:02:29.491Z] FileScan csv [name#6927,age#6928,city#6929,job#6930] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex(1 paths)[file:/home/ubuntu/spark-rapids/integration_tests/src/test/resources/te..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<name:string,age:int,city:string,job:string>

[2025-11-17T14:02:29.491Z] = 1 failed, 39378 passed, 3096 skipped, 1113 xfailed, 1002 xpassed, 18067 warnings in 7125.17s (1:58:45) =

also please retarget this change to release/25.12 which also requires this fix, thanks

@firestarman firestarman changed the base branch from main to branch-25.12 November 18, 2025 01:46
@firestarman firestarman requested a review from a team as a code owner November 18, 2025 01:46
@firestarman firestarman changed the base branch from branch-25.12 to release/25.12 November 18, 2025 01:46
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

74 files reviewed, no comments

Edit Code Review Agent Settings | Greptile
React with 👍 or 👎 to share your feedback on this new summary format

@firestarman firestarman marked this pull request as draft November 18, 2025 02:01
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman firestarman changed the base branch from release/25.12 to main November 18, 2025 02:39
@firestarman
Copy link
Copy Markdown
Collaborator Author

firestarman commented Nov 18, 2025

still

[2025-11-17T14:02:29.491Z] =========================== short test summary info ============================

[2025-11-17T14:02:29.491Z] FAILED ../../src/main/python/csv_test.py::test_csv_read_gbk_encoded_data[DATAGEN_SEED=1763381021, TZ=UTC, IGNORE_ORDER({'local': True}), ALLOW_NON_GPU(CollectLimitExec)] - pyspark.errors.exceptions.captured.IllegalArgumentException: Part of the plan is not columnar class org.apache.spark.sql.execution.FileSourceScanExec

[2025-11-17T14:02:29.491Z] FileScan csv [name#6927,age#6928,city#6929,job#6930] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex(1 paths)[file:/home/ubuntu/spark-rapids/integration_tests/src/test/resources/te..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<name:string,age:int,city:string,job:string>

[2025-11-17T14:02:29.491Z] = 1 failed, 39378 passed, 3096 skipped, 1113 xfailed, 1002 xpassed, 18067 warnings in 7125.17s (1:58:45) =

also please retarget this change to release/25.12 which also requires this fix, thanks

I verified it manually and the test can pass.
Found the root cause and updated this PR accordingly. Now it should work when running with other tests.

@firestarman firestarman changed the base branch from main to release/25.12 November 18, 2025 03:39
@firestarman
Copy link
Copy Markdown
Collaborator Author

firestarman commented Nov 18, 2025

replaced by #13808 for the release/25.12 branch.

@firestarman firestarman reopened this Nov 18, 2025
@firestarman
Copy link
Copy Markdown
Collaborator Author

reopen to keep the history, need a force push to eliminate the unexpected change involed by the base retargeting.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman firestarman marked this pull request as ready for review November 18, 2025 06:12
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, no comments

Edit Code Review Agent Settings | Greptile
React with 👍 or 👎 to share your feedback on this new summary format

@firestarman
Copy link
Copy Markdown
Collaborator Author

build

@pxLi pxLi merged commit efb54ab into NVIDIA:release/25.12 Nov 19, 2025
65 checks passed
@firestarman firestarman deleted the fix-gbk-db-failure branch November 19, 2025 01:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] test_csv_read_gbk_encoded_data fail Part of the plan is not columnar class of DB 14.3 runtime

4 participants