Update dependency version JNI, private, hybrid to 26.02.0 #14254
Merged
NvTimLiu merged 1 commit into release/26.02 on Feb 6, 2026
Conversation
Wait for the pre-merge CI job to SUCCEED

Signed-off-by: nvauto <70000568+nvauto@users.noreply.github.com>
Collaborator (Author) commented:

build

Contributor commented:

**Greptile Overview**

**Greptile Summary**

Updated dependency versions for spark-rapids-jni, spark-rapids-private, and spark-rapids-hybrid from 26.02.0-SNAPSHOT to the released 26.02.0 in `pom.xml` and `scala2.13/pom.xml`.
Confidence Score: 5/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Dev as Developer
    participant POM as pom.xml
    participant POM13 as scala2.13/pom.xml
    participant Maven as Maven Build System
    participant JNI as spark-rapids-jni 26.02.0
    participant Private as spark-rapids-private 26.02.0
    participant Hybrid as spark-rapids-hybrid 26.02.0
    Dev->>POM: Update spark-rapids-jni.version to 26.02.0
    Dev->>POM: Update spark-rapids-private.version to 26.02.0
    Dev->>POM: Update spark-rapids-hybrid.version to 26.02.0
    Dev->>POM13: Update spark-rapids-jni.version to 26.02.0
    Dev->>POM13: Update spark-rapids-private.version to 26.02.0
    Dev->>POM13: Update spark-rapids-hybrid.version to 26.02.0
    Note over POM,POM13: Versions changed from 26.02.0-SNAPSHOT to 26.02.0
    Maven->>POM: Read dependency versions
    Maven->>POM13: Read dependency versions
    Maven->>JNI: Resolve spark-rapids-jni:26.02.0
    Maven->>Private: Resolve spark-rapids-private:26.02.0
    Maven->>Hybrid: Resolve spark-rapids-hybrid:26.02.0
    JNI-->>Maven: Provide released artifact
    Private-->>Maven: Provide released artifact
    Hybrid-->>Maven: Provide released artifact
    Maven-->>Dev: Build with release dependencies
```
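The change itself amounts to a one-line property edit per module in each POM. A sketch of what the diff presumably looks like; the property names are taken from the summary above and the exact keys in the real `pom.xml` may differ:

```xml
<!-- Sketch: pom.xml and scala2.13/pom.xml (property names assumed from the
     diagram above, not copied from the actual diff) -->
<properties>
  <!-- before: 26.02.0-SNAPSHOT; after: the released artifacts -->
  <spark-rapids-jni.version>26.02.0</spark-rapids-jni.version>
  <spark-rapids-private.version>26.02.0</spark-rapids-private.version>
  <spark-rapids-hybrid.version>26.02.0</spark-rapids-hybrid.version>
</properties>
```

Dropping the `-SNAPSHOT` suffix makes Maven resolve the immutable released artifacts from the release repository instead of mutable snapshot builds, which is why the PR had to wait for those jars to be published.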
Collaborator commented:

This change is for advance review; keep it as a draft until the dependency jars are released.

Collaborator commented:

build
gerashegalov pushed a commit to gerashegalov/spark-rapids that referenced this pull request on Feb 24, 2026:
* Fix GpuHashAggregateExec outputPartitioning for aliased grouping keys [databricks] (NVIDIA#14264)

  Fixes NVIDIA#14262

  ### Description
  GpuHashAggregateExec incorrectly reported outputPartitioning when grouping keys were aliased in result expressions. This caused incorrect results in queries with union operations on Spark 4.1+ (NDS query 66 mentioned in the bug).

  - Created `GpuPartitioningPreservingUnaryExecNode` trait following Spark CPU's `PartitioningPreservingUnaryExecNode` pattern
  - `GpuHashAggregateExec` now correctly handles both `GpuAlias` and `Alias` in resultExpressions
  - `GpuProjectExecLike` also uses the trait, eliminating code duplication

  ### Testing
  - Validated with NDS query 66 locally
  - Added an integration test that passes with the fix and fails without it

  ### Checklists
  - [ ] This PR has added documentation for new or modified features or behaviors.
  - [x] This PR has added new tests or modified existing tests to cover new code paths. (Please explain in the PR description how the new code paths are tested, such as names of the new/existing tests that cover them.)
  - [ ] Performance testing has been performed and its results are added in the PR description. Or, an issue has been filed with a link in the PR description.

  Signed-off-by: Niranjan Artal <nartal@nvidia.com>

* [DOC] update for download page 2602 release [skip ci] (NVIDIA#14241)

  Update the download doc for the 2602 release.

  Signed-off-by: liyuan <yuali@nvidia.com>
  Co-authored-by: Cursor <cursoragent@cursor.com>

* Fix combining small files when reading Delta tables using multi-threaded reader (NVIDIA#14271)

  Fixes NVIDIA#14267.

  ### Description
  The multi-threaded reader currently has the `queryUsesInputFile` flag fixed to true for Delta.
  This was added in NVIDIA#13491 to disable the combining of small files when reading Delta tables with deletion vectors. However, the logic as implemented disables combining even when there is no deletion vector, which introduced a performance regression in 25.10. This PR fixes that bug. I tested the fix manually by running NDS on my workstation and comparing performance against the main branch without the fix (c125e89). Here is the result:

  | | NDS run time (ms) |
  |---|---|
  | without fix | 640000 |
  | with fix | 342000 |

  I'm not sure how to add an automated test for this fix; any suggestions are welcome.

  ### Checklists
  - [ ] This PR has added documentation for new or modified features or behaviors.
  - [ ] This PR has added new tests or modified existing tests to cover new code paths.
  - [x] Performance testing has been performed and its results are added in the PR description. Or, an issue has been filed with a link in the PR description.

  Signed-off-by: Jihoon Son <ghoonson@gmail.com>

* Update dependency version JNI, private, hybrid to 26.02.0 (NVIDIA#14254)

  Wait for the pre-merge CI job to SUCCEED

  Signed-off-by: nvauto <70000568+nvauto@users.noreply.github.com>

* Update changelog for the v26.02 release [skip ci] (NVIDIA#14255)

  Update the change log with the CLI: `scripts/generate-changelog --token=<GIT_TOKEN> --releases=25.12,26.02`

  Signed-off-by: nvauto <70000568+nvauto@users.noreply.github.com>

* Prepare spark-rapids release v26.02.0-rc

  Signed-off-by: nvauto <70000568+nvauto@users.noreply.github.com>

* Preparing spark-rapids development version 26.02.0-SNAPSHOT

  Signed-off-by: nvauto <70000568+nvauto@users.noreply.github.com>

* Fetch the PRs for each commit in parallel to generate the CHANGELOG [skip ci] (NVIDIA#14266)

  To fix NVIDIA#14265: fetch the PRs using multiple threads to significantly reduce the overall execution time.
  ```
  -------Current time cost: 210s----------
  real 3m28.977s
  user 0m13.895s
  sys 0m0.153s
  ---------Time cost after the change: 15s--------
  real 0m14.019s
  user 0m18.916s
  sys 0m0.258s
  ```

  Signed-off-by: Tim Liu <timl@nvidia.com>

* Support BinaryType in GetArrayStructFields (NVIDIA#14277)

  Fixes NVIDIA#14276.

  ### Description
  This PR adds a binary type check for GetArrayStructFields. The original code already supported it; this PR enables it.

  ### Checklists
  - [x] This PR has added documentation for new or modified features or behaviors.
  - [x] This PR has added new tests or modified existing tests to cover new code paths.
  - [ ] Performance testing has been performed and its results are added in the PR description.

  Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

* Exclude pythonUDF cases in SubquerySuite (NVIDIA#14280)

  Contributes to NVIDIA#14258.

  ### Description
  The PythonUDF tests in SubquerySuite failed and need more investigation to fix. For now, just ignore the cases to pass the nightly CI.

  Signed-off-by: Gary Shen <gashen@nvidia.com>

* Remove the cycle between next() and readBuffersToBatch() in MultiFileCloudPartitionReaderBase (NVIDIA#14284)

  Fixes NVIDIA#14194

  ### Description
  There is currently a cycle between `next()` and `readBuffersToBatch()` in `MultiFileCloudPartitionReaderBase`, as they can call each other. This is not a good convention, as it can cause the stack to grow too large in some cases, as reported in NVIDIA#14194. This PR fixes it so that only `next()` can call `readBuffersToBatch()`. I intentionally did not touch any logic in `MultiFileCloudPartitionReaderBase` besides the call pattern of these functions. The changes can be summarized as:

  - Removed the `next()` call in `readBuffersToBatch()`.
  - Added a while loop with a `continue` flag in `next()` and put the main logic in the while loop.

  I manually tested the fix with a TPC-DS customer table at sf=1 with 1000 empty Parquet files. The empty Parquet files are valid Parquet files, as they have a valid footer, but they contain no actual data. I verified that a simple query runs successfully with this fix, but fails without it.

  ### Checklists
  - [ ] This PR has added documentation for new or modified features or behaviors.
  - [ ] This PR has added new tests or modified existing tests to cover new code paths.
  - [ ] Performance testing has been performed and its results are added in the PR description.

  Signed-off-by: Jihoon Son <ghoonson@gmail.com>

* Filter out blank lines when reading CSV [databricks] (NVIDIA#14281)

  Spark CPU's `CSVExprUtils.filterCommentAndEmpty` uses `line.trim.nonEmpty` to filter out blank lines when reading CSV. Java's `String.trim()` strips all chars ≤ `\u0020`, which includes control characters like `\x10`, `\x0e`, `\x17`, `\x0f`. The GPU CSV reader had two issues:

  1. `CSVPartitionReader` used `HostLineBuffererFactory`, which does not filter empty lines. Changed to use a new `FilterCsvEmptyHostLineBuffererFactory` that filters blank lines.
  2. The shared `LineBufferer.isWhiteSpace` only checked for space/tab/cr/lf, which doesn't match Java's `String.trim()` behavior. Instead of modifying the shared method (which would affect the JSON reader), added a new CSV-specific factory `FilterCsvEmptyHostLineBuffererFactory` that overrides `isWhiteSpace` to `(b & 0xFF) <= 0x20`, matching Java's `String.trim()`. This keeps the JSON reader (`FilterEmptyHostLineBuffererFactory`) unchanged and avoids any regression there.
  **MultiLine mode handling:** Empty line filtering is only applied when `multiLine` is false. In `multiLine` mode, the file is split into physical lines by `HadoopFileLinesReader`, but empty physical lines can be legitimate data inside quoted fields (e.g., a CSV value `"integer\n\n\n"` produces empty lines between the quotes). Filtering them would corrupt the data. Spark CPU handles this differently: it uses the Univocity parser's stream-based parsing for multiLine, which never goes through `filterCommentAndEmpty`. So we skip the filtering for multiLine to match CPU behavior.

  Added a test with a prepared CSV file containing control-char-only blank lines to verify GPU matches CPU behavior.

  Signed-off-by: Chong Gao <res_life@163.com>
  Co-authored-by: Chong Gao <res_life@163.com>

* Exclude PythonUDF cases in JoinSuite (NVIDIA#14287)

  Exclude python udf test cases in JoinSuite. Related to NVIDIA#14258

  Signed-off-by: Gary Shen <gashen@nvidia.com>

* Predicate pushdown for deletion vectors (NVIDIA#14260)

  Fixes NVIDIA#14259.

  ### Description
  The new cuDF [deletion vector APIs](rapidsai/cudf#19237) offer processing deletion vectors in the scan. This makes the filter exec in the query plan redundant, as explained in NVIDIA#14259. This PR adds a new rule for the physical plan to completely push the deletion vector predicate to the scan and remove the redundant filter exec. The new rule inspects the plan for a filter exec with the deletion vector predicate; if such a filter exec is found, it is completely removed from the plan. This new rule hasn't been wired up yet, and will be in a later PR when the cuDF APIs are wired up and ready to use. There is no test added in this PR, though I manually tested the new rule. Some integration tests will be added in a future PR.

  ### Checklists
  - [ ] This PR has added documentation for new or modified features or behaviors.
  - [ ] This PR has added new tests or modified existing tests to cover new code paths.
  - [ ] Performance testing has been performed and its results are added in the PR description.

  Signed-off-by: Jihoon Son <ghoonson@gmail.com>

* xfail test_get_json_object_quoted_question on Dataproc (NVIDIA#14298)

  Fixes NVIDIA#14290

  ### Description
  Dataproc 2.2 (image 2.2.75-ubuntu22) backported SPARK-46761 into its Spark 3.5.3, which allows `?` in quoted JSON path names. The spark-rapids pre-4.0 shim intentionally excludes `?` from the named-path regex to match vanilla Apache Spark 3.x behavior. This causes a GPU vs CPU mismatch only on Dataproc, where the CPU now accepts `?`. The proper fix already exists in the spark400 shim (PR NVIDIA#13104). We do not have a Dataproc-specific shim, so we cannot apply the fix only for Dataproc without breaking vanilla Apache Spark 3.5.3, where the CPU still rejects `?`. We will continue with the existing Apache Spark 3.5.3 behavior and xfail the test on Dataproc.

  - xfail `test_get_json_object_quoted_question` when `is_dataproc_runtime()`

  ### Checklists
  - [ ] This PR has added documentation for new or modified features or behaviors.
  - [x] This PR has added new tests or modified existing tests to cover new code paths.
  - [ ] Performance testing has been performed and its results are added in the PR description.
  Signed-off-by: Niranjan Artal <nartal@nvidia.com>

* Use dynamic shared buffer size in KudoSerializedBatchIterator to reduce memory waste (NVIDIA#14288)

  ## Summary
  The current `KudoSerializedBatchIterator` allocates a fixed 20MB shared buffer when batches are small (<1MB). This can lead to significant memory waste in scenarios with many shuffle blocks and small batches.

  **Problem**: When `spark.sql.shuffle.partitions` is large (e.g., 1000), each reduce task reads many shuffle blocks. Each block creates a `KudoSerializedBatchIterator`, and if batches are small, each allocates a 20MB shared buffer. However, if the actual batch data is only ~100-200KB, 20MB is vastly over-allocated. Since `HostCoalesceIterator` holds slices of these buffers (and `slice()` increments the ref count), the underlying 20MB cannot be freed until all slices are released. This causes excessive host memory usage and spill.

  **Fix**:
  - Reduce sample count from 10 to 5 batches (faster decision)
  - Dynamically calculate buffer size: `min(20MB, max(1MB, avgBatchSize * 10))`
  - For small batch scenarios (avg ~180KB), the buffer drops to ~1.8MB instead of 20MB

  **Verification**: TPC-DS query93 with 1000 partitions on EMR-EKS showed the host memory peak reduced from 75GB to 40GB, and spills eliminated (489 -> 0).
  ## Test plan
  - [x] Existing unit tests pass
  - [x] Verified with TPC-DS query93 on EMR-EKS with 1000 shuffle partitions; confirmed reduced host memory pressure and eliminated unnecessary spill

  Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
  Co-authored-by: Cursor <cursoragent@cursor.com>

* Update pom to use 26.04-SNAPSHOT, remove build.log and update CONTRIBUTING.md

  Signed-off-by: Niranjan Artal <nartal@nvidia.com>

* Update license header

  Signed-off-by: Niranjan Artal <nartal@nvidia.com>

---------

Signed-off-by: Niranjan Artal <nartal@nvidia.com>
Signed-off-by: liyuan <yuali@nvidia.com>
Signed-off-by: Jihoon Son <ghoonson@gmail.com>
Signed-off-by: nvauto <70000568+nvauto@users.noreply.github.com>
Signed-off-by: Tim Liu <timl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Gary Shen <gashen@nvidia.com>
Signed-off-by: Chong Gao <res_life@163.com>
Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
Co-authored-by: Jenkins Automation <70000568+nvauto@users.noreply.github.com>
Co-authored-by: liyuan <84758614+nvliyuan@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Jihoon Son <ghoonson@gmail.com>
Co-authored-by: Peixin Li <pxLi@nyu.edu>
Co-authored-by: Tim Liu <timl@nvidia.com>
Co-authored-by: Haoyang Li <haoyangl@nvidia.com>
Co-authored-by: Gary Shen <gashen@nvidia.com>
Co-authored-by: Chong Gao <chongg@nvidia.com>
Co-authored-by: Chong Gao <res_life@163.com>
Co-authored-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
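The dynamic buffer sizing in the KudoSerializedBatchIterator commit (NVIDIA#14288) boils down to one formula, `min(20MB, max(1MB, avgBatchSize * 10))`, applied to a small sample of batches. A minimal sketch in Python, purely illustrative (the real implementation is Scala and the function name here is hypothetical):

```python
# Sketch of the buffer-size heuristic described in NVIDIA#14288:
# clamp 10x the average sampled batch size into [1MB, 20MB].
MB = 1024 * 1024

def shared_buffer_size(sampled_batch_sizes):
    """Pick a shared buffer size (bytes) from a sample of batch sizes.

    The fix samples 5 batches instead of 10 and sizes the buffer from
    their average, rather than always allocating a fixed 20MB.
    """
    avg = sum(sampled_batch_sizes) / len(sampled_batch_sizes)
    return int(min(20 * MB, max(1 * MB, avg * 10)))

# Small batches (~180KB avg) get a ~1.8MB buffer instead of the fixed 20MB.
print(shared_buffer_size([180 * 1024] * 5))  # -> 1843200
```

The lower bound keeps tiny samples from producing an uselessly small buffer, while the upper bound preserves the original 20MB cap for large batches.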