
PARQUET-1866: Replace Hadoop ZSTD with JNI-ZSTD #793

Merged
gszadovszky merged 7 commits into apache:master from shangxinli:master_new
Jun 3, 2020

Conversation

@shangxinli
Contributor

Make sure you have checked all steps below.

Jira

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and classes in the PR contain Javadoc that explains what they do

@shangxinli force-pushed the master_new branch 3 times, most recently from e5ebcac to 4c264fc, on May 22, 2020 02:08
Comment thread parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/ZstdCodec.java Outdated
Comment thread parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestZstdCodec.java Outdated
@shangxinli
Contributor Author

shangxinli commented May 26, 2020

@luben, do you have time to review the code?

@luben

luben commented May 26, 2020

LGTM

@shangxinli
Contributor Author

@gszadovszky, do you have time for another look?

Comment thread parquet-hadoop/README.md Outdated
Comment thread parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestZstandardCodec.java Outdated
Comment thread parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestZstandardCodec.java Outdated
@dongjoon-hyun
Member

Thank you, @shangxinli and all!
cc @dbtsai

@dbtsai
Member

dbtsai commented Jun 2, 2020

+1 @shangxinli and thank you for this contribution.

This will allow users on older versions of Hadoop that don't support native ZSTD to use ZSTD compression in Parquet. Users also won't have to go through the very complicated Hadoop native library installation. For developers, we will be able to easily test this out in different local environments.

cc @rdblue

@dbtsai
Member

dbtsai commented Jun 2, 2020

@shangxinli, do we have a benchmark comparing this to the native Hadoop codec in both size and speed? Thanks.

Comment thread parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestZstandardCodec.java Outdated
Comment thread parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestZstandardCodec.java Outdated
@gszadovszky merged commit dc40b59 into apache:master on Jun 3, 2020
@shangxinli
Contributor Author

> @shangxinli do we have benchmark comparing to native hadoop codec both in size and speed? Thanks.

Hi @dbtsai, I didn't, because I don't have a Hadoop host with ZSTD installed. @luben, did you ever compare it with Hadoop ZSTD?

@luben

luben commented Jun 5, 2020

@shangxinli: I haven't benchmarked it.

shangxinli added a commit to shangxinli/parquet-mr that referenced this pull request Jun 8, 2020
@shangxinli deleted the master_new branch on June 11, 2020 18:58
shangxinli added a commit to shangxinli/parquet-mr that referenced this pull request Jun 15, 2020
shangxinli added a commit to shangxinli/parquet-mr that referenced this pull request Jun 15, 2020
emmanuelguerin pushed a commit to criteo-forks/parquet-mr that referenced this pull request Sep 1, 2020
Original pull request: apache#793

Change-Id: Iccb6a643f06664a4626b0f5e219de6c907743b94
emmanuelguerin pushed a commit to criteo-forks/parquet-mr that referenced this pull request Sep 1, 2020
Original pull request: apache#793

Change-Id: Iccb6a643f06664a4626b0f5e219de6c907743b94
bluesheeptoken added a commit to bluesheeptoken/kafka-connect-storage-common that referenced this pull request May 4, 2023
To add zstd-jni and support zstd-jni compression easily in connectors.
cf: apache/parquet-java#793