Skip to content

HIVE-29554: Refactor MTableColumnStatistics and MPartitionColumnStatistics to avoid code duplication#6420

Open
thomasrebele wants to merge 1 commit intoapache:masterfrom
thomasrebele:tr/HIVE-29554
Open

HIVE-29554: Refactor MTableColumnStatistics and MPartitionColumnStatistics to avoid code duplication#6420
thomasrebele wants to merge 1 commit intoapache:masterfrom
thomasrebele:tr/HIVE-29554

Conversation

@thomasrebele
Copy link
Copy Markdown
Contributor

HIVE-29554

What changes were proposed in this pull request?

Extract a common superclass MColumnStatistics from MPartitionColumnStatistics and MColumnStatistics and combine some methods that use the latter two.

Why are the changes needed?

There is a lot of code duplication around column statistics, which increases the risk of diverging code and partially applied fixes.

Does this PR introduce any user-facing change?

No

How was this patch tested?

  1. I've run Hive from the master branch (13f3208), created a table with some values, and executed analyze table tab1 compute statistics for columns;. I verified the statistics with describe formatted tab1 a;.
  2. I stopped Hive, and run Hive with the changes of the PR. I checked that the statistics are shown correctly. I inserted more values into the table and executed the ANALYZE TABLE ... command. I verified the statistics.
  3. I stopped Hive, and run Hive from the master branch again. I verified the statistics. I inserted more values into the table and executed the ANALYZE TABLE ... command. I verified the statistics.

I started and run the DESCRIBE FORMATTED, ANALYZE TABLE, DESCRIBE FORMATTED sequence for

  • a metastore server with the changes from the PR, and
  • a hiveserver and beeline from the master branch

and vice versa, i.e.,

  • a metastore server from the master branch, and
  • a hiveserver and beeline with the changes from the PR

All commands worked as expected.

@sonarqubecloud
Copy link
Copy Markdown

this.avgColLen = avgColLen;
}

public void setDateStats(Long numNulls, Long numNDVs, byte[] bitVector, byte[] histogram, Long lowValue, Long highValue) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are Sonar warnings about the length of the line. The original methods in MPartitionColumnStatistics and MTableColumnStatistics had them as well. In case I make an update to the PR, I'll fix it, otherwise I would just leave that line (and setTimestampStats) as-is.

Copy link
Copy Markdown
Contributor Author

@thomasrebele thomasrebele Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sonar warns about the "cognitive complexity" limit. The limit has been exceeded in the original implementation as well. I think refactoring the methods is out-of-scope for this PR, so I'll just leave them as-is.

@thomasrebele thomasrebele marked this pull request as ready for review April 13, 2026 14:40
@dengzhhu653 dengzhhu653 requested a review from Copilot April 15, 2026 02:53
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors metastore column-statistics persistence models to remove duplicated fields/methods by introducing a shared MColumnStatistics superclass, and consolidates converter logic to operate on the common base type.

Changes:

  • Introduce MColumnStatistics as a shared superclass for table/partition column stats and move common fields/methods there.
  • Update JDO metadata to model inheritance and remove duplicated field mappings from the subclasses.
  • Consolidate conversion/copy logic in StatObjectConverter and update call sites to use the unified APIs.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
standalone-metastore/metastore-server/src/main/resources/package.jdo Adds MColumnStatistics JDO class with subclass-table inheritance and removes duplicated field mappings from table/partition stats classes.
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java Makes table column stats extend MColumnStatistics, keeping only the table reference.
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java Makes partition column stats extend MColumnStatistics, keeping only the partition reference.
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/model/MColumnStatistics.java New shared superclass containing the common stats fields and helper setters/getters.
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java Reuses shared copy/merge logic via MColumnStatistics and unifies get*ColumnStatisticsObj into getColumnStatisticsObj.
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java Updates callers to use StatObjectConverter.getColumnStatisticsObj(...) for both table and partition stats.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants