HIVE-29554: Refactor MTableColumnStatistics and MPartitionColumnStatistics to avoid code duplication #6420
thomasrebele wants to merge 1 commit into apache:master
    this.avgColLen = avgColLen;
  }

  public void setDateStats(Long numNulls, Long numNDVs, byte[] bitVector, byte[] histogram, Long lowValue, Long highValue) {
Sonar warns about the length of this line. The original methods in MPartitionColumnStatistics and MTableColumnStatistics triggered the same warning. If I update the PR, I'll fix it; otherwise I'll leave that line (and setTimestampStats) as-is.
Sonar warns that the "cognitive complexity" limit is exceeded. The original implementation exceeded it as well. I think refactoring the methods is out of scope for this PR, so I'll leave them as-is.
Pull request overview
Refactors metastore column-statistics persistence models to remove duplicated fields/methods by introducing a shared MColumnStatistics superclass, and consolidates converter logic to operate on the common base type.
Changes:
- Introduce MColumnStatistics as a shared superclass for table/partition column stats and move common fields/methods there.
- Update JDO metadata to model inheritance and remove duplicated field mappings from the subclasses.
- Consolidate conversion/copy logic in StatObjectConverter and update call sites to use the unified APIs.
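The shape of the refactoring described above can be sketched in a few lines of Java. This is a simplified illustration, not the code in the PR: the nested classes only borrow the names of the real model classes, the `colName` field, the `setStringStats` setter, the `String`-typed table/partition references, and the `summarize` helper are all assumptions made for the example.

```java
public class ColumnStatsSketch {

  /** Stand-in for the shared superclass: fields common to table and partition stats. */
  abstract static class MColumnStatistics {
    private final String colName;   // hypothetical field for this sketch
    private Long numNulls;
    private Long numNDVs;
    private Double avgColLen;

    MColumnStatistics(String colName) {
      this.colName = colName;
    }

    // Hypothetical setter, defined once here instead of once per subclass.
    void setStringStats(Long numNulls, Long numNDVs, Double avgColLen) {
      this.numNulls = numNulls;
      this.numNDVs = numNDVs;
      this.avgColLen = avgColLen;
    }

    String getColName() { return colName; }
    Long getNumNulls() { return numNulls; }
    Long getNumNDVs() { return numNDVs; }
    Double getAvgColLen() { return avgColLen; }
  }

  /** Keeps only the table reference (a plain String here, which is an assumption). */
  static class MTableColumnStatistics extends MColumnStatistics {
    final String table;

    MTableColumnStatistics(String table, String colName) {
      super(colName);
      this.table = table;
    }
  }

  /** Keeps only the partition reference (again a plain String for illustration). */
  static class MPartitionColumnStatistics extends MColumnStatistics {
    final String partition;

    MPartitionColumnStatistics(String partition, String colName) {
      super(colName);
      this.partition = partition;
    }
  }

  /** Hypothetical converter: one method serves both subclasses via the base type. */
  static String summarize(MColumnStatistics stats) {
    return stats.getColName() + ": nulls=" + stats.getNumNulls()
        + ", ndv=" + stats.getNumNDVs() + ", avgColLen=" + stats.getAvgColLen();
  }

  public static void main(String[] args) {
    MColumnStatistics tableStats = new MTableColumnStatistics("tab1", "a");
    tableStats.setStringStats(0L, 3L, 4.5);

    MColumnStatistics partStats = new MPartitionColumnStatistics("p=1", "a");
    partStats.setStringStats(1L, 2L, 4.0);

    // Both calls go through the same code path.
    System.out.println(summarize(tableStats));
    System.out.println(summarize(partStats));
  }
}
```

Because `summarize` accepts the base type, a fix made there applies to table and partition statistics alike, which is the duplication risk the PR aims to remove.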
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| standalone-metastore/metastore-server/src/main/resources/package.jdo | Adds MColumnStatistics JDO class with subclass-table inheritance and removes duplicated field mappings from table/partition stats classes. |
| standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java | Makes table column stats extend MColumnStatistics, keeping only the table reference. |
| standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java | Makes partition column stats extend MColumnStatistics, keeping only the partition reference. |
| standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/model/MColumnStatistics.java | New shared superclass containing the common stats fields and helper setters/getters. |
| standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java | Reuses shared copy/merge logic via MColumnStatistics and unifies get*ColumnStatisticsObj into getColumnStatisticsObj. |
| standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java | Updates callers to use StatObjectConverter.getColumnStatisticsObj(...) for both table and partition stats. |
HIVE-29554
What changes were proposed in this pull request?
Extract a common superclass MColumnStatistics from MPartitionColumnStatistics and MTableColumnStatistics, and combine some methods that use the latter two.
Why are the changes needed?
There is a lot of code duplication around column statistics, which increases the risk of diverging code and partially applied fixes.
Does this PR introduce any user-facing change?
No
How was this patch tested?
`analyze table tab1 compute statistics for columns;`. I verified the statistics with `describe formatted tab1 a;`. I started and ran the DESCRIBE FORMATTED, ANALYZE TABLE, DESCRIBE FORMATTED sequence for
and vice versa, i.e.,
All commands worked as expected.