
refactor: change PartitionSpec to be independent of Schema #299

Merged
wgtmac merged 3 commits into apache:main from HeartLinked:partition_spec_1
Nov 11, 2025

HeartLinked (Contributor) commented on Nov 7, 2025

  • Remove the schema_ member from PartitionSpec and make the PartitionType() method take a Schema parameter instead of using the stored schema.
  • Add a table_schema_ member to ManifestEntryAdapter.
  • Fix a bug in ManifestEntryAdapter::init (v1/v2/v3) so that it uses the table schema for metadata_["schema"].
  • Fix errors in the related tests.
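A minimal sketch of the decoupling described above, assuming heavily simplified stand-in types: Schema, SchemaField, PartitionField, PartitionSpec, and ManifestEntryAdapterSketch here are hypothetical illustrations, not the iceberg-cpp classes, and only the identity transform is modeled. The spec keeps only source-field references, PartitionType() receives the schema from the caller, and the adapter owns the table schema it resolves against. As a stand-in for writing the table schema to metadata_["schema"], the sketch records only the schema ID under a hypothetical "schema-id" key.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the real iceberg-cpp types.
struct SchemaField {
  int32_t field_id;
  std::string name;
  std::string type;
};

struct Schema {
  int32_t schema_id;
  std::vector<SchemaField> fields;
};

struct PartitionField {
  int32_t source_id;  // field ID in the table schema
  int32_t field_id;   // ID of the partition field itself
  std::string name;   // identity transform assumed throughout
};

// The spec holds field references only; there is no schema_ member.
class PartitionSpec {
 public:
  explicit PartitionSpec(std::vector<PartitionField> fields)
      : fields_(std::move(fields)) {}

  // The caller passes the schema used to resolve source types.
  // Returns std::nullopt if a source field is not in that schema.
  std::optional<std::vector<SchemaField>> PartitionType(const Schema& schema) const {
    std::vector<SchemaField> result;
    for (const auto& pf : fields_) {
      const SchemaField* source = nullptr;
      for (const auto& f : schema.fields) {
        if (f.field_id == pf.source_id) {
          source = &f;
          break;
        }
      }
      if (source == nullptr) return std::nullopt;
      // Identity transform: the partition type equals the source type.
      result.push_back({pf.field_id, pf.name, source->type});
    }
    return result;
  }

 private:
  std::vector<PartitionField> fields_;
};

// Hypothetical adapter: it owns the table schema, because the spec no
// longer carries one, and records that schema in its metadata.
class ManifestEntryAdapterSketch {
 public:
  ManifestEntryAdapterSketch(PartitionSpec spec, Schema table_schema)
      : spec_(std::move(spec)), table_schema_(std::move(table_schema)) {
    // Use the table schema (not anything derived from the spec).
    metadata_["schema-id"] = std::to_string(table_schema_.schema_id);
  }

  std::optional<std::vector<SchemaField>> partition_type() const {
    return spec_.PartitionType(table_schema_);
  }

  const std::map<std::string, std::string>& metadata() const { return metadata_; }

 private:
  PartitionSpec spec_;
  Schema table_schema_;
  std::map<std::string, std::string> metadata_;
};
```

Because the spec no longer owns a schema, the same PartitionSpec can be resolved against whichever schema the caller holds; the trade-off is that every caller has to pass the right one.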

Comment thread src/iceberg/json_internal.cc
wgtmac (Member) left a comment

My main feedback is that we need to keep the schema parameter, because we will eventually need it to create a bound partition spec.
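To illustrate why binding needs a schema, here is a sketch under simplified, hypothetical types (UnboundPartitionField, BoundPartitionField, and Bind are illustrative names, not the iceberg-cpp API). An unbound field refers to its source column only by ID; binding resolves that ID against a schema and fails if the column is missing, which is impossible without a schema in hand.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified types for illustration only.
struct SchemaField {
  int32_t field_id;
  std::string name;
  std::string type;
};

struct Schema {
  std::vector<SchemaField> fields;
};

// Unbound: refers to its source column by ID only.
struct UnboundPartitionField {
  int32_t source_id;
  std::string name;
};

// Bound: carries the resolved source field from the schema.
struct BoundPartitionField {
  SchemaField source;
  std::string name;
};

using BoundPartitionSpec = std::vector<BoundPartitionField>;

// Binding requires a schema: without one, source IDs cannot be resolved.
std::optional<BoundPartitionSpec> Bind(const std::vector<UnboundPartitionField>& fields,
                                       const Schema& schema) {
  BoundPartitionSpec bound;
  for (const auto& pf : fields) {
    const SchemaField* match = nullptr;
    for (const auto& f : schema.fields) {
      if (f.field_id == pf.source_id) {
        match = &f;
        break;
      }
    }
    if (match == nullptr) return std::nullopt;  // unknown source column
    bound.push_back({*match, pf.name});
  }
  return bound;
}
```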

Comment thread src/iceberg/json_internal.cc Outdated
/// \param[out] default_spec_id The default partition spec ID.
/// \param[out] partition_specs The list of partition specs.
Status ParsePartitionSpecs(const nlohmann::json& json, int8_t format_version,
                           const std::shared_ptr<Schema>& current_schema,
Member

ditto, let's keep it.

Comment thread src/iceberg/json_internal.cc Outdated
 for (const auto& spec_json : spec_array) {
-  ICEBERG_ASSIGN_OR_RAISE(auto spec,
-                          PartitionSpecFromJson(current_schema, spec_json));
+  ICEBERG_ASSIGN_OR_RAISE(auto spec, PartitionSpecFromJson(spec_json));
Member

ditto

Comment thread src/iceberg/json_internal.cc
Comment thread src/iceberg/json_internal.cc Outdated
Comment thread src/iceberg/manifest_adapter.h Outdated
Comment thread src/iceberg/manifest_writer.cc Outdated
Comment thread src/iceberg/partition_spec.cc Outdated
Comment thread src/iceberg/partition_spec.h Outdated
Comment thread src/iceberg/table_scan.cc Outdated
    std::shared_ptr<Schema> schema) {
  if (fields_.empty()) {
    return nullptr;
  }
Contributor

The current implementation caches partition_type_ based on table_schema_, but the schema parameter passed in may belong to a different schema version, which could lead to correctness issues.
I suggest adding schema_id to the cache key to ensure consistency.

For example:
std::unordered_map<int32_t, std::shared_ptr<StructType>> partition_type_cache_;
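A sketch of the suggested cache keyed by schema ID, assuming simplified hypothetical Schema, StructType, and PartitionSpec types and an identity transform only. It also assumes schema IDs are unique within the table the spec belongs to, so one ID always denotes the same schema. A mutex guards the cache because PartitionType() is const and may be called concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified types for illustration only.
struct Schema {
  int32_t schema_id;
  std::unordered_map<int32_t, std::string> field_types;  // field ID -> type
};

struct StructType {
  std::vector<std::string> field_types;  // one entry per partition field
};

class PartitionSpec {
 public:
  explicit PartitionSpec(std::vector<int32_t> source_ids)
      : source_ids_(std::move(source_ids)) {}

  // Computes the partition type for `schema` at most once per schema ID.
  // Returns nullptr (and caches nothing) if a source field is missing.
  std::shared_ptr<StructType> PartitionType(const Schema& schema) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = partition_type_cache_.find(schema.schema_id);
    if (it != partition_type_cache_.end()) return it->second;

    auto type = std::make_shared<StructType>();
    for (int32_t id : source_ids_) {
      auto field = schema.field_types.find(id);
      if (field == schema.field_types.end()) return nullptr;
      type->field_types.push_back(field->second);  // identity transform
    }
    partition_type_cache_.emplace(schema.schema_id, type);
    return type;
  }

  std::size_t cache_size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return partition_type_cache_.size();
  }

 private:
  std::vector<int32_t> source_ids_;
  mutable std::mutex mutex_;
  // Keyed by schema ID so results for different schema versions never mix.
  mutable std::unordered_map<int32_t, std::shared_ptr<StructType>> partition_type_cache_;
};
```

A cache hit returns the same shared pointer, while a lookup for a different schema version computes and stores its own entry instead of reusing a type built from another schema.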

Comment thread src/iceberg/json_internal.cc
Comment thread src/iceberg/manifest_adapter.h Outdated
Comment thread src/iceberg/manifest_writer.cc Outdated
Comment thread src/iceberg/json_internal.cc
Comment thread src/iceberg/json_internal.h Outdated
Comment thread src/iceberg/manifest_writer.h Outdated
Comment thread src/iceberg/partition_spec.h Outdated
Comment thread src/iceberg/partition_spec.h Outdated
Comment thread src/iceberg/table_scan.cc Outdated
Comment thread src/iceberg/v1_metadata.cc Outdated
Comment thread src/iceberg/table_scan.cc Outdated
wgtmac changed the title from "refactor: decouple PartitionSpec from Schema" to "refactor: change PartitionSpec to be independent of Schema" on Nov 10, 2025
Comment thread src/iceberg/manifest_writer.cc Outdated
Comment thread src/iceberg/manifest_writer.cc Outdated
zhjwpku (Collaborator) left a comment

+1

Comment thread src/iceberg/manifest_writer.cc
Comment thread src/iceberg/manifest_writer.cc
Comment thread src/iceberg/manifest_writer.cc Outdated
wgtmac merged commit bb02a15 into apache:main on Nov 11, 2025
10 checks passed
4 participants