[bug] partial OVERWRITE operation writes the wrong snapshot summary metrics #1845

@kevinjqliu

Apache Iceberg version

main (development)

Please describe the bug 🐞

A snapshot OVERWRITE operation can calculate the wrong summary fields when the table is only partially updated.

update_snapshot_summaries assumes that every OVERWRITE operation is a full-table overwrite:

truncate_full_table=self._operation == Operation.OVERWRITE,

if truncate_full_table and summary.operation == Operation.OVERWRITE and previous_summary is not None:
    summary = _truncate_table_summary(summary, previous_summary)

This is likely an oversight from when partial writes were implemented.
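To make the effect concrete, here is a minimal toy model of the problem. It is not PyIceberg's implementation: `truncate_summary` and `update_summary` are hypothetical stand-ins, the summaries are plain dicts, and only record counts are tracked. The scenario assumes a 100-row table where a partial overwrite replaces one 30-row data file with a 20-row file.

```python
from typing import Dict, Optional

TOTAL_RECORDS = "total-records"
ADDED_RECORDS = "added-records"
DELETED_RECORDS = "deleted-records"


def truncate_summary(summary: Dict[str, str], previous: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical stand-in for full-table truncation: treat every previous row as deleted."""
    out = dict(summary)
    out[DELETED_RECORDS] = previous.get(TOTAL_RECORDS, "0")
    out[TOTAL_RECORDS] = "0"
    return out


def update_summary(
    summary: Dict[str, str],
    previous: Optional[Dict[str, str]],
    truncate_full_table: bool,
) -> Dict[str, str]:
    """Toy version of the summary update; only record counts are modelled."""
    out = dict(summary)
    if truncate_full_table and previous is not None:
        # Assumes the whole table was replaced, so the running total restarts at 0.
        out = truncate_summary(out, previous)
        base = 0
    else:
        # Incremental update: previous total minus the rows actually removed.
        base = int(previous[TOTAL_RECORDS]) if previous else 0
        base -= int(out.get(DELETED_RECORDS, "0"))
    out[TOTAL_RECORDS] = str(base + int(out.get(ADDED_RECORDS, "0")))
    return out


previous = {TOTAL_RECORDS: "100"}
# Partial overwrite: one 30-row file is dropped and rewritten as a 20-row file.
partial = {ADDED_RECORDS: "20", DELETED_RECORDS: "30"}

buggy = update_summary(partial, previous, truncate_full_table=True)
correct = update_summary(partial, previous, truncate_full_table=False)

print(buggy[TOTAL_RECORDS], buggy[DELETED_RECORDS])      # 20 100
print(correct[TOTAL_RECORDS], correct[DELETED_RECORDS])  # 90 30
```

In this sketch, treating the partial overwrite as a full-table truncation reports 100 deleted rows and 20 remaining rows, whereas the table actually lost 30 rows and still holds 90.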

Thankfully, the table/transaction overwrite function is currently implemented as a delete followed by an append, so it does not go through this code path.

The only place where the OVERWRITE operation is used is during partial deletes:

with self.update_snapshot(snapshot_properties=snapshot_properties).overwrite() as overwrite_snapshot:

Original thread: apache/iceberg-go#356 (comment) (thanks @arnaudbriche and @zeroshade)

Partial overwrite reproduced in #1840

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
