Apache Iceberg version
main (development)
Please describe the bug 🐞
Snapshot `OVERWRITE` operations can produce wrong summary fields when the table is only partially overwritten.
`update_snapshot_summaries` assumes that every `OVERWRITE` operation is a full-table overwrite:
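```python
# iceberg-python/pyiceberg/table/update/snapshot.py, line 239 (at 322ebdd)
truncate_full_table=self._operation == Operation.OVERWRITE,

# iceberg-python/pyiceberg/table/snapshots.py, lines 358-359 (at 322ebdd)
if truncate_full_table and summary.operation == Operation.OVERWRITE and previous_summary is not None:
    summary = _truncate_table_summary(summary, previous_summary)
```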
This is likely an oversight from when partial writes were implemented.
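To illustrate the effect, here is a simplified, hypothetical model of how a running total such as `total-records` is derived from the previous snapshot's total plus the added and deleted counts. `Change` and `total_records` are illustrative names, not pyiceberg APIs, and the truncation branch assumes that `_truncate_table_summary` replaces the real deleted count with the previous snapshot's total:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """Hypothetical record counts for one snapshot (not a pyiceberg type)."""

    added_records: int
    deleted_records: int


def total_records(previous_total: int, change: Change, truncate_full_table: bool) -> int:
    """Simplified model of how a snapshot's total-records field is derived."""
    deleted = change.deleted_records
    if truncate_full_table:
        # Full-table overwrite assumption: every record in the previous
        # snapshot is counted as deleted, replacing the real deleted count.
        deleted = previous_total
    return previous_total + change.added_records - deleted


# A 100-record table where an overwrite deletes 10 records and appends 10.
partial = Change(added_records=10, deleted_records=10)
correct = total_records(100, partial, truncate_full_table=False)
buggy = total_records(100, partial, truncate_full_table=True)
print(correct, buggy)  # 100 10
```

Under this model, a partial overwrite that leaves 100 records in the table is summarized as if only the 10 newly added records remained.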
Thankfully, the table/transaction `overwrite` function is currently implemented as a delete followed by an append.
The only place where the `OVERWRITE` operation is used is during partial deletes:
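```python
# iceberg-python/pyiceberg/table/__init__.py, line 678 (at 322ebdd)
with self.update_snapshot(snapshot_properties=snapshot_properties).overwrite() as overwrite_snapshot:
    ...  # remainder of the block not shown in the excerpt
```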
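One possible direction, sketched here only as a hypothetical check and not as the upstream fix: decide whether an `OVERWRITE` truncates the table from the data files it actually removes, rather than from the operation type alone. `is_full_table_overwrite` and its file-path arguments are illustrative assumptions, not pyiceberg APIs:

```python
def is_full_table_overwrite(
    live_data_files: frozenset[str], deleted_data_files: frozenset[str]
) -> bool:
    """Hypothetical check: treat an overwrite as a full-table truncation only if
    it removes every data file that was live in the previous snapshot."""
    return live_data_files <= deleted_data_files


live = frozenset({"a.parquet", "b.parquet"})

# Partial delete: only one file is rewritten, so the table is not truncated.
assert not is_full_table_overwrite(live, frozenset({"a.parquet"}))

# Every live data file is removed, so the full-table assumption holds.
assert is_full_table_overwrite(live, live)
```

An empty table trivially counts as fully overwritten, which is harmless because there are no previous totals to carry over.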
Original thread: apache/iceberg-go#356 (comment) (thanks @arnaudbriche and @zeroshade).
A partial overwrite is reproduced in #1840.
Willingness to contribute