to_geotiff: support predictor=3 (floating-point predictor) on CPU write path #1313

@brendancol

Summary

xrspatial.geotiff.to_geotiff accepts predictor: bool, which maps only to TIFF predictor 2 (horizontal differencing). TIFF predictor 3 (floating-point predictor) is not reachable from the CPU write path even though all the building blocks already exist in the codebase.

For float32 / float64 rasters (elevation, climate, model output), predictor=3 typically compresses better under deflate/zstd than predictor=2, because it regroups each row's bytes by significance (byte-swizzles) before differencing, so the slowly varying sign/exponent bytes end up next to each other. Code in the wild that interoperates with rasterio/GDAL routinely uses predictor=3 for float TIFFs, and the current API forces that workflow back to rasterio.
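
For readers unfamiliar with the byte-swizzle-then-difference step, here is a minimal NumPy sketch of the technique for a single-band 2-D float array. It is not the code behind fp_predictor_encode in _compression.py; the function names are hypothetical, and the sketch assumes one sample per pixel (stride 1) and that each row is encoded independently.

```python
import numpy as np


def fp_predict_encode_rows(arr: np.ndarray) -> np.ndarray:
    """Sketch of TIFF predictor 3 for a single-band 2-D float array.

    Returns a uint8 array of shape (rows, cols * itemsize).
    """
    if arr.ndim != 2 or arr.dtype.kind != "f":
        raise ValueError("expected a 2-D floating-point array")
    rows, cols = arr.shape
    size = arr.dtype.itemsize
    # Big-endian bytes per value: index 0 is the most significant byte.
    be = np.ascontiguousarray(arr, dtype=np.dtype(f">f{size}"))
    be_bytes = be.view(np.uint8).reshape(rows, cols, size)
    # Swizzle: all most-significant bytes of the row first, then the next
    # byte plane, and so on.
    planes = be_bytes.transpose(0, 2, 1).reshape(rows, cols * size)
    # Byte-wise horizontal differencing; uint8 arithmetic wraps mod 256.
    out = planes.copy()
    out[:, 1:] = planes[:, 1:] - planes[:, :-1]
    return out


def fp_predict_decode_rows(enc: np.ndarray, dtype) -> np.ndarray:
    """Inverse of fp_predict_encode_rows (same single-band assumption)."""
    dtype = np.dtype(dtype)
    size = dtype.itemsize
    rows = enc.shape[0]
    cols = enc.shape[1] // size
    # Undo the differencing with a running sum that wraps mod 256.
    planes = np.cumsum(enc, axis=1, dtype=np.uint8)
    # Undo the swizzle: regroup the byte planes back into per-value bytes.
    be_bytes = np.ascontiguousarray(
        planes.reshape(rows, size, cols).transpose(0, 2, 1)
    )
    be = be_bytes.view(np.dtype(f">f{size}")).reshape(rows, cols)
    return be.astype(dtype)
```

On smooth data the encoded rows tend to contain long runs of small or repeated byte values, which is what deflate and zstd exploit.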

Current state

Public API at xrspatial/geotiff/__init__.py:402 exposes predictor: bool. The writer hardcodes:

# xrspatial/geotiff/_writer.py:597
pred_val = 2 if (predictor and compression != COMPRESSION_NONE) else 1

The same pattern appears at _writer.py:1169 for the GPU writer's tag emission.

What already exists

  • CPU encoder: fp_predictor_encode in xrspatial/geotiff/_compression.py:499
  • GPU encoder kernel: _fp_predictor_encode_kernel in xrspatial/geotiff/_gpu_decode.py:1688 (already wired into the GPU encode path at _gpu_decode.py:2260-2264)
  • CPU decoder: fp_predictor_decode (used at _reader.py:271)
  • GPU decoder: _fp_predictor_decode_kernel (used at _gpu_decode.py:1345, 1592)

So the read side and the GPU write side already round-trip predictor=3. The CPU writer just never calls fp_predictor_encode.

Proposed change

Widen the public arg to accept the predictor value directly:

predictor: bool | int = False
  • False / 0 → no predictor (current default)
  • True / 2 → horizontal differencing (current True behavior; preserved)
  • 3 → floating-point predictor; valid only for float dtypes
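
A rough sketch of how the widened argument could be normalized to a TIFF Predictor tag value. The helper name normalize_predictor is hypothetical, and it accepts only the values listed above; whether a literal 1 (the TIFF tag value for "no predictor") should also be accepted is left open here.

```python
def normalize_predictor(predictor) -> int:
    """Map the public bool | int argument to a TIFF Predictor tag value.

    Returns 1 (none), 2 (horizontal differencing), or 3 (floating point).
    """
    # Check bool first: True == 1 in Python, but True must keep meaning
    # predictor 2 so existing callers see no change.
    if isinstance(predictor, bool):
        return 2 if predictor else 1
    if predictor == 0:
        return 1
    if predictor in (2, 3):
        return int(predictor)
    raise ValueError(
        f"predictor must be False, True, 0, 2, or 3; got {predictor!r}"
    )
```

Note that the existing rule from _writer.py:597 (no predictor when compression is COMPRESSION_NONE) would still apply on top of this mapping.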

Inside the writers:

  • Branch on pred_val when calling the encoder (predictor_encode for 2, fp_predictor_encode for 3) at _writer.py:339, _writer.py:405, _writer.py:1040.
  • Emit the chosen pred_val in the TIFF tag at _writer.py:597 and _writer.py:1169 rather than forcing 2.
  • Validate: raise if predictor=3 is requested with an integer dtype.
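
The validation and branching could look roughly like the sketch below. The helper names are hypothetical, and the encoders are passed in as plain callables because the real signatures of predictor_encode and fp_predictor_encode are not shown in this issue.

```python
import numpy as np


def check_predictor_dtype(pred_val: int, dtype) -> None:
    """Raise if the floating-point predictor is requested for non-float data."""
    dtype = np.dtype(dtype)
    if pred_val == 3 and dtype.kind != "f":
        raise ValueError(
            "predictor=3 (floating-point predictor) requires a float dtype, "
            f"got {dtype}"
        )


def apply_predictor(buf, pred_val: int, dtype, encoders):
    """Dispatch on the TIFF predictor value.

    `encoders` maps 2 and 3 to callables standing in for predictor_encode
    and fp_predictor_encode; pred_val == 1 returns the buffer unchanged.
    """
    check_predictor_dtype(pred_val, dtype)
    if pred_val == 1:
        return buf
    try:
        encode = encoders[pred_val]
    except KeyError:
        raise ValueError(f"unsupported predictor value: {pred_val}") from None
    return encode(buf)
```

Validating before any encoding work means an invalid predictor/dtype combination fails early, before the writer has produced any output.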

Acceptance

  • New tests for CPU write + read round-trip of float32 and float64 data with predictor=3 under deflate and zstd.
  • predictor=3 + integer dtype raises a clear error.
  • Existing predictor=True tests stay green (semantics unchanged).
  • A file written with predictor=3 can be read by GDAL/rasterio (compare the decoded data against a rasterio reference).

Out of scope

JPEG/LZW interactions. Multi-band float predictor=3 is already validated on the read side (#1247), but new write tests should cover it explicitly.
