Geoid kernels in reproject/_vertical.py can hit numba's workqueue concurrency abort under dask #3142

@brendancol

Follow-up to #3093 / PR #3111.

The macos-3.14 CI abort on PR #3111 came from launching numba parallel=True kernels concurrently from multiple host threads. numba's workqueue threading layer (the one it falls back to when TBB and OpenMP are unavailable) is not thread-safe and terminates the process when it detects concurrent launches. PR #3111 serialized the launches in _projections.py (try_numba_transform, transform_points) behind a module lock.

_vertical.py has the same pattern and was left out of that PR to keep it focused:

  • _interp_geoid_batch and _interp_geoid_2d are @njit(parallel=True).
  • _apply_vertical_shift_dask wraps _apply_vertical_shift_numpy in map_blocks, so dask's threaded scheduler can run two blocks at once, each launching _interp_geoid_2d concurrently.
  • geoid_height / geoid_height_raster are public and can be called from user threads directly.
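
To make the launch pattern in the bullets above concrete, here is a minimal stdlib-only sketch. Everything in it is hypothetical: `_fake_kernel` stands in for a parallel=True kernel such as _interp_geoid_2d, the thread pool stands in for dask's threaded scheduler running blocks, and `_KERNEL_LOCK` is an assumed name for a module-level lock of the kind PR #3111 added in _projections.py. It only measures how many threads are inside the "kernel" at once; it does not use numba and cannot reproduce the abort itself.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical module-level lock guarding every kernel launch.
_KERNEL_LOCK = threading.Lock()

# Bookkeeping used only to observe concurrency in this sketch.
_counter_lock = threading.Lock()
_active = 0
_peak = 0


def _fake_kernel(block):
    """Stand-in for a parallel=True kernel; records peak concurrent entries."""
    global _active, _peak
    with _counter_lock:
        _active += 1
        _peak = max(_peak, _active)
    time.sleep(0.01)  # pretend to do work so overlapping calls can be seen
    with _counter_lock:
        _active -= 1
    return [v + 1.0 for v in block]


def apply_block(block, serialize=True):
    """Per-block function, analogous to what map_blocks would call."""
    if serialize:
        with _KERNEL_LOCK:  # only one thread may launch the kernel at a time
            return _fake_kernel(block)
    return _fake_kernel(block)


def peak_concurrency(serialize, n_blocks=8, n_threads=4):
    """Run the blocks on a thread pool and report the peak number in the kernel."""
    global _active, _peak
    _active = _peak = 0
    blocks = [[float(i)] * 4 for i in range(n_blocks)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(lambda b: apply_block(b, serialize), blocks))
    return _peak
```

With `serialize=True` the peak stays at one launch at a time, which is the property the workqueue layer needs. With `serialize=False` several threads can typically be inside the kernel together, which is the situation that aborts under workqueue.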

Nothing in the test suite currently triggers this (the vertical-shift dask tests appear to run with few enough concurrent blocks to dodge the race), but it is the same abort waiting to happen. Same fix shape as PR #3111: a module-level lock around the kernel launches, plus a subprocess regression test with NUMBA_THREADING_LAYER=workqueue forced.

Labels: bug