Is your feature request related to a problem or challenge?
Spark users want to read data from a DataFusion TableProvider as a native Spark DataSourceV2. Today there is no first-class path; options are either a bespoke per-operation JNI surface (more native surface to maintain) or copying data out of process.
Describe the solution you'd like
A Spark DataSourceV2 connector that places the native boundary at a standard ADBC driver. Spark talks to the upstream arrow-adbc Java driver manager (adbc-core + adbc-driver-jni), which loads a native DataFusion ADBC cdylib and returns arrow-java ArrowReaders consumed zero-copy as ArrowColumnVectors on the cluster-provided Arrow. This reuses the upstream ADBC bindings rather than reproducing them.
Scope:
- adbc-datafusion format registered as a DataSourceV2; schema probed on the driver.
- Projection / filter / limit pushdown via Substrait, with a SQL fallback.
- Multi-partition reads (executePartitioned / readPartition) and a target_partitions option.
- Per-executor connection pool to amortize driver/database setup across task slots.
- An example DataFusion ADBC driver cdylib plus end-to-end (PySpark) coverage.
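
To make the per-executor connection pool item concrete, here is a minimal sketch of the pattern in Python. It is not the connector's implementation (that would live on the JVM side next to the ADBC driver manager). The `ConnectionPool` class, the `opener` factory, the `max_idle` parameter and the config key are all hypothetical names chosen for illustration. The idea is only that tasks running in the same executor process borrow an already-opened connection instead of repeating driver/database setup.

```python
import threading
from contextlib import contextmanager
from queue import Empty, Full, LifoQueue


class ConnectionPool:
    """Process-wide pool of reusable connections, one pool per config key.

    Hypothetical sketch: `opener` is any zero-argument callable that returns
    an object with a close() method (standing in for an ADBC connection).
    """

    _pools = {}
    _pools_lock = threading.Lock()

    def __init__(self, opener, max_idle):
        self._opener = opener
        self._idle = LifoQueue(maxsize=max_idle)

    @classmethod
    def for_config(cls, key, opener, max_idle=4):
        # Return the single pool for this config within the current process,
        # creating it on first use.
        with cls._pools_lock:
            pool = cls._pools.get(key)
            if pool is None:
                pool = cls._pools[key] = cls(opener, max_idle)
            return pool

    @contextmanager
    def connection(self):
        try:
            conn = self._idle.get_nowait()  # reuse an idle connection if any
        except Empty:
            conn = self._opener()           # otherwise pay the setup cost
        try:
            yield conn
        except BaseException:
            conn.close()                    # do not recycle after a failure
            raise
        else:
            try:
                self._idle.put_nowait(conn)  # hand back for the next task
            except Full:
                conn.close()                 # pool already full; drop it
```

A task would wrap its read in `with ConnectionPool.for_config(key, opener).connection() as conn:`. Capping idle connections with `max_idle` bounds how much native state one executor keeps alive between tasks, and closing a connection after an exception avoids recycling one that may be in an undefined state.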
Describe alternatives you've considered
A plain-C scan ABI + hand-written JNI shim (discussed on #103 / #104). The ADBC approach reuses standard, separately-reviewed bindings and a stable driver contract instead.
Additional context
Implemented in #111.