Skip to content

Ext_proc filter#12792

Open
kannanjgithub wants to merge 388 commits into
grpc:masterfrom
kannanjgithub:ext-proc
Open

Ext_proc filter#12792
kannanjgithub wants to merge 388 commits into
grpc:masterfrom
kannanjgithub:ext-proc

Conversation

@kannanjgithub
Copy link
Copy Markdown
Contributor

@kannanjgithub kannanjgithub commented May 4, 2026

Implements ext_proc filter from A93 (internal design doc)
Metrics aren't implemented yet.
Includes commits from unmerged Filter API enhancements and channel caching PRs.
Only the ExternaProcessingFilter.java, ExternaProcessingFilterTest.java and the envoy xds proto import and generated code need to be reviewed.
Rebasing commit history caused all received and merged commits to show my name as the committer, ignore all commits for which I'm not shown as the author.

…xtProcStreamSendsMetadata to use real dataPlaneChannel
…sHeadersAndCallIsBuffered to use real dataPlaneChannel
…henMutationsAreAppliedAndCallIsActivated to use real dataPlaneChannel
…sActivatedImmediately to use real dataPlaneChannel and ServerInterceptor
…henMessagesAreDiscarded to use real dataPlaneChannel
…ExtProcAndSuperHalfCloseIsDeferred to use real dataPlaneChannel
…nSuperHalfCloseIsCalled to use real dataPlaneChannel
…stsAreForwardedImmediately to use real dataPlaneChannel
…sHeadersAndCallIsBuffered to use real dataPlaneChannel
…henMutationsAreAppliedAndCallIsActivated to use real dataPlaneChannel
…sActivatedImmediately to use real dataPlaneChannel
…henMutatedBodyIsForwardedToDataPlane to use real dataPlaneChannel
…henMutatedBodyIsForwardedToDataPlane to use real dataPlaneChannel
…henMessagesAreDiscarded to use real dataPlaneChannel
…ExtProcAndSuperHalfCloseIsDeferred to use real dataPlaneChannel
…nSuperHalfCloseIsCalled to use real dataPlaneChannel
…thenMutatedBodyIsDeliveredToClient to use real dataPlaneChannel
…thenClientListenerCloseIsPropagated to use real dataPlaneChannel
@kannanjgithub kannanjgithub marked this pull request as draft May 4, 2026 11:39
# Conflicts:
#	xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java
@kannanjgithub kannanjgithub marked this pull request as ready for review May 4, 2026 12:39
@kannanjgithub kannanjgithub requested a review from sauravzg May 4, 2026 12:39
Fix style and warning as errors.
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java Outdated
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java Outdated
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java Outdated
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java Outdated
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java Outdated
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java Outdated
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java Outdated
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java
@sauravzg
Copy link
Copy Markdown
Collaborator

sauravzg commented May 6, 2026

Reviewed approximately ~500 LOC of source(not tests), which currently mostly only covers the config part of things. Sending comments incrementally to kick start progress, since it seems that I'll need probably about 3 days to review all of source.

Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java Outdated
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java Outdated
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java Outdated
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java Outdated
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java Outdated
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java Outdated
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java Outdated
@sauravzg
Copy link
Copy Markdown
Collaborator

sauravzg commented May 7, 2026

Reviewed about another 300 LOC, which covers static utilities and the interceptors. Speed was slower than expected, but will try to catch up.

Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java
@kannanjgithub kannanjgithub force-pushed the ext-proc branch 3 times, most recently from c75ba11 to 64010fd Compare May 8, 2026 16:13
private long serverTrailersStartNanos;

private volatile Metadata requestHeaders;
final AtomicBoolean activated = new AtomicBoolean(false);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be using an atomic reference enum with a state machine driver here?

The current set of 10 atomic bools, results in potentially 2^10 representable states. I don't think we have that many and should try to compress the state machine.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduced state machine states now:

private enum ExtProcStreamState {
  ACTIVE,
  DRAINING,
  COMPLETED,
  FAILED
}

private enum DataPlaneCallState {
  IDLE,
  ACTIVE,
  CLOSED
}

this.backendService = checkNotNull(backendService, "backendService");
}

private void activateCall() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: Somewhat echoing go/java-practices/class-structure - Maybe we should try to put the public member that uses a private method before the private method to enhance readability.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would make more sense when there is a single caller of the private methods. But there are many calling sites for activateCall.

}
}

private boolean checkCompressionSupport(BodyResponse bodyResponse) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function name feels slightly misleading. Given that it starts with check and returns a boolean, I'd expect it to not have any side effects, but it does.

My rec would be to either rename this method appropriately (optionally documenting the side effects) or split this method to reflect check and side effect behavior separately.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to validateCompressionSupport and added Javadoc to explain the side-effects if validation fails.

StatusRuntimeException ex = Status.UNAVAILABLE
.withDescription("gRPC message compression not supported in ext_proc")
.asRuntimeException();
if (!extProcStreamCompleted.get() && extProcClientCallRequestObserver != null) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This member seems to be synchronized in other places. Should we have it here as well or is this a conscious decision?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct, it needs to be synchronized as well. Done.

}
headersToModify.add(HeaderValueOption.create(
headerValue,
HeaderValueOption.HeaderAppendAction.valueOf(protoOption.getAppendAction().name()),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be slightly risky IMO.
I am not sure we want to couple our internal enum to xds proto enum, so I'd prefer an explicit switch case if possible.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

valueOf will fail with an IllegalArgumentException if the enum with the specified name is not present, so it will be caught by unit test failures if a mismatch occurs in the future.
Besides, with backward compatibility in mind, enum names are not subject to change.

@Override
public void beforeStart(ClientCallStreamObserver<ProcessingRequest> requestStream) {
synchronized (streamLock) {
extProcClientCallRequestObserver = requestStream;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of things here.

  • It seems like we aren't respecting control flow before calling on next for the stream and processing pending buffer
  • It seems like this is dead code. BeforeStart is called synchronously so pendingRequests should in theory always be empty.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why buffering is required - Buffering is still required when grpc creates a DelayedClientCall for the ext_proc call: While the interceptor execution starts on the application thread, the extProcStub is created using a channel from cachedChannelManager.getChannel(...). If the external processor's gRPC channel is not yet connected, or name resolution/load balancing is still in progress, gRPC-Java wraps the call in a DelayedClientCall. A DelayedClientCall buffers and delays the actual stream start. In this case, stub.process(...) will return before the stream is created and before beforeStart() is called.
The application thread can resume execution and can immediately start calling sendMessage() or halfClose().
At this point, extProcClientCallRequestObserver is still null. Without the pendingProcessingRequests buffer, all request headers, body messages, and half-closes would be completely lost or trigger a crash.

About respecting ext_proc flow control:
The isReady behavior of the data plane call uses also the ext_proc call's isReady state, and it would apply in this buffering case as well. In fact the any buffering into pendingProcessingRequests will already be a violation of flow control by the calling application because isReady would be returning false in that period because extProcClientCallRequestObserver would be null.

public void beforeStart(ClientCallStreamObserver<ProcessingRequest> requestStream) {
synchronized (streamLock) {
extProcClientCallRequestObserver = requestStream;
while (!pendingProcessingRequests.isEmpty()) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some questiosns on of why we need pendingProcessing.... The design doc seems to mention a couple of cases

  • observability mode ext proc slowness
  • request drain.

But the grfc for observability mode requests flow control over buffering: https://github.com/markdroth/proposal/blob/6ee533dac7d9d1b71ad46ba8826d2a3b3cdba313/A93-xds-ext-proc.md#flow-control . This should be somewhat easy to enforce, where we push problem to ext proc server, if our caller doesn't respect flow control.

Similarly, flow control is recommended solution for draining as well at https://github.com/markdroth/proposal/blob/6ee533dac7d9d1b71ad46ba8826d2a3b3cdba313/A93-xds-ext-proc.md#early-termination-of-ext_proc-stream . I don't really have a good solution for this if the caller doesn't respect flow control, we'll have to buffer to avoid out of order messages. Maybe let's discuss with Mark on the grfc about this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flow control is already implemented via isReady and onReady overriding of the dataplane rpc to take into account ext_proc stream's existence and readiness.
I have now also implemented buffering request messages during the ext_proc draining time period.

.asRuntimeException());
return;
}
Metadata target = wrappedListener.trailersOnly.get()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

encapsulation instead of member access.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
}
// 3. Client Trailers
else if (response.hasRequestTrailers()) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be handling this? The grfc doesn't talk about this.

Would this also make proceedWithClose dead code?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.
There are other usages of proceedWithClose.

if (extProcStreamCompleted.compareAndSet(false, true)) {
synchronized (streamLock) {
if (extProcClientCallRequestObserver != null) {
extProcClientCallRequestObserver.onError(t);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my education is this valid behavior?

I'd assume, when the stream has error, it calls our onError , but it seems we are propagating it back to the request stream. Am I missing something here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you are right, it is both unnecessary and illegal since the call is already closed. Removed it now.
I have now introduced a new private method internalOnError for cases that do require calling onError on the ext_proc request observer, such as during ext_proc protocol violations.

@sauravzg
Copy link
Copy Markdown
Collaborator

sauravzg commented May 11, 2026

Reviewed the extproc response handling. Will be reviewing rest of the client call private methods today.

…lers

- Separate internally triggered protocol errors from server-side gRPC stream
  onError callbacks.
- Introduce `internalOnError(Throwable)` to propagate client cancellation
  only when the error is triggered internally by the client filter.
- Avoid calling `extProcClientCallRequestObserver.onError(t)` inside the
  StreamObserver's `onError(t)` callback since the stream is already
  terminated by the gRPC framework.
- Remove unused and dead `response.hasRequestTrailers()` handling block.
… machines

Migrates the ExternalProcessorFilter's state management in DataPlaneClientCall
from 10+ independent AtomicBoolean flags (such as activated, notifiedApp,
extProcStreamCompleted, extProcStreamFailed, and drainingExtProcStream) to two
disciplined AtomicReference state machines. This eliminates invalid concurrent
state combinations, prevents race conditions during cleanup, and simplifies
overall lifecycle readability.

- Introduced ExtProcStreamState (IDLE, DRAINING, COMPLETED, FAILED) and
  DataPlaneCallState (IDLE, ACTIVE, CLOSED) enums.
- Implemented atomic CAS transition helpers (markExtProcStreamCompleted,
  markExtProcStreamFailed, markDataPlaneCallClosed) and query methods.
- Updated activateCall, internalOnError, and onClose callbacks to coordinate
  lifecycle events and seamlessly support fail-open (failureModeAllow) behavior
  without redundant cancellations.
1. Introduced buffering queue: Added pendingDrainingMessages inside DataPlaneClientCall.
2. Buffered during draining: Updated sendMessage(InputStream message) to detect if isExtProcStreamDraining() is true, buffering any outgoing application messages into pendingDrainingMessages rather than forwarding them to ext_proc.
3. Drained upon completion: Added drainPendingDrainingMessages() which drains the queue directly to the upstream raw call. This method is invoked synchronously during stream completion (handleFailOpen).

Added unit test givenDrainingStream_whenAppSends_thenBufferedAndDelivered. This test verifies that:
1. Outgoing data plane messages sent while the ext_proc stream is in the DRAINING state are not forwarded to the ext_proc observer.
2. Once the ext_proc stream reaches completion, all buffered messages are successfully drained and received by the upstream data plane receiver.
# Conflicts:
#	xds/src/main/java/io/grpc/xds/Filter.java
#	xds/src/test/java/io/grpc/xds/StatefulFilter.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.