Skip to content

fix(RosActionNode): swallow UnknownGoalHandleError in cancelGoal #129

Open
falfab wants to merge 1 commit into BehaviorTree:humble from falfab:fix/cancel-goal-unknown-goal-handle-race


@falfab falfab commented Apr 29, 2026

Summary

RosActionNode::cancelGoal() calls async_get_result() and async_cancel_goal() on the rclcpp_action client without exception handling. When the action server completes the cancel handshake before the BT thread reaches those calls, the client's result callback has already erased the goal from its internal registry (goal_handles_.erase(...) inside the make_result_aware lambda in rclcpp_action/client.hpp). Both calls then throw rclcpp_action::exceptions::UnknownGoalHandleError ("Goal handle is not known to this client."), which escapes halt(), lands in TreeExecutionServer's top-level catch(const std::exception&), and causes the outer /bt_execution goal to be aborted instead of canceled. Downstream code that distinguishes SUCCESS / CANCELED / ABORTED ends up with the wrong signal on a clean operator cancel.
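The failure mode described above can be modeled without ROS. The following is a minimal sketch, not the rclcpp_action implementation: `ToyActionClient`, `OuterStatus`, `halt_without_guard`, and the local `UnknownGoalHandleError` are hypothetical stand-ins that only mirror the behavior this PR describes (the result callback erases the goal, and later calls on that goal throw).

```cpp
#include <cassert>
#include <exception>
#include <set>
#include <stdexcept>

// Stand-in for rclcpp_action::exceptions::UnknownGoalHandleError (assumption:
// only the message text is taken from the PR description).
struct UnknownGoalHandleError : std::invalid_argument {
  UnknownGoalHandleError()
  : std::invalid_argument("Goal handle is not known to this client.") {}
};

// Toy client that tracks known goals in a registry, as the PR describes.
class ToyActionClient {
public:
  int send_goal() {
    const int id = next_id_++;
    goal_handles_.insert(id);
    return id;
  }

  // Models the result callback: once the server reports a terminal state,
  // the goal is erased from the registry.
  void on_result_received(int id) { goal_handles_.erase(id); }

  // Models a client call that requires the goal to still be registered.
  void async_cancel_goal(int id) const {
    if (goal_handles_.count(id) == 0) {
      throw UnknownGoalHandleError();
    }
  }

private:
  std::set<int> goal_handles_;
  int next_id_ = 0;
};

enum class OuterStatus { Canceled, Aborted };

// Models an unguarded halt() running under a top-level
// catch (const std::exception&), as in the reported behavior.
OuterStatus halt_without_guard(const ToyActionClient& client, int goal_id) {
  try {
    client.async_cancel_goal(goal_id);
    return OuterStatus::Canceled;
  } catch (const std::exception&) {
    return OuterStatus::Aborted;
  }
}
```

In this model, calling `halt_without_guard` before `on_result_received` yields `Canceled`, while calling it afterwards yields `Aborted`, which is the wrong signal the summary refers to.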

Closes #18. Also addresses the duplicate report in #110.

Change

Wrap the three calls in cancelGoal() with a narrow try { ... } catch (const rclcpp_action::exceptions::UnknownGoalHandleError& e) { RCLCPP_DEBUG(...) }.

The catch is intentionally narrow:

  • Only UnknownGoalHandleError is caught. Any other exception coming out of rclcpp_action (network-layer RCL errors, invalid handle, etc.) is still propagated. Those are real bugs and should continue to fail loudly.
  • The pre-accept path is untouched. @robin-mueller raised a legitimate concern in #57 ("Cancellation of unkown goal not detectable") / #53 ("Fix cancel goal with empty goal handle") that users may want to observe cancelGoal() failing when the goal response has not arrived yet. That case goes through the existing if (!goal_handle_) branch (RCLCPP_WARN("cancelGoal called on an empty goal_handle")), which this PR does not change. The new catch only fires on the goal_handle_-populated path, i.e. after the server has already terminated the goal.
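The control flow of the two points above can be sketched in isolation. This is an illustration under assumptions, not the patched source: `cancel_goal_sketch`, `CancelOutcome`, and the `cancel_calls` callback are hypothetical names, the local `UnknownGoalHandleError` stands in for the rclcpp_action exception, and logging is reduced to comments.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Stand-in for rclcpp_action::exceptions::UnknownGoalHandleError.
struct UnknownGoalHandleError : std::invalid_argument {
  UnknownGoalHandleError()
  : std::invalid_argument("Goal handle is not known to this client.") {}
};

enum class CancelOutcome { EmptyHandle, Requested, AlreadyTerminated };

// `cancel_calls` stands in for the client calls made inside cancelGoal().
CancelOutcome cancel_goal_sketch(bool has_goal_handle,
                                 const std::function<void()>& cancel_calls) {
  if (!has_goal_handle) {
    // Pre-accept path, unchanged by the PR: the real code logs a warning here.
    return CancelOutcome::EmptyHandle;
  }
  try {
    cancel_calls();
    return CancelOutcome::Requested;
  } catch (const UnknownGoalHandleError&) {
    // Narrow catch: the server already terminated the goal. The PR logs this
    // at debug level and returns normally.
    return CancelOutcome::AlreadyTerminated;
  }
  // Any other exception type is deliberately not caught and propagates.
}
```

Passing a callback that throws `std::runtime_error` shows the intended narrowness: the exception leaves `cancel_goal_sketch` untouched, whereas `UnknownGoalHandleError` is absorbed.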

Test

Adds a gtest regression under behaviortree_ros2/test/ that reproduces the race:

  • Fake action server with a detached thread that waits up to ~50 ms for a cancel request, then synchronously calls goal_handle->canceled(result).
  • BT leaf subclass of RosActionNode<Sleep>.
  • Test ticks the tree until the server has accepted the goal, then calls tree.haltTree() and asserts no exception leaks.
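The server-side timing in the first bullet can be approximated with a condition variable. This is a rough sketch and not the test added by this PR: `CancelGate` and `run_fake_server` are hypothetical names, the thread is joined rather than detached so the outcome can be checked, and the sketch assumes the terminal call is made only when a cancel request actually arrives within the window.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Lets a server thread block until a cancel request arrives or a timeout hits.
class CancelGate {
public:
  void request_cancel() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      requested_ = true;
    }
    cv_.notify_all();
  }

  // Returns true if a cancel request arrived within `timeout`.
  bool wait_for_cancel(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    return cv_.wait_for(lock, timeout, [this] { return requested_; });
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool requested_ = false;
};

// Runs a stand-in server thread and reports whether it reached the point
// where the real test would call goal_handle->canceled(result).
bool run_fake_server(bool send_cancel) {
  CancelGate gate;
  bool canceled_called = false;
  std::thread server([&] {
    if (gate.wait_for_cancel(std::chrono::milliseconds(50))) {
      canceled_called = true;  // stands in for goal_handle->canceled(result)
    }
  });
  if (send_cancel) {
    gate.request_cancel();
  }
  server.join();  // join orders the write above before the read below
  return canceled_called;
}
```

Reacting to the cancel request immediately is what makes the race reproducible: the server reaches its terminal state as early as possible, before the client thread gets to its follow-up calls.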

Verified by stashing the source fix and re-running the test: without the patch it terminates with "Goal handle is not known to this client.", the exact throw this PR fixes. With the patch it passes in ~220 ms.

This is also the first gtest added to the repo. Happy to adjust naming / layout conventions if you have a preference.

Prior art

When the action server completes the cancel handshake before the BT
thread reaches async_get_result / async_cancel_goal, the rclcpp_action
client's result callback has already erased the goal handle from its
internal registry. Both calls then throw UnknownGoalHandleError, which
propagates out of halt() and causes TreeExecutionServer to abort() the
outer goal instead of canceling it.

Catch only UnknownGoalHandleError so other rclcpp_action errors
(network, invalid handle, etc.) still propagate as real failures. The
pre-accept path (goal_handle_ not yet populated) is untouched — it
continues to hit the existing "cancelGoal called on an empty
goal_handle" branch.

Adds a gtest regression under behaviortree_ros2/test/ that reproduces
the race by synchronizing the server's canceled() call with receipt of
the cancel request. Without the fix the test terminates with
"Goal handle is not known to this client."

Closes BehaviorTree#18