
fix(ci): test failures caused by runner downgrade — blacksmith-4vcpu vs ubuntu-latest #129

@terisuke


Root Cause Analysis

Symptom

Two tests fail consistently in our fork's CI but pass upstream:

  1. prompt submitted during an active run is included in the next LLM input
  2. hook.execute > runHook > timeout returns pass

Root Cause: Runner Performance Degradation

           Upstream                       Our Fork
Runner     blacksmith-4vcpu-ubuntu-2404   ubuntu-latest
vCPUs      4 (dedicated Blacksmith)       2 (shared GitHub)
Trigger    pull_request                   pull_request_target

The upstream CI runs on dedicated 4-vCPU Blacksmith runners, which provide consistent, fast execution. Our fork switched to ubuntu-latest in PR #64 to resolve fork CI permission issues (pull_request_target).
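Because this diagnosis hinges on how many CPUs each runner actually exposes, it may help to log that at the start of a CI run so failures can be matched to runner size. A minimal sketch, assuming a Node-compatible runtime; describeRunner is a hypothetical helper, not part of our workflow, while CI and RUNNER_NAME are environment variables that GitHub Actions sets on its runners:

```typescript
import * as os from "node:os";

// Hypothetical diagnostic, not part of the existing workflow: summarize the
// machine a CI job landed on so flaky failures can be correlated with it.
function describeRunner() {
  return {
    cpus: os.cpus().length, // logical CPUs visible to this process
    totalMemGiB: +(os.totalmem() / 1024 ** 3).toFixed(1),
    ci: process.env.CI === "true", // GitHub Actions sets CI=true
    runnerName: process.env.RUNNER_NAME ?? "unknown", // set on GitHub Actions runners
  };
}

console.log(JSON.stringify(describeRunner()));
```

Running this once on each runner type would confirm the vCPU column of the table above rather than relying on the runner label alone.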

The two failing tests are timing-sensitive:

  • prompt-during-run: relies on asynchronous message persistence completing within tight timing windows
  • hook timeout: the hook (sleep 10) is killed after 200ms, and the kill plus process cleanup must complete within the test harness timeout

On 4-vCPU dedicated runners, these timing windows are met. On 2-vCPU shared runners, they are not.
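The hook-timeout case can be reproduced outside the test suite to see how much of the harness budget the kill-and-cleanup step consumes on a given runner. A minimal Node sketch, assuming a POSIX sleep binary on the runner; runWithTimeout is a hypothetical helper, not the project's runHook, and SIGTERM is an assumption since this issue does not say which signal the real hook runner sends:

```typescript
import { spawn } from "node:child_process";
import { performance } from "node:perf_hooks";

interface TimedRun {
  timedOut: boolean; // true if the process had to be killed
  elapsedMs: number; // spawn -> exit, including kill and cleanup
}

// Hypothetical helper, not the project's runHook: run a command and send it
// SIGTERM if it is still running after timeoutMs.
function runWithTimeout(cmd: string, args: string[], timeoutMs: number): Promise<TimedRun> {
  return new Promise((resolve, reject) => {
    const start = performance.now();
    const child = spawn(cmd, args, { stdio: "ignore" });
    let timedOut = false;

    const timer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGTERM");
    }, timeoutMs);

    child.once("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });

    // "exit" is emitted once Node has observed the child terminate, so
    // elapsedMs covers the timeout itself plus the kill and cleanup latency,
    // which is the part expected to grow on a slower, shared runner.
    child.once("exit", () => {
      clearTimeout(timer);
      resolve({ timedOut, elapsedMs: performance.now() - start });
    });
  });
}

// Same shape as the failing test: a 10 s hook killed after 200 ms.
runWithTimeout("sleep", ["10"], 200).then(({ timedOut, elapsedMs }) => {
  console.log(`timedOut=${timedOut} elapsed=${elapsedMs.toFixed(0)}ms`);
});
```

Comparing the printed elapsed time on a Blacksmith runner and on ubuntu-latest would show directly how much headroom each leaves within the harness timeout.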

Evidence

  • Upstream v1.3.17 CI: ALL GREEN (Blacksmith runner)
  • Our fork CI: consistent unit-test failures on the same test code
  • Local tests (M-series Mac): ALL PASS
  • The test code itself is identical between upstream and fork

Fix Options

Option A (Recommended): Restore Blacksmith runners

  • Change runs-on from ubuntu-latest back to blacksmith-4vcpu-ubuntu-2404 in .github/workflows/test.yml
  • Requires a Blacksmith plan for the Cor-Incorporated org
  • Resolves all timing issues without code changes

Option B: Increase timeouts (current workaround in PR #127)

  • Hook timeout: 5s → 15s
  • Prompt poll interval: 20ms → 50ms, inner timeout: 5s → 8s
  • Effect.sleep barriers added before/after gate resolution
  • Mitigates but doesn't fully resolve the prompt-during-run race
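The poll-interval and inner-timeout changes in Option B follow a common pattern: re-check a condition at a fixed interval until a deadline passes. A minimal sketch of that pattern with the Option B values as defaults; pollUntil and the persisted flag are hypothetical stand-ins, not the project's test code:

```typescript
// Hypothetical helper, not the project's test code: re-check `ready` every
// intervalMs until it returns true or timeoutMs has elapsed.
async function pollUntil(
  ready: () => boolean | Promise<boolean>,
  intervalMs = 50, // Option B value (previously 20ms)
  timeoutMs = 8_000, // Option B inner timeout (previously 5s)
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await ready()) return true;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  // One last check so work that finished during the final sleep still counts.
  return await ready();
}

// Usage: a stand-in for "the message has been persisted", flipped after 120ms.
let persisted = false;
setTimeout(() => {
  persisted = true;
}, 120);
pollUntil(() => persisted).then((ok) => console.log("persisted:", ok));
```

A longer interval means fewer wake-ups competing for a 2-vCPU runner, and a longer deadline absorbs slower execution, but the deadline is still fixed: if persistence takes longer than timeoutMs the check fails anyway, which is why this option mitigates the race rather than removing it.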

Option C: Skip timing-sensitive tests in CI

  • Mark the two tests with test.skip when process.env.CI is set
  • Not recommended: it hides real regressions
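For reference, Option C amounts to choosing between a runner's test and test.skip based on the CI variable. A minimal, runner-agnostic sketch; the TestApi shape and the timingSensitive wrapper are assumptions for illustration, not code from this repository:

```typescript
// Assumed minimal shape of a test runner's `test` function with `.skip`.
type TestFn = (name: string, fn: () => unknown) => void;
interface TestApi {
  (name: string, fn: () => unknown): void;
  skip: TestFn;
}

// Hypothetical wrapper: register a timing-sensitive test normally on local
// machines, but as skipped whenever the CI environment variable is set.
function timingSensitive(
  api: TestApi,
  env: Record<string, string | undefined> = process.env,
): TestFn {
  return env.CI ? api.skip : api;
}
```

With a runner's own import this would be used as timingSensitive(test)("timeout returns pass", async () => { /* ... */ }). Vitest and bun:test also provide a built-in test.skipIf(condition) for the same purpose. Either way the skip is silent in CI, which is exactly the regression-hiding cost noted above.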

Recommendation

Pursue Option A as the permanent fix. If Blacksmith isn't available for the fork, Option B is acceptable with the understanding that the prompt-during-run test may still intermittently fail on slow runners.

Labels: bug
