
ci: add daily audit suites with 5 rotating recipes and scheduled workflow #543

Open
andreatgretel wants to merge 4 commits into main from andreatgretel/feat/daily-audit-suites

@andreatgretel
Contributor

📋 Summary

Add a daily agentic CI system that runs rotating code health audits on weekdays, catching quality drift that existing CI doesn't cover (no C901/ANN/BLE ruff rules, no cross-reference validation, no transitive dep analysis, no docs-vs-code accuracy checks). Each audit runs as a Claude Code agent on the self-hosted runner, guided by a recipe, and reports findings to the GitHub Actions step summary.

Closes #472

🔗 Related Issue

Closes #472

🔄 Changes

✨ Added

🔧 Changed

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

  • agentic-ci-daily.yml - New workflow with contents: write and pull-requests: write permissions. Write access is intentional to support future recipe-driven PRs, but all current recipes are read-only audits.
  • Executable smoke checks in test-health/recipe.md and code-quality/recipe.md - These run real Python against the installed packages. Fixed canaries are deterministic; creative checks are agent-designed each run.
  • Runner memory schema in _runner.md - Defines the JSON contract for cross-run state persistence including TTL rules for known_issues.

🧪 Testing

  • make check-all passes (ruff lint + format)
  • Unit tests added/updated — N/A, no testable Python logic (recipes are markdown instructions, workflow is YAML)
  • E2E tests — N/A, requires self-hosted runner with Claude CLI. Can be validated via workflow_dispatch after merge.

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated — N/A, this is CI infrastructure

Add the daily maintenance infrastructure (Phase 2+3 of the agentic CI
plan). A new workflow runs one audit suite per weekday via day-of-week
rotation, with runner memory persisted via actions/cache.

Recipes: docs-and-references (Mon), dependencies (Tue), structure (Wed),
code-quality (Thu), test-health (Fri). Each targets gaps that CI and ruff
don't cover: cross-reference validation, transitive dep analysis, lazy
import compliance, complexity trends, and test-to-source mapping.

Reports go to the Actions step summary. Code changes use /create-pr.

Add executable smoke checks to test-health and code-quality recipes
that exercise real code paths (config build, validate, import timing,
registry completeness, error hierarchy, input rejection) without
needing an LLM provider. Checks are split into fixed canaries (same
every run) and creative checks (agent varies inputs each run).

Harden runner memory: define JSON schema in _runner.md with TTL and
size rules, validate state file after agent runs, only update
last_run on success, drop unused audit-log.md. Add make install-dev
workflow step so recipes can run Python against the installed packages.

Fix issues found by Codex review:
- Fix test paths: tests/ does not exist at repo root, use
  packages/*/tests/ and packages/data-designer/tests/test_import_perf.py
- Remove DataDesigner(model_providers=[]) from smoke checks - raises
  NoModelProvidersError; keep config-layer checks only
- Fix audit step gating: remove continue-on-error, use step outcome
  to gate runner memory update (|| true + continue-on-error made the
  step always "succeed", defeating the success() condition)

Fix heredoc with indented EOF terminator that never terminates - replace
with printf. Run state validation on all outcomes (not just success) so
corrupted state from a failed audit is caught before caching. Only stamp
last_run when audit succeeds. Align test-health lazy import section with
its own Constraints (report count only, don't duplicate structure audit).
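
The state handling described above (validate on every outcome, stamp last_run only on success) could look roughly like the following Python sketch. The function name `update_runner_state` and the choice to return a boolean are assumptions made for illustration; only the `last_run` key and the validate-before-caching behavior come from this description, and the real workflow step may be structured differently.

```python
import json
from pathlib import Path


def update_runner_state(path: Path, outcome: str, now_iso: str) -> bool:
    """Validate the state file on every outcome; stamp last_run only on success.

    Returns True when the file holds a JSON object and is safe to cache.
    (Hypothetical helper, not the workflow's actual code.)
    """
    try:
        state = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return False  # missing or corrupted state: do not cache it
    if not isinstance(state, dict):
        return False  # valid JSON, but not the expected object shape
    if outcome == "success":
        state["last_run"] = now_iso
        path.write_text(json.dumps(state, indent=2) + "\n", encoding="utf-8")
    return True
```

A failed audit leaves a valid file untouched, while a corrupted file is reported regardless of the audit outcome.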

Also fixes datetime.utcnow() deprecation and shell variable injection
in Python string by using os.environ instead.
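
For reference, a minimal sketch of the two fixes mentioned above, assuming the suite name reaches the script through an environment variable. `SUITE` is a hypothetical variable name and both helper names are illustrative, not taken from the workflow.

```python
import os
from datetime import datetime, timezone


def utc_timestamp() -> str:
    """Timezone-aware replacement for the deprecated datetime.utcnow()."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


def suite_from_env() -> str:
    """Read the suite name from the environment instead of splicing a shell
    variable into Python source, so its contents are never parsed as code."""
    return os.environ.get("SUITE", "")  # SUITE is a hypothetical variable name
```

Because the value is read as data at runtime, quotes or other special characters in it cannot change what the Python code does.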

@andreatgretel requested a review from a team as a code owner on April 14, 2026 at 14:40
@github-actions
Contributor

PR Review: #543 - ci: add daily audit suites with 5 rotating recipes and scheduled workflow

Reviewer: Agentic CI
Date: 2026-04-14
PR Author: andreatgretel
Base: main
Files changed: 9 (+1378, -13)


Summary

This PR introduces a daily agentic CI system that runs rotating code health audits on weekdays. It adds:

  • A GitHub Actions workflow (agentic-ci-daily.yml) with day-of-week suite rotation, workflow_dispatch override, per-suite concurrency, and runner memory via actions/cache
  • Five new recipe files for Monday-Friday audits: docs-and-references, dependencies, structure, code-quality, and test-health
  • Updates to the shared _runner.md with environment docs, runner memory schema, and PR creation instructions
  • CODEOWNERS entry for .agents/recipes/
  • Plan file updates marking Phases 2-4 deliverables as complete

The design is well-structured: each recipe targets gaps that existing CI (ruff, pytest, Dependabot) doesn't cover, with clear delineation of responsibilities.
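
To make the rotation and override behavior in the summary concrete, here is a small Python sketch. It is only an illustration: the workflow itself does this in shell inside determine-suite, and the function name `select_suites` is hypothetical. The weekday-to-suite mapping is the one given in the PR description.

```python
from datetime import date

# Weekday rotation as listed in the PR description (ISO weekday: Mon=1 ... Sun=7).
ROTATION = {
    1: "docs-and-references",
    2: "dependencies",
    3: "structure",
    4: "code-quality",
    5: "test-health",
}


def select_suites(today: date, override: str = "") -> list[str]:
    """Return the suites to run; an empty list means "skip" (weekend)."""
    if override == "all":
        return list(ROTATION.values())
    if override:
        return [override]  # passed through unvalidated in this sketch
    suite = ROTATION.get(today.isoweekday())
    return [suite] if suite else []
```

For example, `select_suites(date(2026, 4, 14))` falls on a Tuesday and selects the dependencies suite, while a Saturday date yields an empty list.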


Findings

Workflow (agentic-ci-daily.yml)

[Low] No validation of workflow_dispatch suite input
Line 35: The OVERRIDE input is used directly without validating it matches a known suite name. An invalid value like suite=typo would produce a matrix with ["typo"], which would then fail at the "Recipe not found" check (line 163). This is a soft landing (the step errors clearly), but a validation step in determine-suite would fail faster and with a clearer message. Consider adding a check against the known suite list.
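
The suggested check could be as small as the following Python sketch (the workflow would more likely do this in shell). `validate_override` is a hypothetical name, and treating an empty value as "no override" is an assumption; the suite names are the five recipes added in this PR.

```python
KNOWN_SUITES = (
    "docs-and-references",
    "dependencies",
    "structure",
    "code-quality",
    "test-health",
)


def validate_override(override: str) -> str:
    """Normalize a workflow_dispatch suite value or raise with a clear message."""
    value = override.strip()
    # Assumption: an empty value means "no override"; "all" selects every suite.
    if value in ("", "all") or value in KNOWN_SUITES:
        return value
    raise ValueError(
        f"Unknown suite {value!r}; expected 'all' or one of: {', '.join(KNOWN_SUITES)}"
    )
```

With this in place, `suite=typo` would fail in determine-suite with the list of valid names instead of reaching the recipe lookup.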

[Info] contents: write + pull-requests: write permissions on a scheduled workflow
Lines 17-18: As noted in the PR description, write permissions are intentional for future recipe-driven PRs. All current recipes are read-only audits. This is acceptable but worth noting for security-conscious reviewers: the Claude agent running in these jobs has write access to the repo. The trust boundary is the recipe prompt itself.

[Info] Pre-flight API check sends a real request
Lines 134-147: The pre-flight check sends an actual messages API call with max_tokens: 5. This is a reasonable health check, but it does consume a small amount of API quota on every run. The --max-time 10 timeout is appropriate.

[Low] Top-level concurrency group may serialize overlapping runs unnecessarily
Lines 21-22: The top-level concurrency: group: agentic-ci-daily with cancel-in-progress: false means only one workflow run can execute at a time. The per-suite concurrency (line 72) handles parallelism within a run. However, if someone triggers workflow_dispatch while a scheduled run is in progress, the dispatch will queue behind it. This is likely the desired behavior (it avoids resource contention on the self-hosted runner), but it means all runs serialize at the workflow level even though their suites could run in parallel. The matrix strategy handles parallelism correctly within a single run, so this is only a concern for overlapping dispatches.

[Info] Frontmatter stripping sed command
Line 170: sed '1,/^---$/{ /^---$/,/^---$/d }' - Tested and confirmed this correctly strips YAML frontmatter from recipe files while preserving the body content.
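
For readers who find the sed expression hard to follow, the intent can be sketched in Python. This is not a line-for-line translation of the sed command, and `strip_frontmatter` is a hypothetical helper: it removes a leading block delimited by `---` lines and leaves everything else untouched, including later `---` lines in the body.

```python
def strip_frontmatter(text: str) -> str:
    """Drop a leading YAML frontmatter block delimited by '---' lines."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].rstrip("\r\n") != "---":
        return text  # no frontmatter: return the recipe unchanged
    for i, line in enumerate(lines[1:], start=1):
        if line.rstrip("\r\n") == "---":
            return "".join(lines[i + 1:])  # keep only what follows the closing delimiter
    return text  # opening delimiter never closed: leave the text alone
```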

Recipes (general observations)

[Info] Well-scoped separation of concerns
Each recipe clearly states what CI already enforces and what the recipe targets. This prevents duplicate work and keeps the audit focused. The "What CI already enforces / What CI does NOT enforce" pattern is excellent for guiding the agent.

[Info] Runner memory integration is consistent
All five recipes follow the same pattern: read runner-state.json, skip known issues, update baselines, compare trends. The memory schema is well-defined in _runner.md.
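
As an illustration of what a TTL rule for known_issues might amount to, here is a Python sketch. The entry shape (`id`, `last_seen`), the 30-day default, and the function name are all hypothetical; the actual schema lives in _runner.md and is not reproduced in this review.

```python
from datetime import date, timedelta


def prune_known_issues(state: dict, today: date, ttl_days: int = 30) -> dict:
    """Return a copy of the state without known_issues older than the TTL.

    Assumes each entry looks like {"id": ..., "last_seen": "YYYY-MM-DD"};
    that entry shape and the 30-day default are illustrative only.
    """
    cutoff = today - timedelta(days=ttl_days)
    kept = [
        issue
        for issue in state.get("known_issues", [])
        if date.fromisoformat(issue["last_seen"]) >= cutoff
    ]
    return {**state, "known_issues": kept}
```

Returning a new dict rather than mutating the input keeps the original state available for comparison when reporting trends.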

[Low] Recipe frontmatter declares permissions: contents: write but recipes are read-only
All five recipe frontmatter blocks include permissions: contents: write, but the constraints sections all say "Do not modify any files. This is a read-only audit." The frontmatter permissions appear to be metadata only (not enforced by the workflow), but the inconsistency could confuse future recipe authors. Consider either removing the permissions field from read-only recipes or adding a comment explaining it's for future use.

Recipe: code-quality/recipe.md

[Info] Executable checks are well-designed
The split between fixed canaries (deterministic, run as-written) and creative checks (agent-designed, varied each run) is a good pattern. The API reference with DataDesignerConfigBuilder usage examples reduces agent guesswork.

[Low] grep -rn -A1 "except" for swallowed exceptions is noisy
Line 139: This grep pattern will match all except clauses, not just bare ones. Combined with the -A1 and grep -B1 "pass$\|continue$" pipe, it will produce false positives on legitimate except SomeError: pass patterns that are intentional no-ops. The agent is expected to filter these, but the noise level may waste turns.

Recipe: dependencies/recipe.md

[Info] Good focus on what Dependabot can't do
The transitive dependency gap analysis (checking that each package declares what it directly imports) is high-value and not covered by any standard tool.

Recipe: structure/recipe.md

[Info] Correct handling of TYPE_CHECKING exclusions
Lines 734-735: The recipe correctly instructs the agent to exclude TYPE_CHECKING blocks from import boundary violation analysis.
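
A sketch of what that exclusion means in practice, assuming the analysis is done on the AST: imports inside an `if TYPE_CHECKING:` body are skipped, while the `else` branch and function-level imports still count. The helper names and the `pkg.*` module names in the usage below are hypothetical.

```python
import ast


def _is_type_checking_guard(test: ast.expr) -> bool:
    """Match `if TYPE_CHECKING:` and `if typing.TYPE_CHECKING:`."""
    if isinstance(test, ast.Name):
        return test.id == "TYPE_CHECKING"
    return isinstance(test, ast.Attribute) and test.attr == "TYPE_CHECKING"


def runtime_imports(source: str) -> set[str]:
    """Collect imported module names, skipping bodies of TYPE_CHECKING blocks."""
    found: set[str] = set()

    def visit(node: ast.AST) -> None:
        if isinstance(node, ast.If) and _is_type_checking_guard(node.test):
            for child in node.orelse:  # the else branch still runs at runtime
                visit(child)
            return
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return found
```

Boundary rules would then be applied only to the returned set, so a type-only import such as `from pkg.engine import Engine` under the guard does not register as a violation.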

[Info] Honest about expected clean state
Line 737: "As of the last audit, import boundaries were clean. If this section has no findings, that's expected - it's a guardrail, not a bug finder." This calibrates expectations well.

Recipe: test-health/recipe.md

[Info] Conservative hollow test detection
The recipe provides clear positive and negative examples for hollow test patterns and requires the agent to read test function bodies before flagging. This should minimize false positives.

[Low] find + while read for future annotations check is duplicated
Lines 766-771 of the structure recipe check for future annotations, and test-health's import performance section implicitly expects the same check. The test-health recipe does note "refer to Wednesday's structure audit" for lazy import details, which is good, but the overlap could be managed more explicitly.
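
One way to manage the overlap would be a single shared check that both recipes point to. The Python sketch below is an assumption about what that could look like, not a description of the recipes' current find + while read loop; both function names are hypothetical.

```python
import ast
from pathlib import Path


def has_future_annotations(source: str) -> bool:
    """True if the module contains `from __future__ import annotations`."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.ImportFrom) and node.module == "__future__":
            if any(alias.name == "annotations" for alias in node.names):
                return True
    return False


def files_missing_future_annotations(root: Path) -> list[Path]:
    """List non-empty .py files under `root` that lack the import."""
    missing = []
    for path in sorted(root.rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        if source.strip() and not has_future_annotations(source):
            missing.append(path)
    return missing
```

The structure audit could own the full listing, and test-health could report only `len(files_missing_future_annotations(root))`, in line with its "report count only" constraint.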

Recipe: docs-and-references/recipe.md

[Info] Smart prioritization
The recipe prioritizes by user impact: interface package first, then engine, then config. Documentation pages are sampled by code-symbol density rather than read exhaustively. This is cost-effective.

Plan file (plans/472/agentic-ci-plan.md)

[Info] Accurate status updates
Phases 2, 3, and 4 deliverables are correctly marked as complete where applicable. The "Recipe runner script" item was appropriately updated to reflect that functionality was built into the workflow rather than a separate script. Phase 4 items (testing framework, metrics dashboard, memory compaction) remain open.

CODEOWNERS

[Info] Redundant but explicit
The new .agents/recipes/ entry assigns the same team (@NVIDIA-NeMo/data_designer_reviewers) that already owns *. This is redundant today but documents intent and future-proofs against CODEOWNERS changes.


Verdict

Approve with minor suggestions.

This is a well-designed agentic CI system. The recipes are thorough, well-scoped, and avoid duplicating existing CI coverage. The workflow is clean with good error handling (config check, pre-flight, recipe validation, memory resilience). The runner memory schema enables cross-run trend tracking.

The findings are all low-severity or informational. The two most actionable items:

  1. Validate workflow_dispatch suite input in the determine-suite job to fail fast on typos rather than letting them reach the recipe lookup step.
  2. Align recipe frontmatter permissions with actual behavior - either remove permissions: contents: write from read-only recipes or add a comment explaining it's for future recipe-driven PRs.

Neither is a blocker.

@greptile-apps
Contributor

greptile-apps bot commented Apr 14, 2026

Greptile Summary

Adds a scheduled weekday CI workflow that runs one of five rotating agentic audit suites (docs, dependencies, structure, code-quality, test-health) per day via the Claude CLI on a self-hosted runner, with per-suite runner-state persistence through actions/cache. The implementation is solid — the cache key pattern, concurrency guards, state validation on every run, and CODEOWNERS protection are all well-considered. The only notable gap is that the top-level permissions block grants contents: write and pull-requests: write to every job including the lightweight determine-suite job; scoping write permissions to the audit job only would better follow least-privilege.

Confidence Score: 5/5

  • Safe to merge — one P2 permissions hygiene suggestion, no logic or correctness issues.
  • All findings are P2 style/security-hygiene suggestions. The workflow logic (rotation, concurrency, caching, state validation, output routing) is correct, and the recipe instructions are well-scoped and internally consistent.
  • .github/workflows/agentic-ci-daily.yml — top-level permissions should be scoped per-job.

Important Files Changed

| Filename | Overview |
| --- | --- |
| .github/workflows/agentic-ci-daily.yml | New scheduled workflow with day-of-week suite rotation, matrix parallelism, runner memory via actions/cache, and API pre-flight check; top-level write permissions apply to all jobs including the lightweight determine-suite job. |
| .agents/recipes/_runner.md | Adds environment docs, runner-state.json schema with TTL/size rules, and updates PR creation instructions to use /create-pr; schema is clear and well-constrained. |
| .agents/recipes/code-quality/recipe.md | Thursday audit covering C901 complexity, exception hygiene, type annotation coverage, and TODO aging; fixed/creative executable checks are well-specified with clear success/failure criteria. |
| .agents/recipes/dependencies/recipe.md | Tuesday audit covering transitive dependency gaps, cross-package version consistency, unused deps, and pinning review; constraints correctly scope the audit to what Dependabot cannot do. |
| .agents/recipes/docs-and-references/recipe.md | Monday audit for docstring/signature drift, broken internal links, stale architecture references, and docs site accuracy; well-scoped to avoid duplicating ruff checks. |
| .agents/recipes/structure/recipe.md | Wednesday audit for import boundary violations, lazy import compliance, future annotations, and dead exports; correctly excludes TYPE_CHECKING blocks and documents the expected clean baseline. |
| .agents/recipes/test-health/recipe.md | Friday audit with solid fixed canaries (import verification, timing budget, registry completeness) and well-documented creative smoke checks with a correct note about NoModelProvidersError limiting what can be tested. |
| .github/CODEOWNERS | Adds CODEOWNERS entry for .agents/recipes/ to require core-team review on recipe changes; appropriate given recipes control what the agent executes with write permissions. |
| plans/472/agentic-ci-plan.md | Phase 2, 3, and CODEOWNERS deliverables marked complete; accurate reflection of what's been implemented in this PR. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([schedule: weekdays 08:00 UTC\nor workflow_dispatch]) --> B[determine-suite job\nubuntu-latest]
    B --> C{OVERRIDE input?}
    C -->|none| D[date -u +%u\nMon-Fri rotation]
    C -->|specific suite| E[suites = override]
    C -->|all| F[suites = all 5]
    D --> G[suites = single suite]
    E --> H{suites != empty?}
    G --> H
    F --> H
    H -->|no - weekend| I([skip])
    H -->|yes| J[audit job matrix\nself-hosted agentic-ci runner]
    J --> K[Restore runner memory\nactions/cache]
    K --> L[make install-dev\n.venv/bin on PATH]
    L --> M[Pre-flight: claude CLI\n+ API connectivity check]
    M --> N[Run audit recipe\n_runner.md + recipe body\nsed frontmatter strip\ntemplate substitution]
    N --> O[claude --model ...\n-p PROMPT\n--max-turns 30]
    O --> P[Update runner memory\nvalidate JSON\nstamp last_run on success]
    P --> Q[Write job summary\n/tmp/audit-SUITE.md\n+ agent log]
    P -->|always| Q
    M -->|fail| P
This is a comment left during a code review.
Path: .github/workflows/agentic-ci-daily.yml
Line: 16-18

Comment:
**Overly broad permissions on `determine-suite`**

The top-level `permissions` block grants `contents: write` and `pull-requests: write` to every job in the workflow, including `determine-suite`, which only runs `date` and string manipulation. Following least-privilege, write access should be scoped to the `audit` job where PRs could actually be created.

```yaml
# Remove top-level permissions block, then add per-job scopes:
jobs:
  determine-suite:
    permissions:
      contents: read
    ...

  audit:
    permissions:
      contents: write
      pull-requests: write
    ...
```

This limits the blast radius if the `ubuntu-latest` `determine-suite` job is ever compromised via a supply-chain attack on a future action added to it.


Reviews (1): Last reviewed commit: "ci: fix review findings - heredoc, state..."

Development

Successfully merging this pull request may close these issues.

feat: agentic CI - automated PR reviews and scheduled maintenance
