ci: add daily audit suites with 5 rotating recipes and scheduled workflow by andreatgretel · Pull Request #543 · NVIDIA-NeMo/DataDesigner

andreatgretel · 2026-04-14T14:40:28Z

📋 Summary

Add a daily agentic CI system that runs rotating code health audits on weekdays, catching quality drift that existing CI doesn't cover (no C901/ANN/BLE ruff rules, no cross-reference validation, no transitive dep analysis, no docs-vs-code accuracy checks). Each audit runs as a Claude Code agent on the self-hosted runner, guided by a recipe, and reports findings to the GitHub Actions step summary.

🔗 Related Issue

Closes #472

🔄 Changes

✨ Added

.github/workflows/agentic-ci-daily.yml - Scheduled workflow with day-of-week suite rotation (Mon-Fri), per-suite concurrency, runner memory via actions/cache, make install-dev environment setup, and workflow_dispatch override (including "all" to run everything in parallel)
.agents/recipes/docs-and-references/recipe.md - Monday: docstring vs signature drift, broken internal links, architecture doc references, docs site content accuracy
.agents/recipes/dependencies/recipe.md - Tuesday: transitive dependency gaps, cross-package version consistency, unused deps, version pinning review
.agents/recipes/structure/recipe.md - Wednesday: import boundary violations, lazy import compliance, future annotations, dead exports
.agents/recipes/code-quality/recipe.md - Thursday: complexity hotspots (C901), exception hygiene, type annotation coverage, TODO aging, executable quality checks (error hierarchy, input validation)
.agents/recipes/test-health/recipe.md - Friday: test-to-source mapping, hollow test detection, import performance, executable smoke checks (fixed canaries + creative agent-varied checks), test isolation verification
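The Mon-Fri rotation in the list above can be sketched in Python. This is an illustration only, not the workflow's actual shell logic: the `ROTATION` table and the function name `suite_for_weekday` are hypothetical, and only the suite names and their day assignments come from this PR.

```python
from __future__ import annotations

from datetime import date

# Day assignments taken from the recipe list above (ISO weekday: Mon=1 ... Sun=7).
ROTATION = {
    1: "docs-and-references",
    2: "dependencies",
    3: "structure",
    4: "code-quality",
    5: "test-health",
}


def suite_for_weekday(day: date) -> str | None:
    """Return the suite scheduled for `day`, or None on weekends."""
    return ROTATION.get(day.isoweekday())


if __name__ == "__main__":
    # Prints today's suite name, or None on Saturday and Sunday.
    print(suite_for_weekday(date.today()))
```

Returning `None` for weekends mirrors the "skip" branch of the rotation: there is simply no suite to put in the matrix.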

🔧 Changed

.agents/recipes/_runner.md - Added environment docs (.venv/bin on PATH), runner memory JSON schema with TTL and size rules, updated PR creation instructions to use /create-pr skill
.github/CODEOWNERS - Added .agents/recipes/ ownership entry
plans/472/agentic-ci-plan.md - Marked Phase 2, 3, and 4 deliverables as complete

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

agentic-ci-daily.yml - New workflow with contents: write and pull-requests: write permissions. Write access is intentional to support future recipe-driven PRs, but all current recipes are read-only audits.
Executable smoke checks in test-health/recipe.md and code-quality/recipe.md - These run real Python against the installed packages. Fixed canaries are deterministic; creative checks are agent-designed each run.
Runner memory schema in _runner.md - Defines the JSON contract for cross-run state persistence including TTL rules for known_issues.
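To make the TTL idea for known_issues concrete, here is a minimal sketch of expiring stale entries. The real schema lives in _runner.md and is not reproduced in this description, so the entry field `last_seen`, the 30-day TTL, and the function name `prune_known_issues` are all assumptions made for illustration.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# ASSUMPTION: the actual TTL and entry fields are defined in _runner.md;
# 30 days and "last_seen" are placeholders, not values from this PR.
ASSUMED_TTL = timedelta(days=30)


def prune_known_issues(state: dict, now: datetime) -> dict:
    """Return a copy of `state` without known_issues entries older than the TTL."""
    fresh = [
        issue
        for issue in state.get("known_issues", [])
        if now - datetime.fromisoformat(issue["last_seen"]) <= ASSUMED_TTL
    ]
    # Build a new dict so the caller's state object is left untouched.
    return {**state, "known_issues": fresh}


if __name__ == "__main__":
    example = {
        "known_issues": [
            {"id": "recent", "last_seen": "2026-04-10T00:00:00+00:00"},
            {"id": "stale", "last_seen": "2026-01-01T00:00:00+00:00"},
        ]
    }
    print(prune_known_issues(example, datetime(2026, 4, 14, tzinfo=timezone.utc)))
```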

🧪 Testing

make check-all passes (ruff lint + format)
Unit tests added/updated — N/A, no testable Python logic (recipes are markdown instructions, workflow is YAML)
E2E tests — N/A, requires self-hosted runner with Claude CLI. Can be validated via workflow_dispatch after merge.

✅ Checklist

Follows commit message conventions
Commits are signed off (DCO)
Architecture docs updated — N/A, this is CI infrastructure

Add the daily maintenance infrastructure (Phase 2+3 of the agentic CI plan). A new workflow runs one audit suite per weekday via day-of-week rotation, with runner memory persisted via actions/cache. Recipes: docs-and-references (Mon), dependencies (Tue), structure (Wed), code-quality (Thu), test-health (Fri). Each targets gaps that CI and ruff don't cover: cross-reference validation, transitive dep analysis, lazy import compliance, complexity trends, and test-to-source mapping. Reports go to the Actions step summary. Code changes use /create-pr.

Add executable smoke checks to test-health and code-quality recipes that exercise real code paths (config build, validate, import timing, registry completeness, error hierarchy, input rejection) without needing an LLM provider. Checks are split into fixed canaries (same every run) and creative checks (agent varies inputs each run). Harden runner memory: define JSON schema in _runner.md with TTL and size rules, validate state file after agent runs, only update last_run on success, drop unused audit-log.md. Add make install-dev workflow step so recipes can run Python against the installed packages.

Fix issues found by Codex review:
- Fix test paths: tests/ does not exist at repo root, use packages/*/tests/ and packages/data-designer/tests/test_import_perf.py
- Remove DataDesigner(model_providers=[]) from smoke checks (it raises NoModelProvidersError); keep config-layer checks only
- Fix audit step gating: remove continue-on-error, use step outcome to gate runner memory update (|| true + continue-on-error made the step always "succeed", defeating the success() condition)

Fix a heredoc whose indented EOF terminator never matched, so the heredoc never terminated; replace it with printf. Run state validation on all outcomes (not just success) so corrupted state from a failed audit is caught before caching. Only stamp last_run when the audit succeeds. Align the test-health lazy import section with its own Constraints (report count only, don't duplicate the structure audit). Also fix the datetime.utcnow() deprecation and a shell variable injection into a Python string by reading values from os.environ instead.
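The state handling described in this commit (validate on every outcome, stamp last_run only on success, read inputs from the environment, avoid datetime.utcnow()) could look roughly like the sketch below. The function name `update_state`, the environment variable `AUDIT_OUTCOME`, and the choice to reset corrupted state to an empty object are assumptions; the workflow's real step is not shown here.

```python
import json
import os
from datetime import datetime, timezone


def update_state(raw: str, outcome: str) -> dict:
    """Validate the state JSON, then stamp last_run only for a successful audit."""
    try:
        state = json.loads(raw)
    except json.JSONDecodeError:
        state = {}  # ASSUMPTION: corrupted state is reset instead of being cached
    if not isinstance(state, dict):
        state = {}  # valid JSON but not an object: treat as corrupted too
    if outcome == "success":
        # Timezone-aware replacement for the deprecated datetime.utcnow().
        state["last_run"] = datetime.now(timezone.utc).isoformat()
    return state


if __name__ == "__main__":
    # The outcome arrives through the environment rather than being
    # interpolated into the Python source by the shell.
    outcome = os.environ.get("AUDIT_OUTCOME", "failure")  # hypothetical variable name
    print(json.dumps(update_state("{}", outcome)))
```

Reading the outcome from `os.environ` keeps shell values out of the Python source text, which is the injection risk the commit message refers to.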

github-actions · 2026-04-14T14:45:06Z

PR Review: #543 - ci: add daily audit suites with 5 rotating recipes and scheduled workflow

Reviewer: Agentic CI
Date: 2026-04-14
PR Author: andreatgretel
Base: main
Files changed: 9 (+1378, -13)

Summary

This PR introduces a daily agentic CI system that runs rotating code health audits on weekdays. It adds:

A GitHub Actions workflow (agentic-ci-daily.yml) with day-of-week suite rotation, workflow_dispatch override, per-suite concurrency, and runner memory via actions/cache
Five new recipe files for Monday-Friday audits: docs-and-references, dependencies, structure, code-quality, and test-health
Updates to the shared _runner.md with environment docs, runner memory schema, and PR creation instructions
CODEOWNERS entry for .agents/recipes/
Plan file updates marking Phases 2-4 deliverables as complete

The design is well-structured: each recipe targets gaps that existing CI (ruff, pytest, Dependabot) doesn't cover, with clear delineation of responsibilities.

Findings

Workflow (`agentic-ci-daily.yml`)

[Low] No validation of workflow_dispatch suite input
Line 35: The OVERRIDE input is used directly without validating it matches a known suite name. An invalid value like suite=typo would produce a matrix with ["typo"], which would then fail at the "Recipe not found" check (line 163). This is a soft landing (the step errors clearly), but a validation step in determine-suite would fail faster and with a clearer message. Consider adding a check against the known suite list.
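A minimal sketch of the suggested fail-fast check, written in Python for illustration (the workflow itself uses shell in determine-suite). The suite names and the special value "all" come from the PR; `KNOWN_SUITES` and `resolve_override` are hypothetical names.

```python
from __future__ import annotations

# Suite names as listed in the PR description.
KNOWN_SUITES = (
    "docs-and-references",
    "dependencies",
    "structure",
    "code-quality",
    "test-health",
)


def resolve_override(override: str) -> list[str]:
    """Map a workflow_dispatch override to a list of suites, rejecting typos early."""
    if override == "all":
        return list(KNOWN_SUITES)
    if override in KNOWN_SUITES:
        return [override]
    raise ValueError(
        f"unknown suite {override!r}; expected 'all' or one of: {', '.join(KNOWN_SUITES)}"
    )


if __name__ == "__main__":
    print(resolve_override("structure"))
```

Raising here surfaces the bad input in determine-suite, before a matrix job is scheduled on the self-hosted runner.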

[Info] contents: write + pull-requests: write permissions on a scheduled workflow
Lines 17-18: As noted in the PR description, write permissions are intentional for future recipe-driven PRs. All current recipes are read-only audits. This is acceptable but worth noting for security-conscious reviewers: the Claude agent running in these jobs has write access to the repo. The trust boundary is the recipe prompt itself.

[Info] Pre-flight API check sends a real request
Lines 134-147: The pre-flight check sends an actual messages API call with max_tokens: 5. This is a reasonable health check, but it does consume a small amount of API quota on every run. The --max-time 10 timeout is appropriate.

[Low] Top-level concurrency group may block parallel all runs unnecessarily
Lines 21-22: The top-level concurrency: group: agentic-ci-daily with cancel-in-progress: false means only one workflow run can execute at a time. The per-suite concurrency (line 72) handles parallelism within a run. However, if someone triggers workflow_dispatch while a scheduled run is in progress, the dispatch will queue behind it. This is likely the desired behavior (avoid resource contention on the self-hosted runner), but it means all runs serialize at the workflow level even though suites could run in parallel. The matrix strategy handles parallelism correctly within a single run, so this is only a concern for overlapping dispatches.

[Info] Frontmatter stripping sed command
Line 170: sed '1,/^---$/{ /^---$/,/^---$/d }' - Tested and confirmed this correctly strips YAML frontmatter from recipe files while preserving the body content.
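For readers who find the sed expression hard to parse, the same intent can be written in Python. This is a sketch of the behavior for a file that begins with a frontmatter block, not a byte-for-byte port of the sed command; `strip_frontmatter` is a hypothetical helper name.

```python
def strip_frontmatter(text: str) -> str:
    """Drop a leading '---' ... '---' block and return the remaining body."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].rstrip("\r\n") != "---":
        return text  # no frontmatter at the top: leave the text alone
    for index, line in enumerate(lines[1:], start=1):
        if line.rstrip("\r\n") == "---":
            # Keep everything after the closing delimiter, including any
            # later '---' lines that belong to the body.
            return "".join(lines[index + 1:])
    return text  # opening delimiter without a close: leave untouched


if __name__ == "__main__":
    recipe = "---\nname: structure\n---\n# Audit\nbody\n"
    print(strip_frontmatter(recipe))
```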

Recipes (general observations)

[Info] Well-scoped separation of concerns
Each recipe clearly states what CI already enforces and what the recipe targets. This prevents duplicate work and keeps the audit focused. The "What CI already enforces / What CI does NOT enforce" pattern is excellent for guiding the agent.

[Info] Runner memory integration is consistent
All five recipes follow the same pattern: read runner-state.json, skip known issues, update baselines, compare trends. The memory schema is well-defined in _runner.md.

[Low] Recipe frontmatter declares permissions: contents: write but recipes are read-only
All five recipe frontmatter blocks include permissions: contents: write, but the constraints sections all say "Do not modify any files. This is a read-only audit." The frontmatter permissions appear to be metadata only (not enforced by the workflow), but the inconsistency could confuse future recipe authors. Consider either removing the permissions field from read-only recipes or adding a comment explaining it's for future use.

Recipe: `code-quality/recipe.md`

[Info] Executable checks are well-designed
The split between fixed canaries (deterministic, run as-written) and creative checks (agent-designed, varied each run) is a good pattern. The API reference with DataDesignerConfigBuilder usage examples reduces agent guesswork.

[Low] grep -rn -A1 "except" for swallowed exceptions is noisy
Line 139: This grep pattern will match all except clauses, not just bare ones. Combined with the -A1 and grep -B1 "pass$\|continue$" pipe, it will produce false positives on legitimate except SomeError: pass patterns that are intentional no-ops. The agent is expected to filter these, but the noise level may waste turns.
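One possible way to cut the noise, offered as a sketch rather than a change this review requires: walk the AST instead of matching text, and separate bare or broad handlers from typed ones so the agent can rank them. The function name and the definition of "broad" (bare `except`, `Exception`, `BaseException`) are assumptions.

```python
from __future__ import annotations

import ast


def _is_broad(handler: ast.ExceptHandler) -> bool:
    """True for a bare `except:` or one catching Exception/BaseException by name."""
    caught = handler.type
    return caught is None or (
        isinstance(caught, ast.Name) and caught.id in {"Exception", "BaseException"}
    )


def find_swallowed_handlers(source: str) -> list[tuple[int, bool]]:
    """Return (line, is_broad) for handlers whose body is only pass/continue."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler) and all(
            isinstance(stmt, (ast.Pass, ast.Continue)) for stmt in node.body
        ):
            hits.append((node.lineno, _is_broad(node)))
    return sorted(hits)


if __name__ == "__main__":
    sample = "try:\n    risky()\nexcept:\n    pass\n"
    print(find_swallowed_handlers(sample))
```

Typed handlers such as `except KeyError: pass` are still reported, but flagged as not broad, so intentional no-ops can be deprioritized instead of filtered by hand.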

Recipe: `dependencies/recipe.md`

[Info] Good focus on what Dependabot can't do
The transitive dependency gap analysis (checking that each package declares what it directly imports) is high-value and not covered by any standard tool.
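The core of such a gap check can be sketched as "imports found in source minus declared dependencies minus the standard library". The helper names below are hypothetical, and the sketch assumes import names equal distribution names, which is not always true (for example, `yaml` is provided by PyYAML), so results would still need review.

```python
from __future__ import annotations

import ast


def top_level_imports(source: str) -> set[str]:
    """Collect the top-level package name of every absolute import in `source`."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.add(node.module.split(".")[0])
    return names


def undeclared_imports(source: str, declared: set[str], stdlib: set[str]) -> set[str]:
    """Return imported packages that are neither declared nor standard library."""
    return top_level_imports(source) - declared - stdlib


if __name__ == "__main__":
    code = "import os\nimport numpy as np\nfrom pandas.core import frame\n"
    # `stdlib` is passed in explicitly; on Python 3.10+ it could be built from
    # sys.stdlib_module_names.
    print(undeclared_imports(code, declared={"numpy"}, stdlib={"os"}))
```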

Recipe: `structure/recipe.md`

[Info] Correct handling of TYPE_CHECKING exclusions
Lines 734-735: The recipe correctly instructs the agent to exclude TYPE_CHECKING blocks from import boundary violation analysis.
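As an illustration of what excluding TYPE_CHECKING blocks means in practice, the sketch below collects only imports that execute at runtime. It is an assumption about how such a check could be automated, not the recipe's own instructions; the class and function names are hypothetical, and relative imports without a module name are ignored for brevity.

```python
from __future__ import annotations

import ast


def _is_type_checking(test: ast.expr) -> bool:
    """Match `if TYPE_CHECKING:` and `if typing.TYPE_CHECKING:`."""
    return (isinstance(test, ast.Name) and test.id == "TYPE_CHECKING") or (
        isinstance(test, ast.Attribute) and test.attr == "TYPE_CHECKING"
    )


class RuntimeImportCollector(ast.NodeVisitor):
    """Collect imported module names, skipping bodies of TYPE_CHECKING blocks."""

    def __init__(self) -> None:
        self.modules: set[str] = set()

    def visit_If(self, node: ast.If) -> None:
        if _is_type_checking(node.test):
            # The body only runs for type checkers; the else branch runs at runtime.
            for stmt in node.orelse:
                self.visit(stmt)
            return
        self.generic_visit(node)

    def visit_Import(self, node: ast.Import) -> None:
        self.modules.update(alias.name for alias in node.names)

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        if node.module:
            self.modules.add(node.module)


def runtime_imports(source: str) -> set[str]:
    collector = RuntimeImportCollector()
    collector.visit(ast.parse(source))
    return collector.modules
```

A boundary check would then compare `runtime_imports(...)` for each file against the set of packages that file is allowed to depend on.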

[Info] Honest about expected clean state
Line 737: "As of the last audit, import boundaries were clean. If this section has no findings, that's expected - it's a guardrail, not a bug finder." This calibrates expectations well.

Recipe: `test-health/recipe.md`

[Info] Conservative hollow test detection
The recipe provides clear positive and negative examples for hollow test patterns and requires the agent to read test function bodies before flagging. This should minimize false positives.

[Low] find + while read for future annotations check is duplicated
Lines 766-771 of the structure recipe check for future annotations, and test-health's import performance section implicitly expects the same check. The test-health recipe does note "refer to Wednesday's structure audit" for lazy import details, which is good, but the overlap could be managed more explicitly.

Recipe: `docs-and-references/recipe.md`

[Info] Smart prioritization
The recipe prioritizes by user impact: interface package first, then engine, then config. Documentation pages are sampled by code-symbol density rather than read exhaustively. This is cost-effective.

Plan file (`plans/472/agentic-ci-plan.md`)

[Info] Accurate status updates
Phases 2, 3, and 4 deliverables are correctly marked as complete where applicable. The "Recipe runner script" item was appropriately updated to reflect that functionality was built into the workflow rather than a separate script. Phase 4 items (testing framework, metrics dashboard, memory compaction) remain open.

CODEOWNERS

[Info] Redundant but explicit
The new .agents/recipes/ entry assigns the same team (@NVIDIA-NeMo/data_designer_reviewers) that already owns *. This is redundant today but documents intent and future-proofs against CODEOWNERS changes.

Verdict

Approve with minor suggestions.

This is a well-designed agentic CI system. The recipes are thorough, well-scoped, and avoid duplicating existing CI coverage. The workflow is clean with good error handling (config check, pre-flight, recipe validation, memory resilience). The runner memory schema enables cross-run trend tracking.

The findings are all low-severity or informational. The two most actionable items:

Validate workflow_dispatch suite input in the determine-suite job to fail fast on typos rather than letting them reach the recipe lookup step.
Align recipe frontmatter permissions with actual behavior - either remove permissions: contents: write from read-only recipes or add a comment explaining it's for future recipe-driven PRs.

Neither is a blocker.

greptile-apps · 2026-04-14T14:46:10Z

Greptile Summary

Adds a scheduled weekday CI workflow that runs one of five rotating agentic audit suites (docs, dependencies, structure, code-quality, test-health) per day via the Claude CLI on a self-hosted runner, with per-suite runner-state persistence through actions/cache. The implementation is solid — the cache key pattern, concurrency guards, state validation on every run, and CODEOWNERS protection are all well-considered. The only notable gap is that the top-level permissions block grants contents: write and pull-requests: write to every job including the lightweight determine-suite job; scoping write permissions to the audit job only would better follow least-privilege.

Confidence Score: 5/5

Safe to merge — one P2 permissions hygiene suggestion, no logic or correctness issues.
All findings are P2 style/security-hygiene suggestions. The workflow logic (rotation, concurrency, caching, state validation, output routing) is correct, and the recipe instructions are well-scoped and internally consistent.
.github/workflows/agentic-ci-daily.yml — top-level permissions should be scoped per-job.

Important Files Changed

| Filename | Overview |
| --- | --- |
| .github/workflows/agentic-ci-daily.yml | New scheduled workflow with day-of-week suite rotation, matrix parallelism, runner memory via actions/cache, and API pre-flight check; top-level write permissions apply to all jobs including the lightweight determine-suite job. |
| .agents/recipes/_runner.md | Adds environment docs, runner-state.json schema with TTL/size rules, and updates PR creation instructions to use /create-pr; schema is clear and well-constrained. |
| .agents/recipes/code-quality/recipe.md | Thursday audit covering C901 complexity, exception hygiene, type annotation coverage, and TODO aging; fixed/creative executable checks are well-specified with clear success/failure criteria. |
| .agents/recipes/dependencies/recipe.md | Tuesday audit covering transitive dependency gaps, cross-package version consistency, unused deps, and pinning review; constraints correctly scope the audit to what Dependabot cannot do. |
| .agents/recipes/docs-and-references/recipe.md | Monday audit for docstring/signature drift, broken internal links, stale architecture references, and docs site accuracy; well-scoped to avoid duplicating ruff checks. |
| .agents/recipes/structure/recipe.md | Wednesday audit for import boundary violations, lazy import compliance, future annotations, and dead exports; correctly excludes TYPE_CHECKING blocks and documents the expected clean baseline. |
| .agents/recipes/test-health/recipe.md | Friday audit with solid fixed canaries (import verification, timing budget, registry completeness) and well-documented creative smoke checks with a correct note about NoModelProvidersError limiting what can be tested. |
| .github/CODEOWNERS | Adds CODEOWNERS entry for .agents/recipes/ to require core-team review on recipe changes; appropriate given recipes control what the agent executes with write permissions. |
| plans/472/agentic-ci-plan.md | Phase 2, 3, and CODEOWNERS deliverables marked complete; accurate reflection of what's been implemented in this PR. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([schedule: weekdays 08:00 UTC\nor workflow_dispatch]) --> B[determine-suite job\nubuntu-latest]
    B --> C{OVERRIDE input?}
    C -->|none| D[date -u +%u\nMon-Fri rotation]
    C -->|specific suite| E[suites = override]
    C -->|all| F[suites = all 5]
    D --> G[suites = single suite]
    E --> H{suites != empty?}
    G --> H
    F --> H
    H -->|no - weekend| I([skip])
    H -->|yes| J[audit job matrix\nself-hosted agentic-ci runner]
    J --> K[Restore runner memory\nactions/cache]
    K --> L[make install-dev\n.venv/bin on PATH]
    L --> M[Pre-flight: claude CLI\n+ API connectivity check]
    M --> N[Run audit recipe\n_runner.md + recipe body\nsed frontmatter strip\ntemplate substitution]
    N --> O[claude --model ...\n-p PROMPT\n--max-turns 30]
    O --> P[Update runner memory\nvalidate JSON\nstamp last_run on success]
    P --> Q[Write job summary\n/tmp/audit-SUITE.md\n+ agent log]
    P -->|always| Q
    M -->|fail| P

Prompt To Fix All With AI

This is a comment left during a code review.
Path: .github/workflows/agentic-ci-daily.yml
Line: 16-18

Comment:
**Overly broad permissions on `determine-suite`**

The top-level `permissions` block grants `contents: write` and `pull-requests: write` to every job in the workflow, including `determine-suite`, which only runs `date` and string manipulation. Following least-privilege, write access should be scoped to the `audit` job where PRs could actually be created.

```yaml
# Remove top-level permissions block, then add per-job scopes:
jobs:
  determine-suite:
    permissions:
      contents: read
    ...

  audit:
    permissions:
      contents: write
      pull-requests: write
    ...
```

This limits the blast radius if the `ubuntu-latest` `determine-suite` job is ever compromised via a supply-chain attack on a future action added to it.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "ci: fix review findings - heredoc, state..."

andreatgretel added 4 commits April 14, 2026 03:50

andreatgretel requested a review from a team as a code owner April 14, 2026 14:40

andreatgretel wants to merge 4 commits into main from andreatgretel/feat/daily-audit-suites

andreatgretel commented Apr 14, 2026

Uh oh!

github-actions bot commented Apr 14, 2026

Uh oh!

greptile-apps bot commented Apr 14, 2026

Confidence Score: 5/5

Flowchart

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

andreatgretel commented Apr 14, 2026

📋 Summary

🔗 Related Issue

🔄 Changes

✨ Added

🔧 Changed

🔍 Attention Areas

🧪 Testing

✅ Checklist

Uh oh!

github-actions bot commented Apr 14, 2026

PR Review: #543 - ci: add daily audit suites with 5 rotating recipes and scheduled workflow

Summary

Findings

Workflow (agentic-ci-daily.yml)

Recipes (general observations)

Recipe: code-quality/recipe.md

Recipe: dependencies/recipe.md

Recipe: structure/recipe.md

Recipe: test-health/recipe.md

Recipe: docs-and-references/recipe.md

Plan file (plans/472/agentic-ci-plan.md)

CODEOWNERS

Verdict

Uh oh!

greptile-apps bot commented Apr 14, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Workflow (`agentic-ci-daily.yml`)

Recipe: `code-quality/recipe.md`

Recipe: `dependencies/recipe.md`

Recipe: `structure/recipe.md`

Recipe: `test-health/recipe.md`

Recipe: `docs-and-references/recipe.md`

Plan file (`plans/472/agentic-ci-plan.md`)