
test: expand real-world JSON corpus with production API samples#158

Merged
membphis merged 2 commits into main from test/expand-realworld-json-corpus on Jun 2, 2026
Conversation

@membphis
Collaborator

@membphis membphis commented Jun 2, 2026

Summary

Expand fixture coverage to include common production API response patterns, improving confidence in real-world compatibility.

Changes

Add 3 new fixtures representing distinct API patterns:

| Fixture | Size | Source | Why It Matters |
| --- | --- | --- | --- |
| citm_catalog.json | 1.7MB | simdjson benchmark | Event ticketing with deep nesting, unicode-heavy |
| k8s_openapi.json | 924KB | Kubernetes official | OpenAPI schema, $ref-heavy, recursive structures |
| github_prs.json | 295KB | GitHub REST API | PR responses, many optional fields, nested user/repo objects |

Acceptance Criteria

  • Add 3-5 new fixture files representing distinct API patterns
  • Each fixture added to tests/fixtures/manifest.json with appropriate checks
  • Fixtures parse correctly in both EAGER and LAZY modes
  • At least one fixture >100KB for large-payload coverage (k8s_openapi: 924KB, citm_catalog: 1.7MB)
  • Document fixture sources and licensing in tests/fixtures/README.md
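As a rough illustration of what "appropriate checks" can mean, here is a minimal Python sketch of a check evaluator. It assumes the check shape visible in the manifest entry quoted later in this PR (`path`, `type`, and optional `len` or `value`). The function names `resolve` and `run_check` are hypothetical, and this is not the project's actual test harness, which runs through `cargo test`.

```python
import re

# Hypothetical helper: split a check path such as "[0].base.repo.full_name"
# into steps. "[N]" steps index arrays, bare names index objects.
_TOKEN = re.compile(r"\[(\d+)\]|([^.\[\]]+)")

# Assumed mapping from manifest type names to Python types.
_JSON_TYPES = {
    "array": list,
    "object": dict,
    "string": str,
    "number": (int, float),
    "boolean": bool,
}


def resolve(doc, path):
    """Walk `doc` along `path`; an empty path returns the root."""
    node = doc
    for index, key in _TOKEN.findall(path):
        node = node[int(index)] if index else node[key]
    return node


def run_check(doc, check):
    """Return True if the value at check["path"] satisfies the check."""
    node = resolve(doc, check["path"])
    # bool is a subclass of int in Python, so exclude it from "number".
    if check["type"] == "number" and isinstance(node, bool):
        return False
    if not isinstance(node, _JSON_TYPES[check["type"]]):
        return False
    if "len" in check and len(node) != check["len"]:
        return False
    if "value" in check and node != check["value"]:
        return False
    return True


# Synthetic sample, not taken from any real fixture.
sample = [{"number": 1, "state": "closed", "user": {"login": "redacted_user_0"}}]
checks = [
    {"path": "", "type": "array", "len": 1},
    {"path": "[0].number", "type": "number", "value": 1},
    {"path": "[0].user.login", "type": "string", "value": "redacted_user_0"},
]
results = [run_check(sample, c) for c in checks]
```

A path that does not exist raises `KeyError` or `IndexError` in this sketch; a real harness would probably report that as a failed check instead.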

Testing

cargo test --release --test manifest_fixtures
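
The manifest entry quoted later in this PR also records a `size_bytes` value next to each fixture `path`. A quick way to confirm those stay in sync with the files on disk could look like the Python sketch below. It assumes the manifest entries are available as a list of objects with `id`, `path`, and `size_bytes` keys; the function name `size_mismatches` is hypothetical, and the sketch is separate from the Rust test invoked above.

```python
import os


def size_mismatches(entries, root="."):
    """Return (id, declared, actual) for entries whose on-disk size differs."""
    mismatches = []
    for entry in entries:
        actual = os.path.getsize(os.path.join(root, entry["path"]))
        if actual != entry["size_bytes"]:
            mismatches.append((entry["id"], entry["size_bytes"], actual))
    return mismatches
```

An empty return value means every declared size matches its file.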

Closes #152

Summary by CodeRabbit

  • Documentation

    • Added comprehensive fixture sources and licenses documentation for test data.
  • Tests

    • Added new benchmark test fixtures with validation checks for catalog data, OpenAPI specifications, and GitHub pull request data.

Add 3 new fixtures to improve coverage of common production API patterns:

- citm_catalog.json (1.7MB): simdjson benchmark, event ticketing with
  deep nesting and unicode
- k8s_openapi.json (924KB): Kubernetes OpenAPI spec, $ref-heavy schema
- github_prs.json (295KB): GitHub REST API PR responses, many optional fields

All fixtures added to manifest.json with appropriate checks and CI gates.
Document fixture sources and licenses in README.md.

Closes #152
@coderabbitai

coderabbitai Bot commented Jun 2, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 140c27c6-a8be-449c-829c-67b78cf8719d

📥 Commits

Reviewing files that changed from the base of the PR and between ab6d306 and d07b247.

📒 Files selected for processing (1)
  • tests/fixtures/README.md
Walkthrough

Three new production API test fixtures are added to expand real-world JSON corpus coverage: citm_catalog (event ticketing catalog), k8s_openapi (Kubernetes OpenAPI schema), and github_prs (GitHub pull request list). The manifest defines each fixture's path, format, size, structural characteristics, and validation checks; a new README table documents each fixture's source and license.

Changes

Test Fixture Expansion

| Layer / File(s) | Summary |
| --- | --- |
| New test fixtures and source documentation (tests/fixtures/manifest.json, tests/fixtures/README.md) | Three new fixtures (citm_catalog, k8s_openapi, github_prs) are added to the manifest with path, source reference, format, size, structural density, workloads, and structural checks. The README documents fixture filenames mapped to their sources and licenses in a table. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately summarizes the main objective: expanding the JSON corpus with production API samples. It is specific, concise, and clearly reflects the primary change. |
| Linked Issues check | ✅ Passed | The PR addresses all key objectives from #152: adds 3 new fixtures (citm_catalog, k8s_openapi, github_prs) covering distinct API patterns, updates manifest.json with checks, documents sources/licensing in README.md, and includes fixtures >100KB. |
| Out of Scope Changes check | ✅ Passed | All changes are in scope: fixture documentation in README.md and manifest.json updates align with issue #152 requirements. No unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/fixtures/manifest.json`:
- Around line 155-169: The manifest entry for dataset id "github_prs" contains a
real user identifier at check path "[0].user.login"; update the underlying
fixture data (github_prs.json) to redact any user/account fields (e.g., replace
"membphis" with a neutral placeholder like "redacted_user_0") and then update
the corresponding check in the checks array (the check with "path":
"[0].user.login") to expect the new sanitized value; also scan the same fixture
for other user/account fields and sanitize them and their checks similarly so no
real PII remains.

In `@tests/fixtures/README.md`:
- Line 117: Update the README entry for the fixture named `github_prs.json` to
state the concrete redistribution basis instead of the vague phrase "Public API
response, no PII": either replace that cell with the specific license/terms that
permit storing/redistributing the captured GitHub REST API v3 response (e.g.,
GitHub Terms of Service section X, or an explicit CC/BSD-like license applied to
the fixture) or remove/replace `github_prs.json` with a fixture that has clear
redistribution rights; ensure the README row for `github_prs.json` references
the exact terms or the alternative fixture name so reviewers can verify legal
permissibility.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f0651692-a29f-4657-9e60-1db845badbde

📥 Commits

Reviewing files that changed from the base of the PR and between 715b83e and ab6d306.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • tests/fixtures/README.md
  • tests/fixtures/data/github_prs.json
  • tests/fixtures/data/k8s_openapi.json
  • tests/fixtures/manifest.json

Comment on lines +155 to +169
"id": "github_prs",
"path": "tests/fixtures/data/github_prs.json",
"source": "GitHub REST API v3 (public repo, MIT)",
"payload_type": "rest_api",
"format": "json",
"size_bytes": 294616,
"structural_density": "medium",
"workloads": ["parse_access", "decode_access"],
"ci": ["pr", "scheduled"],
"checks": [
  { "path": "", "type": "array", "len": 15 },
  { "path": "[0].number", "type": "number", "value": 157 },
  { "path": "[0].state", "type": "string", "value": "closed" },
  { "path": "[0].user.login", "type": "string", "value": "membphis" },
  { "path": "[0].base.repo.full_name", "type": "string", "value": "api7/lua-qjson" }

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Sanitize the GitHub user identifiers before checking this fixture in.

Line 168 hard-codes a real user.login value ("membphis"), so this corpus now contains a public user identifier. That conflicts with issue #152's requirement to sanitize PII and makes the fixture unsuitable as a “no PII” sample. Please redact user/account fields in tests/fixtures/data/github_prs.json and update the checks to match the sanitized values.
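
One possible shape for the requested redaction, as a minimal Python sketch: walk the parsed fixture and replace every `login` string with a stable placeholder of the form suggested in this review (`redacted_user_0`, `redacted_user_1`, ...). The function name `redact_logins` is hypothetical, and limiting the rewrite to the `login` key is an assumption; GitHub API responses also embed the login in URL fields, and numeric ids and avatar links identify accounts too, so a real pass would need to cover those as well.

```python
def redact_logins(node, mapping=None):
    """Return a copy of `node` with every "login" string replaced by a placeholder."""
    if mapping is None:
        mapping = {}
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "login" and isinstance(value, str):
                # The same real login always maps to the same placeholder.
                out[key] = mapping.setdefault(value, f"redacted_user_{len(mapping)}")
            else:
                out[key] = redact_logins(value, mapping)
        return out
    if isinstance(node, list):
        return [redact_logins(item, mapping) for item in node]
    return node


# Synthetic input, not taken from the real fixture.
sample = [
    {"user": {"login": "alice"}, "base": {"user": {"login": "bob"}}},
    {"user": {"login": "alice"}},
]
redacted = redact_logins(sample)
```

Because the placeholders are assigned in traversal order, the first element's `user.login` becomes `redacted_user_0`, which is the value the corresponding manifest check would then need to expect.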


Comment thread tests/fixtures/README.md Outdated
Address review feedback: replace vague "Public API response, no PII"
with explicit reference to GitHub Terms of Service.
@membphis membphis merged commit 83c26ad into main Jun 2, 2026
16 checks passed