test: expand real-world JSON corpus with production API samples #152

@membphis

Background

qjson's current real-world test data is limited to:

  • twitter.json (simdjson corpus) — unicode-heavy social media
  • small_api.json / medium_resp.json — synthetic chat completion payloads
  • amazon_cellphones.ndjson — product data

Production JSON parsers encounter a much wider range of API response shapes than these files cover. Projects such as V8 and simdjson maintain broader corpora to catch compatibility edge cases.

Goal

Expand fixture coverage to include common production API response patterns, improving confidence in real-world compatibility.

Proposed Additions

| Source | Payload Type | Why It Matters |
|---|---|---|
| GitHub API | Issue/PR responses | Deeply nested, many optional fields, markdown in strings |
| Stripe API | Payment objects | Decimal-sensitive numbers, nested metadata |
| Kubernetes | Resource manifests | Deep nesting, large arrays, YAML-originated quirks |
| AWS API | EC2/S3 responses | XML-to-JSON conversion artifacts, verbose structures |
| OpenAPI/Swagger | Schema definitions | $ref-heavy, recursive structures |
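
To illustrate what "decimal-sensitive numbers" can mean for a fixture check, here is a minimal sketch using Python's standard `json` module rather than qjson itself. The payload and the `amount` field are made up for illustration and are not taken from any real Stripe response. It shows that a number with more digits than a double can hold survives only if the parser keeps the original digits.

```python
import json
from decimal import Decimal

# Hypothetical payload: the literal has more significant digits than a
# 64-bit float can represent.
payload = '{"amount": 0.10000000000000000001}'

# Default parsing converts the literal to a double and drops the tail.
as_float = json.loads(payload)["amount"]

# parse_float=Decimal keeps every digit of the original literal.
as_decimal = json.loads(payload, parse_float=Decimal)["amount"]

lost_precision = as_float == 0.1
kept_digits = str(as_decimal)
```

A fixture check built around this idea would compare the parsed number against the source text instead of against a float, so a silent rounding step shows up as a failure.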

Acceptance Criteria

  • Add 3 to 5 new fixture files representing distinct API patterns
  • Each fixture is added to tests/fixtures/manifest.json with appropriate checks
  • Fixtures parse correctly in both EAGER and LAZY modes
  • At least one fixture is larger than 100 KB for large-payload coverage
  • Fixture sources and licensing are documented in tests/fixtures/README.md
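
Some of the criteria above could be checked mechanically before review. The following is a rough sketch only: the manifest layout (a top-level `fixtures` list whose entries carry a `file` key) is an assumption, since the real schema of tests/fixtures/manifest.json is not described here, and it validates with Python's `json` module rather than qjson's EAGER and LAZY modes.

```python
import json
from pathlib import Path

LARGE_FIXTURE_BYTES = 100 * 1024  # the ">100KB" threshold from the criteria


def check_fixtures(fixture_dir, manifest_name="manifest.json"):
    """Return a list of problems; an empty list means every check passed."""
    fixture_dir = Path(fixture_dir)
    manifest = json.loads((fixture_dir / manifest_name).read_text(encoding="utf-8"))
    problems = []
    has_large = False

    # Assumed layout: {"fixtures": [{"file": "name.json"}, ...]}
    for entry in manifest["fixtures"]:
        name = entry["file"]
        path = fixture_dir / name
        if not path.is_file():
            problems.append(f"{name}: listed in manifest but missing on disk")
            continue

        raw = path.read_bytes()
        if len(raw) > LARGE_FIXTURE_BYTES:
            has_large = True

        try:
            text = raw.decode("utf-8")
            # NDJSON files hold one JSON document per non-empty line.
            docs = text.splitlines() if path.suffix == ".ndjson" else [text]
            for doc in docs:
                if doc.strip():
                    json.loads(doc)
        except ValueError as exc:  # covers JSON and UTF-8 decoding errors
            problems.append(f"{name}: not valid JSON ({exc})")

    if not has_large:
        problems.append("no fixture larger than 100 KB")
    return problems
```

Calling `check_fixtures("tests/fixtures")` from a CI step and failing on a non-empty result would catch a missing file or a malformed fixture early; the mode-specific parsing checks would still need qjson's own test harness.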

Technical Notes

  • Prefer fixtures with permissive licenses (MIT, Apache, public domain)
  • Sanitize any PII from real API responses before committing
  • Consider adding a ci: ["pr"] entry for smaller fixtures to catch regressions early
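
The sanitization step could be scripted so that the fixture keeps its original shape. This is a minimal sketch under stated assumptions: the set of sensitive key names is a guess and would need tuning per API, and only string values are replaced so nesting depth and key counts stay unchanged.

```python
# Assumed key names; adjust for each API before use.
SENSITIVE_KEYS = frozenset({"email", "name", "login", "phone", "address", "ip"})


def sanitize(value, sensitive_keys=SENSITIVE_KEYS, placeholder="REDACTED"):
    """Return a copy of a parsed JSON value with sensitive strings replaced.

    Structure is preserved: only string values stored under a sensitive key
    are swapped for the placeholder; containers are always recursed into.
    """
    if isinstance(value, dict):
        cleaned = {}
        for key, item in value.items():
            if key.lower() in sensitive_keys and isinstance(item, str):
                cleaned[key] = placeholder
            else:
                cleaned[key] = sanitize(item, sensitive_keys, placeholder)
        return cleaned
    if isinstance(value, list):
        return [sanitize(item, sensitive_keys, placeholder) for item in value]
    return value  # numbers, booleans, null, and non-sensitive strings
```

Key-based redaction does not catch PII embedded in free text such as issue bodies or commit messages, so a manual read-through is still needed before committing.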

References

Labels: enhancement
