Skip to content

docs: document missing metrics and fix gean repo link#397

Merged
pablodeymo merged 3 commits into
mainfrom
docs/metrics-and-gean-link
May 28, 2026
Merged

docs: document missing metrics and fix gean repo link#397
pablodeymo merged 3 commits into
mainfrom
docs/metrics-and-gean-link

Conversation

@MegaRedHand
Copy link
Copy Markdown
Collaborator

Summary

Reconciles docs/metrics.md with the implementation and the pinned leanMetrics spec. 11 implemented metrics were undocumented; 2 spec metrics ethlambda doesn't implement were missing entirely.

Metrics doc changes

  • New Block Production Metrics section (all in spec, supported): lean_block_aggregated_payloads, lean_block_building_payload_aggregation_time_seconds, lean_block_building_time_seconds, lean_block_building_success_total, lean_block_building_failures_total
  • Fork-Choice: lean_node_sync_status (status=idle,syncing,synced)
  • Network: lean_gossip_mesh_peers (client label)
  • State Transition: spec's lean_justified_slot / lean_finalized_slot added as unsupported (ethlambda only emits the latest_* variants)
  • Custom (non-leanMetrics): new Storage subsection (lean_table_bytes) and Attestation Aggregate Coverage subsection (lean_attestation_aggregate_coverage_validators / _subnets / _diff_validators)

Verified existing rows for drift: finalizations_total, node_info, and reqresp protocol labels all match the code.

Other

  • coverage.rs: TODO flagging the reserved-but-unpopulated per-subnet (subnet_N) breakdown, matching the note in the doc
  • introduction.md: fix stale gean repo URL (devlongs/gean -> geanlabs/gean), consistent with the CLAUDE.md fix in docs: fix gean repository URL in CLAUDE.md #393

Test plan

Docs + one comment change only; no code behavior affected.

The metrics doc had drifted from the implementation. Add the 11 metrics
that were implemented but undocumented and reconcile the doc with the
pinned leanMetrics spec:

- New Block Production Metrics section (block_aggregated_payloads,
  block_building_{time,payload_aggregation_time}_seconds,
  block_building_{success,failures}_total)
- lean_node_sync_status (Fork-Choice), lean_gossip_mesh_peers (Network)
- spec's lean_justified_slot / lean_finalized_slot listed as unsupported
  (ethlambda only emits the latest_* variants)
- Custom (non-leanMetrics) Storage and Attestation Aggregate Coverage
  subsections for lean_table_bytes and the coverage gauges

Also flag the reserved per-subnet coverage breakdown with a TODO in
coverage.rs, and fix the stale gean repository URL in introduction.md
(devlongs/gean -> geanlabs/gean).
@github-actions
Copy link
Copy Markdown

🤖 Kimi Code Review

Overall Assessment: Low-risk documentation PR with minor inconsistencies. No consensus-critical code changes.

Specific Feedback

1. Documentation Consistency (metrics.md)

Issue: Inconsistent table schemas. The "State Transition" table includes a "Supported" column (✅/❌), but the new "Block Production", "Storage", and "Attestation Aggregate Coverage" sections omit this column.

Recommendation: Standardize all metric tables. Either:

  • Add "Supported" columns to the new tables, or
  • Remove the column from "State Transition" if tracking unsupported metrics is not valuable

File: docs/metrics.md (lines 33-44, 91-106, 115-128)

2. Ambiguous Metric Definitions (metrics.md)

Issue: The distinction between lean_latest_justified_slot (✅) and lean_justified_slot (❌) is unclear. In Ethereum consensus, "latest" typically refers to the highest known message, while the non-prefixed variant would imply the current state's justified checkpoint. Documenting unsupported metrics without explanation creates confusion.

Recommendation: Either:

  • Remove the unsupported metrics (❌) until implemented, or
  • Add a footnote explaining that these represent future planned metrics for current-state checkpoint tracking

File: docs/metrics.md (lines 66-69)

3. Histogram Bucket Range (metrics.md)

Issue: lean_block_building_time_seconds uses buckets up to 1s maximum (0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 1). Block building under load or during state regeneration may exceed this.

Recommendation: Verify this is sufficient for the minimalist client scope. Consider adding a 2s or +Inf bucket to avoid clipping slow observations.

File: docs/metrics.md (line 38)

4. Terminology Precision (metrics.md)

Issue: lean_block_building_failures_total description references "exception in build_block". Rust uses Result types, not exceptions.

Recommendation: Change description to: "Failed block builds (error/panic in build_block)"

File: docs/metrics.md (line 41)

5. TODO Comment (coverage.rs)

Issue: The TODO comment (line 59) mentions future per-subnet breakdown work but lacks issue tracking.

Recommendation: Link to a GitHub issue or internal ticket ID to prevent this from becoming orphaned technical debt.

File: crates/blockchain/src/coverage.rs (lines 59-62)

6. URL Update Verification (introduction.md)

Acknowledgment: The gean repository URL update (devlongs/geangeanlabs/gean) appears to be a legitimate organization transfer. No issues.

File: docs/introduction.md (line 48)

Consensus/Security Notes

  • No algorithmic changes: This PR contains only documentation and comments
  • No SSZ/encoding changes: No serialization logic modified
  • No fork-choice modifications: The coverage.rs change is purely observational (metrics)
  • Metric safety: The attestation coverage metrics described in metrics.md appear to track aggregate visibility (validator counts) rather than sensitive cryptographic material, which is appropriate for observability

Nitpicks

  • Table alignment: The "Attestation Aggregate Coverage" table uses inconsistent markdown alignment compared to other tables (column pipes don't align visually)
  • Label cardinality: The lean_gossip_mesh_peers metric uses client=<name>_<N> labels. Ensure <N> is bounded (e.g., max subnet count) to prevent unbounded cardinality in Prometheus

Verdict: Approve after addressing the "Supported" column inconsistency and clarifying the justified/finalized slot metric distinction.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions
Copy link
Copy Markdown

🤖 Codex Code Review

  1. docs/metrics.md: lean_block_building_failures_total is documented as “exception in build_block”, but the counter is also incremented on proposer-signing failure and on local process_block failure, not just block construction. See crates/blockchain/src/lib.rs, lib.rs:412, and lib.rs:430. The description should be broadened to avoid misleading operators.

  2. docs/metrics.md: lean_table_bytes is described as updating “On each slot”, but the implementation only refreshes it from process_block(), after a block is processed. See crates/blockchain/src/lib.rs. On empty slots or when no block is imported, the gauge will retain its previous value.

  3. docs/metrics.md: the new attestation coverage section says these gauges are sampled “once per slot” / “Each slot”, but that is not true for all paths. emit_post_block_coverage() explicitly skips updates when the canonical head block does not carry that round’s votes, leaving prior values in place, and proposal_combined only emits during local block production. See crates/blockchain/src/coverage.rs and crates/blockchain/src/lib.rs.

No consensus, fork-choice, attestation-validation, SSZ, or memory-safety concerns showed up in the Rust change itself; the only code delta is a TODO comment in crates/blockchain/src/coverage.rs.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions
Copy link
Copy Markdown

🤖 Claude Code Review

Now I have everything I need to write the review.


PR #397 Review: docs: document missing metrics and fix gean repo link

Overall: Docs-only PR. Changes are accurate and well-organized. One pre-existing documentation drift was missed during the stated "verified existing rows" pass.


Accuracy verification

All new metric entries verified against the implementation:

Block Production Metrics (all new):

  • lean_block_aggregated_payloads — Histogram, buckets [1, 2, 4, 8, 16, 32, 64, 128]
  • lean_block_building_payload_aggregation_time_seconds — Histogram, buckets [0.1, 0.25, 0.5, 0.75, 1, 2, 3, 4]
  • lean_block_building_time_seconds — Histogram, buckets [0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 1]
  • lean_block_building_success_total / _failures_total — IntCounters, no labels ✅

Fork-Choice: lean_node_sync_statusIntGaugeVec with ["status"] label; actual enum values in metrics.rs:436-442 are idle, syncing, synced, matching the documented label set ✅

State Transition: lean_justified_slot / lean_finalized_slot marked ❌ — confirmed absent from the codebase ✅

Network: lean_gossip_mesh_peersIntGaugeVec with ["client"] label ✅

Custom subsections (Storage, Attestation Aggregate Coverage):

  • lean_table_bytesIntGaugeVec with ["table"]
  • Coverage section labels match ATTESTATION_AGGREGATE_COVERAGE_SECTIONS constant exactly (timely, late, block, combined, agg_start_new, proposal_combined) ✅
  • Diff directions match ATTESTATION_AGGREGATE_COVERAGE_DIFF_DIRECTIONS (block_only, timely_only) ✅

Pre-existing drift missed during verification

lean_committee_signatures_aggregation_time_seconds in docs/metrics.md (line 60) lists buckets as 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 1, but metrics.rs:351 registers it as:

vec![0.05, 0.1, 0.25, 0.5, 0.75, 1.0, 2.0, 3.0, 4.0]

The documented buckets are an order of magnitude smaller than the implementation and are missing the 2, 3, 4 second tail. This is a pre-existing error not introduced here, but the PR's drift verification pass (which checked finalizations_total, node_info, and reqresp labels) didn't catch it. Worth fixing as a follow-up since this PR is already reconciling the docs.


coverage.rs TODO comment

The comment is accurate. cov_record calls set_attestation_aggregate_coverage_validators(section, "combined", ...) — only the combined subnet label is ever written, while has_subnet (which maps vid % committee_count to subnet occupancy) could drive per-subnet label emissions. The comment correctly identifies what data is available vs. what's reported.


URL fix

geanlabs/gean is correct and consistent with the CLAUDE.md fix from #393.


Summary

The new metric additions are accurate. Approve after optionally fixing the lean_committee_signatures_aggregation_time_seconds bucket discrepancy (pre-existing, not introduced here).


Automated review by Claude (Anthropic) · sonnet · custom prompt

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 28, 2026

Greptile encountered an error while reviewing this PR. Please reach out to support@greptile.com for assistance.

Address automated review feedback (Codex/Claude/Kimi):

- block_building_failures_total: broaden description; it also counts
  block-root signing and local process_block failures, not just build
- table_bytes: updated after each processed block (not every slot); it
  retains its previous value on empty slots
- attestation aggregate coverage: emission is gated by source data, so
  document the per-section timing instead of claiming "each slot"
- committee_signatures_aggregation_time_seconds: fix pre-existing bucket
  drift (doc listed 0.005..1; impl is 0.05,0.1,0.25,0.5,0.75,1,2,3,4)
@pablodeymo pablodeymo merged commit a31d1ac into main May 28, 2026
5 checks passed
@pablodeymo pablodeymo deleted the docs/metrics-and-gean-link branch May 28, 2026 18:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants