Skip to content

fix: reset metrics registry per benchmark cycle#48695

Merged
xinlian12 merged 1 commit intoAzure:mainfrom
xinlian12:fix/benchmark-cycle2-metrics
Apr 6, 2026
Merged

fix: reset metrics registry per benchmark cycle#48695
xinlian12 merged 1 commit intoAzure:mainfrom
xinlian12:fix/benchmark-cycle2-metrics

Conversation

@xinlian12
Copy link
Copy Markdown
Member

@xinlian12 xinlian12 commented Apr 6, 2026

Problem

When running the benchmark with -cycles 2+, the Cosmos SDK calls registry.clear() + registry.close() via ClientTelemetryMetrics.remove() when a CosmosClient is destroyed between cycles. Since the benchmark passed the same CompositeMeterRegistry to every cycle's client, closing cycle 1's client wiped all meters from the shared registry. Cycle 2's CosmosClient created new meters (with a new ClientCorrelationId), but the registry was in a broken state — no CSV data was written for cycle 2.

Root Cause

ClientTelemetryMetrics.remove(registry) calls:

  1. registry.clear() — removes all meter instances from the registry
  2. registry.close() — shuts down the registry so it stops accepting new meters

Both propagate through the CompositeMeterRegistry to all children, destroying the SimpleMeterRegistry that CsvMetricsReporter reads from.

Fix

Create a fresh, disposable metrics pipeline per cycle:

  1. Each cycle creates its own CompositeMeterRegistry + SimpleMeterRegistry + CsvMetricsReporter
  2. The LoggingMeterRegistry (console output) is added to each cycle's registry but disconnected before shutdown so the SDK's clear()/close() doesn't destroy it
  3. Reporters are flushed before client shutdown to capture all data
  4. The SDK can safely destroy the disposable cycle registry — nothing shared is affected
  5. Each cycle's CSV files start with fresh counters (count=0)

Verified

Tested with -cycles 2 -settleTimeMs 30000 on 1t-c16-ReadThroughput-http1:

  • Before fix: Only ClientCorrelationId.00001 CSV files existed (cycle 1 data only)
  • After fix: Both 00001 and 00002 CSV files exist with complete, independent data
Cycle 1 (00001) Cycle 2 (00002)
CSV rows 20 20
Total ops 1,531,107 1,290,714
Mean latency 1.85ms 2.20ms
Starts fresh count=0 count=0

@github-actions github-actions bot added the Cosmos label Apr 6, 2026
@xinlian12 xinlian12 changed the title fix: preserve metrics across benchmark cycles fix: reset metrics registry per benchmark cycle Apr 6, 2026
@xinlian12 xinlian12 marked this pull request as ready for review April 6, 2026 20:07
@xinlian12 xinlian12 requested review from a team and kirankumarkolli as code owners April 6, 2026 20:07
Copilot AI review requested due to automatic review settings April 6, 2026 20:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes multi-cycle benchmark runs in azure-cosmos-benchmark where closing a cycle’s CosmosClient would clear()/close() the shared Micrometer registry (via ClientTelemetryMetrics.remove(...)), preventing subsequent cycles from emitting CSV metrics.

Changes:

  • Moves metrics registry + destination reporter creation into the per-cycle lifecycle loop to ensure each cycle uses a fresh, disposable registry.
  • Keeps console logging (LoggingMeterRegistry) shared across cycles, and detaches it before client shutdown to avoid being cleared/closed by the SDK.
  • Stops (flushes) CSV/CosmosDB reporters before client shutdown so final interval data is written before the registry is destroyed.

When running with -cycles 2+, the Cosmos SDK calls registry.clear()
and registry.close() on client shutdown, destroying all meters.
This caused cycle 2 metrics to be completely lost.

Fix: create a fresh CompositeMeterRegistry, SimpleMeterRegistry, and
CsvMetricsReporter for each cycle. Reporters are flushed before client
shutdown, then the loggingRegistry is disconnected from the disposable
cycle registry so the SDK can safely destroy it. Each cycle starts
with clean counters (count=0).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xinlian12 xinlian12 force-pushed the fix/benchmark-cycle2-metrics branch from a8ca8e4 to ee7dfe2 Compare April 6, 2026 20:17
@xinlian12
Copy link
Copy Markdown
Member Author

The failed tests: c.a.c.CosmosContainerChangeFeedTest.CosmosContainerChangeFeedTest -> does not related to changes in this PR

@xinlian12
Copy link
Copy Markdown
Member Author

/check-enforcer override

@xinlian12 xinlian12 merged commit beeb599 into Azure:main Apr 6, 2026
27 of 29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants