Fix #25800: Agent status not showing true last 5 runs #26206

Merged
Rohit0301 merged 9 commits into main from issue-25800 on Mar 6, 2026

Conversation

harshach (Collaborator) commented Mar 4, 2026

Describe your changes:

Fixes #25800

I worked on ... because ...

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Summary by Gitar

  • Backend fixes:
    • Added listBetweenTimestampsByOrderWithLimit() method to EntityTimeSeriesDAO for fetching limited time-series results
    • Added batch operation getLatestExtensionBatch() to fetch latest statuses for multiple pipelines in single query
    • Modified listPipelineStatus() to return 5 latest runs by default when no time range provided
    • Updated IngestionPipelineRepository to batch-fetch pipeline statuses for improved performance
  • Frontend changes:
    • Replaced timestamp-based queries with limit=5 parameter in ingestion pipeline status calls
    • Updated IngestionListTable, IngestionRecentRuns, and test utilities to use new limit-based API
  • Tests and UI:
    • Added comprehensive test for 5-run display behavior in ServiceIngestion.spec.ts
    • Minor formatting fixes in TestSuiteSummaryWidget

This will update automatically on new commits.

github-actions bot (Contributor) commented Mar 4, 2026

Jest test Coverage

UI tests summary

Lines: 65%
Statements: 66% (57257/86744)
Branches: 45.62% (30173/66131)
Functions: 48.41% (9066/18725)


/* Get the status of the external application by converting the configuration so that it can be
* served like an App configuration */
public ResultList<PipelineStatus> listExternalAppStatus(

⚠️ Bug: App runs endpoint silently limited to 5 when no time range given

listExternalAppStatus delegates to the 3-arg listPipelineStatus(fqn, startTs, endTs), which now passes limit=null to the 4-arg overload. When startTs and endTs are both null (a valid API call), resolvePipelineStatusLimit returns DEFAULT_RECENT_RUN_LIMIT = 5, silently capping the external app run history.

This is an unintended behavioral regression for the /api/v1/apps/name/{name}/runs endpoint — callers that omit time range params previously received all runs but will now get at most 5.

The fix is to either have listExternalAppStatus call the 4-arg overload with an explicit limit that bypasses the default (a null limit would still resolve to the default of 5), or adjust resolvePipelineStatusLimit so that it is applied only at the ingestion pipeline status endpoint level (in the resource layer or via a dedicated flag).

Suggested fix:

public ResultList<PipelineStatus> listExternalAppStatus(
    String ingestionPipelineFQN, Long startTs, Long endTs) {
  // Pass Integer.MAX_VALUE explicitly to bypass the default 5-run limit;
  // this assumes resolvePipelineStatusLimit honors a non-null limit as given.
  return listPipelineStatus(ingestionPipelineFQN, startTs, endTs, Integer.MAX_VALUE)
      .map(...)
}
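
The limit resolution described in this finding can be modeled in isolation. The sketch below is a minimal stand-in, not the repository code: the class name, the method name resolveLimit, and the exact branching are assumptions inferred from the comment above. It only illustrates why a null limit with no time range resolves to 5, while an explicit limit passes through unchanged.

```java
/** Hypothetical model of the limit-resolution rule described in the review comment. */
class PipelineStatusLimitSketch {
  static final int DEFAULT_RECENT_RUN_LIMIT = 5;

  /**
   * Assumed behavior: an explicit limit wins; with no limit and no time range the
   * default of 5 applies; with a time range and no limit the result is uncapped (null).
   */
  static Integer resolveLimit(Integer limit, Long startTs, Long endTs) {
    if (limit != null) {
      return limit;
    }
    if (startTs == null && endTs == null) {
      return DEFAULT_RECENT_RUN_LIMIT;
    }
    return null;
  }

  public static void main(String[] args) {
    // The external app path today: no limit, no time range, so the default kicks in.
    System.out.println(resolveLimit(null, null, null)); // prints 5
    // Passing an explicit limit bypasses the default.
    System.out.println(resolveLimit(Integer.MAX_VALUE, null, null)); // prints 2147483647
    // A time range without a limit stays uncapped in this model.
    System.out.println(resolveLimit(null, 0L, 1000L)); // prints null
  }
}
```

If the real resolvePipelineStatusLimit clamps or ignores explicit limits, the Integer.MAX_VALUE approach would not work and the dedicated-flag option would be the safer choice.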

gitar-bot bot commented Mar 6, 2026

🔍 CI failure analysis for 8bedc64: The `playwright-ci-postgresql (6, 6)` CI failure consists of 1 hard failure and 4 flaky tests, all unrelated to this PR's ingestion pipeline status changes. Previous runs also showed similar unrelated flaky failures across multiple jobs.

Issue

The playwright-ci-postgresql (6, 6) job (id: 66010223575) failed with 1 hard failure and 4 flaky tests. None of these are related to this PR's changes (ingestion pipeline status limit=5, batch fetch, frontend API parameter updates).


Current Failure: playwright-ci-postgresql (6, 6)

Hard Failure:

  • playwright/e2e/Pages/Lineage.spec.ts:466 — Element [data-testid="lineage-node-...-pw-table-with/slash-..."] not found (5s timeout). Special character / in table name not rendering correctly in DOM.

Flaky Tests:

  • Lineage.spec.ts:106 — 300s timeout exceeded; tooltip overlay (ant-tooltip) blocking click on add-pipeline button
  • ServiceEntity.spec.ts:139 — Tier update not persisting; expected Tier5 but received Tier1
  • Tag.spec.ts:571 — Restricted entity pw-mlmodel-e9bbe94f visible when it should be hidden (permission filtering)
  • Users.spec.ts:570 - page.goto 60s timeout navigating to http://localhost:8585/ (server load)

None of these tests exercise ingestion pipeline status functionality.


Previously Observed Failures (other jobs, same commit)

  • playwright-ci-postgresql (4, 6): 1 failed, 6 flaky — custom properties, domains, entity tier updates; timing/flakiness issues unrelated to this PR.
  • Integration Tests (shard-2, 3.10): Docker build timed out downloading DB2 iAccess driver from public.dhe.ibm.com:443 — transient external network failure.
  • maven-sonarcloud-ci: Detected project binding: ERROR — SonarCloud configuration/auth issue unrelated to this PR.
  • py-run-tests (3.11): 2 Oracle topology unit tests fail (test_yield_stored_procedure, test_yield_stored_package) due to FQN mismatch — pre-existing issue.
  • Integration Tests (shard-2, 3.11): Segfault (exit code -11) during coverage report generation — infrastructure issue.
  • integration-tests-mysql-elasticsearch: Connection pool shut down / NDManager has been closed already in DJL embedding subsystem — known flaky CI race condition.

Conclusion

All failures are infrastructure, network, flakiness, or pre-existing issues unrelated to this PR's changes. Retrying the affected jobs should resolve transient failures.

Code Review ⚠️ Changes requested 2 resolved / 3 findings

Fixes the agent status query to return the true last 5 runs instead of loading all statuses from the database. However, the listExternalAppStatus endpoint silently limits results to 5 when no time range is given, and subsequent filtering may return 0 results unexpectedly.

⚠️ Bug: App runs endpoint silently limited to 5 when no time range given

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/IngestionPipelineRepository.java:683 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/IngestionPipelineRepository.java:674 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/IngestionPipelineRepository.java:695

✅ 2 resolved
Bug: Paging cursors get Base64-encoded "null" string when timestamps absent

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/IngestionPipelineRepository.java:633
When startTs and endTs are null (the new default path), String.valueOf(startTs) at line 635 returns the literal string "null" rather than a null reference. The ResultList constructor (line 92-106 of ResultList.java) then Base64-encodes this string into the paging.before and paging.after fields instead of leaving them as null.

This means the API response will contain:

"paging": {
  "before": "bnVsbA==",
  "after": "bnVsbA==",
  "total": 5
}

instead of:

"paging": {
  "before": null,
  "after": null,
  "total": 5
}

Clients that check for non-null cursors to determine if more pages exist could attempt pagination with these bogus cursor values, leading to errors or unexpected behavior.
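
The "bnVsbA==" value can be reproduced with the standard library alone. The sketch below is illustrative only: the class and helper names are hypothetical and do not mirror the ResultList constructor. It assumes the cursor is simply the Base64 encoding of the timestamp's string form, and contrasts that with a null-guarded variant.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helpers showing how a null timestamp becomes a bogus cursor. */
class NullCursorSketch {
  /** Buggy shape: String.valueOf turns a null Long into the text "null" before encoding. */
  static String encodeUnsafe(Long timestamp) {
    String raw = String.valueOf(timestamp);
    return Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
  }

  /** Guarded shape: a missing timestamp yields a null cursor instead of encoded text. */
  static String encodeSafe(Long timestamp) {
    if (timestamp == null) {
      return null;
    }
    return Base64.getEncoder()
        .encodeToString(timestamp.toString().getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    Long startTs = null;
    System.out.println(encodeUnsafe(startTs)); // prints bnVsbA==
    System.out.println(encodeSafe(startTs)); // prints null
  }
}
```

Guarding before the string conversion keeps paging.before and paging.after as real nulls, so clients that test for a non-null cursor behave as expected.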

Performance: Default path loads all pipeline statuses from DB to return 5

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/IngestionPipelineRepository.java:609
When no startTs/endTs are provided (the new default path used by the UI), effectiveStartTs is set to Long.MIN_VALUE and effectiveEndTs to Long.MAX_VALUE (lines 609-610). This causes getResultsFromAndToTimestamps to fetch every pipeline status record from the database, deserialize them all into PipelineStatus objects, sort in memory, and then discard all but 5.

For long-running pipelines (e.g., an hourly pipeline accumulates about 8,760 records per year), this is wasteful, especially since this endpoint is called per pipeline on page load (the UI fires one request per visible pipeline in IngestionListTable).

The underlying DAO already supports an OrderBy.DESC parameter, so the DB results are already in descending timestamp order. Consider pushing the limit down to the database query (e.g., adding a LIMIT clause or a new DAO method) so only the needed rows are fetched and deserialized. Alternatively, since the DB already returns results in DESC order, you could at minimum avoid loading more than limit + queuedCount rows from the DB when a limit is resolved.
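
The difference between trimming in memory and pushing the limit down can be shown with an in-memory stand-in for the table. Everything in the sketch below is hypothetical (the class, its methods, and the row counter); it does not use the real DAO. It assumes rows are already available in descending timestamp order, and counts how many rows each approach touches for one year of hourly runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a time-series table keyed by timestamp. */
class LimitPushdownSketch {
  private final TreeMap<Long, String> rows = new TreeMap<>();
  int rowsRead = 0; // rows "fetched and deserialized", tracked for comparison

  void insert(long timestamp, String statusJson) {
    rows.put(timestamp, statusJson);
  }

  /** Load-all shape: read every row in DESC order, then discard all but `limit`. */
  List<String> latestByLoadingAll(int limit) {
    List<String> all = new ArrayList<>();
    for (String json : rows.descendingMap().values()) {
      rowsRead++;
      all.add(json);
    }
    return new ArrayList<>(all.subList(0, Math.min(limit, all.size())));
  }

  /** Pushed-down shape: the analogue of ORDER BY timestamp DESC LIMIT n. */
  List<String> latestWithLimit(int limit) {
    List<String> latest = new ArrayList<>();
    for (String json : rows.descendingMap().values()) {
      if (latest.size() >= limit) {
        break;
      }
      rowsRead++;
      latest.add(json);
    }
    return latest;
  }

  public static void main(String[] args) {
    LimitPushdownSketch table = new LimitPushdownSketch();
    for (long hour = 0; hour < 24 * 365; hour++) {
      table.insert(hour, "run-" + hour);
    }
    table.latestByLoadingAll(5);
    System.out.println("load-all rows read: " + table.rowsRead); // 8760
    table.rowsRead = 0;
    table.latestWithLimit(5);
    System.out.println("limited rows read: " + table.rowsRead); // 5
  }
}
```

Both methods return the same five newest entries; only the amount of work differs, which is the point of moving the limit into the query.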

sonarqubecloud bot commented Mar 6, 2026

@Rohit0301 Rohit0301 merged commit 6b91aee into main Mar 6, 2026
63 of 70 checks passed
@Rohit0301 Rohit0301 deleted the issue-25800 branch March 6, 2026 12:45
Rohit0301 added a commit that referenced this pull request Mar 6, 2026
* Fix #25800: Agent status not showing true last 5 runs

* Fix #25800: Agent status not showing true last 5 runs

* added playwright tests

* address comments

---------

Co-authored-by: Rohit Jain <60229265+Rohit0301@users.noreply.github.com>
Co-authored-by: Rohit0301 <rj03012002@gmail.com>
Labels

backend, safe to test (add this label to run secure GitHub workflows on PRs)

Development

Successfully merging this pull request may close these issues.

Agent status not showing true last 5 runs

4 participants