ci: add free-disk-space step to system-test job #1446
Merged
Conversation
Contributor
Pull request overview
This PR updates the CI workflow to proactively reclaim disk space on GitHub-hosted runners before provisioning the Talos-in-Docker system test cluster, reducing the likelihood of DiskPressure / No space left on device failures during cluster creation and Flux reconciliation.
Changes:
- Add a "Free disk space" step in the `system-test` job prior to the KSail cluster action.
- Configure the action to remove Android, .NET, Haskell, and the runner tool cache.
Add endersonmenezes/free-disk-space@v3.2.2 before ksail-cluster to reclaim ~20GB on GitHub runners by removing the Android SDK, .NET, Haskell, and the tool cache. Prevents DiskPressure during Talos cluster creation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from 18696ca to 903942b.
FleetDM includes MySQL, Redis, and migration jobs that need time to initialize, especially on resource-constrained CI runners. The default 5m Helm install timeout is too tight, causing CrashLoopBackOff and flaky CI failures across multiple PRs. Aligns with the pattern used by other heavy HelmReleases (velero, kube-prometheus-stack, kyverno) that set explicit timeouts and infinite remediation retries.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
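As a hedged sketch (the release name, namespace, and chart fields below are placeholders, not taken from the diff), the change would follow the Flux v2 HelmRelease API, where a negative retries value means unlimited remediation:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: fleet          # placeholder name
  namespace: apps      # placeholder namespace
spec:
  interval: 1h
  timeout: 10m         # explicit timeout instead of the 5m default
  install:
    remediation:
      retries: -1      # negative = unlimited retries (Flux semantics)
  upgrade:
    remediation:
      retries: -1
  chart:
    spec:
      chart: fleet     # placeholder chart reference
      sourceRef:
        kind: HelmRepository
        name: fleet
```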
The apps Flux Kustomization had a 5m health check timeout, but FleetDM's HelmRelease (with MySQL + Redis + migrations) needs up to 10m to install. The health check was failing before FleetDM had a chance to complete its install, forcing unnecessary retry cycles and causing flaky CI failures. Aligns the Kustomization timeout with the heaviest HelmRelease timeout in the apps group.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
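A sketch of the aligned Kustomization, assuming the usual flux-system source layout (the path and sourceRef are placeholders):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  timeout: 10m        # was 5m; now matches the heaviest HelmRelease (FleetDM)
  retryInterval: 2m
  wait: true          # health checks block until HelmReleases are ready
  prune: true
  path: ./apps        # placeholder path
  sourceRef:
    kind: GitRepository
    name: flux-system
```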
The 15m total timeout didn't leave room for FleetDM HelmRelease retries after infrastructure-controllers and infrastructure consume ~2-3m. With the apps Kustomization timeout at 10m and retryInterval at 2m, a single retry cycle needs ~24m total (2m infra + 10m + 2m + 10m).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ting

The Fleet chart deploys the migration Job and the Fleet Deployment simultaneously with MySQL when mysql.enabled=true. This causes a race condition: migrations fail before MySQL is ready, Fleet crashes, and exponential backoff prevents convergence within the Helm timeout.

Add init containers via postRenderers:
- Job/fleet-migration: wait for MySQL TCP (port 3306)
- Deployment/fleet: wait for MySQL (3306) and Redis (6379)

With the race condition eliminated, revert all timeout increases back to their original values (HelmRelease default, apps Kustomization 5m, ksail connection 15m).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add securityContext to init containers (PodSecurity restricted)
- Add a 15s sleep after the MySQL TCP check to allow full initialization (MySQL opens port 3306 before accepting connections)
- Increase HelmRelease timeout to 10m for retry headroom
- Increase apps Kustomization timeout to 10m
- Increase ksail connection timeout to 25m

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
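Together, the last two commits describe a patch along these lines; a minimal sketch assuming a busybox image, a fleet-mysql Service name, and a nonroot uid, none of which are confirmed by the diff (the Deployment/fleet patch would add an analogous check for Redis on 6379):

```yaml
postRenderers:
  - kustomize:
      patches:
        - target:
            kind: Job
            name: fleet-migration
          patch: |
            apiVersion: batch/v1
            kind: Job
            metadata:
              name: fleet-migration
            spec:
              template:
                spec:
                  initContainers:
                    - name: wait-for-mysql
                      image: busybox:1.36   # image choice assumed
                      command:
                        - sh
                        - -c
                        # Probe 3306, then sleep 15s: MySQL opens the port
                        # before it is actually ready to accept connections.
                        - until nc -z fleet-mysql 3306; do sleep 2; done; sleep 15
                      securityContext:      # satisfies PodSecurity "restricted"
                        runAsNonRoot: true
                        runAsUser: 65534    # uid assumed
                        allowPrivilegeEscalation: false
                        capabilities:
                          drop: ["ALL"]
                        seccompProfile:
                          type: RuntimeDefault
```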
The Flux kustomize controller fails to process the apps kustomization when swap is removed, as the runner lacks memory headroom for all the controllers running in the Talos-in-Docker cluster.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Fleet binary writes ~/.goquery/history on every invocation (even `fleet --version`), but the chart sets readOnlyRootFilesystem=true and runs as uid 3333, which has no /etc/passwd entry. As a result, fleet-migration silently crashes with the cryptic `<timestamp> N <nil>` log line and the Helm install times out before convergence.

Fix: postRender both Job/fleet-migration and Deployment/fleet to:
- Mount an emptyDir at /home/fleet (the chart only supports extraVolumes on the Deployment, not the Job)
- Run as the image's real fleet user (uid 100, gid 101) so the Go runtime can look up $HOME and the migration container exits cleanly

Verified locally: the migration completes in ~98s and the fleet pod becomes Ready, replacing the previous CrashLoopBackOff loop. Reverts the unnecessary timeout bumps that were added while diagnosing this.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
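A sketch of what that post-render patch might look like for the migration Job (the container name inside the Job is assumed; the Deployment gets the same treatment, or can use the chart's extraVolumes):

```yaml
postRenderers:
  - kustomize:
      patches:
        - target:
            kind: Job
            name: fleet-migration
          patch: |
            apiVersion: batch/v1
            kind: Job
            metadata:
              name: fleet-migration
            spec:
              template:
                spec:
                  securityContext:
                    runAsUser: 100    # the image's real fleet user
                    runAsGroup: 101   # so the Go runtime can resolve $HOME
                  containers:
                    - name: fleet-migration   # container name assumed
                      volumeMounts:
                        - name: fleet-home
                          mountPath: /home/fleet
                  volumes:
                    - name: fleet-home
                      emptyDir: {}    # writable home for ~/.goquery/history
```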
Add `endersonmenezes/free-disk-space@v3.2.2` before the `ksail-cluster` step in the `system-test` job to reclaim ~20GB on GitHub runners. Removes the Android SDK (~10GB), .NET (~4GB), Haskell (~4GB), and the tool cache (~6GB) to prevent `DiskPressure` / "No space left on device" during Talos cluster creation and Flux reconciliation.

Inputs:

| Input | Value |
| --- | --- |
| `remove_android` | true |
| `remove_dotnet` | true |
| `remove_haskell` | true |
| `remove_tool_cache` | true |
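A sketch of how the step might sit in the workflow; the action version and inputs come from the description above, while the surrounding job layout is assumed:

```yaml
jobs:
  system-test:
    runs-on: ubuntu-latest
    steps:
      - name: Free disk space
        uses: endersonmenezes/free-disk-space@v3.2.2
        with:
          remove_android: true
          remove_dotnet: true
          remove_haskell: true
          remove_tool_cache: true
      # ksail-cluster and the rest of the system test run next
```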