
chore(docker): wire http_proxy / https_proxy / no_proxy ARGs through builds#519

Merged
SoundMindsAI merged 3 commits into main from chore_dockerfile_http_proxy_args
Jun 16, 2026

Conversation

@SoundMindsAI SoundMindsAI commented Jun 16, 2026

Owner

Summary

This PR bundles two related Dockerfile-flexibility changes:

1. Wire http_proxy / https_proxy / no_proxy ARGs through builds (commit 03cb1e3c)

  • Add three new build args to both Dockerfile and ui/Dockerfile so corp installs behind an HTTP proxy can route apt / PyPI / npm fetches at build time and outbound HTTP (OpenAI, GitHub, registered ES/OpenSearch/Solr) at runtime through that proxy.
  • Both case variants (http_proxy + HTTP_PROXY, etc.) are ENV'd in every stage because Linux tooling is split: apt + curl prefer lowercase; uv + pip + requests accept either; npm + pnpm prefer uppercase.
  • Compose forwards all three through every service's build.args block (migrate / api / worker / ui).
  • Defaults stay empty — unchanged behavior when the vars are unset.

The no_proxy gotcha — Compose service names

Without postgres,redis,elasticsearch,opensearch,solr,api,worker,migrate in no_proxy, the worker's HTTP call to http://elasticsearch:9200 (and similar in-network calls) gets routed through the corporate proxy, which has no path to those Compose-internal hostnames. The recommended .env.example default bakes them in alongside:

  • 169.254.169.254 — EC2 / cloud metadata service
  • 10.0.0.0/8 — internal VPC traffic
  • localhost, 127.0.0.1 — local
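The failure mode above comes down to how a client decides whether a host is exempt from the proxy. Here is a minimal bash sketch. It assumes curl-style matching: an exact host, or a domain suffix with or without a leading dot. It does not interpret the CIDR entries (10.0.0.0/8), which clients handle inconsistently. `is_proxy_bypassed` is a hypothetical helper and is not part of the repo.

```shell
# Hypothetical helper: succeed (exit 0) if HOST would skip the proxy under the
# comma-separated no_proxy LIST. Simplified matching: exact host, or HOST ends
# in ".<entry>" (a leading dot on the entry is ignored). CIDR and wildcard
# entries are NOT interpreted.
is_proxy_bypassed() {
  local host=$1 list=$2 entry bare
  local -a entries
  IFS=',' read -r -a entries <<< "$list"
  for entry in "${entries[@]}"; do
    bare=${entry//[[:space:]]/}   # tolerate "a, b" spacing
    bare=${bare#.}                # ".corp.com" behaves like "corp.com"
    [ -n "$bare" ] || continue
    if [ "$host" = "$bare" ]; then
      return 0
    fi
    case $host in
      *."$bare") return 0 ;;
    esac
  done
  return 1
}
```

For example, `is_proxy_bypassed elasticsearch "$no_proxy" || echo "elasticsearch:9200 would go through the proxy"` flags the worker's in-network call before it fails.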

2. Rename UV_REGISTRY → GHCR_REGISTRY (commit 9a9f1a0c)

UV_REGISTRY (added in PR #517) read like an ARG for uv-managed dependencies (PyPI mirror?) when it actually controls only the registry prefix for the ghcr.io/astral-sh/uv tooling image. Renamed to GHCR_REGISTRY — describes what upstream it overrides (the GHCR namespace), matches BASE_REGISTRY's naming style, and stays accurate if any future GHCR-hosted image is added.

The breaking-change surface is intentionally tiny: PR #517 merged hours ago and no operator has had time to bake UV_REGISTRY into their .env yet. The state.md entry for PR #517 intentionally retains the original UV_REGISTRY wording (accurate for what shipped at that merge); the upcoming state finalization will note the rename.
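Anyone who did add the old name in the meantime can migrate with a single anchored `sed`. This is a sketch that assumes the variable sits on its own `UV_REGISTRY=` line. Commented or `export`-prefixed lines are left alone. `migrate_uv_registry` is a hypothetical helper.

```shell
# Hypothetical one-off: rename UV_REGISTRY= to GHCR_REGISTRY= in an env file,
# keeping the original as <file>.bak. Only uncommented lines that start with
# "UV_REGISTRY=" are rewritten.
migrate_uv_registry() {
  local env_file=${1:-.env}
  [ -f "$env_file" ] || return 0   # nothing to migrate
  sed -i.bak 's/^UV_REGISTRY=/GHCR_REGISTRY=/' "$env_file"
}
```

The `-i.bak` form works with both GNU and BSD `sed`.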

Three usage patterns

# Default — no proxy, no registry override, unchanged behavior
make up

# One-shot inline
BASE_REGISTRY=fcr.fmr.com/ \
GHCR_REGISTRY=fcr.fmr.com/ \
http_proxy=http://http.proxy.fmr.com:8000 \
https_proxy=http://http.proxy.fmr.com:8000 \
no_proxy=fmr.com,.fmrcloud.com,localhost,127.0.0.1,10.0.0.0/8,169.254.169.254,postgres,redis,elasticsearch,opensearch,solr,api,worker,migrate \
make up

# Persistent — uncomment + edit the relevant blocks in .env, then `make up`
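The persistent pattern has no concrete example. As a sketch, the equivalent `.env` lines reuse the one-shot values verbatim; substitute your own proxy host and domains. Because these are plain `KEY=value` lines, the same text is also valid shell.

```shell
# .env (sketch): persistent equivalent of the one-shot inline example.
BASE_REGISTRY=fcr.fmr.com/
GHCR_REGISTRY=fcr.fmr.com/
http_proxy=http://http.proxy.fmr.com:8000
https_proxy=http://http.proxy.fmr.com:8000
no_proxy=fmr.com,.fmrcloud.com,localhost,127.0.0.1,10.0.0.0/8,169.254.169.254,postgres,redis,elasticsearch,opensearch,solr,api,worker,migrate
```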

What's documented

  • The existing ## Corporate registry proxy support § in docs/01_architecture/deployment.md gains a new ### Corporate HTTP proxy (apt / PyPI / npm + runtime egress) subsection covering the three env vars, the no_proxy Compose-service-names gotcha, and a pointer to the deeper Artifactory-mirror case (not currently supported by the Dockerfiles).
  • deployment.md's GHCR_REGISTRY row description is generalized away from "the uv COPY-from stage" to "every GHCR-hosted image … currently used by the uv-source alias stage; any future GHCR image lands under the same prefix."
  • .env.example documents all three proxy vars with a copy-pasteable default no_proxy.
  • docs/03_runbooks/local-dev.md troubleshooting bullet updated to use the new GHCR_REGISTRY name.

Test plan

  • docker buildx build --check -f Dockerfile . — clean.
  • docker buildx build --check -f ui/Dockerfile ui — clean.
  • docker compose config: default-empty case, all 12 proxy build-arg slots (4 services × 3 vars) resolve to ""; BASE_REGISTRY and GHCR_REGISTRY resolve to "" and ghcr.io/ respectively.
  • docker compose --env-file <override> config — override case: all slots propagate the override value.
  • pytest backend/tests/unit/test_dockerfile_runtime_stage.py — 3/3 pass.
  • CI docker job — buildx of API image with default ARGs.
  • CI docker-ui job — buildx of UI image with default ARGs.
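The `docker compose config` checks in the test plan can be scripted. This is a rough sketch that assumes the rendered YAML prints an empty value as `http_proxy: ""` on its own line. `count_empty_proxy_slots` is hypothetical and only looks at the three lowercase names.

```shell
# Hypothetical check: read rendered `docker compose config` output on stdin and
# print how many lowercase proxy keys resolved to an empty string.
count_empty_proxy_slots() {
  # grep -c prints 0 but exits 1 when nothing matches; keep the count.
  grep -cE '^[[:space:]]+(http_proxy|https_proxy|no_proxy): ""$' || true
}

# Usage (default-empty case):
#   docker compose config | count_empty_proxy_slots
```

The test plan expects 12 build-arg slots. Later commits that add runtime `environment:` entries would also be counted by this pattern.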

🤖 Generated with Claude Code

SoundMindsAI and others added 2 commits June 16, 2026 11:03
…builds

Add three new build args to both Dockerfile and ui/Dockerfile so corporate
installs behind an HTTP proxy can route apt / PyPI / npm fetches at build
time AND outbound HTTP from the runtime container (OpenAI, GitHub, cluster
ES/OpenSearch/Solr HTTP) through that proxy. Empty defaults preserve
current behavior; both case variants (http_proxy + HTTP_PROXY etc.) are
written by the ENV blocks because Linux tooling is split on the convention.

Compose forwards the three env vars into every service's build.args block
(migrate / api / worker / ui), so 'http_proxy=... make up' or setting them
in '.env' works end-to-end.

The Compose-service-names gotcha is documented loudly. Without
'postgres,redis,elasticsearch,opensearch,solr,api,worker,migrate' in
'no_proxy', the worker's HTTP call to 'http://elasticsearch:9200' (and
similar in-network HTTP) gets routed through the corporate proxy, which
has no path to those Compose-internal hostnames. The recommended
.env.example default bakes them in alongside 169.254.169.254
(cloud-metadata) and 10.0.0.0/8 (internal VPC).

Bundles in lockstep:
- New 'Corporate HTTP proxy (apt / PyPI / npm + runtime egress)'
  subsection in docs/01_architecture/deployment.md covering the env vars,
  the no_proxy Compose-service-names gotcha, and the deeper
  Artifactory-mirror case (not currently supported by the Dockerfiles —
  pointers only).
- .env.example documents all three with a copy-pasteable default no_proxy.

Validated:
- 'docker buildx build --check' clean on both Dockerfiles
- 'docker compose config' resolves all 12 build-arg slots (4 services x 3
  vars) for both default-empty and override paths
- backend/tests/unit/test_dockerfile_runtime_stage.py: 3/3 pass

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
UV_REGISTRY (added in PR #517) read like an ARG for uv-managed
dependencies when it actually controls only the registry prefix for the
ghcr.io/astral-sh/uv tooling image. Rename to GHCR_REGISTRY — describes
*what* upstream it overrides (the GHCR namespace), matches BASE_REGISTRY's
naming style (the Docker Hub equivalent), and stays accurate if any future
GHCR-hosted image is added.

Touches 5 files: Dockerfile, docker-compose.yml (3 build.args blocks —
migrate / api / worker; ui doesn't reference GHCR), .env.example, and the
two corp-proxy doc surfaces (deployment.md table row + override example;
local-dev.md troubleshooting bullet). The deployment.md description column
was also generalized — previously coupled the name back to "the
astral-sh/uv COPY-from stage", now reads as "every GHCR-hosted image …
currently used by the uv-source alias stage; any future GHCR image lands
under the same prefix."

Breaking-change footprint is intentionally tiny: PR #517 merged hours ago
and no operator has had time to bake UV_REGISTRY into their .env yet.
state.md's PR #517 historical entry intentionally retains the original
'UV_REGISTRY' wording (accurate for what shipped at that merge); the
upcoming state finalization for this PR will note the rename.

Validated:
- 'docker buildx build --check' clean on both Dockerfiles
- 'docker compose config' resolves the new name in all three backend
  services' build.args (migrate / api / worker)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces corporate HTTP proxy support for both build-time and runtime egress across the backend and frontend services, updating the Dockerfiles, Docker Compose configuration, and deployment documentation. The feedback highlights that baking proxy settings into the Docker images via ENV is an anti-pattern; instead, these should be dynamically passed at runtime via the environment section in docker-compose.yml. Additionally, it is recommended to include host.docker.internal in the default no_proxy lists to ensure local LLM development works seamlessly behind a proxy.

Comment thread Dockerfile Outdated
Comment on lines +102 to +107
ENV http_proxy=${http_proxy} \
https_proxy=${https_proxy} \
no_proxy=${no_proxy} \
HTTP_PROXY=${http_proxy} \
HTTPS_PROXY=${https_proxy} \
NO_PROXY=${no_proxy}

high

Baking build-time proxy configuration into the Docker image via ENV is a known anti-pattern. Docker automatically treats proxy variables (http_proxy, https_proxy, no_proxy, etc.) as special build-time arguments that are available during RUN instructions but are purposely not persisted in the final image to keep it environment-agnostic.

By explicitly defining ENV http_proxy=${http_proxy}, you are overriding this behavior and hardcoding the builder's proxy settings into the image. Furthermore, because these variables are not defined in the environment section of docker-compose.yml, any runtime container started from a pre-built image will not have these proxy settings at runtime.

Recommendation:

  1. Remove the ENV block from the Dockerfile.
  2. Pass these variables dynamically at runtime by adding them to the environment section of each service in docker-compose.yml:
    environment:
      - http_proxy
      - https_proxy
      - no_proxy
      - HTTP_PROXY
      - HTTPS_PROXY
      - NO_PROXY
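The anti-pattern is easy to grep for. Here is a minimal sketch of a hypothetical helper (not part of the repo). It flags `ENV` instructions that set any of Docker's predefined proxy ARG names (`HTTP_PROXY`, `HTTPS_PROXY`, `FTP_PROXY`, `NO_PROXY`, `ALL_PROXY`, in either case). It only inspects the line that carries the `ENV` keyword, so it misses a proxy variable on a continuation line unless the first line also sets one.

```shell
# Hypothetical lint: print "<line>:<text>" for each ENV instruction line that
# assigns a predefined proxy variable (case-insensitive), i.e. bakes a proxy
# value into the image. Exit status follows grep: 0 if anything was found.
find_baked_proxy_env() {
  grep -nEi '^[[:space:]]*ENV[[:space:]]+(.*[[:space:]])?(http|https|ftp|no|all)_proxy=' "$1"
}

# Usage in CI (fail the job if a proxy ENV sneaks back in):
#   if find_baked_proxy_env Dockerfile; then exit 1; fi
```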

Comment thread ui/Dockerfile Outdated
Comment on lines +62 to +67
ENV http_proxy=${http_proxy} \
https_proxy=${https_proxy} \
no_proxy=${no_proxy} \
HTTP_PROXY=${http_proxy} \
HTTPS_PROXY=${https_proxy} \
NO_PROXY=${no_proxy}

high

Baking build-time proxy configuration into the UI image via ENV is an anti-pattern. Please remove this ENV block and rely on runtime environment variables passed via docker-compose.yml instead.

Comment thread ui/Dockerfile Outdated
Comment on lines +89 to +97
ARG http_proxy
ARG https_proxy
ARG no_proxy
ENV http_proxy=${http_proxy} \
https_proxy=${https_proxy} \
no_proxy=${no_proxy} \
HTTP_PROXY=${http_proxy} \
HTTPS_PROXY=${https_proxy} \
NO_PROXY=${no_proxy}

high

The runner stage does not run any build-time commands (like npm install or apt-get) that require a proxy. Declaring these ARGs and ENVs in the runner stage is purely for runtime proxying, which should be handled dynamically via docker-compose.yml's environment section rather than being baked into the image.

Comment thread docker-compose.yml
Comment on lines +73 to +75
http_proxy: ${http_proxy:-}
https_proxy: ${https_proxy:-}
no_proxy: ${no_proxy:-}

high

Passing these as build.args is correct for build-time proxying (e.g., apt-get and uv sync). However, to support runtime proxying (especially when using pre-built images or when running without rebuilding), these variables must also be declared in the environment section of the services.

Add them to the environment section of migrate, api, worker, and ui services:

    environment:
      - http_proxy
      - https_proxy
      - no_proxy
      - HTTP_PROXY
      - HTTPS_PROXY
      - NO_PROXY

Comment thread .env.example Outdated
# which convention it reads.
# http_proxy=http://http.proxy.your-corp.com:8000
# https_proxy=http://http.proxy.your-corp.com:8000
# no_proxy=your-corp.com,.your-corp-cloud.com,localhost,127.0.0.1,10.0.0.0/8,169.254.169.254,postgres,redis,elasticsearch,opensearch,solr,api,worker,migrate

medium

When developers run local LLMs (such as Ollama, LM Studio, or vLLM) behind a corporate proxy, they configure OPENAI_BASE_URL to point to http://host.docker.internal. Without adding host.docker.internal to the no_proxy list, requests to the local LLM will be incorrectly routed to the corporate proxy and fail.

Include host.docker.internal in the default no_proxy list to ensure local LLM development works seamlessly out of the box.

# no_proxy=your-corp.com,.your-corp-cloud.com,localhost,127.0.0.1,10.0.0.0/8,169.254.169.254,host.docker.internal,postgres,redis,elasticsearch,opensearch,solr,api,worker,migrate
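Operators who already have a `no_proxy` line can append the exemption without duplicating it. This is a sketch; `add_no_proxy_entry` is a hypothetical helper that compares whole comma-separated items only.

```shell
# Hypothetical helper: print LIST with ENTRY appended, unless ENTRY is already
# one of its comma-separated items (whole-item comparison, no suffix matching).
add_no_proxy_entry() {
  local list=$1 entry=$2
  case ",$list," in
    *",$entry,"*) printf '%s\n' "$list" ;;
    *) printf '%s\n' "${list:+$list,}$entry" ;;
  esac
}
```

Typical use: `no_proxy=$(add_no_proxy_entry "$no_proxy" host.docker.internal)`.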

Comment thread Dockerfile
Comment on lines +53 to +55
# no_proxy=your-corp.com,.your-corp-cloud.com,localhost,127.0.0.1,
# 10.0.0.0/8,169.254.169.254,
# postgres,redis,elasticsearch,opensearch,solr,api,worker,migrate

medium

Add host.docker.internal to the documented no_proxy example in the Dockerfile comments to keep it in sync with .env.example and support local LLM development behind a proxy.

#   no_proxy=your-corp.com,.your-corp-cloud.com,localhost,127.0.0.1,
#            10.0.0.0/8,169.254.169.254,host.docker.internal,
#            postgres,redis,elasticsearch,opensearch,solr,api,worker,migrate

Comment thread docs/01_architecture/deployment.md Outdated
# In .env
http_proxy=http://http.proxy.your-corp.com:8000
https_proxy=http://http.proxy.your-corp.com:8000
no_proxy=your-corp.com,.your-corp-cloud.com,localhost,127.0.0.1,10.0.0.0/8,169.254.169.254,postgres,redis,elasticsearch,opensearch,solr,api,worker,migrate

medium

Update the documentation example to include host.docker.internal in the no_proxy list to ensure consistency with .env.example.

Suggested change
- no_proxy=your-corp.com,.your-corp-cloud.com,localhost,127.0.0.1,10.0.0.0/8,169.254.169.254,postgres,redis,elasticsearch,opensearch,solr,api,worker,migrate
+ no_proxy=your-corp.com,.your-corp-cloud.com,localhost,127.0.0.1,10.0.0.0/8,169.254.169.254,host.docker.internal,postgres,redis,elasticsearch,opensearch,solr,api,worker,migrate

… + compose environment

Accept all 7 Gemini findings on PR #519:

HIGH (1-4) — anti-pattern: Dockerfile ENV bakes proxy URL into image, and
explicit ARG declarations were unnecessary.

The fix: rely on BuildKit's predefined proxy ARGs. Docker treats
`http_proxy` / `https_proxy` / `no_proxy` (plus uppercase + FTP/ALL
variants) as predefined ARGs — BuildKit forwards them from --build-arg
into every RUN step's environment automatically, with no `ARG` declaration
needed, and intentionally excludes them from `docker history` so the proxy
URL never gets baked into the image. The previous explicit
`ARG http_proxy=` + `ENV http_proxy=...` blocks in both Dockerfiles were
both redundant AND counterproductive (the ENV baked the URL into the
runtime image, making it environment-coupled).

Removed:
- Backend Dockerfile: global ARGs + base-stage ARG+ENV block. Kept the
  explanatory comment.
- ui/Dockerfile: global ARGs + deps-stage ARG+ENV block + runner-stage
  ARG+ENV block. Kept the explanatory comment.

Added to docker-compose.yml:
- `environment:` block proxy entries on migrate / api / worker / ui
  services (six entries each — both case variants). Reads from .env via
  `${http_proxy:-}` etc. Empty default = no proxy = unchanged behavior.
- Updated the build.args comment to explain the new architecture
  (build-time = BuildKit predefined; runtime = environment: block).

MEDIUM (5-7) — `host.docker.internal` missing from default `no_proxy`.

Without it, local-LLM development (Ollama / LM Studio / vLLM via
`OPENAI_BASE_URL=http://host.docker.internal:…`) breaks behind a corp
proxy because the proxy intercepts the local-machine call. Added to:
- .env.example default `no_proxy` line + the gotcha block
- Dockerfile explanatory comment
- deployment.md override example + gotcha paragraph

Also added an "Architecture: build-time vs runtime" paragraph to
deployment.md linking to Docker's predefined-ARGs reference and
explaining the dual-path design.

Validated:
- 'docker buildx build --check' clean on both Dockerfiles
- 'docker compose config' resolves proxy vars in all 4 services'
  build.args (build-time) AND environment: (runtime) — both default-empty
  and override paths
- backend/tests/unit/test_dockerfile_runtime_stage.py: 3/3 pass

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
@SoundMindsAI

Owner Author

Adjudication of Gemini Code Assist review

Per CLAUDE.md ("Before considering a PR ready to merge"): 7 findings, all accepted and all addressed in 77f81542.

| # | File:Line | Severity | Verdict | Resolution |
|---|-----------|----------|---------|------------|
| 1 | Dockerfile:107 | HIGH | Accept | Removed Dockerfile ARG + ENV blocks; rely on BuildKit predefined ARGs |
| 2 | ui/Dockerfile:67 | HIGH | Accept | Same; removed deps-stage ARG + ENV |
| 3 | ui/Dockerfile:97 | HIGH | Accept | Removed runner-stage ARG + ENV |
| 4 | docker-compose.yml:75 | HIGH | Accept | Added environment: proxy block to all 4 services |
| 5 | .env.example:152 | MED | Accept | Added host.docker.internal to default no_proxy |
| 6 | Dockerfile:55 | MED | Accept | Added host.docker.internal to comment example |
| 7 | deployment.md:238 | MED | Accept | Added host.docker.internal to docs example + gotcha paragraph |

The crucial technical correction (Findings 1-4)

Docker treats http_proxy / https_proxy / no_proxy (+ uppercase + FTP/ALL variants) as predefined ARGs: BuildKit forwards them from --build-arg into every RUN step's environment automatically — no ARG declaration needed in the Dockerfile — and intentionally excludes them from docker history so the proxy URL never gets baked into the image.

My initial design used explicit ARG http_proxy= + ENV http_proxy=${http_proxy} blocks in both Dockerfiles, which were both redundant (BuildKit handles it without them) AND counterproductive (the ENV baked the URL into the runtime image, making it environment-coupled and persistent in docker inspect).

The corrected architecture:

  • Build-time: Compose build.args: (already wired) → BuildKit predefined-ARG forwarding → RUN steps see the value. No ARG/ENV in Dockerfile.
  • Runtime: Compose environment: block on each service → container env at startup. Image stays portable; runtime settings live in compose where they belong.

Both paths read the same ${http_proxy:-} from .env, so a single http_proxy=… line in .env covers both build-time and runtime.
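Compose's `${VAR:-default}` interpolation follows the same rules as the shell's. That means the shell can preview what a single `.env` line feeds into both layers. This is a sketch. The printed slot names, and the uppercase `HTTP_PROXY: ${http_proxy:-}` mapping, are assumptions about how the six `environment:` entries are written.

```shell
# Hypothetical preview of the values Compose would interpolate into the
# build-time and runtime slots for one service, from the current http_proxy.
# Unset or empty both yield "", i.e. no proxy and unchanged behavior.
preview_proxy_slots() {
  printf 'build.args.http_proxy=%s\n'  "${http_proxy:-}"
  printf 'environment.http_proxy=%s\n' "${http_proxy:-}"
  printf 'environment.HTTP_PROXY=%s\n' "${http_proxy:-}"   # assumed mapping
}
```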

Findings 5-7 — host.docker.internal

Sharp catch on local-LLM dev. An operator using Ollama / LM Studio / vLLM via OPENAI_BASE_URL=http://host.docker.internal:11434 behind a corp proxy would have their local-machine LLM calls intercepted by the proxy without this exemption. Added to:

  • .env.example default no_proxy line + the gotcha block (with "Common gotchas" subsection listing cloud-metadata, VPC, and host.docker.internal cases)
  • Dockerfile explanatory comment
  • deployment.md override example + the gotcha paragraph
  • Plus a new "Architecture: build-time vs runtime" paragraph in deployment.md linking to Docker's predefined-ARGs reference

Validated

  • docker buildx build --check clean on both Dockerfiles
  • docker compose config resolves proxy vars in all 4 services' build.args (build-time) AND environment: (runtime) — both default-empty and override paths
  • backend/tests/unit/test_dockerfile_runtime_stage.py — 3/3 pass

PR is ready for merge once CI on 77f81542 goes green.

@SoundMindsAI

Owner Author

Adjudication of 2 follow-up Gemini findings on 77f81542

Both pinned to 77f81542 — same SHA as branch HEAD — but pointing at fixes that are already present elsewhere in the same commit. Both rejected as stale (bot false positives).

| # | File:Line | Severity | Verdict | Counter-evidence |
|---|-----------|----------|---------|------------------|
| A | docker-compose.yml:79 ("add environment: proxy block") | HIGH | Reject (stale) | All 4 services already have http_proxy/https_proxy/no_proxy + uppercase in their environment: blocks. See docker-compose.yml:144-149 (api), :93-98 (migrate), :226-231 (worker), :271-276 (ui). `grep -c "http_proxy: \${http_proxy:-}" docker-compose.yml` = 8 (4 × build.args + 4 × environment), confirming both layers are wired. |
| B | Dockerfile:60 ("add host.docker.internal to comment") | MED | Reject (stale) | host.docker.internal is already present in the cited comment at Dockerfile:59 (in the no_proxy=… example), plus lines 64, 67, 68 as part of the gotcha explanation. `grep -n "host.docker.internal" Dockerfile` returns 4 matches. |
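The counter-evidence grep can drop the `\$` escaping entirely by using a fixed-string match. This is a sketch. Like the original command it is case-sensitive, so uppercase `HTTP_PROXY:` keys are not counted.

```shell
# Count lines containing the literal text `http_proxy: ${http_proxy:-}`.
# -F disables regex parsing; the single quotes stop the shell expanding $.
count_proxy_refs() {
  grep -cF 'http_proxy: ${http_proxy:-}' "${1:-docker-compose.yml}" || true
}
```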

Both look like the bot re-scanned the build.args block in isolation and flagged based on it alone, without cross-checking the environment: block 60 lines further down (Finding A) and without scanning the full no_proxy=… comment line (Finding B). Pattern matches the project's known "Gemini re-flags resolved findings on the new SHA" failure mode.

No file changes; merging.

@SoundMindsAI SoundMindsAI merged commit 4ff6e33 into main Jun 16, 2026
19 checks passed
@SoundMindsAI SoundMindsAI deleted the chore_dockerfile_http_proxy_args branch June 16, 2026 15:46
SoundMindsAI added a commit that referenced this pull request Jun 16, 2026
Update state.md current-branch / execution context to reflect the
4ff6e33 merge, prepend a one-line entry to "Last 5 merges", drop the
now-6th row (bug_seed_meaningful_demos_silent_bulk_errors PR #482) into
the state_history.md older-entries reference, and add the full reasoning
entry to state_history.md.

The narrative covers the Gemini-driven design correction (Dockerfile
ARG+ENV blocks removed in favor of BuildKit predefined-ARGs for
build-time + compose environment: for runtime), the UV_REGISTRY →
GHCR_REGISTRY rename rationale, the `host.docker.internal` local-LLM
gotcha, the no_proxy Compose-service-names gotcha, and the two
stale-rejected follow-up Gemini findings.

Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request Jun 16, 2026
…) (#522)

Update state.md current-branch + execution context, prepend the new
one-line entry to "Last 5 merges", drop the now-6th row
(chore_overnight_result_card_screenshot PR #492) into the older-entries
reference, and add the full reasoning entry to state_history.md.

The narrative covers the live corp-firewall reproducer (`make up` failing
at line 1 with 403 on docker.io/docker/dockerfile:1.7), the architectural
reason ARGs cannot fix it (syntax directive parses before ARGs are in
scope), the safety analysis (no BuildKit-1.7+ features used), and frames
this as the third PR completing today's corp-proxy install story alongside
#517 + #519.

Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request Jun 16, 2026
…gnostics + runbook (#523)

* chore(docker): optional corp CA cert via build secret (TLS interception)

Add optional corporate CA certificate support for installs behind a
corp HTTPS proxy that performs TLS interception (re-signs traffic with
an internal CA the container does not trust). Empty file = no-op
(OSS users unaffected); operators behind a TLS-intercepting proxy
drop their PEM cert at ./secrets/corp_ca.crt and re-run 'make up'.

The error signatures this fixes:
- npm/pnpm: SELF_SIGNED_CERT_IN_CHAIN
- curl/openssl: unable to get local issuer certificate
- Python (requests/httpx/openai): CERTIFICATE_VERIFY_FAILED
- Go: x509: certificate signed by unknown authority

Mechanism. The cert is mounted as a BuildKit secret via
'--mount=type=secret,id=corp_ca,target=/tmp/corp_ca.crt,required=false'
in both Dockerfiles. When non-empty, copied to
/usr/local/share/ca-certificates/corp_ca.crt and `update-ca-certificates`
rebuilds /etc/ssl/certs/ca-certificates.crt. Every HTTPS tool in the
container then trusts the corp CA at BOTH build time AND runtime
(the cert content is baked into the system trust bundle).
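The mechanism reduces to a guarded copy followed by `update-ca-certificates`. On Debian-based images, that command picks up `.crt` files under `/usr/local/share/ca-certificates`. The sketch below takes the paths as parameters so it can be exercised outside a container. `install_corp_ca` is hypothetical, and the commented `RUN` line is an assumed shape for the Dockerfile step, not the merged text.

```shell
# Hypothetical guarded install. Inside an image the defaults apply, e.g.:
#   RUN --mount=type=secret,id=corp_ca,target=/tmp/corp_ca.crt,required=false \
#       if [ -s /tmp/corp_ca.crt ]; then \
#         cp /tmp/corp_ca.crt /usr/local/share/ca-certificates/corp_ca.crt && \
#         update-ca-certificates; \
#       fi
install_corp_ca() {
  local src=${1:-/tmp/corp_ca.crt}
  local dest_dir=${2:-/usr/local/share/ca-certificates}
  # Missing or zero-byte secret: no-op, so builds without a corp CA are unchanged.
  if [ ! -s "$src" ]; then
    echo "corp CA: none provided, skipping"
    return 0
  fi
  mkdir -p "$dest_dir"
  cp "$src" "$dest_dir/corp_ca.crt"
  echo "corp CA: copied to $dest_dir/corp_ca.crt (run update-ca-certificates next)"
}
```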

Touched:
- Dockerfile: install block in base stage after the apt 'ca-certificates'
  install (inherited by deps + runtime via 'FROM base'; one block covers
  all backend build steps + runtime egress).
- ui/Dockerfile: install block in deps stage (build-time, before
  'npm install -g pnpm@9' — the canonical TLS-interception failure
  point); SECOND install in runner stage (runtime egress; runner is a
  fresh FROM, doesn't inherit from deps).
- docker-compose.yml: new top-level 'secrets.corp_ca' entry pointing at
  './secrets/corp_ca.crt'; 'build.secrets: - corp_ca' on all 4 services
  (migrate / api / worker / ui).
- scripts/install.sh: auto-create empty './secrets/corp_ca.crt'
  placeholder so Compose's secrets validation doesn't fail at startup.
- .env.example: explanatory block describing the symptom + the file
  path (no env var — the cert is a file, not a string).
- docs/01_architecture/deployment.md: new 'Corporate TLS interception'
  subsection with error signatures table, mechanism, one-time setup,
  verification, and cross-reference to the upcoming runbook.
- docs/03_runbooks/local-dev.md: troubleshooting bullet pointing to
  the runbook for the TLS-interception case.
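The `install.sh` placeholder step needs one property: it must never clobber a real cert. This is a sketch; `ensure_corp_ca_placeholder` is a hypothetical name, and the real script may differ.

```shell
# Create an empty placeholder only when no file exists yet, so Compose's
# secrets validation finds the path and an operator's real cert survives reruns.
ensure_corp_ca_placeholder() {
  local dir=${1:-./secrets}
  mkdir -p "$dir"
  if [ ! -e "$dir/corp_ca.crt" ]; then
    : > "$dir/corp_ca.crt"
  fi
}
```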

Validated:
- 'docker buildx build --check' clean on both Dockerfiles
- 'docker compose config --quiet' clean (new build.secrets block parses)
- backend/tests/unit/test_dockerfile_runtime_stage.py: 3/3 pass

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* chore(install): add diagnose_build_failure wrapper to install.sh

When 'docker compose build' fails inside scripts/install.sh, wrap the
output and scan for known corp-network failure signatures, then print
an actionable diagnostic block pointing at the specific runbook section.
Underlying tool errors (SELF_SIGNED_CERT_IN_CHAIN from npm, "403
Forbidden" from registry-1.docker.io, "Could not resolve host" from
curl) are technically correct but operationally useless — a developer
seeing them does not know to drop a corp CA cert at ./secrets/corp_ca.crt
or set BASE_REGISTRY in .env. This wrapper closes the gap.

Three detection patterns (each prints a tailored fix + runbook pointer):

1. TLS interception (corp HTTPS proxy with internal CA)
   Matches: SELF_SIGNED_CERT_IN_CHAIN, self-signed cert in chain,
            unable to get local issuer certificate, CERTIFICATE_VERIFY_FAILED,
            certificate verify failed, x509: certificate signed by unknown
   Hint: drop corp CA at ./secrets/corp_ca.crt

2. Container registry blocked (BASE_REGISTRY / GHCR_REGISTRY not set
   or set to wrong path)
   Matches: failed to resolve source metadata for docker.io,
            registry-1.docker.io 401/403, no such host (docker.io|ghcr.io)
   Hint: set BASE_REGISTRY + GHCR_REGISTRY in .env

3. Outbound HTTP blocked (apt/PyPI/npm can't reach upstream)
   Matches: Could not resolve host, Temporary failure resolving,
            dial tcp.*no such host, Connection refused/timed out,
            ETIMEDOUT, ECONNREFUSED
   Hint: set http_proxy + https_proxy + no_proxy in .env

Fail-open fallback: if no pattern matches, print a generic pointer at
docs/03_runbooks/corporate-network-install.md (the FAQ) and
docs/03_runbooks/local-dev.md "Stack will not start" (general
troubleshooting). The runbook file is added in a follow-up commit on
this same branch so the pointer resolves the moment this PR merges.
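The three detectors and the fallback can be sketched as one classifier over the captured log. The regexes below paraphrase the listed signatures and are assumptions. The sketch reports only the first matching category, checked in the listed order. That ordering makes a `lookup registry-1.docker.io ... no such host` line count as a registry failure rather than generic egress. `classify_build_failure` is hypothetical.

```shell
# Hypothetical classifier: print tls | registry | egress | unknown for a build log.
classify_build_failure() {
  local log=$1
  if grep -qE 'SELF_SIGNED_CERT_IN_CHAIN|self[- ]signed cert(ificate)? in (certificate )?chain|unable to get local issuer certificate|CERTIFICATE_VERIFY_FAILED|certificate verify failed|x509: certificate signed by unknown' "$log"; then
    echo tls        # hint: drop the corp CA at ./secrets/corp_ca.crt
  elif grep -qE 'failed to resolve source metadata for docker\.io|registry-1\.docker\.io.*(401|403)|(401|403).*registry-1\.docker\.io|(docker\.io|ghcr\.io).*no such host' "$log"; then
    echo registry   # hint: set BASE_REGISTRY + GHCR_REGISTRY in .env
  elif grep -qE 'Could not resolve host|Temporary failure resolving|dial tcp.*no such host|Connection (refused|timed out)|ETIMEDOUT|ECONNREFUSED' "$log"; then
    echo egress     # hint: set http_proxy + https_proxy + no_proxy in .env
  else
    echo unknown    # fall back to the generic runbook pointers
  fi
}
```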

Implementation. Wraps the existing 'docker compose build' call with
'docker compose build 2>&1 | tee "$build_log"' captured to a temp file;
the 'if !' guard catches the pipeline exit (set -e + pipefail is on);
on failure, runs the three pattern detectors then 'exit 1'. The wrapper
preserves existing CI behavior — 'RELYLOOP_SKIP_BUILD=1' still skips the
build entirely.

Validated:
- 'bash -n scripts/install.sh' clean
- Smoke-tested all 3 detection paths + the fallback against synthetic
  log fixtures: each pattern fires the right diagnostic block; the
  fallback fires when none match

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* docs(runbook): corporate-network-install symptom-first FAQ

New runbook for operators running 'make up' from inside a corporate
network. Symptom-first layout (paste error block → find section → follow
fix). Covers every corp-network failure mode we've seen during today's
work (PRs #517 / #519 / #521 / this PR's earlier commits):

§1 Registry pull failures
   - "403 Forbidden" / "401 Unauthorized" from registry-1.docker.io
   - "failed to resolve source metadata for docker.io/..."
   - "no such host: registry-1.docker.io" / "no such host: ghcr.io"
   - Fix: BASE_REGISTRY + GHCR_REGISTRY in .env
   - Artifactory layout disambiguation (unified vs split per upstream)

§2 TLS verification errors
   - SELF_SIGNED_CERT_IN_CHAIN (npm/pnpm)
   - "unable to get local issuer certificate" (curl/openssl)
   - "x509: certificate signed by unknown authority" (Go)
   - CERTIFICATE_VERIFY_FAILED (Python)
   - Fix: drop corp CA at ./secrets/corp_ca.crt
   - Four ways to find the corp CA (IT, Chrome/Edge, Firefox, openssl probe)
   - Verification commands

§3 Egress / DNS failures
   - "Could not resolve host"
   - "Connection refused" / "Connection timed out" / ETIMEDOUT / ECONNREFUSED
   - Fix: http_proxy + https_proxy + no_proxy in .env
   - The no_proxy three-category checklist (Compose service names,
     host.docker.internal, cloud-metadata + VPC)

§4 Worker stays "unhealthy" after make up succeeds
   - Cause: no_proxy missing Compose service names
   - Fix: add postgres,redis,elasticsearch,opensearch,solr,api,worker,migrate

§5 Runtime calls to OpenAI / GitHub fail
   - Cause A: host env vars set instead of .env
   - Cause B: TLS interception on the runtime path
   - Verification commands for both

Plus quick decision tree at the top, a verifying-your-full-config
one-shot, and cross-refs to deployment.md (architecture) and local-dev.md
(general troubleshooting).

The runbook is referenced by:
- scripts/install.sh's diagnose_build_failure (commit B) — all three
  detection branches plus the fallback point at specific sections.
- docs/01_architecture/deployment.md "Corporate TLS interception" §
  (added in commit A) — for symptom lookup.
- docs/03_runbooks/local-dev.md "Stack will not start" troubleshooting
  bullets (existing pattern, extended in commit A and again in commit D).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* docs(claude-md): index corporate-network-install runbook

Adds a row to the "Key Runbooks" table pointing at the new
docs/03_runbooks/corporate-network-install.md (added in commit C of
this branch). Keeps the table the canonical "where do I look when ..."
index — without this row, the new runbook is only discoverable by
the in-doc cross-references (deployment.md + local-dev.md) and by
the install.sh diagnostic output. Adding it to CLAUDE.md makes it
discoverable to anyone reading the project's top-level conventions.

The other doc surfaces (deployment.md "Corporate TLS interception" §,
local-dev.md "Stack will not start" troubleshooting bullets) were
updated in commit A of this branch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* fix(install): wrap docker compose build in function to satisfy CI guard

CI guard scripts/ci/verify_install_builds_all_services.sh enforces that
scripts/install.sh contains a top-level `docker compose build` line via
regex `^[[:space:]]*docker compose build( .*)?$`. My earlier wrapper
(commit fe8d5b4 on this branch) used `if ! docker compose build 2>&1 |
tee "$build_log"; then`, which fails the anchor because of the `if !`
prefix.

Fix: move the bare `docker compose build` invocation into a wrapper
function `do_compose_build()`; the calling site then runs
`do_compose_build 2>&1 | tee "$build_log" || build_status=$?` and checks
PIPESTATUS[0] to detect failure. The bare line inside the function body
satisfies the guard regex; the calling-site pipe is unchanged in
behavior (same output capture, same diagnostic-on-failure path).
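Here is a minimal sketch of the wrapper shape. It assumes bash with errexit and pipefail turned off in this scope. The helper run_build_with_log is a hypothetical name. This version reads PIPESTATUS[0] immediately after the pipeline. It does not reproduce the exact `|| build_status=$?` call site described above.

```bash
#!/usr/bin/env bash
# Sketch of the wrapper pattern; not a copy of scripts/install.sh.
# Assumes errexit/pipefail are NOT enabled in this scope.

# The bare line inside this body is what the CI guard's anchored regex
# ^[[:space:]]*docker compose build( .*)?$ matches.
do_compose_build() {
  docker compose build
}

# Hypothetical helper. Tees the build output into a log file and returns
# the exit status of do_compose_build (PIPESTATUS[0]), not that of tee.
run_build_with_log() {
  local build_log="$1" build_status
  do_compose_build 2>&1 | tee "$build_log"
  build_status=${PIPESTATUS[0]}
  return "$build_status"
}
```

A caller could then write `if ! run_build_with_log "$build_log"; then ...; fi`. The `if !` stays at the call site, and the function body still carries the bare line the guard looks for.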

Validated:
- `bash scripts/ci/verify_install_builds_all_services.sh` — OK (no-args = builds all)
- `bash -n scripts/install.sh` — clean
- Diagnostic still fires on synthetic TLS-interception log fixture

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* chore(docker): adjudicate Gemini — accept 3 ACCEPT + 1 PARTIAL ACCEPT

PR #523 Gemini review (4 MEDIUM findings on f34a278):

Findings 2-4 (Dockerfile:119, ui/Dockerfile:66, ui/Dockerfile:92):
ACCEPT. The 'update-ca-certificates 2>&1 | tail -1' chain was a tidiness
optimization (the command prints info about each CA it processes), but
the RUN shell is /bin/sh with no pipefail, so a failing
update-ca-certificates exit status would be silently masked by tail's
exit code 0. This is a real correctness issue: an invalid corp CA cert
would have shipped an image without the cert in the trust store while
the build still "succeeded." Dropped the pipe in all 3 places; the full
update-ca-certificates output now prints (verbose > silent failures).
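You can reproduce the masking behavior outside Docker. The sketch below assumes /bin/sh is available on the host. The helper run_in_sh is hypothetical, and `false` stands in for a failing update-ca-certificates.

```bash
#!/usr/bin/env bash
# Reproduces the masking bug in isolation. sh is used because Docker RUN
# lines in shell form run under /bin/sh -c. Without pipefail, a pipeline
# reports the status of its last command, here tail.

# Hypothetical helper: print the exit status /bin/sh returns for a script.
run_in_sh() {
  local status=0
  sh -c "$1" >/dev/null 2>&1 || status=$?
  echo "$status"
}

echo "with | tail -1: $(run_in_sh 'false 2>&1 | tail -1')"  # status 0: tail's status wins
echo "without pipe:   $(run_in_sh 'false')"                 # status 1: the failure surfaces
```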

Finding 1 (scripts/install.sh:253) — PARTIAL ACCEPT.

ACCEPT the substance: 'trap EXIT' is the right cleanup mechanism — robust
under Ctrl-C / signals / unexpected exits, eliminates the duplicate
'rm -f' calls. Added 'trap "rm -f \"$build_log\"" EXIT' right after the
mktemp; removed both stale 'rm -f' calls.
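Here is a minimal sketch of the trap EXIT cleanup, assuming bash. The function demo_trap_cleanup is hypothetical. It runs the pattern in a child shell that fails with `exit 1`, so the parent can then confirm that the temp file is gone.

```bash
#!/usr/bin/env bash
# Minimal sketch of the trap-based cleanup adopted above. A child bash
# stands in for install.sh so the caller can check the file afterwards.
# The double quotes expand $build_log when the trap is installed, so the
# trap removes that exact path regardless of later changes to the variable.
demo_trap_cleanup() {
  bash -c '
    build_log="$(mktemp)"
    trap "rm -f \"$build_log\"" EXIT
    echo "$build_log"
    echo "partial build output" > "$build_log"
    exit 1   # an early failure exit; no explicit rm -f on this path
  '
}

# The child exits non-zero on purpose; "|| true" keeps a caller running
# under "set -e" from aborting here.
path="$(demo_trap_cleanup)" || true
if [[ -e "$path" ]]; then
  echo "leaked: $path"
else
  echo "cleaned up: $path"
fi
```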

REJECT the suggested code as-written: Gemini's suggestion regresses to
'if ! docker compose build 2>&1 | tee "$build_log"; then' on a bare line,
which is the EXACT pattern the verify_install_builds_all_services CI
guard rejects (commit f34a278 on this branch fixed this by moving the
'docker compose build' invocation into a do_compose_build() function so
the bare line satisfies the guard regex). The function wrapper stays;
only the trap-based cleanup was adopted from the suggestion.

Validated:
- bash scripts/ci/verify_install_builds_all_services.sh — OK (no-args)
- bash -n scripts/install.sh — clean
- docker buildx build --check both Dockerfiles — clean

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

---------

Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request Jun 16, 2026

Update state.md current-branch + execution context to reflect the
6c5fac5 merge, prepend a one-line entry to "Last 5 merges", drop the
now-6th row (bug_cluster_url_ssrf_hostname_bypass Phase 1 PR #510) into
the state_history.md older-entries reference, and add the full reasoning
entry to state_history.md.

The narrative covers the three bundled improvements (CA cert build
secret, diagnose_build_failure wrapper, symptom-first runbook), the
CI guard regex chronology (why commit B failed, what f34a278 fixed),
the Gemini adjudication (4 MEDIUM findings accepted in 3d6e8bc: the 3
pipe-mask findings confirmed as real bugs; the trap EXIT substance
accepted but the if-! suggestion rejected), the
stale Gemini follow-up (4 stale re-flags on 3d6e8bc, all rejected
with cited counter-evidence), and frames this as the closer of today's
four-PR corp-network install story (#517 + #519 + #521 + #523).

Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request Jun 16, 2026
Update state.md current-branch + execution context, prepend the new
one-line entry to "Last 5 merges", drop the now-6th row
(chore_dockerfile_http_proxy_args PR #519) into the state_history.md
older-entries reference, and add the full reasoning entry to
state_history.md.

The narrative covers the user-reported staleness (relyloop.com footer
pinned at f733fcc / PR #509 / "7 days ago"), the root cause (paths
filter correctly fires only on website/** changes; today's chore PRs
didn't touch website/), the three options considered, the timing
rationale (03:17 UTC for the project's "minute 17" convention, offset
from existing crons, pre-dawn GHA low-load window), the cost (~30s/day
of GHA time), and frames this PR as closing the "public visibility"
gap left by today's otherwise-thorough install story.

Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>