Merged
10 changes: 10 additions & 0 deletions .env.example
@@ -156,6 +156,16 @@ STUDIES_DEFAULT_TIMEOUT_S=60
#
# Empty / unset → no proxy (unchanged behavior).
#
# ⚠️ SHELL OVERRIDES THIS FILE. Docker Compose resolves `${no_proxy:-}` from
# the shell environment FIRST and only falls back to this .env when the shell
# doesn't define it. Corporate machines often export `no_proxy`/`http_proxy`
# globally (~/.zshrc, /etc/profile, MDM) — that value WINS and your edit here
# is silently ignored. Authoritative check is the running container, not .env:
# docker compose exec api env | grep -i no_proxy # must list elasticsearch,opensearch,solr
# If it doesn't match this file, set the value in your shell (export it, then
# `docker compose up -d --force-recreate api worker`). See
# docs/03_runbooks/corporate-network-install.md §7.
#
# IMPORTANT — `no_proxy` MUST include the Compose service names. Without
# them, the worker's HTTP call to `http://elasticsearch:9200` (and similar
# in-network calls) gets routed to the corporate proxy, which has no path
@@ -44,6 +44,9 @@ def test_hint_when_engine_hosts_not_exempt(monkeypatch: pytest.MonkeyPatch) -> N
    # Names the actual missing hosts and points at the recreate step.
    assert "elasticsearch" in hint and "opensearch" in hint and "solr" in hint
    assert "force-recreate" in hint
    # Calls out the shell-overrides-.env trap (the #1 "I set it but it didn't
    # take" cause) and points at the runbook section that covers it.
    assert "shell" in hint.lower() and "§7" in hint


def test_no_hint_when_engine_hosts_exempt(monkeypatch: pytest.MonkeyPatch) -> None:
66 changes: 65 additions & 1 deletion docs/03_runbooks/corporate-network-install.md
@@ -40,6 +40,11 @@ make up succeeds, but...

Runtime calls to OpenAI / GitHub / clusters fail
──→ Runtime egress not proxied (§5)

You set no_proxy in .env but `docker compose exec api env | grep no_proxy`
doesn't show your service names (engines unreachable / worker unhealthy /
seed skips every scenario despite a correct .env)
──→ Shell no_proxy overrides .env (§7)
```

If the wrapper around `docker compose build` ([`scripts/install.sh`](../../scripts/install.sh) `diagnose_build_failure`) detects a known signature, it prints a diagnostic block pointing at the right section here. Sections below have more detail than the inline diagnostic.
@@ -261,6 +266,8 @@ no_proxy=<your-corp-domains>,localhost,127.0.0.1,10.0.0.0/8,169.254.169.254,host

These are passed to BuildKit as **predefined ARGs** — BuildKit auto-forwards them into every `RUN` step's environment without requiring `ARG` declarations in the Dockerfile, AND intentionally excludes them from `docker history`, so the proxy URL never gets baked into the image.

> ⚠️ **If your shell already exports `no_proxy`/`http_proxy` (common on corporate-managed machines), that shell value OVERRIDES whatever you put in `.env`** — Docker Compose resolves `${no_proxy:-}` from the shell first. So a `.env` edit can be silently ignored. If you set these in `.env` but `docker compose exec api env | grep no_proxy` doesn't reflect it, see **§7**.

### The `no_proxy` checklist — three categories you must include

1. **Compose service names** (`postgres,redis,elasticsearch,opensearch,solr,api,worker,migrate`). Without these, the worker's call to `http://elasticsearch:9200` gets routed through the corp proxy, which has no path to those Compose-internal hostnames. The worker stays unhealthy. See §4.
@@ -306,7 +313,7 @@ Then `docker compose restart worker` to pick up the new env var (no rebuild need
docker compose exec worker env | grep -i proxy
```

-You should see `no_proxy` with all the service names.
+You should see `no_proxy` with all the service names. **If you added them to `.env` but they're still missing here, a shell-exported `no_proxy` is overriding your `.env` — see §7.**

---

@@ -412,6 +419,63 @@ Both should show your mirror URL (or the public default if you haven't overridde

---

## §7 — You edited `no_proxy` in `.env` but the container doesn't have it (shell overrides `.env`)

### Symptom

You added the Compose service names to `no_proxy` in `.env` (per §3/§4), saved the file, and recreated the containers, yet the engines are **still** unreachable. Typical signs: the worker stays unhealthy (§4), `make seed-demo` skips every scenario with "engine unreachable" (and prints the corp-proxy hint), or studies can't reach the cluster. The giveaway:

```
$ docker compose exec api env | grep -i no_proxy
no_proxy=your-corp.com,10.0.0.0/8,169.254.169.254 # ← the service names from your .env are NOT here
```

The value **inside the container** is your corporate `no_proxy`, not the one you put in `.env`.

### Cause

Docker Compose resolves `${no_proxy:-}` in `docker-compose.yml` from the **shell environment first**, and only falls back to the `.env` file when the shell does **not** define the variable. Corporate-managed macOS/Linux machines very often export `no_proxy` (plus `http_proxy`/`https_proxy`) **globally** — via `~/.zshrc`, `~/.zprofile`, `/etc/profile`, or an MDM/Jamf profile. That shell value **wins**, so your `.env` edit is silently ignored.

> **It's the lowercase `no_proxy` that matters.** `docker-compose.yml` interpolates `${no_proxy}` (lowercase) for *both* the `no_proxy` and `NO_PROXY` container vars, and `.env` defines `no_proxy` (lowercase). So only a **non-empty lowercase `no_proxy` in the host shell** triggers this override — if the shell has only an uppercase `NO_PROXY` set (and lowercase `no_proxy` is empty/unset), Compose still falls back to your `.env`. Corporate setups usually export both, but the lowercase one is the one Compose reads.
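
The precedence described above can be modeled in a few lines of Python. This is a minimal sketch of the behavior as this section describes it, not Docker Compose's actual implementation: `resolve_for_compose` and the sample values are hypothetical, and treating an empty shell value the same as an unset one follows the note above rather than a verified Compose rule.

```python
def resolve_for_compose(name: str, shell_env: dict[str, str], dotenv: dict[str, str]) -> str:
    """Model of `${name:-}` as described in this section (an assumption, not Compose code).

    A non-empty shell value wins; otherwise the .env value is used;
    otherwise the result is the empty default.
    """
    shell_value = shell_env.get(name, "")
    if shell_value:
        return shell_value
    return dotenv.get(name, "")


# Hypothetical .env contents that already include the Compose service names.
dotenv = {"no_proxy": "localhost,elasticsearch,opensearch,solr"}

# Corporate shell that exports both spellings: the lowercase one overrides .env.
corp_shell = {"no_proxy": "your-corp.com,10.0.0.0/8", "NO_PROXY": "your-corp.com,10.0.0.0/8"}

# Shell with only the uppercase spelling: the lookup is by the lowercase name,
# so the .env value is used.
upper_only_shell = {"NO_PROXY": "your-corp.com,10.0.0.0/8"}

overridden = resolve_for_compose("no_proxy", corp_shell, dotenv)
fallback = resolve_for_compose("no_proxy", upper_only_shell, dotenv)
print(overridden)
print(fallback)
```

The lookup is case-sensitive on purpose: only the lowercase key is consulted, which mirrors why an uppercase-only `NO_PROXY` in the shell does not trigger the override.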

This is the #1 "I set it but it didn't take" trap. Your `.env` is correct; it's just being overridden. Confirm what your **shell** exports (run on the host, NOT inside a container):

```bash
echo "no_proxy=$no_proxy" # ← this is the one Compose reads; if non-empty and
# missing the service names, that's the override
echo "NO_PROXY=$NO_PROXY" # informational (Compose doesn't read this for ${no_proxy})
```

### Fix

Append the Compose service names to the **shell** value (preserving your corporate entries), then recreate:

```bash
export no_proxy="${no_proxy:+${no_proxy},}postgres,redis,elasticsearch,opensearch,solr,api,worker,migrate,host.docker.internal"
export NO_PROXY="$no_proxy"

docker compose up -d --force-recreate api worker
make seed-demo FORCE=1 # if you were seeding demo data
```

**Make it durable:** add the same two `export` lines to your shell startup file (`~/.zshrc` / `~/.zprofile`), placed **after** any corporate `no_proxy` export, so that they append to the corporate value instead of being clobbered by a later corporate line.

Alternative: running `unset no_proxy NO_PROXY` before `docker compose` lets your `.env` value win. However, it also removes your corporate entries (VPC ranges, cloud metadata) from the shell for the rest of that session, so appending to the shell value is the safer choice.
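
If the append step ends up in a startup file that gets sourced more than once, the same names can pile up in `no_proxy`. The sketch below shows one way to append while preserving the existing order and skipping duplicates. `append_no_proxy` and `SERVICE_NAMES` are hypothetical names used for illustration, not part of this repository.

```python
def append_no_proxy(current: str, extra: list[str]) -> str:
    """Append hosts to a comma-separated no_proxy value, keeping order and dropping duplicates."""
    entries: list[str] = []
    seen: set[str] = set()
    for raw in [*current.split(","), *extra]:
        host = raw.strip()
        # Skip empty fragments (e.g. from an empty starting value) and repeats.
        if host and host not in seen:
            entries.append(host)
            seen.add(host)
    return ",".join(entries)


# The service names listed in the Fix above.
SERVICE_NAMES = [
    "postgres", "redis", "elasticsearch", "opensearch", "solr",
    "api", "worker", "migrate", "host.docker.internal",
]

merged = append_no_proxy("your-corp.com,10.0.0.0/8,169.254.169.254", SERVICE_NAMES)
print(merged)
```

Applying the function a second time to its own output returns the same string, so repeated runs do not grow the value.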

### Verify

```bash
docker compose exec api env | grep -i no_proxy # must now include elasticsearch,opensearch,solr
```

**This is the authoritative check** — `.env` containing the service names is NOT sufficient if a shell variable overrides it. Always confirm against the running container's environment, not the `.env` file.
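
The same check can be scripted. The following is a minimal sketch, assuming the output of `docker compose exec api env` has been captured as text; `missing_from_no_proxy` is a hypothetical helper. It does exact-name matching only and ignores domain-suffix or CIDR semantics.

```python
def missing_from_no_proxy(env_output: str, required: list[str]) -> list[str]:
    """Return the required hosts that are absent from the lowercase no_proxy line."""
    value = ""
    for line in env_output.splitlines():
        key, sep, val = line.partition("=")
        # Only the lowercase variable is inspected, matching the check above.
        if sep and key == "no_proxy":
            value = val
    present = {entry.strip() for entry in value.split(",")}
    return [host for host in required if host not in present]


# Sample text standing in for the container's `env` output (hypothetical values).
sample = "PATH=/usr/bin\nno_proxy=your-corp.com,10.0.0.0/8,169.254.169.254\n"
missing = missing_from_no_proxy(sample, ["elasticsearch", "opensearch", "solr"])
print(missing)
```

An empty list means every required name is present in the container's `no_proxy`. Any names returned are the ones the shell override dropped.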

### Background

Docker Compose [environment-variable precedence](https://docs.docker.com/compose/how-tos/environment-variables/envvars-precedence/): for `${VAR}` interpolation in the Compose file, the host shell environment takes precedence over the `.env` file.

---

## Verifying your full config in one shot

```bash
17 changes: 12 additions & 5 deletions scripts/seed_meaningful_demos.py
@@ -3161,17 +3161,24 @@ def _is_exempt(host: str) -> bool:
     missing = sorted(h for h in engine_hosts if not _is_exempt(h))
     if not missing:
         return None
+    service_names = "postgres,redis,elasticsearch,opensearch,solr,api,worker,migrate"
     return (
         "\nLikely cause — corporate proxy: http_proxy is set but no_proxy does "
         f"NOT exempt the engine host(s) {', '.join(missing)}, so the in-network "
         "engine probes are routed to the proxy (which can't reach Compose "
         "service names) and Healthy engines read as unreachable. Fix: add the "
-        "Compose service names to no_proxy in .env "
-        "(postgres,redis,elasticsearch,opensearch,solr,api,worker,migrate,"
-        "host.docker.internal), recreate the containers "
+        f"Compose service names to no_proxy ({service_names},host.docker.internal), "
+        "recreate the containers "
         "(`docker compose up -d --force-recreate api worker`), then re-run "
-        "`make seed-demo FORCE=1`. See "
-        "docs/03_runbooks/corporate-network-install.md §4."
+        "`make seed-demo FORCE=1`.\n"
+        "NOTE: if you ALREADY added these to .env, a shell-exported no_proxy is "
+        "overriding it — Docker Compose reads ${no_proxy} (lowercase) from the "
+        "shell BEFORE .env. Set it in your shell instead, e.g. "
+        f'`export no_proxy="$no_proxy,{service_names},host.docker.internal"`, '
+        "then recreate and verify with "
+        "`docker compose exec api env | grep -i no_proxy`. See "
+        "docs/03_runbooks/corporate-network-install.md §7 (shell overrides .env) "
+        "and §4."
     )

