diff --git a/docs/design-decisions.md b/docs/design-decisions.md
index 2aa225a..a97091f 100644
--- a/docs/design-decisions.md
+++ b/docs/design-decisions.md
@@ -1085,6 +1085,40 @@
 **Related:** D-035 (the mgmt VM), D-039 (per-cluster app-creds), phase-06 (SG creation), phase-07 Step 7.1 (DOCFIX-063 verify-first).
+### D-063 -- AMENDMENT (2026-07-03): sources MEASURED; option (a) adopted as (a-amended)
+
+Measurement (jumphost session 2026-07-03; committing this amendment constitutes adoption):
+1. Route: magnum/0 -> FIP 10.12.7.222 is ON-LINK via its provider leg (src 10.12.4.154) -- the
+   FIP lives inside 10.12.4.0/22, so no router or SNAT sits on that path.
+2. 170s SYN capture on the mgmt VM (dst 6443, pod-net 10.1.0.0/16 and loopback excluded),
+   with a deliberate operator kubectl trigger mid-window. Observed sources:
+   3x 10.12.4.154 -- magnum conductor health poll, ~60s cycle (route prediction CONFIRMED)
+   2x 10.12.4.1 -- the OPERATOR kubectl trigger, MASQUERADED by the upstream router to the
+       provider gateway address (the jumphost's own 10.17.11.246 is erased)
+   2x 10.20.0.207 -- the VM hairpinning its own FIP (the packet returns through the FIP DNAT
+       carrying the VM's fixed IP as source); SG rules DO apply to hairpinned traffic
+   Falsified along the way: "the conductor holds a persistent apiserver connection" -- it is
+   transient, one connection per poll cycle, so instantaneous `ss` snapshots cannot capture the
+   client set. Any future SG sourcing uses the window-capture method (tcpdump SYN filter, >=2 cycles).
+
+ADOPTED RULE SET (a-amended), 6443 ingress on capi-mgmt-sg:
+ - magnum unit provider addresses, /32 each, derived DYNAMICALLY from Juju at apply time
+   (PATTERN-1; per-UNIT -- Roosevelt's 3-unit magnum contributes 3 rules; enumerate, never
+   assume one). Value: blocks tenant-workload sources (tenant egress arrives as each tenant
+   router's own 10.12.4.x SNAT address, which is NOT in the set).
+ - 10.12.4.1/32 -- the operator path, EXPLICITLY DOCUMENTED AS COARSE: upstream masquerade
+   makes this equivalent to "any source routed in via the gateway". The SG cannot
+   distinguish the jumphost; do not pretend otherwise in docs or reviews.
+ - 10.20.0.0/24 (capi-mgmt-subnet) -- self/hairpin traffic; omitting it silently breaks on-VM
+   use of the external endpoint.
+SSH (22) ingress: 10.12.4.1/32 only (same coarseness caveat).
+NOT adopted: 0.0.0.0/0 stays out; option (b) (LB in front) remains the Roosevelt-scale
+candidate if magnum unit churn makes /32 maintenance noisy.
+Roosevelt note: the 10.12.4.1 masquerade is a VR0 substrate artifact. The rule set MUST be
+re-derived from a fresh window capture in every environment -- never copied forward.
+Apply is a SEPARATE gated mutation with its own verify-before/after block; this amendment
+records the decision, not the change. **Status: ADOPTED (a-amended) on commit; apply pending.**
+
 ---
diff --git a/docs/v1-redeploy-changelog.md b/docs/v1-redeploy-changelog.md
index b05e535..becccec 100644
--- a/docs/v1-redeploy-changelog.md
+++ b/docs/v1-redeploy-changelog.md
@@ -1506,3 +1506,32 @@
 two consecutive mis-anchored source assertions (first-line-with-curl; regex
 that could not cross || at line end) were caught by running the harness before delivery.
 Parameter expansion beats regex for line-splitting assertions.
+
+
+### 2026-07-03 (addendum 4) -- 8d verified; V-1 ruled; D-063 measured + adopted (a-amended)
+
+GATE 8d (acme offboard post-apply verification): ALL FOUR PASS -- domain gone, trust 8592ee35
+gone, NO orphan per-cluster trustee in the magnum domain (magnum's cluster delete cleaned up the
+trustee; the offboard-v2 orphan gap stays theoretical), beta untouched at
+CREATE_COMPLETE/HEALTHY. This completes the acme offboard record opened in addendum 3.
+
+V-1 RULING (octavia controller cert SAN, handoff section 2): RISK-ACCEPTED for v1.
+Deployed SAN reads DNS:octavia-controller.omega.dc0.vr0.cloud.neumatrix.local,
+DNS:octavia.omega.dc0.vr0.cloud.neumatrix.local, IP:10.12.4.233 (stale literal from the prior
+carve). Octavia is demonstrably green (beta kubeapi LB + lbtest service LB built to
+ACTIVE/ONLINE this session). Mechanism note for the record: controller_cert.pem is presented
+as a TLS CLIENT credential to the amphora agent (:9443); client-cert verification performs no
+hostname/IP SAN matching, so the stale IP-SAN never enters a verification path in the direction
+the cert is actually used. Regenerate at next redeploy; NO hot-rotation. Local generation copies
+exist under ~/octavia-pki/ (2026-05-29) as corroborating evidence.
+
+D-063: measured (route + 170s SYN window with operator trigger) and ADOPTED as (a-amended) --
+full measurement record and rule set in the D-063 amendment (design-decisions.md). Key
+finding: upstream masquerade collapses ALL routed-in sources (incl. the jumphost) to
+10.12.4.1, so the operator rule is documented-coarse; magnum /32s + capi-mgmt-subnet hairpin
+complete the set. SG APPLY IS PENDING as its own gated mutation. Probe-method lesson recorded:
+transient clients need window capture, not ss snapshots; and one-liner probes now get fixture
+tests, as scripts do (three parse bugs in one measurement chain this session -- ss column drop
+under state filters, cut on mapped-v6 colons, regex vs || line tails -- all caught, none live).
+
+Next-free: D-071, DOCFIX-084, BUNDLEFIX-009 (unchanged; the D-063 amendment reuses its number).
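The probe-method lesson (window capture over at least two poll cycles, plus fixture tests for the parsing step) can be sketched in Python. This is a minimal sketch, not the tooling used in the session. It assumes `tcpdump -nn` one-line TCP output, where address and port are joined by a dot, so the parser splits on the LAST dot instead of cutting on `:`. The function names (`split_endpoint`, `syn_sources`, `build_rules`, `uncovered`) are hypothetical, and the magnum unit addresses are passed in as a list rather than queried from Juju (that lookup is not shown).

```python
import ipaddress
import re
from collections import Counter
from typing import Iterable, List, Tuple, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]

# ASSUMPTION: `tcpdump -nn` one-line output, e.g.
#   12:00:01.000001 IP 10.12.4.154.40000 > 10.20.0.207.6443: Flags [S], seq 1, ...
# Only bare SYNs ("Flags [S],") match; SYN-ACKs ("Flags [S.],") are skipped.
SYN_LINE = re.compile(r"\bIP6? (\S+) > (\S+): Flags \[S\],")


def split_endpoint(endpoint: str) -> Tuple[IPAddress, int]:
    """Split tcpdump's 'addr.port' on the LAST dot; never cut on ':' (v6 has colons)."""
    addr, port = endpoint.rsplit(".", 1)
    ip = ipaddress.ip_address(addr)
    # Defensive: fold IPv4-mapped IPv6 (::ffff:a.b.c.d) to plain IPv4 so v4 rules match.
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return ip, int(port)


def syn_sources(lines: Iterable[str], dst_port: int) -> Counter:
    """Count SYN source addresses aimed at dst_port across a whole capture window."""
    counts: Counter = Counter()
    for line in lines:
        m = SYN_LINE.search(line)
        if m is None:
            continue  # not a bare SYN: SYN-ACK, other flags, or non-TCP noise
        src, _sport = split_endpoint(m.group(1))
        _dst, dport = split_endpoint(m.group(2))
        if dport == dst_port:
            counts[src] += 1
    return counts


def build_rules(magnum_unit_addrs: Iterable[str], operator_src: str, mgmt_subnet: str) -> List[str]:
    """Hypothetical assembly of the (a-amended) set: one /32 per magnum unit,
    the operator /32, and the mgmt subnet for hairpin traffic."""
    units = list(dict.fromkeys(str(ipaddress.ip_address(a)) for a in magnum_unit_addrs))
    if not units:
        raise ValueError("no magnum unit addresses: enumerate every unit, never assume one")
    rules = [f"{u}/32" for u in units]
    rules.append(f"{ipaddress.ip_address(operator_src)}/32")
    rules.append(str(ipaddress.ip_network(mgmt_subnet)))
    return rules


def uncovered(sources: Iterable[IPAddress], rules: Iterable[str]) -> List[str]:
    """Observed sources that no rule admits, i.e. traffic the SG would drop."""
    nets = [ipaddress.ip_network(r) for r in rules]
    return sorted(str(s) for s in sources if not any(s in n for n in nets))
```

To use it, feed `syn_sources` the lines of a capture spanning at least two conductor poll cycles, then check that `uncovered(...)` lists only sources the SG is meant to drop (for example tenant-router SNAT addresses). Any addresses in a test fixture are illustrative; real inputs must come from a fresh capture in each environment.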