diff --git a/bundle.yaml b/bundle.yaml
index cee28e1..c1ea33a 100644
--- a/bundle.yaml
+++ b/bundle.yaml
@@ -118,11 +118,12 @@
 
   vault:
     charm: vault
-    # D-068 (2026-07-02): pinned 1.8/stable -> 1.16/stable. 1.8.8 is EOL. NB: on the LIVE env this
-    # is a MAJOR upgrade (multi-minor jump; requires unseal keys ready, storage compat check, and
-    # re-unseal after restart) -- NOT a casual `juju refresh`. Verify channel exists + rehearse the
-    # upgrade before applying live. Clean pin for Roosevelt/redeploy. See D-068.
-    channel: 1.16/stable
+    # D-068 AMENDMENT (2026-07-05): reverted 1.16/stable -> 1.8/stable. The 1.16 line is a
+    # fundamentally different charm (certs interface V0->V1, Raft-only storage, no upgrade
+    # path, broken with Charmed Ceph) -- INCOMPATIBLE with this reactive cloud; a deployable
+    # bundle MUST use 1.8/stable. 1.8.8 is EOL (tech debt); getting off it is an OPEN D-068
+    # problem with 1.16 ruled out. See docs/D-068-vault-1.8-vs-1.16-analysis.md. BUNDLEFIX-010.
+    channel: 1.8/stable
     num_units: 1 # 3 on Roosevelt (D-009); HA backend decided there (C1)
     to: [lxd:11]
     bindings:
diff --git a/docs/design-decisions.md b/docs/design-decisions.md
index 590a375..85cb97e 100644
--- a/docs/design-decisions.md
+++ b/docs/design-decisions.md
@@ -1644,3 +1644,30 @@
 **Related:** D-002 (channel pins), D-068 (Vault channel move, the non-routine counter-example),
 D-070 (restore posture), DOCFIX-086 (ops-update-procedure runbook), appendix-B (revision
 re-baseline policy).
+
+## D-068 -- AMENDMENT (2026-07-05): vault 1.16 forward-pin is NOT viable; revert bundle to 1.8/stable
+
+Item 1 ("Vault version") pinned the bundle to `1.16/stable` as a forward-pin off EOL Vault 1.8.8,
+treating the move as "a MAJOR operation to rehearse."
Deep-dive research (see
+`docs/D-068-vault-1.8-vs-1.16-analysis.md`) shows it is not a hard *upgrade* but a largely-
+**incompatible** different charm for this reactive Charmed OpenStack cloud:
+
+- certificates interface changed **V0 -> V1** -- the 18 `vault:certificates` client charms are V0
+  requirers and cannot bind a V1-only provider (the entire cloud TLS/CA layer fails to connect);
+- storage went **Raft-only**, dropping the MySQL backend this deployment uses (`vault-mysql-router`);
+- **no upgrade path** (backup-on-1.8 / restore-on-new-1.15, a CA-replacement migration);
+- Canonical community notes report **new-vault + Charmed Ceph broken** (this cloud uses Ceph);
+- Vault 1.15+ payload is **BUSL 1.1** (was MPL) -- commercial-licensing review item.
+
+**Amendment:** (a) a deployable bundle MUST pin vault `1.8/stable` (the reactive, integration-
+compatible charm) -- revert the `1.16/stable` bundle pin; (b) `1.16` is **ruled out** as the vault-
+modernization path for this reactive cloud unless/until the OpenStack service charms gain
+tls-certificates V1 support; (c) "get off EOL 1.8" stays **OPEN** as a Roosevelt-durability problem
+(candidates: wait for OpenStack V1 support; OpenBao/MPL fork; or explicit EOL risk-acceptance for VR0
+with a production remediation deadline -- operator/security call, not resolved here).
+
+This also corrects the D-002 amendment note (item 3): the vault row reverts to `1.8/stable`; the
+`1.16/stable` supersession is withdrawn. Bundle change coordinated with Code / applied at reconciliation.
+
+**Related:** D-068 (this amends item 1), D-002 amendment (vault row), D-067/BUNDLEFIX-007 (vault-kv),
+D-069 (unseal). **Evidence:** `docs/D-068-vault-1.8-vs-1.16-analysis.md`.
diff --git a/docs/session-ledger.md b/docs/session-ledger.md
index 1388a88..91db2af 100644
--- a/docs/session-ledger.md
+++ b/docs/session-ledger.md
@@ -28,7 +28,8 @@ _(D-063 is CLOSED as of 2026-07-03.)_
 - **OPEN security rows:** SEC-001 (libvirt cred rotate), SEC-003 (Vault unseal
   custody / second-person rehearsal), SEC-004 (repo public -> private at v1
   close).
-- **Next-free numbers:** D = 072, DOCFIX = 088, BUNDLEFIX = 010.
+- **Next-free numbers:** D = 072, DOCFIX = 088, BUNDLEFIX = 011. (BUNDLEFIX-010
+  consumed by the jumphost stream, addendum 15: vault bundle revert.)
   The 2026-07-03 D-071 contention is RESOLVED: the jumphost stream filed D-071
   + DOCFIX-086 (ops-update-procedure) in changelog addendum 13 (2026-07-04); main
   stream numbering resumes at the values above. The wrapped-pointer scan artifact
@@ -123,8 +124,17 @@
   124->152, octavia-diskimage-retrofit 196->232, cinder 733->820, cinder-ceph
   533->568, barbican 209->265, barbican-vault 75->99. All probes OK; boundary
   cloud-assert PASS.
-- **NEXT:** G2 octavia 441->542 (gated), G3 dashboards, G4 nova-compute; then
-  post-verify (S5), re-baseline + close-out (S6).
+- **DONE Section 4 G2:** octavia 441->542; LBs 2/2 ACTIVE/ONLINE; cloud-assert PASS.
+- **DONE Section 4 G3:** dashboards 728->750 / 59->122 / 120->168; D-044 override
+  INTACT; HTTP+login healthy. RCA: VIP HTTPS dead SINCE DEPLOY (haproxy 443 backend
+  targets vhost-less internal addr; L4 check masks; phase-03 3.3 check fails open) --
+  NOT a G3 regression; G3 stands (operator ruling; addendum 15). Upstream bug +
+  2 DOCFIX candidates queued post-window.
+- **DONE (coordinated):** vault bundle revert 1.16->1.8/stable (BUNDLEFIX-010) +
+  D-068 AMENDMENT recorded; 1.16 ruled out (certs V0/V1, Raft-only, Ceph, BUSL);
+  "off EOL 1.8" remains OPEN. Evidence: docs/D-068-vault-1.8-vs-1.16-analysis.md.
+- **NEXT:** G4 nova-compute 827->894 (approved); then post-verify (S5),
+  re-baseline + close-out (S6).
 - **Window findings logged for close-out (do not action mid-window):**
   1. magnum can-upgrade-to anomaly REPRODUCED (points at magnum-dashboard-122;
      evidence ~/openstack-baseline/magnum-can-upgrade-anomaly-20260705.json) AND
diff --git a/docs/v1-redeploy-changelog.md b/docs/v1-redeploy-changelog.md
index b721c19..e46c97f 100644
--- a/docs/v1-redeploy-changelog.md
+++ b/docs/v1-redeploy-changelog.md
@@ -1874,3 +1874,49 @@
 REVERT: git checkout HEAD~ -- scripts/cloud-assert.sh tests/cloud-assert/run-tests.sh
 
 Next-free after this push (per scan, keep this line unwrapped): D-072, DOCFIX-088, BUNDLEFIX-010.
+
+### 2026-07-05 (addendum 15, jumphost stream) -- G2/G3 complete + dashboard-TLS RCA + vault bundle revert (BUNDLEFIX-010)
+
+Identifier consumed: BUNDLEFIX-010 (vault bundle revert; scan-verified next-free).
+D-068 AMENDMENT recorded (operator ruling; no new D number -- amends D-068 item 1).
+
+WINDOW PROGRESS (ops-update-20260705): G2 octavia 441->542 (LBs 2/2 ACTIVE/ONLINE,
+cloud-assert PASS incl. A6). G3 dashboards openstack-dashboard 728->750,
+magnum-dashboard 59->122, octavia-dashboard 120->168 (settled; HTTP + D-044 login
+path healthy; D-044 override INTACT at its canonical /usr/share path). G3 stands --
+NO revert -- per the RCA below and operator ruling.
+
+RCA FINDING (dashboard VIP HTTPS dead SINCE DEPLOY -- not a G3 regression): charm
+diff 728 vs 750 contains NO vhost/template/binding change; the charmhelpers
+LP-1952414 trail log shows 2827 identical renders since 2026-06-30 -- the
+internal-plane SSL vhost NEVER existed. Chain: dashboard charm (no os-*-network
+options, public-only extra-binding) renders SSL vhosts only for 10.12.4.157/.8.182
+while charm-rendered haproxy targets its 443 backend at the cluster-binding address
+10.12.12.112:433; apache answers that socket as PLAINTEXT main-server (proven:
+plain HTTP 200 on :433) and haproxy's L4-only check stays green -- TLS clients die
+in handshake (curl 000) on every dashboard VIP.
Deploy gate phase-03 3.3 never
+caught it because its https CHECK FAILS OPEN (empty curl capture -> "OK: csrftoken
+not Secure" printed on transport failure).
+
+Logged, not actioned (candidates for post-window work): (1) DOCFIX: phase-03 3.3
+check fail-closed rewrite; (2) DOCFIX: ops-update-procedure G3 probe corrected to
+the D-044 http-leg + login (https-VIP assumption wrong; also fold the create-backup
+and format=line findings from earlier in the window); (3) upstream bug to OpenStack
+Charmers (haproxy TLS backend on a vhost-less address; L4 check masks it) --
+evidence package: this entry + unit forensics; (4) long-term fix ruling: rebind
+(BUNDLEFIX-class) vs Roosevelt edge-TLS root-fix (D-044 trajectory).
+
+VAULT BUNDLE REVERT (BUNDLEFIX-010, coordinated with the main stream / Claude
+chat research): bundle.yaml vault channel 1.16/stable -> 1.8/stable + comment
+replaced with the D-068-amendment rationale. docs/design-decisions.md gains
+## D-068 -- AMENDMENT (2026-07-05) (operator text verbatim): 1.16 is an
+INCOMPATIBLE different charm for this reactive cloud (certs V0->V1 vs 18 V0
+requirers, Raft-only storage vs MySQL backend, no upgrade path, Ceph breakage
+reports, BUSL licensing review); deployable bundles MUST pin 1.8/stable; "get off
+EOL 1.8" stays OPEN with 1.16 ruled out; the D-002-amendment item-3 supersession
+note is withdrawn (vault row = 1.8/stable). Evidence:
+docs/D-068-vault-1.8-vs-1.16-analysis.md (committed by operator). Gates: bundle
+invariants PASS, channel assert PASS (29 pins), repo-lint 0 fail.
+REVERT: git checkout HEAD~ -- bundle.yaml docs/design-decisions.md
+
+Next-free after this push (per scan, keep this line unwrapped): D-072, DOCFIX-088, BUNDLEFIX-011.