diff --git a/docs/design-decisions.md b/docs/design-decisions.md index 691bf75..f5d2a02 100644 --- a/docs/design-decisions.md +++ b/docs/design-decisions.md @@ -1053,3 +1053,34 @@ (matching its own `machines:` 8-11 declaration; the live 0-3 numbering was a deploy-time artifact). **Related:** D-009 (3 units on Roosevelt), BUNDLEFIX (bundle reverted to 3-unit). + +--- + +## D-063: PROPOSED / OPEN -- tighten the capi-mgmt-cluster security group ingress for Roosevelt + +**Status:** PROPOSED / OPEN (recorded 2026-07-01, phase-07 conductor graft). No action taken on v1. + +**Context:** the phase-06 in-cloud CAPI management VM (`capi-mgmt-v2`, D-035) sits behind the +`capi-mgmt-sg` security group. As built by phase-06 (a CAPI/cluster-api default posture), that SG +opens BOTH `tcp/6443` (apiserver) AND `tcp/22` (ssh) to `0.0.0.0/0`. Phase-07 Step 7.1 relies on +this: the conductor reaches the apiserver via the FIP with no per-conductor rule (which is why the +old hardcoded `10.12.4.76/32` rule-add was dropped in DOCFIX-063). + +**Question:** on v1 (single-DC virtual rehearsal) the FIP is the sole access point and `0.0.0.0/0` +on 6443/22 is acceptable. For ROOSEVELT (commercial, multi-DC, HARD tenant isolation) an apiserver +and ssh open to any source that can route to the FIP is too broad. + +**Options (unresolved):** +- (a) Tighten `tcp/6443` ingress to the magnum-conductor's measured source (and any operator + tooling), and `tcp/22` to the ops/jumphost source only. Requires knowing the conductor's + post-NAT source as the mgmt VM sees it (measured, not the tenant/provider literal). +- (b) Front the mgmt apiserver with a scoped ingress (e.g. a dedicated LB/SG chain) rather than a + raw FIP with a wide SG. +- (c) Leave wide on the rehearsal, and make the tightened SG a phase-06 build step on Roosevelt. + +No decision made -- recorded as an open hardening item (cf. D-043, D-050, also pending). 
Whichever +option, phase-07 Step 7.1 stays verify-first (it does not depend on the SG being wide; if 6443 is +NOT already permitted it measures the source and adds exactly that rule). + +**Related:** D-035 (the mgmt VM), D-039 (per-cluster app-creds), phase-06 (SG creation), +phase-07 Step 7.1 (DOCFIX-063 verify-first). diff --git a/docs/v1-redeploy-changelog.md b/docs/v1-redeploy-changelog.md index 3543d10..e6cbce6 100644 --- a/docs/v1-redeploy-changelog.md +++ b/docs/v1-redeploy-changelog.md @@ -1223,8 +1223,62 @@ were dropped and a no-dynamic-discovery fidelity assertion added to both affected suites. capi-stack.sh was already faithful (pins from dependencies.json; no discovery). All three suites re-pass. +### 2026-07-01 -- Phase-07 conductor graft EXECUTED (Pattern-A rebuild); DOCFIX-063 + D-063 +Grafted the magnum-capi-helm CAPI driver onto the charm-managed conductor `magnum/0` (now at +10.12.12.107 on metal-internal, D-052) via gated copy-paste, paste-back confirmed each step: +- 7.0 domain-setup (D-046) PASS: domain `magnum` (d9d0a4a8...) + user `magnum_domain_admin` + (0885dca3...); `coe service list` = 1 row `up`, no 403. +- 7.1 SG authorize: NO-OP -- the phase-06 capi-mgmt-sg already opens tcp/6443 to 0.0.0.0/0; + conductor->FIP:6443 = TCP-OK with no new rule (the hardcoded 10.12.4.76/32 rule was stale AND + unnecessary -> DOCFIX-063 verify-first). +- 7.2 kubeconfig -> conductor PASS: /etc/magnum/kubeconfig 0600 magnum:magnum, sha256 match both + sides (26ed1091...6c11); helm auth-proof (run post-7.4) = 6 mgmt releases deployed, versions + match D-034 dependencies.json. +- 7.3 served versions PASS: `kubectl api-versions` shows v1beta1 SERVED for all core CAPI groups + (alongside preferred v1beta2); empty `api_resources={}` correct -> D-042 premise confirmed. +- 7.4 driver + helm PASS: helm v3.17.3 on the restricted init PATH (/usr/bin/helm, DOCFIX-035); + magnum-capi-helm 1.4.0; entry point k8s_capi_helm_v1. 
+- 7.6 [capi_helm] drop-in PASS (after dir-create); 7.7 conductor --config-dir PASS; 7.7b + keystone-v3 drop-in (D-047) PASS (v3 URLs derived from live config: public 10.12.4.50:5000/v3, + admin 10.12.8.50:35357/v3). +- 7.8 restart PASS: magnum-conductor + magnum-api both active, both live cmdlines carry + --config-dir; `magnum-driver-manage list-drivers` lists k8s_capi_helm_v1 (enabled). +- HEALTHY poll (7.8 tail) + 7.9 regression DEFERRED to phase-08 per fresh-deploy routing (no + cluster / capi-k8s-v1-34 template exists yet). + +DOCFIX-063 (phase-07 runbook reconciliation, six fixes folded from the as-run): +(1) Step 7.1 -> verify-first (drop the stale/unnecessary hardcoded 10.12.4.76/32 SG rule-add; + reachability check is the gate; a measured source rule is a fallback only); +(2) Step 7.2 helm auth-proof MOVED after Step 7.4 (helm is installed there; absent on a fresh + conductor -- integrity sha256 + 7.1 TCP already gate 7.2 without it); +(3) Step 7.3 probe -> `kubectl api-versions | grep cluster.x-k8s.io/` (api-resources shows only + the PREFERRED version, a FALSE "v1beta1 not served" when core groups prefer v1beta2); +(4) Step 7.6 -> `install -d /etc/magnum/magnum.conf.d` before the tee (the .conf.d dir is absent + on a fresh rebuild; the deb ships magnum.conf only; also the 7.7 --config-dir target); +(5) Step 7.5/7.6 ASCII checks -> `sudo` the grep (a non-sudo read of the root-owned /etc/magnum + path gave a FALSE "ASCII clean" on Permission-denied); +(6) Step 7.4 helm egress pre-check -> hit a real asset (bare get.helm.sh/ 404s misleadingly). +As-built refreshed: conductor magnum/0 10.12.4.76 -> 10.12.12.107; mgmt FIP example +10.12.5.103 -> 10.12.7.222 (per-rebuild; commands stay dynamic-from-env). + +D-063 (PROPOSED/OPEN): the phase-06 capi-mgmt-sg opens tcp/6443 AND tcp/22 to 0.0.0.0/0 +(CAPI default; fine for a single-DC rehearsal where the FIP is the access point) -- tighten for +Roosevelt (commercial hard-isolation). Recorded, no action on v1. 
+ +NEW (phase-07 now matches the per-phase encapsulation pattern): +- scripts/phase-07-conductor-graft.sh -- encapsulates 7.0-7.8 with DOCFIX-063 baked in + (verify-first 7.1; auth-proof after helm install; api-versions probe 7.3; install -d before + the tee; sudo ASCII; real-asset helm pre-check). [SENSITIVE] base64-pipes the FIP kubeconfig + to a 0600 magnum-owned file with a sha256 both-sides gate; every step is fail-loud (exit 1 + gate / exit 2 precondition). All as-run values are env-tunable but DEFAULT to the measured + values (MODEL, CONDUCTOR unit, DRIVER_VERSION 1.4.0, HELM_VERSION v3.17.3, CHART_VERSION + 0.25.1, ENVFILE ~/capi-mgmt-net.env). Health poll + regression are phase-08 (not in-script). +- tests/phase-07-conductor-graft/ -- fakebin-stubbed suite (juju/openstack/kubectl/base64/sha256sum) + asserting: 7.1 verify-first is a no-op when 6443 is 0.0.0.0/0; the v3-URL derivation (unversioned + and /v2.0 inputs -> /v3 in both sections); install -d precedes the tee; api-versions probe; the + sha256-mismatch gate fails loud; k8s_capi_helm_v1 enabled gate; precondition failures. ALL PASS. + ### Next-free numbers -Design decision: D-063. Doc fix: DOCFIX-063. (DOCFIX-062 phase-06 kubeconfig-server-rewrite ASSIGNED above; -DOCFIX-061 phase-05 as-built, DOCFIX-060 phase-04 md drift, DOCFIX-059 internal-cert SAN gate recorded -earlier; D-061 teardown, D-062 mysql. D-063 still unused -- the phase-06 sweep produced doc/script fixes, -no new design decision.) +Design decision: D-064. Doc fix: DOCFIX-064. (D-063 ASSIGNED above = capi-mgmt-sg 0.0.0.0/0 +hardening, PROPOSED/OPEN. DOCFIX-063 ASSIGNED above = phase-07 runbook reconciliation, six fixes. +Prior: DOCFIX-062 phase-06 kubeconfig-server-rewrite; D-061 teardown, D-062 mysql.) 
diff --git a/runbooks/phase-07-conductor-graft.md b/runbooks/phase-07-conductor-graft.md index 124c452..b0ef98e 100644 --- a/runbooks/phase-07-conductor-graft.md +++ b/runbooks/phase-07-conductor-graft.md @@ -15,6 +15,18 @@ D-047 (keystone v3 drop-in for magnum-api -- Step 7.7b). Troubleshooting: appendix-A DOCFIX-021, D-037, D-042, and lessons L-P6-1..4. +DOCFIX-063 (2026-07-01 as-run reconciliation, fresh Pattern-A rebuild): Step 7.1 rewritten +verify-first (the phase-06 capi-mgmt-sg already opens 6443, so the hardcoded per-conductor +rule-add is dropped to a measured fallback); Step 7.2 helm auth-proof moved AFTER 7.4 (helm is +installed there, absent on a fresh conductor); Step 7.3 probe switched to `kubectl api-versions` +(api-resources shows only the PREFERRED version, giving a false "v1beta1 not served" when the +core groups prefer v1beta2); Step 7.6 now creates /etc/magnum/magnum.conf.d before the tee +(absent on a fresh deploy); the conf.d ASCII checks now `sudo` the grep (a non-sudo read of the +root-owned path gave a false "ASCII clean"); the Step 7.4 helm egress pre-check points at a real +asset. As-built refreshed: conductor magnum/0 10.12.4.76 -> 10.12.12.107 (metal-internal, D-052). +D-063 (open): the phase-06 capi-mgmt-sg opens 6443+22 to 0.0.0.0/0 -- fine for a single-DC +rehearsal, tighten for Roosevelt. + --- ## Prerequisites (must be true entering phase-07) @@ -31,9 +43,9 @@ - `admin-openrc` on the jumphost; `juju` (model openstack); `jq`. 
## Constants and env-literals (TAG: confirm per site on rebuild) -- `ENV(conductor-unit)` magnum/0 (LXD 1/lxd/2 on openstack1; addr 10.12.4.76 -- confirm per site) -- `ENV(conductor-src)` 10.12.4.76/32 (the conductor's provider IP; SG source -- confirm per site) -- `ENV(mgmt-fip)` per-rebuild (mgmt apiserver / kubeconfig server; source ~/capi-mgmt-net.env from phase-06 -- this rebuild 10.12.5.103; the old 10.12.7.40 is dead -- DOCFIX-038) +- `ENV(conductor-unit)` magnum/0 (LXD 1/lxd/2 on openstack1; addr 10.12.12.107 on metal-internal per D-052 -- confirm per site; was 10.12.4.76 pre-D-052) +- `ENV(conductor-src)` n/a (DOCFIX-063: verify-first 7.1 no longer adds a per-conductor SG rule -- the phase-06 capi-mgmt-sg already opens 6443; a source rule is a FALLBACK only, measured not hardcoded) +- `ENV(mgmt-fip)` per-rebuild (mgmt apiserver / kubeconfig server; source ~/capi-mgmt-net.env from phase-06 -- this rebuild 10.12.7.222; per-rebuild, DOCFIX-038) - `ENV(mgmt-sg)` capi-mgmt-sg (in the capi-mgmt project) - `ENV(project)` capi-mgmt (resolve by name; this rebuild id d5bc125c7c1841d389b76cd0a7b0a915, domain capi) - `ENV(magnum-ns)` magnum- (driver namespace per project; this rebuild magnum-d5bc125c7c1841d389b76cd0a7b0a915) @@ -98,39 +110,48 @@ Step A) or the magnum.conf `[trust]` names differ from the created domain/user. (Benign "No domain/user exists" idempotency lines may appear in the action output.) -## Step 7.1 -- Authorize the conductor source on the mgmt-cluster SG -(scoped to the capi-mgmt project). Idempotent. +## Step 7.1 -- Authorize the conductor source on the mgmt-cluster SG (VERIFY-FIRST; DOCFIX-063) +(scoped to the capi-mgmt project). DOCFIX-063: do NOT hardcode a per-conductor source rule. +The phase-06 capi-mgmt-sg already opens `tcp/6443` to `0.0.0.0/0` (the FIP is the access +point), so the conductor reaches the apiserver with NO new rule. 
Inspect the SG + prove +reachability FIRST; add a rule ONLY if 6443 is not already permitted, and then with the source +the mgmt VM actually SEES (measured, never the pre-D-052 provider literal 10.12.4.76). -**RUN -- jumphost** +**CHECK (read-only) -- jumphost** ```bash ( { set -u - # scope openstack CLI to the capi-mgmt project (id form -- robust to name/domain) source ~/admin-openrc - # resolve the capi-mgmt project id while still admin-scoped, THEN narrow scope to it (id form) CAPI_PID=$(openstack project show capi-mgmt --domain capi -f value -c id) # ENV(project); resolve, never hardcode unset OS_PROJECT_NAME OS_PROJECT_ID OS_TENANT_NAME OS_TENANT_ID export OS_PROJECT_ID="$CAPI_PID" SG=$(openstack security group show capi-mgmt-sg -f value -c id) # ENV(mgmt-sg) echo "SG=$SG" - echo "=== add ingress tcp/6443 from the conductor 10.12.4.76/32 (if absent) ===" - openstack security group rule list "$SG" -f value -c "IP Range" -c "Port Range" \ - | grep -q '10.12.4.76/32 6443:6443' \ - || openstack security group rule create --proto tcp --dst-port 6443 \ - --remote-ip 10.12.4.76/32 "$SG" - openstack security group rule list "$SG" -f value -c Protocol -c "Port Range" -c "IP Range" + echo "=== current ingress rules (JSON -- avoids the -c column-swap trap) ===" + openstack security group rule list "$SG" -f json } ) ``` -Then prove conductor -> mgmt apiserver reachability: +Then prove conductor -> mgmt apiserver reachability (FIP from phase-06's env, never hardcoded): **CHECK (read-only) -- jumphost -> magnum/0** ```bash -# RUN: jumphost -> magnum/0 (FIP from phase-06's ~/capi-mgmt-net.env -- never hardcode; DOCFIX-038) -source ~/capi-mgmt-net.env # MGMT_FIP +source ~/capi-mgmt-net.env # MGMT_FIP (DOCFIX-038: per-rebuild) juju ssh -m openstack magnum/0 \ "timeout 6 bash -c 'exec 3<>/dev/tcp/$MGMT_FIP/6443' && echo TCP-OK || echo TCP-FAIL" /32 is the source the mgmt VM sees; do NOT guess it. 
+( { source ~/admin-openrc + CAPI_PID=$(openstack project show capi-mgmt --domain capi -f value -c id) + unset OS_PROJECT_NAME OS_PROJECT_ID OS_TENANT_NAME OS_TENANT_ID; export OS_PROJECT_ID="$CAPI_PID" + SG=$(openstack security group show capi-mgmt-sg -f value -c id) + openstack security group rule create --proto tcp --dst-port 6443 --remote-ip /32 "$SG" +} ) +``` ## Step 7.2 -- Place the mgmt kubeconfig on the conductor [SENSITIVE; not batched] The source `~/capi-mgmt.kubeconfig` already has its @@ -159,12 +180,15 @@ ``` **GATE:** the two sha256 hashes are identical (an empty or truncated transfer fails here, not three steps later as a confusing conductor auth error). -End-to-end proof (the conductor user authenticates to the mgmt cluster via the FIP): +End-to-end proof (the conductor user authenticates to the mgmt cluster via the FIP) -- +DOCFIX-063: helm is installed in Step 7.4, so on a fresh conductor it is ABSENT here. RUN THIS +CHECK AFTER STEP 7.4 (integrity above + the 7.1 TCP reachability already gate 7.2 without it): -**CHECK (read-only) -- jumphost -> magnum/0** +**CHECK (read-only; run AFTER Step 7.4) -- jumphost -> magnum/0** ```bash juju ssh -m openstack magnum/0 \ - 'sudo -u magnum env HOME=/tmp helm --kubeconfig /etc/magnum/kubeconfig list -A' /dev/null 2>&1 || { echo "helm MISSING -- run this after Step 7.4"; exit 0; }; \ + sudo -u magnum env HOME=/tmp helm --kubeconfig /etc/magnum/kubeconfig list -A' /dev/null | awk 'NR==1 || /v1beta1/' - done + kubectl api-versions | grep -E 'cluster\.x-k8s\.io/' | sort } ) -# Expect v1beta1 for: cluster.x-k8s.io (Cluster/MachineDeployment/Machine), -# controlplane.cluster.x-k8s.io (KubeadmControlPlane), infrastructure.cluster.x-k8s.io -# (OpenStackCluster -- verified anchor). If a CORE kind serves ONLY v1beta2, override -# just that kind via api_resources in Step 7.6; otherwise the defaults work as-is. 
+# Expect v1beta1 SERVED for the core groups (alongside v1beta2 as preferred): +# cluster.x-k8s.io/v1beta1, controlplane.cluster.x-k8s.io/v1beta1, +# bootstrap.cluster.x-k8s.io/v1beta1, infrastructure.cluster.x-k8s.io/v1beta1, +# addons.cluster.x-k8s.io/v1beta1. If a CORE group serves ONLY v1beta2 (v1beta1 ABSENT), +# override just that kind via api_resources in Step 7.6; otherwise the empty default works. ``` ## Step 7.4 -- Install the driver (1.4.0) + helm in the conductor container @@ -208,10 +234,11 @@ **RUN -- jumphost -> magnum/0** ```bash -# egress pre-check +# egress pre-check (DOCFIX-063: hit a REAL asset -- bare https://get.helm.sh/ 404s (no root +# index) and looks like a failure; the versioned sha256sum URL is a true 200 reachability probe) juju ssh -m openstack magnum/0 \ 'curl -s -o /dev/null -w "pypi:%{http_code}\n" https://pypi.org/simple/ ; \ - curl -s -o /dev/null -w "helm:%{http_code}\n" https://get.helm.sh/' magnum/0** ```bash +# DOCFIX-063: /etc/magnum/magnum.conf.d/ does NOT exist on a fresh rebuild (the deb ships +# magnum.conf, not the .conf.d dir; tee cannot create a missing parent). Create it first +# (root:root 0755, magnum-traversable for --config-dir; also the Step 7.7 config-dir target). +juju ssh -m openstack magnum/0 'sudo install -d -o root -g root -m 0755 /etc/magnum/magnum.conf.d' /dev/null <<'CONF' [capi_helm] kubeconfig_file = /etc/magnum/kubeconfig @@ -292,8 +324,10 @@ **CHECK (read-only) -- jumphost -> magnum/0** ```bash +# DOCFIX-063: sudo the grep -- /etc/magnum is root-owned (0750 root:magnum); a non-sudo read +# gets "Permission denied" and the `|| echo` prints a FALSE "ASCII clean". juju ssh -m openstack magnum/0 \ - 'LC_ALL=C grep -nP "[^\x00-\x7F]" /etc/magnum/magnum.conf.d/00-capi-helm.conf && echo NON-ASCII || echo "ASCII clean"' read the version-less v1beta2 ref -> health UNHEALTHY (D-042). - PHASE-07 BASELINE supersedes this with the RELEASED magnum-capi-helm==1.4.0 (api_resources; default v1beta1). 
-- kubeconfig: /etc/magnum/kubeconfig, -rw------- magnum, ~5657 bytes, server = the mgmt FIP:6443 (per-rebuild; this rebuild 10.12.5.103, old 10.12.7.40 dead). +## As-built reference (2026-07-01 Pattern-A rebuild graft -- audit trail; supersedes the 2026-06-08/09 pre-D-052 run) +- magnum/0: LXD 1/lxd/2 on openstack1, addr 10.12.12.107 (metal-internal per D-052; was 10.12.4.76 pre-D-052), + charm magnum 2024.1/stable rev 70, DEB magnum 18.0.1, python3.10, container ubuntu 22.04; conductor user `magnum`. +- Driver: RELEASED magnum-capi-helm==1.4.0 (pip --no-deps; api_resources={} explicit -> code-default v1beta1, + served by CAPI v1.13.2 / CAPO v0.14.4). This is the v1 baseline; the pre-D-052 run's interim 1.3.0 + (version-less v1beta2 ref -> cosmetic UNHEALTHY, D-042) is superseded. +- kubeconfig: /etc/magnum/kubeconfig, -rw------- magnum, 5641 bytes this rebuild (sha256 26ed1091...6c11), + server = the mgmt FIP:6443 (per-rebuild; this rebuild 10.12.7.222 -- DOCFIX-038). - conf.d drop-in /etc/magnum/magnum.conf.d/00-capi-helm.conf: kubeconfig_file, helm_chart_repo (azimuth), helm_chart_name openstack-cluster, default_helm_chart_version 0.25.1 (api_resources left default -- v1beta1 served by CAPI v1.13.2 / CAPO v0.14.4). diff --git a/scripts/phase-07-conductor-graft.sh b/scripts/phase-07-conductor-graft.sh new file mode 100644 index 0000000..781817c --- /dev/null +++ b/scripts/phase-07-conductor-graft.sh @@ -0,0 +1,247 @@ +#!/usr/bin/env bash +# scripts/phase-07-conductor-graft.sh +# +# Phase-07 -- Magnum conductor graft (D-031 / D-037 / D-042 / D-046 / D-047), +# encapsulating the validated 2026-07-01 as-run (Steps 7.0-7.8) with DOCFIX-063 +# baked in. Runs on the jumphost; every conductor-side action ships via +# `juju ssh -m ... a false "v1beta1 not served"). +# 4. 7.6 runs `install -d /etc/magnum/magnum.conf.d` before the tee (absent on a +# fresh deploy; also the 7.7 --config-dir target). +# 5. 
ASCII checks `sudo` the grep (a non-sudo read of the root-owned path gave a +# false "ASCII clean"). +# 6. the helm egress pre-check hits a REAL asset (bare get.helm.sh/ 404s). +# +# [SENSITIVE] Step 7.2 base64-pipes the FIP-rewritten kubeconfig into a root-written +# 0600 file owned by the conductor user (magnum) and gates on a sha256 both-sides +# match. The kubeconfig holds a cluster-admin credential; it is never staged in /tmp. +# +# Health poll + create/delete regression are NOT in this script -- on a fresh deploy +# no cluster/template exists yet; phase-08 (D-011) is the superset acceptance. +# +# Tunables via env (all DEFAULT to the as-run measured values): +# MODEL CONDUCTOR ENVFILE KUBECONFIG_SRC DRIVER_VERSION HELM_VERSION CHART_VERSION +# API_PORT SG_NAME CAPI_PROJECT CAPI_PROJECT_DOMAIN ADMIN_OPENRC +# Requires: jumphost; juju (model reachable); openstack (admin-openrc); base64; +# sha256sum; ~/capi-mgmt-net.env (MGMT_FIP, from phase-06); ~/capi-mgmt.kubeconfig. +# Usage: bash scripts/phase-07-conductor-graft.sh +# Exit: 0 all phase-07 mechanisms in place | 1 gate fail | 2 precondition fail +# ASCII + LF. 
+ +# shellcheck disable=SC1090 # $ADMIN_OPENRC / $ENVFILE are intentionally dynamic source paths +set -euo pipefail +shopt -s inherit_errexit 2>/dev/null || true + +MODEL="${MODEL:-openstack}" +CONDUCTOR="${CONDUCTOR:-magnum/0}" +MAGNUM_APP="${CONDUCTOR%%/*}" +ENVFILE="${ENVFILE:-$HOME/capi-mgmt-net.env}" +KUBECONFIG_SRC="${KUBECONFIG_SRC:-$HOME/capi-mgmt.kubeconfig}" +DRIVER_VERSION="${DRIVER_VERSION:-1.4.0}" +HELM_VERSION="${HELM_VERSION:-v3.17.3}" +CHART_VERSION="${CHART_VERSION:-0.25.1}" +API_PORT="${API_PORT:-6443}" +SG_NAME="${SG_NAME:-capi-mgmt-sg}" +CAPI_PROJECT="${CAPI_PROJECT:-capi-mgmt}" +CAPI_PROJECT_DOMAIN="${CAPI_PROJECT_DOMAIN:-capi}" +ADMIN_OPENRC="${ADMIN_OPENRC:-$HOME/admin-openrc}" +CONF_DIR="/etc/magnum/magnum.conf.d" + +say() { printf '\n=== %s ===\n' "$*"; } +ok() { printf '[OK] %s\n' "$*"; } +die1() { printf 'GATE FAIL: %s\n' "$*" >&2; exit 1; } +die2() { printf 'PRECONDITION FAIL: %s\n' "$*" >&2; exit 2; } + +# ---- helper: run a command string on the conductor (stdin closed; DOCFIX-021) ---- +rc() { juju ssh -m "$MODEL" "$CONDUCTOR" "$1" /dev/null || true; } + +# ============================ Preconditions ============================ +for c in juju openstack base64 sha256sum; do + command -v "$c" >/dev/null 2>&1 || die2 "$c not found on the jumphost" +done +[ -f "$ADMIN_OPENRC" ] || die2 "$ADMIN_OPENRC not found" +[ -f "$ENVFILE" ] || die2 "$ENVFILE not found (run phase-06 first)" +[ -s "$KUBECONFIG_SRC" ] || die2 "$KUBECONFIG_SRC not found/empty (run phase-06 6.5 first)" +# shellcheck disable=SC1090 +. 
"$ENVFILE" +[ -n "${MGMT_FIP:-}" ] || die2 "MGMT_FIP unset in $ENVFILE" +grep -qE "^[[:space:]]*server:[[:space:]]*https://${MGMT_FIP//./\\.}:${API_PORT}\$" "$KUBECONFIG_SRC" \ + || die2 "$KUBECONFIG_SRC server is not the FIP https://${MGMT_FIP}:${API_PORT} (phase-06 DOCFIX-062 rewrite missing)" +ok "preconditions met (model=$MODEL conductor=$CONDUCTOR MGMT_FIP=$MGMT_FIP driver=$DRIVER_VERSION)" + +# ============================ 7.0 domain-setup (D-046) ============================ +say "7.0 magnum trustee domain-setup (D-046; idempotent)" +juju run "${MAGNUM_APP}/leader" domain-setup /dev/null /dev/null &1 no SIGPIPE gate race) +grep -q 'magnum-conductor' <<<"$COE" || die1 "coe service list did not return magnum-conductor (trustee 403?)" +ok "domain magnum=$DOM_ID user magnum_domain_admin=$USR_ID; coe service list OK" + +# ============================ 7.1 reachability (VERIFY-FIRST; DOCFIX-063) ============================ +say "7.1 conductor -> mgmt apiserver reachability (verify-first; no hardcoded SG rule)" +CAPI_PID=$( ( . "$ADMIN_OPENRC"; openstack project show "$CAPI_PROJECT" --domain "$CAPI_PROJECT_DOMAIN" -f value -c id ) 2>/dev/null /dev/tcp/${MGMT_FIP}/${API_PORT}' && echo TCP-OK || echo TCP-FAIL" || true) +case "$TCP" in + *TCP-OK*) ok "conductor reaches ${MGMT_FIP}:${API_PORT} (phase-06 capi-mgmt-sg already permits it; no rule added)" ;; + *) die1 "conductor cannot reach ${MGMT_FIP}:${API_PORT}. DOCFIX-063 fallback (manual): scope to project $CAPI_PID, \ +MEASURE the source the mgmt VM sees from $CONDUCTOR (conntrack/listener on the VM), then \ +'openstack security group rule create --proto tcp --dst-port ${API_PORT} --remote-ip /32 $SG_NAME'. Never guess the source." 
;; +esac + +# ============================ 7.2 kubeconfig -> conductor [SENSITIVE] ============================ +say "7.2 place the FIP kubeconfig on the conductor [SENSITIVE]" +CUSER=$(rc "systemctl show magnum-conductor -p User --value" | tr -d '\r') +[ -z "$CUSER" ] && CUSER=$(rc "ps -eo user:32,args | awk '/[m]agnum-conductor/{print \$1; exit}'" | tr -d '\r') +[ -n "$CUSER" ] || die1 "could not determine the conductor service user" +rc "getent passwd $CUSER >/dev/null" || die1 "conductor user '$CUSER' does not exist on the conductor" +ok "conductor user = $CUSER" +# base64-pipe: stdin IS the payload -> NO /etc/magnum/kubeconfig && \ + getent passwd $CUSER >/dev/null && chown $CUSER: /etc/magnum/kubeconfig && \ + chmod 0600 /etc/magnum/kubeconfig'" \ + || die1 "kubeconfig transfer to the conductor failed" +L_SHA=$(sha256sum "$KUBECONFIG_SRC" | cut -d' ' -f1) +R_SHA=$(rc "sudo sha256sum /etc/magnum/kubeconfig" | cut -d' ' -f1) +[ -n "$R_SHA" ] && [ "$L_SHA" = "$R_SHA" ] || die1 "kubeconfig sha256 mismatch (local=$L_SHA remote=$R_SHA)" +ok "kubeconfig on conductor: 0600 $CUSER, sha256 match ($L_SHA)" + +# ============================ 7.3 served CAPI versions (DOCFIX-063 probe) ============================ +say "7.3 confirm v1beta1 is SERVED per core CAPI group (kubectl api-versions)" +SERVED=$(KUBECONFIG="$KUBECONFIG_SRC" kubectl api-versions 2>/dev/null | grep -E 'cluster\.x-k8s\.io/' | sort || true) +[ -n "$SERVED" ] || die1 "no cluster.x-k8s.io api-versions returned (mgmt cluster unreachable via $KUBECONFIG_SRC)" +printf '%s\n' "$SERVED" +for g in cluster.x-k8s.io controlplane.cluster.x-k8s.io bootstrap.cluster.x-k8s.io infrastructure.cluster.x-k8s.io; do + printf '%s\n' "$SERVED" | grep -qx "${g}/v1beta1" \ + || die1 "core group ${g} does NOT serve v1beta1 -- set an api_resources override for it in 7.6 (edit CHART/driver map)" +done +ok "v1beta1 served for all core CAPI groups; empty api_resources={} is correct (D-042 premise)" + +# 
============================ 7.4 driver + helm install ============================ +say "7.4 install helm $HELM_VERSION + magnum-capi-helm $DRIVER_VERSION on the conductor" +# (a) egress pre-check -- REAL assets (DOCFIX-063: bare get.helm.sh/ 404s) +rc "curl -s -o /dev/null -w 'pypi:%{http_code}\n' https://pypi.org/simple/ ; \ + curl -s -o /dev/null -w 'helm:%{http_code}\n' https://get.helm.sh/helm-${HELM_VERSION}-linux-amd64.tar.gz.sha256sum" +# (b) helm -- checksum-verified; /usr/local/bin + /usr/bin symlink (DOCFIX-035). WANT injected from the local tunable. +juju ssh -m "$MODEL" "$CONDUCTOR" "WANT='$HELM_VERSION'; "'set -e + if [ -x /usr/bin/helm ] && /usr/bin/helm version --short 2>/dev/null | grep -q "$WANT"; then + echo "[SKIP] /usr/bin/helm already $WANT" + else + T=helm-$WANT-linux-amd64.tar.gz; D=$(mktemp -d); cd "$D" + curl -fsSLO "https://get.helm.sh/$T" + EXP=$(curl -fsSL "https://get.helm.sh/$T.sha256sum" | cut -d" " -f1) + GOT=$(sha256sum "$T" | cut -d" " -f1) + [ -n "$EXP" ] && [ "$EXP" = "$GOT" ] || { echo "GATE FAIL: helm checksum exp=$EXP got=$GOT"; exit 1; } + tar xzf "$T" + sudo install -o root -g root -m 0755 linux-amd64/helm /usr/local/bin/helm + sudo ln -sfn /usr/local/bin/helm /usr/bin/helm + cd /; rm -rf "$D"; echo "[OK] installed $(/usr/bin/helm version --short)" + fi' /dev/null | grep -E '^Version:'")" \ + || die1 "installed magnum-capi-helm is not $DRIVER_VERSION" +grep -q 'k8s_capi_helm_v1' <<<"$(rcap "python3 -c \"import importlib.metadata as m; print([e.name for e in m.entry_points(group='magnum.drivers')])\"")" \ + || die1 "k8s_capi_helm_v1 entry point missing after install" +ok "helm $HELM_VERSION (restricted PATH) + magnum-capi-helm $DRIVER_VERSION; entry point present" + +# ---- moved 7.2 auth-proof (helm now present) ---- +say "7.2/7.4 end-to-end auth proof (helm list -A as $CUSER via the FIP)" +AUTH=$(rc "sudo -u $CUSER env HOME=/tmp helm --kubeconfig /etc/magnum/kubeconfig list -A" || true) +printf '%s\n' "$AUTH" +grep -q 
'cert-manager' <<<"$AUTH" || die1 "conductor could not auth/list mgmt-cluster releases (expected cert-manager et al.)" +ok "conductor authenticates to the mgmt cluster; releases listed" + +# ============================ 7.6 [capi_helm] drop-in (D-037) ============================ +say "7.6 stage the [capi_helm] conf.d drop-in (D-037)" +rc "sudo install -d -o root -g root -m 0755 $CONF_DIR" # DOCFIX-063: dir absent on fresh deploy +CONF_CONTENT="[capi_helm] +kubeconfig_file = /etc/magnum/kubeconfig +helm_chart_repo = https://azimuth-cloud.github.io/capi-helm-charts +helm_chart_name = openstack-cluster +default_helm_chart_version = $CHART_VERSION +api_resources = {}" +# stdin IS the payload -> NO /dev/null" || die1 "writing 00-capi-helm.conf failed" +rc "sudo chmod 0644 $CONF_DIR/00-capi-helm.conf" +# verify content + perms + ASCII (DOCFIX-063: sudo the grep) +rc "sudo grep -q '^default_helm_chart_version = $CHART_VERSION\$' $CONF_DIR/00-capi-helm.conf" \ + || die1 "00-capi-helm.conf missing default_helm_chart_version = $CHART_VERSION" +rc "sudo env LC_ALL=C grep -nP '[^\x00-\x7F]' $CONF_DIR/00-capi-helm.conf" \ + && die1 "00-capi-helm.conf has non-ASCII bytes" || true +ok "00-capi-helm.conf staged (chart $CHART_VERSION, api_resources={}, ASCII clean)" + +# ============================ 7.7 conductor --config-dir (D-037) ============================ +say "7.7 wire --config-dir into the conductor via /etc/default (LSB init)" +juju ssh -m "$MODEL" "$CONDUCTOR" \ + "echo 'DAEMON_ARGS=\"\$DAEMON_ARGS --config-dir $CONF_DIR\"' | sudo tee /etc/default/magnum-conductor >/dev/null && \ + sudo chmod 0644 /etc/default/magnum-conductor" /dev/null \ + || echo 'DAEMON_ARGS="\$DAEMON_ARGS --config-dir $CONF_DIR"' >> /etc/default/magnum-api +chmod 0644 /etc/default/magnum-api +WWW=\$(awk -F'= ' '/^\[keystone_authtoken\]/{s=1} s&&/^www_authenticate_uri/{print \$2; exit}' /etc/magnum/magnum.conf) +AURL=\$(awk -F'= ' '/^\[keystone_authtoken\]/{s=1} s&&/^auth_url/{print \$2; exit}' 
/etc/magnum/magnum.conf)
+WWW3=\${WWW/\/v2.0//v3}; case "\$WWW3" in */v3) ;; *) WWW3="\${WWW3%/}/v3";; esac
+AURL3=\${AURL/\/v2.0//v3}; case "\$AURL3" in */v3) ;; *) AURL3="\${AURL3%/}/v3";; esac
+printf '[keystone_authtoken]\nauth_version = v3\nwww_authenticate_uri = %s\nauth_url = %s\n[keystone_auth]\nauth_version = v3\nwww_authenticate_uri = %s\nauth_url = %s\n' \
+  "\$WWW3" "\$AURL3" "\$WWW3" "\$AURL3" > $CONF_DIR/50-keystone-v3-override.conf
+chmod 0644 $CONF_DIR/50-keystone-v3-override.conf
+REOF
+[ "$(rcap "sudo grep -c '^auth_version = v3\$' $CONF_DIR/50-keystone-v3-override.conf")" = 2 ] \
+  || die1 "50-keystone-v3-override.conf missing auth_version=v3 in both sections"
+[ "$(rcap "sudo grep -c -- '--config-dir $CONF_DIR' /etc/default/magnum-api")" = 1 ] \
+  || die1 "/etc/default/magnum-api does not carry exactly one --config-dir line"
+ok "keystone-v3 override written (both sections v3); magnum-api /etc/default wired"
+
+# ============================ 7.8 restart + driver enabled ============================
+say "7.8 restart conductor + api; verify both live cmdlines carry --config-dir"
+ACT=$(rcap "sudo systemctl restart magnum-conductor magnum-api && sleep 3 && systemctl is-active magnum-conductor magnum-api")
+[ "$(grep -c '^active$' <<<"$ACT")" = 2 ] || die1 "magnum-conductor and/or magnum-api not active after restart"
+grep -q -- "--config-dir $CONF_DIR" <<<"$(rcap "ps -ww -C magnum-conductor -o args= | head -1")" \
+  || die1 "running conductor cmdline lacks --config-dir after restart"
+grep -q -- "--config-dir $CONF_DIR" <<<"$(rcap "ps -ww -C magnum-api -o args= | head -1")" \
+  || die1 "running magnum-api cmdline lacks --config-dir after restart"
+grep -q 'k8s_capi_helm_v1' <<<"$(rcap "sudo magnum-driver-manage list-drivers 2>/dev/null")" \
+  || die1 "k8s_capi_helm_v1 not enabled in magnum-driver-manage list-drivers"
+ok "both services active with --config-dir; k8s_capi_helm_v1 enabled"
+
+say "PHASE-07 COMPLETE"
+echo "All conductor-graft mechanisms in place. HEALTHY poll + create/delete regression are"
+echo "phase-08 (D-011) -- no cluster/template exists yet on a fresh deploy."
+exit 0
diff --git a/tests/phase-07-conductor-graft/fakebin/juju b/tests/phase-07-conductor-graft/fakebin/juju
new file mode 100644
index 0000000..679e2d5
--- /dev/null
+++ b/tests/phase-07-conductor-graft/fakebin/juju
@@ -0,0 +1,103 @@
+#!/usr/bin/env bash
+# fake juju for phase-07-conductor-graft.sh tests.
+# Logs every call to $JUJU_LOG; keeps decoded-file state in $JUJU_STATE so the
+# 7.2 sha256 both-sides gate passes legitimately. Dispatches `juju ssh` remote
+# commands by substring. Steered by env:
+#   DOMAIN_SETUP_FAIL TCP_FAIL SHA_MISMATCH DRIVER_MISSING NOTACTIVE
+#   SHOWARGS_NODIR PS_NODIR NODRIVER_ENABLED
+: "${JUJU_LOG:=/dev/null}"
+: "${JUJU_STATE:=/tmp}"
+printf 'juju %s\n' "$*" >> "$JUJU_LOG"
+
+sub="${1:-}"; shift || true
+
+if [ "$sub" = "run" ]; then
+  # juju run magnum/leader domain-setup
+  [ "${DOMAIN_SETUP_FAIL:-0}" = 1 ] && exit 1
+  echo "Running domain-setup on magnum/leader"; exit 0
+fi
+
+if [ "$sub" != "ssh" ]; then exit 0; fi
+
+# strip: -m MODEL UNIT ; the remainder is the remote command (1+ args)
+while [ "${1:-}" = "-m" ]; do shift 2; done
+shift || true # drop the UNIT
+REMOTE="$*"
+
+emit_kubeconfig_sha() {
+  if [ "${SHA_MISMATCH:-0}" = 1 ]; then
+    echo "0000000000000000000000000000000000000000000000000000000000000000 /etc/magnum/kubeconfig"
+  elif [ -f "$JUJU_STATE/kubeconfig" ]; then
+    sha256sum "$JUJU_STATE/kubeconfig" | awk '{print $1" /etc/magnum/kubeconfig"}'
+  else
+    echo "(no kubeconfig)"; return 1
+  fi
+}
+
+case "$REMOTE" in
+  *"base64 -d > /etc/magnum/kubeconfig"*) # 7.2 write -- stdin is base64 payload (must precede the getent match)
+    base64 -d > "$JUJU_STATE/kubeconfig" 2>/dev/null || true; exit 0 ;;
+  *"echo TCP-OK"*)
+    if [ "${TCP_FAIL:-0}" = 1 ]; then echo "TCP-FAIL"; else echo "TCP-OK"; fi ;;
+  *"systemctl show magnum-conductor -p User"*)
+    echo "magnum" ;;
+  *"[m]agnum-conductor"*) # ps fallback owner probe
+    echo "magnum" ;;
+  *"getent passwd magnum"*)
+    exit 0 ;;
+  *"sha256sum /etc/magnum/kubeconfig"*)
+    emit_kubeconfig_sha ;;
+  *"curl "*pypi*|*"pypi:"*)
+    echo "pypi:200"; echo "helm:200" ;;
+  *"WANT="*"get.helm.sh"*) # helm install block
+    echo "[OK] installed v3.17.3+ge4da497" ;;
+  *"command -v helm"*) # restricted-PATH gate
+    echo "/usr/bin/helm"; echo "v3.17.3+ge4da497" ;;
+  *"pip install"*"magnum-capi-helm"*)
+    echo "Successfully installed magnum-capi-helm-1.4.0" ;;
+  *"pip show magnum-capi-helm"*)
+    echo "Version: 1.4.0" ;;
+  *"entry_points"*)
+    if [ "${DRIVER_MISSING:-0}" = 1 ]; then echo "['k8s_fedora_coreos_v1']"; else echo "['k8s_capi_helm_v1', 'k8s_fedora_coreos_v1']"; fi ;;
+  *"helm --kubeconfig /etc/magnum/kubeconfig list -A"*) # auth proof
+    echo "NAME NAMESPACE REVISION STATUS CHART"
+    echo "cert-manager cert-manager 1 deployed cert-manager-v1.20.2"
+    echo "ck-network kube-system 1 deployed cilium-1.17.12" ;;
+  *"install -d"*"magnum.conf.d"*)
+    mkdir -p "$JUJU_STATE/conf.d"; exit 0 ;;
+  *"tee /etc/magnum/magnum.conf.d/00-capi-helm.conf"*) # 7.6 write -- stdin payload
+    mkdir -p "$JUJU_STATE/conf.d"; cat > "$JUJU_STATE/conf.d/00-capi-helm.conf"; exit 0 ;;
+  *"chmod 0644 /etc/magnum/magnum.conf.d/00-capi-helm.conf"*)
+    exit 0 ;;
+  *"grep -q '^default_helm_chart_version = "*)
+    grep -q '^default_helm_chart_version = 0.25.1$' "$JUJU_STATE/conf.d/00-capi-helm.conf"; exit $? ;;
+  *"grep -nP '[^\\x00-\\x7F]' /etc/magnum/magnum.conf.d/00-capi-helm.conf"*)
+    LC_ALL=C grep -nP '[^\x00-\x7F]' "$JUJU_STATE/conf.d/00-capi-helm.conf"; exit $? ;; # empty -> exit 1 (no non-ascii)
+  *"tee /etc/default/magnum-conductor"*)
+    cat > "$JUJU_STATE/default-conductor"; exit 0 ;;
+  *"show-args"*)
+    if [ "${SHOWARGS_NODIR:-0}" = 1 ]; then
+      echo "/usr/bin/magnum-conductor --config-file=/etc/magnum/magnum.conf --log-file=/var/log/magnum/magnum-conductor.log"
+    else
+      echo "/usr/bin/magnum-conductor --config-file=/etc/magnum/magnum.conf --config-dir /etc/magnum/magnum.conf.d --log-file=/var/log/magnum/magnum-conductor.log"
+    fi ;;
+  *"sudo bash -s"*) # 7.7b heredoc -- consume + simulate success
+    cat >/dev/null; exit 0 ;;
+  *"grep -c '^auth_version = v3"*)
+    echo "2" ;;
+  *"grep -c -- '--config-dir /etc/magnum/magnum.conf.d' /etc/default/magnum-api"*)
+    echo "1" ;;
+  *"systemctl restart magnum-conductor magnum-api"*)
+    if [ "${NOTACTIVE:-0}" = 1 ]; then echo "active"; echo "failed"; else echo "active"; echo "active"; fi ;;
+  *"ps -ww -C magnum-conductor"*|*"ps -ww -C magnum-api"*)
+    if [ "${PS_NODIR:-0}" = 1 ]; then
+      echo "/usr/bin/python3 /usr/bin/magnum-x --config-file=/etc/magnum/magnum.conf --log-file=/var/log/magnum/x.log"
+    else
+      echo "/usr/bin/python3 /usr/bin/magnum-x --config-file=/etc/magnum/magnum.conf --config-dir /etc/magnum/magnum.conf.d --log-file=/var/log/magnum/x.log"
+    fi ;;
+  *"magnum-driver-manage list-drivers"*)
+    if [ "${NODRIVER_ENABLED:-0}" = 1 ]; then echo "| k8s_fedora_coreos_v1 |"; else echo "| k8s_capi_helm_v1 |"; echo "| k8s_fedora_coreos_v1 |"; fi ;;
+  *)
+    exit 0 ;;
+esac
+exit 0
diff --git a/tests/phase-07-conductor-graft/fakebin/kubectl b/tests/phase-07-conductor-graft/fakebin/kubectl
new file mode 100644
index 0000000..29d0304
--- /dev/null
+++ b/tests/phase-07-conductor-graft/fakebin/kubectl
@@ -0,0 +1,25 @@
+#!/usr/bin/env bash
+# fake kubectl for phase-07-conductor-graft.sh tests.
+# Only `kubectl api-versions` is used (7.3 DOCFIX-063 probe). Emits the served
+# group/versions; steered by env NO_V1BETA1_CORE (drops cluster.x-k8s.io/v1beta1
+# so the 7.3 core-group gate fails).
+if [ "${1:-}" = "api-versions" ]; then
+  cat <<'AV'
+addons.cluster.x-k8s.io/v1beta1
+addons.cluster.x-k8s.io/v1beta2
+bootstrap.cluster.x-k8s.io/v1beta1
+bootstrap.cluster.x-k8s.io/v1beta2
+controlplane.cluster.x-k8s.io/v1beta1
+controlplane.cluster.x-k8s.io/v1beta2
+infrastructure.cluster.x-k8s.io/v1beta1
+infrastructure.cluster.x-k8s.io/v1beta2
+v1
+AV
+  if [ "${NO_V1BETA1_CORE:-0}" = 1 ]; then
+    echo "cluster.x-k8s.io/v1beta2"
+  else
+    echo "cluster.x-k8s.io/v1beta1"
+    echo "cluster.x-k8s.io/v1beta2"
+  fi
+fi
+exit 0
diff --git a/tests/phase-07-conductor-graft/fakebin/openstack b/tests/phase-07-conductor-graft/fakebin/openstack
new file mode 100644
index 0000000..c25c761
--- /dev/null
+++ b/tests/phase-07-conductor-graft/fakebin/openstack
@@ -0,0 +1,32 @@
+#!/usr/bin/env bash
+# fake openstack for phase-07-conductor-graft.sh tests.
+# Logs every call to $OS_LOG (so the harness can assert 7.1 verify-first NEVER
+# calls `security group rule create`). Steered by env:
+#   DOMAIN_MISSING USER_MISSING PROJECT_MISSING COE_403
+: "${OS_LOG:=/dev/null}"
+printf 'openstack %s\n' "$*" >> "$OS_LOG"
+
+j=" $* "
+case "$j" in
+  *" domain show magnum "*)
+    [ "${DOMAIN_MISSING:-0}" = 1 ] && exit 1
+    echo "d9d0a4a8215d49f2aeb243b6aea4b0b0" ;;
+  *" user show magnum_domain_admin "*)
+    [ "${USER_MISSING:-0}" = 1 ] && exit 1
+    echo "0885dca38f8043ed85d5e72f14a54124" ;;
+  *" project show "*)
+    [ "${PROJECT_MISSING:-0}" = 1 ] && exit 1
+    echo "d5bc125c7c1841d389b76cd0a7b0a915" ;;
+  *" coe service list "*)
+    if [ "${COE_403:-0}" = 1 ]; then
+      echo "ERROR (Forbidden): Keystone client authentication failed" >&2; exit 1
+    fi
+    echo "| id | host | binary | state |"
+    echo "| 1 | None | magnum-conductor | up |" ;;
+  *" security group rule create "*)
+    # should NEVER run on the happy path (7.1 is verify-first)
+    echo "RULE-CREATE-CALLED" ; exit 0 ;;
+  *)
+    exit 0 ;;
+esac
+exit 0
diff --git a/tests/phase-07-conductor-graft/run-tests.sh b/tests/phase-07-conductor-graft/run-tests.sh
new file mode 100644
index 0000000..c491bbd
--- /dev/null
+++ b/tests/phase-07-conductor-graft/run-tests.sh
@@ -0,0 +1,126 @@
+#!/usr/bin/env bash
+# tests/phase-07-conductor-graft/run-tests.sh -- offline regression for
+# scripts/phase-07-conductor-graft.sh. Fake juju/openstack/kubectl; real
+# base64/sha256sum/bash. Focus: the DOCFIX-063 behaviors (7.1 verify-first is a
+# no-op when 6443 is open and NEVER creates a rule; 7.3 api-versions probe; the
+# sha256 both-sides gate; install -d before the tee) plus the 7.7b v3-URL
+# derivation and the phase gates/preconditions.
+set -euo pipefail
+IFS=$'\n\t'
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+SCRIPTS="$(cd "$HERE/../../scripts" && pwd)"
+TARGET="$SCRIPTS/phase-07-conductor-graft.sh"
+BIN="$HERE/fakebin"
+[ -f "$TARGET" ] || { echo "FAIL: $TARGET missing" >&2; exit 1; }
+chmod +x "$BIN"/* 2>/dev/null || true
+WORK="$(mktemp -d)"; trap 'rm -rf "$WORK"' EXIT
+rc_all=0
+FIP=10.12.7.222
+
+# fixtures on the jumphost side
+printf 'export OS_AUTH_URL=https://10.12.8.50:35357/v3\nexport OS_USERNAME=admin\n' > "$WORK/admin-openrc"
+printf 'MGMT_FIP=%s\n' "$FIP" > "$WORK/net.env"
+cat > "$WORK/kubeconfig" <<KC
+apiVersion: v1
+clusters:
+- cluster:
+    server: https://$FIP:6443
+  name: x
+KC
+
+run() { # run WANT_RC REGEX LABEL [ENV=VAL ...]; kubeconfig fixture body and this header are inferred from badkc and the call sites
+  local want="$1" re="$2" label="$3"; shift 3
+  local state="$WORK/state"
+  rm -rf "$state"; mkdir -p "$state"
+  : > "$WORK/juju.log"; : > "$WORK/os.log"
+  set +e
+  PATH="$BIN:$PATH" HOME="$WORK" \
+    ADMIN_OPENRC="$WORK/admin-openrc" ENVFILE="$WORK/net.env" KUBECONFIG_SRC="$WORK/kubeconfig" \
+    JUJU_LOG="$WORK/juju.log" JUJU_STATE="$state" OS_LOG="$WORK/os.log" \
+    env "$@" bash "$TARGET" >"$WORK/out" 2>&1
+  rc=$?
+  set -e
+  if [ "$rc" -eq "$want" ] && grep -qE "$re" "$WORK/out"; then
+    printf ' [OK] %-52s exit %s\n' "$label" "$rc"
+  else
+    printf ' [XX] %-52s exit %s (want %s; /%s/)\n' "$label" "$rc" "$want" "$re"
+    sed 's/^/ /' "$WORK/out"; rc_all=1
+  fi
+}
+
+echo "=== phase-07-conductor-graft.sh ==="
+run 0 'PHASE-07 COMPLETE' "happy path (7.0-7.8)"
+run 0 'no rule added' "7.1 verify-first: reachable -> no SG mutation"
+run 0 'v1beta1 served for all core' "7.3 api-versions probe passes"
+run 0 'sha256 match' "7.2 kubeconfig sha256 both-sides gate"
+run 0 'k8s_capi_helm_v1 enabled' "7.8 driver enabled gate"
+
+# --- gate failures ---
+run 1 'cannot reach' "7.1 TCP-FAIL -> exit 1 (fallback msg)" TCP_FAIL=1
+run 1 'sha256 mismatch' "7.2 sha mismatch -> exit 1" SHA_MISMATCH=1
+run 1 'does NOT serve v1beta1' "7.3 core group missing v1beta1 -> exit 1" NO_V1BETA1_CORE=1
+run 1 'entry point missing' "7.4 driver entry point absent -> exit 1" DRIVER_MISSING=1
+run 1 'not active after restart' "7.8 service not active -> exit 1" NOTACTIVE=1
+run 1 'lacks --config-dir' "7.8 live cmdline missing --config-dir" PS_NODIR=1
+run 1 'not enabled in magnum-driver' "7.8 driver not enabled -> exit 1" NODRIVER_ENABLED=1
+run 1 'domain-setup action failed' "7.0 domain-setup fails -> exit 1" DOMAIN_SETUP_FAIL=1
+run 1 "domain 'magnum' absent" "7.0 domain missing -> exit 1" DOMAIN_MISSING=1
+
+# --- preconditions ---
+run 2 'not found' "precondition: no ENVFILE -> exit 2" ENVFILE="$WORK/nope.env"
+: > "$WORK/empty.env"
+run 2 'MGMT_FIP unset' "precondition: empty env (MGMT_FIP unset) -> exit 2" ENVFILE="$WORK/empty.env"
+cat > "$WORK/badkc" <<'BK'
+apiVersion: v1
+clusters:
+- cluster:
+    server: https://10.20.0.207:6443
+  name: x
+BK
+run 2 'server is not the FIP' "precondition: kubeconfig server not FIP -> exit 2" KUBECONFIG_SRC="$WORK/badkc"
+
+# --- dedicated happy-path capture for the structural assertions (NOT leftover
+# logs from the last precondition run, which never reaches 7.0+) ---
+echo "=== structural assertions (dedicated happy-path capture) ==="
+hstate="$WORK/hstate"; rm -rf "$hstate"; mkdir -p "$hstate"
+: > "$WORK/hjuju.log"; : > "$WORK/hos.log"
+PATH="$BIN:$PATH" HOME="$WORK" \
+  ADMIN_OPENRC="$WORK/admin-openrc" ENVFILE="$WORK/net.env" KUBECONFIG_SRC="$WORK/kubeconfig" \
+  JUJU_LOG="$WORK/hjuju.log" JUJU_STATE="$hstate" OS_LOG="$WORK/hos.log" \
+  bash "$TARGET" >/dev/null 2>&1 || true
+
+# 7.1 verify-first NEVER calls security group rule create
+if grep -q 'security group rule create' "$WORK/hos.log"; then
+  echo " [XX] 7.1 created an SG rule (must be verify-first no-op)"; rc_all=1
+else
+  echo " [OK] no 'security group rule create' in the openstack call log"
+fi
+
+# install -d /etc/magnum/magnum.conf.d precedes the 00-capi-helm.conf tee
+ln=$(grep -n 'install -d.*magnum.conf.d' "$WORK/hjuju.log" | head -1 | cut -d: -f1 || true)
+lt=$(grep -n 'tee /etc/magnum/magnum.conf.d/00-capi-helm.conf' "$WORK/hjuju.log" | head -1 | cut -d: -f1 || true)
+if [ -n "$ln" ] && [ -n "$lt" ] && [ "$ln" -lt "$lt" ]; then
+  echo " [OK] install -d (line $ln) precedes tee (line $lt)"
+else
+  echo " [XX] dir-create/tee ordering wrong (install-d=$ln tee=$lt)"; rc_all=1
+fi
+
+# --- assertion: 7.7b v3-URL derivation (mirrors the as-run block) ---
+echo "=== assert 7.7b keystone v3-URL derivation ==="
+v3() { # replicate the 7.7b derivation exactly
+  local in="$1" out
+  out=${in/\/v2.0//v3}
+  case "$out" in */v3) ;; *) out="${out%/}/v3";; esac
+  printf '%s' "$out"
+}
+derr=0
+[ "$(v3 https://10.12.4.50:5000)" = https://10.12.4.50:5000/v3 ] || { echo " [XX] unversioned -> /v3 failed"; derr=1; }
+[ "$(v3 https://10.12.4.50:5000/v2.0)" = https://10.12.4.50:5000/v3 ] || { echo " [XX] /v2.0 -> /v3 failed"; derr=1; }
+[ "$(v3 https://10.12.4.50:5000/v3)" = https://10.12.4.50:5000/v3 ] || { echo " [XX] /v3 -> /v3 (idempotent) failed"; derr=1; }
+[ "$(v3 https://10.12.8.50:35357/)" = https://10.12.8.50:35357/v3 ] || { echo " [XX] trailing-slash -> /v3 failed"; derr=1; }
+[ "$derr" -eq 0 ] && echo " [OK] v3 derivation: unversioned, /v2.0, /v3, trailing-slash all -> /v3"
+[ "$derr" -eq 0 ] || rc_all=1
+
+echo
+[ "$rc_all" -eq 0 ] && echo "ALL PASS" || echo "SOME FAILED"
+exit "$rc_all"