diff --git a/docs/session-ledger.md b/docs/session-ledger.md index 5615285..a07f283 100644 --- a/docs/session-ledger.md +++ b/docs/session-ledger.md @@ -38,11 +38,15 @@ - **validate.sh D-011 runner -- modular.** Foundation committed: `scripts/lib-validate.sh` (exit contract 0/1/2/3/4, emit, vr_json stderr-safe, run, env hygiene, disruptive gate) + `scripts/validate.sh` orchestrator (profiles, verdict, --stop-on-fail, --include-disruptive). - - Batch 1 committed: `checks/d011-01-charms` (active/idle), `checks/d011-06-vault-unseal` - (SEC-003 attestation -> MANUAL until the rehearsal closes). - - **Batch 2 (next):** `d011-02-vip-jumphost` (catalog-derive VIPs, TLS+HTTP probe), - `d011-03-vip-tenant` (agnhost on a beta node -> keystone VIP 10.12.4.50:5000, D-035 pattern). - - **Batch 3:** `d011-04-octavia-lb` (2-backend round-robin -> member failover -> AMPHORA + - Batch 1 committed: `checks/d011-01-charms`, `checks/d011-06-vault-unseal` (MANUAL until + the SEC-003 rehearsal closes). + - Batch 2 committed: `checks/d011-02-vip-jumphost` (catalog-derive public origins, TLS+HTTP), + `checks/d011-03-vip-tenant` (ephemeral agnhost pod egress -> keystone VIP, D-035 proof). + `post-restart` profile now fully populated. + - Batch 3 DRAFTED (mock-tested only, NEEDS LIVE VALIDATION): `d011-04-octavia-lb` + (RR+member additive; amphora failover --disruptive, N+1 headroom-guarded) + + `d011-05-magnum-e2e` (wraps tenant-acceptance + timing). full-d011 profile now COMPLETE. + - **Next:** live-validate batch 3 (see Verify-live queue), then item-3/#5-#8 backlog (2-backend round-robin -> member failover -> AMPHORA failover with an N+1 scheduler-headroom HOLD-guard -> recovery -> self-cleanup; --disruptive-gated) and `d011-05-magnum-e2e` (wrap `tenant-acceptance.sh` + timing). - D-011 AMENDED bar: 1 charms; 2 VIP jumphost; 3 VIP tenant; 4 octavia RR/failover/recovery; @@ -72,6 +76,16 @@ - **R-3:** was D-063 -- now CLOSED. The handoff doc still lists R-3 OPEN; update it. 
- **D-1:** Pattern-A full redeploy (VR0 DC0). +## Verify-live queue (batch 3 drafts -- confirm on real cloud before trusting) + +- d011-04 amphora_headroom() field parsing (`server show` .flavor; `hypervisor list --long` + vcpu/ram) -- safe default HOLDs on any parse miss, but confirm the OK path live. +- d011-04 OCCM LB-name-contains-service-name assumption; agnhost /hostname round-robin behavior. +- d011-05 needs a FOIL tenant (2nd tenant) for P3 isolation -- onboard one or set VR_FOIL_APPCRED + (acme, the former foil, was offboarded). +- Sequence: run d011-04 non-disruptively first (RR+member), then --include-disruptive once + headroom confirmed. + ## Logged-not-actioned (small; would vanish at compaction) - offboard v2 `--sweep-magnum-orphans` mode (orphan per-cluster trustee in the magnum domain). diff --git a/docs/v1-redeploy-changelog.md b/docs/v1-redeploy-changelog.md index 3dfd883..56b7f6c 100644 --- a/docs/v1-redeploy-changelog.md +++ b/docs/v1-redeploy-changelog.md @@ -1719,5 +1719,74 @@ SKILL.md: session-start step (read ledger + run scan) + a routing-table row. .skill package rebuilt from the updated references. -Next-free (per scan, header-authoritative): D-071 (CONTENDED -- parallel stream), DOCFIX-086, -BUNDLEFIX-010. +Next-free (per scan, header-authoritative): D-071 (CONTENDED -- parallel stream), DOCFIX-086, BUNDLEFIX-010. + + +### 2026-07-03 (addendum 11) -- validate checks batch 2: d011-02-vip-jumphost, d011-03-vip-tenant + +Two checks on the lib-validate foundation; harness now 39/39 across all checks; gauntlet 31. +The `post-restart` profile is now fully populated (d011-01/02/03 all exist). + +scripts/checks/d011-02-vip-jumphost.sh -- D-011 item 2 (public API VIPs respond on hostname +from the jumphost). Enumerates PUBLIC endpoints from the keystone catalog DYNAMICALLY +(openstack endpoint list --interface public), reduces to unique scheme://host:port, probes +each root over TLS (vault CA). 
Healthy set {200,201,300,301,302,401,403,404}=responds; +5xx=reachable-but-erroring->FAIL; transport error->FAIL, and a TLS-verify failure is retried +with -k ONLY to classify cert-broken vs unreachable (still FAILs -- never passes an +unverifiable endpoint). No admin scope / no endpoints -> HOLD. + +scripts/checks/d011-03-vip-tenant.sh -- D-011 item 3 (API reachability from a tenant VM, +Option B / the D-035 pod-egress proof). Derives the keystone VIP:port from the tenant cred's +auth_url (dynamic); reuses ~/tenant-/kube/config or fetches via `coe cluster config` +AS THE TENANT; runs ONE ephemeral agnhost pod (`/agnhost connect VIP:port`) and deletes it +(trap, even on failure). Succeeded=Option-B proven->PASS; Failed/TIMEOUT/REFUSED->FAIL; no +kubectl/creds/kubeconfig/VIP->HOLD. Additive+self-cleaning, so NOT --disruptive-gated. + +LEDGER-SCAN limitation found + fixed: a "Next-free:" pointer that WORD-WRAPPED onto a second +line escaped ledger-scan's line-based `next-free` exclusion, falsely showing BUNDLEFIX next-free +as 011. No real assignment -- unwrapped the pointer here; CONVENTION added: keep "Next-free:" +pointer lines on ONE line (or rely on ledger-scan, which is now the next-free authority). +Real next-free unchanged: D-071 (contended), DOCFIX-086, BUNDLEFIX-010. + +REMAINING: batch 3 -- d011-04-octavia-lb (2-backend round-robin -> member failover -> AMPHORA +failover with N+1 scheduler-headroom HOLD-guard -> recovery -> self-cleanup; --disruptive-gated) +and d011-05-magnum-e2e (wrap tenant-acceptance.sh + timing). + +### 2026-07-03 (addendum 12) -- validate checks batch 3: d011-04-octavia-lb, d011-05-magnum-e2e + +DRAFTED, mock-tested only (no live octavia/magnum in the sandbox); check harness 62/62, +gauntlet 31. The full-d011 profile is now COMPLETE (all six items exist). REQUIRES LIVE +VALIDATION before being trusted for an acceptance gate -- see the verify-live notes below. + +scripts/checks/d011-05-magnum-e2e.sh -- D-011 item 5. 
Thin wrapper over tenant-acceptance.sh +with timing; maps 0->PASS, 11/12->FAIL, 13->FAIL(CRITICAL isolation), 14->HOLD, other->HOLD. +Additive+self-cleaning; not disruptive. DEPENDENCY SURFACED: P3 isolation needs a SECOND tenant +as the foil; post-acme-offboard only beta exists, so without VR_FOIL_APPCRED it HOLDs with +"onboard a 2nd tenant" -- honest, not hidden. Scope note: verifying the standing HEALTHY cluster +is a legitimate item-5 check; a FRESH create-verify-teardown is deferred to +tenant-cluster-create.sh (backlog #5) as a future --full mode. + +scripts/checks/d011-04-octavia-lb.sh -- D-011 item 4 (Bobcat v3 LB pattern). Stands up its own +2-replica agnhost backend + LoadBalancer in the tenant cluster, self-cleans via trap. Two tiers: + ALWAYS (additive): round-robin (>=2 distinct backends over N curls to /hostname) + member + failover (delete a backend pod, assert continuity) + recovery (assert 2 backends again). + --disruptive ONLY: AMPHORA failover, guarded by an N+1 scheduler-headroom pre-check. Octavia + STANDALONE failover transiently needs room for one MORE amphora; at/near ceiling => HOLD + (never FAIL, and never trigger a failover that would strand the LB in ERROR). On headroom OK: + `loadbalancer failover`, wait ACTIVE, assert serving again. + Verdict: FAIL if any assertion broke; HOLD if octavia unready / setup or headroom undetermined; + PASS otherwise (with an explicit amphora SKIPPED note when --disruptive is absent). + +VERIFY-LIVE QUEUE (highest-risk assumptions in d011-04, confirm on the real cloud before trust): + 1. amphora_headroom() parses `server show -f json` .flavor and `hypervisor list --long -f json` + (.vcpus/.vcpus_used/.memory_mb/.memory_mb_used). Field shapes vary by microversion; .flavor + may render as a dict. SAFE DEFAULT: any parse miss -> UNKNOWN -> HOLD (never failover on bad + data), so a mismatch degrades to a conservative HOLD, not a wrong action -- but confirm the + OK path live. + 2. 
LB-name match assumes the OCCM Octavia LB name CONTAINS the k8s service name. Confirm live. + 3. agnhost netexec /hostname round-robin distinctness + member-failover continuity behave as + assumed. Confirm with a live 2-replica service. + Run NON-disruptively first (RR+member) on beta, then --include-disruptive once headroom confirmed. + beta sits at node_count=2 (adequate for a 2-replica service). + +Next-free: D-071 (CONTENDED -- parallel stream), DOCFIX-086, BUNDLEFIX-010. diff --git a/scripts/checks/d011-04-octavia-lb.sh b/scripts/checks/d011-04-octavia-lb.sh new file mode 100644 index 0000000..ded64b1 --- /dev/null +++ b/scripts/checks/d011-04-octavia-lb.sh @@ -0,0 +1,145 @@ +#!/usr/bin/env bash +# scripts/checks/d011-04-octavia-lb.sh -- D-011 item 4: Octavia LB pattern +# (round-robin -> member failover -> recovery -> AMPHORA failover), per Bobcat v3 work. +# +# Stands up its OWN 2-replica backend + LoadBalancer Service in a tenant cluster (beta), +# drives the pattern, and self-cleans (trap). Two tiers: +# ALWAYS (additive, self-cleaning): round-robin distribution + member failover + recovery. +# --disruptive ONLY: amphora failover (destroys+rebuilds the amphora). Guarded by an N+1 +# scheduler-headroom pre-check -- Octavia STANDALONE failover transiently needs room for +# one MORE amphora; a cloud at its ceiling cannot heal its own LB, so at-ceiling => HOLD +# (never FAIL, and never TRIGGER a failover that would strand the LB in ERROR). +# +# Exit: 0 PASS | 1 FAIL (a pattern assertion broke) | 2 HOLD (octavia not ready, LB setup +# undetermined, or amphora headroom not confidently available) | 4 SKIPPED (only if the +# WHOLE check is not applicable). Amphora tier not requested => PASS on the additive tiers +# with an explicit "amphora SKIPPED (needs --include-disruptive)" note. +set -uo pipefail +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +# shellcheck source=scripts/lib-validate.sh +. 
"$HERE/../lib-validate.sh" +ID=d011-04-octavia-lb; vr_begin "$ID" +CLIENT="${VR_TENANT:-beta}" +AGNHOST="${VR_AGNHOST:-registry.k8s.io/e2e-test-images/agnhost:2.47}" +NPROBE="${VR_RR_PROBES:-12}" +FAILS=0; HELD=0; AMPHORA_NOTE="amphora tier not run" + +vr_need kubectl openstack curl jq awk || { emit "$ID" "$VR_HOLD" "missing tool"; exit "$VR_HOLD"; } + +# --- precondition: octavia reachable as admin --- +if ! vr_admin_env; then emit "$ID" "$VR_HOLD" "no admin scope (octavia ops need it)"; exit "$VR_HOLD"; fi +if ! vr_json _LBL openstack loadbalancer list -f json; then + vr_err_tail; emit "$ID" "$VR_HOLD" "octavia not reachable (loadbalancer list failed)"; exit "$VR_HOLD" +fi + +# --- tenant kubeconfig --- +CF="" +for c in "$HOME/tenant-${CLIENT}/${CLIENT}-cluster-cred.txt"; do [ -s "$c" ] && CF="$c"; done +KCFG="$HOME/tenant-${CLIENT}/kube/config" +[ -s "$KCFG" ] || { emit "$ID" "$VR_HOLD" "no tenant kubeconfig at $KCFG (run tenant-acceptance/d011-03 first)"; exit "$VR_HOLD"; } +export KUBECONFIG="$KCFG" + +SVC="d011lb$RANDOM" +cleanup(){ + kubectl delete svc "$SVC" --ignore-not-found --now >/dev/null 2>&1 || true + kubectl delete deploy "$SVC" --ignore-not-found --now >/dev/null 2>&1 || true +} +trap cleanup EXIT + +# --- stand up 2-replica backend + LoadBalancer --- +echo " creating 2-replica backend + LB service ($SVC)" +run kubectl create deployment "$SVC" --image="$AGNHOST" --replicas=2 -- /agnhost netexec --http-port=8080 || { emit "$ID" "$VR_HOLD" "deployment create failed"; exit "$VR_HOLD"; } +run kubectl expose deployment "$SVC" --port=80 --target-port=8080 --type=LoadBalancer || { emit "$ID" "$VR_HOLD" "service expose failed"; exit "$VR_HOLD"; } +kubectl wait --for=condition=available --timeout=120s deploy/"$SVC" >/dev/null 2>&1 || true + +# wait for FIP +FIP="" +for _ in $(seq 1 40); do + FIP="$(kubectl get svc "$SVC" -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null || true)" + [ -n "$FIP" ] && break; sleep 15 +done +[ -n "$FIP" ] || { emit 
"$ID" "$VR_FAIL" "LB never got an EXTERNAL-IP (provisioning broken)"; exit "$VR_FAIL"; } +echo " LB EXTERNAL-IP=$FIP" + +rr_distinct(){ # echo count of distinct backend hostnames over NPROBE curls + local i h; declare -A seen=() + for i in $(seq 1 "$NPROBE"); do + h="$(curl -s --max-time 5 "http://$FIP/hostname" 2>/dev/null || true)" + [ -n "$h" ] && seen["$h"]=1 + done + echo "${#seen[@]}" +} +codes_ok(){ # echo count of 200s over N curls + local i c n=0; for i in $(seq 1 "$1"); do + c="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://$FIP/hostname" 2>/dev/null || true)" + [ "$c" = 200 ] && n=$((n+1)); done; echo "$n" +} + +# --- TIER 1: round-robin --- +D="$(rr_distinct)" +if [ "$D" -ge 2 ]; then echo " round-robin: $D distinct backends (PASS)"; else echo " round-robin: only $D distinct backend(s) (FAIL)"; FAILS=$((FAILS+1)); fi + +# --- TIER 2: member failover + recovery --- +POD1="$(kubectl get pods -l app="$SVC" -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || true)" +if [ -n "$POD1" ]; then + echo " member failover: deleting $POD1" + run kubectl delete pod "$POD1" --now || true + OK="$(codes_ok 6)" + if [ "$OK" -ge 5 ]; then echo " continuity during member loss: $OK/6 200s (PASS)"; else echo " continuity broken: $OK/6 200s (FAIL)"; FAILS=$((FAILS+1)); fi + kubectl wait --for=condition=available --timeout=120s deploy/"$SVC" >/dev/null 2>&1 || true + D2="$(rr_distinct)" + if [ "$D2" -ge 2 ]; then echo " recovery: back to $D2 distinct backends (PASS)"; else echo " recovery: only $D2 backend(s) after heal (FAIL)"; FAILS=$((FAILS+1)); fi +else echo " member failover: could not identify a backend pod (FAIL)"; FAILS=$((FAILS+1)); fi + +# --- locate the Octavia LB for this service (admin), by name substring --- +LBID="$(openstack loadbalancer list -f json </dev/null 2>/dev/null | jq -r --arg s "$SVC" '.[] | select(.name|test($s)) | .id' | head -1 || true)" + +# --- TIER 3: amphora failover (disruptive only, headroom-guarded) --- +amphora_headroom(){ # echo 
OK|NO|UNKNOWN for +1 amphora of the LB's amphora flavor + local lb="$1" aj cid sj fl fv fr hj free_ok=0 + aj="$(openstack loadbalancer amphora list --loadbalancer "$lb" -f json </dev/null 2>/dev/null || true)" + cid="$(jq -r '.[0].compute_id // empty' <<<"$aj" 2>/dev/null || true)" + [ -n "$cid" ] || { echo UNKNOWN; return; } + sj="$(openstack server show "$cid" -f json </dev/null 2>/dev/null || true)" + fl="$(jq -r '.flavor // empty' <<<"$sj" 2>/dev/null | grep -oE '[A-Za-z0-9._-]+' | head -1 || true)" + [ -n "$fl" ] || { echo UNKNOWN; return; } + fv="$(openstack flavor show "$fl" -f json </dev/null 2>/dev/null | jq -r '.vcpus // empty' 2>/dev/null || true)" + fr="$(openstack flavor show "$fl" -f json </dev/null 2>/dev/null | jq -r '.ram // empty' 2>/dev/null || true)" + { [ -n "$fv" ] && [ -n "$fr" ]; } || { echo UNKNOWN; return; } + hj="$(openstack hypervisor list --long -f json </dev/null 2>/dev/null || true)" + [ -n "$hj" ] || { echo UNKNOWN; return; } + # any hypervisor with free vcpu and ram for +1 amphora flavor? + free_ok="$(jq -r --argjson v "$fv" --argjson r "$fr" ' + [ .[] | select(((.vcpus // 0)-(.vcpus_used // 0)) >= $v and ((.memory_mb // 0)-(.memory_mb_used // 0)) >= $r) ] | length' <<<"$hj" 2>/dev/null || echo 0)" + case "$free_ok" in ''|0) echo NO;; *) echo OK;; esac +} + +if vr_disruptive_ok; then + if [ -z "$LBID" ]; then echo " amphora failover: could not locate Octavia LB for $SVC (HOLD)"; HELD=1; AMPHORA_NOTE="amphora LB not located" + else + HR="$(amphora_headroom "$LBID")" + echo " amphora headroom (N+1 for STANDALONE failover): $HR" + if [ "$HR" != OK ]; then + echo " amphora failover HELD: headroom $HR -- will not risk failover at/near ceiling" + HELD=1; AMPHORA_NOTE="amphora failover held (headroom=$HR)" + else + echo " triggering amphora failover on $LBID" + run openstack loadbalancer failover "$LBID" || { FAILS=$((FAILS+1)); } + PS="" + for _ in $(seq 1 30); do + PS="$(openstack loadbalancer show "$LBID" -f value -c provisioning_status </dev/null 2>/dev/null || true)" + [ "$PS" = ACTIVE ] && break; [ "$PS" = 
ERROR ] && break; sleep 10 + done + OK="$(codes_ok 6)" + if [ "$PS" = ACTIVE ] && [ "$OK" -ge 5 ]; then echo " amphora failover recovered: ACTIVE + $OK/6 200s (PASS)"; AMPHORA_NOTE="amphora failover PASS" + else echo " amphora failover did NOT recover (status=$PS, $OK/6 200s) (FAIL)"; FAILS=$((FAILS+1)); AMPHORA_NOTE="amphora failover FAIL"; fi + fi + fi +else + echo " amphora failover SKIPPED (needs --include-disruptive)"; AMPHORA_NOTE="amphora SKIPPED (not disruptive)" +fi + +# --- verdict --- +if [ "$FAILS" -gt 0 ]; then emit "$ID" "$VR_FAIL" "$FAILS LB-pattern assertion(s) failed; $AMPHORA_NOTE"; exit "$VR_FAIL"; fi +if [ "$HELD" -gt 0 ]; then emit "$ID" "$VR_HOLD" "LB round-robin+member-failover PASS; $AMPHORA_NOTE"; exit "$VR_HOLD"; fi +emit "$ID" "$VR_PASS" "octavia LB pattern PASS (round-robin+member-failover+recovery); $AMPHORA_NOTE"; exit "$VR_PASS" diff --git a/scripts/checks/d011-05-magnum-e2e.sh b/scripts/checks/d011-05-magnum-e2e.sh new file mode 100644 index 0000000..5b87736 --- /dev/null +++ b/scripts/checks/d011-05-magnum-e2e.sh @@ -0,0 +1,54 @@ +#!/usr/bin/env bash +# scripts/checks/d011-05-magnum-e2e.sh -- D-011 item 5: end-to-end Magnum CAPI cluster +# health + OCCM (not crash-looping), by wrapping scripts/tenant-acceptance.sh with timing. +# +# tenant-acceptance verifies an EXISTING tenant cluster end-to-end: P0 health/trustee, +# P1 kubeconfig+nodes+pods, P2 OCCM->Octavia LB serving, P3 cross-tenant isolation. +# Scope note (debated): "cluster CREATION succeeds" is demonstrated transitively -- a +# HEALTHY cluster with a working OCCM LB is a cluster whose create succeeded. A FRESH +# create-verify-teardown is the fuller test and belongs with tenant-cluster-create.sh +# (backlog #5); when that lands, add a --full mode here. Verifying the standing cluster +# is a legitimate item-5 check as written and is fast enough for routine runs. +# +# Isolation (P3) needs a SECOND tenant as the foil. 
Post-acme-offboard only beta exists, +# so without VR_FOIL_APPCRED the wrapped script HOLDs (exit 14) -- surfaced honestly, not +# hidden. Onboard a foil tenant (or set VR_FOIL_APPCRED) to validate isolation. +# +# Exit (mapping tenant-acceptance 0/11/12/13/14): +# 0 PASS | 11,12 -> FAIL | 13 -> FAIL (CRITICAL isolation) | 14 -> HOLD (precond/no foil) +# other -> HOLD. Additive+self-cleaning (tenant-acceptance cleans its lbtest); not disruptive. +set -uo pipefail +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +# shellcheck source=scripts/lib-validate.sh +. "$HERE/../lib-validate.sh" +ID=d011-05-magnum-e2e; vr_begin "$ID" +CLIENT="${VR_TENANT:-beta}" +TA="$HERE/../tenant-acceptance.sh" + +[ -x "$TA" ] || [ -f "$TA" ] || { emit "$ID" "$VR_HOLD" "tenant-acceptance.sh not found at $TA"; exit "$VR_HOLD"; } +vr_need kubectl openstack || { emit "$ID" "$VR_HOLD" "missing tool (kubectl/openstack)"; exit "$VR_HOLD"; } + +# assemble args: [foil-appcred] +set -- "$CLIENT" +[ -n "${VR_FOIL_APPCRED:-}" ] && set -- "$@" "$VR_FOIL_APPCRED" + +T0=$(date +%s) +OUT="$(bash "$TA" "$@" 2>&1)"; RC=$? 
+T1=$(date +%s); DUR=$((T1 - T0)) +printf '%s\n' "$OUT" | sed 's/^/ /' +echo " tenant-acceptance exit=$RC elapsed=${DUR}s" + +case "$RC" in + 0) emit "$ID" "$VR_PASS" "magnum e2e PASS for '$CLIENT' (health+OCCM+isolation) in ${DUR}s"; exit "$VR_PASS" ;; + 11) emit "$ID" "$VR_FAIL" "P1 kube failed for '$CLIENT' (nodes/pods)"; exit "$VR_FAIL" ;; + 12) emit "$ID" "$VR_FAIL" "P2 OCCM/Octavia LB failed for '$CLIENT'"; exit "$VR_FAIL" ;; + 13) emit "$ID" "$VR_FAIL" "P3 ISOLATION VIOLATION for '$CLIENT' (CRITICAL)"; exit "$VR_FAIL" ;; + 14) + if printf '%s\n' "$OUT" | grep -qi 'foil app-cred'; then + emit "$ID" "$VR_HOLD" "no foil tenant for isolation -- onboard a 2nd tenant or set VR_FOIL_APPCRED" + else + emit "$ID" "$VR_HOLD" "precondition unmet (cluster/creds/kubectl) for '$CLIENT'" + fi + exit "$VR_HOLD" ;; + *) emit "$ID" "$VR_HOLD" "tenant-acceptance returned unexpected code $RC"; exit "$VR_HOLD" ;; +esac diff --git a/tests/checks/run-tests.sh b/tests/checks/run-tests.sh index 149537f..aef0d16 100644 --- a/tests/checks/run-tests.sh +++ b/tests/checks/run-tests.sh @@ -66,4 +66,180 @@ grep -q 'MANUAL=1' <<<"$OUT" && ok "orchestrator counts manual" || no "orchestrator counts manual" grep -q 'PASS=1' <<<"$OUT" && ok "orchestrator counts charms pass" || no "orchestrator counts charms pass" + +# ============================ BATCH 2 ============================ +command -v jq >/dev/null || { echo "SKIP batch2: jq absent"; echo; [ "$F" = 0 ] && { echo "ALL PASS ($P checks)"; exit 0; } || { echo "FAILURES: $F"; exit 1; }; } + +# ---- d011-02-vip-jumphost: mock openstack (endpoint list) + mock curl ---- +B2="$(mktemp -d)"; mkdir -p "$B2/bin" "$B2/home/vault-init" +printf 'export OS_AUTH_URL=https://ks:5000/v3\nexport OS_CACERT=%s/home/vault-init/ca.pem\nexport OS_USERNAME=admin\nexport OS_PASSWORD=x\nexport OS_PROJECT_NAME=admin\n' "$B2" > "$B2/home/admin-openrc" +touch "$B2/home/vault-init/ca.pem" +cat > "$B2/bin/openstack" <<'OM' +#!/usr/bin/env bash +case "$*" in + 
*"endpoint list"*"--interface public"*) + if [ "${MOCK_EP:-ok}" = empty ]; then echo "[]"; else + cat <<'J' +[{"Service Name":"keystone","URL":"https://ks.corp:5000/v3","Interface":"public"}, + {"Service Name":"nova","URL":"https://nova.corp:8774/v2.1","Interface":"public"}] +J + fi ;; +esac +OM +cat > "$B2/bin/curl" <<'CM' +#!/usr/bin/env bash +k=0; url="" +for a in "$@"; do [ "$a" = "-k" ] && k=1; case "$a" in https://*) url="$a";; esac; done +case "${MOCK_CURL:-healthy}" in + healthy) echo 300; exit 0 ;; + fivexx) echo 500; exit 0 ;; + unreachable) exit 7 ;; # both verified and -k fail + tlsbroken) if [ "$k" = 1 ]; then echo 200; exit 0; else exit 60; fi ;; +esac +CM +chmod +x "$B2/bin/openstack" "$B2/bin/curl" +runvip2(){ HOME="$B2/home" PATH="$B2/bin:$PATH" bash "$CHK/d011-02-vip-jumphost.sh" 2>&1; } + +OUT="$(MOCK_CURL=healthy runvip2)"; chk "vip-jumphost all-healthy PASS" "$?" 0 +grep -qE '^RESULT d011-02-vip-jumphost PASS 0 ' <<<"$OUT" && ok "vip2 PASS line" || no "vip2 PASS line" +grep -q 'OK https://ks.corp:5000' <<<"$OUT" && ok "vip2 lists healthy origin" || no "vip2 lists healthy origin" +OUT="$(MOCK_CURL=fivexx runvip2)"; chk "vip-jumphost 5xx FAIL" "$?" 1 +grep -q 'reachable but 5xx' <<<"$OUT" && ok "vip2 flags 5xx" || no "vip2 flags 5xx" +OUT="$(MOCK_CURL=unreachable runvip2)"; chk "vip-jumphost unreachable FAIL" "$?" 1 +grep -q 'UNREACHABLE' <<<"$OUT" && ok "vip2 classifies unreachable" || no "vip2 classifies unreachable" +OUT="$(MOCK_CURL=tlsbroken runvip2)"; chk "vip-jumphost tls-broken FAIL" "$?" 1 +grep -q 'TLS-verify-FAILED' <<<"$OUT" && ok "vip2 classifies tls-broken (reachable w/-k)" || no "vip2 classifies tls-broken" +OUT="$(MOCK_EP=empty MOCK_CURL=healthy runvip2)"; chk "vip-jumphost no-endpoints HOLD" "$?" 2 +OUT="$(HOME=/nonexistent PATH="$B2/bin:$PATH" bash "$CHK/d011-02-vip-jumphost.sh" 2>&1)"; chk "vip-jumphost no-scope HOLD" "$?" 
2 + +# ---- d011-03-vip-tenant: mock kubectl + openstack + tenant cred/kubeconfig ---- +B3="$(mktemp -d)"; mkdir -p "$B3/bin" "$B3/home/tenant-beta/kube" "$B3/home/vault-init" +touch "$B3/home/vault-init/vault-ca-root.pem" +printf 'auth_url=https://10.12.4.50:5000/v3\nusername=beta-cluster\nuser_domain_id=%s\nproject_id=%s\npassword=pw\n' "$(python3 -c 'print("b"*32)')" "$(python3 -c 'print("a"*32)')" > "$B3/home/tenant-beta/beta-cluster-cred.txt" +printf 'apiVersion: v1\nkind: Config\n' > "$B3/home/tenant-beta/kube/config" +cat > "$B3/bin/kubectl" <<'KM' +#!/usr/bin/env bash +case "$*" in + "version --client"*) exit 0 ;; + "run "*) echo "pod/${2} created" ;; + "get pod"*"jsonpath={.status.phase}"*) + case "${MOCK_POD:-succeeded}" in succeeded) echo Succeeded;; blocked) echo Failed;; hang) echo Running;; esac ;; + "logs "*) + case "${MOCK_POD:-succeeded}" in blocked) echo "TIMEOUT";; succeeded) echo "CONNECTED to 10.12.4.50:5000";; *) echo "";; esac ;; + "delete pod"*) : ;; +esac +KM +cat > "$B3/bin/openstack" <<'OM' +#!/usr/bin/env bash +case "$*" in + *"coe cluster config"*) + # write a fake kubeconfig into --dir + d=""; prev=""; for a in "$@"; do [ "$prev" = "--dir" ] && d="$a"; prev="$a"; done + [ -n "$d" ] && printf 'apiVersion: v1\nkind: Config\n' > "$d/config"; echo "config written" ;; +esac +OM +chmod +x "$B3/bin/kubectl" "$B3/bin/openstack" +runvip3(){ HOME="$B3/home" PATH="$B3/bin:$PATH" bash "$CHK/d011-03-vip-tenant.sh" 2>&1; } + +OUT="$(MOCK_POD=succeeded runvip3)"; chk "vip-tenant reachable PASS" "$?" 0 +grep -qE '^RESULT d011-03-vip-tenant PASS 0 ' <<<"$OUT" && ok "vip3 PASS line" || no "vip3 PASS line" +grep -q 'target keystone VIP: 10.12.4.50:5000' <<<"$OUT" && ok "vip3 derives VIP from auth_url" || no "vip3 derives VIP" +OUT="$(MOCK_POD=blocked runvip3)"; chk "vip-tenant blocked FAIL" "$?" 
1 +grep -q 'BLOCKED' <<<"$OUT" && ok "vip3 reports blocked" || no "vip3 reports blocked" +# fetch path: remove cached kubeconfig, ensure it fetches then passes +rm -f "$B3/home/tenant-beta/kube/config" +OUT="$(MOCK_POD=succeeded runvip3)"; chk "vip-tenant fetch-kubeconfig PASS" "$?" 0 +grep -q 'fetching via coe cluster config' <<<"$OUT" && ok "vip3 fetch path exercised" || no "vip3 fetch path" +# no tenant cred -> HOLD +B3b="$(mktemp -d)"; mkdir -p "$B3b/home" +OUT="$(HOME="$B3b/home" PATH="$B3/bin:$PATH" bash "$CHK/d011-03-vip-tenant.sh" 2>&1)"; chk "vip-tenant no-cred HOLD" "$?" 2 + +# ---- INTEGRATION: post-restart profile now has 3 real checks (01 pass, 02/03 mocked) ---- +# (02/03 need their own env; just assert orchestrator resolves & runs 01 with the others reported) +OUT="$(VR_CHECKDIR="$CHK" bash "$VAL" --list 2>&1)" +grep -q 'd011-02-vip-jumphost' <<<"$OUT" && ok "orchestrator discovers d011-02" || no "orchestrator discovers d011-02" +grep -q 'd011-03-vip-tenant' <<<"$OUT" && ok "orchestrator discovers d011-03" || no "orchestrator discovers d011-03" + + +# ============================ BATCH 3 ============================ + +# ---- d011-05-magnum-e2e: mock tenant-acceptance.sh returning each code ---- +B5="$(mktemp -d)"; mkdir -p "$B5/bin" "$B5/scripts/checks" +cp "$CHK/../lib-validate.sh" "$B5/scripts/lib-validate.sh" +cp "$CHK/d011-05-magnum-e2e.sh" "$B5/scripts/checks/" +mkta(){ printf '#!/usr/bin/env bash\n%s\nexit %s\n' "$1" "$2" > "$B5/scripts/tenant-acceptance.sh"; chmod +x "$B5/scripts/tenant-acceptance.sh"; } +printf '#!/usr/bin/env bash\nexit 0\n' > "$B5/bin/kubectl"; printf '#!/usr/bin/env bash\nexit 0\n' > "$B5/bin/openstack"; chmod +x "$B5/bin/"* +run05(){ PATH="$B5/bin:$PATH" bash "$B5/scripts/checks/d011-05-magnum-e2e.sh" 2>&1; } +mkta 'echo P0-3 all good' 0; OUT="$(run05)"; chk "magnum-e2e PASS" "$?" 
0 +grep -qE '^RESULT d011-05-magnum-e2e PASS 0 ' <<<"$OUT" && ok "e2e PASS line+timing" || no "e2e PASS line" +mkta 'echo kube broke' 11; OUT="$(run05)"; chk "magnum-e2e kube FAIL" "$?" 1 +mkta 'echo lb broke' 12; OUT="$(run05)"; chk "magnum-e2e LB FAIL" "$?" 1 +mkta 'echo isolation!' 13; OUT="$(run05)"; chk "magnum-e2e isolation FAIL" "$?" 1 +grep -q 'CRITICAL' <<<"$OUT" && ok "e2e flags isolation critical" || no "e2e flags isolation critical" +mkta 'echo no foil app-cred file' 14; OUT="$(run05)"; chk "magnum-e2e no-foil HOLD" "$?" 2 +grep -q 'onboard a 2nd tenant' <<<"$OUT" && ok "e2e surfaces foil dependency" || no "e2e surfaces foil dependency" +mkta 'echo kubectl absent' 14; OUT="$(run05)"; chk "magnum-e2e precond HOLD" "$?" 2 + +# ---- d011-04-octavia-lb: mock kubectl+openstack+curl (faithful: LB name from svc) ---- +B4="$(mktemp -d)"; mkdir -p "$B4/bin" "$B4/home/tenant-beta/kube" "$B4/home/vault-init" +touch "$B4/home/vault-init/vault-ca-root.pem" +printf 'export OS_AUTH_URL=https://ks:5000/v3\nexport OS_CACERT=%s/home/vault-init/vault-ca-root.pem\nexport OS_USERNAME=admin\nexport OS_PASSWORD=x\n' "$B4" > "$B4/home/admin-openrc" +printf 'apiVersion: v1\nkind: Config\n' > "$B4/home/tenant-beta/kube/config" +export MST="$B4/state"; mkdir -p "$MST" +cat > "$B4/bin/kubectl" <<'KM' +#!/usr/bin/env bash +ST="${MST:?}" +case "$1 $2" in + "create deployment") echo "$3" > "$ST/svc"; echo "deployment.apps/$3 created"; exit 0;; + "expose deployment") echo exposed; exit 0;; + "get svc") [ "${OCTAVIA_READY:-1}" = 1 ] && echo "10.12.6.50"; exit 0;; + "get pods") echo "pod-a"; exit 0;; + "delete pod") exit 0;; "delete svc") exit 0;; "delete deploy") exit 0;; +esac +case "$*" in "wait "*) exit 0;; esac +exit 0 +KM +cat > "$B4/bin/openstack" <<'OM' +#!/usr/bin/env bash +ST="${MST:?}"; SVC="$(cat "$ST/svc" 2>/dev/null || echo unknown)" +case "$*" in + "loadbalancer list"*) + [ "${OCTAVIA_READY:-1}" = 1 ] || { echo "octavia down" >&2; exit 1; } + printf 
'[{"id":"lb-1","name":"kube_service_beta_default_%s","vip_address":"10.12.6.50"}]\n' "$SVC";; + "loadbalancer amphora list"*) echo '[{"compute_id":"cmp-1","role":"STANDALONE"}]';; + "loadbalancer show"*) echo "${MOCK_LB_STATUS:-ACTIVE}";; + "loadbalancer failover"*) echo "failover accepted";; + "server show"*) echo '{"flavor":"m1.amphora","id":"cmp-1"}';; + "flavor show"*) echo '{"vcpus":1,"ram":1024}';; + "hypervisor list"*) + if [ "${MOCK_HEADROOM:-ok}" = ok ]; then echo '[{"vcpus":16,"vcpus_used":4,"memory_mb":32768,"memory_mb_used":8192}]'; + else echo '[{"vcpus":4,"vcpus_used":4,"memory_mb":8192,"memory_mb_used":8192}]'; fi;; +esac +OM +cat > "$B4/bin/curl" <<'CM' +#!/usr/bin/env bash +ST="${MST:?}"; w=0; for a in "$@"; do [ "$a" = "-w" ] && w=1; done +if [ "$w" = 1 ]; then echo 200; exit 0; fi +n=$(cat "$ST/rr" 2>/dev/null || echo 0); n=$((n+1)); echo "$n" > "$ST/rr" +if [ "${MOCK_RR:-multi}" = multi ]; then [ $((n % 2)) -eq 0 ] && echo pod-a || echo pod-b; else echo pod-a; fi +CM +chmod +x "$B4/bin/"* +run04(){ rm -f "$MST/rr" "$MST/svc"; HOME="$B4/home" PATH="$B4/bin:$PATH" MST="$MST" bash "$CHK/d011-04-octavia-lb.sh" 2>&1; } +OUT="$(OCTAVIA_READY=0 run04)"; chk "octavia not-ready HOLD" "$?" 2 +OUT="$(MOCK_RR=multi run04)"; chk "lb RR+member PASS (amphora skipped)" "$?" 0 +grep -qE '^RESULT d011-04-octavia-lb PASS 0 ' <<<"$OUT" && ok "lb PASS line" || no "lb PASS line" +grep -q 'amphora failover SKIPPED' <<<"$OUT" && ok "lb notes amphora skipped" || no "lb notes amphora skipped" +grep -q 'round-robin: 2 distinct' <<<"$OUT" && ok "lb round-robin distinct" || no "lb round-robin distinct" +OUT="$(MOCK_RR=single run04)"; chk "lb RR single-backend FAIL" "$?" 1 +rm -f "$B4/home/tenant-beta/kube/config"; OUT="$(MOCK_RR=multi run04)"; chk "lb no-kubeconfig HOLD" "$?" 2 +printf 'apiVersion: v1\nkind: Config\n' > "$B4/home/tenant-beta/kube/config" +OUT="$(VR_DISRUPTIVE=1 MOCK_RR=multi MOCK_HEADROOM=no run04)"; chk "amphora headroom-NO HOLD" "$?" 
2 +grep -q 'headroom NO' <<<"$OUT" && ok "amphora headroom guard fires" || no "amphora headroom guard fires" +OUT="$(VR_DISRUPTIVE=1 MOCK_RR=multi MOCK_HEADROOM=ok MOCK_LB_STATUS=ACTIVE run04)"; chk "amphora failover PASS" "$?" 0 +grep -q 'amphora failover recovered' <<<"$OUT" && ok "amphora failover recovery asserted" || no "amphora failover recovery asserted" +OUT="$(VR_DISRUPTIVE=1 MOCK_RR=multi MOCK_HEADROOM=ok MOCK_LB_STATUS=ERROR run04)"; chk "amphora failover ERROR FAIL" "$?" 1 +OUT="$(VR_CHECKDIR="$CHK" bash "$VAL" --list 2>&1)" +grep -q 'd011-04-octavia-lb' <<<"$OUT" && ok "orchestrator discovers d011-04" || no "orchestrator discovers d011-04" +grep -q 'd011-05-magnum-e2e' <<<"$OUT" && ok "orchestrator discovers d011-05" || no "orchestrator discovers d011-05" + echo; [ "$F" = 0 ] && { echo "ALL PASS ($P checks)"; exit 0; } || { echo "FAILURES: $F"; exit 1; }