diff --git a/runbooks/phase-00-maas-reconfigure.md b/runbooks/phase-00-maas-reconfigure.md
new file mode 100644
index 0000000..7d8e330
--- /dev/null
+++ b/runbooks/phase-00-maas-reconfigure.md
@@ -0,0 +1,63 @@
+# Phase 00 -- MAAS reconfigure to D-058
+
+Sequences the gated steps that take the live D-052/053 cloud to the D-058 plane
+scheme, then hands off to deploy. Scripts do the deterministic/idempotent work;
+the destructive juju + libvirt steps stay human-gated (runbooks), by design.
+
+Precondition check (read-only): `scripts/phase-00-maas-standup.sh` should report
+the three DRIFT planes (.8 metal-admin -> provider-vip, .12 metal-internal ->
+metal-admin, .16 data-tenant -> metal-internal). If it reports no drift, the cloud
+is already on D-058 and only the deploy remains.
+
+## Step 1 -- Teardown (gated runbook: runbooks/phase-00-teardown-maas-reset.md)
+Destroy the `openstack` Juju model and release openstack0-3 to MAAS Ready. The
+hosts MUST be released so the migrating subnets carry no live interface links --
+the re-CIDR deletes those subnets, and MAAS refuses to delete a subnet with live
+allocations. `juju destroy-model` is typed by the operator (not auto-scripted).
+GATE: `juju models` shows no `openstack`; openstack0-3 are Ready.
+
+## Step 2 -- Audit (read-only)
+```
+scripts/phase-00-maas-standup.sh # expect 3 DRIFT lines (.8/.12/.16)
+scripts/phase-00-maas-recidr.sh # audit: migration plan + metal/data fabric ids
+```
+Eyeball the fabric ids and confirm no live IP allocations are flagged on the
+migrating subnets. Change nothing here.
+
+## Step 3 -- Re-CIDR (gated, destructive)
+```
+scripts/phase-00-maas-recidr.sh --apply
+```
+Deletes the old .8/.12/.16 subnets (reserved ranges first), then recreates
+.12/.16/.20 on the SAME fabrics/VLANs (reuse-in-place; spaces inherited via the
+persisted VLANs). Collision-safe: all deletes precede all creates. If a delete is
+refused (live links remain), clear them (release/delete the machine interfaces)
+and re-run -- the script is idempotent (already-migrated planes SKIP).
+
+## Step 4 -- Standup (gated)
+```
+scripts/phase-00-maas-standup.sh --apply # provider-vip .8 (VID 104) + gateways + dns + ALL reserves
+scripts/phase-00-maas-standup.sh # verify: all-SKIP, no drift
+```
+The standup is the single MAAS-address authority (topology, VIP bands, FIP pool
+and mgmt reserves). `phase-00-maas-carve.sh` is retired.
+
+## Step 5 -- Jumphost bridges (gated host runbook: runbooks/jumphost-provider-vip-gateway.md)
+ORDERING TRAP (D-058): provider-vip's gateway 10.12.8.1 IS metal-admin's OLD
+address. On the jumphost, in order:
+  (a) virbr2 (metal-admin): move 10.12.8.1 -> 10.12.12.1
+  (b) virbr7 (oob): confirm it is already 10.12.60.1 (live)
+  (c) THEN virbr1.104 (provider-vip): bring up at 10.12.8.1
+Bringing up virbr1.104 at .8.1 before (a) has freed .8.1 is a same-subnet address collision.
+libvirt/netplan persistence is host-specific -- typed by the operator, not scripted.
+
+## Step 6 -- Deploy handoff
+Proceed to phase-01 bundle deploy. Per-host interface carve
+(`scripts/carve-host-interfaces.sh`) runs after commissioning. The bundle already
+carries the D-058 VIP triples; `d057-bundle-check.py` PASSes against it.
+
+## Why teardown + jumphost are runbooks, not scripts
+`juju destroy-model` and libvirt bridge edits are the most consequential and least
+reversible / least portable actions in the phase. Per the operating discipline,
+consequential mutations are human-gated; these stay operator-typed. The
+deterministic, idempotent MAAS work (re-CIDR, standup) is scripted + behavior-tested.
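The Step 3 claim that deleting every old subnet before creating any new one is what makes the re-CIDR collision-safe can be checked with a small model. This is a sketch, not part of the scripts. It rests on two assumptions:

- MAAS state is reduced to a dict mapping CIDR to plane name, mirroring the `d052` test fixtures.
- A create whose CIDR overlaps a live subnet is refused.

The names `per_plane_swap` and `delete_all_then_create` are hypothetical. The model compares a one-plane-at-a-time swap, in migration-table order, against the delete-all-first ordering.

```python
"""Toy model of the Step 3 re-CIDR ordering; not the real script.

Assumptions: MAAS state is a dict {cidr: plane}, and a create whose CIDR
overlaps any live subnet is refused (the plane is simply not recreated).
"""
from ipaddress import ip_network

# (plane, old CIDR, new CIDR), in the order the runbook lists the moves.
MIGRATIONS = [
    ("metal-admin", "10.12.8.0/22", "10.12.12.0/22"),
    ("metal-internal", "10.12.12.0/22", "10.12.16.0/22"),
    ("data-tenant", "10.12.16.0/22", "10.12.20.0/22"),
]

# Pre-migration layout, mirroring the d052 fixtures.
BEFORE = {
    "10.12.4.0/22": "provider-public",
    "10.12.8.0/22": "metal-admin",
    "10.12.12.0/22": "metal-internal",
    "10.12.16.0/22": "data-tenant",
    "10.12.32.0/22": "storage",
    "10.12.36.0/22": "replication",
}


def overlaps(cidr: str, live: dict[str, str]) -> bool:
    """True if cidr overlaps any subnet that is currently live."""
    net = ip_network(cidr)
    return any(net.overlaps(ip_network(other)) for other in live)


def per_plane_swap(state: dict[str, str]) -> tuple[list[str], dict[str, str]]:
    """Delete, then immediately recreate, one plane at a time."""
    live = dict(state)  # work on a copy so BEFORE is never mutated
    refused = []
    for plane, old, new in MIGRATIONS:
        live.pop(old, None)
        if overlaps(new, live):
            refused.append(new)  # new CIDR is still held by another plane
        else:
            live[new] = plane
    return refused, live


def delete_all_then_create(state: dict[str, str]) -> tuple[list[str], dict[str, str]]:
    """Delete every old subnet first, then create every new one."""
    live = dict(state)
    refused = []
    for _plane, old, _new in MIGRATIONS:
        live.pop(old, None)
    for plane, _old, new in MIGRATIONS:
        if overlaps(new, live):
            refused.append(new)
        else:
            live[new] = plane
    return refused, live


if __name__ == "__main__":
    for strategy in (per_plane_swap, delete_all_then_create):
        refused, live = strategy(BEFORE)
        print(strategy.__name__, "refused:", refused)
        print("  result:", sorted(live.items()))
```

For this particular chain of moves, a per-plane swap in reverse table order would also avoid collisions. Deleting everything first has two advantages:

- It does not depend on choosing an order.
- It still works if two planes ever trade CIDRs with each other, a case where no per-plane order can succeed.

10.12.8.0/22 is left free for the standup to create provider-vip in Step 4.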
diff --git a/scripts/phase-00-maas-recidr.sh b/scripts/phase-00-maas-recidr.sh
new file mode 100644
index 0000000..f627006
--- /dev/null
+++ b/scripts/phase-00-maas-recidr.sh
@@ -0,0 +1,156 @@
+#!/usr/bin/env bash
+# scripts/phase-00-maas-recidr.sh [--apply]
+#
+# Gated MAAS re-CIDR migration D-052/053 -> D-058 for the planes whose CIDR MOVES:
+# metal-admin 10.12.8.0/22 -> 10.12.12.0/22 (untagged, metal fabric)
+# metal-internal 10.12.12.0/22 -> 10.12.16.0/22 (VID 103, metal fabric)
+# data-tenant 10.12.16.0/22 -> 10.12.20.0/22 (untagged, data fabric)
+# (provider-vip 10.12.8.0/22 is NEW, not a move -- the standup creates it once .8 is freed.)
+#
+# REUSE-IN-PLACE: MAAS cannot change a subnet's CIDR, so each plane is migrated by
+# deleting the old subnet and recreating it at the new CIDR on the SAME fabric + SAME
+# VLAN. The VLAN (and its space assignment) persists across the subnet delete, so the
+# new subnet inherits the correct space with no space/VLAN edits. Existing fabrics are
+# kept (no orphaned fabrics).
+#
+# Default is DRY-RUN (audit): resolves everything live BY CIDR (PATTERN-1, no hardcoded
+# ids), verifies each old subnet is on its expected space + VLAN, surfaces the metal/data
+# fabric ids, lists reserved ranges + any live IP allocations, and prints the plan. Pass
+# --apply to execute. COLLISION-SAFE: all old subnets are deleted BEFORE any new subnet is
+# created (each new CIDR is the old CIDR of another plane, freed by the deletes).
+#
+# This script does ONLY the destructive subnet swap. Gateways, managed, dns, the reserved
+# bands, and provider-vip are the standup's job -- run AFTER this:
+# scripts/phase-00-maas-standup.sh --apply (build provider-vip + reserves + attrs)
+# scripts/phase-00-maas-standup.sh (dry-run: expect all-SKIP, no drift)
+#
+# PRE-REQS: openstack model torn down + hosts released, so the subnets have no live links.
+# If MAAS refuses a delete (interfaces still linked), the error is surfaced and we STOP --
+# clear the links (release/delete the machines) and re-run. We never force-delete.
+#
+# Exit: 0 ok (or nothing to migrate) | 1 fatal / unexpected state | 2 precondition
+# CLI forms per Canonical MAAS how-to-manage-networks. ASCII + LF.
+
+set -euo pipefail
+shopt -s inherit_errexit 2>/dev/null || true
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+# shellcheck source=scripts/lib-net.sh
+. "$SCRIPT_DIR/lib-net.sh"
+
+MAAS_PROFILE="${MAAS_PROFILE:-admin}"
+MODE="dryrun"; [ "${1:-}" = "--apply" ] && MODE="apply"
+FATAL=0
+
+hdr() { echo; echo "=== $* ==="; }
+note() { echo " - $*"; }
+fail() { echo "FAIL: $*" >&2; FATAL=$((FATAL+1)); }
+need_jq || exit 1
+
+# read wrapper: valid JSON or "[]" so a stray MAAS error never crashes us under set -e.
+maas_json() { local out; out="$(maas "$MAAS_PROFILE" "$@" 2>/dev/null || true)"; printf '%s' "$out" | jq empty 2>/dev/null && printf '%s' "$out" || printf '[]'; }
+
+emit() { #
+  local desc="$1"; shift
+  if [ "$MODE" = "apply" ]; then
+    echo " DO: $desc"
+    local out
+    if ! out="$(maas "$MAAS_PROFILE" "$@" 2>&1)"; then
+      fail "$desc"
+      echo " MAAS said: $(printf '%s' "$out" | grep -viE '^(Success|Machine-readable)' | head -3 | tr '\n' ' ')" >&2
+      return 1
+    fi
+  else
+    echo " WOULD: $desc"
+    echo " maas $MAAS_PROFILE $*"
+  fi
+}
+
+sub_id() { maas_json subnets read | jq -r --arg c "$1" '.[]|select(.cidr==$c)|(.id|tostring)' | head -1; }
+sub_vid() { maas_json subnets read | jq -r --arg c "$1" '.[]|select(.cidr==$c)|(.vlan.vid|tostring)' | head -1; }
+sub_fabid() { maas_json subnets read | jq -r --arg c "$1" '.[]|select(.cidr==$c)|(.vlan.fabric_id|tostring)' | head -1; }
+sub_space() { maas_json subnets read | jq -r --arg c "$1" '.[]|select(.cidr==$c)|(.space // "")' | head -1; }
+vlanobj() { maas_json vlans read "$1" | jq -r --arg v "$2" '.[]|select((.vid|tostring)==$v)|(.id|tostring)' | head -1; }
+ipr_ids_on(){ maas_json ipranges read | jq -r --arg s "$1" '.[]|select((.subnet.id|tostring)==$s)|(.id|tostring)'; }
+allocs_on() { maas "$MAAS_PROFILE" subnet ip-addresses "$1" 2>/dev/null | jq -r 'if type=="array" then (.[]|.ip // .start_ip // empty) else empty end' 2>/dev/null || true; }
+
+# --- migration table: name|old_cidr|new_cidr|kind|vid (vid=0 for untagged) ---
+MIG="$(cat < D-058 mode=$MODE"
+note "reuse-in-place: new subnet created on each plane's EXISTING fabric + VLAN; spaces untouched."
+
+# ---------------------------------------------------------------- AUDIT (capture)
+declare -A M_FAB M_VID M_OLDSUB M_NEW
+ORDER=(); PENDING=0
+hdr "audit (read-only): resolve + verify each migrating plane by its OLD cidr"
+while IFS='|' read -r name ocidr ncidr kind vid; do
+  [ -n "$name" ] || continue
+  osub="$(sub_id "$ocidr")"
+  if [ -z "$osub" ]; then note "$name: no subnet at old $ocidr -- already migrated or absent; SKIP"; continue; fi
+  curspace="$(sub_space "$ocidr")"
+  if [ "$curspace" != "$name" ]; then
+    note "$name: old $ocidr is now space '$curspace' (not '$name') -- already migrated or not this plane; SKIP"; continue; fi
+  want_vid=$([ "$kind" = tagged ] && echo "$vid" || echo 0)
+  gotvid="$(sub_vid "$ocidr")"
+  if [ "$gotvid" != "$want_vid" ]; then
+    fail "$name: subnet $ocidr on VID '$gotvid', expected $want_vid -- refusing"; continue; fi
+  fab="$(sub_fabid "$ocidr")"
+  ranges="$(ipr_ids_on "$osub" | tr '\n' ' ')"
+  allocs="$(allocs_on "$osub" | tr '\n' ' ')"
+  M_FAB["$name"]="$fab"; M_VID["$name"]="$want_vid"; M_OLDSUB["$name"]="$osub"; M_NEW["$name"]="$ncidr"
+  ORDER+=("$name"); PENDING=$((PENDING+1))
+  note "$name: $ocidr (subnet $osub, fabric $fab, vid $want_vid) -> $ncidr on the SAME fabric/vid"
+  [ -n "${ranges// }" ] && note " reserved range ids to delete first: $ranges"
+  [ -n "${allocs// }" ] && note " NOTE live IP allocations present ($allocs) -- if a delete is refused, clear these (release/delete machines) and re-run"
+done <<< "$MIG"
+
+hdr "fabric summary (eyeball before any mutation)"
+note "metal fabric (metal-admin/metal-internal) = ${M_FAB[metal-admin]:-${M_FAB[metal-internal]:-?}}"
+note "data fabric (data-tenant) = ${M_FAB[data-tenant]:-?}"
+note "provider fabric (provider-vip target, handled by standup) = resolve via provider-public 10.12.4.0/22"
+
+[ "$FATAL" -eq 0 ] || { echo; echo "completed with $FATAL failure(s) -- fix the unexpected state above before proceeding"; exit 1; }
+if [ "$PENDING" -eq 0 ]; then hdr "result"; note "nothing to migrate (no old-scheme subnets present)"; echo; echo "OK ($MODE)"; exit 0; fi
+
+# ------------------------------------------------------------------------- PLAN
+if [ "$MODE" = dryrun ]; then
+  hdr "PLAN (dry-run -- nothing changed)"
+  echo " 1) delete (ranges then subnet), all $PENDING old subnets first:"
+  for n in "${ORDER[@]}"; do echo " - $n delete subnet ${M_OLDSUB[$n]} (was the old CIDR)"; done
+  echo " 2) create new subnets on the same fabric/VLAN (collision-free after the deletes):"
+  for n in "${ORDER[@]}"; do echo " - $n create ${M_NEW[$n]} on fabric ${M_FAB[$n]} vid ${M_VID[$n]}"; done
+  echo " 3) then: scripts/phase-00-maas-standup.sh --apply (provider-vip + gateways + dns + reserves)"
+  echo " scripts/phase-00-maas-standup.sh (verify: all-SKIP, no drift)"
+  echo
+  echo " re-run with --apply to execute."
+  exit 0
+fi
+
+# ----------------------------------------------------------------------- MUTATE
+hdr "MUTATE 1/2: delete old subnets (ranges first), collision-safe"
+for n in "${ORDER[@]}"; do
+  osub="${M_OLDSUB[$n]}"
+  for rid in $(ipr_ids_on "$osub"); do emit "delete iprange $rid (on $n old subnet $osub)" iprange delete "$rid" || true; done
+  emit "delete subnet id=$osub ($n old CIDR)" subnet delete "$osub" || true
+done
+[ "$FATAL" -eq 0 ] || { echo; echo "delete phase hit $FATAL failure(s) -- STOP (likely live interface links; clear them and re-run). No new subnets created."; exit 1; }
+
+hdr "MUTATE 2/2: create new subnets on the existing fabric/VLAN"
+for n in "${ORDER[@]}"; do
+  fab="${M_FAB[$n]}"; vid="${M_VID[$n]}"; ncidr="${M_NEW[$n]}"
+  vobj="$(vlanobj "$fab" "$vid")"
+  [ -n "$vobj" ] || { fail "$n: cannot resolve VLAN obj for fabric $fab vid $vid -- the VLAN should persist after subnet delete; aborting before create"; continue; }
+  emit "create subnet $ncidr on fabric $fab vid $vid (vlan obj $vobj)" subnets create cidr="$ncidr" vlan="$vobj"
+done
+
+[ "$FATAL" -eq 0 ] || { echo; echo "completed with $FATAL failure(s)"; exit 1; }
+hdr "next"
+echo " run: scripts/phase-00-maas-standup.sh --apply (provider-vip + gateways + dns + reserves)"
+echo " then: scripts/phase-00-maas-standup.sh (verify: all-SKIP, no drift)"
+echo; echo "OK ($MODE)"
diff --git a/tests/phase-00-maas-recidr/fakebin/maas b/tests/phase-00-maas-recidr/fakebin/maas
new file mode 100644
index 0000000..514ff06
--- /dev/null
+++ b/tests/phase-00-maas-recidr/fakebin/maas
@@ -0,0 +1,11 @@
+#!/usr/bin/env bash
+# fake maas for re-CIDR: serves reads from fixtures; mutations succeed (apply path).
+prof="${1:-}"; obj="${2:-}"; act="${3:-}"; a4="${4:-}"
+case "$obj $act" in
+  "subnets read") cat "${FIX_SUBNETS:?}"; exit 0 ;;
+  "ipranges read") cat "${FIX_IPRANGES:?}"; exit 0 ;;
+  "subnet ip-addresses") cat "${FIX_IPADDRS:?}"; exit 0 ;;
+  "vlans read") jq --arg f "$a4" '[.[]|select((.fabric_id|tostring)==$f)]' "${FIX_VLANS:?}"; exit 0 ;;
+  "subnets create"|"subnet delete"|"iprange delete") echo "Success."; exit 0 ;;
+esac
+echo "{}"; exit 0
diff --git a/tests/phase-00-maas-recidr/make_fixtures.py b/tests/phase-00-maas-recidr/make_fixtures.py
new file mode 100644
index 0000000..b3dbbb4
--- /dev/null
+++ b/tests/phase-00-maas-recidr/make_fixtures.py
@@ -0,0 +1,44 @@
+#!/usr/bin/env python3
+# Fixtures for phase-00-maas-recidr.sh: pre-migration (d052), migrated (done), wrong-vid.
+import json, os
+FIX = os.path.join(os.path.dirname(os.path.abspath(__file__)), "fix"); os.makedirs(FIX, exist_ok=True)
+def sub(cidr, sid, space, vid, fab): return {"cidr": cidr, "id": sid, "space": space, "vlan": {"vid": vid, "fabric_id": fab}}
+def vlan(vid, vid_id, fab): return {"vid": vid, "id": vid_id, "fabric_id": fab}
+def ipr(start, end, sid, rid): return {"id": rid, "type": "reserved", "start_ip": start, "end_ip": end, "subnet": {"id": sid}}
+def dump(scn, subs, vlans, ipranges, ipaddrs):
+    for nm, obj in (("subnets", subs), ("vlans", vlans), ("ipranges", ipranges), ("ipaddrs", ipaddrs)):
+        open(os.path.join(FIX, f"{scn}_{nm}.json"), "w").write(json.dumps(obj, indent=2) + "\n")
+
+# vlans common to a metal(2)/data(3)/provider(0)/storage(9)/replication(5) layout
+def vlans_pre():
+    return [vlan(0,10,0), vlan(0,20,2), vlan(103,21,2), vlan(0,30,3), vlan(0,90,9), vlan(0,50,5)]
+
+# d052 (pre-migration): the three movers at their OLD cidrs, correct space+vid
+d052_subs = [
+    sub("10.12.4.0/22", 1, "provider-public", 0, 0),
+    sub("10.12.8.0/22", 2, "metal-admin", 0, 2),
+    sub("10.12.12.0/22", 3, "metal-internal", 103, 2),
+    sub("10.12.16.0/22", 4, "data-tenant", 0, 3),
+    sub("10.12.32.0/22", 6, "storage", 0, 9),
+    sub("10.12.36.0/22", 7, "replication", 0, 5),
+]
+d052_ipr = [ipr("10.12.8.2","10.12.8.100",2,100), ipr("10.12.12.2","10.12.12.100",3,101)]  # ranges on metal-admin + metal-internal old subnets
+dump("d052", d052_subs, vlans_pre(), d052_ipr, [])
+
+# done (migrated D-058): old cidrs now host different planes
+done_subs = [
+    sub("10.12.4.0/22", 1, "provider-public", 0, 0),
+    sub("10.12.8.0/22", 2, "provider-vip", 104, 0),
+    sub("10.12.12.0/22", 3, "metal-admin", 0, 2),
+    sub("10.12.16.0/22", 4, "metal-internal", 103, 2),
+    sub("10.12.20.0/22", 5, "data-tenant", 0, 3),
+    sub("10.12.32.0/22", 6, "storage", 0, 9),
+    sub("10.12.36.0/22", 7, "replication", 0, 5),
+]
+done_vlans = [vlan(0,10,0), vlan(104,11,0), vlan(0,20,2), vlan(103,21,2), vlan(0,30,3), vlan(0,90,9), vlan(0,50,5)]
+dump("done", done_subs, done_vlans, [], [])
+
+# wrongvid: metal-internal at old .12 is space metal-internal but VID 99 (not 103)
+wv_subs = [s for s in d052_subs if s["cidr"] != "10.12.12.0/22"] + [sub("10.12.12.0/22", 3, "metal-internal", 99, 2)]
+dump("wrongvid", wv_subs, vlans_pre(), d052_ipr, [])
+print("fixtures written")
diff --git a/tests/phase-00-maas-recidr/run-tests.sh b/tests/phase-00-maas-recidr/run-tests.sh
new file mode 100644
index 0000000..dead104
--- /dev/null
+++ b/tests/phase-00-maas-recidr/run-tests.sh
@@ -0,0 +1,55 @@
+#!/usr/bin/env bash
+# Behavior regression for phase-00-maas-recidr.sh (D-052/053 -> D-058). Fake maas + real jq.
+set -uo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+SCRIPT="$(cd "$HERE/../../scripts" && pwd)/phase-00-maas-recidr.sh"
+BIN="$HERE/fakebin"; FIX="$HERE/fix"
+chmod +x "$BIN"/* 2>/dev/null || true
+command -v jq >/dev/null || { echo "FAIL: jq required"; exit 1; }
+python3 "$HERE/make_fixtures.py" >/dev/null
+rc_all=0; OUT="$(mktemp)"
+run() { # want label scenario [--apply]
+  local want="$1" label="$2" s="$3"; shift 3
+  PATH="$BIN:$PATH" FIX_SUBNETS="$FIX/${s}_subnets.json" FIX_VLANS="$FIX/${s}_vlans.json" \
+    FIX_IPRANGES="$FIX/${s}_ipranges.json" FIX_IPADDRS="$FIX/${s}_ipaddrs.json" \
+    bash "$SCRIPT" "$@" >"$OUT" 2>&1
+  local rc=$?
+  if [ "$rc" -ne "$want" ]; then printf ' [XX] %-44s exit %s (want %s)\n' "$label" "$rc" "$want"; sed 's/^/ /' "$OUT"; rc_all=1; return 1; fi
+  printf ' [ok] %-44s exit %s\n' "$label" "$rc"; return 0
+}
+has() { grep -qE "$1" "$OUT" || { printf ' MISS /%s/\n' "$1"; rc_all=1; }; }
+absent(){ grep -qE "$1" "$OUT" && { printf ' LEAK /%s/\n' "$1"; rc_all=1; } || true; }
+before(){ local x y; x=$(grep -nE "$1" "$OUT"|head -1|cut -d: -f1); y=$(grep -nE "$2" "$OUT"|head -1|cut -d: -f1); { [ -n "$x" ] && [ -n "$y" ] && [ "$x" -lt "$y" ]; } || { printf ' ORDER /%s/ not before /%s/\n' "$1" "$2"; rc_all=1; }; }
+
+echo "=== phase-00-maas-recidr.sh -- D-052/053 -> D-058 (fake maas + real jq) ==="
+run 0 "pre-migration (d052): dry-run plan" d052
+has 'metal-admin: 10\.12\.8\.0/22 .*fabric 2, vid 0. -> 10\.12\.12\.0/22'
+has 'metal-internal: 10\.12\.12\.0/22 .*vid 103. -> 10\.12\.16\.0/22'
+has 'data-tenant: 10\.12\.16\.0/22 .*fabric 3.*-> 10\.12\.20\.0/22'
+has 'metal fabric .* = 2'
+has 'data fabric .* = 3'
+has 'reserved range ids to delete first: 100'
+has 're-run with --apply'
+absent 'DO:'
+
+run 0 "pre-migration (d052): --apply, deletes BEFORE creates" d052 --apply
+has 'DO: delete iprange 100'
+has 'DO: delete subnet id=2'
+has 'DO: create subnet 10\.12\.12\.0/22 on fabric 2 vid 0'
+has 'DO: create subnet 10\.12\.16\.0/22 on fabric 2 vid 103'
+has 'DO: create subnet 10\.12\.20\.0/22 on fabric 3 vid 0'
+before 'MUTATE 1/2' 'MUTATE 2/2'
+before 'DO: delete subnet id=4' 'DO: create subnet 10\.12\.12'
+
+run 0 "migrated (done): nothing to migrate" done
+has 'already migrated or not this plane; SKIP'
+has 'nothing to migrate'
+absent 'DO:'
+absent 'WOULD:'
+
+run 1 "wrong VID at old .12 -> refuse" wrongvid
+has "subnet 10\.12\.12\.0/22 on VID '99', expected 103 -- refusing"
+
+echo
+[ "$rc_all" -eq 0 ] && echo "ALL PASS" || echo "SOME FAILED"
+rm -f "$OUT"; exit "$rc_all"
diff --git a/tests/run-tests.sh b/tests/run-tests.sh
index 60ec8d9..dead104 100644
--- a/tests/run-tests.sh
+++ b/tests/run-tests.sh
@@ -1,75 +1,54 @@
 #!/usr/bin/env bash
-# Behavior regression for phase-00-maas-standup.sh (D-058). Fake `maas` + real jq.
-# Drives DRY-RUN and asserts WOULD/SKIP/DRIFT/refuse behaviour across scenarios.
+# Behavior regression for phase-00-maas-recidr.sh (D-052/053 -> D-058). Fake maas + real jq.
 set -uo pipefail
 HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-SCRIPT="$(cd "$HERE/../../scripts" && pwd)/phase-00-maas-standup.sh"
+SCRIPT="$(cd "$HERE/../../scripts" && pwd)/phase-00-maas-recidr.sh"
 BIN="$HERE/fakebin"; FIX="$HERE/fix"
-chmod +x "$BIN"/* 2>/dev/null || true # GitHub Desktop lands files mode 100644
+chmod +x "$BIN"/* 2>/dev/null || true
 command -v jq >/dev/null || { echo "FAIL: jq required"; exit 1; }
 python3 "$HERE/make_fixtures.py" >/dev/null
 rc_all=0; OUT="$(mktemp)"
-
-run() { #