
Phase 00 -- Teardown + MAAS Reset

Destroy the openstack Juju model and reset the four MAAS hosts to a clean, deploy-ready state: OSD secondary disks wiped, storage-class NICs linked, and the MAAS VIP/FIP address carve in place. This is the rebuild-prep window -- it runs BEFORE phase-01, because the VIP block must be MAAS-reserved before the bundle deploys onto it, and link-subnet only works on a Ready (not Deployed) machine.

Decisions: D-018 (skip graceful; MAAS-release-direct; supersedes D-013), D-017 (full rebuild every cycle, nothing preserved), KI-P3-001 (the VIP carve fix). Troubleshooting: appendix-A -- DOCFIX-016 (never maas list -- API-key leak), DOCFIX-017 (no maas whoami; hardcode the eyeballed system_ids), R7 (sudo for libvirt/qemu-img), KI-P3-001.

!!! DESTRUCTIVE. Phase 1 (destroy-model + release) and Phase 2 (OSD wipe) are irreversible short of the KVM snapshots (the D-017 safety net). Each destructive step is DISCRETE and individually gated -- do not batch.

CAPI-MGMT NOTE: this teardown releases the FOUR openstack hosts only. The MAAS capi-mgmt VM is the RETIRED D-033 out-of-cloud node; the in-cloud capi-mgmt-v2 tenant VM (phase-06) replaces it. Leave capi-mgmt Ready (its separate Phase-7 teardown is out of scope here). (The older 01-destroy-model.md released 5 VMs incl. capi-mgmt -- that was the D-033 era; do NOT release it on the current rebuild.)


Prerequisites

  • KVM snapshots of openstack0-3 exist (safety net). Authenticated juju session (juju whoami). MAAS CLI logged in as profile admin.
  • Run from jumphost vopenstack-jesse (user jessea123, sudo; also the libvirt hypervisor).

Constants and env-literals

  • MAAS profile: admin (DOCFIX-016: NEVER maas list -- it prints the API key).
  • system_ids (hardcode; DOCFIX-017, no maas whoami): openstack0=4na83t, openstack1=qdbqd6, openstack2=h8frng, openstack3=tmsafc.
  • MAAS subnet ids: 1=provider 10.12.4.0/22, 2=metal 10.12.8.0/22, 6=data 10.12.12.0/22, 7=storage 10.12.16.0/22, 8=replication 10.12.20.0/22.
  • per-host storage NIC octet = 40 + index: data 10.12.12.4N, storage 10.12.16.4N, replication 10.12.20.4N.
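The octet rule can be written as a small pure-shell helper, which is handy when cross-checking the 0c and Phase 3 output by eye. This is a minimal sketch. `storage_ips` is a hypothetical name that nothing in this runbook's tooling provides, and its argument is the N in openstackN.

```shell
# Hypothetical helper: print the three storage-class addresses for openstackN,
# using the "octet = 40 + index" rule (data / storage / replication subnets).
storage_ips() {
  case $1 in
    0|1|2|3) ;;                       # only openstack0..openstack3 exist
    *) echo "storage_ips: index must be 0-3, got '$1'" >&2; return 1 ;;
  esac
  octet=$((40 + $1))
  printf 'data=10.12.12.%s storage=10.12.16.%s replication=10.12.20.%s\n' \
    "$octet" "$octet" "$octet"
}
```

For example, `storage_ips 2` (openstack2 = h8frng) prints `data=10.12.12.42 storage=10.12.16.42 replication=10.12.20.42`.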

Run-location legend

  • # RUN: jumphost -- juju + maas admin; the jumphost is also the libvirt hypervisor (sudo).

Phase 0 -- Pre-flight (READ-ONLY; run before teardown)

# RUN: jumphost

( {
  echo "=== 0a. five network spaces (hard blocker if absent) ==="
  juju spaces   # expect metal 10.12.8.0/22 | provider 10.12.4.0/22 | data 10.12.12.0/22 | storage 10.12.16.0/22 | replication 10.12.20.0/22

  echo "=== 0b. VIP ipranges (note the front-loaded ones to KEEP + the stale .224-.254 to remove) ==="
  maas admin ipranges read \
    | jq -r '.[] | "id=\(.id)\ttype=\(.type)\t\(.start_ip)-\(.end_ip)\tsubnet=\(.subnet.cidr // "?")\t\(.comment // "")"' | sort
  #   KEEP: provider 10.12.4.2-.63, metal 10.12.8.2-.63 (bundle VIPs live here), provider FIP 10.12.5.0-10.12.7.254.
  #   STALE: metal 10.12.8.224-.254 (old scheme) -> its id feeds Phase 4 (this arc: id=2).

  echo "=== 0c. storage-class NIC link state on all four hosts (drives Phase 3) ==="
  for SID in 4na83t qdbqd6 h8frng tmsafc; do echo "  -- $SID --"
    maas admin interfaces read "$SID" | jq -r '.[] | select(.name|test("^enp(8|9|10)s0$"))
      | "    \(.name)\tid=\(.id)\tlinks=\([.links[]?|{(.subnet.cidr // "no-subnet"):.ip_address}])"'
  done   # enp8s0(data) is the one KNOWN unlinked + a HARD deploy prereq; enp9s0/enp10s0 usually already linked. A subnet-less (link_up) link prints as "no-subnet" instead of aborting jq.
} )
# 0d. OSD-wipe pre-flight gate (R7: sudo). Before teardown the domains still report "running"; re-run this after Phase 1, when all four must be "shut off" with vdb root:root / 600.
for host in openstack0 openstack1 openstack2 openstack3; do
  f="/var/lib/libvirt/images/${host}-1.qcow2"
  printf '%-46s state=%s owner=%s mode=%s\n' "$f" \
    "$(sudo virsh -c qemu:///system domstate "$host" 2>/dev/null)" \
    "$(sudo stat -c '%U:%G' "$f" 2>/dev/null)" "$(sudo stat -c '%a' "$f" 2>/dev/null)"
done   # expect (AFTER Phase 1 release): 4 lines, state=shut off, owner=root:root, mode=600
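The 0d expectation can also be turned into an explicit GREEN/RED verdict, so nobody has to compare four lines by eye. This is a minimal sketch. `osd_gate_check` is a hypothetical helper, and it assumes it receives the same three values the loop above prints.

```shell
# Hypothetical helper: judge one host's 0d values and print GREEN/RED with the reason.
# Args: host, domstate, owner (user:group), octal mode -- the values the 0d loop prints.
osd_gate_check() {
  if [ "$2" != "shut off" ]; then
    echo "RED   $1: state='$2' (want 'shut off')"
    return 1
  fi
  if [ "$3" != "root:root" ] || [ "$4" != "600" ]; then
    echo "RED   $1: owner=$3 mode=$4 (want root:root 600)"
    return 1
  fi
  echo "GREEN $1"
}
```

Inside the same loop, call it as `osd_gate_check "$host" "$(sudo virsh -c qemu:///system domstate "$host" 2>/dev/null)" "$(sudo stat -c '%U:%G' "$f" 2>/dev/null)" "$(sudo stat -c '%a' "$f" 2>/dev/null)"`. Four GREEN lines mean 0d is GREEN. A missing domain or file yields empty values and therefore RED.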

Phase 1 -- Teardown (D-018) DISCRETE / DESTRUCTIVE

# RUN: jumphost

# A. pre-destroy capture (reference only; NOT for restore)
TS=$(date -u +%Y%m%dT%H%M%SZ); BACKUP_DIR=$HOME/backups/pre-caracal-destroy-$TS; mkdir -p "$BACKUP_DIR"
juju export-bundle -m openstack > "$BACKUP_DIR/bundle-pre-destroy.yaml"
juju status -m openstack --format=yaml > "$BACKUP_DIR/juju-status-pre-destroy.yaml"
for f in "$BACKUP_DIR"/*.yaml; do [ -s "$f" ] || echo "WARNING: $f empty"; done
echo "$BACKUP_DIR" > "$HOME/.last-pre-caracal-destroy-backup"; ls -la "$BACKUP_DIR"
# B. destroy the openstack model (returns ~1-2 min; reaping ~5-10 min background). Controller untouched.
juju destroy-model openstack --force --no-wait --destroy-storage --no-prompt
# C. release the FOUR openstack hosts by system_id (DOCFIX-017: hardcoded ids, no whoami). NOT capi-mgmt.
for SID in 4na83t qdbqd6 h8frng tmsafc; do
  echo "Releasing $SID..."; maas admin machine release "$SID" comment="Caracal rebuild teardown $TS"
done
# D. verify
juju models   # expect: no 'openstack' (allow a few min)
maas admin machines read \
  | jq -r '.[] | select(.hostname|test("^openstack[0-3]$")) | "\(.hostname)\t\(.status_name)"' | sort
  # expect four lines, each ending "Ready"

GATE: juju models shows no openstack; openstack0-3 all Ready. (link-subnet is REJECTED on a Deployed machine -- Phases 2-3 REQUIRE Ready.) If the model is still destroying after ~10 min: juju machines -m openstack --format=yaml, then juju remove-machine -m openstack --force <id> for each lingering id, then re-run the destroy-model in B.
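The "~10 min" wait in this GATE can be made explicit with a bounded poll instead of manual re-runs. This is a minimal POSIX-shell sketch. `wait_for` is a hypothetical helper. Its elapsed counter only adds up sleep time, so the real wait runs somewhat longer than TIMEOUT when the check itself is slow.

```shell
# Hypothetical helper: re-run a command every INTERVAL seconds until it exits 0
# or TIMEOUT seconds of sleeping have passed. Returns 0 on success, 1 on timeout.
wait_for() {
  local timeout interval elapsed
  timeout=$1; interval=$2; elapsed=0
  shift 2
  while ! "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "wait_for: gave up after ${timeout}s: $*" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))   # counts sleep time only
  done
  return 0
}
```

For the model check, use something like `wait_for 600 30 sh -c '! juju show-model openstack >/dev/null 2>&1' || echo "still destroying"`, then fall through to the remove-machine fallback. Treat a success with some caution: show-model also fails when the controller is unreachable, so confirm with juju models before moving on.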

Phase 2 -- OSD secondary-disk wipe (clean-slate Ceph) DISCRETE / DESTRUCTIVE

# RUN: jumphost (libvirt host; R7 sudo)

Run this only after Phase 0d is GREEN (all "shut off") AND an explicit go. vda (the OS disk) is NOT touched -- MAAS reinstalls it on deploy; only vdb (the OSD target) is recreated blank.

OWNER="root:root"; MODE="600"
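for host in openstack0 openstack1 openstack2 openstack3; do
  f="/var/lib/libvirt/images/${host}-1.qcow2"
  # re-check the 0d gate per host: never delete a disk under a running domain
  state="$(sudo virsh -c qemu:///system domstate "$host" 2>/dev/null)"
  if [ "$state" != "shut off" ]; then echo "=== SKIP $host: state='$state' (must be 'shut off') ==="; continue; fi
  echo "=== Wiping $f ==="
  sudo rm -f "$f"
  sudo qemu-img create -f qcow2 "$f" 512G
  sudo chown "$OWNER" "$f"; sudo chmod "$MODE" "$f"
  sudo ls -lh "$f"
done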
# verify
for host in openstack0 openstack1 openstack2 openstack3; do
  sudo qemu-img info "/var/lib/libvirt/images/${host}-1.qcow2" | grep -E 'virtual size|disk size'
done

GATE: 4 files, ~200 KiB actual / 512 GiB virtual, root:root mode 600.
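The virtual-size half of this GATE can be checked mechanically. qemu-img prints the exact byte count in parentheses on its "virtual size:" line. The sketch relies on that text format, which is an assumption; qemu-img info --output=json is the sturdier source if JSON parsing is preferred. `OSD_BYTES` and `osd_is_512g` are hypothetical names.

```shell
# 512 GiB expressed in bytes, as qemu-img prints it in parentheses.
OSD_BYTES=$((512 * 1024 * 1024 * 1024))   # 549755813888

# Hypothetical helper: succeed only if the `qemu-img info` text passed as $1
# reports a virtual size of exactly OSD_BYTES.
osd_is_512g() {
  printf '%s\n' "$1" | grep -Eq "^virtual size: .*\\(${OSD_BYTES} bytes\\)"
}
```

In the verify loop, call it as `osd_is_512g "$(sudo qemu-img info "/var/lib/libvirt/images/${host}-1.qcow2")" || echo "RED: $host"`.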

Phase 3 -- Storage-class NIC links (mutation; requires Ready)

# RUN: jumphost

Links every storage-class NIC to its space's subnet with a STATIC address. enp8s0 (data) is the one KNOWN unlinked and a HARD deploy prereq (nova-compute:neutron-plugin->data, octavia:ovsdb-cms->data, chassis data bindings). enp9s0/enp10s0 back the C2 Ceph public/cluster bindings; the loop links them too, but only if they are not already linked.

declare -A NIC_CIDR=( [enp8s0]=10.12.12.0/22 [enp9s0]=10.12.16.0/22 [enp10s0]=10.12.20.0/22 )
declare -A HOST_OCTET=( [4na83t]=40 [qdbqd6]=41 [h8frng]=42 [tmsafc]=43 )
declare -A HN=( [4na83t]=openstack0 [qdbqd6]=openstack1 [h8frng]=openstack2 [tmsafc]=openstack3 )

for SID in 4na83t qdbqd6 h8frng tmsafc; do
  echo "=== ${HN[$SID]} ($SID) ==="
  IFJSON=$(maas admin interfaces read "$SID")
  for NIC in enp8s0 enp9s0 enp10s0; do
    cidr="${NIC_CIDR[$NIC]}"; prefix="${cidr%.0/22}"; ip="${prefix}.${HOST_OCTET[$SID]}"
    ifid=$(echo "$IFJSON" | jq -r --arg n "$NIC" '.[]|select(.name==$n)|.id')
    if [ -z "$ifid" ]; then echo "  $NIC: NOT FOUND -- inspect 'maas admin interfaces read $SID'"; continue; fi
    linked=$(echo "$IFJSON" | jq -r --arg c "$cidr" --argjson id "$ifid" \
              '[.[]|select(.id==$id).links[]?|select(.subnet.cidr==$c)]|length')
    if [ "$linked" != "0" ]; then echo "  $NIC id=$ifid already on $cidr -- SKIP"; continue; fi
    subid=$(maas admin subnets read | jq -r --arg c "$cidr" '.[]|select(.cidr==$c)|.id')
    if [ -z "$subid" ]; then echo "  $NIC: no MAAS subnet for $cidr -- SKIP"; continue; fi
    echo "  $NIC id=$ifid -> $ip (subnet id=$subid, $cidr)"
    maas admin interface link-subnet "$SID" "$ifid" mode=STATIC subnet="$subid" ip_address="$ip"
  done
done

# verify -- every host should now show data/storage/replication links
for SID in 4na83t qdbqd6 h8frng tmsafc; do
  echo "=== ${HN[$SID]} ($SID) ==="
  maas admin interfaces read "$SID" \
    | jq -r '.[] | select(.name|test("^enp(8|9|10)s0$")) | "  \(.name)\t\([.links[]?|{(.subnet.cidr // "no-subnet"):.ip_address}])"'
done

GATE: each host's enp8s0/enp9s0/enp10s0 shows a 10.12.{12,16,20}.4N STATIC link.
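The address rule in the link loop (prefix="${cidr%.0/22}" plus the host octet) only works because every CIDR in NIC_CIDR ends in .0/22. With any other CIDR the suffix strip matches nothing, and the result is an invalid address. Below is a minimal pure-shell guard, assuming IPv4 only. `ip2int` and `ip_in_cidr` are hypothetical helper names, not MAAS tooling.

```shell
# Hypothetical helper: dotted-quad IPv4 -> 32-bit integer.
ip2int() {
  local IFS=.
  set -- $1                           # split "a.b.c.d" on the dots
  echo "$(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))"
}

# Hypothetical helper: succeed if IPv4 address $1 lies inside CIDR $2 (e.g. 10.12.12.0/22).
ip_in_cidr() {
  local net bits mask
  net=${2%/*}; bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  [ $(( $(ip2int "$1") & $mask )) -eq $(( $(ip2int "$net") & $mask )) ]
}
```

Drop it into the loop just before link-subnet, e.g. `ip_in_cidr "$ip" "$cidr" || { echo "  $NIC: $ip not in $cidr -- SKIP"; continue; }`. That turns the failure mode into a visible SKIP instead of a rejected MAAS call.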

Phase 4 -- MAAS VIP/FIP address carve (mutation; confirm-first)

# RUN: jumphost

The bundle's VIPs live in the front-loaded /26 blocks; the FIP pool (phase-04) lives at 10.12.5.0-10.12.7.254. These MAAS reservations persist across teardown, so on a repeat rebuild they usually already exist: verify, create only if absent, and delete the stale old-scheme reservation. (KI-P3-001: a reserved range stops MAAS automatic (AUTO-mode) address assignment from landing a machine's primary address on a configured VIP.)
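KI-P3-001 comes down to containment: a configured VIP is protected from automatic assignment only if it falls inside a reserved range. This is a pure-shell sketch for checking that, assuming IPv4 and inclusive range bounds as ipranges read reports them. `ip2int` and `in_range` are hypothetical helpers. The VIP in the usage note is a made-up example, not a value from the bundle.

```shell
# Hypothetical helper: dotted-quad IPv4 -> 32-bit integer.
ip2int() {
  local IFS=.
  set -- $1                           # split "a.b.c.d" on the dots
  echo "$(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))"
}

# Hypothetical helper: succeed if $1 lies in the inclusive range $2-$3.
in_range() {
  local ip start end
  ip=$(ip2int "$1"); start=$(ip2int "$2"); end=$(ip2int "$3")
  [ "$ip" -ge "$start" ] && [ "$ip" -le "$end" ]
}
```

For example, `in_range 10.12.8.10 10.12.8.2 10.12.8.63 && echo covered` prints `covered`. Running it for each bundle VIP against the 4a output would flag any VIP left outside the front-loaded /26.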

# 4a. verify current state
maas admin ipranges read | jq -r '.[] | "id=\(.id)\t\(.type)\t\(.start_ip)-\(.end_ip)\tsubnet=\(.subnet.cidr // "?")\t\(.comment // "")"' | sort
#   want present: provider .4.2-.63 (subnet 1), metal .8.2-.63 (subnet 2), provider FIP .5.0-.7.254.
#   want absent : metal .8.224-.254 (stale).
# 4b. create the front-loaded VIP reservations ONLY if absent (idempotent; carve doc section 8)
( {
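  RANGES="$(maas admin ipranges read)" || { echo "ipranges read failed -- ABORT (do not create blind)"; exit 1; }
  # an error body is non-empty too, so require a real JSON array before trusting any "absent" result
  printf '%s' "$RANGES" | jq -e 'type == "array"' >/dev/null \
    || { echo "ipranges read returned no JSON array -- ABORT (do not create blind)"; exit 1; }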
  # provider VIPs 10.12.4.2-.63 (subnet 1)
  if printf '%s' "$RANGES" | jq -e '.[]|select(.start_ip=="10.12.4.2" and .end_ip=="10.12.4.63")' >/dev/null; then
    echo "provider .4.2-.63 present -- SKIP"
  else
    maas admin ipranges create type=reserved subnet=1 start_ip=10.12.4.2 end_ip=10.12.4.63 \
      comment="OpenStack public API HA VIPs (front-loaded /26; supersedes .224-.236)"
  fi
  # metal VIPs 10.12.8.2-.63 (subnet 2)
  if printf '%s' "$RANGES" | jq -e '.[]|select(.start_ip=="10.12.8.2" and .end_ip=="10.12.8.63")' >/dev/null; then
    echo "metal .8.2-.63 present -- SKIP"
  else
    maas admin ipranges create type=reserved subnet=2 start_ip=10.12.8.2 end_ip=10.12.8.63 \
      comment="OpenStack internal/admin API HA VIPs (front-loaded /26; supersedes D-020 .224-.254)"
  fi
} )
# 4c. delete the stale .224-.254 metal reservation -- CONFIRM the id from 4a first (this arc: id=2)
#   maas admin iprange delete <stale-id>
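Step 4c is deliberately left commented out. If you prefer an interactive guard over editing the line by hand, a small prompt helper keeps the delete behind an explicit yes. This is a minimal POSIX-shell sketch. `confirm` is a hypothetical helper, not part of the MAAS CLI.

```shell
# Hypothetical helper: prompt on stderr, read one line from stdin, and succeed
# only on an explicit y/yes (any case). EOF or any other answer means "no".
confirm() {
  local answer
  printf '%s [y/N] ' "$1" >&2
  read -r answer || return 1
  case $answer in
    [Yy]|[Yy][Ee][Ss]) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, with STALE_ID set by hand to the id confirmed in 4a: `confirm "Delete stale metal iprange id=$STALE_ID?" && maas admin iprange delete "$STALE_ID"`.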

GATE: ipranges read shows provider FIP + provider VIPs .4.2-.63 + metal VIPs .8.2-.63; the metal .8.224-.254 reservation is gone; the metal DHCP dynamic (10.12.9.0-10.12.11.254) is unchanged.
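Beyond presence, the carve only works if its ranges stay disjoint. The provider FIP pool must not overlap the provider VIP /26, and the metal VIP /26 must not overlap the metal DHCP dynamic range. This is a pure-shell sketch of that pairwise check, assuming IPv4 and inclusive bounds. `ip2int` and `ranges_overlap` are hypothetical helpers.

```shell
# Hypothetical helper: dotted-quad IPv4 -> 32-bit integer.
ip2int() {
  local IFS=.
  set -- $1                           # split "a.b.c.d" on the dots
  echo "$(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))"
}

# Hypothetical helper: succeed if inclusive ranges $1-$2 and $3-$4 share any address.
ranges_overlap() {
  local s1 e1 s2 e2
  s1=$(ip2int "$1"); e1=$(ip2int "$2"); s2=$(ip2int "$3"); e2=$(ip2int "$4")
  [ "$s1" -le "$e2" ] && [ "$s2" -le "$e1" ]
}
```

With the ranges named in this phase, `ranges_overlap 10.12.5.0 10.12.7.254 10.12.4.2 10.12.4.63` and `ranges_overlap 10.12.8.2 10.12.8.63 10.12.9.0 10.12.11.254` should both return non-zero (no overlap).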

Phase 5 -- Post-prep verification (READ-ONLY gate before deploy)

# RUN: jumphost

( {
  juju spaces                                              # 5 spaces present
  maas admin machines read | jq -r '.[]|select(.hostname|test("^openstack[0-3]$"))|"\(.hostname)\t\(.status_name)"' | sort   # all Ready
  for SID in 4na83t qdbqd6 h8frng tmsafc; do echo "-- $SID --"
    maas admin interfaces read "$SID" | jq -r '.[]|select(.name|test("^enp(8|9|10)s0$"))|"  \(.name)\t\([.links[]?|{(.subnet.cidr // "no-subnet"):.ip_address}])"'
  done                                                     # data/storage/replication links on all four
  for host in openstack0 openstack1 openstack2 openstack3; do
    sudo qemu-img info "/var/lib/libvirt/images/${host}-1.qcow2" | grep -E 'virtual size|disk size'
  done                                                     # OSD 512G blank
} )
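Phase 5 prints state for a human to compare against the comments. If you prefer a hard pass/fail, a small wrapper can count failures. This is a minimal sketch. `check`, `gate_summary` and `FAILS` are hypothetical names, and each check is any command that exits 0 on success.

```shell
FAILS=0

# Hypothetical helper: run one check quietly, print PASS/FAIL with its description,
# and count failures in FAILS (runs in the current shell, so the count persists).
check() {
  local desc
  desc=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $desc"
  else
    echo "FAIL  $desc"
    FAILS=$((FAILS + 1))
  fi
}

# Hypothetical helper: exit status 0 only if every check passed.
gate_summary() {
  if [ "$FAILS" -eq 0 ]; then
    echo "EXIT GATE: GREEN"
    return 0
  fi
  echo "EXIT GATE: RED ($FAILS failed)"
  return 1
}
```

Example checks include `check "openstack model gone" sh -c '! juju show-model openstack'` and `check "openstack0 vdb present" sudo test -s /var/lib/libvirt/images/openstack0-1.qcow2`, followed by `gate_summary`.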

EXIT GATE (phase-00 complete)

  • juju models shows no openstack; openstack0-3 all Ready.
  • OSD vdb files 512 GiB blank (root:root, 600) on all four hosts.
  • enp8s0/enp9s0/enp10s0 linked (10.12.{12,16,20}.4N STATIC) on all four.
  • MAAS carve: front-loaded VIP /26 reserved on provider + metal; FIP pool reserved; stale .224-.254 gone.
  • Clean slate ready for phase-01 (deploy). NOTE: the deploy uses ONE overlay (octavia-pki only) -- NOT the vr0-dc0-testcloud overlay (R10; that overlay's intent is folded into the hardened base bundle).

As-built reference (rebuild-prep arc -- audit trail)

  • Teardown D-018: juju destroy-model openstack --force --no-wait --destroy-storage --no-prompt; release the four hosts by system_id (capi-mgmt left Ready).
  • OSD wipe proven 2026-05-22, re-run 2026-05-30: 512G blank, root:root, 600.
  • NIC links: enp8s0 found UNLINKED this arc (the hard prereq); enp9s0/enp10s0 already linked. Reference enp8s0 ids (arc): openstack1=26, openstack2=32, openstack3=38; openstack0 resolved dynamically (the block does not depend on these).
  • MAAS carve: front-loaded .2-.63 reservations created earlier and persistent; stale metal .224-.254 was iprange id=2 (deleted after confirmation).

Next

phase-01 -- bundle deploy.