
Phase 00 -- Teardown + Pattern A Reset (D-060)

Destroy the openstack Juju model, delete the orphaned capi-mgmt MAAS machine, and reset the four hosts (openstack0-3) to a clean, Pattern A, deploy-ready state on the EXISTING D-052/D-053 plane scheme. This is the rebuild-prep window -- it runs BEFORE phase-01.

The deterministic, repeatable work is owned by SCRIPTS (each resolves every id live and is dry-run by default); the destructive juju + libvirt steps stay human-gated here. Run from jumphost vopenstack-jesse (user jessea123, sudo; also the libvirt hypervisor); MAAS CLI logged in as profile admin.

Decisions: D-060 (Pattern A revert; supersedes D-057/D-058), D-018 (MAAS-release-direct teardown), D-017 (full rebuild every cycle, nothing preserved). The live MAAS is already on D-052/D-053, so there is NO re-CIDR -- the carve re-establishes Pattern A interfaces on the existing subnets.

!!! DESTRUCTIVE. The model destroy + host release (Step 2) and the OSD wipe (Step 4) are irreversible. There is NO model-state rollback (D-017): the repo runbooks ARE the tested restore path. Each destructive step is DISCRETE and individually gated -- do not batch.

CAPI-MGMT: the orphaned capi-mgmt MAAS machine (retired D-033 out-of-cloud node) is DELETED by scripts/phase-00-teardown.sh -- it is no longer left Ready. The in-cloud capi-mgmt-v2 tenant VM (phase-06) dies with the model.


Sequence (this phase)

1. Pre-flight            (read-only; baseline)
2. Teardown              scripts/phase-00-teardown.sh --apply   [DESTRUCTIVE: model + capi-mgmt]
   -- hosts release to MAAS Ready, powered off --
3. 8_lbaas net removal   one-off jumphost op                    [hosts off]
4. OSD secondary wipe    vdb -> blank 512G                      [DESTRUCTIVE; hosts off]
5. Pattern A re-carve    scripts/carve-host-interfaces.sh <host> --apply   [hosts Ready]
6. Standup + bundle gate scripts/phase-00-maas-standup.sh ; provider-bundle-check.py  [read-only]
   -> EXIT GATE -> phase-01 deploy

Steps 3-6 all run with the hosts Ready and powered off (the carve edits MAAS interface metadata; 8_lbaas + OSD operate on libvirt). The phase-01 deploy powers the hosts on and applies the carved netplan.

Step 1 -- Pre-flight (READ-ONLY)

CHECK (read-only) -- jumphost

( {
  echo "=== six D-052/D-053 spaces (hard blocker if absent) ==="
  # expect: provider-public 10.12.4.0/22 | metal-admin 10.12.8.0/22 | metal-internal 10.12.12.0/22
  #       | data-tenant 10.12.16.0/22 | storage 10.12.32.0/22 | replication 10.12.36.0/22
  maas admin spaces read | jq -r '.[] | "\(.name)\t\([.subnets[]?.cidr] | join(", "))"' | sort

  echo "=== hosts + capi-mgmt status (baseline) ==="
  maas admin machines read | jq -r '.[]|select(.hostname|test("^(openstack[0-3]|capi-mgmt)$"))|"\(.hostname)\t\(.status_name)"' | sort

  echo "=== OSD vdb baseline (pre-teardown: running, libvirt-qemu:kvm) ==="
  for host in openstack0 openstack1 openstack2 openstack3; do
    f="/var/lib/libvirt/images/${host}-1.qcow2"
    printf '  %-46s state=%s owner=%s mode=%s\n' "$f" \
      "$(sudo virsh -c qemu:///system domstate "$host" 2>/dev/null)" \
      "$(sudo stat -c '%U:%G' "$f" 2>/dev/null)" "$(sudo stat -c '%a' "$f" 2>/dev/null)"
  done
} )
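The space check above is read by eye. As a minimal sketch of a pass/fail version, the helper below consumes the same `name<TAB>cidr` lines the jq filter prints and reports any expected pair that is absent. The function name `check_spaces` is hypothetical (it is not one of the repo scripts), and the exact-line match assumes each space carries exactly one subnet.

```shell
# check_spaces: read "name<TAB>cidr" lines on stdin (the format printed by the
# jq filter in the pre-flight block) and fail if any expected pair is absent.
# Hypothetical helper, not part of the repo scripts. The expected pairs are
# copied from the pre-flight comment. Assumes one subnet per space.
check_spaces() {
  input=$(cat)
  tab=$(printf '\t')
  rc=0
  while read -r name cidr; do
    if printf '%s\n' "$input" | grep -Fxq "${name}${tab}${cidr}"; then
      echo "OK      $name $cidr"
    else
      echo "MISSING $name $cidr"
      rc=1
    fi
  done <<EOF
provider-public 10.12.4.0/22
metal-admin 10.12.8.0/22
metal-internal 10.12.12.0/22
data-tenant 10.12.16.0/22
storage 10.12.32.0/22
replication 10.12.36.0/22
EOF
  return $rc
}

# Usage (read-only, on the jumphost):
#   maas admin spaces read \
#     | jq -r '.[] | "\(.name)\t\([.subnets[]?.cidr] | join(", "))"' \
#     | check_spaces || echo "HARD BLOCKER: expected space missing"
```

A non-zero exit status means at least one of the six spaces is missing or bound to a different CIDR, which the runbook treats as a hard blocker.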

Step 2 -- Teardown (D-018 / D-060) DISCRETE / DESTRUCTIVE

scripts/phase-00-teardown.sh is the authority: it resolves the four host system_ids live (no hardcoded ids), HARD-EXCLUDES the management substrate (juju, lxd, tailscale), destroys the openstack model, and deletes the orphaned capi-mgmt machine. A pre-destroy juju export/status capture runs first (reference only; NOT a restore path).

CHECK (read-only) -- jumphost -- dry-run first (default; changes nothing)

scripts/phase-00-teardown.sh

Expect: the four openstack hosts listed as release targets + capi-mgmt as the delete target; PROTECTED juju / lxd / tailscale shown as excluded. Confirm the resolved system_ids look right before applying.

CAUTION: destroys the entire openstack Juju model and DELETES the capi-mgmt MAAS machine -- irreversible. Confirm you are on the test cloud, not Roosevelt.

RUN -- jumphost

scripts/phase-00-teardown.sh --apply

GATE: juju models shows no openstack; maas admin machines read shows openstack0-3 all Ready and capi-mgmt gone. (link-subnet is REJECTED on a Deployed machine -- the carve in Step 5 REQUIRES Ready.) If the model is still destroying after ~10 min: juju remove-machine -m openstack --force <id> for each lingering id, then re-run --apply.
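The MAAS half of this gate can be checked mechanically. The sketch below assumes the `hostname<TAB>status_name` lines produced by the machines filter in Step 1; `teardown_gate` is a hypothetical name, and the `juju models` half of the gate still has to be confirmed separately.

```shell
# teardown_gate: read "hostname<TAB>status_name" lines on stdin and pass only
# if openstack0-3 are all Ready and capi-mgmt is no longer listed.
# Hypothetical helper, not part of the repo scripts.
teardown_gate() {
  input=$(cat)
  tab=$(printf '\t')
  rc=0
  for h in openstack0 openstack1 openstack2 openstack3; do
    if ! printf '%s\n' "$input" | grep -Fxq "${h}${tab}Ready"; then
      echo "NOT READY: $h"
      rc=1
    fi
  done
  if printf '%s\n' "$input" | grep -q "^capi-mgmt${tab}"; then
    echo "STILL PRESENT: capi-mgmt"
    rc=1
  fi
  return $rc
}

# Usage (read-only, on the jumphost):
#   maas admin machines read \
#     | jq -r '.[]|select(.hostname|test("^(openstack[0-3]|capi-mgmt)$"))|"\(.hostname)\t\(.status_name)"' \
#     | teardown_gate && echo "Step 2 MAAS gate green"
```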

Step 3 -- Remove the idle 8_lbaas libvirt network (hosts off) one-off

Each host still carries an idle virtio NIC on the isolated 8_lbaas libvirt network (bridge virbr6, no L3, ex-lbaas). MAAS has no lbaas space; the NIC is unused. Remove it now while the hosts are shut off. This is a one-off jumphost op (Roosevelt is bare metal, no libvirt nets) -- it is NOT part of any phase-00 script; log it to the as-executed log.

CAUTION: detaches a NIC from each host's persistent domain config and undefines a libvirt network. Reversible (XML backed up first); the detach uses --config only (no live change).

Use the two gated blocks from the as-executed log / session notes -- do NOT improvise an irreversible libvirt op:

  • Block 1: back up the 8_lbaas network XML to ~/8_lbaas-net.xml.bak; pre-check that every host domstate = shut off (REFUSE otherwise); detach the idle NIC per host (virsh detach-interface <dom> network --mac <mac> --config); verify no domain still references 8_lbaas.
  • Block 2: virsh net-destroy 8_lbaas; virsh net-undefine 8_lbaas; confirm gone.

GATE: sudo virsh net-list --all shows no 8_lbaas; no host domain references it.
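The second half of this gate ("no host domain references it") can be verified read-only. The sketch below is not a replacement for the two gated blocks; it only inspects domain XML. `references_net` is a hypothetical helper, and the pattern assumes the network appears as a `network=` attribute on an interface `<source>` element, which is how `virsh dumpxml` renders network-backed NICs.

```shell
# references_net NET: succeed if the libvirt domain XML on stdin contains a
# <source ... network='NET'> attribute. Read-only; changes nothing.
# Hypothetical helper, not part of the repo scripts.
references_net() {
  grep -Eq "<source[^>]*network=['\"]$1['\"]"
}

# Usage (read-only, on the jumphost):
#   for host in openstack0 openstack1 openstack2 openstack3; do
#     if sudo virsh -c qemu:///system dumpxml --inactive "$host" \
#          | references_net 8_lbaas; then
#       echo "STILL REFERENCED: $host"
#     fi
#   done
```

`--inactive` reads the persistent definition, which is the one the `--config` detach edits.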

Step 4 -- OSD secondary-disk wipe (clean-slate Ceph) DISCRETE / DESTRUCTIVE

Run only after the Step 2 GATE is green (hosts Ready, shut off) AND you have an explicit go. vda (the OS disk) is NOT touched -- MAAS reinstalls it on deploy; only vdb (the OSD target) is recreated blank.

CAUTION: deletes and recreates each host's vdb OSD disk (512G blank) -- destroys all Ceph OSD data. vda is untouched. Hosts must be shut off (post-release). (R7: sudo for qemu-img.)
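Because the wipe must never run against a live domain, a read-only guard in front of it is cheap insurance. The sketch below assumes `host<TAB>domstate` lines on stdin; `require_shut_off` is a hypothetical name, and an empty input is treated as a failure so that a broken virsh call cannot pass silently.

```shell
# require_shut_off: read "host<TAB>domstate" lines on stdin and fail unless
# every host reports exactly "shut off". Empty input also fails.
# Hypothetical helper, not part of the repo scripts.
require_shut_off() {
  rc=0
  n=0
  while IFS="$(printf '\t')" read -r host state; do
    [ -n "$host" ] || continue
    n=$((n + 1))
    if [ "$state" != "shut off" ]; then
      echo "REFUSE: $host is '$state', expected 'shut off'"
      rc=1
    fi
  done
  if [ "$n" -eq 0 ]; then
    echo "REFUSE: no domain states read"
    rc=1
  fi
  return $rc
}

# Usage (read-only, on the jumphost):
#   for host in openstack0 openstack1 openstack2 openstack3; do
#     printf '%s\t%s\n' "$host" \
#       "$(sudo virsh -c qemu:///system domstate "$host" 2>/dev/null)"
#   done | require_shut_off && echo "all hosts shut off"
```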

RUN -- jumphost

OWNER="root:root"; MODE="600"
for host in openstack0 openstack1 openstack2 openstack3; do
  f="/var/lib/libvirt/images/${host}-1.qcow2"
  echo "=== Wiping $f ==="
  sudo rm -f "$f"
  sudo qemu-img create -f qcow2 "$f" 512G
  sudo chown "$OWNER" "$f"; sudo chmod "$MODE" "$f"
done
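# verify: virtual/actual size, then owner and mode (the GATE checks all three)
for host in openstack0 openstack1 openstack2 openstack3; do
  f="/var/lib/libvirt/images/${host}-1.qcow2"
  sudo qemu-img info "$f" | grep -E 'virtual size|disk size'
  sudo stat -c '  owner=%U:%G mode=%a' "$f"
done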

GATE: 4 files, ~200 KiB actual / 512 GiB virtual, root:root mode 600.

Step 5 -- Pattern A interface re-carve (per host; machines Ready)

scripts/carve-host-interfaces.sh rebuilds each host's interface tree to Pattern A on the EXISTING D-052/D-053 subnets:

  • enp1s0 -> OVS br-ex + STATIC 10.12.4.N (provider-public) -- MAAS builds the OVS bridge; ovn-chassis consumes it (bridge-interface-mappings + physnet1:br-ex), API containers attach.
  • enp7s0 -> br-metal (STATIC 10.12.8.N) -> br-metal.103 -> br-internal (STATIC 10.12.12.N).
  • enp8s0 / enp9s0 / enp10s0 raw + STATIC on data 10.12.16.N / storage 10.12.32.N / replication 10.12.36.N.

It resolves every id live, is idempotent, and requires Ready (interface edits are rejected on Deployed).
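To make the layout concrete, the sketch below prints the Pattern A address plan for one host. It is illustrative only: the runbook does not state how N is assigned per host, so N is taken as an argument and any value passed is an assumption. `pattern_a_plan` is a hypothetical name, and the space names are the six from the Step 1 pre-flight.

```shell
# pattern_a_plan N: print interface, space, and static address for one host,
# following the Pattern A layout described above. N (the host octet) is an
# ASSUMPTION supplied by the caller; the runbook does not define it.
# Hypothetical helper, not part of the repo scripts.
pattern_a_plan() {
  n=$1
  # printf reuses the format string for each group of three arguments.
  printf '%-12s %-16s %s\n' \
    br-ex       provider-public "10.12.4.$n" \
    br-metal    metal-admin     "10.12.8.$n" \
    br-internal metal-internal  "10.12.12.$n" \
    enp8s0      data-tenant     "10.12.16.$n" \
    enp9s0      storage         "10.12.32.$n" \
    enp10s0     replication     "10.12.36.$n"
}

# Usage: pattern_a_plan 10   (10 is a placeholder octet, not a real assignment)
```

The real carve resolves addresses live from MAAS; this table is only a reading aid for comparing against the dry-run plan.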

CHECK (read-only) -- jumphost -- dry-run each host first (default)

for h in openstack0 openstack1 openstack2 openstack3; do scripts/carve-host-interfaces.sh "$h"; done

Expect: each plan ends Summary: 0 fatal; the provider plane shows create br-ex (OVS) parent=enp1s0 and br-ex -> STATIC 10.12.4.N; metal / internal / data / storage / replication statics as above. No br-prov-api, no enp1s0.104, no provider-vip.
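The expectations above can be turned into a quick filter over a saved dry-run plan. This sketch assumes only the strings quoted in this runbook ("Summary: 0 fatal" as the last non-empty line, and the three names that must not appear); the rest of the script's output format is not assumed. `check_carve_plan` is a hypothetical name.

```shell
# check_carve_plan: read a dry-run plan on stdin; fail if the last non-empty
# line is not the clean summary or if a name the runbook forbids appears.
# Hypothetical helper, not part of the repo scripts.
check_carve_plan() {
  plan=$(cat)
  rc=0
  last=$(printf '%s\n' "$plan" | grep -v '^[[:space:]]*$' | tail -n 1)
  case "$last" in
    *"Summary: 0 fatal"*) ;;
    *) echo "FAIL: plan does not end with 'Summary: 0 fatal'"; rc=1 ;;
  esac
  for bad in br-prov-api enp1s0.104 provider-vip; do
    if printf '%s\n' "$plan" | grep -Fq "$bad"; then
      echo "FAIL: plan mentions $bad"
      rc=1
    fi
  done
  return $rc
}

# Usage (dry-run only, on the jumphost):
#   for h in openstack0 openstack1 openstack2 openstack3; do
#     scripts/carve-host-interfaces.sh "$h" | check_carve_plan || echo "STOP: $h"
#   done
```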

CAUTION: mutates MAAS interface definitions on each host. Re-runnable (idempotent), but apply ONE host at a time and re-read the resulting tree.

RUN -- jumphost (per host)

scripts/carve-host-interfaces.sh openstack0 --apply
# then openstack1, openstack2, openstack3 (one at a time)

GATE: each host shows br-ex (type ovs) STATIC 10.12.4.N; br-metal 10.12.8.N; br-internal 10.12.12.N; enp8s0/enp9s0/enp10s0 STATIC on 10.12.16/32/36.N.

Step 6 -- Standup + bundle gate (READ-ONLY; before deploy)

CHECK (read-only) -- jumphost -- MAAS topology

scripts/phase-00-maas-standup.sh

Expect: no drift and OK (dryrun) -- topology consistent with D-052/D-053. Any DRIFT line is a hard stop (do not deploy onto a mis-bound plane).
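To make the hard stop mechanical rather than visual, the standup output can be passed through a small filter. The sketch assumes drift is reported with the uppercase token DRIFT, as described above, so a lowercase "no drift" success line does not trip it. `drift_gate` is a hypothetical name.

```shell
# drift_gate: echo the standup output from stdin unchanged, then return
# non-zero if any line contains the uppercase token DRIFT.
# Hypothetical helper, not part of the repo scripts.
drift_gate() {
  out=$(cat)
  printf '%s\n' "$out"
  if printf '%s\n' "$out" | grep -q 'DRIFT'; then
    echo "HARD STOP: drift reported; do not deploy" >&2
    return 1
  fi
}

# Usage (read-only, on the jumphost):
#   scripts/phase-00-maas-standup.sh | drift_gate && echo "topology gate green"
```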

CHECK (read-only) -- jumphost -- bundle invariants

python3 scripts/provider-bundle-check.py bundle.yaml

Expect: PASS -- 11 charms public->provider-public, .4/.8/.12 VIP triples, all 4 chassis MACs present (incl openstack0).


EXIT GATE (phase-00 complete)

  • juju models shows no openstack; openstack0-3 all Ready; capi-mgmt DELETED.
  • 8_lbaas libvirt network gone; no host domain references it.
  • OSD vdb files 512 GiB blank (root:root, 600) on all four hosts.
  • Pattern A interfaces on all four: br-ex (OVS) STATIC .4.N; br-metal .8.N; br-internal .12.N; data / storage / replication .16/.32/.36.N.
  • phase-00-maas-standup.sh reports no drift; provider-bundle-check.py PASSes.
  • Clean slate ready for phase-01. The deploy uses ONE overlay (octavia-pki) -- NOT the vr0-dc0-testcloud overlay (its intent is folded into the hardened base bundle).

Next

phase-01 -- bundle deploy.