openstack-caracal-ipv4 / runbooks / v1-do-doc-03-destroy.md

v1 Do-Document 03 — Teardown / Pre-Deploy Cloud State

Status: First execution document of Batch B. Conditional execution — current cloud state determines whether to skip or run the existing teardown runbook.

Position in sequence: Runs after v1-do-doc-02-pki.md (overlay generated). Runs before v1-do-doc-04-deploy.md (juju deploy).

Type: Thin pointer to the still-authoritative runbooks/01-destroy-model.md. This doc adds the state-detection and routing logic; the destructive procedure itself lives in the existing runbook.

Cross-references:

  • D-017 (CAPI bootstrap cluster lifecycle) — every cycle is a full rebuild
  • D-018 (teardown strategy) — skip graceful, release MAAS directly
  • runbooks/01-destroy-model.md — the authoritative procedure (Phases A→D)

1. Purpose & scope

This document determines whether the existing cloud needs to be torn down before deploy, and routes to runbooks/01-destroy-model.md if so.

Three states are expected going into this step. The §2 script additionally flags anomalous Juju/MAAS combinations, which §3.3 handles:

| State | Description | Routing |
|---|---|---|
| Clean | Cloud already down. All 5 VMs MAAS-Ready, no openstack Juju model. | Skip to §3 verification → §4 acceptance |
| Dirty | Cloud up (any unit deployed, any Juju model present). | Execute runbook 01-destroy-model.md in full → return to §3 |
| Partial | Some intermediate state (model destroying, machines not all Ready, etc.). | Execute runbook 01 Phase D remediation; resolve before continuing |

For the current first Caracal cycle (post-2026-05-27): state is Clean (verified in v1-do-doc-01-prep.md §4.5 + §4.6). This document is effectively a one-step verification.

For future rebuild cycles: state will be Dirty after each Caracal deploy. This document then routes through the full runbook 01 procedure.

Out of scope:

  • The procedure itself — owned by runbooks/01-destroy-model.md
  • KVM snapshot capture (per D-017, full rebuild every cycle; pre-existing KVM snapshots remain on disk but are not refreshed)

2. State detection

export REPO="$HOME/openstack-caracal-ipv4"
cd "$REPO"

echo "=== Juju model state ==="
if juju models 2>/dev/null | grep -qE "^openstack(\*| )"; then
  JUJU_STATE="present"
  juju models | grep -E "^openstack(\*| )" || true
else
  JUJU_STATE="absent"
  echo "[OK] no 'openstack' model on current controller"
fi
echo "JUJU_STATE=$JUJU_STATE"
echo ""

echo "=== MAAS machine state ==="
export MAAS_PROFILE=$(maas list 2>/dev/null | awk 'NR==1 {print $1}')
if [ -z "$MAAS_PROFILE" ]; then
  echo "[FAIL] no MAAS profile logged in. Run 'maas login <profile> <url> <key>' first."
fi

READY_COUNT=$(maas "$MAAS_PROFILE" machines read 2>/dev/null \
  | python3 -c "
import json, sys
machines = json.load(sys.stdin)
targets = ['openstack0', 'openstack1', 'openstack2', 'openstack3', 'capi-mgmt']
ready = [m for m in machines if m.get('hostname') in targets and m.get('status_name') == 'Ready']
print(len(ready))
")

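echo "Cloud-target VMs in Ready state: ${READY_COUNT:-?} / 5"
if [ -z "$READY_COUNT" ]; then
  # MAAS query failed (no profile, or unparseable output); routes to [UNKNOWN] below.
  MAAS_STATE="unknown"
elif [ "$READY_COUNT" -eq 5 ]; then
  MAAS_STATE="all_ready"
elif [ "$READY_COUNT" -eq 0 ]; then
  MAAS_STATE="all_deployed_or_other"
else
  MAAS_STATE="partial"
fi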
echo "MAAS_STATE=$MAAS_STATE"
echo ""

echo "=== Routing decision ==="
case "$JUJU_STATE:$MAAS_STATE" in
  "absent:all_ready")
    echo "[CLEAN] Cloud is already torn down. Skip to §3 verification."
    ;;
  "present:all_ready")
    echo "[ANOMALY] Juju has the openstack model but MAAS shows machines Ready."
    echo "          Likely: model is in 'destroying' state. Run runbook 01 Phase D remediation."
    ;;
  "absent:all_deployed_or_other")
    echo "[ANOMALY] No Juju model, but machines are not Ready."
    echo "          Likely: someone else owns the machines, OR commissioning is mid-flight."
    echo "          Investigate before continuing."
    ;;
  "present:all_deployed_or_other")
    echo "[DIRTY] Cloud is up. Execute runbook 01-destroy-model.md in full."
    ;;
  *":partial")
    echo "[PARTIAL] Mixed MAAS state. Use runbook 01 Phase C (release loop) and Phase D verification."
    ;;
  *)
    echo "[UNKNOWN] state=$JUJU_STATE:$MAAS_STATE — investigate before continuing"
    ;;
esac

3. Execute (or verify) the teardown procedure

3.1 If state is CLEAN (current Caracal cycle)

No execution needed. Skip to §4 acceptance.

Optional sanity check (read-only):

echo "=== Confirm no 'openstack' model ==="
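# Same pattern as §2, so a model such as 'openstack-foo' is not a false positive.
if juju models 2>/dev/null | grep -E "^openstack(\*| )"; then
  echo "[FAIL] model still present"
else
  echo "[OK] no openstack model"
fi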

echo ""
echo "=== Confirm all 5 VMs Ready ==="
maas "$MAAS_PROFILE" machines read 2>/dev/null \
  | python3 -c "
import json, sys
machines = json.load(sys.stdin)
targets = ['openstack0', 'openstack1', 'openstack2', 'openstack3', 'capi-mgmt']
for m in machines:
    h = m.get('hostname', '')
    if h in targets:
        print(f'{h}: {m.get(\"status_name\", \"?\")}, owner={m.get(\"owner\") or \"(none)\"}')
"

echo ""
echo "=== Confirm OSD qcow2 files still exist (should be ~200 KiB each after wipe) ==="
ls -la /var/lib/libvirt/images/openstack[0-3]-1.qcow2 2>/dev/null \
  || echo "[NOTE] OSD qcow2 files not visible to current user; check from jumphost user if needed"
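The "~200 KiB each after wipe" expectation above can be turned into a pass/fail check. The following is only a sketch: the helper name `check_wiped` and the 1 MiB ceiling in the usage line are assumptions chosen for illustration, not values taken from the wipe protocol.

```shell
# Hypothetical check: flag any given file larger than a byte threshold.
# Usage: check_wiped <max_bytes> <file>...
check_wiped() {
  local max="$1" f size rc=0
  shift
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "[NOTE] not found or not readable: $f"
      rc=1
      continue
    fi
    # Apparent file size in bytes (the same figure 'ls -la' reports).
    size=$(( $(wc -c < "$f") ))
    if [ "$size" -le "$max" ]; then
      echo "[OK] $f: $size bytes"
    else
      echo "[FAIL] $f: $size bytes (expected <= $max)"
      rc=1
    fi
  done
  return "$rc"
}

# Usage sketch, with 1 MiB as an assumed ceiling for a freshly wiped image:
# check_wiped 1048576 /var/lib/libvirt/images/openstack[0-3]-1.qcow2
```

The function returns non-zero if any file is missing, unreadable, or over the threshold, so it can gate a later step.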

3.2 If state is DIRTY (future rebuild cycles)

Execute runbooks/01-destroy-model.md in full, in order:

  1. Phase A — Pre-destroy capture (~30 sec). Captures juju export-bundle, juju status, juju models, juju controllers to $HOME/backups/pre-caracal-destroy-<TS>/. Updates $HOME/.last-pre-caracal-destroy-backup pointer.

  2. Phase B — Force-destroy the Juju model. Returns in ~1-2 min; reaping continues for ~5-10 min. Command:

    juju destroy-model openstack --force --no-wait --destroy-storage --no-prompt
  3. Phase C — Release MAAS machines (parallel with Phase B; ~5 min). Either Path 1 (MAAS UI) or Path 2 (CLI loop). The CLI loop is filtered by owner — only releases machines you own.

  4. Phase D — Verification (~1 min). Confirms model is gone and all 5 VMs are Ready.
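The owner filter mentioned for Phase C can be sketched roughly as follows. This is an illustration, not the loop from runbooks/01-destroy-model.md: the function name `select_releasable` and the `MAAS_USER` variable are hypothetical, and the sketch assumes the `hostname`, `owner`, and `system_id` fields in the `maas <profile> machines read` output. It only prints system IDs; the release itself is left as a commented dry run.

```shell
# Hypothetical helper: print the system_id of each target machine owned by a user.
# Reads the JSON array produced by 'maas <profile> machines read' on stdin.
select_releasable() {
  local owner="$1"
  python3 -c '
import json, sys
owner = sys.argv[1]
targets = {"openstack0", "openstack1", "openstack2", "openstack3", "capi-mgmt"}
for m in json.load(sys.stdin):
    # Skip machines outside the target set or owned by someone else.
    if m.get("hostname") in targets and m.get("owner") == owner:
        print(m["system_id"])
' "$owner"
}

# Usage sketch (dry run; MAAS_USER is an assumed variable holding your MAAS username):
# maas "$MAAS_PROFILE" machines read | select_releasable "$MAAS_USER" \
#   | while read -r id; do
#       echo "would run: maas $MAAS_PROFILE machine release $id"
#     done
```

Selecting first and releasing second lets you review the list of system IDs before anything destructive runs.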

Critical pre-flight before Phase B: verify you are about to destroy the right model. The destruction is not undoable short of restoring KVM snapshots:

juju models
juju status --model openstack 2>/dev/null | head -20

Confirm the model name and the unit counts match what you expect to lose.
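Because the Phase B command passes --no-prompt, nothing on the Juju side will ask for confirmation. One way to make the pre-flight explicit is a typed-name guard, sketched below. The function name `confirm_model` is hypothetical and not part of runbook 01.

```shell
# Hypothetical guard: succeed only if the operator types the exact model name.
confirm_model() {
  local expected="$1" typed
  printf 'Type the model name to confirm destruction (%s): ' "$expected" >&2
  # Fail closed on EOF or an unreadable answer.
  IFS= read -r typed || return 1
  [ "$typed" = "$expected" ]
}

# Usage sketch:
# confirm_model openstack && echo "confirmed; proceed to Phase B" || echo "aborted"
```

The guard fails closed: any mismatch, or an empty answer, returns non-zero.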

3.3 If state is PARTIAL or ANOMALY

Use runbook 01 Phase D's remediation block (lines 161-173 of runbooks/01-destroy-model.md):

juju machines -m openstack --format=yaml 2>/dev/null

# For each lingering machine ID:
juju remove-machine -m openstack --force <id>

# Then re-attempt model removal:
juju destroy-model openstack --force --no-wait --no-prompt

If the model is still listed after the above, escalate — controller-side state may be corrupted (rare).
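Since reaping can continue for several minutes after the destroy command returns, it may help to poll for a bounded time before escalating. The sketch below assumes that approach. `wait_until` and `model_absent` are hypothetical helper names, and the 600 s / 15 s values in the usage line are illustrative.

```shell
# Hypothetical helper: retry a command until it succeeds or a timeout elapses.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
}

# Succeeds when no 'openstack' model is listed (same pattern as §2).
# Caveat: this also succeeds if 'juju models' itself fails.
model_absent() {
  ! juju models 2>/dev/null | grep -qE '^openstack(\*| )'
}

# Usage sketch: poll every 15 s for up to 10 min, then escalate.
# wait_until 600 15 model_absent || echo "[ESCALATE] model still listed"
```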


4. Acceptance criteria — go/no-go for v1-do-doc-04-deploy

Before proceeding to deploy:

  • juju models does NOT list openstack
  • All 5 VMs (openstack0, openstack1, openstack2, openstack3, capi-mgmt) report MAAS status Ready, owner (none)
  • If state was DIRTY: $HOME/.last-pre-caracal-destroy-backup exists and points to a populated backup directory
  • If OSD wipe is needed (rebuild after first Caracal cycle): verify the OSD qcow2 wipe procedure was executed per the 2026-05-22 protocol

If every applicable criterion is met, proceed to v1-do-doc-04-deploy.md.
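The backup-pointer criterion above could be checked along these lines. This is a sketch under a stated assumption: it treats the pointer as a plain-text file whose first line is the backup directory path. The source does not say whether the pointer is a text file or a symlink, so adjust accordingly. The function name `check_backup_pointer` is hypothetical.

```shell
# Hypothetical check: the pointer file names an existing, non-empty directory.
# Assumes the pointer is a plain-text file whose first line is the backup path.
check_backup_pointer() {
  local pointer="$1" dir=""
  if [ ! -f "$pointer" ]; then
    echo "[FAIL] pointer missing: $pointer"
    return 1
  fi
  # Read the first line; tolerate a missing trailing newline.
  IFS= read -r dir < "$pointer" || true
  if [ -z "$dir" ] || [ ! -d "$dir" ]; then
    echo "[FAIL] not a directory: ${dir:-<empty>}"
    return 1
  fi
  if [ -z "$(ls -A "$dir")" ]; then
    echo "[FAIL] backup directory is empty: $dir"
    return 1
  fi
  echo "[OK] backup at $dir"
}

# Usage sketch:
# check_backup_pointer "$HOME/.last-pre-caracal-destroy-backup"
```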


5. Roosevelt deltas (forward-look)

| Aspect | Testcloud (v1) | Roosevelt |
|---|---|---|
| Teardown target | 5 KVM VMs on jumphost | Bare-metal MAAS-managed servers |
| Phase C release | libvirt VM owned by current MAAS user | Bare-metal owned by current MAAS user |
| OSD qcow2 wipe | Yes (libvirt secondary disks survive MAAS release) | No (real disks; MAAS commissioning wipes them) |
| Backup directory | $HOME/backups/pre-caracal-destroy-*/ | Same convention, on bastion |

6. Change log

| Date | Change | Reference |
|---|---|---|
| 2026-05-27 | Document created. Thin pointer to runbooks/01-destroy-model.md with state-detection routing for clean/dirty/partial cases. | Batch B drafting |