
Runbook 00 — Pre-Deploy

Purpose

Prepare for a clean Caracal rebuild of the VR0 DC0 Omega Cloud. Capture all state needed for rollback, gracefully tear down dependent workloads, and verify the destination environment is ready before destroying the existing OpenStack model.

Prerequisites

  • SSH access to jumphost vopenstack-jesse as jessea123
  • admin-openrc and user1-openrc available in $HOME
  • Access to the Juju controller hosting the openstack model
  • Access to the capi-mgmt.maas k3s cluster (kubeconfig present)
  • NetBox IPv4 imports completed (per netbox/ipv4-prefixes-import.py)
  • NetBox VLAN imports completed (per netbox/vlans-import.py)

Phase 1 — Verify NetBox readiness (gating)

Run both NetBox import scripts in verify-only mode. Confirm that every prefix and VLAN is correctly scoped to VR0 DC0.

cd ~/vr0-dc0-caracal
NETBOX_URL=https://netbox.baldurkeep.com NETBOX_TOKEN=<token> \
  python3 netbox/ipv4-prefixes-import.py --verify-only
NETBOX_URL=https://netbox.baldurkeep.com NETBOX_TOKEN=<token> \
  python3 netbox/vlans-import.py --verify-only

Expected: all prefixes and VLANs report scope-OK, no MISSING entries.
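
The expected result above can be turned into a hard gate. The sketch below assumes the scripts print the literal tokens scope-OK and MISSING in their output, as described above; the real output format is not shown in this runbook, so the patterns may need adjusting. The function name netbox_gate and the log file path are hypothetical.

```shell
# Hypothetical helper: fail if a captured --verify-only log contains MISSING,
# or contains no scope-OK line at all. Assumes those literal tokens appear in
# the script output; adjust the patterns if the real format differs.
netbox_gate() {
  local log="$1"
  if grep -q 'MISSING' "$log"; then
    echo "NO-GO: MISSING entries in $log" >&2
    return 1
  fi
  if ! grep -q 'scope-OK' "$log"; then
    echo "NO-GO: no scope-OK lines in $log" >&2
    return 1
  fi
  echo "OK: $log"
}
```

To use it, pipe each verify run through tee into a log file (for example /tmp/prefixes-verify.log), then run netbox_gate on that file and stop if it returns non-zero.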

Phase 2 — Capture current state

Backups needed for potential rollback:

# Create today's backup directory (every command below writes into it)
mkdir -p ~/backups/$(date +%F)

# Vault root CA cert
juju ssh --model openstack vault/0 -- sudo cat /var/snap/vault/common/vault.crt > ~/backups/$(date +%F)/vault-root-ca.crt
# Unseal keys MUST be on file from the initial Vault setup; verify presence
ls -la ~/.vault-keys

# Export current bundle
juju export-bundle --model openstack > ~/backups/$(date +%F)/bundle-pre-rebuild.yaml

# Snapshot of current 'juju status'
juju status --model openstack --format=yaml > ~/backups/$(date +%F)/juju-status-pre-rebuild.yaml

# Inventory of FIPs and tenant resources we might want to recreate
source ~/admin-openrc
openstack floating ip list -c "Floating IP Address" -c "Fixed IP Address" \
  -c "Project" -f csv > ~/backups/$(date +%F)/floating-ips.csv
openstack server list --all-projects -c ID -c Name -c Project -c Status -f csv \
  > ~/backups/$(date +%F)/servers.csv
openstack network list --long -c ID -c Name -c Project -f csv \
  > ~/backups/$(date +%F)/networks.csv
openstack loadbalancer list -c id -c name -c project_id -c vip_address -f csv \
  > ~/backups/$(date +%F)/loadbalancers.csv
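
Because a failed command still leaves an empty file behind its redirect, it is worth checking the artifacts before moving on. The sketch below only tests that each file named in the commands above exists and is non-empty; the function name check_backups is hypothetical, and a CSV containing only a header row will still pass.

```shell
# Hypothetical helper: confirm every expected Phase 2 artifact exists and is
# non-empty in the given backup directory. File names match the commands above.
check_backups() {
  local dir="$1" rc=0 f
  for f in vault-root-ca.crt bundle-pre-rebuild.yaml juju-status-pre-rebuild.yaml \
           floating-ips.csv servers.csv networks.csv loadbalancers.csv; do
    if [ -s "$dir/$f" ]; then
      echo "ok       $f"
    else
      echo "MISSING  $f" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Run it as check_backups ~/backups/$(date +%F) and treat a non-zero exit as a no-go.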

Phase 3 — KVM snapshots of openstack0-3

From the jumphost (which is the hypervisor):

for vm in openstack0 openstack1 openstack2 openstack3; do
  sudo virsh snapshot-create-as --domain "$vm" \
    --name "pre-caracal-rebuild-$(date +%F)" \
    --description "Pre-Caracal rebuild baseline" \
    --atomic
done
for vm in openstack0 openstack1 openstack2 openstack3; do
  sudo virsh snapshot-list "$vm"
done

These snapshots are the disaster-recovery point.
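
Since these snapshots are the recovery point, a scripted check that all four VMs actually carry the named snapshot is safer than reading the listings by eye. This is a minimal sketch: the function name verify_snapshots and the VIRSH override variable are hypothetical, and it relies on virsh snapshot-list --name printing one snapshot name per line.

```shell
# Hypothetical helper: confirm each VM has a snapshot with the expected name.
# VIRSH can be overridden (for example for a dry run); it defaults to "sudo virsh".
verify_snapshots() {
  local snap="$1" rc=0 vm
  for vm in openstack0 openstack1 openstack2 openstack3; do
    # --name prints snapshot names only; awk compares the first field exactly
    if ${VIRSH:-sudo virsh} snapshot-list --domain "$vm" --name \
        | awk -v s="$snap" '$1 == s { found = 1 } END { exit !found }'; then
      echo "ok       $vm"
    else
      echo "MISSING  $vm" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Call it with the snapshot name used above, for example verify_snapshots "pre-caracal-rebuild-$(date +%F)", on the same day the snapshots were taken.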

Phase 4 — Graceful CAPI workload teardown (D-013)

Delete the CAPI workload cluster cleanly so its OpenStack resources (LBs, FIPs, volumes, Octavia members) are released by CAPI controllers before model destroy.

export KUBECONFIG=~/magnum-capi/phase3/capi-mgmt-cluster.kubeconfig
# (Adjust path if kubeconfig has moved)

# Delete the workload cluster; CAPI handles tenant OpenStack cleanup.
# --wait=false returns immediately so the explicit wait below enforces the timeout.
kubectl delete cluster capi-mgmt-cluster -n default --wait=false
# Wait for finalizers; this may take ~10 minutes
kubectl wait --for=delete cluster/capi-mgmt-cluster -n default --timeout=15m

Verify on the OpenStack side that resources were released:

source ~/admin-openrc
openstack server list --all-projects | grep -i capi || echo "No CAPI servers remaining"
openstack loadbalancer list | grep -i capi || echo "No CAPI LBs remaining"
openstack floating ip list -c "Floating IP Address" -c "Fixed IP Address" -f csv
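
The commands above do not cover volumes, which are also expected to be released. The sketch below extends the same name-based check to any CSV listing; the function name no_capi_rows is hypothetical. Like the grep commands above, it assumes leftover resources carry "capi" in a listed field, which may not hold for every resource (volumes provisioned through Cinder CSI, for example, are typically named pvc-<uuid>), so review the full listing as well.

```shell
# Hypothetical helper: read "openstack ... -f csv" output on stdin and fail
# if any data row mentions "capi" (case-insensitive). The header row is skipped.
no_capi_rows() {
  local label="$1" hits
  hits=$(tail -n +2 | grep -i 'capi' || true)
  if [ -n "$hits" ]; then
    echo "ORPHANED $label:" >&2
    echo "$hits" >&2
    return 1
  fi
  echo "No CAPI $label remaining"
}
```

For volumes, this could be run as: openstack volume list --all-projects -c ID -c Name -c Status -f csv | no_capi_rows volumes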

Phase 5 — Preserve capi-mgmt.maas itself

The bootstrap k3s + CAPI controllers on capi-mgmt.maas are NOT destroyed — they will be re-used post-rebuild as the Magnum CAPI mgmt plane. Verify the controllers are still healthy:

ssh capi-mgmt.maas -- sudo kubectl --kubeconfig /etc/rancher/k3s/k3s.yaml \
  get pods -A

Confirm:

  • capi-system namespace pods Running
  • capo-system (CAPI OpenStack provider) pods Running
  • cert-manager pods Running
  • orc-system (OpenStack Resource Controller) pods Running
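
The four checks above can be scripted against the pod listing. This is a sketch only: the function name check_capi_pods is hypothetical, the cert-manager namespace name is an assumption (the other three namespaces are taken from the list above), and it treats any status other than Running as a failure, including Completed.

```shell
# Hypothetical helper: read "kubectl get pods -A --no-headers" on stdin and
# fail if a required namespace has no pods or has a pod not in Running state.
# Columns are NAMESPACE NAME READY STATUS RESTARTS AGE.
check_capi_pods() {
  awk -v want="capi-system capo-system cert-manager orc-system" '
    BEGIN {
      n = split(want, ns, " ")
      for (i = 1; i <= n; i++) need[ns[i]] = 1
    }
    ($1 in need) {
      seen[$1]++
      if ($4 != "Running") {
        printf "NOT RUNNING %s/%s (%s)\n", $1, $2, $4
        bad = 1
      }
    }
    END {
      for (k in need) {
        if (!(k in seen)) { printf "NO PODS in %s\n", k; bad = 1 }
      }
      exit (bad ? 1 : 0)
    }'
}
```

To use it, add --no-headers to the get pods command above and pipe its output into check_capi_pods.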

Phase 6 — Final go/no-go checklist

Do not proceed to runbooks/01-destroy-model.md until all of the following pass:

  • NetBox verification clean
  • Vault unseal keys backed up and verified readable
  • bundle-pre-rebuild.yaml exists and is non-empty
  • juju-status-pre-rebuild.yaml exists and captures the model state as it was before destroy
  • All four KVM snapshots created (virsh snapshot-list confirms)
  • CAPI workload cluster deletion completed (kubectl get cluster returns
    "no resources found")
  • OpenStack-side resources from CAPI workload are released (no orphaned LBs,
    FIPs, volumes)
  • capi-mgmt.maas k3s cluster controllers all Running

Notes

  • Snapshot disk space consumption can grow significantly during the rebuild window. Verify free space on /var/lib/libvirt/images prior to running the rebuild deploy.
  • If Vault unseal keys cannot be located, STOP. Re-initializing Vault without the original keys invalidates previously issued certificates and destroys access to any data sealed under the existing root key. Confirm that the keys are present and readable before model destroy.
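
The free-space check from the first note can be scripted as follows. This is a minimal sketch: the function name check_free_gib is hypothetical, and the 50 GiB default is a placeholder rather than a sizing recommendation for this environment.

```shell
# Hypothetical helper: print free space in GiB for a path and fail if it is
# below a threshold. The 50 GiB default is a placeholder, not a sizing guideline.
check_free_gib() {
  local path="$1" min="${2:-50}" free_kb free_gib
  # -P forces single-line POSIX output; -k reports 1024-byte blocks
  free_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
  free_gib=$((free_kb / 1024 / 1024))
  echo "$path: ${free_gib} GiB free (minimum ${min})"
  [ "$free_gib" -ge "$min" ]
}
```

For example, check_free_gib /var/lib/libvirt/images 100 fails unless at least 100 GiB is available on the filesystem that holds the VM images.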