v1 Do-Document 01 — Pre-Deploy State Check

Status: First execution document of Batch A. Pure read-only verification. No cloud state is mutated.

Position in sequence: Runs after docs/v1-pre-deploy-fixes.md (the repo-hygiene fixes) have been committed and pushed. Runs before v1-do-doc-02-pki.md.

Replaces: runbooks/deprecated/00-pre-deploy.md (which contained references to D-013 graceful CAPI teardown, capi-mgmt preservation, and netbox/vlans-import.py — all superseded or removed).

Cross-references:

  • D-014 (repo path) — verifies clone matches
  • D-015 (v1/v2 fork) — v1 scope confirmed
  • D-017 (CAPI bootstrap cluster lifecycle) — every cycle is a full rebuild; no preservation
  • D-018 (teardown strategy) — MAAS-release-direct; pre-existing teardown verified

1. Purpose & scope

This document confirms that the jumphost, repo, openrc files, Juju controller, and MAAS state are all in the expected pre-deploy posture before any cloud-touching execution begins.

What this document does:

  • Verifies jumphost identity and shell context
  • Verifies repo cloned at $HOME/openstack-caracal-ipv4 with credential helper configured
  • Verifies the repo is on main and pulled up to date (so the pre-deploy fixes from docs/v1-pre-deploy-fixes.md are local)
  • Verifies admin and user1 openrc files exist and source cleanly
  • Verifies the Juju controller for the openstack model is reachable
  • Verifies all 5 cloud-target VMs are MAAS-Ready
  • Explicitly acknowledges what is NOT in scope for this prep phase

What this document does NOT do:

  • Run NetBox imports (pinned for external NetBox-engineer review; not part of v1 deploy flow per netbox/README.md)
  • Take KVM snapshots (per D-017, every cycle is a full rebuild; no rollback target needed; pre-existing snapshots remain as a safety net but are not refreshed here)
  • Back up Vault unseal keys (the prior-cloud Vault keys are accepted lost per the Caracal_Rebuild handoff; the Caracal deploy will reinit Vault from scratch)
  • Perform a graceful CAPI workload teardown (superseded by D-018)
  • Back up juju export-bundle output (the canonical bundle is bundle.yaml in the repo; export is for diagnostics only and runs only if the cloud is currently up)

Out of scope:

  • Bundle review (done; tracked in docs/v1-pre-deploy-fixes.md)
  • PKI generation (next document, v1-do-doc-02-pki.md)
  • Cloud destroy (runbooks/01-destroy-model.md; pre-existing teardown likely already done; verify-only path used by v1-do-doc-03)

2. Decisions captured

| Decision | Choice | Roosevelt parallel |
| --- | --- | --- |
| Snapshot strategy for this rebuild | None taken here; existing KVM snapshots remain as safety net but per D-017 every cycle is a full rebuild | N/A (Roosevelt is bare-metal); equivalent is MAAS re-deploy of all hosts |
| Vault key backup | Skipped; prior keys accepted lost | Vault will be re-initialized from scratch each Roosevelt rebuild too |
| NetBox import sequence | Out of scope; pinned external | Roosevelt's NetBox state will be set up before the bundle pulls values |
| State capture | Skipped for this Caracal cycle (cloud is already down) | On Roosevelt, capture juju export-bundle and juju status BEFORE the next teardown cycle |

3. Prerequisites

| Prereq | Verification command |
| --- | --- |
| You are SSH'd into jumphost vopenstack-jesse as jessea123 | `hostname && id -un` |
| $HOME is writable | `test -w "$HOME" && echo OK` |
| git, juju, maas, openssl, and python3 available on PATH | `for t in git juju maas openssl python3; do if ! command -v "$t"; then echo "MISSING: $t"; fi; done` |
| The openstack snap is installed (used in later runbooks; not strictly needed here) | `if ! snap list openstackclients 2>/dev/null; then echo "(not installed; v1-do-doc-04 will check)"; fi` |

If any prereq fails, stop and resolve before continuing.


4. Step-by-step state check

Each section is a copy-paste block. The blocks are read-only. None of them calls exit, so they are safe to paste into an interactive shell; failures are reported as text and the operator decides whether to proceed.
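
One way to keep the "report as text, never exit" convention uniform is a small comparison helper. This is a minimal sketch, not something taken from the repo: the name expect_eq and its output format are assumptions made for illustration.

```shell
# Hypothetical helper (not in the repo): compare an observed value against
# the expected one and report the result as text. It never calls exit, so it
# is safe to paste into an interactive shell.
expect_eq() {
  # $1 = label, $2 = expected value, $3 = observed value
  if [ "$2" = "$3" ]; then
    echo "[OK] $1: $3"
  else
    echo "[FAIL] $1: expected '$2', got '$3'"
  fi
}

# Example with literal values; in practice the third argument would be a
# command substitution such as "$(git branch --show-current)".
expect_eq "branch" "main" "main"
expect_eq "VIP count" "12" "11"
```

The operator still reads every line and decides whether to continue; the helper only makes mismatches easier to spot.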

4.1 Jumphost identity

echo "=== Jumphost identity ==="
hostname
id -un
date -Iseconds
echo "=== Disk space ==="
df -h "$HOME" /var/lib/libvirt/images 2>/dev/null

Expected:

  • hostname: vopenstack-jesse (or your configured equivalent)
  • id -un: jessea123
  • $HOME free space: at least 5 GB (octavia PKI, deploy logs, kubeconfigs all live here)
  • /var/lib/libvirt/images free space: enough for the existing OSD qcow2 files (already trimmed to ~200 KiB each per teardown)
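
The 5 GB threshold for $HOME can also be checked numerically instead of by reading the df -h output. The sketch below assumes the threshold is meant in GiB (1024-based) and uses a hypothetical function name, free_space_check; neither comes from the runbook. It relies on POSIX `df -Pk`, whose fourth column is the available space in 1024-byte blocks.

```shell
# Hypothetical helper: report whether a path has at least MIN free GiB.
# The function body runs in a subshell so no variables leak into the
# interactive shell.
free_space_check() (
  # $1 = path, $2 = minimum free space in GiB (integer; GiB is an assumption)
  avail_kb=$(df -Pk "$1" 2>/dev/null | awk 'NR==2 {print $4}')
  if [ -z "$avail_kb" ]; then
    echo "[FAIL] $1: could not read free space"
  elif [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]; then
    echo "[OK] $1: $(( avail_kb / 1024 / 1024 )) GiB free (need $2)"
  else
    echo "[FAIL] $1: $(( avail_kb / 1024 / 1024 )) GiB free (need $2)"
  fi
)

free_space_check "$HOME" 5
```

No numeric threshold is given for /var/lib/libvirt/images, so that path is left to operator judgment.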

4.2 Repository clone present at agreed path

export REPO="$HOME/openstack-caracal-ipv4"

if [ ! -d "$REPO/.git" ]; then
  echo "MISSING: $REPO does not contain a git checkout."
  echo "Run the agreed clone procedure (HTTPS clone with 72h credential cache) first:"
  echo "  cd \$HOME && git clone https://git.baldurkeep.com/git/OpenStack/openstack-caracal-ipv4.git"
  echo "  cd $REPO && git config credential.helper \"cache --timeout=259200\""
  echo ""
  echo "Stop here. Re-run this section after the clone."
else
  echo "[OK] Repo present at: $REPO"
  cd "$REPO"
  echo ""
  echo "=== Current branch ==="
  git branch --show-current
  echo ""
  echo "=== Remote URL ==="
  git remote get-url origin
  echo ""
  echo "=== Credential helper ==="
  git config --get credential.helper || echo "(no credential helper configured)"
  echo ""
  echo "=== Working tree status ==="
  git status --short
  echo ""
  echo "=== Latest commit ==="
  git log --oneline -1
fi

Expected:

  • Branch: main
  • Remote URL: https://git.baldurkeep.com/git/OpenStack/openstack-caracal-ipv4.git
  • Credential helper: cache --timeout=259200 (or however configured)
  • Working tree status: empty (clean)
  • Latest commit: should reflect the most recent pre-deploy-fixes commit (the one that moved 8 runbooks to deprecated/)

If the branch is not main, the working tree is dirty, or the local checkout is behind the remote, stop and reconcile before continuing.
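
The block in 4.2 prints the latest local commit but does not contact the remote, so "behind the remote" has to be checked separately. A minimal sketch of one read-only way to do that follows, assuming the remote is named origin and the branch is main. `git ls-remote` queries the remote without fetching or updating any local refs. The function name remote_tip_check is hypothetical.

```shell
# Hypothetical helper: compare local HEAD with the tip of a remote branch.
# Runs in a subshell, so the cd does not affect the interactive shell.
remote_tip_check() (
  # $1 = repo directory, $2 = remote name, $3 = branch name
  cd "$1" 2>/dev/null || { echo "[FAIL] cannot cd to $1"; exit 0; }
  local_sha=$(git rev-parse HEAD 2>/dev/null)
  remote_sha=$(git ls-remote "$2" "refs/heads/$3" 2>/dev/null | awk 'NR==1 {print $1}')
  if [ -z "$local_sha" ] || [ -z "$remote_sha" ]; then
    echo "[FAIL] could not resolve local HEAD or $2/$3"
  elif [ "$local_sha" = "$remote_sha" ]; then
    echo "[OK] local HEAD matches $2/$3 ($local_sha)"
  else
    echo "[FAIL] local HEAD $local_sha differs from $2/$3 $remote_sha"
  fi
)

remote_tip_check "${REPO:-$HOME/openstack-caracal-ipv4}" origin main
```

A mismatch only says that the two tips differ; the local branch could be behind or ahead. Over HTTPS the command may prompt for credentials if the cache has expired.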

4.3 Pre-deploy fixes are present locally

cd "$REPO"

echo "=== ceph-osd block (after pre-deploy fix): should NOT contain a 'storage:' line ==="
grep -A 12 "^  ceph-osd:" bundle.yaml
echo ""

echo "=== Verify ceph-osd has no storage block ==="
grep -A 12 "^  ceph-osd:" bundle.yaml | grep "^    storage:" \
  && echo "[FAIL] storage block still present under ceph-osd" \
  || echo "[OK] no storage block in ceph-osd"
echo ""

echo "=== expected-osd-count on ceph-mon: should be 4 ==="
grep -A 8 "^  ceph-mon:" bundle.yaml | grep "expected-osd-count"
echo ""

echo "=== VIP grep: should return 12 ==="
grep -cE "^[[:space:]]+vip: 10\.12\.4\." bundle.yaml
echo ""

echo "=== D-002 Vault row dedup: dedicated row remains, OS-core row no longer mentions vault ==="
grep -c "Vault.*1.8/stable" docs/design-decisions.md
grep "magnum, vault)" docs/design-decisions.md && echo "[FAIL] vault still in OS-core row" || echo "[OK] vault not in OS-core row"
echo ""

echo "=== D-014 repo path: should be OpenStack/openstack-caracal-ipv4 ==="
grep -A 3 "^## D-014" docs/design-decisions.md | grep "Repo path:"
echo ""

echo "=== Deprecated runbooks moved: count should be 8 (or 9 with README) ==="
ls runbooks/deprecated/ 2>/dev/null | wc -l

Expected:

  • ceph-osd block shows options.osd-devices: /dev/vdb and NO storage: block
  • storage: count under ceph-osd: 0
  • expected-osd-count: 4
  • VIP grep returns 12
  • D-002: one Vault.*1.8/stable match; OS-core row does NOT mention vault
  • D-014: shows OpenStack/openstack-caracal-ipv4
  • runbooks/deprecated/ contains 8 runbooks + 1 README

If any of these don't match, the pre-deploy fixes did not land correctly. Stop and reconcile (re-pull, re-apply missing commits).

4.4 Openrc files present and source cleanly

echo "=== admin openrc ==="
if [ -f "$HOME/admin-openrc" ]; then
  echo "[OK] $HOME/admin-openrc exists"
  ( source "$HOME/admin-openrc"; \
    env | grep -E "^OS_" | grep -v PASSWORD | sort )
else
  echo "[MISSING] $HOME/admin-openrc"
fi

echo ""
echo "=== user1 openrc ==="
if [ -f "$HOME/user1-openrc" ]; then
  echo "[OK] $HOME/user1-openrc exists"
  ( source "$HOME/user1-openrc"; \
    env | grep -E "^OS_" | grep -v PASSWORD | sort )
else
  echo "[MISSING] $HOME/user1-openrc"
fi

Expected:

  • Both files exist and contain OS_AUTH_URL, OS_USERNAME, OS_PROJECT_NAME, OS_USER_DOMAIN_NAME, OS_PROJECT_DOMAIN_NAME, OS_IDENTITY_API_VERSION, OS_REGION_NAME (and OS_PASSWORD, which is filtered out of the display).
  • The subshell pattern `( source ...; env ... )` keeps the openrc variables out of your interactive shell environment.

Note: these openrc files target the prior Bobcat cloud (or whatever was last running). They will NOT work against the Caracal cloud until that cloud is up. They're verified here as a sanity check that the files exist for the operator. New openrc files will be generated as part of v1-do-doc-09-tenant.md (Batch D) using the new Caracal endpoints.
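
The variable list under Expected can also be checked mechanically rather than by eye. The following is a sketch under assumptions: the function name openrc_vars_check is hypothetical, and "sources cleanly" is taken to mean that every listed variable is non-empty after sourcing. It keeps the subshell pattern and never prints a value, so OS_PASSWORD stays off the screen.

```shell
# Hypothetical helper: source an openrc in a subshell and report which of the
# required OS_* variables are still unset. Values are never printed.
openrc_vars_check() (
  # $1 = path to an openrc file
  required="OS_AUTH_URL OS_USERNAME OS_PASSWORD OS_PROJECT_NAME
    OS_USER_DOMAIN_NAME OS_PROJECT_DOMAIN_NAME
    OS_IDENTITY_API_VERSION OS_REGION_NAME"
  if [ ! -f "$1" ]; then
    echo "[MISSING] $1"
    exit 0
  fi
  # Clear inherited values so a stale environment cannot mask a gap.
  unset $required
  . "$1" >/dev/null 2>&1
  missing=""
  for v in $required; do
    eval "val=\${$v:-}"
    [ -n "$val" ] || missing="$missing $v"
  done
  if [ -z "$missing" ]; then
    echo "[OK] $1: all required OS_* variables set"
  else
    echo "[FAIL] $1: unset:$missing"
  fi
)

openrc_vars_check "$HOME/admin-openrc"
openrc_vars_check "$HOME/user1-openrc"
```

This only confirms that the variables are present; it does not attempt to authenticate against any cloud.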

4.5 Juju controller reachable

echo "=== Juju controllers ==="
juju controllers
echo ""
echo "=== Current controller ==="
juju show-controller 2>/dev/null | head -20
echo ""
echo "=== Models on current controller ==="
juju models

Expected:

  • At least one controller is listed.
  • The controller should be in Available state.
  • The openstack model may or may not exist depending on whether the prior teardown completed. Per the Caracal_Rebuild handoff (2026-05-27 verification), it does not exist — that's expected post-teardown.

If juju controllers errors, the controller is unreachable. Stop and resolve juju access before continuing.

4.6 MAAS state of cloud-target VMs

echo "=== MAAS profile ==="
maas list | head -5
export MAAS_PROFILE=$(maas list | awk 'NR==1 {print $1}')
echo "Using MAAS_PROFILE=$MAAS_PROFILE"
echo ""

echo "=== Cloud-target VMs (openstack0-3 + capi-mgmt): expect 5 in Ready ==="
maas "$MAAS_PROFILE" machines read 2>/dev/null \
  | python3 -c "
import json, sys
machines = json.load(sys.stdin)
targets = ['openstack0', 'openstack1', 'openstack2', 'openstack3', 'capi-mgmt']
print(f'{\"hostname\":<15} {\"status\":<15} {\"owner\":<15} {\"system_id\":<25}')
print('-' * 70)
seen = []
for m in machines:
    h = m.get('hostname', '')
    if h in targets:
        seen.append(h)
        status = m.get('status_name', '')
        owner = m.get('owner', '') or '(none)'
        sid = m.get('system_id', '')
        print(f'{h:<15} {status:<15} {owner:<15} {sid:<25}')
missing = [t for t in targets if t not in seen]
if missing:
    print(f'\nMISSING from MAAS: {missing}')
"

Expected:

  • All 5 hostnames appear: openstack0, openstack1, openstack2, openstack3, capi-mgmt
  • Status for all 5: Ready
  • Owner for all 5: (none) (unowned, ready for deploy)

If any VM is not Ready or has an owner, stop. The teardown either did not complete or something has acquired the VMs since.
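
The script in 4.6 prints a table and leaves the verdict to the operator. As a possible complement, the sketch below reads the same JSON on stdin and prints one line per target plus an overall GO or NO-GO. It uses only the fields the script above already reads (hostname, status_name, owner). The function name maas_ready_verdict and the output wording are assumptions for illustration.

```shell
# Hypothetical helper: read `maas "$MAAS_PROFILE" machines read` JSON on stdin
# and print a verdict per target VM. The Python exits 0 on valid JSON even
# when the verdict is NO-GO; the result is reported as text only.
#
# Intended use (not executed here):
#   maas "$MAAS_PROFILE" machines read 2>/dev/null | maas_ready_verdict
maas_ready_verdict() {
  python3 -c '
import json, sys

targets = ["openstack0", "openstack1", "openstack2", "openstack3", "capi-mgmt"]
by_host = {m.get("hostname", ""): m for m in json.load(sys.stdin)}
problems = 0
for t in targets:
    m = by_host.get(t)
    if m is None:
        print(f"[FAIL] {t}: missing from MAAS")
        problems += 1
        continue
    status = m.get("status_name", "")
    owner = m.get("owner") or "(none)"
    if status == "Ready" and owner == "(none)":
        print(f"[OK] {t}: Ready, unowned")
    else:
        print(f"[FAIL] {t}: status={status} owner={owner}")
        problems += 1
print("GO" if problems == 0 else f"NO-GO ({problems} problem(s))")
'
}
```

If the maas command itself fails, stdin is empty and Python raises a JSON decode error; treat that as a failed check.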

4.7 Acknowledged "not done here" items

This step prints a checklist for the operator to mentally acknowledge before proceeding:

cat <<'EOF'
=== Items NOT done in this prep phase (acknowledged) ===

[ ] NetBox imports — pinned for external NetBox-engineer review; not blocking
    v1 deploy. NetBox state is whatever the engineer has set up.

[ ] KVM snapshots — per D-017, every cycle is a full rebuild; no rollback
    target needed for this deploy. Pre-existing KVM-level snapshots (from
    prior cycles) remain on disk as a safety net but are not refreshed here.

[ ] Vault unseal key backup — prior Vault keys are accepted lost per the
    Caracal_Rebuild handoff. Caracal deploy will run `vault operator init`
    fresh in v1-do-doc-05.

[ ] Graceful CAPI workload teardown — D-018 supersedes D-013; teardown is
    MAAS-release-direct, not graceful. The teardown that happened pre-2026-05-27
    used the D-018 path.

[ ] juju export-bundle / juju status capture — the cloud is currently down,
    so there is nothing to capture. On future cycles where the cloud is up
    pre-teardown, those captures happen in runbook 01 Phase A.

Proceed only if every item above is acknowledged.
EOF

5. Acceptance criteria — go/no-go for v1-do-doc-02

The following must all be true before proceeding:

  • §4.1: Identity correct, disk space sufficient
  • §4.2: Repo cloned at $HOME/openstack-caracal-ipv4, on main, clean, with credential helper configured
  • §4.3: All six pre-deploy fixes verified present in the local checkout (storage block removed, expected-osd-count=4, VIPs=12, D-002 dedup, D-014 path, 8 deprecated files)
  • §4.4: Both openrc files exist and source cleanly (content of openrc not yet validated against new cloud — that's v1-do-doc-04+)
  • §4.5: Juju controller reachable
  • §4.6: All 5 cloud-target VMs MAAS-Ready, unowned
  • §4.7: "Not done here" items all acknowledged

If all items are checked, proceed to v1-do-doc-02-pki.md.


6. Roosevelt deltas (forward-look)

| Aspect | Testcloud (v1) | Roosevelt |
| --- | --- | --- |
| Jumphost | vopenstack-jesse KVM VM | TBD; likely a bastion VM or operator workstation |
| Cloud target VMs | 5 KVM VMs in libvirt on this jumphost | Bare-metal MAAS-managed servers at the Roosevelt site |
| MAAS profile name | Single profile on the local MAAS | May be a Roosevelt-specific MAAS instance |
| openrc files | Manually-maintained $HOME/admin-openrc, $HOME/user1-openrc | Vault-issued or app-credential-based with rotation |
| KVM snapshots | Optional safety net | N/A; equivalent is MAAS re-deploy |
| Vault unseal keys | Generated fresh each Caracal cycle | Managed via Vault's own backup mechanism |

7. Change log

| Date | Change | Reference |
| --- | --- | --- |
| 2026-05-27 | Document created. Replaces stale runbook 00 (D-013 graceful teardown + capi-mgmt preservation, both superseded). | Batch A drafting |