openstack-caracal-ipv4 / runbooks / v1-do-doc-04-deploy.md

v1 Do-Document 04 — Caracal Bundle Deploy

Status: Second execution document of Batch B. First cloud-mutating step in the v1 deploy sequence. Triggers MAAS provisioning of 4 hosts, LXD container creation, charm installation, and the initial relation cascade.

Position in sequence: Runs after v1-do-doc-03-destroy.md (state confirmed Clean: 5 VMs Ready, no openstack model). Runs before v1-do-doc-05-vault-init.md (manual Vault init).

Cross-references:

  • D-001 (Path 2A: Juju-bundle paradigm)
  • D-002 (channel matrix)
  • D-006 (Vault HA backend; etcd + easyrsa bootstrap)
  • D-007 (Magnum Layer A only at this step; Layer B is Batch C)
  • D-017 (CAPI bootstrap cluster lifecycle; not touched in this doc — that's Batch C)
  • bundle.yaml (canonical deploy artifact)
  • overlays/octavia-pki.yaml (gitignored; from v1-do-doc-02)

1. Purpose & scope

Execute the Caracal-bundle deploy and watch the model settle to a known-incomplete state: every charm reaches active/idle EXCEPT those waiting on vault:certificates, which sit in blocked: 'certs not present yet' until Vault is initialized in v1-do-doc-05.

What this document does:

  • juju add-model openstack
  • juju deploy ./bundle.yaml --overlay overlays/octavia-pki.yaml
  • Watch the model settle (~60-90 minutes typical for this testcloud size)
  • Verify Octavia received the PKI material on disk (the explicit on-disk verification — §8)
  • Verify expected pre-Vault state: Vault charm blocked awaiting init; cert-dependent charms blocked awaiting Vault; non-cert-dependent charms active/idle

What this document does NOT do:

  • Initialize Vault (next doc, v1-do-doc-05)
  • Run any post-Vault functional verification of Octavia LB (deferred to v1-do-doc-02 §14, executed after v1-do-doc-05 completes)
  • Magnum Keystone domain setup (Batch C, v1-do-doc-06)
  • CAPI bootstrap or Magnum driver graft (Batch C)
  • Tenant resources (Batch D)

Out of scope:

  • KVM snapshot capture (per D-017, snapshots are not refreshed each cycle; pre-existing snapshots remain as last-resort safety net)
  • NetBox VIP IPAddress writes (pinned post-deploy for external NetBox-engineer review; D-010 relaxed for v1)

2. Decisions captured

| Decision | Choice | Notes |
| --- | --- | --- |
| Model name | openstack | Matches runbooks/01-destroy-model.md Phase B target |
| Deploy command | juju deploy ./bundle.yaml --overlay overlays/octavia-pki.yaml | One overlay; vr0-dc0-testcloud.yaml was a placeholder and is empty per pre-deploy review |
| --trust flag | Not used | Standard OpenStack charms on MAAS do not require bundle-level trust. If a specific charm needs it post-deploy (none expected for Caracal v1), apply targeted juju trust <app> then. |
| Settle wait | Manual watch via juju status --watch 30s; ~60-90 min typical | Charms cycle blocked → maintenance → active/idle |
| Expected pre-Vault end state | Vault blocked; cert-relation consumers blocked; everything else active/idle | See §7.3 for the explicit blocked-charm list |
| PKI on-disk verification | Files-on-disk + fingerprint compare after Octavia config-changed hook completes | §8: the explicit operator-asked confirmation |

3. Prerequisites

| Prereq | Verification |
| --- | --- |
| v1-do-doc-01-prep.md ✓ (state-check passed) | Manual confirmation |
| v1-do-doc-02-pki.md ✓ (overlay generated) | test -f "$REPO/overlays/octavia-pki.yaml" |
| v1-do-doc-03-destroy.md ✓ (no openstack model, 5 VMs Ready) | Run §2 state-detection block from doc-03 again; it should still return CLEAN |
| Pre-deploy fixes all committed and pulled locally | Verified in v1-do-doc-01 §4.3 |

Shell context — paste once:

export REPO="$HOME/openstack-caracal-ipv4"
cd "$REPO"
echo "REPO=$REPO"
test -f overlays/octavia-pki.yaml && echo "[OK] PKI overlay present" || echo "[FAIL] missing overlay"
git status --short
# Expect: clean working tree

4. Pre-flight checks (all must pass)

These are READ-ONLY safety checks. Stop if any FAIL.

cd "$REPO"

echo "=== 4.1 bundle.yaml YAML parses ==="
python3 -c "import yaml; yaml.safe_load(open('bundle.yaml'))" \
  && echo "[OK] bundle.yaml parses" || echo "[FAIL] bundle.yaml YAML error"
echo ""

echo "=== 4.2 octavia-pki overlay YAML parses ==="
python3 -c "
import yaml
d = yaml.safe_load(open('overlays/octavia-pki.yaml'))
o = d['applications']['octavia']['options']
keys = sorted(o.keys())
expected = ['lb-mgmt-controller-cacert','lb-mgmt-controller-cert','lb-mgmt-issuing-ca-key-passphrase','lb-mgmt-issuing-ca-private-key','lb-mgmt-issuing-cacert']
print('Keys in overlay:', keys)
print('All 5 keys present:', keys == expected)
print('All values non-empty:', all(v for v in o.values()))
"
echo ""

echo "=== 4.3 ceph-osd has no storage block (pre-deploy fix #1) ==="
grep -A 12 "^  ceph-osd:" bundle.yaml | grep "^    storage:" \
  && echo "[FAIL] storage block present — pre-deploy fix not applied" \
  || echo "[OK] no storage block under ceph-osd"
echo ""

echo "=== 4.4 expected-osd-count is 4 (matches one OSD per host) ==="
grep -A 8 "^  ceph-mon:" bundle.yaml | grep "expected-osd-count: 4" \
  && echo "[OK] expected-osd-count: 4" \
  || echo "[FAIL] expected-osd-count is not 4"
echo ""

echo "=== 4.5 11 VIPs declared (Designate deferred to v2 per D-019) ==="
VIP_COUNT=$(grep -cE "^[[:space:]]+vip: 10\.12\.4\." bundle.yaml)
echo "VIP count: $VIP_COUNT (expect 11)"
[ "$VIP_COUNT" -eq 11 ] && echo "[OK] 11 VIPs declared" || echo "[FAIL] VIP count is not 11"
echo ""

echo "=== 4.6 No model named 'openstack' exists ==="
juju models | grep "^openstack" \
  && echo "[FAIL] model 'openstack' already exists — re-run doc-03 destroy" \
  || echo "[OK] no openstack model"
echo ""

echo "=== 4.7 All 5 cloud-target VMs MAAS-Ready ==="
export MAAS_PROFILE=$(maas list 2>/dev/null | awk 'NR==1 {print $1}')
maas "$MAAS_PROFILE" machines read 2>/dev/null \
  | python3 -c "
import json, sys
machines = json.load(sys.stdin)
targets = ['openstack0', 'openstack1', 'openstack2', 'openstack3', 'capi-mgmt']
ready = sum(1 for m in machines if m.get('hostname') in targets and m.get('status_name') == 'Ready' and not m.get('owner'))
print(f'Ready + unowned: {ready} / 5')
print('[OK]' if ready == 5 else '[FAIL]')
"
echo ""

echo "=== 4.8 Juju controller available ==="
juju controllers
juju show-controller 2>/dev/null | head -10
echo ""

echo "=== 4.9 Disk space on /var/lib/libvirt/images ==="
df -h /var/lib/libvirt/images 2>/dev/null
echo "  Need ≥ 4 × 8 GiB for openstack0-3 root + LXD container space; ≥ 4 × 512 GiB OSD qcow2 already allocated"

If any check above does not show [OK] (or the expected value), stop and investigate before continuing.
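
The stop rule above can be turned into a mechanical gate. The sketch below assumes the §4 output was captured to a transcript file; the function name preflight_gate and the path /tmp/preflight.log are hypothetical and not part of the repo. It flags any [FAIL] marker and any ": False" line (the form check 4.2 prints). Checks that print raw values, such as 4.8 and 4.9, still need a human read.

```shell
# Hypothetical helper: scan a captured pre-flight transcript for failure markers.
# Returns 0 when none are found, 1 when any are found, 2 if the file is unreadable.
preflight_gate() {
  local log="$1" bad
  [ -r "$log" ] || { echo "[FAIL] cannot read $log"; return 2; }
  # "[FAIL]" comes from the shell checks; ": False" comes from the 4.2 Python output.
  bad=$(grep -E '\[FAIL\]|: False$' "$log" || true)
  if [ -n "$bad" ]; then
    echo "Pre-flight NOT clean:"
    printf '%s\n' "$bad"
    return 1
  fi
  echo "Pre-flight clean: no [FAIL] or False markers"
}
```

One way to use it: run the §4 block with its output piped through `tee /tmp/preflight.log`, then call `preflight_gate /tmp/preflight.log` and continue to §5 only if it returns 0.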


5. Add the Juju model

juju add-model openstack
juju model-config -m openstack | head -20

Expected: model openstack created on the current controller. juju models should now show it.

Optional model-config tweaks (only if you have a reason):

  • default-base: ubuntu@22.04/stable — already in the bundle's top-level config; model-level override not needed.
  • transmit-vendor-metrics=false — privacy posture; testcloud doesn't need to phone home. Optional.

For this v1 cycle, leave model-config at defaults. Tweaks are easier to debug when only one is changed at a time.


6. Deploy command

cd "$REPO"

# Confirm working dir and overlay
pwd
ls -la bundle.yaml overlays/octavia-pki.yaml
echo ""

# Deploy — this returns in a few seconds; actual provisioning runs in background
juju deploy ./bundle.yaml --overlay overlays/octavia-pki.yaml -m openstack

Expected output: a long list of deploy actions ("Deploying ...", "Resolving ...", "Located bundle ..."). Then control returns to the prompt.

If the deploy command itself errors (YAML syntax, charm-not-found, etc.), stop here. The bundle has not started provisioning yet — fix and rerun.


7. Settle watch

Provisioning typically takes 60-90 minutes for this testcloud size on this jumphost. The bundle requests a MAAS deploy of 4 hosts, creation of ~25 LXD containers, 30+ charm installs, and relation establishment.

7.1 Watch command

In a dedicated terminal (not the destroy / deploy terminal, where typing can interrupt the screen redraws):

juju status --color --watch 30s -m openstack

Refreshes every 30 seconds. Ctrl+C to exit.
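
The "no change for several minutes" settle criterion used in this document can also be approximated without watching the screen. This is a minimal sketch, not part of the runbook proper: the function names are hypothetical, and it assumes the oneline status format carries no timestamps, so an unchanged model produces an unchanged digest. It tracks only agent and workload states, not status messages.

```shell
# Hypothetical helper: one timestamp-free snapshot of unit states.
status_snapshot() {
  juju status -m openstack --format=oneline
}

# wait_for_settle <quiet_secs> <interval_secs>   (interval must be > 0)
# Returns 0 once the snapshot digest has been unchanged for quiet_secs.
wait_for_settle() {
  local quiet_secs="${1:-300}" interval="${2:-30}"
  local last="" now snap stable=0
  while [ "$stable" -lt "$quiet_secs" ]; do
    # A failed or empty snapshot must never count as "unchanged".
    snap=$(status_snapshot) || { echo "status command failed" >&2; return 1; }
    [ -n "$snap" ] || { echo "empty status output" >&2; return 1; }
    now=$(printf '%s\n' "$snap" | sha256sum | cut -d' ' -f1)
    if [ "$now" = "$last" ]; then
      stable=$((stable + interval))
    else
      stable=0
      last="$now"
    fi
    sleep "$interval"
  done
  echo "settled: no status change for ${quiet_secs}s"
}
```

For example, `wait_for_settle 300 30` returns after five quiet minutes. A quiet model is not necessarily a correct one, so the result still has to be compared against the expected pre-Vault end state.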

7.2 Expected progression

Rough timeline (your mileage may vary):

| Elapsed | What to expect |
| --- | --- |
| 0-3 min | MAAS commissioning starts on openstack0-3 (boot, fingerprint, partition) |
| 3-10 min | Ubuntu install on openstack0-3 via MAAS preseed |
| 10-15 min | Hosts in Juju go from pending → started. LXD service comes up on each |
| 15-30 min | LXD containers being created; subordinate charms (mysql-router, hacluster) attaching |
| 30-50 min | Charm config-changed hooks running; relations forming; databases bootstrapping |
| 50-90 min | Most charms reach blocked (waiting on Vault) or active/idle (no Vault dep) |
| 90+ min | Settle stabilizes at the pre-Vault end state |

If progress visibly stalls for >15 minutes in the middle of the timeline, see §7.4.

7.3 Pre-Vault expected end state

When the model has settled (stops progressing for >5 minutes), this is what juju status should show:

blocked (waiting on Vault):

The following charms have :certificates relations to vault:certificates and CANNOT reach active/idle until Vault is initialized in v1-do-doc-05:

  • vault itself — status: Vault needs to be initialized (this is the trigger to run doc-05)
  • mysql-innodb-cluster (needs vault cert for inter-instance TLS)
  • keystone (Keystone API TLS)
  • glance (Glance API TLS)
  • nova-cloud-controller, placement, neutron-api, neutron-api-plugin-ovn, ovn-central, ovn-chassis, ovn-chassis-octavia
  • cinder, octavia, octavia-dashboard
  • barbican, barbican-vault
  • magnum, magnum-dashboard
  • glance-simplestreams-sync, openstack-dashboard, ceph-radosgw
  • octavia-diskimage-retrofit (subordinate of glance-simplestreams-sync)
  • All 11 *-hacluster subordinates indirectly (because their principal is blocked; Designate's hacluster removed per D-019)

active/idle (no Vault dependency):

  • rabbitmq-server
  • etcd (uses easyrsa for its OWN TLS, not Vault)
  • easyrsa
  • nova-compute (no :certificates relation directly; it gets ceph keys via ceph-mon)
  • ceph-mon, ceph-osd (Ceph cluster bootstraps independently)

Note on the chicken-and-egg:

Per D-006 (Vault HA backend), etcd's TLS is bootstrapped by easyrsa via the easyrsa:client ↔ etcd:certificates relation. This is what lets etcd come up active/idle BEFORE Vault is initialized. Then Vault uses etcd as its HA backend. Watch that easyrsa/0 and etcd/{0,1,2} reach active/idle within the first 30 minutes; if etcd stays blocked beyond that, easyrsa-related certs likely didn't flow.
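
The active/idle half of the expected end state can be checked mechanically. The sketch below is an illustration under assumptions: the function name is hypothetical, and it expects status lines shaped like the oneline format, "- <app>/<n>: <address> (agent:<state>, workload:<state>)". Confirm that shape against your Juju version before relying on it.

```shell
# Hypothetical helper: reads oneline-format status on stdin and reports any unit
# of the "no Vault dependency" applications whose workload is not active.
check_pre_vault_active() {
  local must_be_active="rabbitmq-server etcd easyrsa nova-compute ceph-mon ceph-osd"
  local line unit app workload rc=0
  while IFS= read -r line; do
    # Skip anything that is not a unit line in the assumed shape.
    case "$line" in *workload:*) ;; *) continue ;; esac
    unit=$(printf '%s\n' "$line" | sed -E 's/^[[:space:]]*- ([^:]+):.*/\1/')
    app=${unit%%/*}
    workload=$(printf '%s\n' "$line" | sed -E 's/.*workload:([a-z]+).*/\1/')
    case " $must_be_active " in
      *" $app "*)
        if [ "$workload" != "active" ]; then
          echo "[FAIL] $unit workload=$workload (expected active)"
          rc=1
        fi
        ;;
    esac
  done
  if [ "$rc" -eq 0 ]; then
    echo "[OK] pre-Vault infrastructure units are active"
  fi
  return "$rc"
}
```

Usage: `juju status -m openstack --format=oneline | check_pre_vault_active`. Units of other applications, including the Vault-blocked ones, are ignored on purpose.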

7.4 Stalls and remediation

If progress stalls for >15 min in the middle of the timeline:

# Find which units are blocked or in error state
juju status -m openstack | grep -E "(blocked|error|maintenance)"

# For any unit in error state, get its log
juju show-status-log <unit-name> -m openstack
# E.g.: juju show-status-log keystone/0

# For deeper inspection (the log file name uses a hyphen in place of the slash, e.g. unit-keystone-0.log)
juju ssh <unit-name> -m openstack -- sudo tail -200 /var/log/juju/unit-<app>-<n>.log

Common stalls:

  1. MAAS commissioning failure on a host → check MAAS UI; may need to re-commission the host
  2. LXD container creation failure → check lxc list on the host; container quotas, image availability
  3. Charm hook error → check unit log; often a transient cloud-init issue; juju resolved <unit> may help
  4. Relation never forms → both ends must declare correct endpoint names; cross-check bundle

Per D-018, do not pursue graceful recovery from major errors at this stage — full teardown via v1-do-doc-03 and redeploy is the canonical "reset" path.
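
When inspecting unit logs during a stall, note that the log file name is derived from the unit name with the slash replaced by a hyphen. A small sketch of that mapping follows; the function name unit_log_path is hypothetical.

```shell
# Hypothetical helper: map a unit name (app/N) to its log path on the unit's machine.
unit_log_path() {
  # keystone/0 becomes unit-keystone-0.log
  printf '/var/log/juju/unit-%s.log\n' "$(printf '%s' "$1" | tr '/' '-')"
}
```

For example: `juju ssh keystone/0 -m openstack -- sudo tail -200 "$(unit_log_path keystone/0)"`.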


8. Post-deploy PKI verification — explicit on-disk confirmation

This section addresses the operator-asked confirmation that the PKI overlay made it onto the Octavia unit's filesystem after the charm's config-changed hook completes.

Run this section after octavia/0 has progressed past pending/maintenance and reached at least a blocked state (i.e., the charm has run its install + config-changed hooks but is waiting on Vault for the API TLS cert). The lb-mgmt-* options are consumed by the config-changed hook regardless of Vault status — so on-disk material should be present even with octavia/0 in blocked.

8.1 Confirm the unit is past config-changed

echo "=== Octavia unit status ==="
juju status octavia -m openstack
# Expect: octavia/0 in 'blocked' (cert relation pending) or 'maintenance' (still configuring).
# If status is still 'pending' or 'allocating', wait and re-run this section.

8.2 Inspect on-disk PKI directory

echo "=== /etc/octavia/certs/ contents ==="
juju ssh octavia/0 -m openstack -- sudo ls -la /etc/octavia/certs/

Expected: 4-5 PEM files. The exact filenames depend on the charm revision; commonly:

  • server_ca.cert.pem — Issuing CA cert (consumed from lb-mgmt-issuing-cacert)
  • server_ca.key.pem — Issuing CA encrypted private key (consumed from lb-mgmt-issuing-ca-private-key)
  • client_ca.cert.pem — Controller CA cert (consumed from lb-mgmt-controller-cacert)
  • client.cert-and-key.pem — Controller cert + key bundle (consumed from lb-mgmt-controller-cert)

If the directory is empty or missing, the config-changed hook hasn't run yet or failed. Re-check unit status; see §7.4 remediation.

[unverified, flagging]: the exact filenames above are typical for recent charm-octavia revisions but may vary. The verification below uses fingerprint comparison (content-based), which is filename-agnostic — adapt the filenames in the loop if ls shows different ones.

8.3 Stage cert content from unit for comparison

mkdir -p "$HOME/pki-verify"
chmod 700 "$HOME/pki-verify"
cd "$HOME/pki-verify"

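# Pull whatever PEM files are present in /etc/octavia/certs/
# tr strips carriage returns that a remote pty can add to the file list;
# </dev/null keeps the inner juju ssh from swallowing the rest of that list on stdin.
juju ssh octavia/0 -m openstack -- sudo ls /etc/octavia/certs/ 2>/dev/null | tr -d '\r' | \
  while read -r f; do
    case "$f" in
      *.pem|*.crt)
        echo "Pulling $f ..."
        juju ssh octavia/0 -m openstack -- sudo cat "/etc/octavia/certs/$f" \
          < /dev/null > "$HOME/pki-verify/unit-$f"
        ;;
    esac
  done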

ls -la "$HOME/pki-verify/"

8.4 Fingerprint comparison — CA certs

echo "=== Issuing CA fingerprint comparison ==="
# Find the issuing CA on the unit (charm naming: typically server_ca.cert.pem)
UNIT_FILE=$(ls "$HOME/pki-verify/" | grep -E "^unit-server.*ca.*\.cert\.pem$" | head -1)
if [ -z "$UNIT_FILE" ]; then
  echo "[WARN] no unit-server*ca*.cert.pem file found — list and adapt:"
  ls "$HOME/pki-verify/"
else
  UNIT_FP=$(openssl x509 -in "$HOME/pki-verify/$UNIT_FILE" -noout -fingerprint -sha256 2>/dev/null | cut -d= -f2)
  SRC_FP=$(openssl x509 -in "$HOME/octavia-pki/issuing-ca/issuing-ca.cert.pem" -noout -fingerprint -sha256 | cut -d= -f2)
  echo "Unit ($UNIT_FILE):    $UNIT_FP"
  echo "Jumphost (issuing-ca): $SRC_FP"
  if [ "$UNIT_FP" = "$SRC_FP" ]; then
    echo "[OK] Issuing CA cert on unit matches jumphost source"
  else
    echo "[FAIL] fingerprints DIFFER — investigate before continuing"
  fi
fi
echo ""

echo "=== Controller CA fingerprint comparison ==="
UNIT_FILE=$(ls "$HOME/pki-verify/" | grep -E "^unit-client.*ca.*\.cert\.pem$" | head -1)
if [ -z "$UNIT_FILE" ]; then
  echo "[WARN] no unit-client*ca*.cert.pem file found — list and adapt:"
  ls "$HOME/pki-verify/"
else
  UNIT_FP=$(openssl x509 -in "$HOME/pki-verify/$UNIT_FILE" -noout -fingerprint -sha256 2>/dev/null | cut -d= -f2)
  SRC_FP=$(openssl x509 -in "$HOME/octavia-pki/controller-ca/controller-ca.cert.pem" -noout -fingerprint -sha256 | cut -d= -f2)
  echo "Unit ($UNIT_FILE):       $UNIT_FP"
  echo "Jumphost (controller-ca): $SRC_FP"
  if [ "$UNIT_FP" = "$SRC_FP" ]; then
    echo "[OK] Controller CA cert on unit matches jumphost source"
  else
    echo "[FAIL] fingerprints DIFFER — investigate before continuing"
  fi
fi
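
The two comparisons above share one pattern, and both would print [OK] if neither file could be parsed, because two empty fingerprints compare equal. A possible refactor is sketched here; the function name compare_cert_fp is hypothetical, and the file arguments are whatever names §8.3 actually produced.

```shell
# Hypothetical helper: compare SHA-256 fingerprints of two PEM certificates.
# Returns 0 on match, 1 on mismatch, 2 if either file yields no fingerprint.
compare_cert_fp() {
  local label="$1" unit_pem="$2" src_pem="$3" unit_fp src_fp
  unit_fp=$(openssl x509 -in "$unit_pem" -noout -fingerprint -sha256 2>/dev/null | cut -d= -f2)
  src_fp=$(openssl x509 -in "$src_pem" -noout -fingerprint -sha256 2>/dev/null | cut -d= -f2)
  # Guard against the empty-equals-empty false positive.
  if [ -z "$unit_fp" ] || [ -z "$src_fp" ]; then
    echo "[FAIL] $label: could not read a certificate from one of the files"
    return 2
  fi
  echo "Unit:     $unit_fp"
  echo "Jumphost: $src_fp"
  if [ "$unit_fp" = "$src_fp" ]; then
    echo "[OK] $label: fingerprints match"
  else
    echo "[FAIL] $label: fingerprints DIFFER"
    return 1
  fi
}
```

Example call, using the typical file name from §8.2: `compare_cert_fp "Issuing CA" "$HOME/pki-verify/unit-server_ca.cert.pem" "$HOME/octavia-pki/issuing-ca/issuing-ca.cert.pem"`.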

8.5 Controller cert bundle verification

The lb-mgmt-controller-cert value contains BOTH the controller cert AND its key, concatenated. On the unit it lands as a single PEM bundle. Confirm:

  1. The cert in the bundle matches the cert we generated.
  2. The key in the bundle matches the cert (private key proves possession).

echo "=== Controller cert+key bundle verification ==="
UNIT_FILE=$(ls "$HOME/pki-verify/" | grep -E "^unit-client.*\.pem$" | grep -v "ca" | head -1)
# Common name: client.cert-and-key.pem
if [ -z "$UNIT_FILE" ]; then
  echo "[WARN] no controller cert bundle found on unit — list and adapt:"
  ls "$HOME/pki-verify/"
else
  echo "Found bundle: $UNIT_FILE"

  # Compare cert fingerprint
  UNIT_FP=$(openssl x509 -in "$HOME/pki-verify/$UNIT_FILE" -noout -fingerprint -sha256 2>/dev/null | cut -d= -f2)
  SRC_FP=$(openssl x509 -in "$HOME/octavia-pki/controller/controller.cert.pem" -noout -fingerprint -sha256 | cut -d= -f2)
  echo "Unit cert FP:    $UNIT_FP"
  echo "Source cert FP:  $SRC_FP"
  [ "$UNIT_FP" = "$SRC_FP" ] && echo "[OK] cert match" || echo "[FAIL] cert mismatch"

  # Confirm cert+key in bundle match each other (proof of possession)
  CERT_PUB=$(openssl x509 -in "$HOME/pki-verify/$UNIT_FILE" -noout -pubkey 2>/dev/null | openssl md5)
  KEY_PUB=$(openssl pkey -in "$HOME/pki-verify/$UNIT_FILE" -pubout 2>/dev/null | openssl md5)
  echo "Cert pubkey md5: $CERT_PUB"
  echo "Key pubkey md5:  $KEY_PUB"
  [ "$CERT_PUB" = "$KEY_PUB" ] && echo "[OK] cert and key in bundle are paired" || echo "[FAIL] cert and key DO NOT match"
fi
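
The pairing test above hashes two command outputs, so if both openssl calls fail the two hashes of empty input are equal and the check passes falsely. A stricter variant is sketched below as an assumption-labelled alternative; the function name cert_key_paired is hypothetical. It compares the PEM-encoded public keys directly and refuses to pass when either extraction fails.

```shell
# Hypothetical helper: confirm that the certificate and private key inside one
# PEM bundle belong together. Returns 0 if paired, 1 if not, 2 if either part is unreadable.
cert_key_paired() {
  local bundle="$1" cert_pub key_pub
  # Public key as recorded in the certificate.
  cert_pub=$(openssl x509 -in "$bundle" -noout -pubkey 2>/dev/null) || return 2
  # Public key derived from the private key in the same file.
  key_pub=$(openssl pkey -in "$bundle" -pubout 2>/dev/null) || return 2
  if [ -z "$cert_pub" ] || [ -z "$key_pub" ]; then
    return 2
  fi
  if [ "$cert_pub" = "$key_pub" ]; then
    echo "[OK] cert and key in bundle are paired"
  else
    echo "[FAIL] cert and key DO NOT match"
    return 1
  fi
}
```

Example call, using the typical file name from §8.2: `cert_key_paired "$HOME/pki-verify/unit-client.cert-and-key.pem"`.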

8.6 Issuing CA encrypted key + passphrase verification

The Issuing CA's encrypted key sits on the unit. Confirm the passphrase from octavia.conf can decrypt it. This is the test that proves the runtime amphora-signing path will work once Octavia comes up post-Vault.

echo "=== Passphrase round-trip test ==="

# Pull the passphrase from octavia.conf
UNIT_PASS=$(juju ssh octavia/0 -m openstack -- \
  sudo grep "^ca_private_key_passphrase" /etc/octavia/octavia.conf 2>/dev/null | head -1 | cut -d= -f2- | sed 's/^[[:space:]]*//' | sed 's/[[:space:]]*$//')

# Pull the encrypted key
UNIT_KEY=$(ls "$HOME/pki-verify/" | grep -E "^unit-server.*key.*\.pem$" | head -1)

if [ -z "$UNIT_PASS" ]; then
  echo "[WARN] passphrase line not found in octavia.conf — may not be present until vault-init"
elif [ -z "$UNIT_KEY" ]; then
  echo "[WARN] no unit-server*key*.pem found"
else
  # Try to decrypt the key using the passphrase from the unit
  if openssl pkey -in "$HOME/pki-verify/$UNIT_KEY" -passin "pass:$UNIT_PASS" -noout 2>/dev/null; then
    echo "[OK] passphrase in octavia.conf decrypts the on-disk Issuing CA key"
  else
    echo "[FAIL] passphrase did NOT decrypt the key — overlay value mismatch"
  fi

  # Also confirm passphrase matches what we generated on jumphost
  SRC_PASS=$(cat "$HOME/octavia-pki/issuing-ca/passphrase.txt")
  if [ "$UNIT_PASS" = "$SRC_PASS" ]; then
    echo "[OK] passphrase on unit matches jumphost source"
  else
    echo "[FAIL] passphrase on unit does NOT match jumphost source"
  fi
fi

# Clear the passphrase from shell
unset UNIT_PASS SRC_PASS

Note: the ca_private_key_passphrase line in octavia.conf may not appear until the cert relation completes. If §8.6 reports [WARN] passphrase line not found, that is expected at pre-Vault state — the charm may defer writing the full [certificates] section until the cert relation has flowed. Re-run §8.6 after v1-do-doc-05 completes.
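
Because the passphrase is expected under the [certificates] section, a section-aware lookup is slightly safer than a bare grep, which would match the key name in any section. This is a minimal sketch; the function name ini_get is hypothetical, and it handles only simple "key = value" lines.

```shell
# Hypothetical helper: print the value of <key> inside [<section>] from INI text on stdin.
# Usage: ini_get <section> <key> < file
ini_get() {
  awk -v section="$1" -v key="$2" '
    { sub(/\r$/, "") }                                  # drop a trailing CR from remote output
    /^\[/ { in_section = ($0 == "[" section "]"); next }
    in_section {
      eq = index($0, "=")
      if (eq == 0) next
      k = substr($0, 1, eq - 1)
      v = substr($0, eq + 1)                            # keep any "=" inside the value
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) { print v; exit }
    }'
}
```

Example: `juju ssh octavia/0 -m openstack -- sudo cat /etc/octavia/octavia.conf | ini_get certificates ca_private_key_passphrase`. Empty output means the key is not yet present in that section.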

8.7 Cleanup

# Optional: shred the temp copies of the cert material (every unit-* file pulled in §8.3, .pem and .crt alike)
shred -uvz "$HOME/pki-verify/"unit-* 2>/dev/null
rmdir "$HOME/pki-verify" 2>/dev/null

The unit retains the originals; the jumphost-side originals are at $HOME/octavia-pki/ (left in place per v1-do-doc-02 §13).


9. Acceptance criteria — go/no-go for v1-do-doc-05-vault-init

Before proceeding to Vault init:

  • §4 all pre-flight checks [OK]
  • §5 model openstack created
  • §6 deploy command completed without bundle-level errors
  • §7 model has settled (no progress for >5 min); pre-Vault end state matches §7.3
  • §7.3 critical infra active/idle: rabbitmq-server, etcd/{0,1,2}, easyrsa, ceph-mon/{0,1,2}, ceph-osd/{0,1,2,3}, nova-compute/{0,1,2,3}
  • §7.3 Vault is blocked: Vault needs to be initialized (the trigger for doc-05)
  • §8.2 PKI files present in /etc/octavia/certs/ on octavia/0
  • §8.4 Issuing CA fingerprint match [OK]
  • §8.4 Controller CA fingerprint match [OK]
  • §8.5 Controller cert bundle cert+key paired [OK]
  • §8.6 passphrase round-trip [OK] (or [WARN] not yet in conf — acceptable if pre-Vault)

If all checked, proceed to v1-do-doc-05-vault-init.md.


10. Roosevelt deltas (forward-look)

| Aspect | Testcloud (v1) | Roosevelt |
| --- | --- | --- |
| Bundle | bundle.yaml (single file) | Multi-environment overlay structure |
| Hosts | 4 KVM VMs | Bare-metal MAAS-managed servers |
| LXD container layout | Dense (10+ on machine 8) | More spread; possibly real units instead of LXD for some apps |
| Overlay set | overlays/octavia-pki.yaml only | Site overlay (machine assignments, NIC MACs) + Vault overlay + PKI overlay |
| Settle time | 60-90 minutes | Likely 2-4 hours (more hosts, real provisioning) |
| Octavia PKI source | Operator-generated, overlay-distributed | Vault PKI engine |
| Octavia PKI verification | This §8 procedure | Vault-side audit trail; no manual comparison needed |

11. Change log

| Date | Change | Reference |
| --- | --- | --- |
| 2026-05-27 | Document created. Replaces runbooks/deprecated/02-deploy.md (placeholder). Adds explicit §8 on-disk PKI verification per operator request. | Batch B drafting |