openstack-caracal-ipv4 / runbooks / phase-01-bundle-deploy.md

Phase 01 -- Bundle Deploy

Deploy the hardened bundle + the octavia-pki overlay onto the freshly-prepped MAAS machines, and verify it settles to the expected PRE-vault-init state (zero errors, vault awaiting init, the TLS consumers awaiting vault certs). Vault init is phase-02.

Decisions: B5 (IP-only), D-019 (no designate), D-020 (dual provider+metal VIPs), R14 (VIPs front-loaded to .50-.60), Section-G NIC bindings. Troubleshooting: appendix-A -- R14 (VIP relocation), R15 (.10 phantom resolver), L1 (no set -e on count-gate blocks), L3 (metal-side dual-VIP eyeball check), DOCFIX-016 (maas list leak).


Prerequisites (must be true entering phase-01)

  • phase-00 done: 4 machines Ready/power=off; MAAS carve applied (front-loaded VIP /26 reserved, FIP pool reserved, stale iprange gone); enp8s0 data NIC linked on ALL four hosts; OSD /dev/vdb wiped blank.
  • overlays/octavia-pki.yaml present (Step 1.0).
  • Hardened bundle.yaml in the working dir (channels pinned; VIPs .50-.60; reserved-host-memory 8192; image-conversion; use-policyd-override).

Constants and env-literals

  • MAAS system_ids: openstack0=4na83t, openstack1=qdbqd6, openstack2=h8frng, openstack3=tmsafc.
  • MAAS subnet ids: 1=provider 10.12.4.0/22, 2=metal 10.12.8.0/22, 6=data 10.12.12.0/22, 7=storage 10.12.16.0/22, 8=replication 10.12.20.0/22, 9=lbaas 10.12.32.0/22.
  • expected plan: 50 apps, 97 relations, 4 machines (bundle 8/9/10/11 -> juju 0/1/2/3), 24 LXD.

Run-location legend

  • # RUN: jumphost -- juju + maas admin (MAAS profile is admin; never maas list -- DOCFIX-016).

Step 1.0 -- Octavia PKI overlay (secret-handling prereq) DISCRETE

overlays/octavia-pki.yaml carries the 5 lb-mgmt-* PKI keys (controller CA/cert, issuing CA key+passphrase+cert). It is the ONLY overlay in the deploy command and is secret-safe + ASCII. PRIMARY path: reuse the existing validated overlay (the CAs are 10y, so it survives rebuilds). REGENERATION path (fresh CAs): run the discrete secret procedure inlined as "Step 1.0-GEN" at the end of this phase. Either way, confirm the overlay parses and contains exactly the 5 keys (sanity block below) before deploying.

# RUN: jumphost -- sanity only (does NOT print key material)
[ -f overlays/octavia-pki.yaml ] && grep -cE 'lb-mgmt-' overlays/octavia-pki.yaml   # expect 5 keys
LC_ALL=C grep -qP '[^\x00-\x7F]' overlays/octavia-pki.yaml && echo "NON-ASCII" || echo "ASCII clean"   # -q: never echo a matching line
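The count above passes for any five lines that contain lb-mgmt-. A stricter variant, sketched below, checks that each of the five option names listed in Step 1.0-GEN appears exactly once as a key, and prints only names and counts. The function name check_pki_keys is hypothetical, and the match assumes each option sits on its own line as "name: value".

```shell
# Hypothetical helper: verify the overlay carries each expected key exactly once.
# Prints key names and counts only; never prints values.
check_pki_keys() (
  file="$1"; rc=0
  for key in lb-mgmt-issuing-cacert lb-mgmt-issuing-ca-private-key \
             lb-mgmt-issuing-ca-key-passphrase lb-mgmt-controller-cacert \
             lb-mgmt-controller-cert; do
    # The trailing colon keeps one option name from matching a longer one.
    n=$(grep -cE "^[[:space:]]*${key}:" "$file" || true)
    echo "  $key: $n"
    [ "$n" = 1 ] || rc=1
  done
  exit "$rc"   # subshell body: exits the function only
)
# Usage: check_pki_keys overlays/octavia-pki.yaml && echo "keys OK"
```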

Step 1.1 -- Pre-deploy verify (read-only; 4 checks)

# RUN: jumphost -- one consolidated read-only block. NO set -e: grep -c exits 1 when the count is 0, and a guarded count of 0 is a valid answer, not a failure (appendix-A: L1), so every count grep ends in || true.

( {
  echo "=== CHECK 1: bundle VIPs (quote-tolerant, octet-anchored) ==="
  grep -nE '^[[:space:]]+vip:' bundle.yaml
  TOT=$(grep -cE '^[[:space:]]+vip:[[:space:]]*"?10\.12\.4\.' bundle.yaml || true)
  HI=$(grep -cE '^[[:space:]]+vip:[[:space:]]*"?10\.12\.4\.(5[0-9]|60)("|$|[[:space:]])' bundle.yaml || true)
  LO=$(grep -cE '^[[:space:]]+vip:[[:space:]]*"?10\.12\.4\.(1[0-9]|20)("|$|[[:space:]])' bundle.yaml || true)
  echo "  provider VIPs total=$TOT  in .50-.60=$HI  in .10-.20(stale)=$LO   (want 11/11/0)"
  # metal side is the second token of each dual vip; eyeball that all 11 are .8.50-.60,
  # clear of metal infra .8.10(maas)/.8.20(lxd)/.8.21(capi)/.8.30(juju) -- appendix-A: L3.

  echo "=== CHECK 2: enp8s0 data NIC linked on ALL FOUR hosts (10.12.12.0/22) ==="
  for SID in 4na83t qdbqd6 h8frng tmsafc; do
    echo -n "  $SID: "
    maas admin interfaces read "$SID" | jq -r '.[] | select(.name=="enp8s0")
      | [.links[]? | select(.subnet.cidr=="10.12.12.0/22") | .ip_address] | join(",")'
  done   # expect 10.12.12.40 / .41 / .42 / .43 (select by .subnet.cidr -> robust to id drift)

  echo "=== CHECK 3: subnet DNS resolvers ==="
  for ID in 1 2 6 7 8 9; do maas admin subnet read "$ID" | jq -c '{id,cidr,dns_servers}'; done
  # expect subnet 1 (provider) -> [10.12.4.1]; 2/6/7/8/9 -> [10.12.8.1]

  echo "=== CHECK 4a: nodes Ready / power off ==="
  maas admin machines read | jq -r '.[] | select(.system_id|IN("4na83t","qdbqd6","h8frng","tmsafc"))
    | "\(.hostname) \(.status_name) power=\(.power_state)"'
} )
# CHECK 4b: OSD /dev/vdb blank (run on each host; sudo required -- appendix-A: R7)
for h in openstack0 openstack1 openstack2 openstack3; do
  echo "== $h =="
  ssh jessea123@$h "sudo qemu-img info /var/lib/libvirt/images/${h}-1.qcow2 | grep -E 'virtual size|disk size'" </dev/null
done   # expect virtual 512 GiB, disk ~KiB (sparse/blank)

GATE: VIPs 11/11/0; enp8s0 linked on all 4; subnet DNS as above; 4 nodes Ready; OSD blank.
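The metal-side eyeball check in CHECK 1 (appendix-A: L3) can be backed by a count. The sketch below assumes each dual vip is written as vip: "provider metal" on one line, so the metal address is the third whitespace-separated field once quotes are stripped; metal_vip_check is a hypothetical name.

```shell
# Hypothetical helper: count dual-VIP lines whose metal token is in 10.12.8.50-.60.
metal_vip_check() (
  file="$1"
  grep -E '^[[:space:]]+vip:' "$file" | tr -d '"' | awk '
    { total++
      # $1 is the vip: key, $2 the provider address, $3 the metal address
      if ($3 ~ /^10\.12\.8\.(5[0-9]|60)$/) { ok++ }
      else { printf "  vip #%d: unexpected metal token [%s]\n", NR, $3 } }
    END { printf "  metal VIPs total=%d in .8.50-.60=%d\n", total, ok
          exit !(total > 0 && total == ok) }'
)
# Usage: metal_vip_check bundle.yaml   # want total=11 in .8.50-.60=11
```

This complements the eyeball check rather than replacing it: a bundle that formats its vip lines differently will be reported as a mismatch.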

Step 1.2 -- Dry-run (guarded)

# RUN: jumphost -- refuse to add a model if openstack already exists; require the overlay.

( {
  juju models 2>&1 | tee /tmp/jmodels.txt
  if grep -qE '(^|[[:space:]]|/)openstack([[:space:]*]|$)' /tmp/jmodels.txt; then
    echo "ABORT: an 'openstack' model already exists (teardown is phase-00)"
  elif [ ! -f overlays/octavia-pki.yaml ]; then
    echo "ABORT: overlays/octavia-pki.yaml missing (Step 1.0)"
  else
    juju add-model openstack
    juju deploy ./bundle.yaml --overlay overlays/octavia-pki.yaml -m openstack --dry-run
  fi
} )

GATE (from the plan):

  • 50 apps, 97 relations, 4 machines (8/9/10/11 -> 0/1/2/3), 24 LXD.
  • ceph-osd/0-3 one per node; nova-compute/0-2 on machines 1/2/3 ONLY (machine 0 = OSD+LXD host, no compute).
  • Channels match the matrix.
  • Relations include octavia:certificates - vault:certificates, vault:shared-db - vault-mysql-router, mysql-innodb-cluster:certificates - vault:certificates.
  • NO vault:ha, NO designate (D-019).
  • Only the two benign R11 warnings (L34 name, L55 variables).
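The four headline counts can be tallied from a saved copy of the dry-run output instead of by eye. This is a sketch under an assumption about the plan text: that juju prints lines beginning "- deploy application", "- add relation", "- add new machine" and "- add lxd container". Check a real plan before relying on it. The function name plan_counts and the file path in the usage line are hypothetical.

```shell
# Hypothetical helper: tally a saved dry-run plan against the expected counts.
# The four line prefixes are assumptions about the plan format.
plan_counts() (
  plan="$1"
  apps=$(grep -cE '^- deploy application ' "$plan" || true)
  rels=$(grep -cE '^- add relation ' "$plan" || true)
  mach=$(grep -cE '^- add new machine ' "$plan" || true)
  lxd=$(grep -cE '^- add lxd container ' "$plan" || true)
  echo "  apps=$apps relations=$rels machines=$mach lxd=$lxd   (want 50/97/4/24)"
  [ "$apps" = 50 ] && [ "$rels" = 97 ] && [ "$mach" = 4 ] && [ "$lxd" = 24 ]
)
# Usage: juju deploy ... --dry-run 2>&1 | tee "$HOME/phase01-dryrun.txt"
#        plan_counts "$HOME/phase01-dryrun.txt"
```

The counts cover only the first gate item; placement, channels and the named relations still need a read of the plan itself.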

Step 1.3 -- Deploy (VIP-guarded)

# RUN: jumphost -- re-run the VIP guard inline (the dry-run never echoes vip values), then deploy only if 11/11/0.

( {
  TOT=$(grep -cE '^[[:space:]]+vip:[[:space:]]*"?10\.12\.4\.' bundle.yaml || true)
  HI=$(grep -cE '^[[:space:]]+vip:[[:space:]]*"?10\.12\.4\.(5[0-9]|60)("|$|[[:space:]])' bundle.yaml || true)
  LO=$(grep -cE '^[[:space:]]+vip:[[:space:]]*"?10\.12\.4\.(1[0-9]|20)("|$|[[:space:]])' bundle.yaml || true)
  if [ "$TOT" = 11 ] && [ "$HI" = 11 ] && [ "$LO" = 0 ]; then
    juju deploy ./bundle.yaml --overlay overlays/octavia-pki.yaml -m openstack
  else
    echo "ABORT: VIP guard failed (total=$TOT hi=$HI lo=$LO; want 11/11/0)"
  fi
} )
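Steps 1.1 and 1.3 repeat the same three greps. One way to keep the two copies from drifting is a shared function; the sketch below uses the same expressions as the blocks above, with the hypothetical name vip_guard, and returns non-zero unless the counts are 11/11/0.

```shell
# Hypothetical helper: the provider-VIP guard from Steps 1.1 and 1.3 as one function.
vip_guard() (
  file="${1:-bundle.yaml}"
  pre='^[[:space:]]+vip:[[:space:]]*"?10\.12\.4\.'   # quote-tolerant prefix
  end='("|$|[[:space:]])'                            # anchor after the last octet
  tot=$(grep -cE "$pre" "$file" || true)
  hi=$(grep -cE "${pre}(5[0-9]|60)${end}" "$file" || true)
  lo=$(grep -cE "${pre}(1[0-9]|20)${end}" "$file" || true)
  echo "  total=$tot hi=$hi lo=$lo (want 11/11/0)"
  [ "$tot" = 11 ] && [ "$hi" = 11 ] && [ "$lo" = 0 ]
)
# Usage: vip_guard bundle.yaml && juju deploy ./bundle.yaml --overlay overlays/octavia-pki.yaml -m openstack
```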

Step 1.4 -- DNS gate during deploy (as machines come up)

# RUN: jumphost -- run when machine 0 reaches started, then per LXD unit as they appear (flag BEFORE the target; logic inside the remote quotes; no outer 2>/dev/null):

juju ssh -m openstack 0 -- 'resolvectl status | grep -i "DNS Server"; getent hosts api.snapcraft.io && echo OK || echo FAIL'
# repeat for ceph-mon/0, mysql-innodb-cluster/0 as they appear

GATE: each returns OK (api.snapcraft.io resolves -> the snap install storm proceeds clean).

FINDING (non-blocking, R15): the unreachable region resolver 10.12.8.10 (MAAS region/rack controller, advertised on the metal VLAN independent of the subnet field) may still appear in a node's resolver list. Resolution succeeds because systemd-resolved deprioritizes .10 and falls through to .1. Latent fragility if .1 ever drops; understand/eliminate for Roosevelt. (appendix-A: R15.)
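Repeating the probe per target can be wrapped in a small loop. This is a sketch, not part of the validated procedure: dns_gate is a hypothetical name, and it keeps the same conventions as the command above (-m before the target, logic inside the remote quotes, no outer 2>/dev/null).

```shell
# Hypothetical helper: run the DNS probe on each target; non-zero if any target fails.
dns_gate() (
  rc=0
  for target in "$@"; do
    # </dev/null keeps ssh from swallowing the caller's stdin; tr strips pty CRs.
    out=$(juju ssh -m openstack "$target" -- \
      'getent hosts api.snapcraft.io >/dev/null && echo OK || echo FAIL' \
      </dev/null | tr -d '\r' | tail -n 1)
    case "$out" in
      OK) echo "  $target: OK" ;;
      *)  echo "  $target: FAIL"; rc=1 ;;
    esac
  done
  exit "$rc"
)
# Usage: dns_gate 0 ceph-mon/0 mysql-innodb-cluster/0
```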


EXIT GATE (phase-01 complete)

  • Deploy settled to the PRE-vault-init end state:
    • ZERO units in error.
    • mysql-innodb-cluster x3 ACTIVE ("Cluster is ONLINE").
    • vault/0 BLOCKED "Vault needs to be initialized" (the phase-02 trigger, not a fault).
    • Waiting on vault certs (expected pre-init): ovn-central x3, ovn-chassis x3 (incl nova-compute subordinates), ovn-chassis-octavia, neutron-api-plugin-ovn, barbican-vault.
    • octavia BLOCKED "Awaiting configure-resources" (D-021); gss unknown (pre-run).
  • Section-G NIC payoff confirmed (no subset/binding errors): ceph-mon -> storage 10.12.16.x; octavia -> data 10.12.12.1; nova-compute -> data 10.12.12.4x; vault -> metal 10.12.8.x.
  • Proceed to phase-02 (vault init).
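The "ZERO units in error" item can be checked mechanically. The sketch below reads juju status --format=short text on stdin and tallies workload states; it assumes each unit line carries a "workload:<state>" token, which should be confirmed against real output first. The name exit_gate_errors is hypothetical.

```shell
# Hypothetical helper: tally workload states from `juju status --format=short`
# text on stdin; exits non-zero if any unit is in error.
exit_gate_errors() {
  awk '
    match($0, /workload:[a-z]+/) {
      # Skip the 9-character "workload:" prefix to keep only the state.
      state = substr($0, RSTART + 9, RLENGTH - 9); count[state]++ }
    END {
      for (s in count) printf "  %s=%d\n", s, count[s]
      exit (count["error"] > 0) }'
}
# Usage: juju status -m openstack --format=short | exit_gate_errors && echo "zero errors"
```

Blocked and waiting units are expected at this point (vault, octavia and the TLS consumers), so only the error state fails the check.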

As-built reference (2026-06-03 second redeploy -- audit trail)

  • juju deploy ./bundle.yaml --overlay overlays/octavia-pki.yaml -m openstack on maas/default (cred maas-api).
  • Plan: 50 apps / 97 relations / 4 machines / 24 LXD; placement as above.
  • Pre-deploy verify: VIPs 11/11/0; enp8s0 -> 10.12.12.40-43 (all 4); subnet DNS as above; nodes Ready; OSD blank.
  • Settled: zero errors; mysql /0 R/W (10.12.8.173), /1 (.179) /2 (.185) R/O; vault blocked needs-init.

Next

phase-02 -- vault bring-up.


Step 1.0-GEN -- Octavia management-PKI generation (regeneration path) DISCRETE / SECRET

Run ONLY if you are not reusing an existing overlays/octavia-pki.yaml. Produces the two-tier EC PKI for Charmed Octavia's amphora trust domain and writes the overlay. Decisions (Workstream 3a, 2026-05-22): fresh generation; EC P-384 CAs (SHA-384, 10y); EC P-256 controller cert (2y); overlay-file distribution (gitignored); artifacts under $HOME/octavia-pki/; passphrases = 32 random bytes base64 (44 chars). SECRET step -- do NOT echo key material; the only printed values are cert dates/subjects and verify OK.
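The 44-character figure follows from base64 padding: every 3 input bytes become 4 output characters, rounded up to a whole group. A quick check in shell arithmetic (b64_len is a hypothetical name):

```shell
# Padded base64 length for N input bytes: ceil(N / 3) * 4.
b64_len() { echo $(( ($1 + 2) / 3 * 4 )); }
b64_len 32   # prints 44
```

This is why the generation blocks abort when wc -c on passphrase.txt is anything other than 44.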

The five octavia charm options the overlay sets:

  • lb-mgmt-issuing-cacert = base64(issuing CA cert)
  • lb-mgmt-issuing-ca-private-key = base64(issuing CA ENCRYPTED key)
  • lb-mgmt-issuing-ca-key-passphrase = the issuing CA passphrase (PLAIN string, NOT base64)
  • lb-mgmt-controller-cacert = base64(controller CA cert)
  • lb-mgmt-controller-cert = base64(controller cert + key, concatenated)

1.0-GEN.0 -- workspace (openssl 3.x; $HOME only -- snap home-confinement, never /tmp)

# RUN: jumphost
WORKDIR="$HOME/octavia-pki"
mkdir -p "$WORKDIR"/issuing-ca "$WORKDIR"/controller-ca "$WORKDIR"/controller
chmod 700 "$WORKDIR"
openssl version    # expect OpenSSL 3.x

1.0-GEN.a -- Issuing CA (EC P-384, AES-256 encrypted key, self-signed 10y)

( {
  WORKDIR="$HOME/octavia-pki"; cd "$WORKDIR/issuing-ca" || exit 1   # dir from 1.0-GEN.0
  openssl rand -base64 32 | tr -d '\n' > passphrase.txt
  chmod 600 passphrase.txt
  test "$(wc -c < passphrase.txt)" -eq 44 || { echo "ABORT: issuing passphrase length != 44"; exit 1; }
  openssl genpkey -algorithm EC -pkeyopt ec_paramgen_curve:P-384 \
    -aes-256-cbc -pass file:passphrase.txt -out issuing-ca.key.enc
  chmod 600 issuing-ca.key.enc
  openssl req -new -x509 -sha384 -key issuing-ca.key.enc -passin file:passphrase.txt \
    -days 3650 -subj "/CN=VR0 DC0 Omega Cloud Octavia Issuing CA/O=Neumatrix" \
    -out issuing-ca.cert.pem
  openssl x509 -in issuing-ca.cert.pem -noout -dates -subject
  openssl verify -CAfile issuing-ca.cert.pem issuing-ca.cert.pem    # expect: OK
} )

1.0-GEN.b -- Controller CA (EC P-384, AES-256 encrypted key, self-signed 10y; own passphrase)

The controller CA key is encrypted (its own passphrase) for future controller-cert rotation -- Octavia never receives this key, only the controller CA cert.

( {
  WORKDIR="$HOME/octavia-pki"; cd "$WORKDIR/controller-ca" || exit 1   # dir from 1.0-GEN.0
  openssl rand -base64 32 | tr -d '\n' > passphrase.txt
  chmod 600 passphrase.txt
  test "$(wc -c < passphrase.txt)" -eq 44 || { echo "ABORT: controller passphrase length != 44"; exit 1; }
  openssl genpkey -algorithm EC -pkeyopt ec_paramgen_curve:P-384 \
    -aes-256-cbc -pass file:passphrase.txt -out controller-ca.key.enc
  chmod 600 controller-ca.key.enc
  openssl req -new -x509 -sha384 -key controller-ca.key.enc -passin file:passphrase.txt \
    -days 3650 -subj "/CN=VR0 DC0 Omega Cloud Octavia Controller CA/O=Neumatrix" \
    -out controller-ca.cert.pem
  openssl x509 -in controller-ca.cert.pem -noout -dates -subject
  openssl verify -CAfile controller-ca.cert.pem controller-ca.cert.pem    # expect: OK
} )
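Both CA keys are meant to be encrypted at rest. A header-only check can confirm this without reading key material: with a cipher option, openssl genpkey writes a PKCS#8 file whose first line is the ENCRYPTED PRIVATE KEY marker. The function name key_is_encrypted is hypothetical.

```shell
# Hypothetical helper: true if the PEM file's first line is the encrypted PKCS#8 header.
# Reads the header line only; never prints the key body.
key_is_encrypted() {
  head -n 1 "$1" | grep -q -e '^-----BEGIN ENCRYPTED PRIVATE KEY-----'
}
# Usage:
#   for f in issuing-ca/issuing-ca.key.enc controller-ca/controller-ca.key.enc; do
#     key_is_encrypted "$HOME/octavia-pki/$f" && echo "$f: encrypted" || echo "$f: NOT ENCRYPTED"
#   done
```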

1.0-GEN.c -- Controller cert (EC P-256 UNENCRYPTED, SAN, signed by Controller CA, 2y)

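The P-256 key is unencrypted -- Octavia reads it at startup. SAN carries the controller FQDN, the octavia API FQDN, and the Octavia API VIP 10.12.4.233. Before generating, confirm that IP.1 matches the octavia vip in bundle.yaml: the Step 1.1 guard expects every provider VIP in 10.12.4.50-.60 (R14), and .233 falls outside that range.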

( {
  WORKDIR="$HOME/octavia-pki"; cd "$WORKDIR/controller" || exit 1   # dir from 1.0-GEN.0
  openssl genpkey -algorithm EC -pkeyopt ec_paramgen_curve:P-256 -out controller.key
  chmod 600 controller.key
  cat > controller.cnf <<'CNF'
[req]
distinguished_name = req_distinguished_name
req_extensions = v3_req
prompt = no

[req_distinguished_name]
CN = octavia-controller.omega.dc0.vr0.cloud.neumatrix.local
O = Neumatrix

[v3_req]
keyUsage = critical, digitalSignature, keyEncipherment
extendedKeyUsage = clientAuth, serverAuth
subjectAltName = @alt_names

[alt_names]
DNS.1 = octavia-controller.omega.dc0.vr0.cloud.neumatrix.local
DNS.2 = octavia.omega.dc0.vr0.cloud.neumatrix.local
IP.1 = 10.12.4.233
CNF
  openssl req -new -sha256 -key controller.key -config controller.cnf -out controller.csr
  openssl x509 -req -sha256 -in controller.csr \
    -CA ../controller-ca/controller-ca.cert.pem \
    -CAkey ../controller-ca/controller-ca.key.enc \
    -passin file:../controller-ca/passphrase.txt \
    -CAcreateserial -days 730 \
    -extfile controller.cnf -extensions v3_req \
    -out controller.cert.pem
  openssl verify -CAfile ../controller-ca/controller-ca.cert.pem controller.cert.pem  # expect: OK
  openssl x509 -in controller.cert.pem -noout -ext subjectAltName     # DNS x2 + IP present
  openssl x509 -in controller.cert.pem -noout -dates
  cat controller.cert.pem controller.key > controller.bundle.pem
  chmod 600 controller.bundle.pem
} )
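The lb-mgmt-controller-cert option expects the certificate and its key concatenated, in the order the cat above produces. A label-only check of the bundle is sketched below; it prints the PEM block names, never the contents. The name bundle_shape is hypothetical, and the expected label assumes the unencrypted PKCS#8 output of genpkey.

```shell
# Hypothetical helper: confirm the bundle is one certificate followed by one
# unencrypted private key, judging by the PEM BEGIN labels alone.
bundle_shape() (
  file="$1"
  order=$(grep -E '^-----BEGIN ' "$file" \
            | sed -E 's/^-----BEGIN (.*)-----$/\1/' | paste -sd, -)
  echo "  blocks: $order"
  [ "$order" = "CERTIFICATE,PRIVATE KEY" ]
)
# Usage: bundle_shape "$HOME/octavia-pki/controller/controller.bundle.pem"
```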

1.0-GEN.d -- Write overlays/octavia-pki.yaml (base64 blobs + plaintext passphrase)

Four values are base64(PEM); the issuing-CA passphrase is a PLAIN string. The file is gitignored. Set $REPO to the jumphost clone (the dir holding bundle.yaml + overlays/).

( {
  WORKDIR="$HOME/octavia-pki"; cd "$WORKDIR" || exit 1   # dir from 1.0-GEN.0
  REPO="${REPO:-$HOME/openstack-caracal-ipv4}"   # adjust to the actual clone path
  mkdir -p "$REPO/overlays"
  ISS_CERT=$(base64 -w0 issuing-ca/issuing-ca.cert.pem)
  ISS_KEY=$(base64 -w0 issuing-ca/issuing-ca.key.enc)
  ISS_PASS=$(cat issuing-ca/passphrase.txt)
  CON_CACERT=$(base64 -w0 controller-ca/controller-ca.cert.pem)
  CON_CERT=$(base64 -w0 controller/controller.bundle.pem)
  cat > "$REPO/overlays/octavia-pki.yaml" <<OVL
applications:
  octavia:
    options:
      lb-mgmt-issuing-cacert: "$ISS_CERT"
      lb-mgmt-issuing-ca-private-key: "$ISS_KEY"
      lb-mgmt-issuing-ca-key-passphrase: "$ISS_PASS"
      lb-mgmt-controller-cacert: "$CON_CACERT"
      lb-mgmt-controller-cert: "$CON_CERT"
OVL
  chmod 600 "$REPO/overlays/octavia-pki.yaml"
  echo "wrote $REPO/overlays/octavia-pki.yaml"
} )

Then run the Step 1.0 sanity block (5 keys present; ASCII clean) before deploying. Keep $HOME/octavia-pki/ (the CA keys + passphrases) OFF the repo and backed up securely; the 10y CAs are reused across rebuilds -- regenerate only on key compromise or CA expiry.
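Beyond key presence, the shape of each value can be checked without exposing it by printing lengths only. In the sketch below (overlay_value_lengths is a hypothetical name) the passphrase line should report len=44 and the four base64 blobs should be much longer; it assumes one "name: value" pair per line, as written by 1.0-GEN.d.

```shell
# Hypothetical helper: print each lb-mgmt-* option name with the length of its
# value (surrounding quotes stripped), never the value itself.
overlay_value_lengths() {
  awk '/^[ \t]*lb-mgmt-[a-z-]+:/ {
    key = $1; sub(/:$/, "", key)
    val = $0; sub(/^[^:]*:[ \t]*/, "", val); gsub(/"/, "", val)
    printf "  %s len=%d\n", key, length(val) }' "$1"
}
# Usage: overlay_value_lengths "$REPO/overlays/octavia-pki.yaml"
```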