
v1 Do-Document 05 — Vault Initialization & Cert-Relation Cascade

Status: Third execution document of Batch B. Manual three-step Vault bring-up plus regeneration of admin-openrc. Last document in Batch B.

Position in sequence: Runs after v1-do-doc-04-deploy.md (model settled at pre-Vault end state). Runs before v1-do-doc-06-magnum-domain.md (Batch C).

Cross-references:

  • D-006 (Vault HA backend — etcd + easyrsa)
  • D-009 (Hacluster modeling at testcloud scale)
  • D-011 §6 (validation: Vault unseal + auto-unseal-after-reboot pattern)
  • bundle.yaml Vault block (channel 1.8/stable, vip 10.12.4.236, vault-mysql-router subordinate)
  • OpenStack charm-deployment-guide Appendix C — Vault (initialise / unseal / authorise)

1. Purpose & scope

Initialize Vault, unseal it, authorize the charm, then watch the vault:certificates relation cascade flow certs to every API charm. The cascade unblocks roughly 20 charms that were blocked after doc-04.

The three manual Vault steps:

  1. Initialise — generate the master encryption key + unseal keys + root token. ONE-SHOT per Vault lifetime. The unseal keys are the disaster-recovery material.
  2. Unseal — provide 3-of-5 unseal keys to decrypt Vault's master key. Required after every Vault restart (including post-bundle-deploy and post-host-reboot).
  3. Authorize — give the Vault charm a Vault token so it can manage policies, app roles, secrets storage for OpenStack consumers.

What this document does:

  • Discovers Vault's address (HTTP at this point — pre-TLS)
  • Runs vault operator init and captures the 5 unseal keys + 1 root token
  • Runs vault operator unseal 3 times (with 3 different keys)
  • Runs juju run vault/leader authorize-charm token=... to graft the charm
  • Watches the cert-relation cascade settle
  • Regenerates $HOME/admin-openrc against the new Keystone
  • Smokes the post-Vault state: every charm active/idle

What this document does NOT do:

  • Set up auto-unseal (Vault's transit-engine-based auto-unseal pattern) — out of scope for v1; manual unseal after host reboot is acceptable. Roosevelt may revisit.
  • Set up Vault PKI engine for tenant-side use — out of scope for v1.
  • Provision tenant resources or DNS zones (Batch D)
  • Magnum domain or CAPI work (Batch C)

Out-of-scope security note: the unseal keys captured in §6 are the disaster-recovery material. Per the Caracal_Rebuild handoff, the prior cycle's keys are accepted lost. The keys generated HERE need a secure off-host home; where is an operator decision (admin workstation encrypted vault, password manager attachment, dedicated secrets store). For Roosevelt this becomes a real key-management procedure.


2. Decisions captured

| Decision | Choice | Notes |
|---|---|---|
| Key shares / threshold | 5 keys, threshold 3 | Standard Shamir's-secret-sharing posture; allows quorum-of-3 unseal |
| Vault address scheme | HTTP via unit IP for init/unseal/authorize; HTTPS via VIP thereafter | Vault has no TLS until authorize-charm flips it on |
| Authorize-charm pattern | Direct token=<root-token> parameter (channel 1.8/stable convention) | Newer revisions may require token-secret-id= via Juju secret; verify with juju show-action vault authorize-charm first |
| Admin-openrc location | $HOME/admin-openrc | Same path as prior cloud; overwritten |
| Admin domain/project | Charmed-Keystone defaults: user=admin, user-domain=admin_domain, project=admin_domain | [unverified, flagging] for the project: older charm versions used admin for project; verify by openstack token issue |
| Unseal key storage | Operator decision: secure off-host | This document warns; doesn't dictate the where |

3. Prerequisites

| Prereq | Verification |
|---|---|
| v1-do-doc-04-deploy.md ✓ (model settled, pre-Vault end state confirmed) | Manual; re-check via §4.1 below |
| Octavia PKI on-disk verification [OK] (doc-04 §8) | Manual |
| vault CLI installed on jumphost | command -v vault && vault --version (any 1.7+ works for client) |
| Juju controller still reachable | juju controllers |

Shell context — paste once:

export REPO="$HOME/openstack-caracal-ipv4"
cd "$REPO"
echo "REPO=$REPO"

Install the vault CLI if missing (using the HashiCorp APT repo; one-time per jumphost):

if ! command -v vault >/dev/null 2>&1; then
  echo "vault CLI not present. Install via Hashicorp APT repo:"
  echo "  wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg"
  echo "  echo \"deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com \$(lsb_release -cs) main\" | sudo tee /etc/apt/sources.list.d/hashicorp.list"
  echo "  sudo apt update && sudo apt install vault"
  echo "  (After install, you may need 'sudo setcap cap_ipc_lock= /usr/bin/vault' on hosts without IPC_LOCK capability.)"
else
  vault --version
fi

4. Pre-flight: confirm pre-Vault state

echo "=== 4.1 Vault status (expect: blocked, 'Vault needs to be initialized') ==="
juju status vault -m openstack

echo ""
echo "=== 4.2 Other charms expected blocked on vault (sample) ==="
juju status -m openstack keystone glance neutron-api octavia magnum 2>/dev/null | grep -E "(blocked|active|maintenance)"

echo ""
echo "=== 4.3 Critical infra expected active/idle ==="
juju status -m openstack rabbitmq-server etcd easyrsa ceph-mon ceph-osd nova-compute 2>/dev/null | grep -E "(active|blocked|error)"

If vault/0 is not blocked with the message "Vault needs to be initialized", stop. Either Vault hasn't reached config-changed yet (re-run §4 after a few minutes), or it's in a different blocked state that needs investigation.

If etcd is NOT active/idle (e.g., still maintenance or blocked), Vault cannot use it as a backend. Investigate easyrsa→etcd cert flow before continuing.
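
The §4.1 gate can also be checked mechanically rather than by eye. The following is a minimal sketch, assuming the unit line in the tabular juju status output carries both the word blocked and the message text quoted above; the function name vault_needs_init is hypothetical and it only inspects text on stdin.

```shell
# Hypothetical helper: succeeds only if a line of the status text on stdin
# shows "blocked" followed by the "Vault needs to be initialized" message.
vault_needs_init() {
  grep -Eq 'blocked.*Vault needs to be initialized'
}

# Usage (against the live model):
#   juju status vault -m openstack | vault_needs_init \
#     && echo "[OK] proceed to §5" \
#     || echo "[STOP] vault is not in the expected pre-init state"
```

The check is deliberately strict: any other blocked message fails it, which matches the "stop and investigate" rule above.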


5. Discover Vault's address & set environment

Vault has no TLS yet. We connect via HTTP on the unit's port 8200.

# Get the unit's IP (NOT the VIP: the VIP is hacluster-managed and only active after Vault is up)
VAULT_UNIT_IP=$(juju show-unit vault/0 -m openstack 2>/dev/null | grep "public-address:" | head -1 | awk '{print $2}')

if [ -z "$VAULT_UNIT_IP" ]; then
  echo "[FAIL] could not resolve vault/0 public-address. Check 'juju show-unit vault/0'"
else
  echo "vault/0 public-address: $VAULT_UNIT_IP"
  # Set VAULT_ADDR for the vault CLI (only once the IP is known, so a failed
  # lookup never leaves a malformed http://:8200 address in the environment)
  export VAULT_ADDR="http://${VAULT_UNIT_IP}:8200"
  echo "VAULT_ADDR=$VAULT_ADDR"
fi

# Confirm reachable
vault status 2>&1 | head -20
# Expected output:
#   Sealed: true
#   Initialized: false
#   ... (or similar — exit code 2 is expected when uninitialized)

Note on VAULT_ADDR scheme: HTTP at this stage. After authorize-charm, Vault enables HTTPS using its own internal CA cert. From that point onward, VAULT_ADDR=https://vault.omega.dc0.vr0.cloud.neumatrix.local:8200 (or https://10.12.4.236:8200) is the right address, but vault CLI will need the Vault CA root cert via VAULT_CACERT or -tls-skip-verify. For this document, we only use the HTTP address — once authorize-charm completes, the charm handles all subsequent Vault interactions internally.
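
Should an operator later need the vault CLI against the HTTPS endpoint, the switch described in the note can be sketched as below. This is an illustration under assumptions: vault_https_env is a hypothetical helper, the default address is the VIP quoted above, and the CA file is whatever PEM the operator has pulled from Vault. VAULT_ADDR and VAULT_CACERT are the standard vault CLI environment variables.

```shell
# Hypothetical helper: print the exports for talking to Vault over HTTPS.
# $1 = path to the Vault CA root PEM
# $2 = Vault address (assumed default: the VIP from the bundle)
vault_https_env() {
  cacert=$1
  addr=${2:-https://10.12.4.236:8200}
  if [ ! -r "$cacert" ]; then
    echo "[FAIL] CA cert not readable: $cacert" >&2
    return 1
  fi
  printf 'export VAULT_ADDR=%s\n' "$addr"
  printf 'export VAULT_CACERT=%s\n' "$cacert"
}

# Usage:
#   eval "$(vault_https_env "$HOME/vault-init/vault-ca-root.pem")" && vault status
```

Printing the exports instead of setting them keeps the helper side-effect free; the caller decides whether to eval the result in the current shell.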


6. Initialise Vault (one-shot per Vault lifetime)

WARNING: the output of this command contains the unseal keys and the root token. If lost, Vault is unrecoverable — there is no "forgot password" path. If exposed, an attacker with the unseal keys can decrypt everything Vault holds.

Capture the output to a file in $HOME (filesystem-encrypted assumed; if not, work on a tmpfs):

mkdir -p "$HOME/vault-init"
chmod 700 "$HOME/vault-init"

# Init with 5 key shares, threshold 3
vault operator init -key-shares=5 -key-threshold=3 \
  > "$HOME/vault-init/init-output-$(date +%Y%m%d-%H%M%S).txt"

# Permissions: tighten immediately
chmod 600 "$HOME/vault-init/"*.txt

# Display the output
INIT_FILE=$(ls -t "$HOME/vault-init/"*.txt | head -1)
echo "Init output captured to: $INIT_FILE"
cat "$INIT_FILE"

Expected output format:

Unseal Key 1: <44-char base64>
Unseal Key 2: <44-char base64>
Unseal Key 3: <44-char base64>
Unseal Key 4: <44-char base64>
Unseal Key 5: <44-char base64>

Initial Root Token: hvs.<long-token>

Vault initialized with 5 key shares and a key threshold of 3. ...

Immediately:

  1. Copy this output (the entire file) to your secure off-host store (admin workstation encrypted drive, password manager, secrets vault).
  2. Verify you have it stored AND retrievable before proceeding to §7.

The unseal keys are needed every time Vault restarts (including after a reboot of the host running the Vault unit). The root token is needed for authorize-charm in §8 and (potentially) for future Vault admin operations.
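
Before copying the file off-host, it is worth confirming that the capture is complete without echoing the secrets again. A minimal sketch, assuming the init output follows the expected format shown above (five "Unseal Key N:" lines and one "Initial Root Token:" line); check_init_file is a hypothetical name.

```shell
# Hypothetical helper: sanity-check a captured init output file.
# Prints only counts, never the key or token values.
# $1 = path to the captured init output
check_init_file() {
  file=$1
  [ -r "$file" ] || { echo "[FAIL] cannot read $file" >&2; return 1; }
  # grep -c exits non-zero on zero matches, hence the "|| true".
  keys=$(grep -c '^Unseal Key [0-9]*: ' "$file" || true)
  tokens=$(grep -c '^Initial Root Token: ' "$file" || true)
  echo "unseal keys found: $keys, root tokens found: $tokens"
  [ "$keys" -eq 5 ] && [ "$tokens" -eq 1 ]
}

# Usage:
#   check_init_file "$INIT_FILE" && echo "[OK] init capture looks complete"
```

An empty or truncated file (for example, when vault operator init failed and only an error was written) fails this check, which is the case to catch before proceeding to §7.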

Init cannot simply be re-run: an already-initialized Vault rejects a second vault operator init. If something goes wrong here, note that juju run vault/leader reissue-certificates only reissues certificates and does NOT re-init Vault. The worst-case way to wipe Vault is to destroy and redeploy it, which discards encrypted state (anything stored in Vault is lost).


7. Unseal Vault (3 of 5)

Provide three different unseal keys, one per command. Vault collects the key shares and, once the third (threshold) share is accepted, reconstructs its master key and reports Sealed: false.

# Locate the init output; copy the keys from it one at a time (do NOT print them all together)
INIT_FILE=$(ls -t "$HOME/vault-init/"*.txt | head -1)
echo "Unseal keys are in: $INIT_FILE"

# Unseal step 1: paste Key 1 when prompted (interactive; the prompt is safer than passing the key on the command line)
vault operator unseal

# Unseal step 2 — paste Key 2
vault operator unseal

# Unseal step 3 — paste Key 3
vault operator unseal

After the third unseal, output should show:

Key                    Value
---                    -----
Seal Type              shamir
Initialized            true
Sealed                 false   <-- this is the win condition
Total Shares           5
Threshold              3
Version                1.8.x
Cluster Name           vault-cluster-XXXX
...

Verify:

vault status
# Expect: Sealed: false, Initialized: true, HA Enabled: true (since etcd backend)
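
The same verification can be scripted against the table output shown above. A minimal sketch, assuming the default table format of vault status (a key column followed by a value column); vault_is_unsealed is a hypothetical name.

```shell
# Hypothetical helper: read `vault status` table output on stdin and
# succeed only when Initialized is true and Sealed is false.
vault_is_unsealed() {
  awk '
    $1 == "Initialized" { init = $2 }
    $1 == "Sealed"      { sealed = $2 }
    END { exit !(init == "true" && sealed == "false") }
  '
}

# Usage:
#   vault status | vault_is_unsealed && echo "[OK] unsealed" || echo "[FAIL] still sealed"
```

The exact match on the first column means the "Seal Type" row is ignored, and the pipeline's result comes from awk rather than from the exit code of vault status itself.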

8. Authorize the charm

Vault is now unsealed. The charm needs a token to create its own policies and app roles for managing OpenStack-consumer secrets and certs.

8.1 Verify the action signature

The authorize-charm action signature has shifted across vault charm revisions. Check first:

juju show-action vault authorize-charm -m openstack 2>/dev/null || \
  juju actions vault -m openstack | grep authorize

Look for one of these patterns:

  • Direct-token (older revisions, expected on 1.8/stable): parameter is token=<root-token>
  • Juju-secret (newer revisions): parameter is token-secret-id=<juju-secret-id>; the token must be in a Juju secret first

For channel 1.8/stable (what the bundle pins), the direct-token pattern is expected. If juju show-action indicates the secret-based pattern instead, use §8.3.
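
The choice between §8.2 and §8.3 can be made from the action description text itself. A rough sketch, assuming the description mentions the parameter names exactly as listed above (token or token-secret-id); detect_authorize_pattern is a hypothetical name, and an "unknown" result means the output should be read by hand.

```shell
# Hypothetical helper: classify the action description text on stdin.
# Prints "secret" if a token-secret-id parameter is mentioned,
# "direct" if only a token parameter is, and "unknown" otherwise.
detect_authorize_pattern() {
  desc=$(cat)
  if printf '%s\n' "$desc" | grep -q 'token-secret-id'; then
    echo "secret"
  elif printf '%s\n' "$desc" | grep -qw 'token'; then
    echo "direct"
  else
    echo "unknown"
  fi
}

# Usage:
#   juju show-action vault authorize-charm -m openstack | detect_authorize_pattern
```

The secret-based name is tested first because the word "token" also appears inside "token-secret-id".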

8.2 Direct-token authorize (expected path)

# Extract the root token
INIT_FILE=$(ls -t "$HOME/vault-init/"*.txt | head -1)
ROOT_TOKEN=$(grep "^Initial Root Token:" "$INIT_FILE" | awk '{print $NF}')

if [ -z "$ROOT_TOKEN" ]; then
  echo "[FAIL] could not extract root token from $INIT_FILE"
else
  echo "Root token captured (length: ${#ROOT_TOKEN})"
  # Run the action (only when a token was actually extracted)
  juju run vault/leader authorize-charm token="$ROOT_TOKEN" -m openstack
fi

# Clear from shell
unset ROOT_TOKEN

8.3 Juju-secret authorize (fallback if §8.1 shows the secret-based signature)

INIT_FILE=$(ls -t "$HOME/vault-init/"*.txt | head -1)
ROOT_TOKEN=$(grep "^Initial Root Token:" "$INIT_FILE" | awk '{print $NF}')

# Create a Juju secret containing the token
SECRET_ID=$(juju add-secret vault-root-token token="$ROOT_TOKEN" -m openstack | grep -oE "secret:[a-z0-9]+")
echo "Secret created: $SECRET_ID"
unset ROOT_TOKEN

# Grant the secret to the vault application
juju grant-secret "$SECRET_ID" vault -m openstack

# Run the action with the secret-id parameter (parameter name may vary; check §8.1)
juju run vault/leader authorize-charm token-secret-id="$SECRET_ID" -m openstack

# After authorize completes successfully, the secret can be removed
juju remove-secret "$SECRET_ID" -m openstack

8.4 Verify authorize succeeded

echo "=== Vault status after authorize-charm ==="
juju status vault -m openstack
# Expect: vault/0 transitions out of 'blocked' to maintenance, then active/idle within 1-2 min

If vault/0 stays blocked after authorize-charm, check the unit log:

juju ssh vault/0 -m openstack -- sudo tail -100 /var/log/juju/unit-vault-0.log

Common failures: invalid token format; token already revoked; charm trying to write to a path the token can't access.


9. Watch the cert-relation cascade

After Vault is active/idle, the vault:certificates relation flows certs to ~20 charms. They progress from blocked → maintenance (writing certs, restarting services) → active/idle.

Expected duration: 15-30 minutes for the full cascade to settle.

juju status --color --watch 30s -m openstack

9.1 Expected progression

| Tier | Charms unblocked | Approximate time after authorize-charm |
|---|---|---|
| Tier 1 (direct certs) | mysql-innodb-cluster, ovn-central, keystone, glance, neutron-api, cinder | 2-8 min |
| Tier 2 (waited on Tier 1) | nova-cloud-controller, placement, octavia, barbican, designate, magnum, openstack-dashboard | 8-15 min |
| Tier 3 (subordinates + plugins) | *-mysql-router, *-hacluster, neutron-api-plugin-ovn, ovn-chassis, ovn-chassis-octavia, barbican-vault, octavia-dashboard, magnum-dashboard | 15-25 min |
| Tier 4 (downstream) | glance-simplestreams-sync, octavia-diskimage-retrofit, designate-bind, ceph-radosgw | 20-30 min |
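
Because the cascade takes 15-30 minutes, an unattended wait with a deadline can be more practical than watching the status screen. A minimal sketch of a generic polling loop; wait_until is a hypothetical helper, and the command it polls is whatever settle check the operator chooses to supply.

```shell
# Hypothetical helper: re-run a check until it succeeds or a deadline passes.
# $1 = timeout in seconds, $2 = polling interval in seconds,
# remaining arguments = the command to run as the check.
wait_until() {
  timeout_s=$1
  interval_s=$2
  shift 2
  waited=0
  while ! "$@"; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "[FAIL] timed out after ${waited}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
  echo "[OK] condition met after ~${waited}s"
}

# Usage (my_settle_check is a placeholder for an operator-supplied command
# that exits 0 once every unit is active/idle):
#   wait_until 1800 30 my_settle_check
```

A 1800-second timeout with a 30-second interval matches the upper bound of the expected cascade duration; a timeout here is a prompt to investigate, not to keep waiting.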

9.2 Final post-Vault end state

When settled, every unit should be active/idle. Verify:

echo "=== Any unit not in active/idle? ==="
juju status -m openstack --format=yaml \
  | python3 -c "
import yaml, sys
d = yaml.safe_load(sys.stdin)
apps = d.get('applications', {})
issues = []

def check_unit(uname, udata):
    ws = udata.get('workload-status', {}).get('current', '')
    js = udata.get('juju-status', {}).get('current', '')
    msg = udata.get('workload-status', {}).get('message', '')
    if ws != 'active' or js != 'idle':
        issues.append(f'{uname}: workload={ws}, juju={js}, msg={msg}')

for app, info in apps.items():
    units = info.get('units', {}) or {}
    for uname, udata in units.items():
        check_unit(uname, udata)
        # Walk subordinates too (hacluster, mysql-router, etc.)
        subs = udata.get('subordinates', {}) or {}
        for sname, sdata in subs.items():
            check_unit(sname, sdata)

print(f'Non-active/idle units: {len(issues)}')
for i in issues:
    print(f'  {i}')
"

Expected output: Non-active/idle units: 0. Anything else needs investigation before the openrc-regeneration step.


10. Regenerate admin-openrc

Once Keystone is active/idle and Vault has issued its TLS cert, the new admin-openrc points at the new Caracal cloud.

10.1 Pull the admin password from the keystone charm

juju run keystone/leader get-admin-password -m openstack
# Output is YAML; the operator extracts the 'admin-password' value manually OR via the Python snippet below

For scripted extraction:

ADMIN_PASS=$(juju run keystone/leader get-admin-password -m openstack --format json 2>/dev/null \
  | python3 -c "
import json, sys
d = json.load(sys.stdin)
# Action result format varies; try common shapes
for k, v in d.items():
    if isinstance(v, dict):
        r = v.get('results', {})
        for key in ('admin-password', 'Stdout', 'password'):
            if key in r:
                print(r[key].strip())
                sys.exit(0)
print('', end='')
")

if [ -z "$ADMIN_PASS" ]; then
  echo "[FAIL] could not extract admin password from action output. Run 'juju run keystone/leader get-admin-password' manually."
else
  echo "[OK] admin password captured (length: ${#ADMIN_PASS})"
fi

10.2 Pull the Vault CA root for openrc trust

Keystone's TLS cert is signed by Vault's internal CA. To validate that cert from the openstack CLI, we need the Vault CA root.

juju run vault/leader get-root-ca -m openstack > "$HOME/vault-init/vault-ca-root.pem"
# Strip any YAML wrapping if present (the action returns the cert inline in YAML)
# Inspect:
head -5 "$HOME/vault-init/vault-ca-root.pem"

If the output is wrapped (e.g., starts with Running operation ... or unit-vault-0:), extract just the PEM block. Common pattern:

# If the action output wraps the cert, extract just the BEGIN/END CERTIFICATE block.
# Leading whitespace is stripped from every line because YAML output indents the PEM body.
python3 -c "
import re, sys
with open('$HOME/vault-init/vault-ca-root.pem') as f:
    content = f.read()
m = re.search(r'-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----', content, re.DOTALL)
if not m:
    sys.exit('NO_CERT_FOUND')
print('\n'.join(line.strip() for line in m.group(0).splitlines()))
" > "$HOME/vault-init/vault-ca-root-clean.pem" \
  && mv "$HOME/vault-init/vault-ca-root-clean.pem" "$HOME/vault-init/vault-ca-root.pem"

openssl x509 -in "$HOME/vault-init/vault-ca-root.pem" -noout -subject -dates
# Expect: a valid cert with the Vault-charm-generated subject
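
A cheap structural check can catch a failed extraction before the file is used as OS_CACERT. A minimal sketch, assuming a clean result is exactly one PEM certificate block; pem_has_single_cert is a hypothetical name, and it complements rather than replaces the openssl x509 check.

```shell
# Hypothetical helper: succeed only if the file holds exactly one PEM
# certificate block (one BEGIN line and one END line).
# $1 = path to the PEM file
pem_has_single_cert() {
  file=$1
  [ -r "$file" ] || return 1
  # -e is required because the pattern starts with a dash;
  # grep -c exits non-zero on zero matches, hence the "|| true".
  begins=$(grep -c -e '-----BEGIN CERTIFICATE-----' "$file" || true)
  ends=$(grep -c -e '-----END CERTIFICATE-----' "$file" || true)
  [ "$begins" -eq 1 ] && [ "$ends" -eq 1 ]
}

# Usage:
#   pem_has_single_cert "$HOME/vault-init/vault-ca-root.pem" \
#     || echo "[FAIL] CA file is not a single clean certificate"
```

A file containing only the NO_CERT_FOUND sentinel, or raw action output with no certificate, fails this check.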

10.3 Write the new admin-openrc

# Move any existing admin-openrc out of the way (the prior cycle's pointed at the destroyed cloud)
if [ -f "$HOME/admin-openrc" ]; then
  mv "$HOME/admin-openrc" "$HOME/admin-openrc.pre-caracal-$(date +%Y%m%d-%H%M%S)"
fi

cat > "$HOME/admin-openrc" <<EOF
# Caracal admin openrc — VR0 DC0 Omega Cloud (v1)
# Generated: $(date -u +%Y-%m-%dT%H:%M:%SZ) UTC
# Source: v1-do-doc-05-vault-init §10
export OS_AUTH_URL=https://keystone.omega.dc0.vr0.cloud.neumatrix.local:5000/v3
export OS_USERNAME=admin
export OS_PASSWORD='$ADMIN_PASS'
export OS_PROJECT_NAME=admin_domain
export OS_USER_DOMAIN_NAME=admin_domain
export OS_PROJECT_DOMAIN_NAME=admin_domain
export OS_IDENTITY_API_VERSION=3
export OS_REGION_NAME=RegionOne
export OS_CACERT=$HOME/vault-init/vault-ca-root.pem
EOF

chmod 600 "$HOME/admin-openrc"
unset ADMIN_PASS

echo "Wrote $HOME/admin-openrc — verify by sourcing and running 'openstack token issue'"

[unverified, flagging] OS_PROJECT_NAME default: Charmed-Keystone's admin user lives in admin_domain. The default admin project name has varied across charm revisions — common values are admin_domain (matching the domain) or admin. If the first openstack token issue (§10.4) fails with a project-not-found error, try OS_PROJECT_NAME=admin instead.
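
If the project name does need changing, the openrc can be edited in place rather than regenerated. A minimal sketch, assuming the file contains an "export OS_PROJECT_NAME=" line as written above and that the new name is a simple identifier (no slashes or ampersands, which would confuse sed); set_openrc_project is a hypothetical name.

```shell
# Hypothetical helper: rewrite the OS_PROJECT_NAME export in an openrc file.
# $1 = openrc path, $2 = new project name.
# Keeps a .bak copy (same permissions) of the original next to it.
set_openrc_project() {
  file=$1
  project=$2
  grep -q '^export OS_PROJECT_NAME=' "$file" || return 1
  cp -p "$file" "$file.bak" || return 1
  sed "s/^export OS_PROJECT_NAME=.*/export OS_PROJECT_NAME=$project/" "$file.bak" > "$file"
}

# Usage:
#   set_openrc_project "$HOME/admin-openrc" admin
```

Only the project-name line is touched; OS_PROJECT_DOMAIN_NAME and the other exports are left as they were. The .bak copy still contains the admin password, so remove it once the new file is confirmed working.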

10.4 Verify

( source "$HOME/admin-openrc"; \
  echo "Testing auth against $OS_AUTH_URL ..."; \
  openstack token issue 2>&1 | head -20 )

Expected: a token dump (id, expires, project_id, user_id). If you get ProjectNotFoundException, see the flagging note above and try OS_PROJECT_NAME=admin in the openrc.

If you get a TLS error (certificate verify failed), the OS_CACERT path is wrong or the cert extraction in §10.2 didn't produce a clean cert.


11. /etc/hosts sanity (jumphost-side)

The openrc uses FQDN keystone.omega.dc0.vr0.cloud.neumatrix.local. That hostname must resolve from the jumphost to the Keystone VIP (10.12.4.229) for openrc to work pre-Designate.

echo "=== Jumphost /etc/hosts has the API VIPs ==="
grep -E "10\.12\.4\.(22[4-9]|23[0-6])" /etc/hosts || echo "[WARN] no API VIP hosts found in /etc/hosts"

If absent, add a block:

sudo tee -a /etc/hosts > /dev/null <<EOF

# Caracal v1 API VIPs — v1-do-doc-05 §11
# These are temporary until Designate zones are populated (v1-do-doc-10).
10.12.4.224 barbican.omega.dc0.vr0.cloud.neumatrix.local
10.12.4.226 cinder.omega.dc0.vr0.cloud.neumatrix.local
10.12.4.227 designate.omega.dc0.vr0.cloud.neumatrix.local
10.12.4.228 glance.omega.dc0.vr0.cloud.neumatrix.local
10.12.4.229 keystone.omega.dc0.vr0.cloud.neumatrix.local
10.12.4.230 magnum.omega.dc0.vr0.cloud.neumatrix.local
10.12.4.231 neutron.omega.dc0.vr0.cloud.neumatrix.local
10.12.4.232 nova.omega.dc0.vr0.cloud.neumatrix.local
10.12.4.233 octavia.omega.dc0.vr0.cloud.neumatrix.local
10.12.4.234 horizon.omega.dc0.vr0.cloud.neumatrix.local
10.12.4.235 placement.omega.dc0.vr0.cloud.neumatrix.local
10.12.4.236 vault.omega.dc0.vr0.cloud.neumatrix.local
EOF

# Verify
grep "omega.dc0.vr0" /etc/hosts | wc -l
# Expect: 12

This is a bootstrap measure per D-008. Tenant resolution uses Designate (set up in Batch D).
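
The line count above confirms that 12 entries exist but not which one is missing when the count is short. A minimal sketch that names the gaps, assuming the twelve service hostnames listed in the block above; missing_api_hosts is a hypothetical name.

```shell
# Hypothetical helper: print each expected service FQDN that has no
# (uncommented) entry in the given hosts file.
# $1 = hosts file (defaults to /etc/hosts)
missing_api_hosts() {
  hosts_file=${1:-/etc/hosts}
  domain="omega.dc0.vr0.cloud.neumatrix.local"
  for svc in barbican cinder designate glance keystone magnum \
             neutron nova octavia horizon placement vault; do
    # Skip comment lines, then look for the FQDN as a literal whole word.
    grep -v '^[[:space:]]*#' "$hosts_file" | grep -qwF "$svc.$domain" \
      || echo "$svc.$domain"
  done
}

# Usage:
#   missing_api_hosts          # prints nothing when all 12 entries are present
```

Empty output is the pass condition; any printed hostname still needs its line added to /etc/hosts.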


12. Octavia amphora image check & smoketest

Now that Octavia is active/idle and the LBaaS Mgmt PKI chain is fully wired, the functional Octavia smoketest documented in v1-do-doc-02-pki.md §14 can be run once its two prerequisites hold: Glance has the amphora image AND the LBaaS Mgmt network is wired.

Glance amphora image status check:

( source "$HOME/admin-openrc"; \
  openstack image list --status active | grep -i amphora )
# Expect: at least one row with name containing 'amphora'. May take 15-30 min after Octavia
# active/idle for glance-simplestreams-sync + octavia-diskimage-retrofit to populate.

If the amphora image isn't present yet, defer the §14 smoketest until it appears. The pipeline:

  • glance-simplestreams-sync pulls upstream cloud images into Glance
  • octavia-diskimage-retrofit builds the amphora image from one of those base images, tags it as octavia-amphora, and pushes to Glance

Both are charms with active relations; they run their pipelines automatically after Keystone is up. Just give them time.

Octavia smoketest — execute v1-do-doc-02-pki.md §14 once the amphora image is present.


13. Acceptance criteria — go/no-go for v1-do-doc-06 (Batch C entry)

Before proceeding to Batch C:

  • §6 Vault init output captured AND verified stored in secure off-host location
  • §7 Vault unsealed; vault status shows Sealed: false, Initialized: true
  • §8 authorize-charm action completed; vault/0 reaches active/idle
  • §9 All charms active/idle; Python check returns "Non-active/idle units: 0"
  • §10 admin-openrc regenerated; openstack token issue succeeds
  • §11 /etc/hosts has the 12 API VIP entries
  • (Recommended) §12 amphora image present in Glance, and v1-do-doc-02-pki.md §14 smoketest passes

If all checked, proceed to v1-do-doc-06-magnum-domain.md (Batch C).


14. Roosevelt deltas (forward-look)

| Aspect | Testcloud (v1) | Roosevelt |
|---|---|---|
| Vault topology | num_units=1, hacluster decorative | num_units=3, hacluster active, etcd quorum operative |
| Unseal procedure | Manual, operator types 3 keys | Auto-unseal via transit engine OR HSM-backed seal |
| Unseal key storage | Operator-decided off-host | Formal key-escrow procedure |
| Auto-unseal on reboot | No (host reboot → vault stays sealed → operator must re-unseal) | Yes (transit engine or HSM) |
| admin password rotation | Manual (juju config keystone admin-password) | Vault-managed rotation |
| /etc/hosts bootstrap | Manual (this §11) | Bastion-pre-populated; or DNS via local resolver pointed at Designate-on-management |
| TLS trust distribution | Manual VAULT_CACERT export | Bastion preloaded with Vault root |

15. Change log

| Date | Change | Reference |
|---|---|---|
| 2026-05-27 | Document created. Replaces runbooks/deprecated/03-vault-init.md (placeholder). Covers Vault init/unseal/authorize, cert-cascade watch, admin-openrc regeneration, /etc/hosts bootstrap. Flags channel-revision uncertainty on the authorize-charm action signature and the Charmed-Keystone admin project name. | Batch B drafting |