openstack-caracal-ipv4 / runbooks / v1-do-doc-07-capi-bootstrap.md

v1 Do-Document 07 — CAPI Bootstrap Cluster + Workload Pivot

Status: Second execution document of Batch C. Stands up the CAPI bootstrap cluster on capi-mgmt.maas, creates the workload cluster on the cloud, pivots cluster state into the workload via clusterctl move, and stages the workload kubeconfig for v1-do-doc-08.

Position in sequence: Runs after v1-do-doc-06-magnum-domain.md (Magnum domain setup). Runs before v1-do-doc-08-magnum-driver.md (driver graft consumes the workload kubeconfig produced here).

Replaces: runbooks/04a-capi-bootstrap-cluster.md — same substantive procedure with fixes applied. The old runbook moves to runbooks/deprecated/ as part of this batch's commits.

Fixes applied vs the prior runbook (runbooks/04a-capi-bootstrap-cluster.md):

  • $REPO corrected from $HOME/repos/openstack-caracal-ipv4 to $HOME/openstack-caracal-ipv4
  • $VAULT_CA corrected from $HOME/vault-pki/root-ca.pem to $HOME/vault-init/vault-ca-root.pem (matches v1-do-doc-05 §10.2 output)
  • $MAAS_PROFILE now explicitly set in §3 shell context (prior version referenced it without setting it)
  • §4 adds KUBERNETES_VERSION to dynamic pin discovery (was hardcoded v1.31.4 in §13)
  • §5 MAAS deploy poll's exit 1 on Failed-deployment converted to non-exiting [FAIL] report
  • §11 noble-amd64-missing branch's exit 1 converted to non-exiting [FAIL] report
  • Cross-references updated: "runbook 02" → v1-do-doc-04; "runbook 03" → v1-do-doc-05; "runbook 04" → v1-do-doc-06; "runbook 05" → v1-do-doc-08

Cross-references:

  • D-017 (CAPI bootstrap cluster lifecycle — full rebuild every cycle)
  • D-007 (Magnum two-layer install — this is Layer B preparation)
  • D-002 (channel matrix — informs Vault CA chain)
  • Workstream 3b decision (2026-05-22): ship Vault CA (no tls-insecure); pivot mandatory

1. Purpose & scope

Stand up the CAPI bootstrap cluster on capi-mgmt.maas and pivot cluster state into a self-managing workload cluster. Output:

  1. Workload K8s cluster (capi-mgmt-cluster) running in tenant VMs on the cloud, self-managing post-pivot.
  2. Workload kubeconfig copied to jumphost at a known path. Consumed by v1-do-doc-08-magnum-driver.md for the Magnum CAPI Helm driver graft.
  3. No remaining state on the bootstrap k3s VM after pivot. capi-mgmt becomes a disposable jump host.

D-017 posture: L3 full teardown and rebuild every deployment cycle. Nothing is preserved across cycles. capi-mgmt is wiped to MAAS Ready on teardown; rebuilt from scratch by this runbook.

Scope: v1 testcloud. Roosevelt deltas in §20.

Out of scope:

  • Magnum-side configuration (v1-do-doc-08)
  • Workload cluster's tenant lifecycle (Magnum's job, not this runbook's)
  • Backup / DR for the workload cluster (Roosevelt concern)

2. Decisions captured

Per workstream 3b sign-off (2026-05-22):

| Decision | Choice | Roosevelt parallel |
|---|---|---|
| Version pinning | Pin-at-execution with discovery in §4 | Same pattern; pins captured in deploy record |
| Cloud TLS trust | Ship Vault CA to capi-mgmt + workload nodes (no tls-insecure) | Image-baked CA; CK8sConfig redundancy |
| clusterctl move pivot | Mandatory; workload cluster becomes self-managing | Same |
| K8s flavor | Canonical Kubernetes (CK8s) | Same |
| OpenStack auth | v3applicationcredential | Same |
| Pod CIDR | 10.244.0.0/16 | Same (no conflict with cloud 10.12.0.0/16 or tenant pool 10.20.0.0/16) |
| Service CIDR | 10.96.0.0/12 | Same |
| Workload cluster name | capi-mgmt-cluster | Same |
| Workload node SSH user | ubuntu (MAAS/cloud-init convention) | Same |
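
The no-conflict claim for the Pod and Service CIDRs can be checked mechanically before any cluster is created. A minimal sketch, assuming bash; the helper names (ip_to_int, cidrs_overlap, check_cidr_plan) are illustrative and not part of the repo:

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<<"$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 if the two CIDR blocks overlap, 1 otherwise.
cidrs_overlap() {
  local net1=${1%/*} len1=${1#*/} net2=${2%/*} len2=${2#*/}
  # Two blocks overlap exactly when they agree on the shorter prefix.
  local len=$(( len1 < len2 ? len1 : len2 ))
  local mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$net1") & mask )) -eq $(( $(ip_to_int "$net2") & mask )) ]
}

# Check every pair of the given CIDR blocks; report each overlap found.
check_cidr_plan() {
  local cidrs=("$@") i j rc=0
  for (( i = 0; i < ${#cidrs[@]}; i++ )); do
    for (( j = i + 1; j < ${#cidrs[@]}; j++ )); do
      if cidrs_overlap "${cidrs[i]}" "${cidrs[j]}"; then
        echo "[FAIL] ${cidrs[i]} overlaps ${cidrs[j]}"
        rc=1
      fi
    done
  done
  [ "$rc" -eq 0 ] && echo "[OK] no overlaps"
  return "$rc"
}
```

Usage with the values from the table: check_cidr_plan 10.244.0.0/16 10.96.0.0/12 10.12.0.0/16 10.20.0.0/16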

Naming convention:

  • Keystone project for CAPI: capi-mgmt (in admin_domain)
  • Keystone user for CAPI: capo (CAPO operator)
  • App credential: capo-app-cred
  • Workload image (Glance): noble-amd64 (do NOT duplicate as ubuntu-24.04-capi — Bobcat lesson)
  • Workload flavor: capi-mgmt-node (4 vCPU / 4 GiB / 30 GB) — control plane node sizing

3. Prerequisites

| Prereq | Verification |
|---|---|
| Cloud deployed; all charms active/idle per D-011 | `juju status --color \| grep -v "active.*idle"` returns only the header |
| Vault initialized + unsealed (v1-do-doc-05) | `juju ssh vault/leader -- sudo vault status` shows Sealed=false |
| Vault root CA available on jumphost | `test -f $HOME/vault-init/vault-ca-root.pem && openssl x509 -in $HOME/vault-init/vault-ca-root.pem -noout -subject` |
| Keystone reachable via FQDN | `curl -sf --cacert $HOME/vault-init/vault-ca-root.pem https://keystone.omega.dc0.vr0.cloud.neumatrix.local:5000/v3 \| jq .version.id` returns "v3.14" or current |
| Magnum domain set up (v1-do-doc-06) | `( source $HOME/admin-openrc; openstack domain show magnum -f value -c enabled )` returns True |
| capi-mgmt VM exists in MAAS as Ready | `maas $MAAS_PROFILE machines read \| jq '.[] \| select(.hostname=="capi-mgmt") \| .status_name'` returns "Ready" |
| Admin openrc available | `test -f $HOME/admin-openrc && ( source $HOME/admin-openrc && openstack token issue \| head -3 )` |
| Workspace path under $HOME (snap confinement) | `WORK=$HOME/capi-bootstrap; mkdir -p "$WORK"; cd "$WORK"; pwd` shows a path under $HOME |

Set shell context for the runbook:

export REPO="$HOME/openstack-caracal-ipv4"
export WORK="$HOME/capi-bootstrap"
export VAULT_CA="$HOME/vault-init/vault-ca-root.pem"
export CAPI_MGMT_METAL_IP=10.12.8.21
export CAPI_MGMT_PROVIDER_IP=10.12.4.21
export CLUSTER_NAME=capi-mgmt-cluster
export MAAS_PROFILE=$(maas list 2>/dev/null | awk 'NR==1 {print $1}')

mkdir -p "$WORK"
cd "$WORK"

# Sanity-check setup
echo "REPO=$REPO"
echo "WORK=$WORK"
echo "VAULT_CA=$VAULT_CA"
echo "MAAS_PROFILE=$MAAS_PROFILE"
test -f "$VAULT_CA" && echo "[OK] Vault CA present" || echo "[FAIL] Vault CA missing"
test -n "$MAAS_PROFILE" && echo "[OK] MAAS_PROFILE set" || echo "[FAIL] MAAS_PROFILE empty — run 'maas login' first"

4. Version discovery (set pins)

Versions are pinned at execution time. The discovery procedure is documented inline so that each rebuild's pins are both reproducible and traceable.

GitHub API: authenticated vs unauthenticated. Unauthenticated requests are limited to 60 per hour; authenticated requests get 5,000 per hour. For multiple rebuilds in a day, set a token:

# Optional but recommended — avoids rate-limit headaches during rebuild
export GITHUB_TOKEN=<your-PAT-with-public_repo-read>
# Or skip: one pass makes fewer than 10 API calls, well under the unauthenticated limit

Discover current stable releases:

cd "$WORK"

# Helper: fetch latest stable release tag from a GitHub repo
gh_latest() {
  local repo=$1
  # Build the optional auth header as an array so it survives word splitting
  local auth=()
  [ -n "${GITHUB_TOKEN:-}" ] && auth=(-H "Authorization: Bearer $GITHUB_TOKEN")
  curl -sfL "${auth[@]}" "https://api.github.com/repos/$repo/releases/latest" \
    | jq -r '.tag_name'
}

# Pin captures (one file per pin)
mkdir -p pins
gh_latest "kubernetes-sigs/cluster-api"                | tee pins/CAPI_VERSION
gh_latest "kubernetes-sigs/cluster-api-provider-openstack" | tee pins/CAPO_VERSION
gh_latest "canonical/cluster-api-k8s"                  | tee pins/CK8S_VERSION
gh_latest "cert-manager/cert-manager"                  | tee pins/CERT_MANAGER_VERSION
gh_latest "k-orc/openstack-resource-controller"        | tee pins/ORC_VERSION
gh_latest "k3s-io/k3s"                                 | tee pins/K3S_VERSION
gh_latest "helm/helm"                                  | tee pins/HELM_VERSION

# Load into shell
export CAPI_VERSION=$(cat pins/CAPI_VERSION)
export CAPO_VERSION=$(cat pins/CAPO_VERSION)
export CK8S_VERSION=$(cat pins/CK8S_VERSION)
export CERT_MANAGER_VERSION=$(cat pins/CERT_MANAGER_VERSION)
export ORC_VERSION=$(cat pins/ORC_VERSION)
export K3S_VERSION=$(cat pins/K3S_VERSION)
export HELM_VERSION=$(cat pins/HELM_VERSION)

Discover Kubernetes version supported by the pinned CK8s release:

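The discovery below scans the metadata.yaml published with the CK8s release for Kubernetes version strings and takes the highest one. In the standard clusterctl provider contract that file lists release series and API contract versions rather than Kubernetes versions, so the scan may come back empty; the fallback branch then asks for a manual pin taken from the release notes.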

gh_supported_k8s() {
  local ck8s_ver=$1
  # CK8s release metadata.yaml is typically published as a release asset
  curl -sfL "https://github.com/canonical/cluster-api-k8s/releases/download/${ck8s_ver}/metadata.yaml" 2>/dev/null \
    | grep -oE "v1\.[0-9]+\.[0-9]+" | sort -uV | tail -1
}

KUBERNETES_VERSION=$(gh_supported_k8s "$CK8S_VERSION")

if [ -z "$KUBERNETES_VERSION" ]; then
  echo "[WARN] could not auto-discover k8s version for CK8s $CK8S_VERSION via metadata.yaml"
  echo "        Consult release notes at: https://github.com/canonical/cluster-api-k8s/releases/tag/$CK8S_VERSION"
  echo "        Then set manually: export KUBERNETES_VERSION=v1.X.Y"
  echo "        and record it:     echo \"\$KUBERNETES_VERSION\" > pins/KUBERNETES_VERSION"
  echo "        (Re-run rest of §4 after setting.)"
else
  echo "[OK] Discovered KUBERNETES_VERSION=$KUBERNETES_VERSION for CK8s=$CK8S_VERSION"
  echo "$KUBERNETES_VERSION" > pins/KUBERNETES_VERSION
  export KUBERNETES_VERSION
fi

# Display for the deploy log
echo ""
echo "=== Pinned versions ==="
for f in pins/*_VERSION; do
  printf "%-30s %s\n" "$(basename "$f")" "$(cat "$f")"
done

Sanity check: every value should be a release tag of the form vX.Y.Z (k3s tags carry a build suffix such as +k3s1). A null or empty value means the GitHub API call failed, most likely because of rate limiting. Wait an hour or set $GITHUB_TOKEN, then retry.
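
The sanity check can be automated instead of eyeballed. A minimal sketch, assuming bash and the pins/ layout created above; check_pins is an illustrative name, and the accepted tag pattern (vMAJOR.MINOR.PATCH plus an optional suffix) is an assumption about what the upstream tags look like:

```shell
# Report any pin file whose content does not look like a release tag.
# Accepts vMAJOR.MINOR.PATCH with an optional suffix such as "+k3s1" or "-rc.1".
check_pins() {
  local dir=${1:-pins} f value rc=0
  for f in "$dir"/*_VERSION; do
    [ -e "$f" ] || { echo "[FAIL] no pin files in $dir"; return 1; }
    value=$(cat "$f")
    if [[ $value =~ ^v[0-9]+\.[0-9]+\.[0-9]+([+-][0-9A-Za-z.]+)?$ ]]; then
      echo "[OK]   $(basename "$f")=$value"
    else
      echo "[FAIL] $(basename "$f")='$value' (empty, null, or malformed)"
      rc=1
    fi
  done
  return "$rc"
}
```

Run it as check_pins pins from "$WORK". Like the rest of this runbook it reports [FAIL] and returns non-zero rather than exiting the operator's shell.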

Capture pins to deploy record:

DEPLOY_RECORD=$HOME/deploy-records/$(date +%Y%m%d-%H%M%S)/capi-pins
mkdir -p "$DEPLOY_RECORD"
cp pins/*_VERSION "$DEPLOY_RECORD/"
ls -la "$DEPLOY_RECORD/"

5. MAAS-deploy capi-mgmt

Prerequisite: capi-mgmt MAAS machine is in Ready state (see §3). Network config in MAAS:

  • eth0 on metal fabric, DHCP → 10.12.8.21 (MAAS-pinned static lease)
  • eth1 on provider fabric, static → 10.12.4.21

Deploy Ubuntu 24.04 (Noble):

# Get the capi-mgmt system_id from MAAS
CAPI_MGMT_SYSTEM_ID=$(maas $MAAS_PROFILE machines read \
  | jq -r '.[] | select(.hostname=="capi-mgmt") | .system_id')
echo "capi-mgmt system_id: $CAPI_MGMT_SYSTEM_ID"

# Deploy
maas $MAAS_PROFILE machine deploy "$CAPI_MGMT_SYSTEM_ID" \
  distro_series=noble \
  hwe_kernel=ga-24.04

Poll for Deployed:

DEPLOY_OK=1
for i in $(seq 1 60); do
  STATUS=$(maas $MAAS_PROFILE machine read "$CAPI_MGMT_SYSTEM_ID" | jq -r '.status_name')
  echo "$(date -Is) capi-mgmt status: $STATUS"
  if [ "$STATUS" = "Deployed" ]; then
    echo "[OK] capi-mgmt Deployed"
    DEPLOY_OK=0
    break
  fi
  if [ "$STATUS" = "Failed deployment" ]; then
    echo "[FAIL] capi-mgmt deployment failed — STOP here, investigate via MAAS UI before continuing"
    DEPLOY_OK=2
    break
  fi
  sleep 30
done

if [ "$DEPLOY_OK" -ne 0 ]; then
  echo "[FAIL] poll exited without a clean Deployed state. STATUS=$STATUS. Stop and investigate."
fi

Typical deploy time: 5-8 minutes on this hardware.
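
The same poll-and-report pattern recurs later in this runbook (k3s readiness, cluster Ready). A minimal sketch of a reusable helper, assuming bash; poll_until is an illustrative name, not an existing script in the repo:

```shell
# poll_until ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Returns 0 on success and 1 on timeout; never calls exit.
poll_until() {
  local attempts=$1 interval=$2 i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      echo "[OK] succeeded on attempt $i/$attempts"
      return 0
    fi
    [ "$i" -lt "$attempts" ] && sleep "$interval"
  done
  echo "[FAIL] gave up after $attempts attempts"
  return 1
}

# Hypothetical usage with the MAAS status check from this section:
# is_deployed() {
#   [ "$(maas "$MAAS_PROFILE" machine read "$CAPI_MGMT_SYSTEM_ID" | jq -r '.status_name')" = "Deployed" ]
# }
# poll_until 60 30 is_deployed
```

Unlike the inline loop above, this sketch does not stop early on "Failed deployment"; it simply runs out its attempts, so keep the inline loop where an early stop matters.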

SSH reachability:

# MAAS .maas zone may not resolve from jumphost — use IP directly per handoff lessons
ssh -o StrictHostKeyChecking=accept-new ubuntu@$CAPI_MGMT_METAL_IP -- hostname
# Expect: capi-mgmt

Gotcha: MAAS-deployed Ubuntu uses the ubuntu user, not jessea123. See handoff "recurring technical pitfalls."


6. SSH bootstrap + Vault CA install

On the jumphost, prepare a transport bundle of essentials:

mkdir -p "$WORK/bootstrap-bundle"
cp "$VAULT_CA" "$WORK/bootstrap-bundle/vault-ca.crt"
chmod 644 "$WORK/bootstrap-bundle/vault-ca.crt"

# Bundle pin files so capi-mgmt can read versions
cp -r "$WORK/pins" "$WORK/bootstrap-bundle/"

SCP and install Vault CA on capi-mgmt:

scp -r "$WORK/bootstrap-bundle" ubuntu@$CAPI_MGMT_METAL_IP:/home/ubuntu/

ssh ubuntu@$CAPI_MGMT_METAL_IP <<'EOF'
set -euo pipefail

# Install Vault CA as a system-trusted root
sudo cp /home/ubuntu/bootstrap-bundle/vault-ca.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates 2>&1 | tail -3

# Verify
openssl s_client -connect keystone.omega.dc0.vr0.cloud.neumatrix.local:5000 \
  -CApath /etc/ssl/certs -verify_return_error </dev/null 2>&1 \
  | grep -E "(Verify return code|subject=)" || \
  { echo "TLS chain verify failed against Keystone — investigate before proceeding"; exit 1; }

# Update apt + base utilities
sudo apt-get update -qq
sudo apt-get install -y -qq jq curl yq

# Confirm
which jq curl yq
EOF

Expected:

  • update-ca-certificates reports "1 added"
  • openssl s_client shows Verify return code: 0 (ok) and a Keystone cert whose chain terminates at the Vault CA

Why this matters: Bobcat used tls-insecure=true in cloud.conf which skipped this entire trust path. Our workstream 3b decision (ship Vault CA) means OCCM and CAPO will validate certs against this trust store. If TLS verify fails here, OCCM will crashloop later.

exit 1 inside ssh heredoc: the heredoc body runs on the remote host inside its own bash session. exit 1 there exits the REMOTE session, propagating a non-zero exit back to the local ssh — it does NOT kill the operator's shell.
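
The behavior can be reproduced without a remote host. A minimal sketch, assuming bash, in which a local child bash reading its script from stdin stands in for the ssh session; run_child_script is an illustrative name:

```shell
# Stand-in for "ssh host <<'EOF' ... EOF": a child bash reading its script from stdin.
# The failing step ends the child only; the parent continues and can inspect the status.
run_child_script() {
  bash -s <<'EOF'
set -euo pipefail
echo "step 1 ok"
false            # simulated failure; set -e makes the child exit here
echo "never reached"
EOF
}

rc=0
run_child_script || rc=$?
if [ "$rc" -ne 0 ]; then
  echo "[FAIL] child script exited with $rc; operator shell still alive"
fi
```

With ssh in place of the local bash, the remote script's exit status comes back the same way, so the caller can test it and report without losing the operator's session.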


7. k3s install

On capi-mgmt:

ssh ubuntu@$CAPI_MGMT_METAL_IP "K3S_VERSION=$K3S_VERSION CAPI_MGMT_METAL_IP=$CAPI_MGMT_METAL_IP bash -s" <<'REMOTE_EOF'
set -euo pipefail

# Install k3s with explicit bind/advertise/SAN flags
curl -sfL https://get.k3s.io | \
  INSTALL_K3S_VERSION="$K3S_VERSION" \
  sh -s - server \
    --bind-address="$CAPI_MGMT_METAL_IP" \
    --advertise-address="$CAPI_MGMT_METAL_IP" \
    --node-ip="$CAPI_MGMT_METAL_IP" \
    --tls-san="$CAPI_MGMT_METAL_IP" \
    --tls-san=capi-mgmt.maas \
    --write-kubeconfig-mode=0644 \
    --disable=traefik

# Wait for the node to report Ready (-w keeps "NotReady" from matching)
for i in $(seq 1 30); do
  if sudo kubectl get nodes 2>/dev/null | grep -qw "Ready"; then
    echo "k3s ready"; break
  fi
  echo "Waiting for k3s API... ($i/30)"
  sleep 5
done

sudo kubectl get nodes
sudo kubectl get pods -A
REMOTE_EOF

Gotcha: --bind-address=$IP makes k3s listen ONLY on that IP — not also on 127.0.0.1. The default kubeconfig at /etc/rancher/k3s/k3s.yaml has server: https://127.0.0.1:6443 and will NOT work as-is. Sed-rewrite below.


8. Kubeconfig server-URL rewrite

ssh ubuntu@$CAPI_MGMT_METAL_IP "CAPI_MGMT_METAL_IP=$CAPI_MGMT_METAL_IP bash -s" <<'REMOTE_EOF'
set -euo pipefail

# Copy k3s kubeconfig to ubuntu user; rewrite server URL
mkdir -p /home/ubuntu/.kube
sudo cp /etc/rancher/k3s/k3s.yaml /home/ubuntu/.kube/config
sudo chown ubuntu:ubuntu /home/ubuntu/.kube/config
chmod 600 /home/ubuntu/.kube/config

# Rewrite 127.0.0.1 → metal IP
sed -i "s|server: https://127.0.0.1:6443|server: https://$CAPI_MGMT_METAL_IP:6443|" \
  /home/ubuntu/.kube/config

# Verify rewrite
grep "server:" /home/ubuntu/.kube/config
# Expect: server: https://10.12.8.21:6443

# Confirm kubectl works as ubuntu user (no sudo)
kubectl get nodes
REMOTE_EOF

9. helm + clusterctl install

ssh ubuntu@$CAPI_MGMT_METAL_IP "HELM_VERSION=$HELM_VERSION CAPI_VERSION=$CAPI_VERSION bash -s" <<'REMOTE_EOF'
set -euo pipefail

# helm install (get-helm-3 fetches the version we specify)
cd /tmp
curl -sfL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 \
  | DESIRED_VERSION="$HELM_VERSION" bash
helm version --short

# clusterctl install
CLUSTERCTL_URL="https://github.com/kubernetes-sigs/cluster-api/releases/download/${CAPI_VERSION}/clusterctl-linux-amd64"
sudo curl -sfL "$CLUSTERCTL_URL" -o /usr/local/bin/clusterctl
sudo chmod +x /usr/local/bin/clusterctl
clusterctl version
REMOTE_EOF

10. clusterctl init (CAPI controllers + cert-manager + ORC + CAPO + CK8s)

ssh ubuntu@$CAPI_MGMT_METAL_IP "CK8S_VERSION=$CK8S_VERSION CERT_MANAGER_VERSION=$CERT_MANAGER_VERSION ORC_VERSION=$ORC_VERSION CAPO_VERSION=$CAPO_VERSION CAPI_VERSION=$CAPI_VERSION bash -s" <<'REMOTE_EOF'
set -euo pipefail

# Configure clusterctl with provider URLs
mkdir -p ~/.cluster-api
cat > ~/.cluster-api/clusterctl.yaml <<EOF
providers:
  - name: "canonical-kubernetes"
    url: "https://github.com/canonical/cluster-api-k8s/releases/${CK8S_VERSION}/bootstrap-components.yaml"
    type: "BootstrapProvider"
  - name: "canonical-kubernetes"
    url: "https://github.com/canonical/cluster-api-k8s/releases/${CK8S_VERSION}/control-plane-components.yaml"
    type: "ControlPlaneProvider"
EOF

# Initialize CAPI with explicit versions
clusterctl init \
  --core "cluster-api:${CAPI_VERSION}" \
  --infrastructure "openstack:${CAPO_VERSION}" \
  --bootstrap "canonical-kubernetes:${CK8S_VERSION}" \
  --control-plane "canonical-kubernetes:${CK8S_VERSION}" \
  --cert-manager-version "${CERT_MANAGER_VERSION}"

# Wait for controllers to be Ready
kubectl wait --for=condition=Available --timeout=5m deployment --all -n capi-system
kubectl wait --for=condition=Available --timeout=5m deployment --all -n capi-kubeadm-bootstrap-system 2>/dev/null || true
kubectl wait --for=condition=Available --timeout=5m deployment --all -n capo-system
kubectl wait --for=condition=Available --timeout=5m deployment --all -n cert-manager

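# Install ORC. GitHub release assets are served under /releases/download/<tag>/;
# install.yaml is the manifest name ORC publishes, so confirm it on the pinned release page
kubectl apply -f "https://github.com/k-orc/openstack-resource-controller/releases/download/${ORC_VERSION}/install.yaml"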
kubectl wait --for=condition=Available --timeout=5m deployment --all -n orc-system

# Confirm all controllers
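kubectl get pods -A | grep -v "Running\|Completed" | grep -v NAME || true
# Expected: empty output (all pods Running or Completed). The "|| true" stops the
# empty result (grep exit status 1) from failing the script under set -e / pipefail.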
REMOTE_EOF

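Gotcha: the namespace names (capi-system, capo-system, etc.) are conventions, not guarantees. The capi-kubeadm-bootstrap-system wait above belongs to the kubeadm bootstrap provider, which this runbook does not install, so it is expected to find nothing; the CK8s providers create their own namespaces. If a controller does not land where expected, kubectl get deployment -A lists every deployment and its namespace; diagnose from there.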
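
One way to avoid depending on namespace names is to wait on whatever namespaces actually contain deployments. A minimal sketch, assuming bash and a working kubectl context; wait_all_deployments is an illustrative name:

```shell
# Wait for every Deployment in every namespace that currently has one,
# instead of relying on a fixed list of namespace names.
wait_all_deployments() {
  local timeout=${1:-5m} ns rc=0
  for ns in $(kubectl get deployment -A --no-headers \
                -o custom-columns=NS:.metadata.namespace | sort -u); do
    echo "--- $ns"
    kubectl wait --for=condition=Available --timeout="$timeout" \
      deployment --all -n "$ns" || rc=1
  done
  return "$rc"
}
```

This also waits on kube-system and any other namespace with deployments, which is harmless here. It only sees namespaces that exist when it runs, so call it after clusterctl init and the ORC apply have returned.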


11. Cloud-side prep (Keystone, Nova, Glance)

Back on the jumphost:

source $HOME/admin-openrc

# Inventory existing resources FIRST (Bobcat lesson: don't create duplicates)
echo "=== Existing images ==="
openstack image list -c ID -c Name -f json | jq -r '.[] | "\(.Name)\t\(.ID)"'
echo ""
echo "=== Existing flavors ==="
openstack flavor list -c Name -c ID -c RAM -c VCPUs -c Disk -f json \
  | jq -r '.[] | "\(.Name)\tRAM=\(.RAM)\tCPU=\(.VCPUs)\tDisk=\(.Disk)\tID=\(.ID)"'
echo ""
echo "=== Existing keypairs ==="
openstack keypair list
echo ""
echo "=== Existing projects in admin_domain ==="
openstack project list --domain admin_domain

Create / verify resources:

# Keystone project + user
openstack project show capi-mgmt --domain admin_domain 2>/dev/null \
  || openstack project create capi-mgmt --domain admin_domain --description "CAPI management plane"

openstack user show capo --domain admin_domain 2>/dev/null \
  || openstack user create capo --domain admin_domain --password-prompt --description "CAPO operator"

# Role assignments (CAPO needs member + load-balancer_member at minimum;
# admin works for testcloud — Roosevelt should use least-privilege)
openstack role add --user capo --user-domain admin_domain \
  --project capi-mgmt --project-domain admin_domain \
  member

openstack role add --user capo --user-domain admin_domain \
  --project capi-mgmt --project-domain admin_domain \
  load-balancer_member 2>/dev/null || \
  echo "(load-balancer_member role may not exist if Octavia not deployed yet)"

# Application credential — captured to file under $HOME (snap confinement)
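APP_CRED_FILE=$WORK/capo-app-cred.json
# Authenticate as capo. admin-openrc typically exports the admin OS_PASSWORD, which
# would otherwise be sent alongside --os-username capo and rejected by Keystone.
read -rsp "capo password: " CAPO_PASSWORD; echo
OS_PASSWORD="$CAPO_PASSWORD" \
openstack --os-username capo --os-user-domain-name admin_domain \
          --os-project-name capi-mgmt --os-project-domain-name admin_domain \
  application credential create capo-app-cred \
  --description "CAPO operator app credential" \
  -f json > "$APP_CRED_FILE"
unset CAPO_PASSWORD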
chmod 600 "$APP_CRED_FILE"

# Extract credential ID + secret
export APP_CRED_ID=$(jq -r '.id' "$APP_CRED_FILE")
export APP_CRED_SECRET=$(jq -r '.secret' "$APP_CRED_FILE")
echo "App cred ID: $APP_CRED_ID"

Nova keypair (workload node SSH key):

# Generate fresh keypair locally (do NOT reuse jumphost personal key)
ssh-keygen -t ed25519 -N '' -f "$WORK/capi-workload-key" \
  -C "capi-workload-$(date +%Y%m%d)"
chmod 600 "$WORK/capi-workload-key"

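# Upload public key to Nova as a keypair. Nova keypairs belong to the user that
# creates them, so if CAPO (authenticating as capo) cannot find this key later,
# recreate it with the capo credentials.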
openstack keypair create --public-key "$WORK/capi-workload-key.pub" capi-workload-key
openstack keypair show capi-workload-key

Workload image:

# Inventory check — use noble-amd64 if it exists (Bobcat lesson: do NOT create ubuntu-24.04-capi as a dup)
NOBLE_IMAGE_ID=$(openstack image show noble-amd64 -c id -f value 2>/dev/null || echo "")

if [ -z "$NOBLE_IMAGE_ID" ]; then
  echo "[FAIL] noble-amd64 image not found in Glance. Upload required before proceeding:"
  echo ""
  echo "  wget https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img -O $WORK/noble-server-cloudimg-amd64.img"
  echo "  openstack image create --disk-format qcow2 --container-format bare \\"
  echo "    --public --file $WORK/noble-server-cloudimg-amd64.img noble-amd64"
  echo ""
  echo "  Then re-run this section."
else
  echo "[OK] Using image: noble-amd64 ($NOBLE_IMAGE_ID)"
  export WORKLOAD_IMAGE_ID=$NOBLE_IMAGE_ID
fi

If the image was missing, the rest of §11 cannot complete. Stop here, upload the image, and rerun §11 from the top.

Workload flavor:

openstack flavor show capi-mgmt-node 2>/dev/null \
  || openstack flavor create capi-mgmt-node \
       --vcpus 4 --ram 4096 --disk 30 \
       --description "CAPI workload node (control plane sizing)"

export WORKLOAD_FLAVOR=capi-mgmt-node

12. clouds.yaml + cloud.conf composition (with Vault CA, no tls-insecure)

The workload cluster's OCCM (OpenStack Cloud Controller Manager) and CAPO both need to call OpenStack APIs. Two files:

  • clouds.yaml — CAPO's view of how to reach OpenStack (used at cluster creation time on capi-mgmt)
  • cloud.conf — OCCM's view, injected into the workload cluster's k8s Secret (used continuously by OCCM running in the workload cluster)

Compose clouds.yaml:

cat > "$WORK/clouds.yaml" <<EOF
clouds:
  capi-mgmt:
    region_name: RegionOne
    interface: public
    identity_api_version: 3
    auth_type: v3applicationcredential
    auth:
      auth_url: https://keystone.omega.dc0.vr0.cloud.neumatrix.local:5000/v3
      application_credential_id: $APP_CRED_ID
      application_credential_secret: $APP_CRED_SECRET
    cacert: /usr/local/share/ca-certificates/vault-ca.crt
    verify: true
EOF
chmod 600 "$WORK/clouds.yaml"

# base64-encode for cluster template embedding (no newline wrapping)
base64 -w0 "$WORK/clouds.yaml" > "$WORK/clouds.yaml.b64"

Compose cloud.conf (INI format, NOT YAML):

cat > "$WORK/cloud.conf" <<EOF
[Global]
auth-url=https://keystone.omega.dc0.vr0.cloud.neumatrix.local:5000/v3
application-credential-id=$APP_CRED_ID
application-credential-secret=$APP_CRED_SECRET
region=RegionOne
domain-name=admin_domain
ca-file=/usr/local/share/ca-certificates/vault-ca.crt

[LoadBalancer]
use-octavia=true
EOF
chmod 600 "$WORK/cloud.conf"

base64 -w0 "$WORK/cloud.conf" > "$WORK/cloud.conf.b64"

Critical delta from Bobcat: the ca-file line replaces tls-insecure=true. The path /usr/local/share/ca-certificates/vault-ca.crt exists on capi-mgmt (from §6) AND will be injected into workload nodes via CK8sConfig in §13.
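
A quick guard against the Bobcat setting creeping back in. A minimal sketch, assuming bash and the INI layout shown above; check_cloud_conf_tls is an illustrative name:

```shell
# Return 0 only if the file sets ca-file and does not enable tls-insecure.
check_cloud_conf_tls() {
  local conf=$1
  if grep -Eqi '^[[:space:]]*tls-insecure[[:space:]]*=[[:space:]]*true' "$conf"; then
    echo "[FAIL] $conf enables tls-insecure"
    return 1
  fi
  if ! grep -Eq '^[[:space:]]*ca-file[[:space:]]*=[[:space:]]*[^[:space:]]+' "$conf"; then
    echo "[FAIL] $conf has no ca-file entry"
    return 1
  fi
  echo "[OK] $conf uses ca-file and no tls-insecure"
}
```

Run it as check_cloud_conf_tls "$WORK/cloud.conf" before encoding the file. It checks the file's content only; whether the CA path exists on the nodes is verified separately.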

base64-encode Vault CA for CK8sConfig injection:

base64 -w0 "$VAULT_CA" > "$WORK/vault-ca.crt.b64"
wc -c "$WORK/vault-ca.crt.b64"
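
Because the encoded files are pasted into a Secret, a wrapped or truncated encoding only shows up much later as a TLS failure on the nodes. A minimal round-trip check, assuming bash with GNU base64 and cmp; verify_b64_roundtrip is an illustrative name:

```shell
# Decode the .b64 file and compare it byte-for-byte with the original.
verify_b64_roundtrip() {
  local original=$1 encoded=$2
  # base64 -w0 emits a single line with no trailing newline, so wc -l must be 0.
  if [ "$(wc -l < "$encoded")" -ne 0 ]; then
    echo "[FAIL] $encoded contains line breaks (expected base64 -w0 output)"
    return 1
  fi
  if base64 -d "$encoded" | cmp -s - "$original"; then
    echo "[OK] $encoded decodes to $original"
  else
    echo "[FAIL] $encoded does not decode to $original"
    return 1
  fi
}
```

Usage: verify_b64_roundtrip "$VAULT_CA" "$WORK/vault-ca.crt.b64". The same call works for the clouds.yaml and cloud.conf pairs.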

13. Cluster template rendering (with Vault CA injection)

The cluster template defines: Cluster, OpenStackCluster, CK8sControlPlane, CK8sConfigTemplate (control plane + workers), MachineDeployment, Secrets for clouds.yaml and cloud.conf.

Variables (18 total):

export CLUSTER_NAME=capi-mgmt-cluster
export CLUSTER_NAMESPACE=default
# KUBERNETES_VERSION was discovered in §4; verify it's set
test -n "$KUBERNETES_VERSION" || { echo "[FAIL] KUBERNETES_VERSION not set; rerun §4 discovery"; }
echo "Using KUBERNETES_VERSION=$KUBERNETES_VERSION"
export CONTROL_PLANE_MACHINE_COUNT=1           # 3 for HA on Roosevelt
export WORKER_MACHINE_COUNT=2                  # 3 on Roosevelt
export OPENSTACK_DNS_NAMESERVERS=1.1.1.1,1.0.0.1   # public DNS, per D-019 (Designate deferred to v2)
export OPENSTACK_FAILURE_DOMAIN=nova
export OPENSTACK_EXTERNAL_NETWORK_ID=$(openstack network show ext_net -c id -f value)
export OPENSTACK_IMAGE_NAME=noble-amd64
export OPENSTACK_FLAVOR=capi-mgmt-node
export OPENSTACK_SSH_KEY_NAME=capi-workload-key
export POD_CIDR=10.244.0.0/16
export SERVICE_CIDR=10.96.0.0/12
export CLOUDS_YAML_B64=$(cat "$WORK/clouds.yaml.b64")
export CLOUD_CONF_B64=$(cat "$WORK/cloud.conf.b64")
export VAULT_CA_B64=$(cat "$WORK/vault-ca.crt.b64")
export CLUSTER_DOMAIN=cluster.local
export OPENSTACK_CLOUD=capi-mgmt

# Sanity print
env | grep -E "^(CLUSTER|KUBERNETES|CONTROL_PLANE|WORKER|OPENSTACK|POD|SERVICE|VAULT|CLOUD)" \
  | grep -v "B64\|SECRET\|PASS" | sort

Render the cluster template:

cat > "$WORK/cluster-template.yaml" <<'TEMPLATE_EOF'
apiVersion: v1
kind: Secret
metadata:
  name: ${CLUSTER_NAME}-cloud-config
  namespace: ${CLUSTER_NAMESPACE}
type: Opaque
data:
  clouds.yaml: ${CLOUDS_YAML_B64}
  cloud.conf: ${CLOUD_CONF_B64}
  cacert: ${VAULT_CA_B64}
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: ${CLUSTER_NAME}
  namespace: ${CLUSTER_NAMESPACE}
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
        - ${POD_CIDR}
    services:
      cidrBlocks:
        - ${SERVICE_CIDR}
    serviceDomain: ${CLUSTER_DOMAIN}
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: OpenStackCluster
    name: ${CLUSTER_NAME}
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta2
    kind: CK8sControlPlane
    name: ${CLUSTER_NAME}-control-plane
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OpenStackCluster
metadata:
  name: ${CLUSTER_NAME}
  namespace: ${CLUSTER_NAMESPACE}
spec:
  identityRef:
    name: ${CLUSTER_NAME}-cloud-config
    cloudName: ${OPENSTACK_CLOUD}
  externalNetwork:
    id: ${OPENSTACK_EXTERNAL_NETWORK_ID}
  managedSecurityGroups:
    allowAllInClusterTraffic: true
  apiServerLoadBalancer:
    enabled: true
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta2
kind: CK8sControlPlane
metadata:
  name: ${CLUSTER_NAME}-control-plane
  namespace: ${CLUSTER_NAMESPACE}
spec:
  replicas: ${CONTROL_PLANE_MACHINE_COUNT}
  version: ${KUBERNETES_VERSION}
  machineTemplate:
    infrastructureTemplate:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: OpenStackMachineTemplate
      name: ${CLUSTER_NAME}-control-plane
  spec:
    files:
      - path: /usr/local/share/ca-certificates/vault-ca.crt
        owner: root:root
        permissions: "0644"
        contentFrom:
          secret:
            name: ${CLUSTER_NAME}-cloud-config
            key: cacert
    preRunCommands:
      - update-ca-certificates
    extraKubeAPIServerArgs:
      "--cloud-provider": external
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OpenStackMachineTemplate
metadata:
  name: ${CLUSTER_NAME}-control-plane
  namespace: ${CLUSTER_NAMESPACE}
spec:
  template:
    spec:
      flavor: ${OPENSTACK_FLAVOR}
      image:
        filter:
          name: ${OPENSTACK_IMAGE_NAME}
      sshKeyName: ${OPENSTACK_SSH_KEY_NAME}
      identityRef:
        name: ${CLUSTER_NAME}-cloud-config
        cloudName: ${OPENSTACK_CLOUD}
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: ${CLUSTER_NAME}-md-0
  namespace: ${CLUSTER_NAMESPACE}
spec:
  clusterName: ${CLUSTER_NAME}
  replicas: ${WORKER_MACHINE_COUNT}
  selector:
    matchLabels: {}
  template:
    spec:
      clusterName: ${CLUSTER_NAME}
      version: ${KUBERNETES_VERSION}
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
          kind: CK8sConfigTemplate
          name: ${CLUSTER_NAME}-md-0
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: OpenStackMachineTemplate
        name: ${CLUSTER_NAME}-md-0
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OpenStackMachineTemplate
metadata:
  name: ${CLUSTER_NAME}-md-0
  namespace: ${CLUSTER_NAMESPACE}
spec:
  template:
    spec:
      flavor: ${OPENSTACK_FLAVOR}
      image:
        filter:
          name: ${OPENSTACK_IMAGE_NAME}
      sshKeyName: ${OPENSTACK_SSH_KEY_NAME}
      identityRef:
        name: ${CLUSTER_NAME}-cloud-config
        cloudName: ${OPENSTACK_CLOUD}
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
kind: CK8sConfigTemplate
metadata:
  name: ${CLUSTER_NAME}-md-0
  namespace: ${CLUSTER_NAMESPACE}
spec:
  template:
    spec:
      files:
        - path: /usr/local/share/ca-certificates/vault-ca.crt
          owner: root:root
          permissions: "0644"
          contentFrom:
            secret:
              name: ${CLUSTER_NAME}-cloud-config
              key: cacert
      preRunCommands:
        - update-ca-certificates
TEMPLATE_EOF

# envsubst to render
envsubst < "$WORK/cluster-template.yaml" > "$WORK/cluster-rendered.yaml"

# Validate as YAML
python3 -c "import yaml; list(yaml.safe_load_all(open('$WORK/cluster-rendered.yaml'))); print('YAML OK')"

# Quick visual check — no leftover ${...} markers
grep -n '\${' "$WORK/cluster-rendered.yaml" || echo "No unsubstituted variables — good"

CK8sConfig field name caveat: the exact field names (files, preRunCommands) and the contentFrom.secret schema depend on the CK8s version. If clusterctl init reported schema warnings earlier, or kubectl apply in §14 rejects these fields, consult the CK8s release notes for the pinned $CK8S_VERSION.
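
envsubst replaces any unset variable with an empty string, and the leftover-marker grep cannot see that. A minimal pre-render check, assuming bash; missing_template_vars is an illustrative name. It consults the exported environment, which is what envsubst reads:

```shell
# List every ${VAR} referenced in a template that is unset or empty in the
# exported environment. Prints one name per line; returns 1 if any are missing.
missing_template_vars() {
  local template=$1 name rc=0
  for name in $(grep -oE '\$\{[A-Za-z_][A-Za-z0-9_]*\}' "$template" \
                  | tr -d '${}' | sort -u); do
    if [ -z "$(printenv "$name")" ]; then
      echo "$name"
      rc=1
    fi
  done
  return "$rc"
}
```

Run it as missing_template_vars "$WORK/cluster-template.yaml" before envsubst. Any name it prints would have rendered as an empty value.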


14. Apply + poll-to-Ready

Transfer rendered template to capi-mgmt and apply:

scp "$WORK/cluster-rendered.yaml" ubuntu@$CAPI_MGMT_METAL_IP:/home/ubuntu/cluster.yaml

ssh ubuntu@$CAPI_MGMT_METAL_IP <<'EOF'
set -euo pipefail
kubectl apply -f /home/ubuntu/cluster.yaml
echo "Applied. Waiting for cluster Available status (15-min timeout)..."

for i in $(seq 1 90); do
  STATUS=$(kubectl get cluster capi-mgmt-cluster -o json 2>/dev/null \
    | jq -r '.status.phase // "Unknown"')
  READY=$(kubectl get cluster capi-mgmt-cluster -o json 2>/dev/null \
    | jq -r '.status.conditions[]? | select(.type=="Ready") | .status' \
    | head -1)
  echo "$(date -Is) phase=$STATUS ready=$READY"
  [ "$READY" = "True" ] && { echo "Cluster Ready"; break; }
  sleep 10
done

kubectl get cluster,machines,ck8scontrolplane,machinedeployment -A
EOF

If the poll times out before Ready, typical diagnosis:

ssh ubuntu@$CAPI_MGMT_METAL_IP -- kubectl describe cluster capi-mgmt-cluster
ssh ubuntu@$CAPI_MGMT_METAL_IP -- kubectl get machines -A
ssh ubuntu@$CAPI_MGMT_METAL_IP -- kubectl logs -n capo-system deployment/capo-controller-manager --tail=100

Common causes:

  • OpenStack API unreachable from capi-mgmt → check Vault CA install on capi-mgmt (§6)
  • Image / flavor / network ID wrong in cluster template → re-check the §11 resources and the §13 variables
  • Security group rules block kube-api LB → CAPO usually handles this; check OpenStackCluster status
  • Application credential expired / wrong → re-check $APP_CRED_ID

15. Extract workload kubeconfig

ssh ubuntu@$CAPI_MGMT_METAL_IP -- clusterctl get kubeconfig capi-mgmt-cluster \
  > "$WORK/capi-mgmt-cluster.kubeconfig"
chmod 600 "$WORK/capi-mgmt-cluster.kubeconfig"

# Sanity-check the workload cluster is reachable
kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" get nodes
# Expect: 1 control plane + 2 workers, all Ready

If get nodes times out, the cluster's API LB may not have allocated its external IP yet, or the firewall rules don't permit jumphost → workload API:

# What IP is the cluster's API LB on?
ssh ubuntu@$CAPI_MGMT_METAL_IP -- kubectl get openstackcluster capi-mgmt-cluster \
  -o json | jq '.status.externalNetwork, .status.controlPlaneEndpoint'

# Test reachability
curl -sk --max-time 10 "https://<API-IP>:6443/version" && echo " ← reachable" || echo "API LB unreachable"
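
The <API-IP> placeholder can also be filled from the kubeconfig itself, which already records the endpoint clusterctl wrote. A minimal sketch, assuming bash and awk; kubeconfig_server is an illustrative name:

```shell
# Print the first "server:" URL found in a kubeconfig file.
kubeconfig_server() {
  awk '$1 == "server:" { print $2; exit }' "$1"
}

# Hypothetical usage with the reachability test from this section:
# API_URL=$(kubeconfig_server "$WORK/capi-mgmt-cluster.kubeconfig")
# curl -sk --max-time 10 "$API_URL/version" && echo " <- reachable" || echo "API LB unreachable"
```

The sketch reads only the first cluster entry, which is enough for the single-cluster kubeconfig produced here.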

16. clusterctl init on target (workload cluster)

The workload cluster must have the same CAPI providers installed before move.

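# Run from jumphost using the workload kubeconfig. clusterctl and the §10 provider
# config (~/.cluster-api/clusterctl.yaml) were installed on capi-mgmt only, so
# install both on the jumphost first if they are not already present.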
KUBECONFIG="$WORK/capi-mgmt-cluster.kubeconfig" clusterctl init \
  --core "cluster-api:${CAPI_VERSION}" \
  --infrastructure "openstack:${CAPO_VERSION}" \
  --bootstrap "canonical-kubernetes:${CK8S_VERSION}" \
  --control-plane "canonical-kubernetes:${CK8S_VERSION}" \
  --cert-manager-version "${CERT_MANAGER_VERSION}"

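# ORC into workload cluster too. GitHub release assets are served under
# /releases/download/<tag>/; install.yaml is the manifest name ORC publishes,
# so confirm it on the pinned release page
kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" apply \
  -f "https://github.com/k-orc/openstack-resource-controller/releases/download/${ORC_VERSION}/install.yaml"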

# Wait for everything Available
kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" wait \
  --for=condition=Available --timeout=5m deployment --all -n capi-system
kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" wait \
  --for=condition=Available --timeout=5m deployment --all -n capo-system
kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" wait \
  --for=condition=Available --timeout=5m deployment --all -n cert-manager
kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" wait \
  --for=condition=Available --timeout=5m deployment --all -n orc-system
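The four waits above can also be written as one loop that reports per namespace and never exits the shell, in line with the non-exiting [FAIL] convention used elsewhere in this runbook. A sketch under stated assumptions: `wait_available` and the overridable `KUBECTL` variable are hypothetical names introduced here for illustration.

```shell
# KUBECTL is overridable so the loop can be exercised without a cluster
# (for example KUBECTL=echo prints the commands instead of running them).
KUBECTL=${KUBECTL:-kubectl}

# wait_available KUBECONFIG_PATH NAMESPACE...
# Waits for every Deployment in each namespace to become Available and
# prints one [PASS]/[FAIL] line per namespace without exiting the shell.
wait_available() {
  kcfg=$1
  shift
  for ns in "$@"; do
    if "$KUBECTL" --kubeconfig "$kcfg" wait --for=condition=Available \
         --timeout=5m deployment --all -n "$ns"; then
      echo "[PASS] $ns: all deployments Available"
    else
      echo "[FAIL] $ns: deployments not Available within 5m"
    fi
  done
}

# Usage:
#   wait_available "$WORK/capi-mgmt-cluster.kubeconfig" \
#     capi-system capo-system cert-manager orc-system
```

The namespace list is passed as arguments, so the namespaces of the CK8s bootstrap and control-plane providers can be appended once their names are confirmed with `kubectl get ns`.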

cert-manager double-install caveat: if CK8s already installed cert-manager during workload bootstrap, this second clusterctl init may warn about it or skip the cert-manager install. Compare the existing cert-manager version against $CERT_MANAGER_VERSION; if they differ, version-skew issues may surface post-pivot, so either adjust the pin in §4 or accept the existing version. Roosevelt's standard practice is to install cert-manager via clusterctl init only (no pre-install via CK8s); the same approach is valid here if you want clean version control.
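The version comparison can be scripted. A minimal sketch, assuming cert-manager runs as a Deployment named `cert-manager` in the `cert-manager` namespace, that its image is referenced by tag rather than by digest, and that $CERT_MANAGER_VERSION carries the same `v` prefix as the image tag. `image_tag` and `check_cert_manager_pin` are hypothetical helper names.

```shell
# image_tag IMAGE_REF -> prints whatever follows the last ':' in IMAGE_REF.
# Assumes a tagged reference; a digest reference (@sha256:...) yields the digest.
image_tag() {
  printf '%s\n' "${1##*:}"
}

# check_cert_manager_pin INSTALLED_IMAGE PINNED_VERSION
# Prints a non-exiting [PASS]/[FAIL] report.
check_cert_manager_pin() {
  installed=$(image_tag "$1")
  if [ "$installed" = "$2" ]; then
    echo "[PASS] cert-manager $installed matches pin"
  else
    echo "[FAIL] cert-manager $installed differs from pin $2"
  fi
}

# Usage:
#   img=$(kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" -n cert-manager \
#     get deployment cert-manager \
#     -o jsonpath='{.spec.template.spec.containers[0].image}')
#   check_cert_manager_pin "$img" "$CERT_MANAGER_VERSION"
```

If CK8s placed cert-manager in a different namespace, the `kubectl get` in the usage comment returns nothing and the report shows an empty installed version, which is itself a useful signal.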


17. clusterctl move pivot

Move all CAPI CRs from bootstrap k3s → workload cluster:

# Stage the target kubeconfig on capi-mgmt (where clusterctl move runs)
scp "$WORK/capi-mgmt-cluster.kubeconfig" ubuntu@$CAPI_MGMT_METAL_IP:/home/ubuntu/target.kubeconfig

# Dry-run first to catch issues before commit
ssh ubuntu@$CAPI_MGMT_METAL_IP -- clusterctl move \
  --to-kubeconfig=/home/ubuntu/target.kubeconfig \
  --dry-run

# Inspect dry-run output: list of objects to be moved. Should include:
#   - Cluster, OpenStackCluster, OpenStackClusterTemplate
#   - Secrets (cloud-config)
#   - Machine objects, OpenStackMachineTemplate
#   - CK8sControlPlane, CK8sConfigTemplate
#   - MachineDeployment
# Should NOT include cert-manager state (cert-manager manages its own state
# on each cluster independently)
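Beyond reading the dry-run output, it can help to snapshot the object names on the bootstrap cluster before the move so they can be compared against the workload cluster afterwards. A sketch under stated assumptions: `missing_after_move`, the file names, and the kind list in the usage comment are illustrative choices, not part of this runbook's procedure.

```shell
# missing_after_move BEFORE_FILE AFTER_FILE
# Prints every object name present before the move but absent afterwards.
# Both files hold `kubectl get ... -o name` output, one object per line.
missing_after_move() {
  b=$(mktemp)
  a=$(mktemp)
  sort -u "$1" > "$b"
  sort -u "$2" > "$a"
  comm -23 "$b" "$a"    # lines only in the "before" list
  rm -f "$b" "$a"
}

# Usage:
#   KINDS=cluster,machinedeployment,machine,openstackcluster,openstackmachine
#   ssh ubuntu@$CAPI_MGMT_METAL_IP -- kubectl get "$KINDS" -A -o name \
#     > "$WORK/pre-move.txt"
#   ...run the move, then:
#   kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" get "$KINDS" -A -o name \
#     > "$WORK/post-move.txt"
#   missing_after_move "$WORK/pre-move.txt" "$WORK/post-move.txt"   # expect no output
```

`-o name` omits the namespace, which is acceptable here because this runbook creates a single cluster.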

If dry-run looks correct, execute the move:

ssh ubuntu@$CAPI_MGMT_METAL_IP -- clusterctl move \
  --to-kubeconfig=/home/ubuntu/target.kubeconfig

# Move can take several minutes. Output ends with: "moved successfully"

18. Post-pivot verification

echo "=== Bootstrap k3s (should now be empty of cluster CRs) ==="
ssh ubuntu@$CAPI_MGMT_METAL_IP -- kubectl get cluster -A
# Expect: No resources found (or only a header)

ssh ubuntu@$CAPI_MGMT_METAL_IP -- kubectl get machines -A
# Expect: No resources found

ssh ubuntu@$CAPI_MGMT_METAL_IP -- kubectl get openstackcluster -A
# Expect: No resources found

echo ""
echo "=== Workload cluster (should now own its own cluster CRs) ==="
kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" get cluster -A
# Expect: capi-mgmt-cluster shown, phase=Provisioned

kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" get machines -A
# Expect: 3 machines (1 control-plane + 2 workers), all Running

kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" get openstackcluster -A

echo ""
echo "=== CAPI controllers in workload ==="
kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" get pods -A \
  | grep -E "(capi|capo|orc|cert-manager)" | grep -v "Running\|Completed"
# Expect: empty (all controller pods Running)

echo ""
echo "=== OCCM not crash-looping (CRITICAL — main goal of TLS-verify work) ==="
kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" get pods -n kube-system \
  -l k8s-app=openstack-cloud-controller-manager
# Expect: 1 pod Running, NOT CrashLoopBackOff

kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" logs -n kube-system \
  -l k8s-app=openstack-cloud-controller-manager --tail=50 \
  | grep -iE "(tls|cert|error)" | head -20
# Expect: no TLS/cert errors; OCCM should be healthy

If OCCM crash-loops with "x509: certificate signed by unknown authority", Vault CA distribution failed. Check that (a) /usr/local/share/ca-certificates/vault-ca.crt exists on the workload nodes; (b) update-ca-certificates ran, which shows up as a /etc/ssl/certs/vault-ca.pem symlink and as the Vault CA's PEM block inside /etc/ssl/certs/ca-certificates.crt (the bundle holds PEM blocks only, so the subject cannot be grepped for directly); and (c) the secret reference in CK8sConfigTemplate matches the secret name. To diagnose, SSH into a worker with the jumphost key (ssh -i $WORK/capi-workload-key ubuntu@<worker-IP-via-FIP>).
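Check (b) can be approximated on the node without decoding certificates. The sketch below relies on the system bundle being a concatenation of the installed PEM files, so a CA's first base64 line should appear in it verbatim. That is a heuristic, not a cryptographic comparison, and `ca_in_bundle` is a hypothetical helper name.

```shell
# ca_in_bundle CA_PEM BUNDLE_PEM
# Returns 0 when the first base64 line of CA_PEM appears verbatim in BUNDLE_PEM.
ca_in_bundle() {
  # Skip the BEGIN/END marker lines and keep the first line of the body.
  first_line=$(grep -v -- '-----' "$1" | head -n 1)
  [ -n "$first_line" ] && grep -qF -- "$first_line" "$2"
}

# Usage (run on the workload node):
#   ca_in_bundle /usr/local/share/ca-certificates/vault-ca.crt \
#     /etc/ssl/certs/ca-certificates.crt \
#     && echo "[PASS] Vault CA in system bundle" \
#     || echo "[FAIL] Vault CA missing from system bundle"
```

For a stricter comparison, `openssl x509 -noout -fingerprint` on both the source file and the /etc/ssl/certs symlink would confirm they are the same certificate.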


19. Handoff to v1-do-doc-08

The workload kubeconfig at $WORK/capi-mgmt-cluster.kubeconfig is the input to v1-do-doc-08-magnum-driver.md. Copy it to a stable path:

mkdir -p $HOME/magnum-capi
cp "$WORK/capi-mgmt-cluster.kubeconfig" $HOME/magnum-capi/capi-mgmt-cluster.kubeconfig
chmod 600 $HOME/magnum-capi/capi-mgmt-cluster.kubeconfig
echo "Workload kubeconfig staged at: $HOME/magnum-capi/capi-mgmt-cluster.kubeconfig"

Important (post-pivot semantics): Magnum's kubeconfig_file setting (under [capi_helm] in /etc/magnum/magnum.conf.d/99-capi.conf, per D-007 corrected language) points to the workload cluster, not the bootstrap k3s. With the pivot mandatory, Magnum's CAPI calls flow as follows:

Magnum/leader → workload cluster API → CAPI controllers (running in workload)
→ create new Cluster CRs (tenant Magnum clusters)
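Since a kubeconfig pointing at the wrong cluster fails quietly, it is worth confirming which API server the staged file targets. A minimal sketch: `kubeconfig_server` is a hypothetical helper that reads the first `server:` entry, assuming the single-cluster kubeconfig that clusterctl get kubeconfig produces.

```shell
# kubeconfig_server FILE -> prints the first `server:` URL found in FILE.
kubeconfig_server() {
  awk '$1 == "server:" { print $2; exit }' "$1"
}

# Usage:
#   kubeconfig_server "$HOME/magnum-capi/capi-mgmt-cluster.kubeconfig"
# The URL should be the workload API LB address found in §15. A loopback
# address such as https://127.0.0.1:6443 is what a stock k3s kubeconfig
# contains and would mean the bootstrap file was staged by mistake.
```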

The bootstrap k3s on capi-mgmt is now disposable. For v1 testcloud, leave capi-mgmt running so its k3s can be inspected for diagnostics. Roosevelt may destroy capi-mgmt entirely at this point for cost savings.


20. Acceptance criteria — go/no-go for v1-do-doc-08

  • §4 pins captured to $DEPLOY_RECORD and KUBERNETES_VERSION set
  • §5 capi-mgmt MAAS-deployed (status Deployed)
  • §6 Vault CA installed on capi-mgmt; openssl s_client against Keystone returns Verify return code: 0 (ok)
  • §7-10 k3s + CAPI controllers + ORC all Running
  • §11 cloud-side resources present (project, user, role assignments, app cred, keypair, image, flavor)
  • §13 cluster template renders with no unsubstituted ${...} markers; YAML parses
  • §14 workload cluster Ready
  • §15 workload kubeconfig extracted; kubectl get nodes shows 3 nodes Ready
  • §16 workload cluster has CAPI providers installed
  • §17 clusterctl move reported "moved successfully"
  • §18 bootstrap k3s now empty; workload cluster owns Cluster/Machines/etc.; OCCM Running, not CrashLoopBackOff
  • §19 workload kubeconfig staged at $HOME/magnum-capi/capi-mgmt-cluster.kubeconfig

If all items are checked, proceed to v1-do-doc-08-magnum-driver.md.
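Some of the criteria above lend themselves to a scripted spot-check. The sketch below shows one possible shape: a generic non-exiting reporter plus a file-mode test. `check` and `mode_is_600` are hypothetical helpers, and `stat -c` assumes GNU coreutils, as found on an Ubuntu jumphost.

```shell
# check LABEL COMMAND [ARGS...]
# Runs COMMAND quietly and prints [PASS] or [FAIL] for LABEL; never exits the shell.
check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "[PASS] $label"
  else
    echo "[FAIL] $label"
  fi
}

# mode_is_600 FILE -> succeeds only if FILE is a regular file with permissions 600.
mode_is_600() {
  [ -f "$1" ] && [ "$(stat -c '%a' "$1")" = "600" ]
}

# Usage:
#   KCFG="$HOME/magnum-capi/capi-mgmt-cluster.kubeconfig"
#   check "§19 kubeconfig staged with mode 600" mode_is_600 "$KCFG"
#   check "§15 workload API answers"            kubectl --kubeconfig "$KCFG" get nodes
```

The reporter only covers criteria that reduce to a command's exit status; items such as the clusterctl move result still need the operator to read the output.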


21. Roosevelt deltas (forward-look)

| Aspect | Testcloud (v1) | Roosevelt |
|---|---|---|
| Workload image | Default noble-amd64 from cloud-images.ubuntu.com | Custom image baked with Vault CA pre-installed (no runtime install step) |
| Vault CA distribution | CK8sConfig files: + preRunCommands: (this runbook) | Image-baked + CK8sConfig (defense in depth) |
| App credential lifetime | No expiry set (testcloud) | Short-lived rotating credentials via Vault auth method |
| Workload cluster control plane | 1 node | 3 nodes (HA) |
| Workload cluster workers | 2 nodes | Per-tenant sizing; HPA-driven |
| clusterctl init --cert-manager-version | Pin from §4 | Pin to Vault PKI cert-manager profile (separate Roosevelt prep) |
| capi-mgmt VM lifecycle post-pivot | Kept running for diagnostics | Destroyed (cost savings; pivot makes it disposable) |
| Version pinning record | $HOME/deploy-records/<timestamp>/capi-pins/ | Same pattern, captured in Vault as audit artifact |
| Authentication to GitHub API | Optional PAT | Mandatory PAT (avoid rate-limit during automated rebuilds) |

22. Change log

| Date | Change | Reference |
|---|---|---|
| 2026-05-22 | Original runbook 04a created. Vault CA distribution (no tls-insecure), mandatory clusterctl move pivot, pin-at-execution version model. | Workstream 3b |
| 2026-05-27 | Adapted into v1-do-doc-07. Fixes: $REPO path; $VAULT_CA path; $MAAS_PROFILE set; §4 dynamic KUBERNETES_VERSION discovery; §5 MAAS poll exit converted to non-exiting [FAIL]; §11 noble-amd64 missing branch converted to [FAIL]; cross-references updated to v1-do-doc set. | Batch C drafting |