Status: Second execution document of Batch C. Stands up the CAPI bootstrap cluster on capi-mgmt.maas, creates the workload cluster on the cloud, pivots cluster state into the workload via clusterctl move, and stages the workload kubeconfig for v1-do-doc-08.
Position in sequence: Runs after v1-do-doc-06-magnum-domain.md (Magnum domain setup). Runs before v1-do-doc-08-magnum-driver.md (driver graft consumes the workload kubeconfig produced here).
Replaces: runbooks/04a-capi-bootstrap-cluster.md — same substantive procedure with fixes applied. The old runbook moves to runbooks/deprecated/ as part of this batch's commits.
Fixes applied vs the prior runbook (runbooks/04a-capi-bootstrap-cluster.md):
- $REPO corrected from $HOME/repos/openstack-caracal-ipv4 to $HOME/openstack-caracal-ipv4
- $VAULT_CA corrected from $HOME/vault-pki/root-ca.pem to $HOME/vault-init/vault-ca-root.pem (matches v1-do-doc-05 §10.2 output)
- $MAAS_PROFILE now explicitly set in §3 shell context (prior version referenced it without setting it)
- §4: KUBERNETES_VERSION converted to dynamic pin discovery (was hardcoded v1.31.4 in §13)
- §5: exit 1 on Failed-deployment converted to non-exiting [FAIL] report
- §11: exit 1 on missing noble-amd64 image converted to non-exiting [FAIL] report

Cross-references:
Stand up the CAPI bootstrap cluster on capi-mgmt.maas and pivot cluster state into a self-managing workload cluster. Output:
- The workload cluster (capi-mgmt-cluster), running in tenant VMs on the cloud and self-managing post-pivot.
- The workload kubeconfig, consumed by v1-do-doc-08-magnum-driver.md for the Magnum CAPI Helm driver graft.

D-017 posture: L3 full teardown and rebuild every deployment cycle. Nothing is preserved across cycles. capi-mgmt is wiped to MAAS Ready on teardown and rebuilt from scratch by this runbook.
Scope: v1 testcloud. Roosevelt deltas in §20.
Out of scope:
Per workstream 3b sign-off (2026-05-22):
| Decision | Choice | Roosevelt parallel |
|---|---|---|
| Version pinning | Pin-at-execution with discovery in §4 | Same pattern; pins captured in deploy record |
| Cloud TLS trust | Ship Vault CA to capi-mgmt + workload nodes (no tls-insecure) | Image-baked CA; CK8sConfig redundancy |
| `clusterctl move` pivot | Mandatory; workload cluster becomes self-managing | Same |
| K8s flavor | Canonical Kubernetes (CK8s) | Same |
| OpenStack auth | v3applicationcredential | Same |
| Pod CIDR | 10.244.0.0/16 | Same (no conflict with cloud 10.12.0.0/16 or tenant pool 10.20.0.0/16) |
| Service CIDR | 10.96.0.0/12 | Same |
| Workload cluster name | capi-mgmt-cluster | Same |
| Workload node SSH user | ubuntu (MAAS/cloud-init convention) | Same |
Naming convention:
- Keystone project: capi-mgmt (in admin_domain)
- Keystone user: capo (CAPO operator)
- Application credential: capo-app-cred
- Glance image: noble-amd64 (do NOT duplicate as ubuntu-24.04-capi; Bobcat lesson)
- Nova flavor: capi-mgmt-node (4 vCPU / 4 GiB / 30 GB), control plane node sizing

| Prereq | Verification |
|---|---|
| Cloud deployed; all charms active/idle per D-011 | `juju status --format=oneline \| grep -v "agent:idle, workload:active"` returns nothing |
| Vault initialized + unsealed (v1-do-doc-05) | `juju ssh vault/leader -- sudo vault status` shows Sealed=false |
| Vault root CA available on jumphost | `test -f $HOME/vault-init/vault-ca-root.pem && openssl x509 -in $HOME/vault-init/vault-ca-root.pem -noout -subject` |
| Keystone reachable via FQDN | `curl -sf --cacert $HOME/vault-init/vault-ca-root.pem https://keystone.omega.dc0.vr0.cloud.neumatrix.local:5000/v3 \| jq .version.id` returns "v3.14" or current |
| Magnum domain set up (v1-do-doc-06) | `( source $HOME/admin-openrc; openstack domain show magnum -f value -c enabled )` returns True |
| capi-mgmt VM exists in MAAS as Ready | `maas $MAAS_PROFILE machines read \| jq '.[] \| select(.hostname=="capi-mgmt") \| .status_name'` returns "Ready" ($MAAS_PROFILE is set in §3) |
| Admin openrc available | `test -f $HOME/admin-openrc && ( source $HOME/admin-openrc && openstack token issue \| head -3 )` succeeds |
| Workspace path under $HOME (snap confinement) | `WORK=$HOME/capi-bootstrap; mkdir -p "$WORK"; cd "$WORK"; pwd` shows a path under home |
Set shell context for the runbook:
export REPO="$HOME/openstack-caracal-ipv4"
export WORK="$HOME/capi-bootstrap"
export VAULT_CA="$HOME/vault-init/vault-ca-root.pem"
export CAPI_MGMT_METAL_IP=10.12.8.21
export CAPI_MGMT_PROVIDER_IP=10.12.4.21
export CLUSTER_NAME=capi-mgmt-cluster
export MAAS_PROFILE=$(maas list 2>/dev/null | awk 'NR==1 {print $1}')
mkdir -p "$WORK"
cd "$WORK"
# Sanity-check setup
echo "REPO=$REPO"
echo "WORK=$WORK"
echo "VAULT_CA=$VAULT_CA"
echo "MAAS_PROFILE=$MAAS_PROFILE"
test -f "$VAULT_CA" && echo "[OK] Vault CA present" || echo "[FAIL] Vault CA missing"
test -n "$MAAS_PROFILE" && echo "[OK] MAAS_PROFILE set" || echo "[FAIL] MAAS_PROFILE empty — run 'maas login' first"
Pin-at-execution with discovery procedure documented inline so each rebuild's pins are reproducible AND traceable.
GitHub API rate limits: unauthenticated requests are limited to 60/hr per IP; authenticated requests get 5000/hr. Each run of this section makes 7 API calls, so set a token if you rebuild several times a day:
# Optional but recommended: avoids rate-limit headaches during rebuilds
export GITHUB_TOKEN=<your-PAT-with-public_repo-read>
# Or skip it: a single run stays well under the 60 req/hr unauthenticated limit
Discover current stable releases:
cd "$WORK"
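# Helper: fetch latest stable release tag from a GitHub repo
gh_latest() {
  local repo=$1
  # Build the header as an array so "Authorization: Bearer <token>" stays ONE argument
  # (a plain string would be word-split and curl would treat the token as a URL)
  local auth=()
  [ -n "${GITHUB_TOKEN:-}" ] && auth=(-H "Authorization: Bearer $GITHUB_TOKEN")
  curl -sfL "${auth[@]}" "https://api.github.com/repos/$repo/releases/latest" \
    | jq -r '.tag_name'
}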
# Pin captures (one file per pin)
mkdir -p pins
gh_latest "kubernetes-sigs/cluster-api" | tee pins/CAPI_VERSION
gh_latest "kubernetes-sigs/cluster-api-provider-openstack" | tee pins/CAPO_VERSION
gh_latest "canonical/cluster-api-k8s" | tee pins/CK8S_VERSION
gh_latest "cert-manager/cert-manager" | tee pins/CERT_MANAGER_VERSION
gh_latest "k-orc/openstack-resource-controller" | tee pins/ORC_VERSION
gh_latest "k3s-io/k3s" | tee pins/K3S_VERSION
gh_latest "helm/helm" | tee pins/HELM_VERSION
# Load into shell
export CAPI_VERSION=$(cat pins/CAPI_VERSION)
export CAPO_VERSION=$(cat pins/CAPO_VERSION)
export CK8S_VERSION=$(cat pins/CK8S_VERSION)
export CERT_MANAGER_VERSION=$(cat pins/CERT_MANAGER_VERSION)
export ORC_VERSION=$(cat pins/ORC_VERSION)
export K3S_VERSION=$(cat pins/K3S_VERSION)
export HELM_VERSION=$(cat pins/HELM_VERSION)
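Discover the Kubernetes version supported by the pinned CK8s release:
The helper below assumes the CK8s release publishes a metadata.yaml asset that names the Kubernetes versions it supports, and it takes the latest patch it finds. A standard clusterctl metadata.yaml maps release series to CAPI contracts and may name no Kubernetes version at all. In that case the helper returns empty, and the [WARN] branch tells you to set the pin by hand.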
gh_supported_k8s() {
local ck8s_ver=$1
# CK8s release metadata.yaml is typically published as a release asset
curl -sfL "https://github.com/canonical/cluster-api-k8s/releases/download/${ck8s_ver}/metadata.yaml" 2>/dev/null \
| grep -oE "v1\.[0-9]+\.[0-9]+" | sort -uV | tail -1
}
KUBERNETES_VERSION=$(gh_supported_k8s "$CK8S_VERSION")
if [ -z "$KUBERNETES_VERSION" ]; then
echo "[WARN] could not auto-discover k8s version for CK8s $CK8S_VERSION via metadata.yaml"
echo " Consult release notes at: https://github.com/canonical/cluster-api-k8s/releases/tag/$CK8S_VERSION"
echo "  Then set manually: export KUBERNETES_VERSION=v1.X.Y; echo \$KUBERNETES_VERSION > pins/KUBERNETES_VERSION"
echo "  (Re-run the rest of §4 after setting.)"
else
echo "[OK] Discovered KUBERNETES_VERSION=$KUBERNETES_VERSION for CK8s=$CK8S_VERSION"
echo "$KUBERNETES_VERSION" > pins/KUBERNETES_VERSION
export KUBERNETES_VERSION
fi
# Display for the deploy log
echo ""
echo "=== Pinned versions ==="
for f in pins/*_VERSION; do
printf "%-30s %s\n" "$(basename "$f")" "$(cat "$f")"
done
Sanity check: all values should look like v1.X.Y or v0.X.Y (k3s tags carry a +k3sN suffix, e.g. v1.X.Y+k3s1). If any value is null or empty, the GitHub API call failed, most likely because of rate limiting. Wait an hour, or set $GITHUB_TOKEN and retry.
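The sanity check above can be scripted so that a bad pin is caught before anything is installed. The following is a minimal sketch that assumes the pins/ layout from this section. `check_pins` is a hypothetical helper. The accepted tag shape (vMAJOR.MINOR.PATCH plus an optional suffix such as k3s's +k3s1) is an assumption about each project's tagging, not something the projects guarantee.

```shell
# check_pins DIR: hypothetical helper. Prints [OK]/[FAIL] per *_VERSION file
# and returns 1 if any pin is empty, "null", or not tag-shaped. It reports
# instead of exiting, matching the runbook's non-exiting [FAIL] convention.
check_pins() {
  local dir=${1:-pins} bad=0 found=0 f val
  # Assumed tag shape: v1.2.3 plus an optional suffix such as -rc.1 or +k3s1
  local re='^v[0-9]+\.[0-9]+\.[0-9]+([-+][0-9A-Za-z.+-]+)?$'
  for f in "$dir"/*_VERSION; do
    [ -e "$f" ] || continue   # glob matched nothing
    found=1
    val=$(cat "$f")
    if printf '%s\n' "$val" | grep -Eq "$re"; then
      printf '[OK]   %-25s %s\n' "$(basename "$f")" "$val"
    else
      printf '[FAIL] %-25s "%s"\n' "$(basename "$f")" "$val"
      bad=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "[FAIL] no *_VERSION files in $dir"
    return 1
  fi
  return "$bad"
}

# Usage after the export block: check_pins pins || echo "fix pins before continuing"
```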
Capture pins to deploy record:
DEPLOY_RECORD=$HOME/deploy-records/$(date +%Y%m%d-%H%M%S)/capi-pins
mkdir -p "$DEPLOY_RECORD"
cp pins/*_VERSION "$DEPLOY_RECORD/"
ls -la "$DEPLOY_RECORD/"
Prerequisite: capi-mgmt MAAS machine is in Ready state (see §3). Network config in MAAS:
- Metal network ($CAPI_MGMT_METAL_IP): 10.12.8.21 (MAAS-pinned static lease)
- Provider network ($CAPI_MGMT_PROVIDER_IP): 10.12.4.21

Deploy Ubuntu 24.04 (Noble):
# Get the capi-mgmt system_id from MAAS
CAPI_MGMT_SYSTEM_ID=$(maas $MAAS_PROFILE machines read \
  | jq -r '.[] | select(.hostname=="capi-mgmt") | .system_id')
echo "capi-mgmt system_id: $CAPI_MGMT_SYSTEM_ID"
# Deploy
maas $MAAS_PROFILE machine deploy "$CAPI_MGMT_SYSTEM_ID" \
  distro_series=noble \
  hwe_kernel=ga-24.04
Poll for Deployed:
DEPLOY_OK=1
for i in $(seq 1 60); do
STATUS=$(maas $MAAS_PROFILE machine read "$CAPI_MGMT_SYSTEM_ID" | jq -r '.status_name')
echo "$(date -Is) capi-mgmt status: $STATUS"
if [ "$STATUS" = "Deployed" ]; then
echo "[OK] capi-mgmt Deployed"
DEPLOY_OK=0
break
fi
if [ "$STATUS" = "Failed deployment" ]; then
echo "[FAIL] capi-mgmt deployment failed — STOP here, investigate via MAAS UI before continuing"
DEPLOY_OK=2
break
fi
sleep 30
done
if [ "$DEPLOY_OK" -ne 0 ]; then
echo "[FAIL] poll exited without a clean Deployed state. STATUS=$STATUS. Stop and investigate."
fi
Typical deploy time: 5-8 minutes on this hardware.
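The poll above follows the runbook's non-exiting [FAIL] pattern. The same shape recurs later (k3s readiness, cluster Ready), so a reusable helper is one option. This is a sketch only. `poll_until` and `maas_deployed` are hypothetical names and are not part of MAAS or any other tool.

```shell
# poll_until TRIES INTERVAL CMD [ARGS...]: hypothetical helper. Runs CMD up to
# TRIES times, sleeping INTERVAL seconds between attempts. Returns 0 on the
# first success; on exhaustion prints a non-exiting [FAIL] and returns 1.
poll_until() {
  local tries=$1 interval=$2 i=1
  shift 2
  while [ "$i" -le "$tries" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$tries" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  echo "[FAIL] '$*' did not succeed after $tries attempts"
  return 1
}

# Hypothetical predicate for the MAAS poll above (not run here):
# maas_deployed() {
#   [ "$(maas "$MAAS_PROFILE" machine read "$CAPI_MGMT_SYSTEM_ID" | jq -r '.status_name')" = "Deployed" ]
# }
# poll_until 60 30 maas_deployed || echo "Stop and investigate via MAAS UI."
```

Unlike the inline loop, this sketch cannot stop early on `Failed deployment`. The predicate would have to signal that case separately, for example by recording the last status in a variable that the caller inspects.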
SSH reachability:
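# MAAS .maas zone may not resolve from jumphost; use the IP directly per handoff lessons
# capi-mgmt is redeployed every cycle (D-017), so its host key changes and
# accept-new refuses a changed key: drop any stale known_hosts entry first
ssh-keygen -R "$CAPI_MGMT_METAL_IP" 2>/dev/null || true
ssh -o StrictHostKeyChecking=accept-new ubuntu@$CAPI_MGMT_METAL_IP -- hostname
# Expect: capi-mgmt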
Gotcha: MAAS-deployed Ubuntu uses the `ubuntu` user, not `jessea123`. See handoff "recurring technical pitfalls."
On the jumphost, prepare a transport bundle of essentials:
mkdir -p "$WORK/bootstrap-bundle"
cp "$VAULT_CA" "$WORK/bootstrap-bundle/vault-ca.crt"
chmod 644 "$WORK/bootstrap-bundle/vault-ca.crt"
# Bundle pin files so capi-mgmt can read versions
cp -r "$WORK/pins" "$WORK/bootstrap-bundle/"
SCP and install Vault CA on capi-mgmt:
scp -r "$WORK/bootstrap-bundle" ubuntu@$CAPI_MGMT_METAL_IP:/home/ubuntu/
ssh ubuntu@$CAPI_MGMT_METAL_IP <<'EOF'
set -euo pipefail
# Install Vault CA as a system-trusted root
sudo cp /home/ubuntu/bootstrap-bundle/vault-ca.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates 2>&1 | tail -3
# Verify
openssl s_client -connect keystone.omega.dc0.vr0.cloud.neumatrix.local:5000 \
-CApath /etc/ssl/certs -verify_return_error </dev/null 2>&1 \
| grep -E "(Verify return code|subject=)" || \
{ echo "TLS chain verify failed against Keystone — investigate before proceeding"; exit 1; }
# Update apt + base utilities
sudo apt-get update -qq
sudo apt-get install -y -qq jq curl yq
# Confirm
which jq curl yq
EOF
Expected:
- `update-ca-certificates` reports "1 added"
- `openssl s_client` shows `Verify return code: 0 (ok)` and a Keystone cert whose chain terminates at the Vault CA

Why this matters: Bobcat used `tls-insecure=true` in cloud.conf, which skipped this entire trust path. Our workstream 3b decision (ship Vault CA) means OCCM and CAPO will validate certs against this trust store. If TLS verify fails here, OCCM will crashloop later.
`exit 1` inside the ssh heredoc: the heredoc body runs on the remote host in its own bash session. `exit 1` there exits the REMOTE session and propagates a non-zero exit status back to the local ssh. It does NOT kill the operator's shell.
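You can check this behavior locally without a remote host. The sketch below uses a local `bash -s` in place of `ssh host bash -s`. It models only the session boundary; ssh additionally relays the remote exit status back as its own, which is why the caller sees 1.

```shell
# Stand-in for: ssh ubuntu@$CAPI_MGMT_METAL_IP bash -s <<'EOF' ... EOF
# The heredoc body runs in a child bash process; `exit 1` ends only that child.
rc=0
bash -s <<'EOF' || rc=$?
set -euo pipefail
echo "inner session: about to exit 1"
exit 1
EOF
echo "outer shell still running; inner session exited with rc=$rc"
```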
On capi-mgmt:
ssh ubuntu@$CAPI_MGMT_METAL_IP "K3S_VERSION=$K3S_VERSION CAPI_MGMT_METAL_IP=$CAPI_MGMT_METAL_IP bash -s" <<'REMOTE_EOF'
set -euo pipefail
# Install k3s with explicit bind/advertise/SAN flags
curl -sfL https://get.k3s.io | \
INSTALL_K3S_VERSION="$K3S_VERSION" \
sh -s - server \
--bind-address="$CAPI_MGMT_METAL_IP" \
--advertise-address="$CAPI_MGMT_METAL_IP" \
--node-ip="$CAPI_MGMT_METAL_IP" \
--tls-san="$CAPI_MGMT_METAL_IP" \
--tls-san=capi-mgmt.maas \
--write-kubeconfig-mode=0644 \
--disable=traefik
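# Wait for k3s API to respond. --bind-address leaves nothing listening on 127.0.0.1
# (see gotcha below), so point kubectl at the metal IP; grep -w keeps "NotReady" from matching.
for i in $(seq 1 30); do
  if sudo kubectl --server "https://$CAPI_MGMT_METAL_IP:6443" get nodes 2>/dev/null | grep -qw "Ready"; then
    echo "k3s ready"; break
  fi
  echo "Waiting for k3s API... ($i/30)"
  sleep 5
done
sudo kubectl --server "https://$CAPI_MGMT_METAL_IP:6443" get nodes
sudo kubectl --server "https://$CAPI_MGMT_METAL_IP:6443" get pods -A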
REMOTE_EOF
Gotcha: `--bind-address=$IP` makes k3s listen ONLY on that IP, not also on 127.0.0.1. The default kubeconfig at `/etc/rancher/k3s/k3s.yaml` has `server: https://127.0.0.1:6443` and will NOT work as-is. Sed-rewrite below.
ssh ubuntu@$CAPI_MGMT_METAL_IP "CAPI_MGMT_METAL_IP=$CAPI_MGMT_METAL_IP bash -s" <<'REMOTE_EOF'
set -euo pipefail
# Copy k3s kubeconfig to ubuntu user; rewrite server URL
mkdir -p /home/ubuntu/.kube
sudo cp /etc/rancher/k3s/k3s.yaml /home/ubuntu/.kube/config
sudo chown ubuntu:ubuntu /home/ubuntu/.kube/config
chmod 600 /home/ubuntu/.kube/config
# Rewrite 127.0.0.1 → metal IP
sed -i "s|server: https://127.0.0.1:6443|server: https://$CAPI_MGMT_METAL_IP:6443|" \
  /home/ubuntu/.kube/config
# Verify rewrite
grep "server:" /home/ubuntu/.kube/config
# Expect: server: https://10.12.8.21:6443
# Confirm kubectl works as ubuntu user (no sudo)
kubectl get nodes
REMOTE_EOF
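The sed matches only the literal loopback URL. Running the rewrite a second time, or against a kubeconfig that already carries the metal IP, therefore changes nothing, and the follow-up grep still passes. The sketch below wraps that step as a hypothetical helper so it can be exercised against a sample file instead of the live k3s config.

```shell
# rewrite_k3s_server FILE IP: hypothetical helper wrapping the sed above.
# Returns 0 only if FILE ends up pointing at https://IP:6443.
rewrite_k3s_server() {
  local file=$1 ip=$2
  # Only the exact loopback URL is replaced, so repeat runs are no-ops
  sed -i "s|server: https://127.0.0.1:6443|server: https://$ip:6443|" "$file"
  grep -q "server: https://$ip:6443" "$file"
}

# Exercise it on a throwaway kubeconfig fragment
sample=$(mktemp)
printf '%s\n' 'clusters:' '- cluster:' '    server: https://127.0.0.1:6443' '  name: default' > "$sample"
rewrite_k3s_server "$sample" 10.12.8.21 && echo "first run: rewritten"
rewrite_k3s_server "$sample" 10.12.8.21 && echo "second run: still correct"
grep 'server:' "$sample"
```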
ssh ubuntu@$CAPI_MGMT_METAL_IP "HELM_VERSION=$HELM_VERSION CAPI_VERSION=$CAPI_VERSION bash -s" <<'REMOTE_EOF'
set -euo pipefail
# helm install (get-helm-3 fetches the version we specify)
cd /tmp
curl -sfL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 \
| DESIRED_VERSION="$HELM_VERSION" bash
helm version --short
# clusterctl install
CLUSTERCTL_URL="https://github.com/kubernetes-sigs/cluster-api/releases/download/${CAPI_VERSION}/clusterctl-linux-amd64"
sudo curl -sfL "$CLUSTERCTL_URL" -o /usr/local/bin/clusterctl
sudo chmod +x /usr/local/bin/clusterctl
clusterctl version
REMOTE_EOF
ssh ubuntu@$CAPI_MGMT_METAL_IP "CK8S_VERSION=$CK8S_VERSION CERT_MANAGER_VERSION=$CERT_MANAGER_VERSION ORC_VERSION=$ORC_VERSION CAPO_VERSION=$CAPO_VERSION CAPI_VERSION=$CAPI_VERSION bash -s" <<'REMOTE_EOF'
set -euo pipefail
# Configure clusterctl with provider URLs
mkdir -p ~/.cluster-api
cat > ~/.cluster-api/clusterctl.yaml <<EOF
providers:
- name: "canonical-kubernetes"
url: "https://github.com/canonical/cluster-api-k8s/releases/${CK8S_VERSION}/bootstrap-components.yaml"
type: "BootstrapProvider"
- name: "canonical-kubernetes"
url: "https://github.com/canonical/cluster-api-k8s/releases/${CK8S_VERSION}/control-plane-components.yaml"
type: "ControlPlaneProvider"
EOF
# Initialize CAPI with explicit versions
clusterctl init \
--core "cluster-api:${CAPI_VERSION}" \
--infrastructure "openstack:${CAPO_VERSION}" \
--bootstrap "canonical-kubernetes:${CK8S_VERSION}" \
--control-plane "canonical-kubernetes:${CK8S_VERSION}" \
--cert-manager-version "${CERT_MANAGER_VERSION}"
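# Install ORC before waiting on CAPO: recent CAPO releases depend on ORC's CRDs
# and may not become Available without them. GitHub release assets live under
# /releases/download/<tag>/; install.yaml is assumed as the asset name
# (confirm on the ORC release page for the pinned $ORC_VERSION).
kubectl apply -f "https://github.com/k-orc/openstack-resource-controller/releases/download/${ORC_VERSION}/install.yaml"
# Wait for controllers to be Ready
kubectl wait --for=condition=Available --timeout=5m deployment --all -n capi-system
kubectl wait --for=condition=Available --timeout=5m deployment --all -n capi-kubeadm-bootstrap-system 2>/dev/null || true
kubectl wait --for=condition=Available --timeout=5m deployment --all -n capo-system
kubectl wait --for=condition=Available --timeout=5m deployment --all -n cert-manager
kubectl wait --for=condition=Available --timeout=5m deployment --all -n orc-system
# Confirm all controllers (|| true: no output from grep is the success case,
# and under pipefail an empty grep would otherwise fail the session)
kubectl get pods -A | grep -v "Running\|Completed" | grep -v NAME || true
# Expected: empty output (all pods Running or Completed)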
REMOTE_EOF
Gotcha: the actual namespace names (`capi-system`, `capo-system`, etc.) are conventions. If a controller fails to land in the expected namespace, `kubectl get deployment -A` lists all deployments; diagnose from there.
Back on the jumphost:
source $HOME/admin-openrc
# Inventory existing resources FIRST (Bobcat lesson: don't create duplicates)
echo "=== Existing images ==="
openstack image list -c ID -c Name -f json | jq -r '.[] | "\(.Name)\t\(.ID)"'
echo ""
echo "=== Existing flavors ==="
openstack flavor list -c Name -c ID -c RAM -c VCPUs -c Disk -f json \
  | jq -r '.[] | "\(.Name)\tRAM=\(.RAM)\tCPU=\(.VCPUs)\tDisk=\(.Disk)\tID=\(.ID)"'
echo ""
echo "=== Existing keypairs ==="
openstack keypair list
echo ""
echo "=== Existing projects in admin_domain ==="
openstack project list --domain admin_domain
Create / verify resources:
# Keystone project + user
openstack project show capi-mgmt --domain admin_domain 2>/dev/null \
|| openstack project create capi-mgmt --domain admin_domain --description "CAPI management plane"
openstack user show capo --domain admin_domain 2>/dev/null \
|| openstack user create capo --domain admin_domain --password-prompt --description "CAPO operator"
# Role assignments (CAPO needs member + load-balancer_member at minimum;
# admin works for testcloud — Roosevelt should use least-privilege)
openstack role add --user capo --user-domain admin_domain \
--project capi-mgmt --project-domain admin_domain \
member
openstack role add --user capo --user-domain admin_domain \
--project capi-mgmt --project-domain admin_domain \
load-balancer_member 2>/dev/null || \
echo "(load-balancer_member role may not exist if Octavia not deployed yet)"
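# Application credential: created AS capo (Keystone app credentials belong to the
# authenticating user). admin-openrc's OS_PASSWORD is admin's, so prompt for capo's.
read -rsp "capo password: " CAPO_PASSWORD; echo
# Captured to a file under $HOME (snap confinement)
APP_CRED_FILE=$WORK/capo-app-cred.json
openstack --os-username capo --os-user-domain-name admin_domain \
  --os-password "$CAPO_PASSWORD" \
  --os-project-name capi-mgmt --os-project-domain-name admin_domain \
  application credential create capo-app-cred \
  --description "CAPO operator app credential" \
  -f json > "$APP_CRED_FILE"
chmod 600 "$APP_CRED_FILE"
unset CAPO_PASSWORD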
# Extract credential ID + secret
export APP_CRED_ID=$(jq -r '.id' "$APP_CRED_FILE")
export APP_CRED_SECRET=$(jq -r '.secret' "$APP_CRED_FILE")
echo "App cred ID: $APP_CRED_ID"
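`jq -r` prints the literal string `null` when a key is missing. A failed extraction therefore looks like a four-character credential and only surfaces later as an authentication failure. The sketch below is a small guard, written as a hypothetical helper that reports without exiting and never echoes the secret value.

```shell
# require_cred NAME VALUE: hypothetical guard. Treats "" and "null" (what
# `jq -r` prints for a missing key) as failures; never prints VALUE itself.
require_cred() {
  case $2 in
    ""|null)
      echo "[FAIL] $1 is empty or null; inspect ${APP_CRED_FILE:-the app credential JSON}"
      return 1 ;;
    *)
      echo "[OK] $1 captured (${#2} chars)" ;;
  esac
}

# Usage after the extraction above:
# require_cred APP_CRED_ID "$APP_CRED_ID"
# require_cred APP_CRED_SECRET "$APP_CRED_SECRET"
```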
Nova keypair (workload node SSH key):
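# Generate fresh keypair locally (do NOT reuse jumphost personal key)
ssh-keygen -t ed25519 -N '' -f "$WORK/capi-workload-key" \
  -C "capi-workload-$(date +%Y%m%d)"
chmod 600 "$WORK/capi-workload-key"
# Upload the public key as a Nova keypair. Nova keypairs are per-user and CAPO
# authenticates as capo (via the app credential), so create it under capo, not admin
openstack --os-compute-api-version 2.10 keypair create \
  --public-key "$WORK/capi-workload-key.pub" \
  --user capo --user-domain admin_domain capi-workload-key
openstack --os-compute-api-version 2.10 keypair show \
  --user capo --user-domain admin_domain capi-workload-key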
Workload image:
# Inventory check: use noble-amd64 if it exists (Bobcat lesson: do NOT create ubuntu-24.04-capi as a dup)
NOBLE_IMAGE_ID=$(openstack image show noble-amd64 -c id -f value 2>/dev/null || echo "")
if [ -z "$NOBLE_IMAGE_ID" ]; then
  echo "[FAIL] noble-amd64 image not found in Glance. Upload required before proceeding:"
  echo ""
  echo "  wget https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img -O $WORK/noble-server-cloudimg-amd64.img"
  echo "  openstack image create --disk-format qcow2 --container-format bare \\"
  echo "    --public --file $WORK/noble-server-cloudimg-amd64.img noble-amd64"
  echo ""
  echo "  Then re-run this section."
else
  echo "[OK] Using image: noble-amd64 ($NOBLE_IMAGE_ID)"
  export WORKLOAD_IMAGE_ID=$NOBLE_IMAGE_ID
fi
If the image was missing, the rest of §11 cannot complete. Stop here, upload the image, and rerun §11 from the top.
Workload flavor:
openstack flavor show capi-mgmt-node 2>/dev/null \
|| openstack flavor create capi-mgmt-node \
--vcpus 4 --ram 4096 --disk 30 \
--description "CAPI workload node (control plane sizing)"
export WORKLOAD_FLAVOR=capi-mgmt-node
The workload cluster's OCCM (OpenStack Cloud Controller Manager) and CAPO both need to call OpenStack APIs. Two files:
- clouds.yaml: CAPO's view of how to reach OpenStack (used at cluster creation time on capi-mgmt)
- cloud.conf: OCCM's view, injected into the workload cluster's k8s Secret (used continuously by OCCM running in the workload cluster)

Compose clouds.yaml:
cat > "$WORK/clouds.yaml" <<EOF
clouds:
capi-mgmt:
region_name: RegionOne
interface: public
identity_api_version: 3
auth_type: v3applicationcredential
auth:
auth_url: https://keystone.omega.dc0.vr0.cloud.neumatrix.local:5000/v3
application_credential_id: $APP_CRED_ID
application_credential_secret: $APP_CRED_SECRET
cacert: /usr/local/share/ca-certificates/vault-ca.crt
verify: true
EOF
chmod 600 "$WORK/clouds.yaml"
# base64-encode for cluster template embedding (no newline wrapping)
base64 -w0 "$WORK/clouds.yaml" > "$WORK/clouds.yaml.b64"
Compose cloud.conf (INI format, NOT YAML):
cat > "$WORK/cloud.conf" <<EOF
[Global]
auth-url=https://keystone.omega.dc0.vr0.cloud.neumatrix.local:5000/v3
application-credential-id=$APP_CRED_ID
application-credential-secret=$APP_CRED_SECRET
region=RegionOne
domain-name=admin_domain
ca-file=/usr/local/share/ca-certificates/vault-ca.crt
[LoadBalancer]
use-octavia=true
EOF
chmod 600 "$WORK/cloud.conf"
base64 -w0 "$WORK/cloud.conf" > "$WORK/cloud.conf.b64"
Critical delta from Bobcat: the `ca-file` line replaces `tls-insecure=true`. The path `/usr/local/share/ca-certificates/vault-ca.crt` exists on capi-mgmt (from §6) AND will be injected into workload nodes via CK8sConfig in §13.
base64-encode Vault CA for CK8sConfig injection:
base64 -w0 "$VAULT_CA" > "$WORK/vault-ca.crt.b64"
wc -c "$WORK/vault-ca.crt.b64"
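envsubst splices all three .b64 files into the Secret's `data:` fields. Each one must therefore stay on one line and decode back to its source byte-for-byte. A wrapped value breaks the rendered YAML. A stale or mismatched value yields a Secret that applies cleanly but carries the wrong content. The sketch below performs that check as a hypothetical helper and assumes GNU base64 and cmp, as on the Ubuntu jumphost.

```shell
# check_b64 SRC B64: hypothetical helper. Confirms B64 holds a single line
# (base64 -w0 output has no line breaks) and decodes back to SRC byte-for-byte.
check_b64() {
  local src=$1 b64=$2
  # awk counts records; more than one means the encoding was wrapped
  if ! awk 'END { exit (NR > 1) }' "$b64"; then
    echo "[FAIL] $b64 spans multiple lines (was -w0 used?)"
    return 1
  fi
  if base64 -d < "$b64" | cmp -s - "$src"; then
    echo "[OK] $b64 round-trips to $src"
  else
    echo "[FAIL] $b64 does not decode to $src"
    return 1
  fi
}

# Usage after the three encodings above:
# check_b64 "$WORK/clouds.yaml" "$WORK/clouds.yaml.b64"
# check_b64 "$WORK/cloud.conf"  "$WORK/cloud.conf.b64"
# check_b64 "$VAULT_CA"         "$WORK/vault-ca.crt.b64"
```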
The cluster template defines: Cluster, OpenStackCluster, CK8sControlPlane, CK8sConfigTemplate (control plane + workers), MachineDeployment, Secrets for clouds.yaml and cloud.conf.
Variables (18 total):
export CLUSTER_NAME=capi-mgmt-cluster
export CLUSTER_NAMESPACE=default
# KUBERNETES_VERSION was discovered in §4; verify it's set
test -n "$KUBERNETES_VERSION" || { echo "[FAIL] KUBERNETES_VERSION not set; rerun §4 discovery"; }
echo "Using KUBERNETES_VERSION=$KUBERNETES_VERSION"
export CONTROL_PLANE_MACHINE_COUNT=1 # 3 for HA on Roosevelt
export WORKER_MACHINE_COUNT=2 # 3 on Roosevelt
export OPENSTACK_DNS_NAMESERVERS=1.1.1.1,1.0.0.1 # public DNS, per D-019 (Designate deferred to v2); not referenced by the template below
export OPENSTACK_FAILURE_DOMAIN=nova # not referenced by the template below
export OPENSTACK_EXTERNAL_NETWORK_ID=$(openstack network show ext_net -c id -f value)
export OPENSTACK_IMAGE_NAME=noble-amd64
export OPENSTACK_FLAVOR=capi-mgmt-node
export OPENSTACK_SSH_KEY_NAME=capi-workload-key
export POD_CIDR=10.244.0.0/16
export SERVICE_CIDR=10.96.0.0/12
export CLOUDS_YAML_B64=$(cat "$WORK/clouds.yaml.b64")
export CLOUD_CONF_B64=$(cat "$WORK/cloud.conf.b64")
export VAULT_CA_B64=$(cat "$WORK/vault-ca.crt.b64")
export CLUSTER_DOMAIN=cluster.local
export OPENSTACK_CLOUD=capi-mgmt
# Sanity print
env | grep -E "^(CLUSTER|KUBERNETES|CONTROL_PLANE|WORKER|OPENSTACK|POD|SERVICE|VAULT|CLOUD)" \
| grep -v "B64\|SECRET\|PASS" | sort
Render the cluster template:
cat > "$WORK/cluster-template.yaml" <<'TEMPLATE_EOF'
apiVersion: v1
kind: Secret
metadata:
name: ${CLUSTER_NAME}-cloud-config
namespace: ${CLUSTER_NAMESPACE}
type: Opaque
data:
clouds.yaml: ${CLOUDS_YAML_B64}
cloud.conf: ${CLOUD_CONF_B64}
cacert: ${VAULT_CA_B64}
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
name: ${CLUSTER_NAME}
namespace: ${CLUSTER_NAMESPACE}
spec:
clusterNetwork:
pods:
cidrBlocks:
- ${POD_CIDR}
services:
cidrBlocks:
- ${SERVICE_CIDR}
serviceDomain: ${CLUSTER_DOMAIN}
infrastructureRef:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OpenStackCluster
name: ${CLUSTER_NAME}
controlPlaneRef:
apiVersion: controlplane.cluster.x-k8s.io/v1beta2
kind: CK8sControlPlane
name: ${CLUSTER_NAME}-control-plane
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OpenStackCluster
metadata:
name: ${CLUSTER_NAME}
namespace: ${CLUSTER_NAMESPACE}
spec:
identityRef:
name: ${CLUSTER_NAME}-cloud-config
cloudName: ${OPENSTACK_CLOUD}
externalNetwork:
id: ${OPENSTACK_EXTERNAL_NETWORK_ID}
managedSecurityGroups:
allowAllInClusterTraffic: true
apiServerLoadBalancer:
enabled: true
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta2
kind: CK8sControlPlane
metadata:
name: ${CLUSTER_NAME}-control-plane
namespace: ${CLUSTER_NAMESPACE}
spec:
replicas: ${CONTROL_PLANE_MACHINE_COUNT}
version: ${KUBERNETES_VERSION}
machineTemplate:
infrastructureTemplate:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OpenStackMachineTemplate
name: ${CLUSTER_NAME}-control-plane
spec:
files:
- path: /usr/local/share/ca-certificates/vault-ca.crt
owner: root:root
permissions: "0644"
contentFrom:
secret:
name: ${CLUSTER_NAME}-cloud-config
key: cacert
preRunCommands:
- update-ca-certificates
extraKubeAPIServerArgs:
"--cloud-provider": external
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OpenStackMachineTemplate
metadata:
name: ${CLUSTER_NAME}-control-plane
namespace: ${CLUSTER_NAMESPACE}
spec:
template:
spec:
flavor: ${OPENSTACK_FLAVOR}
image:
filter:
name: ${OPENSTACK_IMAGE_NAME}
sshKeyName: ${OPENSTACK_SSH_KEY_NAME}
identityRef:
name: ${CLUSTER_NAME}-cloud-config
cloudName: ${OPENSTACK_CLOUD}
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
name: ${CLUSTER_NAME}-md-0
namespace: ${CLUSTER_NAMESPACE}
spec:
clusterName: ${CLUSTER_NAME}
replicas: ${WORKER_MACHINE_COUNT}
selector:
matchLabels: {}
template:
spec:
clusterName: ${CLUSTER_NAME}
version: ${KUBERNETES_VERSION}
bootstrap:
configRef:
apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
kind: CK8sConfigTemplate
name: ${CLUSTER_NAME}-md-0
infrastructureRef:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OpenStackMachineTemplate
name: ${CLUSTER_NAME}-md-0
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OpenStackMachineTemplate
metadata:
name: ${CLUSTER_NAME}-md-0
namespace: ${CLUSTER_NAMESPACE}
spec:
template:
spec:
flavor: ${OPENSTACK_FLAVOR}
image:
filter:
name: ${OPENSTACK_IMAGE_NAME}
sshKeyName: ${OPENSTACK_SSH_KEY_NAME}
identityRef:
name: ${CLUSTER_NAME}-cloud-config
cloudName: ${OPENSTACK_CLOUD}
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
kind: CK8sConfigTemplate
metadata:
name: ${CLUSTER_NAME}-md-0
namespace: ${CLUSTER_NAMESPACE}
spec:
template:
spec:
files:
- path: /usr/local/share/ca-certificates/vault-ca.crt
owner: root:root
permissions: "0644"
contentFrom:
secret:
name: ${CLUSTER_NAME}-cloud-config
key: cacert
preRunCommands:
- update-ca-certificates
TEMPLATE_EOF
# envsubst to render
envsubst < "$WORK/cluster-template.yaml" > "$WORK/cluster-rendered.yaml"
# Validate as YAML
python3 -c "import yaml; list(yaml.safe_load_all(open('$WORK/cluster-rendered.yaml'))); print('YAML OK')"
# Quick visual check — no leftover ${...} markers
grep -n '\${' "$WORK/cluster-rendered.yaml" || echo "No unsubstituted variables — good"
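envsubst reads only the environment. It renders an unset or unexported variable as an empty string instead of leaving `${...}` behind, so the grep above cannot catch a missed export. The sketch below is a pre-render check to run before the envsubst line. `check_template_vars` is a hypothetical helper, and it assumes that variable references use the `${NAME}` form this template uses throughout.

```shell
# check_template_vars TEMPLATE: hypothetical helper. Lists every ${NAME}
# reference in TEMPLATE and flags any NAME that is not exported with a
# non-empty value (envsubst sees only the environment).
check_template_vars() {
  local tpl=$1 missing=0 v
  for v in $(grep -oE '\$\{[A-Za-z_][A-Za-z0-9_]*\}' "$tpl" | tr -d '${}' | sort -u); do
    if [ -z "$(printenv "$v")" ]; then
      echo "[FAIL] $v is unset, unexported, or empty"
      missing=1
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "[OK] every template variable is exported and non-empty"
  fi
  return "$missing"
}

# Usage, before the envsubst line above:
# check_template_vars "$WORK/cluster-template.yaml" || echo "[FAIL] export the missing variables, then re-render"
```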
CK8sConfig field name caveat: the exact field names (`files`, `preRunCommands`) and their `contentFrom.secret` schema are CK8s-version-dependent. If the `kubectl apply` in the next step reports unknown-field warnings or schema errors, consult the CK8s release notes for the pinned `$CK8S_VERSION`.
Transfer rendered template to capi-mgmt and apply:
scp "$WORK/cluster-rendered.yaml" ubuntu@$CAPI_MGMT_METAL_IP:/home/ubuntu/cluster.yaml
ssh ubuntu@$CAPI_MGMT_METAL_IP <<'EOF'
set -euo pipefail
kubectl apply -f /home/ubuntu/cluster.yaml
echo "Applied. Waiting for cluster Available status (15-min timeout)..."
for i in $(seq 1 90); do
  STATUS=$(kubectl get cluster capi-mgmt-cluster -o json 2>/dev/null \
    | jq -r '.status.phase // "Unknown"')
  # The v1beta1 Cluster API reports a Ready condition; the v1beta2 API reports Available
  READY=$(kubectl get cluster capi-mgmt-cluster -o json 2>/dev/null \
    | jq -r '.status.conditions[]? | select(.type=="Ready" or .type=="Available") | .status' \
    | head -1)
  echo "$(date -Is) phase=$STATUS ready=$READY"
  [ "$READY" = "True" ] && { echo "Cluster Ready"; break; }
  sleep 10
done
# CK8s, not kubeadm, provides the control plane: query ck8scontrolplane
kubectl get cluster,machines,ck8scontrolplane,machinedeployment -A
EOF
If the poll times out before Ready, typical diagnosis:
ssh ubuntu@$CAPI_MGMT_METAL_IP -- kubectl describe cluster capi-mgmt-cluster
ssh ubuntu@$CAPI_MGMT_METAL_IP -- kubectl get machines -A
ssh ubuntu@$CAPI_MGMT_METAL_IP -- kubectl logs -n capo-system deployment/capo-controller-manager --tail=100
Common causes:
- $APP_CRED_ID

Retrieve the workload kubeconfig to the jumphost:
ssh ubuntu@$CAPI_MGMT_METAL_IP -- clusterctl get kubeconfig capi-mgmt-cluster \
  > "$WORK/capi-mgmt-cluster.kubeconfig"
chmod 600 "$WORK/capi-mgmt-cluster.kubeconfig"
# Sanity-check the workload cluster is reachable
kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" get nodes
# Expect: 1 control plane + 2 workers, all Ready
If get nodes times out, the cluster's API LB may not have allocated its external IP yet, or the firewall rules don't permit jumphost → workload API:
# What IP is the cluster's API LB on? (the endpoint lives in spec; LB details in status)
ssh ubuntu@$CAPI_MGMT_METAL_IP -- kubectl get openstackcluster capi-mgmt-cluster \
  -o json | jq '.status.externalNetwork, .spec.controlPlaneEndpoint, .status.apiServerLoadBalancer'
# Test reachability
curl -sk --max-time 10 "https://<API-IP>:6443/version" && echo " ← reachable" || echo "API LB unreachable"
clusterctl init on target (workload cluster)
The workload cluster must have the same CAPI providers installed before move.
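# This runbook installs clusterctl and ~/.cluster-api/clusterctl.yaml (canonical-kubernetes
# provider URLs) on capi-mgmt, not on the jumphost, so run init there against the workload kubeconfig
scp "$WORK/capi-mgmt-cluster.kubeconfig" ubuntu@$CAPI_MGMT_METAL_IP:/home/ubuntu/target.kubeconfig
ssh ubuntu@$CAPI_MGMT_METAL_IP -- clusterctl init \
  --kubeconfig /home/ubuntu/target.kubeconfig \
  --core "cluster-api:${CAPI_VERSION}" \
  --infrastructure "openstack:${CAPO_VERSION}" \
  --bootstrap "canonical-kubernetes:${CK8S_VERSION}" \
  --control-plane "canonical-kubernetes:${CK8S_VERSION}" \
  --cert-manager-version "${CERT_MANAGER_VERSION}"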
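# ORC into workload cluster too (release assets live under /releases/download/<tag>/;
# install.yaml is assumed as the asset name: confirm on the release page for $ORC_VERSION)
kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" apply \
  -f "https://github.com/k-orc/openstack-resource-controller/releases/download/${ORC_VERSION}/install.yaml"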
# Wait for everything Available
kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" wait \
--for=condition=Available --timeout=5m deployment --all -n capi-system
kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" wait \
--for=condition=Available --timeout=5m deployment --all -n capo-system
kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" wait \
--for=condition=Available --timeout=5m deployment --all -n cert-manager
kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" wait \
--for=condition=Available --timeout=5m deployment --all -n orc-system
cert-manager double-install caveat: if CK8s already installed cert-manager during workload bootstrap, the second `clusterctl init` may warn or skip. Check the existing cert-manager version against `$CERT_MANAGER_VERSION`. If they differ, version-skew issues may surface post-pivot: adjust the pin in §4 or accept the existing version. Roosevelt's standard practice is to install cert-manager via `clusterctl init` only (don't pre-install via CK8s); the same approach is valid here if you want clean version control.
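One way to make that comparison concrete is to read the image tag of the running cert-manager controller and compare it with the pin. This sketch rests on two assumptions. First, the controller runs as Deployment `cert-manager` in namespace `cert-manager`, which is the upstream manifest layout. Second, its image is referenced by tag rather than digest. `image_tag` and `cert_manager_skew` are hypothetical helpers.

```shell
# image_tag IMAGE: hypothetical helper. Prints the tag of a "repo:tag" image
# reference; returns 1 for digest-pinned or untagged references.
image_tag() {
  local last=${1##*/}   # drop registry/path so a registry port is not mistaken for a tag
  case $last in
    *@*) return 1 ;;    # digest reference: no tag to compare
    *:*) printf '%s\n' "${last##*:}" ;;
    *)   return 1 ;;    # untagged (implicit :latest)
  esac
}

# cert_manager_skew RUNNING PINNED: hypothetical helper. Reports whether the
# running cert-manager tag matches the §4 pin; returns 1 on a mismatch.
cert_manager_skew() {
  if [ "$1" = "$2" ]; then
    echo "[OK] cert-manager $1 matches pin"
  else
    echo "[WARN] cert-manager running ${1:-unknown}, pinned $2: version skew possible post-pivot"
    return 1
  fi
}

# Usage against the workload cluster (not run here):
# img=$(kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" -n cert-manager \
#   get deployment cert-manager -o jsonpath='{.spec.template.spec.containers[0].image}')
# cert_manager_skew "$(image_tag "$img")" "$CERT_MANAGER_VERSION"
```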
clusterctl move pivot
Move all CAPI CRs from bootstrap k3s → workload cluster:
# Stage the target kubeconfig on capi-mgmt (where clusterctl move runs)
scp "$WORK/capi-mgmt-cluster.kubeconfig" ubuntu@$CAPI_MGMT_METAL_IP:/home/ubuntu/target.kubeconfig
# Dry-run first to catch issues before commit
ssh ubuntu@$CAPI_MGMT_METAL_IP -- clusterctl move \
  --to-kubeconfig=/home/ubuntu/target.kubeconfig \
  --dry-run
# Inspect dry-run output: list of objects to be moved. Should include:
# - Cluster, OpenStackCluster
# - Secrets (cloud-config)
# - Machine objects, OpenStackMachineTemplate
# - CK8sControlPlane, CK8sConfigTemplate
# - MachineDeployment
# Should NOT include cert-manager state (cert-manager manages its own state
# on each cluster independently)
If dry-run looks correct, execute the move:
ssh ubuntu@$CAPI_MGMT_METAL_IP -- clusterctl move \
  --to-kubeconfig=/home/ubuntu/target.kubeconfig
# Move can take several minutes. Output ends with: "moved successfully"
echo "=== Bootstrap k3s (should now be empty of cluster CRs) ==="
ssh ubuntu@$CAPI_MGMT_METAL_IP -- kubectl get cluster -A
# Expect: No resources found (or only a header)
ssh ubuntu@$CAPI_MGMT_METAL_IP -- kubectl get machines -A
# Expect: No resources found
ssh ubuntu@$CAPI_MGMT_METAL_IP -- kubectl get openstackcluster -A
# Expect: No resources found
echo ""
echo "=== Workload cluster (should now own its own cluster CRs) ==="
kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" get cluster -A
# Expect: capi-mgmt-cluster shown, phase=Provisioned
kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" get machines -A
# Expect: 3 machines (1 control-plane + 2 workers), all Running
kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" get openstackcluster -A
echo ""
echo "=== CAPI controllers in workload ==="
kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" get pods -A \
  | grep -E "(capi|capo|orc|cert-manager)" | grep -v "Running\|Completed"
# Expect: empty (all controller pods Running)
echo ""
echo "=== OCCM not crash-looping (CRITICAL: main goal of TLS-verify work) ==="
kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" get pods -n kube-system \
  -l k8s-app=openstack-cloud-controller-manager
# Expect: 1 pod Running, NOT CrashLoopBackOff
kubectl --kubeconfig "$WORK/capi-mgmt-cluster.kubeconfig" logs -n kube-system \
  -l k8s-app=openstack-cloud-controller-manager --tail=50 \
  | grep -iE "(tls|cert|error)" | head -20
# Expect: no TLS/cert errors; OCCM should be healthy
If OCCM crash-loops with "x509: certificate signed by unknown authority", Vault CA distribution failed. Check (a) `/usr/local/share/ca-certificates/vault-ca.crt` exists on workload nodes; (b) `update-ca-certificates` ran (check `/etc/ssl/certs/ca-certificates.crt` for the Vault CA's subject); (c) the secret reference in CK8sConfigTemplate matched the secret name. SSH into a worker via the jumphost key (`ssh -i $WORK/capi-workload-key ubuntu@<worker-IP-via-FIP>`) to diagnose.
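Check (b) can be made mechanical without comparing subjects by eye. update-ca-certificates appends each local certificate's PEM text to `/etc/ssl/certs/ca-certificates.crt`, so the Vault CA's base64 body should appear verbatim in the bundle. The sketch below is meant to run on a workload node. `pem_body` and `bundle_has_cert` are hypothetical helpers. The verbatim-append behavior is how Debian/Ubuntu's update-ca-certificates is understood to work, not a documented contract.

```shell
# pem_body FILE: hypothetical helper. Prints the base64 bodies of every PEM
# certificate in FILE, joined into one line (BEGIN/END markers dropped).
pem_body() {
  sed -n '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/p' "$1" \
    | grep -v -- '-----' \
    | tr -d '\r\n'
}

# bundle_has_cert CERT BUNDLE: hypothetical helper. Returns 0 if CERT's PEM
# body appears verbatim in BUNDLE, 1 if not, 2 if CERT holds no PEM block.
bundle_has_cert() {
  local body
  body=$(pem_body "$1")
  if [ -z "$body" ]; then
    echo "[FAIL] no PEM certificate found in $1"
    return 2
  fi
  if pem_body "$2" | grep -qF -- "$body"; then
    echo "[OK] $1 is present in $2"
  else
    echo "[FAIL] $1 not found in $2"
    return 1
  fi
}

# On a workload node (not run here):
# bundle_has_cert /usr/local/share/ca-certificates/vault-ca.crt /etc/ssl/certs/ca-certificates.crt
```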
The workload kubeconfig at $WORK/capi-mgmt-cluster.kubeconfig is the input to v1-do-doc-08-magnum-driver.md. Copy it to a stable path:
mkdir -p $HOME/magnum-capi
cp "$WORK/capi-mgmt-cluster.kubeconfig" $HOME/magnum-capi/capi-mgmt-cluster.kubeconfig
chmod 600 $HOME/magnum-capi/capi-mgmt-cluster.kubeconfig
echo "Workload kubeconfig staged at: $HOME/magnum-capi/capi-mgmt-cluster.kubeconfig"
Important (post-pivot semantic): Magnum's `kubeconfig_file` setting (under `[capi_helm]` in `/etc/magnum/magnum.conf.d/99-capi.conf`, per D-007 corrected language) points to the workload cluster, not the bootstrap k3s. With pivot mandatory, Magnum's CAPI calls flow:

Magnum/leader → workload cluster API → CAPI controllers (running in workload) → create new Cluster CRs (tenant Magnum clusters)

The bootstrap k3s on capi-mgmt is now disposable. For v1 testcloud, leave capi-mgmt running so its k3s can be inspected for diagnostics. Roosevelt may destroy capi-mgmt entirely at this point for cost savings.
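- [ ] Pins captured to $DEPLOY_RECORD and KUBERNETES_VERSION set
- [ ] capi-mgmt deployed via MAAS (status Deployed)
- [ ] `openssl s_client` against Keystone returns `Verify return code: 0 (ok)`
- [ ] Rendered template has no leftover `${...}` markers; YAML parses
- [ ] `kubectl get nodes` shows 3 nodes Ready
- [ ] `clusterctl move` reported "moved successfully"
- [ ] Workload kubeconfig staged at $HOME/magnum-capi/capi-mgmt-cluster.kubeconfig

If all checked, proceed to v1-do-doc-08-magnum-driver.md.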
| Aspect | Testcloud (v1) | Roosevelt |
|---|---|---|
| Workload image | Default noble-amd64 from cloud-images.ubuntu.com | Custom image baked with Vault CA pre-installed (no runtime install step) |
| Vault CA distribution | CK8sConfig `files:` + `preRunCommands:` (this runbook) | Image-baked + CK8sConfig (defense in depth) |
| App credential lifetime | No expiry set (testcloud) | Short-lived rotating credentials via Vault auth method |
| Workload cluster control plane | 1 node | 3 nodes (HA) |
| Workload cluster workers | 2 nodes | Per-tenant sizing; HPA-driven |
| `clusterctl init --cert-manager-version` | Pin from §4 | Pin to Vault PKI cert-manager profile (separate Roosevelt prep) |
| capi-mgmt VM lifecycle post-pivot | Kept running for diagnostics | Destroyed (cost savings; pivot makes it disposable) |
| Version pinning record | $HOME/deploy-records/<timestamp>/capi-pins/ | Same pattern, captured in Vault as audit artifact |
| Authentication to GitHub API | Optional PAT | Mandatory PAT (avoid rate-limit during automated rebuilds) |
| Date | Change | Reference |
|---|---|---|
| 2026-05-22 | Original runbook 04a created. Vault CA distribution (no tls-insecure), mandatory clusterctl move pivot, pin-at-execution version model. | Workstream 3b |
| 2026-05-27 | Adapted into v1-do-doc-07. Fixes: $REPO path; $VAULT_CA path; $MAAS_PROFILE set; §4 dynamic KUBERNETES_VERSION discovery; §5 MAAS poll exit converted to non-exiting [FAIL]; §11 noble-amd64 missing branch converted to [FAIL]; cross-references updated to v1-do-doc set. | Batch C drafting |