Stand up the CAPI/Magnum management cluster as a single-homed in-cloud tenant VM (capi-mgmt-v2), bootstrap k8s-snap on it, prove pod egress through the hard gate, and install the pinned CAPI provider stack. This is the persistent v1 management cluster -- there is NO clusterctl move/pivot.
Decisions: D-035 (in-cloud single-homed tenant VM; retires D-033/D-017), D-034 (CAPI versions sourced from the capi-helm-charts tag's dependencies.json, never hardcoded), D-031 (Magnum + magnum-capi-helm + capi-helm-charts engine). Troubleshooting: appendix-A entries DOCFIX-021, DOCFIX-024, DOCFIX-025a, D-035.
admin-openrc sourced on the jumphost; openstack, jq, kubectl available.
capi-mgmt Keystone project, the flavors, and the ubuntu-24.04-noble image exist -- on a FRESH deploy NONE of these survive teardown; Step 6.0-BOOT below verifies-or-creates all of them (run it first).
The Magnum trustee domain is auto-configured by the magnum charm via its keystone (identity-credentials) relation -- verify [trust] (trustee_domain_id / trustee_domain_admin_id / trustee_domain_admin_password) is populated in magnum.conf; no manual step.
No capi-mgmt-net tenant network yet (this phase creates it).
Literals below are tagged ENV(...) so the later generalization pass is mechanical. Discover everything else dynamically at run time.
ENV(project) capi-mgmt (id 674171fd28d446d3a37073b6a761e910)
ENV(ext-net) provider-ext (id 70b34bb2-3afb-4b43-96d3-f520dbcbf9a8)
ENV(image) ubuntu-24.04-noble (id c66342ce-f402-4e6e-a324-ae27032396d7)
ENV(flavor) gp.large (16384 MB / 4 vCPU / 80 GB)
ENV(mgmt-cidr) 10.20.0.0/24 (capi-mgmt-subnet; overlay, non-IPAM)
ENV(keystone-vip) 10.12.4.50:5000 (the gate target -- the deployed VIP)
ENV(mgmt-fip) 10.12.7.40 (assigned in 6.2; apiserver SAN)
ENV(pod-cidr) 10.1.0.0/16 ENV(svc-cidr) 10.152.183.0/24 (snap defaults; non-colliding)
ENV(capi-tag) 0.25.1 (capi-helm-charts release; dependencies.json source)
# RUN: jumphost -- on vopenstack-jesse as jessea123, admin-openrc sourced.
# RUN: mgmt VM -- shipped to the VM over SSH via the FIP (heredoc below).
SSH form for mgmt VM blocks (</dev/null on every sudo): ssh -i ~/.ssh/id_ed25519 -o BatchMode=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=10 ubuntu@10.12.7.40 bash -s <<'REOF' ... REOF
# RUN: jumphost REQUIRED on a fresh deploy: post-teardown the cloud has no tenant projects, NO flavors, and NO images -- this is the substance of the retired do-doc-06 tenant setup, restored after the phase-NN consolidation dropped it (found in the 2026-06-10 pre-redeploy review). Everything is verify-or-create, so it is safe (all [SKIP]) on an existing cloud.
Flavor specs are as-built ground truth (2026-06-08 verified-state checkpoint): gp.large 16384/4/80 (mgmt VM, 6.2), gp.mid 8192/2/40 (workload masters, 8.0 template), capi.node 4096/2/40 (workload workers, 8.0 template); gp.small and m1.lbtest are as-built parity. The 40/80 GB root disks schedule because the bundle sets nova-compute libvirt-image-backend: rbd (B3) -- DISK_GB comes from the Ceph pool, not the ~9 GB local ephemeral ceiling.
The noble image imports via the interoperable import path (glance-direct), the VERBATIM-proven path from the 2026-06-08 kube-image upload (plain web-download 403s on this cloud). With the hardened bundle's glance image-conversion: true, the stored disk_format lands raw on the redeploy (expected; D-021 Ceph fast-clone alignment).
LIVE-REVIEW (two as-built facts not in the record -- capture from the OLD cloud BEFORE teardown if still possible): openstack project show capi-mgmt -f yaml (the project's domain) and openstack image show ubuntu-24.04-noble -f yaml (visibility). The block below defaults the domain from the admin token and lets glance default visibility (the 06-08 import landed shared with no flag); if 6.2 later fails image-not-found under capi-mgmt scope, openstack image set --public ubuntu-24.04-noble is the one-line repair.
( {
set -u
source ~/admin-openrc
echo "=== project capi-mgmt (verify-or-create) ==="
PROJ_DOMAIN="${OS_PROJECT_DOMAIN_NAME:-default}" # LIVE-REVIEW: as-built domain
openstack project show capi-mgmt >/dev/null 2>&1 \
&& echo "[SKIP] project capi-mgmt exists" \
|| { openstack project create --domain "$PROJ_DOMAIN" capi-mgmt >/dev/null \
&& echo "[OK] project capi-mgmt (domain $PROJ_DOMAIN)"; }
echo "=== role: let $OS_USERNAME scope to capi-mgmt (OS_PROJECT_ID blocks in 6.x/7.8/8.x) ==="
openstack role assignment list --user "$OS_USERNAME" --user-domain "$OS_USER_DOMAIN_NAME" \
--project capi-mgmt --project-domain "$PROJ_DOMAIN" -f value 2>/dev/null | grep -q . \
&& echo "[SKIP] role assignment present" \
|| { openstack role add --user "$OS_USERNAME" --user-domain "$OS_USER_DOMAIN_NAME" \
--project capi-mgmt --project-domain "$PROJ_DOMAIN" admin \
&& echo "[OK] admin role on capi-mgmt"; }
echo "=== flavors (as-built specs; public) ==="
for spec in "gp.large 4 16384 80" "gp.mid 2 8192 40" "capi.node 2 4096 40" \
"gp.small 1 2048 20" "m1.lbtest 1 1024 4"; do
set -- $spec
openstack flavor show "$1" >/dev/null 2>&1 \
&& echo "[SKIP] flavor $1 exists" \
|| { openstack flavor create --vcpus "$2" --ram "$3" --disk "$4" --public "$1" >/dev/null \
&& echo "[OK] $1 ($2 vcpu / $3 MB / $4 GB)"; }
done
echo "=== mgmt VM image ubuntu-24.04-noble (verify-or-import; glance-direct; HOME-staged, L7) ==="
if openstack image show ubuntu-24.04-noble >/dev/null 2>&1; then
echo "[SKIP] image ubuntu-24.04-noble exists"
else
SRC="$HOME/noble-server-cloudimg-amd64.img"
[ -f "$SRC" ] || { echo "ABORT: $SRC missing (re-fetch: cloud-images.ubuntu.com/noble/current/)"; exit 1; }
glance image-create-via-import \
--import-method glance-direct \
--file "$SRC" \
--container-format bare --disk-format qcow2 \
--name ubuntu-24.04-noble
fi
echo "=== poll to active (import + conversion) ==="
for i in $(seq 1 40); do
ST=$(openstack image show ubuntu-24.04-noble -f value -c status 2>/dev/null || echo '?')
echo "[$i] status=$ST"
[ "$ST" = active ] && break
sleep 15
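done
# fail loudly if the image never reached active (the GATE below depends on it)
[ "${ST:-}" = active ] || { echo "ABORT: ubuntu-24.04-noble not active after polling (status=${ST:-?})"; exit 1; }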
} )
GATE: project + role + all five flavors present; ubuntu-24.04-noble active (disk_format raw expected with image-conversion on). Do not proceed to 6.0 until this passes.
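The flavor half of this gate can be checked mechanically instead of by eye. The sketch below is a hypothetical helper (missing_flavors is not part of the runbook): it reads flavor names on stdin and prints whichever of the five as-built flavors are absent, so empty output means that part of the gate passes. It assumes the names come from `openstack flavor list --all -f value -c Name`.

```shell
# Hypothetical helper: report which required flavors are absent.
# stdin: existing flavor names, one per line.
missing_flavors() (
  existing=$(cat)
  for want in gp.large gp.mid capi.node gp.small m1.lbtest; do
    # -x whole-line match, -F literal string (the dots are not wildcards)
    printf '%s\n' "$existing" | grep -qxF -- "$want" || echo "$want"
  done
)

# Usage on the jumphost (admin-openrc sourced):
#   openstack flavor list --all -f value -c Name | missing_flavors
```

Any name printed is a flavor that 6.0-BOOT still has to create before moving on.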
# RUN: jumphost Safe/idempotent setup -- consolidated. (LIVE-REVIEW: exact SG rule syntax is standard openstack-client; confirm on the redeploy test.)
( {
set -u
PROJ=capi-mgmt # ENV(project)
echo "=== keypair (import the jumphost pubkey) ==="
openstack keypair show capi-mgmt-key >/dev/null 2>&1 \
|| openstack keypair create --public-key ~/.ssh/id_ed25519.pub capi-mgmt-key
echo "=== security group capi-mgmt-sg (ingress 22 + 6443; egress default-allow) ==="
openstack security group show capi-mgmt-sg >/dev/null 2>&1 \
|| openstack security group create --project "$PROJ" capi-mgmt-sg
SG=$(openstack security group show capi-mgmt-sg -f value -c id)
# add rules only if absent (re-run safe)
openstack security group rule list "$SG" -f value -c "Port Range" | grep -q '^22:22' \
|| openstack security group rule create --proto tcp --dst-port 22 "$SG"
openstack security group rule list "$SG" -f value -c "Port Range" | grep -q '^6443:6443' \
|| openstack security group rule create --proto tcp --dst-port 6443 "$SG"
echo "=== verify ==="
openstack security group rule list "$SG" -f value -c Protocol -c "Port Range"
} )
Expect: capi-mgmt-key present; capi-mgmt-sg with tcp/22 and tcp/6443 ingress.
# RUN: jumphost Idempotent network plumbing -- consolidated. DNS nameservers 1.1.1.1/1.0.0.1 (D-019: public resolvers; image pulls need internet egress).
( {
set -u
PROJ=capi-mgmt # ENV(project)
EXT=provider-ext # ENV(ext-net)
echo "=== network capi-mgmt-net ==="
openstack network show capi-mgmt-net >/dev/null 2>&1 \
|| openstack network create --project "$PROJ" capi-mgmt-net
echo "=== subnet capi-mgmt-subnet 10.20.0.0/24 ===" # ENV(mgmt-cidr)
openstack subnet show capi-mgmt-subnet >/dev/null 2>&1 \
|| openstack subnet create --project "$PROJ" --network capi-mgmt-net \
--subnet-range 10.20.0.0/24 \
--dns-nameserver 1.1.1.1 --dns-nameserver 1.0.0.1 capi-mgmt-subnet
echo "=== router capi-mgmt-router + ext-gw + subnet ==="
openstack router show capi-mgmt-router >/dev/null 2>&1 \
|| openstack router create --project "$PROJ" capi-mgmt-router
openstack router set --external-gateway "$EXT" capi-mgmt-router
openstack router add subnet capi-mgmt-router capi-mgmt-subnet 2>/dev/null || true
echo "=== verify ==="
openstack router show capi-mgmt-router -f value -c external_gateway_info -c status
} )
Expect: subnet 10.20.0.0/24; router ACTIVE with an external gateway on provider-ext.
# RUN: jumphost Creates the VM and pins the management FIP. The FIP is the stable apiserver endpoint for the jumphost AND the Magnum conductor.
( {
set -u
PROJ=capi-mgmt # ENV(project)
EXT=provider-ext # ENV(ext-net)
echo "=== create capi-mgmt-v2 (gp.large / ubuntu-24.04-noble) ==="
openstack server show capi-mgmt-v2 >/dev/null 2>&1 \
|| openstack server create --image ubuntu-24.04-noble --flavor gp.large \
--network capi-mgmt-net --security-group capi-mgmt-sg \
--key-name capi-mgmt-key capi-mgmt-v2
echo "=== wait ACTIVE (re-run until ACTIVE) ==="
openstack server show capi-mgmt-v2 -f value -c status -c addresses
echo "=== floating ip on provider-ext, associate to the VM ==="
FIP=$(openstack floating ip create "$EXT" -f value -c floating_ip_address)
echo "allocated FIP=$FIP # expect this to be 10.12.7.40 on a clean run -- ENV(mgmt-fip)"
openstack server add floating ip capi-mgmt-v2 "$FIP"
openstack server show capi-mgmt-v2 -f value -c addresses
} )
Note: the tenant IP lands on 10.20.0.45 and the FIP on 10.12.7.40 on the as-built run. If the FIP differs on rebuild, carry the new value into 6.4 (extra-sans) and 6.5 (kubeconfig server) and phase-07 (conductor kubeconfig).
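Because `openstack floating ip create` allocates a fresh address on every run, re-running the block above while waiting for ACTIVE can leave stray FIPs behind. One possible way to make the allocation re-run safe is to reuse an unassociated address before creating a new one. This is a minimal sketch, not the as-built procedure: first_free_fip is a hypothetical helper, and it assumes the client prints the literal string None for an unassociated fixed IP.

```shell
# Hypothetical helper: pick the first unassociated floating IP.
# stdin: "<floating ip> <fixed ip or None>" per line.
# ASSUMPTION: an unassociated FIP shows "None" in the Fixed IP Address column.
first_free_fip() {
  awk '$2 == "None" { print $1; exit }'
}

# Usage on the jumphost (admin-openrc sourced):
#   FIP=$(openstack floating ip list -f value \
#           -c "Floating IP Address" -c "Fixed IP Address" | first_free_fip)
#   [ -n "$FIP" ] || FIP=$(openstack floating ip create provider-ext -f value -c floating_ip_address)
```

If the reused address is not 10.12.7.40, the note above still applies: carry the new value forward.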
# RUN: mgmt VM This is the premise of D-035. PROCEED ONLY IF VIP-OK.
ssh -i ~/.ssh/id_ed25519 -o BatchMode=yes -o StrictHostKeyChecking=no \
-o UserKnownHostsFile=/dev/null -o ConnectTimeout=10 ubuntu@10.12.7.40 bash -s <<'REOF'
set -u
echo "=== VM -> Keystone VIP 10.12.4.50:5000 ===" # ENV(keystone-vip)
timeout 6 bash -c 'exec 3<>/dev/tcp/10.12.4.50/5000' && echo VIP-OK || echo VIP-FAIL
echo "=== VM -> internet 1.1.1.1:443 (image pulls) ==="
timeout 6 bash -c 'exec 3<>/dev/tcp/1.1.1.1/443' && echo NET-OK || echo NET-FAIL
REOF
GATE: require VIP-OK. NET-FAIL means sort provider-ext internet egress (or a registry mirror) before 6.6. Do NOT build k8s on a VM that fails VIP-OK. (appendix-A: D-035 -- single-NIC removes the dual-homed reverse-path bug.)
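The block above only prints its markers; the ssh command exits 0 whether the gate passed or not. If the gate is ever driven from a script, the markers can be turned into an exit status. The following is a sketch under that assumption; egress_gate is a hypothetical name, and it expects the captured output of the block on stdin.

```shell
# Hypothetical helper: convert the printed gate markers into an exit status.
# stdin: captured output of the VM egress block.
egress_gate() (
  out=$(cat)
  printf '%s\n' "$out"                      # keep the original output visible
  if ! printf '%s\n' "$out" | grep -qx 'VIP-OK'; then
    echo "GATE FAILED: Keystone VIP unreachable from the VM" >&2
    exit 1
  fi
  if ! printf '%s\n' "$out" | grep -qx 'NET-OK'; then
    # not fatal for this gate, but must be fixed before the provider install
    echo "WARN: no internet egress from the VM" >&2
  fi
  exit 0
)

# Usage: pipe the ssh heredoc block into it, e.g.
#   ssh ... bash -s <<'REOF' | egress_gate
```

VIP-FAIL stops the run, while NET-FAIL only warns, matching the gate text above.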
# RUN: mgmt VM Channel is 1.32-classic/stable (NOT 1.32/stable -- that is the charm-era track and does not exist for the snap). The bootstrap config MUST carry an explicit cluster-config block (appendix-A: DOCFIX-024 -- a config without it disables network+dns and the node never goes Ready). Every sudo gets </dev/null (appendix-A: DOCFIX-021 -- remote bash -s reads the script from stdin).
ssh -i ~/.ssh/id_ed25519 -o BatchMode=yes -o StrictHostKeyChecking=no \
-o UserKnownHostsFile=/dev/null -o ConnectTimeout=10 ubuntu@10.12.7.40 bash -s <<'REOF'
set -euo pipefail
echo "=== install k8s snap 1.32-classic/stable ==="
sudo snap install k8s --classic --channel=1.32-classic/stable </dev/null
echo "=== write bootstrap config (DOCFIX-024: cluster-config block REQUIRED) ==="
sudo tee /root/bootstrap-config.yaml >/dev/null <<'CFG'
cluster-config:
  network:
    enabled: true
  dns:
    enabled: true
pod-cidr: 10.1.0.0/16
service-cidr: 10.152.183.0/24
extra-sans:
  - 10.12.7.40
  - 10.20.0.45
CFG
sudo cat /root/bootstrap-config.yaml
echo "=== bootstrap (timeout 10m) ==="
sudo k8s bootstrap --name capi-mgmt-v2 --file /root/bootstrap-config.yaml --timeout 10m </dev/null
echo "=== status ==="
sudo k8s status --wait-ready --timeout 5m </dev/null
REOF
Expect: k8s status reports cluster ready, network+dns enabled, one node. Retry path: sudo snap remove k8s --purge </dev/null then re-run this block.
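The pod and service CIDRs in the bootstrap config have to stay clear of the tenant subnet (10.20.0.0/24). If any of the ENV values change on a rebuild, the non-collision claim can be re-checked with plain shell arithmetic. This is an illustrative sketch; the function names are hypothetical and it handles well-formed IPv4 CIDRs only.

```shell
# Hypothetical helpers: detect overlap between two IPv4 CIDRs.

ip_to_int() (
  # dotted quad -> 32-bit integer (the IFS change stays inside this subshell)
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

cidr_range() (
  # prints "first last" (as integers) for a.b.c.d/len
  base=$(ip_to_int "${1%/*}")
  len=${1#*/}
  size=$(( 1 << (32 - len) ))
  first=$(( base / size * size ))   # align down to the network address
  echo "$first $(( first + size - 1 ))"
)

cidrs_overlap() (
  # exit 0 if the two ranges share at least one address
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
)

# Usage:
#   cidrs_overlap 10.20.0.0/24 10.1.0.0/16 && echo COLLISION || echo clear
```

Two ranges overlap exactly when each one starts at or before the point where the other ends, which is all the final test checks.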
The agnhost pod-egress probe is the exact test that the dual-homed D-033 node and the old k3s node FAILED. On this single-NIC VM the probe pod must reach Completed.
# RUN: mgmt VM -- emit a jumphost-facing kubeconfig (server = the FIP, not tenant IP)
ssh -i ~/.ssh/id_ed25519 -o BatchMode=yes -o StrictHostKeyChecking=no \
-o UserKnownHostsFile=/dev/null -o ConnectTimeout=10 ubuntu@10.12.7.40 \
"sudo k8s config --server https://10.12.7.40:6443 </dev/null" > ~/capi-mgmt.kubeconfig
# [SENSITIVE] ~/capi-mgmt.kubeconfig contains a cluster-admin credential.
wc -l ~/capi-mgmt.kubeconfig ; head -1 ~/capi-mgmt.kubeconfig # expect >0 lines, "apiVersion: v1"
# RUN: jumphost -- node check + the hard gate
( {
set -u
export KUBECONFIG="$HOME/capi-mgmt.kubeconfig"
echo "=== node ==="
kubectl get nodes -o wide # expect capi-mgmt-v2 Ready, v1.32.13
echo "=== agnhost pod-egress probe -> Keystone VIP 10.12.4.50:5000 ==="
kubectl run egress-test --image=registry.k8s.io/e2e-test-images/agnhost:2.40 \
--restart=Never --command -- /agnhost connect 10.12.4.50:5000 --timeout=5s
echo "(poll the next line until phase=Succeeded; kubectl get pod shows that as STATUS Completed)"
kubectl get pod egress-test -o jsonpath='{.status.phase} {.status.containerStatuses[0].state}{"\n"}'
} )
GATE: require the probe pod Completed / exitCode 0 (empty logs = clean TCP connect). That proves pod -> Cilium -> ens3 -> OVN -> router SNAT egress works. Then clean up the throwaway pod:
# RUN: jumphost
KUBECONFIG="$HOME/capi-mgmt.kubeconfig" kubectl delete pod egress-test --now
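The "poll until" instruction in the probe block is manual as written. A bounded retry loop is one way to automate it before the pod is deleted. This is a sketch only: poll_until and phase_is_succeeded are hypothetical names, and the 24 x 5 s budget in the usage line is an arbitrary assumption, not an as-built value.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# poll_until <attempts> <sleep_seconds> <command...>
poll_until() {
  pu_attempts=$1; pu_delay=$2; shift 2
  pu_n=1
  while [ "$pu_n" -le "$pu_attempts" ]; do
    "$@" && return 0
    # no sleep after the final failed attempt
    [ "$pu_n" -lt "$pu_attempts" ] && sleep "$pu_delay"
    pu_n=$((pu_n + 1))
  done
  return 1
}

# Hypothetical predicate: true once the pod phase is Succeeded
# (kubectl get pod displays that phase as STATUS Completed).
phase_is_succeeded() {
  [ "$(kubectl get pod "$1" -o jsonpath='{.status.phase}' 2>/dev/null)" = Succeeded ]
}

# Usage on the jumphost (KUBECONFIG exported as in the probe block):
#   poll_until 24 5 phase_is_succeeded egress-test || echo "GATE FAILED"
```

A pod that ends in phase Failed never satisfies the predicate, so the loop times out and the gate fails rather than passing by accident.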
# RUN: mgmt VM Run VM-side as the ubuntu user with the kubeconfig at ~/.kube/config (local apiserver 10.20.0.45:6443; the first block below writes it from sudo k8s config) so the matched 1.32.13 kubectl is used -- this avoids the jumphost kubectl's +3-minor skew. Versions are READ from the tag's dependencies.json, never hardcoded (D-034). The as-built pins are in the reference block below as a known-good cross-check only.
HARDENED ORDER (appendix-A: D-034 install-ordering): cert-manager -> ORC -> clusterctl init -> CAAPH -> janitor. ORC precedes clusterctl init because CAPO v0.14.4's openstackserver controller hard-depends on ORC's Image.openstack.k-orc.cloud CRD; installing CAPO first crash-loops until ORC lands. (The 2026-06-08 run used ORC last and self-healed after 6 restarts -- the runbook corrects the order.)
# RUN: jumphost Installs the CAPI tooling on the mgmt VM at the dependencies.json pins and writes ~/capi-pins.env (sourced by 6.6b-6.6f). kubectl is pinned to the cluster's 1.32.13 (no apiserver skew). The SSH_OPTS/MGMT_VM vars set here are reused by 6.6b-6.6f (same jumphost shell).
# define the mgmt-VM connection once (reused by 6.6b-6.6f)
MGMT_VM=10.12.7.40
SSH_OPTS="-i $HOME/.ssh/id_ed25519 -o BatchMode=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=10"
ssh $SSH_OPTS ubuntu@"$MGMT_VM" bash -s <<'REOF'
set -euo pipefail
sudo apt-get update -qq </dev/null && sudo apt-get install -y jq curl </dev/null
# kubeconfig for the local apiserver (10.20.0.45:6443), readable by ubuntu -> helm/clusterctl/kubectl need no sudo
mkdir -p "$HOME/.kube"; sudo k8s config </dev/null > "$HOME/.kube/config"; chmod 600 "$HOME/.kube/config"
# egress pre-check (the VM pulls charts/binaries/manifests from these)
for h in https://raw.githubusercontent.com https://get.helm.sh https://github.com https://dl.k8s.io; do
printf '%s -> ' "$h"; curl -s -o /dev/null -w '%{http_code}\n' "$h" || echo FAIL
done
# version constellation from the chart tag's dependencies.json (D-034; never hardcoded)
curl -fsSL https://raw.githubusercontent.com/azimuth-cloud/capi-helm-charts/0.25.1/dependencies.json -o "$HOME/deps.json"
CAPI=$(jq -r '."cluster-api"' "$HOME/deps.json")
CAPO=$(jq -r '."cluster-api-provider-openstack"' "$HOME/deps.json")
CERT=$(jq -r '."cert-manager"' "$HOME/deps.json")
ORC=$(jq -r '."openstack-resource-controller"' "$HOME/deps.json")
CAAPH=$(jq -r '."addon-provider"' "$HOME/deps.json")
JANITOR=$(jq -r '."cluster-api-janitor-openstack"' "$HOME/deps.json")
HELM=$(jq -r '.helm' "$HOME/deps.json")
{ echo "CAPI=$CAPI"; echo "CAPO=$CAPO"; echo "CERT=$CERT"; echo "ORC=$ORC"; \
echo "CAAPH=$CAAPH"; echo "JANITOR=$JANITOR"; echo "HELM=$HELM"; } > "$HOME/capi-pins.env"
echo "== pins (cross-check: CAPI v1.13.2 CAPO v0.14.4 CERT v1.20.2 ORC v2.5.0 CAAPH 0.12.0 JANITOR 0.11.0 HELM v3.17.3) =="
cat "$HOME/capi-pins.env"
# install helm (pinned), clusterctl (= CAPI pin), kubectl (= cluster 1.32.13)
curl -fsSL "https://get.helm.sh/helm-${HELM}-linux-amd64.tar.gz" -o /tmp/helm.tgz
sudo tar -xzf /tmp/helm.tgz -C /usr/local/bin --strip-components=1 linux-amd64/helm </dev/null
curl -fsSL "https://github.com/kubernetes-sigs/cluster-api/releases/download/${CAPI}/clusterctl-linux-amd64" -o /tmp/clusterctl
sudo install -m 0755 /tmp/clusterctl /usr/local/bin/clusterctl </dev/null
curl -fsSL "https://dl.k8s.io/release/v1.32.13/bin/linux/amd64/kubectl" -o /tmp/kubectl
sudo install -m 0755 /tmp/kubectl /usr/local/bin/kubectl </dev/null
echo "== tooling =="; helm version --short; clusterctl version; kubectl version --client 2>/dev/null | head -1
REOF
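One caveat with reading pins through `jq -r`: a key that is missing from dependencies.json comes back as the string null, so a renamed key would silently write a line such as ORC=null into capi-pins.env and only fail later at download time. A guard along these lines could catch that early. It is a sketch, and check_pins is a hypothetical helper, not part of the as-built run.

```shell
# Hypothetical helper: fail if any pin in a KEY=value file is empty or "null".
# $1 = path to the pins file (e.g. "$HOME/capi-pins.env")
check_pins() (
  bad=$(grep -E '^[A-Z]+=(null)?$' "$1" || true)
  if [ -n "$bad" ]; then
    echo "unresolved pins:" >&2
    printf '%s\n' "$bad" >&2
    exit 1
  fi
)

# Usage on the mgmt VM, right after capi-pins.env is written:
#   check_pins "$HOME/capi-pins.env"
```

Under `set -euo pipefail` a non-zero return aborts the block before any binary is downloaded against a bad version string.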
# RUN: jumphost
ssh $SSH_OPTS ubuntu@"$MGMT_VM" bash -s <<'REOF'
set -euo pipefail
source "$HOME/capi-pins.env"
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm upgrade --install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --version "$CERT" --set crds.enabled=true --wait --timeout 5m
kubectl -n cert-manager wait --for=condition=Available deploy --all --timeout=180s
kubectl -n cert-manager get pods
REOF
# RUN: jumphost server-side apply (large CRDs). Manifest is the k-orc release install.yaml (D-034).
ssh $SSH_OPTS ubuntu@"$MGMT_VM" bash -s <<'REOF'
set -euo pipefail
source "$HOME/capi-pins.env"
kubectl apply --server-side -f \
"https://github.com/k-orc/openstack-resource-controller/releases/download/${ORC}/install.yaml"
kubectl -n orc-system wait --for=condition=Available deploy --all --timeout=180s
kubectl get crd images.openstack.k-orc.cloud
REOF
# RUN: jumphost cert-manager already present -> clusterctl detects and skips it.
ssh $SSH_OPTS ubuntu@"$MGMT_VM" bash -s <<'REOF'
set -euo pipefail
source "$HOME/capi-pins.env"
clusterctl init \
--core "cluster-api:${CAPI}" \
--bootstrap "kubeadm:${CAPI}" \
--control-plane "kubeadm:${CAPI}" \
--infrastructure "openstack:${CAPO}"
for ns in capi-system capi-kubeadm-bootstrap-system capi-kubeadm-control-plane-system capo-system; do
echo "== $ns =="; kubectl -n "$ns" wait --for=condition=Available deploy --all --timeout=240s
done
REOF
# RUN: jumphost
ssh $SSH_OPTS ubuntu@"$MGMT_VM" bash -s <<'REOF'
set -euo pipefail
source "$HOME/capi-pins.env"
helm repo add capi-addon https://azimuth-cloud.github.io/cluster-api-addon-provider
helm repo add capi-janitor https://azimuth-cloud.github.io/cluster-api-janitor-openstack
helm repo update
helm upgrade --install cluster-api-addon-provider capi-addon/cluster-api-addon-provider \
  --namespace capi-addon-system --create-namespace --version "$CAAPH" --wait --timeout 5m
helm upgrade --install cluster-api-janitor-openstack capi-janitor/cluster-api-janitor-openstack \
  --namespace capi-janitor-system --create-namespace --version "$JANITOR" --wait --timeout 5m
kubectl -n capi-addon-system get pods
kubectl -n capi-janitor-system get pods
REOF
# RUN: jumphost
ssh $SSH_OPTS ubuntu@"$MGMT_VM" bash -s <<'REOF'
set -euo pipefail
clusterctl version
echo "== all controllers Running =="
# grep -E replaces the obsolescent egrep (GNU grep warns on egrep)
kubectl get pods -A | grep -E 'capi-|capo-|cert-manager|orc-system|janitor|addon' || true
echo "== key CRDs present =="
kubectl get crd clusters.cluster.x-k8s.io \
  openstackclusters.infrastructure.cluster.x-k8s.io \
  kubeadmcontrolplanes.controlplane.cluster.x-k8s.io \
  images.openstack.k-orc.cloud
REOF
VIP-OK gate and agnhost probe Completed both passed.
capi-mgmt-v2 Ready (v1.32.13); ~/capi-mgmt.kubeconfig (server = FIP) works from the jumphost.
Image CRD present; no crash-looping CAPO.
capi-mgmt-v2: gp.large, ubuntu-24.04-noble; tenant IP 10.20.0.45 (ens3); FIP 10.12.7.40.
capi-mgmt-net / subnet capi-mgmt-subnet 10.20.0.0/24; router capi-mgmt-router.
phase-07 -- conductor graft: place ~/capi-mgmt.kubeconfig at /etc/magnum/kubeconfig on magnum/0 and stage the [capi_helm] conf.d drop-in (D-037), pointing the conductor at the FIP.