Phase 06 -- In-Cloud Management Cluster (D-035)

Stand up the CAPI/Magnum management cluster as a single-homed in-cloud tenant VM (capi-mgmt-v2), bootstrap k8s-snap on it, prove pod egress through the hard gate, and install the pinned CAPI provider stack. This is the persistent v1 management cluster -- there is NO clusterctl move/pivot.

Decisions: D-035 (in-cloud single-homed tenant VM; retires D-033/D-017), D-034 (CAPI versions sourced from the capi-helm-charts tag's dependencies.json, never hardcoded), D-031 (Magnum + magnum-capi-helm + capi-helm-charts engine). Troubleshooting: appendix-A entries DOCFIX-021, DOCFIX-024, DOCFIX-025a, D-035.

Prerequisites (must be true entering phase-06)

Charmed OpenStack live and verified (phase-03 done); Keystone reachable on the provider VIP.
The external provider network exists (phase-04 done) -- the mgmt FIP in Step 6.2 is allocated from it. Octavia is NOT required for the mgmt cluster itself (its apiserver is reached via the FIP directly); Octavia is a phase-08 prereq for workload clusters.
admin-openrc sourced on the jumphost; openstack, jq, kubectl available.
The capi-mgmt Keystone project, the flavors, and the ubuntu-24.04-noble image exist -- on a FRESH deploy NONE of these survive teardown; Step 6.0-BOOT below verifies-or-creates all of them (run it first). The Magnum trustee domain is auto-configured by the magnum charm via its keystone (identity-credentials) relation -- verify [trust] (trustee_domain_id / trustee_domain_admin_id / trustee_domain_admin_password) is populated in magnum.conf; no manual step.
No capi-mgmt-net tenant network yet (this phase creates it).

Constants and env-literals (TAG: regenerate/confirm per site on rebuild)

Literals below are tagged ENV(...) so the later generalization pass is mechanical. Discover everything else dynamically at run time.

ENV(project) capi-mgmt (resolve by name; this rebuild id d5bc125c7c1841d389b76cd0a7b0a915, domain capi)
ENV(ext-net) provider-ext (resolve by name; this rebuild id 0d00ddc1-d2bf-4849-a087-14c07d77f167)
ENV(image) ubuntu-24.04-noble (resolve by name; this rebuild id 899b4b5c-d8f6-4df4-860b-a9210d0eefe8)
ENV(flavor) gp.large (16384 MB / 4 vCPU / 80 GB)
ENV(mgmt-cidr) 10.20.0.0/24 (capi-mgmt-subnet; overlay, non-IPAM)
ENV(keystone-vip) 10.12.4.50:5000 (the gate target -- the deployed VIP)
ENV(mgmt-fip) assigned in 6.2 (apiserver SAN; resolve dynamically. This rebuild capi-mgmt-v2 = 10.12.5.103, tenant 10.20.0.107; the old 10.12.7.40 / 10.20.0.45 was the pre-teardown mgmt VM -- DOCFIX-038)
ENV(pod-cidr) 10.1.0.0/16 ENV(svc-cidr) 10.152.183.0/24 (snap defaults; non-colliding)
ENV(capi-tag) 0.25.1 (capi-helm-charts release; dependencies.json source)
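
A minimal sketch of the "resolve by name" rule for the literals above, assuming the openstack client is on PATH and admin-openrc is sourced. The helper name resolve_id is hypothetical (not part of the runbook tooling), and it only handles single-word resource kinds such as project, network, image, and flavor.

```shell
# resolve_id KIND NAME [extra openstack args...]
# Prints the ID of a named resource, or fails loudly if it cannot be resolved.
# resolve_id is a hypothetical helper name used for illustration only.
resolve_id() {
  local kind="$1" name="$2" id
  shift 2
  id=$(openstack "$kind" show "$name" "$@" -f value -c id 2>/dev/null) || id=""
  if [ -z "$id" ]; then
    echo "ABORT: cannot resolve $kind '$name'" >&2
    return 1
  fi
  printf '%s\n' "$id"
}

# Usage (jumphost, admin-openrc sourced):
#   PROJ_ID=$(resolve_id project capi-mgmt --domain capi) || exit 1
#   EXT_ID=$(resolve_id network provider-ext)             || exit 1
#   IMG_ID=$(resolve_id image ubuntu-24.04-noble)         || exit 1
```

Failing on an empty ID keeps a stale per-rebuild ID from being silently replaced by nothing in a later command.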

Run-location legend (every block states where it runs)

# RUN: jumphost -- on vopenstack-jesse as jessea123, admin-openrc sourced.
# RUN: mgmt VM -- shipped to the VM over SSH via the FIP (heredoc below).
VM SSH form (used verbatim throughout; DOCFIX-021 </dev/null on every sudo): ssh -i ~/.ssh/id_ed25519 -o BatchMode=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=10 ubuntu@10.12.5.103 bash -s <<'REOF' ... REOF (10.12.5.103 = this rebuild's capi-mgmt-v2 FIP; resolve dynamically -- the old 10.12.7.40 is dead.)

Step 6.0-BOOT -- Fresh-deploy tenant bootstrap (project, role, flavors, mgmt image)

# RUN: jumphost REQUIRED on a fresh deploy: post-teardown the cloud has no tenant projects, NO flavors, and NO images -- this is the substance of the retired do-doc-06 tenant setup, restored after the phase-NN consolidation dropped it (found in the 2026-06-10 pre-redeploy review). Everything is verify-or-create, so it is safe (all [SKIP]) on an existing cloud.

Flavor specs are as-built ground truth (2026-06-08 verified-state checkpoint): gp.large 16384/4/80 (mgmt VM, 6.2), gp.mid 8192/2/40 (workload masters, 8.0 template), capi.node 4096/2/40 (workload workers, 8.0 template); gp.small and m1.lbtest are as-built parity. The 40/80 GB root disks schedule because the bundle sets nova-compute libvirt-image-backend: rbd (B3) -- DISK_GB comes from the Ceph pool, not the ~9 GB local ephemeral ceiling.

The noble image is seeded by STAGE-AND-VERIFY (canonical per FINDING-3; supersedes the 2026-06-16 web-download ruling and the standalone-glance glance-direct line): download + sha256-vs-published-SHA256SUMS + openstack image create --file --import (client-safe -- the openstack snap's --import is the glance-direct equivalent; the standalone glance client is NOT assumed present). With the hardened bundle's glance image-conversion: true, --import lands the stored disk_format raw (D-021 Ceph fast-clone alignment). Web-download is retained as a tested alternative (appendix-A); for ubuntu cloud-images it works on the hardened bundle (the 2026-06-08 403 was transient/pre-hardening), but it cannot checksum-verify the fetched file -- stage-and-verify is preferred for provenance and unifies with the phase-08 kube seed.
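
The verify half of stage-and-verify can be sketched as two reusable functions, which is one way the phase-08 kube seed could share the same check. The function names are hypothetical; only awk and sha256sum are assumed.

```shell
# expected_sha256 SUMS_FILE IMAGE_NAME
# Prints the published digest for IMAGE_NAME from a SHA256SUMS-style file.
# Entries may be "digest *name" (binary mode) or "digest  name" (text mode).
expected_sha256() {
  awk -v f="$2" '$2=="*"f || $2==f {print $1; exit}' "$1"
}

# verify_staged FILE SUMS_FILE
# Returns 0 only if FILE's sha256 equals the published digest for its basename.
verify_staged() {
  local file="$1" sums="$2" exp got
  exp=$(expected_sha256 "$sums" "$(basename "$file")")
  [ -n "$exp" ] || { echo "GATE FAIL: no published checksum for $file" >&2; return 1; }
  got=$(sha256sum "$file" | awk '{print $1}')
  [ "$exp" = "$got" ] || { echo "GATE FAIL: checksum mismatch exp=$exp got=$got" >&2; return 1; }
}

# Usage (SHA256SUMS fetched next to the staged image under $HOME):
#   verify_staged "$HOME/noble-server-cloudimg-amd64.img" "$HOME/SHA256SUMS" || exit 1
```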

AS-BUILT FACTS (verified live 2026-06-10 pre-teardown; supersede the rebuild handoff, which wrongly placed capi-mgmt in admin_domain): project capi-mgmt lives in domain capi ("CAPI/Magnum workload identity"); the noble image is public with os_distro/os_version properties; admin@admin_domain holds member + load-balancer_member + reader (NOT admin) on the project -- DOCFIX-036 / D-039: magnum mints the per-cluster app-cred from the TRUSTOR's roles, so the trustor must hold load-balancer_member or CAPO's cred 403s on Octavia and the workload cluster wedges at API-LB provisioning. NOTE -- the old static CAPO identity (user capo, its app-cred, capo-clouds.yaml) is a FOSSIL of the retired D-033 out-of-cloud path and is deliberately NOT recreated: the current architecture needs no static cloud credential (clusterctl init takes none; per-cluster creds are magnum-minted at create time per D-039).
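
Because the trustor's roles are frozen into the app-cred at mint time, it is worth checking them before any cluster create. A small sketch, assuming the role names arrive one per line on stdin; missing_trustor_roles is a hypothetical helper name.

```shell
# missing_trustor_roles: reads role names (one per line) on stdin and prints
# each required role that is absent. Empty output means the trustor is complete.
missing_trustor_roles() {
  local held role
  held=$(cat)
  for role in member load-balancer_member reader; do
    # -x matches the whole line, so "member" does not match "load-balancer_member"
    printf '%s\n' "$held" | grep -qxF -- "$role" || printf '%s\n' "$role"
  done
}

# Usage (jumphost, admin-openrc sourced):
#   openstack role assignment list --user "$OS_USERNAME" --user-domain "$OS_USER_DOMAIN_NAME" \
#     --project capi-mgmt --project-domain capi --names -f value -c Role | missing_trustor_roles
```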

( {
  set -u
  source ~/admin-openrc
  echo "=== domain capi (verify-or-create; as-built: 'CAPI/Magnum workload identity', NOT Juju-created) ==="
  PROJ_DOMAIN="capi"                                   # as-built, verified live 2026-06-10
  openstack domain show "$PROJ_DOMAIN" >/dev/null 2>&1 \
    && echo "[SKIP] domain $PROJ_DOMAIN exists" \
    || { openstack domain create --description "CAPI/Magnum workload identity" "$PROJ_DOMAIN" >/dev/null \
         && echo "[OK] domain $PROJ_DOMAIN"; }

  echo "=== project capi-mgmt in domain $PROJ_DOMAIN (verify-or-create) ==="
  openstack project show capi-mgmt --domain "$PROJ_DOMAIN" >/dev/null 2>&1 \
    && echo "[SKIP] project capi-mgmt exists" \
    || { openstack project create --domain "$PROJ_DOMAIN" \
           --description "CAPI management project" capi-mgmt >/dev/null \
         && echo "[OK] project capi-mgmt (domain $PROJ_DOMAIN)"; }

  echo "=== roles: $OS_USERNAME gets member + load-balancer_member + reader on capi-mgmt (DOCFIX-036 / D-039) ==="
  # D-039 ROOT CAUSE: magnum mints the per-cluster app-cred carrying the TRUSTOR's roles,
  # FROZEN at mint, and delegates ALL trustor roles unfiltered. If admin@admin_domain holds
  # only `member` here, CAPO's app-cred 403s on Octavia (needs load-balancer_member) and the
  # workload cluster wedges at API-LB provisioning. Grant all three so future mints carry LB
  # authority. (load-balancer_member + reader are keystone/Octavia default roles.)
  for ROLE in member load-balancer_member reader; do
    if openstack role assignment list --user "$OS_USERNAME" --user-domain "$OS_USER_DOMAIN_NAME" \
         --project capi-mgmt --project-domain "$PROJ_DOMAIN" --role "$ROLE" -f value 2>/dev/null | grep -q .; then
      echo "[SKIP] $ROLE already on capi-mgmt"
    else
      openstack role add --user "$OS_USERNAME" --user-domain "$OS_USER_DOMAIN_NAME" \
        --project capi-mgmt --project-domain "$PROJ_DOMAIN" "$ROLE" \
        && echo "[OK] $ROLE on capi-mgmt"
    fi
  done

  echo "=== flavors (as-built specs; public -- verified live 2026-06-10 pre-teardown) ==="
  for spec in "gp.large 4 16384 80" "gp.mid 2 8192 40" "capi.node 2 4096 40" \
              "gp.small 1 4096 20" "m1.lbtest 1 1024 4"; do
    set -- $spec
    openstack flavor show "$1" >/dev/null 2>&1 \
      && echo "[SKIP] flavor $1 exists" \
      || { openstack flavor create --vcpus "$2" --ram "$3" --disk "$4" --public "$1" >/dev/null \
           && echo "[OK] $1 ($2 vcpu / $3 MB / $4 GB)"; }
  done

  echo "=== mgmt VM image ubuntu-24.04-noble (verify-or-seed; STAGE-AND-VERIFY canonical; HOME-staged, L7) ==="
  if openstack image show ubuntu-24.04-noble >/dev/null 2>&1; then
    echo "[SKIP] image ubuntu-24.04-noble exists"
  else
    # Stage-and-verify (FINDING-3): download to $HOME (snap-readable; NOT /tmp -- L7) if missing/
    # checksum-stale, verify sha256 vs the published SHA256SUMS, then client-safe import via the
    # openstack snap (--import == glance-direct; image-conversion lands it raw). NOT the standalone
    # `glance` client (unconfirmed on this jumphost).
    IMG_URL="https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img"
    SUM_URL="https://cloud-images.ubuntu.com/noble/current/SHA256SUMS"
    IMG_FILE="noble-server-cloudimg-amd64.img"; SRC="$HOME/$IMG_FILE"
    EXP=$(curl -fsSL "$SUM_URL" | awk -v f="$IMG_FILE" '$2=="*"f || $2==f {print $1}')
    [ -n "$EXP" ] || { echo "GATE FAIL: no published checksum for $IMG_FILE"; exit 1; }
    if [ -f "$SRC" ] && [ "$(sha256sum "$SRC" | awk '{print $1}')" = "$EXP" ]; then
      echo "[OK] staged noble present + checksum-valid; skipping download"
    else
      echo "[..] downloading noble to $SRC (snap-readable; NOT /tmp)"
      wget -q -O "$SRC" "$IMG_URL"
      GOT=$(sha256sum "$SRC" | awk '{print $1}')
      [ "$EXP" = "$GOT" ] || { echo "GATE FAIL: checksum mismatch exp='$EXP' got='$GOT'"; exit 1; }
      echo "[OK] checksum verified ($GOT)"
    fi
    openstack image create ubuntu-24.04-noble \
      --file "$SRC" --import \
      --container-format bare --disk-format qcow2 --public \
      --property os_distro=ubuntu --property os_version=24.04
  fi
  # as-built (verified live 2026-06-10): visibility=public, os_distro=ubuntu, os_version=24.04,
  # stored raw in Ceph via the bundle's glance image-conversion=true.
  echo "=== poll to active (import + conversion) ==="
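  ST='?'
  for i in $(seq 1 40); do
    ST=$(openstack image show ubuntu-24.04-noble -f value -c status 2>/dev/null || echo '?')
    echo "[$i] status=$ST"
    [ "$ST" = active ] && break
    sleep 15
  done
  # 40 polls x 15 s = 10 min; fail the gate instead of falling through silently
  [ "$ST" = active ] || { echo "GATE FAIL: image not active after 10 min (status=$ST)"; exit 1; }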
} )

GATE: project + role + all five flavors present; ubuntu-24.04-noble active (disk_format raw expected with image-conversion on). Do not proceed to 6.0 until this passes.
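
The flavor and image parts of this gate can be checked mechanically. A sketch, assuming the openstack client with admin-openrc sourced; missing_flavors and boot_gate are hypothetical helper names, and the project and role checks are left to the bootstrap block itself.

```shell
# missing_flavors: reads existing flavor names (one per line) on stdin and
# prints each of the five runbook flavors that is absent.
missing_flavors() {
  local have f
  have=$(cat)
  for f in gp.large gp.mid capi.node gp.small m1.lbtest; do
    printf '%s\n' "$have" | grep -qxF -- "$f" || printf '%s\n' "$f"
  done
}

# boot_gate: returns 0 only if no flavor is missing and the noble image is active.
boot_gate() {
  local miss st
  miss=$(openstack flavor list --all -f value -c Name | missing_flavors)
  st=$(openstack image show ubuntu-24.04-noble -f value -c status 2>/dev/null || echo '?')
  [ -z "$miss" ] || { echo "GATE FAIL: missing flavors: $miss" >&2; return 1; }
  [ "$st" = active ] || { echo "GATE FAIL: image status=$st" >&2; return 1; }
  echo "GATE OK: 6.0-BOOT"
}
```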

Step 6.0 -- Keypair + security group (capi-mgmt project)

# RUN: jumphost Safe/idempotent setup -- consolidated. (LIVE-REVIEW: exact SG rule syntax is standard openstack-client; confirm on the redeploy test.)

( {
  set -u
  PROJ=capi-mgmt                                   # ENV(project)
  echo "=== keypair (import the jumphost pubkey) ==="
  openstack keypair show capi-mgmt-key >/dev/null 2>&1 \
    || openstack keypair create --public-key ~/.ssh/id_ed25519.pub capi-mgmt-key
  echo "=== security group capi-mgmt-sg (ingress 22 + 6443; egress default-allow) ==="
  openstack security group show capi-mgmt-sg >/dev/null 2>&1 \
    || openstack security group create --project "$PROJ" capi-mgmt-sg
  SG=$(openstack security group show capi-mgmt-sg -f value -c id)
  # add rules only if absent (re-run safe)
  openstack security group rule list "$SG" -f value -c "Port Range" | grep -q '^22:22' \
    || openstack security group rule create --proto tcp --dst-port 22   "$SG"
  openstack security group rule list "$SG" -f value -c "Port Range" | grep -q '^6443:6443' \
    || openstack security group rule create --proto tcp --dst-port 6443 "$SG"
  echo "=== verify ==="
  openstack security group rule list "$SG" -f value -c "IP Protocol" -c "Port Range"
} )

Expect: capi-mgmt-key present; capi-mgmt-sg with tcp/22 and tcp/6443 ingress.

Step 6.1 -- Network, subnet, router (capi-mgmt project)

# RUN: jumphost Idempotent network plumbing -- consolidated. DNS nameservers 1.1.1.1/1.0.0.1 (D-019: public resolvers; image pulls need internet egress).

( {
  set -u
  PROJ=capi-mgmt                                   # ENV(project)
  EXT=provider-ext                                 # ENV(ext-net)
  echo "=== network capi-mgmt-net ==="
  openstack network show capi-mgmt-net >/dev/null 2>&1 \
    || openstack network create --project "$PROJ" capi-mgmt-net
  echo "=== subnet capi-mgmt-subnet 10.20.0.0/24 ==="   # ENV(mgmt-cidr)
  openstack subnet show capi-mgmt-subnet >/dev/null 2>&1 \
    || openstack subnet create --project "$PROJ" --network capi-mgmt-net \
         --subnet-range 10.20.0.0/24 \
         --dns-nameserver 1.1.1.1 --dns-nameserver 1.0.0.1 capi-mgmt-subnet
  echo "=== router capi-mgmt-router + ext-gw + subnet ==="
  openstack router show capi-mgmt-router >/dev/null 2>&1 \
    || openstack router create --project "$PROJ" capi-mgmt-router
  openstack router set --external-gateway "$EXT" capi-mgmt-router
  openstack router add subnet capi-mgmt-router capi-mgmt-subnet 2>/dev/null || true
  echo "=== verify ==="
  openstack router show capi-mgmt-router -f value -c external_gateway_info -c status
} )

Expect: subnet 10.20.0.0/24; router ACTIVE with an external gateway on provider-ext.
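
The mgmt subnet must stay clear of the k8s-snap pod and service CIDRs from the constants (10.1.0.0/16 and 10.152.183.0/24). A pure-bash overlap check, as a sketch; the function names are hypothetical and only IPv4 CIDRs without leading-zero octets are handled.

```shell
# ip_to_int A.B.C.D -> prints the address as a 32-bit integer
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<<"$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# cidr_range CIDR -> prints "first last" (network and broadcast) as integers
cidr_range() {
  local ip="${1%/*}" len="${1#*/}" base mask
  base=$(ip_to_int "$ip")
  mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  echo "$(( base & mask )) $(( (base & mask) | (~mask & 0xFFFFFFFF) ))"
}

# cidrs_overlap CIDR1 CIDR2 -> returns 0 if the two ranges intersect
cidrs_overlap() {
  local s1 e1 s2 e2
  read -r s1 e1 <<<"$(cidr_range "$1")"
  read -r s2 e2 <<<"$(cidr_range "$2")"
  [ "$s1" -le "$e2" ] && [ "$s2" -le "$e1" ]
}

# Usage:
#   for other in 10.1.0.0/16 10.152.183.0/24; do
#     cidrs_overlap 10.20.0.0/24 "$other" && echo "COLLISION: 10.20.0.0/24 vs $other"
#   done
```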

Step 6.2 -- VM + floating IP (MUTATION; not batched with the gate)

# RUN: jumphost Creates the VM and pins the management FIP. The FIP is the stable apiserver endpoint for the jumphost AND the Magnum conductor.

( {
  set -u
  PROJ=capi-mgmt                                   # ENV(project)
  EXT=provider-ext                                 # ENV(ext-net)
  echo "=== create capi-mgmt-v2 (gp.large / ubuntu-24.04-noble) ==="
  openstack server show capi-mgmt-v2 >/dev/null 2>&1 \
    || openstack server create --image ubuntu-24.04-noble --flavor gp.large \
         --network capi-mgmt-net --security-group capi-mgmt-sg \
         --key-name capi-mgmt-key capi-mgmt-v2
  echo "=== wait ACTIVE (polls up to 5 min) ==="
  ST='?'
  for i in $(seq 1 30); do
    ST=$(openstack server show capi-mgmt-v2 -f value -c status 2>/dev/null || echo '?')
    echo "[$i] status=$ST"
    [ "$ST" = ACTIVE ] && break
    [ "$ST" = ERROR ] && break
    sleep 10
  done
  [ "$ST" = ACTIVE ] || { echo "ABORT: capi-mgmt-v2 status=$ST"; exit 1; }
  echo "=== floating ip on provider-ext, associate to the VM (reused on re-run) ==="
  # re-run safe: reuse a FIP already bound to the VM's port instead of allocating another
  PORT=$(openstack port list --server capi-mgmt-v2 -f value -c ID | head -1)
  FIP=$(openstack floating ip list --port "$PORT" -f value -c "Floating IP Address" | head -1)
  if [ -n "$FIP" ]; then
    echo "[SKIP] FIP $FIP already associated"
  else
    FIP=$(openstack floating ip create "$EXT" -f value -c floating_ip_address)
    openstack server add floating ip capi-mgmt-v2 "$FIP"
  fi
  # tenant (fixed) IP = the server address that is NOT the FIP (single-NIC VM has exactly the two)
  TENANT_IP=$(openstack server show capi-mgmt-v2 -f json \
    | FIP="$FIP" python3 -c "import os,json,sys; a=json.load(sys.stdin).get('addresses',{}) or {}; ips=[ip for net in a.values() for ip in net]; print(next((ip for ip in ips if ip!=os.environ['FIP']), ''))")
  [ -n "$TENANT_IP" ] || { echo "ABORT: could not resolve tenant IP"; exit 1; }
  # PERSIST both (single source for 6.3-6.6 -- PATTERN-1; the FIP is pool-allocated + the tenant
  # IP DHCP-assigned, so NEITHER is deterministic per rebuild -- never hardcode them)
  printf 'MGMT_FIP=%s\nMGMT_TENANT_IP=%s\n' "$FIP" "$TENANT_IP" | tee ~/capi-mgmt-net.env
  openstack server show capi-mgmt-v2 -f value -c addresses
} )

Note (DOCFIX-038): the FIP is pool-allocated and the tenant IP is DHCP-assigned -- NEITHER is deterministic (this rebuild: FIP 10.12.5.103, tenant 10.20.0.107; the pre-teardown VM was 10.12.7.40 / 10.20.0.45). Step 6.2 persists both to ~/capi-mgmt-net.env; 6.3-6.6a source it, and phase-07 (conductor kubeconfig) uses the same FIP. Do not hardcode either value.
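
Since every later step trusts ~/capi-mgmt-net.env, a loader that validates the two values before use can catch a stale or truncated file early. A sketch; is_ipv4 and load_mgmt_env are hypothetical helper names.

```shell
# is_ipv4 STR -> returns 0 if STR is a dotted-quad IPv4 address
is_ipv4() {
  local IFS=. o
  local -a parts
  [[ "$1" =~ ^[0-9]{1,3}(\.[0-9]{1,3}){3}$ ]] || return 1
  read -r -a parts <<<"$1"
  for o in "${parts[@]}"; do
    [ "$((10#$o))" -le 255 ] || return 1   # 10# forces base 10 for octets like "08"
  done
}

# load_mgmt_env [FILE] -> sources FILE and validates MGMT_FIP / MGMT_TENANT_IP
load_mgmt_env() {
  local f="${1:-$HOME/capi-mgmt-net.env}"
  [ -s "$f" ] || { echo "ABORT: $f missing or empty (run Step 6.2)" >&2; return 1; }
  . "$f"
  is_ipv4 "${MGMT_FIP:-}"       || { echo "ABORT: bad MGMT_FIP '${MGMT_FIP:-}'" >&2; return 1; }
  is_ipv4 "${MGMT_TENANT_IP:-}" || { echo "ABORT: bad MGMT_TENANT_IP '${MGMT_TENANT_IP:-}'" >&2; return 1; }
  [ "$MGMT_FIP" != "$MGMT_TENANT_IP" ] || { echo "ABORT: FIP equals tenant IP" >&2; return 1; }
}

# Usage: load_mgmt_env || exit 1    # in place of a bare `source ~/capi-mgmt-net.env`
```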

Step 6.3 -- GATE 1: OS-level egress (before any k8s investment)

# RUN: mgmt VM This is the premise of D-035. PROCEED ONLY IF VIP-OK.

source ~/capi-mgmt-net.env   # MGMT_FIP, MGMT_TENANT_IP (written by 6.2)
ssh -i ~/.ssh/id_ed25519 -o BatchMode=yes -o StrictHostKeyChecking=no \
    -o UserKnownHostsFile=/dev/null -o ConnectTimeout=10 ubuntu@"$MGMT_FIP" bash -s <<'REOF'
set -u
echo "=== VM -> Keystone VIP 10.12.4.50:5000 ==="            # ENV(keystone-vip)
timeout 6 bash -c 'exec 3<>/dev/tcp/10.12.4.50/5000' && echo VIP-OK || echo VIP-FAIL
echo "=== VM -> internet 1.1.1.1:443 (image pulls) ==="
timeout 6 bash -c 'exec 3<>/dev/tcp/1.1.1.1/443' && echo NET-OK || echo NET-FAIL
REOF

GATE: require VIP-OK. NET-FAIL means fix provider-ext internet egress (or set up a registry mirror) before 6.6. Do NOT build k8s on a VM that fails VIP-OK. (appendix-A: D-035 -- single-NIC removes the dual-homed reverse-path bug.)
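
The probe above prints VIP-FAIL but still exits 0, so an unattended run would carry on. A sketch of the same /dev/tcp probe as functions that return non-zero on VIP failure; tcp_probe and gate1 are hypothetical names, and the functions would have to be pasted into the heredoc that is shipped to the VM.

```shell
# tcp_probe HOST PORT [TIMEOUT_SECONDS] -> returns 0 if a TCP connect succeeds
tcp_probe() {
  # host and port are passed as $0 and $1 of the inner bash, so nothing is interpolated into the script
  timeout "${3:-6}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# gate1 VIP_HOST VIP_PORT -> internet failure is a warning, VIP failure is fatal
gate1() {
  if tcp_probe 1.1.1.1 443; then echo NET-OK; else echo "NET-FAIL (warning)"; fi
  if tcp_probe "$1" "$2"; then echo VIP-OK; else echo VIP-FAIL; return 1; fi
}

# Usage (inside the REOF heredoc, on the mgmt VM):
#   gate1 10.12.4.50 5000 || exit 1      # ENV(keystone-vip)
```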

Step 6.4 -- k8s-snap install + bootstrap (MUTATION; secret-free)

# RUN: mgmt VM Channel is 1.32-classic/stable (NOT 1.32/stable -- that is the charm-era track and does not exist for the snap). The bootstrap config MUST carry an explicit cluster-config block (appendix-A: DOCFIX-024 -- a config without it disables network+dns and the node never goes Ready). Every sudo gets </dev/null (appendix-A: DOCFIX-021 -- remote bash -s reads the script from stdin).

source ~/capi-mgmt-net.env   # MGMT_FIP, MGMT_TENANT_IP (written by 6.2)
ssh -i ~/.ssh/id_ed25519 -o BatchMode=yes -o StrictHostKeyChecking=no \
    -o UserKnownHostsFile=/dev/null -o ConnectTimeout=10 ubuntu@"$MGMT_FIP" \
    bash -s "$MGMT_FIP" "$MGMT_TENANT_IP" <<'REOF'
set -euo pipefail
MGMT_FIP="$1"; MGMT_TENANT_IP="$2"    # passed from the jumphost (extra-sans must be the real FIP + tenant IP)

echo "=== install k8s snap 1.32-classic/stable ==="
sudo snap install k8s --classic --channel=1.32-classic/stable </dev/null

echo "=== write bootstrap config (DOCFIX-024: cluster-config block REQUIRED) ==="
sudo tee /root/bootstrap-config.yaml >/dev/null <<CFG
cluster-config:
  network:
    enabled: true
  dns:
    enabled: true
pod-cidr: 10.1.0.0/16
service-cidr: 10.152.183.0/24
extra-sans:
- $MGMT_FIP
- $MGMT_TENANT_IP
CFG
sudo cat /root/bootstrap-config.yaml

echo "=== bootstrap (timeout 10m) ==="
sudo k8s bootstrap --name capi-mgmt-v2 --file /root/bootstrap-config.yaml --timeout 10m </dev/null

echo "=== status ==="
sudo k8s status --wait-ready --timeout 5m </dev/null
REOF

Expect: k8s status reports cluster ready, network+dns enabled, one node. Retry path: sudo snap remove k8s --purge </dev/null then re-run this block.
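
DOCFIX-024 is easy to regress when the config is edited by hand, so the file can be linted before bootstrap. A sketch that only understands the exact two-space layout written above; check_bootstrap_config is a hypothetical helper name, not a k8s-snap command.

```shell
# check_bootstrap_config FILE -> returns 0 if FILE has an explicit cluster-config
# block with network and dns enabled, plus at least one extra-sans entry.
check_bootstrap_config() {
  awk '
    /^cluster-config:/            { in_cc = 1; in_sans = 0; next }
    /^[^[:space:]-]/              { in_cc = 0; in_sans = 0 }   # any other top-level key
    in_cc && /^  network:/        { sect = "network"; next }
    in_cc && /^  dns:/            { sect = "dns"; next }
    in_cc && /^    enabled: true/ { ok[sect] = 1 }
    /^extra-sans:/                { in_sans = 1; next }
    in_sans && /^- /              { sans++ }
    END { exit !(ok["network"] && ok["dns"] && sans > 0) }
  ' "$1"
}

# Usage (on the mgmt VM, before `k8s bootstrap`; the file is root-owned):
#   sudo cat /root/bootstrap-config.yaml </dev/null > /tmp/bc.yaml
#   check_bootstrap_config /tmp/bc.yaml || { echo "ABORT: DOCFIX-024"; exit 1; }
```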

Step 6.5 -- GATE 2: kubeconfig to jumphost + pod-egress proof (THE D-035 GATE)

The agnhost pod-egress probe is the exact test the dual-homed D-033 node and the old k3s node FAILED. On this single-NIC VM it must reach Completed.

# RUN: jumphost (ssh to the mgmt VM; the kubeconfig lands on the jumphost). server = the FIP, not tenant IP
source ~/capi-mgmt-net.env   # MGMT_FIP
ssh -i ~/.ssh/id_ed25519 -o BatchMode=yes -o StrictHostKeyChecking=no \
    -o UserKnownHostsFile=/dev/null -o ConnectTimeout=10 ubuntu@"$MGMT_FIP" \
    "sudo k8s config server=https://$MGMT_FIP:6443 </dev/null" > ~/capi-mgmt.kubeconfig
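# [SENSITIVE] ~/capi-mgmt.kubeconfig contains a cluster-admin credential.
chmod 600 ~/capi-mgmt.kubeconfig
wc -l ~/capi-mgmt.kubeconfig ; head -1 ~/capi-mgmt.kubeconfig   # expect >0 lines, "apiVersion: v1"
grep 'server:' ~/capi-mgmt.kubeconfig                           # expect the FIP on :6443, not the tenant IP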

# RUN: jumphost -- node check + the hard gate
( {
  set -u
  export KUBECONFIG="$HOME/capi-mgmt.kubeconfig"
  echo "=== node ==="
  kubectl get nodes -o wide                          # expect capi-mgmt-v2 Ready, v1.32.13
  echo "=== agnhost pod-egress probe -> Keystone VIP 10.12.4.50:5000 ==="
  kubectl run egress-test --image=registry.k8s.io/e2e-test-images/agnhost:2.40 \
    --restart=Never --command -- /agnhost connect 10.12.4.50:5000 --timeout=5s
  echo "(re-run the next line until phase=Succeeded; kubectl get pod shows this as STATUS Completed)"
  kubectl get pod egress-test -o jsonpath='{.status.phase} {.status.containerStatuses[0].state}{"\n"}'
} )
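
The manual poll above can be wrapped in a loop that ends on its own. Note that the pod phase reported by jsonpath is Succeeded, which `kubectl get pod` displays as STATUS Completed. A sketch; wait_pod_succeeded is a hypothetical helper name and the default try count and pause are assumptions.

```shell
# wait_pod_succeeded POD [TRIES] [SLEEP_SECONDS]
# Polls the pod phase; returns 0 on Succeeded, 1 on Failed or timeout.
wait_pod_succeeded() {
  local pod="$1" tries="${2:-30}" pause="${3:-4}" phase i
  for i in $(seq 1 "$tries"); do
    phase=$(kubectl get pod "$pod" -o jsonpath='{.status.phase}' 2>/dev/null || echo '?')
    echo "[$i] phase=$phase"
    case "$phase" in
      Succeeded) return 0 ;;
      Failed)    echo "GATE FAIL: $pod exited non-zero" >&2; return 1 ;;
    esac
    sleep "$pause"
  done
  echo "GATE FAIL: $pod did not finish in time" >&2
  return 1
}

# Usage (jumphost, KUBECONFIG exported as in the block above):
#   wait_pod_succeeded egress-test || exit 1
```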

GATE: require the probe pod Completed / exitCode 0 (empty logs = clean TCP connect). That proves pod -> Cilium -> ens3 -> OVN -> router SNAT egress works. Then clean up the throwaway pod:

# RUN: jumphost
KUBECONFIG="$HOME/capi-mgmt.kubeconfig" kubectl delete pod egress-test --now

Step 6.6 -- CAPI provider stack (pinned to dependencies.json; D-034)

# RUN: mgmt VM Run VM-side as the ubuntu user with the kubeconfig that 6.6a writes to ~/.kube/config (local apiserver) so the matched 1.32.13 kubectl is used -- this avoids the jumphost kubectl's +3-minor version skew. Versions are READ from the tag's dependencies.json, never hardcoded (D-034). The as-built pins are in the reference block below as a known-good cross-check only.

HARDENED ORDER (appendix-A: D-034 install-ordering): cert-manager -> ORC -> clusterctl init -> CAAPH -> janitor. ORC precedes clusterctl init because CAPO v0.14.4's openstackserver controller hard-depends on ORC's Image.openstack.k-orc.cloud CRD; installing CAPO first crash-loops until ORC lands. (The 2026-06-08 run used ORC last and self-healed after 6 restarts -- the runbook corrects the order.)
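
The ordering rule can be enforced rather than remembered: a one-line guard at the top of the clusterctl init step refuses to run until the ORC Image CRD exists. A sketch; require_orc_crd is a hypothetical helper name.

```shell
# require_orc_crd: returns non-zero unless the ORC Image CRD is installed,
# so CAPO is never started before the CRD its openstackserver controller needs.
require_orc_crd() {
  kubectl get crd images.openstack.k-orc.cloud >/dev/null 2>&1 \
    || { echo "ABORT: ORC Image CRD missing; run 6.6c before clusterctl init" >&2; return 1; }
}

# Usage (first line of the 6.6d heredoc body, after `set -euo pipefail`):
#   require_orc_crd
```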

6.6a -- tooling + pins (install helm/clusterctl/kubectl VM-side; read dependencies.json @ 0.25.1)

# RUN: jumphost Installs the CAPI tooling on the mgmt VM at the dependencies.json pins and writes ~/capi-pins.env (sourced by 6.6b-6.6f). kubectl is pinned to the cluster's 1.32.13 (no apiserver skew). The SSH_OPTS/MGMT_VM vars set here are reused by 6.6b-6.6f (same jumphost shell).

# define the mgmt-VM connection once (reused by 6.6b-6.6f)
source ~/capi-mgmt-net.env        # MGMT_FIP, MGMT_TENANT_IP (written by 6.2)
MGMT_VM="$MGMT_FIP"
SSH_OPTS="-i $HOME/.ssh/id_ed25519 -o BatchMode=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=10"

ssh $SSH_OPTS ubuntu@"$MGMT_VM" bash -s <<'REOF'
set -euo pipefail
sudo apt-get update -qq </dev/null && sudo apt-get install -y jq curl </dev/null

# kubeconfig for the local apiserver (the VM's own tenant IP:6443), readable by ubuntu -> helm/clusterctl/kubectl need no sudo
mkdir -p "$HOME/.kube"; sudo k8s config </dev/null > "$HOME/.kube/config"; chmod 600 "$HOME/.kube/config"

# egress pre-check (the VM pulls charts/binaries/manifests from these)
for h in https://raw.githubusercontent.com https://get.helm.sh https://github.com https://dl.k8s.io; do
  printf '%s -> ' "$h"; curl -s --max-time 10 -o /dev/null -w '%{http_code}\n' "$h" || echo FAIL
done

# version constellation from the chart tag's dependencies.json (D-034; never hardcoded)
curl -fsSL https://raw.githubusercontent.com/azimuth-cloud/capi-helm-charts/0.25.1/dependencies.json -o "$HOME/deps.json"
CAPI=$(jq -r '."cluster-api"' "$HOME/deps.json")
CAPO=$(jq -r '."cluster-api-provider-openstack"' "$HOME/deps.json")
CERT=$(jq -r '."cert-manager"' "$HOME/deps.json")
ORC=$(jq -r '."openstack-resource-controller"' "$HOME/deps.json")
CAAPH=$(jq -r '."addon-provider"' "$HOME/deps.json")
JANITOR=$(jq -r '."cluster-api-janitor-openstack"' "$HOME/deps.json")
HELM=$(jq -r '.helm' "$HOME/deps.json")
{ echo "CAPI=$CAPI"; echo "CAPO=$CAPO"; echo "CERT=$CERT"; echo "ORC=$ORC"; \
  echo "CAAPH=$CAAPH"; echo "JANITOR=$JANITOR"; echo "HELM=$HELM"; } > "$HOME/capi-pins.env"
echo "== pins (cross-check: CAPI v1.13.2 CAPO v0.14.4 CERT v1.20.2 ORC v2.5.0 CAAPH 0.12.0 JANITOR 0.11.0 HELM v3.17.3) =="
cat "$HOME/capi-pins.env"

# install helm (pinned), clusterctl (= CAPI pin), kubectl (= cluster 1.32.13)
curl -fsSL "https://get.helm.sh/helm-${HELM}-linux-amd64.tar.gz" -o /tmp/helm.tgz
sudo tar -xzf /tmp/helm.tgz -C /usr/local/bin --strip-components=1 linux-amd64/helm </dev/null
curl -fsSL "https://github.com/kubernetes-sigs/cluster-api/releases/download/${CAPI}/clusterctl-linux-amd64" -o /tmp/clusterctl
sudo install -m 0755 /tmp/clusterctl /usr/local/bin/clusterctl </dev/null
curl -fsSL "https://dl.k8s.io/release/v1.32.13/bin/linux/amd64/kubectl" -o /tmp/kubectl
sudo install -m 0755 /tmp/kubectl /usr/local/bin/kubectl </dev/null

echo "== tooling =="; helm version --short; clusterctl version; kubectl version --client 2>/dev/null | head -1
REOF
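
`jq -r` prints the literal string null for a key that is absent, so a renamed key in dependencies.json would write e.g. CAPO=null into capi-pins.env and only fail later in a download or helm call. A sketch of a guard; check_pins is a hypothetical helper name.

```shell
# check_pins FILE -> returns 0 only if every expected pin is present and non-null
check_pins() {
  local f="$1" key val rc=0
  for key in CAPI CAPO CERT ORC CAAPH JANITOR HELM; do
    val=$(sed -n "s/^${key}=//p" "$f" | head -1)
    case "$val" in
      ""|null) echo "GATE FAIL: pin $key is '${val}'" >&2; rc=1 ;;
    esac
  done
  return "$rc"
}

# Usage (on the mgmt VM, right after capi-pins.env is written):
#   check_pins "$HOME/capi-pins.env" || exit 1
```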

6.6b -- cert-manager (DOCFIX-025a: crds.enabled=true, NOT installCRDs)

# RUN: jumphost

ssh $SSH_OPTS ubuntu@"$MGMT_VM" bash -s <<'REOF'
set -euo pipefail
source "$HOME/capi-pins.env"
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm upgrade --install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --version "$CERT" --set crds.enabled=true --wait --timeout 5m
kubectl -n cert-manager wait --for=condition=Available deploy --all --timeout=180s
kubectl -n cert-manager get pods
REOF

6.6c -- ORC (BEFORE clusterctl init; CAPO hard-depends on the ORC Image CRD)

# RUN: jumphost server-side apply (large CRDs). Manifest is the k-orc release install.yaml (D-034).

ssh $SSH_OPTS ubuntu@"$MGMT_VM" bash -s <<'REOF'
set -euo pipefail
source "$HOME/capi-pins.env"
kubectl apply --server-side -f \
  "https://github.com/k-orc/openstack-resource-controller/releases/download/${ORC}/install.yaml"
kubectl -n orc-system wait --for=condition=Available deploy --all --timeout=180s
kubectl get crd images.openstack.k-orc.cloud
REOF

6.6d -- clusterctl init (core + kubeadm bootstrap/control-plane + CAPO)

# RUN: jumphost cert-manager already present -> clusterctl detects and skips it.

ssh $SSH_OPTS ubuntu@"$MGMT_VM" bash -s <<'REOF'
set -euo pipefail
source "$HOME/capi-pins.env"
clusterctl init \
  --core "cluster-api:${CAPI}" \
  --bootstrap "kubeadm:${CAPI}" \
  --control-plane "kubeadm:${CAPI}" \
  --infrastructure "openstack:${CAPO}"
for ns in capi-system capi-kubeadm-bootstrap-system capi-kubeadm-control-plane-system capo-system; do
  echo "== $ns =="; kubectl -n "$ns" wait --for=condition=Available deploy --all --timeout=240s
done
REOF

6.6e -- CAAPH + janitor (azimuth helm charts; chart names from each repo Chart.yaml)

# RUN: jumphost

ssh $SSH_OPTS ubuntu@"$MGMT_VM" bash -s <<'REOF'
set -euo pipefail
source "$HOME/capi-pins.env"
helm repo add capi-addon   https://azimuth-cloud.github.io/cluster-api-addon-provider
helm repo add capi-janitor https://azimuth-cloud.github.io/cluster-api-janitor-openstack
helm repo update
helm upgrade --install cluster-api-addon-provider capi-addon/cluster-api-addon-provider \
  --namespace capi-addon-system --create-namespace --version "$CAAPH" --wait --timeout 5m
helm upgrade --install cluster-api-janitor-openstack capi-janitor/cluster-api-janitor-openstack \
  --namespace capi-janitor-system --create-namespace --version "$JANITOR" --wait --timeout 5m
kubectl -n capi-addon-system   get pods
kubectl -n capi-janitor-system get pods
REOF

6.6f -- verify the stack

# RUN: jumphost

ssh $SSH_OPTS ubuntu@"$MGMT_VM" bash -s <<'REOF'
set -euo pipefail
clusterctl version
echo "== all controllers Running =="
kubectl get pods -A | grep -E 'capi-|capo-|cert-manager|orc-system|janitor|addon' || true
echo "== key CRDs present =="
kubectl get crd clusters.cluster.x-k8s.io \
  openstackclusters.infrastructure.cluster.x-k8s.io \
  kubeadmcontrolplanes.controlplane.cluster.x-k8s.io \
  images.openstack.k-orc.cloud
REOF
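
The pod listing above is read by eye. A sketch of a filter that prints only the pods that are not healthy, assuming the default `kubectl get pods -A --no-headers` column order (NAMESPACE NAME READY STATUS RESTARTS AGE); unhealthy_pods is a hypothetical helper name.

```shell
# unhealthy_pods: reads `kubectl get pods -A --no-headers` output on stdin and
# prints "namespace/pod status ready restarts" for every pod that is neither
# fully-ready Running nor Completed.
unhealthy_pods() {
  awk '{
    split($3, r, "/")
    if (($4 == "Running" && r[1] != r[2]) || ($4 != "Running" && $4 != "Completed"))
      print $1 "/" $2, $4, $3, $5
  }'
}

# Usage (on the mgmt VM): empty output means nothing is crash-looping or pending.
#   kubectl get pods -A --no-headers | unhealthy_pods
```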

EXIT GATE (phase-06 complete)

GATE 1 VIP-OK and GATE 2 agnhost Completed both passed.
capi-mgmt-v2 Ready (v1.32.13); ~/capi-mgmt.kubeconfig (server = FIP) works from the jumphost.
All CAPI controllers Running; ORC Image CRD present; no crash-looping CAPO.
Proceed to phase-07 (conductor graft).

As-built reference (2026-06-08/09 run -- audit trail; values are run-specific)

VM capi-mgmt-v2: gp.large, ubuntu-24.04-noble; tenant IP + FIP are per-rebuild (this rebuild 10.20.0.107 ens3 / FIP 10.12.5.103; 2026-06-08/09: 10.20.0.45 / 10.12.7.40). 6.2 persists both to ~/capi-mgmt-net.env.
Net capi-mgmt-net / subnet capi-mgmt-subnet 10.20.0.0/24; router capi-mgmt-router.
k8s-snap: 1.32-classic/stable, rev 5326, v1.32.13 (classic confinement); CNI Cilium 1.17.12-ck0.
pod CIDR 10.1.0.0/16; svc CIDR 10.152.183.0/24; cluster DNS 10.152.183.31.
GATE 2: probe pod 10.1.0.150 -> 10.12.4.50:5000, exitCode 0 / Completed (agnhost:2.40, ~9s pull).
Pins (capi-helm-charts 0.25.1 dependencies.json): CAPI v1.13.2 | CAPO v0.14.4 | cert-manager v1.20.2 | CAAPH 0.12.0 | janitor 0.11.0 | ORC v2.5.0 | helm v3.17.3. CAAPH/janitor deploy SHA-pinned images: 62f7c00 / d527847.
Tooling VM-side: helm v3.17.3, clusterctl v1.13.2, matched kubectl 1.32.13 (KUBECONFIG=/root/kubeconfig).

Next

phase-07 -- conductor graft: place ~/capi-mgmt.kubeconfig at /etc/magnum/kubeconfig on magnum/0 and stage the [capi_helm] conf.d drop-in (D-037), pointing the conductor at the FIP.
