
Runbook 04a — CAPI bootstrap cluster install on capi-mgmt.maas

Reference: D-017 (full rebuild every cycle). Runs after 04-magnum-domain.md and before 05-magnum-capi-driver.md.

Goal: From a MAAS-Ready capi-mgmt VM, produce a single-node k3s cluster running cluster-api, CAPO, the canonical-kubernetes bootstrap and control-plane providers, cert-manager, and ORC, then deliver a workload-cluster kubeconfig to the jumphost for use by the Magnum CAPI driver in runbook 05.

Pre-conditions:

  • OpenStack cloud is up and stable (02-deploy.md complete, all units active/idle)
  • Magnum trustee domain is created (04-magnum-domain.md complete)
  • capi-mgmt MAAS machine is in Ready state (released after teardown, not yet deployed)
  • Jumphost has ~/admin-openrc sourced and an authenticated openstack CLI working against the new Caracal cloud
  • Vault CA bundle is available on the jumphost at a known path (issued by the Caracal Vault during 03-vault-init.md)

Network preconditions:

  • capi-mgmt machine should be configured in MAAS with two interfaces:
    • eth0 on the metal fabric (DHCP from MAAS) — used for k3s API bind
    • eth1 on the provider fabric (static IP, no DHCP) — used for workload-cluster FIP reach. This IP must NOT fall inside the Neutron FIP allocation pool on the ext_net subnet.
  • Verify the eth1 IP is outside the FIP pool before deploy:
    openstack subnet show <ext_net_subnet> -c allocation_pools -c gateway_ip
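
The pool check above can be scripted rather than read by eye. The sketch below is a minimal illustration under assumptions: ip_to_int and ip_in_pool are hypothetical helper names, the addresses are made up, and the pool start and end are expected to be copied by hand from the allocation_pools output.

```shell
# Convert a dotted-quad IPv4 address to an integer for range comparison.
# Assumes octets have no leading zeros.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 ))
}

# Succeeds (exit 0) when address $1 lies inside the inclusive range $2..$3.
ip_in_pool() {
  local ip start end
  ip=$(ip_to_int "$1")
  start=$(ip_to_int "$2")
  end=$(ip_to_int "$3")
  [ "$ip" -ge "$start" ] && [ "$ip" -le "$end" ]
}

# Example with made-up values: candidate eth1 IP and a pool of .100 to .200
if ip_in_pool 10.20.0.50 10.20.0.100 10.20.0.200; then
  echo "[fail] eth1 IP is inside the FIP pool"
else
  echo "[ok] eth1 IP is outside the FIP pool"
fi
```

If the subnet has several allocation pools, run the check once per pool.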

Step 1 — Deploy Ubuntu 24.04 to capi-mgmt via MAAS

Use MAAS UI: Machines → capi-mgmt → Take action → Deploy → Ubuntu 24.04 LTS (Noble) → Deploy machine. Wait for Deployed status (~10 min).

Verify SSH reachability once Deployed. The SSH user is ubuntu, not jessea123, because MAAS cloud-init provisions the default ubuntu account:

ssh ubuntu@<eth0-ip> 'hostname; uname -a; ip -br a'

Verify both interfaces show their expected IPs.

Step 2 — Install Vault CA on the bootstrap host

The bootstrap host must trust the Caracal Vault root CA so that openstack CLI calls and CAPO authentication to Keystone succeed over HTTPS.

# From jumphost — replace <eth0-ip> with the deployed capi-mgmt IP
scp <vault-ca-path>/vault-ca.crt ubuntu@<eth0-ip>:/tmp/vault-ca.crt

ssh ubuntu@<eth0-ip> 'bash -s' << 'REMOTE'
set -euo pipefail
sudo install -m 0644 /tmp/vault-ca.crt /usr/local/share/ca-certificates/vault-ca.crt
sudo update-ca-certificates
# Verify Keystone is reachable over TLS; -f makes curl exit non-zero on an HTTP error
curl -fsS --cacert /etc/ssl/certs/ca-certificates.crt -o /dev/null -w "%{http_code}\n" https://<keystone-internal>:5000/v3
# Expect: 200
REMOTE

Step 3 — Install k3s

k3s defaults to binding 0.0.0.0:6443. Bind to the metal-network IP only to keep the management API off the provider network. The TLS-SAN flags must include both the IP and the FQDN. k3s does NOT auto-add 127.0.0.1 to the SAN list; if 127.0.0.1 needs to be in the kubeconfig, add it explicitly as a --tls-san. We do not — we rewrite the kubeconfig server URL instead.

ssh ubuntu@<eth0-ip> 'bash -s' << 'REMOTE'
set -euo pipefail
BIND_ADDR=$(ip -4 -br a show eth0 | awk '{print $3}' | cut -d/ -f1)
echo "bind addr: $BIND_ADDR"

if systemctl is-active --quiet k3s; then
  echo "[skip] k3s already running"
else
  curl -sfL https://get.k3s.io | \
    INSTALL_K3S_EXEC="server \
      --bind-address=${BIND_ADDR} \
      --advertise-address=${BIND_ADDR} \
      --node-ip=${BIND_ADDR} \
      --tls-san=${BIND_ADDR} \
      --tls-san=capi-mgmt.maas \
      --write-kubeconfig-mode=0644 \
      --disable=traefik" \
    sh -
fi

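# Wait for Ready (up to 60 s); abort if the node never becomes Ready
READY=0
for i in $(seq 1 30); do
  if sudo k3s kubectl get nodes 2>/dev/null | awk 'NR>1 && $2=="Ready"{n++} END{exit n<1}'; then
    echo "[ok] node Ready after ${i} polls"
    READY=1
    break
  fi
  sleep 2
done
[ "$READY" -eq 1 ] || { echo "[fail] node not Ready after 60 s" >&2; exit 1; }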

# Copy and rewrite kubeconfig
sudo install -o ubuntu -g ubuntu -m 0600 /etc/rancher/k3s/k3s.yaml /home/ubuntu/.kube-bootstrap.yaml
sed -i "s|server: https://127\\.0\\.0\\.1:6443|server: https://${BIND_ADDR}:6443|" /home/ubuntu/.kube-bootstrap.yaml
grep '^    server:' /home/ubuntu/.kube-bootstrap.yaml

KUBECONFIG=/home/ubuntu/.kube-bootstrap.yaml kubectl get nodes
REMOTE
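
The kubeconfig rewrite in this step can be wrapped so that it also reports whether the file ends up pointing at the intended address. This is a sketch only: rewrite_server_url is a hypothetical helper, 192.0.2.10 is a documentation address standing in for the eth0 IP, and the sample file is a cut-down stand-in for k3s.yaml.

```shell
# Rewrite a loopback server URL to ADDR, then confirm the file points at ADDR.
# Returns non-zero if the file references some other address, so a stale or
# unexpected kubeconfig is noticed instead of silently kept.
rewrite_server_url() {
  local file=$1 addr=$2
  sed -i "s|server: https://127\.0\.0\.1:6443|server: https://${addr}:6443|" "$file"
  grep -qF "server: https://${addr}:6443" "$file"
}

# Demo on a throwaway file
demo=$(mktemp)
printf 'clusters:\n- cluster:\n    server: https://127.0.0.1:6443\n  name: default\n' > "$demo"
rewrite_server_url "$demo" 192.0.2.10 && grep 'server:' "$demo"
# A second run changes nothing and still succeeds
rewrite_server_url "$demo" 192.0.2.10 && echo "[ok] already rewritten"
rm -f "$demo"
```

Because the second call is a no-op, the helper is safe to re-run on a host where the kubeconfig was already rewritten.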

Step 4 — Install helm and clusterctl

kubectl is provided by k3s as a symlink; do not re-install.

ssh ubuntu@<eth0-ip> 'bash -s' << 'REMOTE'
set -euo pipefail

# helm
if ! command -v helm >/dev/null 2>&1; then
  curl -fL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
fi
helm version --short

# clusterctl: fetch the latest tag from the GitHub API; if the lookup fails,
# fall back to a pinned version (replace the placeholder before running)
if ! command -v clusterctl >/dev/null 2>&1; then
  CLUSTERCTL_VER=$(curl -fsSL --max-time 15 \
    https://api.github.com/repos/kubernetes-sigs/cluster-api/releases/latest \
    | python3 -c 'import json,sys; print(json.load(sys.stdin)["tag_name"])') \
    || CLUSTERCTL_VER='<pinned-clusterctl-version>'
  curl -fLo /tmp/clusterctl --max-time 60 \
    "https://github.com/kubernetes-sigs/cluster-api/releases/download/${CLUSTERCTL_VER}/clusterctl-linux-amd64"
  sudo install -o root -g root -m 0755 /tmp/clusterctl /usr/local/bin/clusterctl
  rm /tmp/clusterctl
fi
clusterctl version
REMOTE
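
The recurring pitfalls list notes the unauthenticated GitHub API limit and recommends caching. One possible shape for that cache is sketched below. The helper name cached_fetch, the cache location, and the one-hour lifetime are assumptions for illustration, and fake_fetch stands in for the real curl and python3 lookup.

```shell
# Print the cached value if CACHE_FILE is younger than MAX_AGE seconds;
# otherwise run the remaining arguments as the fetch command, cache its
# output, and print it. Returns non-zero if the fetch fails or prints nothing.
cached_fetch() {
  local cache_file=$1 max_age=$2 now mtime value
  shift 2
  now=$(date +%s)
  if [ -s "$cache_file" ]; then
    mtime=$(stat -c %Y "$cache_file")
    if [ $((now - mtime)) -lt "$max_age" ]; then
      cat "$cache_file"
      return 0
    fi
  fi
  value=$("$@") || return 1
  [ -n "$value" ] || return 1
  printf '%s\n' "$value" > "$cache_file"
  printf '%s\n' "$value"
}

# Demo with a stand-in fetcher; a real one would wrap the GitHub API call
fake_fetch() { echo "v0.0.0-example"; }
cache_dir=$(mktemp -d)
cached_fetch "$cache_dir/clusterctl-version" 3600 fake_fetch   # fetches and caches
cached_fetch "$cache_dir/clusterctl-version" 3600 false        # served from cache
rm -rf "$cache_dir"
```

The second call passes a command that always fails, yet still prints the version, which shows the value came from the cache file rather than a new request.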

Step 5 — clusterctl init with canonical-kubernetes providers

ssh ubuntu@<eth0-ip> 'bash -s' << 'REMOTE'
set -euo pipefail

mkdir -p ~/.cluster-api
cat > ~/.cluster-api/clusterctl.yaml << 'CONFIG'
providers:
  - name: "canonical-kubernetes"
    url: "https://github.com/canonical/cluster-api-k8s/releases/latest/download/bootstrap-components.yaml"
    type: "BootstrapProvider"
  - name: "canonical-kubernetes"
    url: "https://github.com/canonical/cluster-api-k8s/releases/latest/download/control-plane-components.yaml"
    type: "ControlPlaneProvider"
CONFIG

export KUBECONFIG=/home/ubuntu/.kube-bootstrap.yaml

if kubectl get namespace capi-system >/dev/null 2>&1; then
  echo "[skip] CAPI already initialized"
else
  clusterctl init \
    --infrastructure openstack \
    --bootstrap canonical-kubernetes \
    --control-plane canonical-kubernetes
fi

# Wait for all controller deployments
for ns in cert-manager capi-system cabpck-system cacpck-system capo-system; do
  echo "[wait] ${ns}"
  kubectl wait --for=condition=Available deployment --all --namespace "${ns}" --timeout=5m
done

clusterctl version
kubectl get pods -A
REMOTE

Expected namespaces (note the abbreviated canonical-kubernetes names):

  • cert-manager
  • capi-system — cluster-api core
  • capo-system — CAPI provider for OpenStack
  • cabpck-system — CAPI Bootstrap Provider Canonical Kubernetes
  • cacpck-system — CAPI Control-Plane Provider Canonical Kubernetes

Step 6 — Install ORC (OpenStack Resource Controller)

Required by CAPO for managing OpenStack resources as Kubernetes objects. Verify the latest release URL before applying.

ssh ubuntu@<eth0-ip> 'bash -s' << 'REMOTE'
set -euo pipefail
export KUBECONFIG=/home/ubuntu/.kube-bootstrap.yaml

ORC_URL="https://github.com/k-orc/openstack-resource-controller/releases/latest/download/install.yaml"
kubectl apply -f "$ORC_URL"

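# Wait for ORC controller; fail if the manifest created no ORC namespace
sleep 5
ORC_NAMESPACES=$(kubectl get ns -o name | sed -n -E 's#^namespace/((orc|openstack-resource-controller).*)$#\1#p')
[ -n "$ORC_NAMESPACES" ] || { echo "[fail] no ORC namespace found" >&2; exit 1; }
for ns in $ORC_NAMESPACES; do
  echo "[wait] ${ns}"
  kubectl wait --for=condition=Available deployment --all --namespace "${ns}" --timeout=5m
done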
REMOTE

Step 7 — Cloud-side preparation (run from jumphost)

Inventory existing images and flavors before creating. Lesson from prior cycles: do not blindly create ubuntu-24.04-capi when noble-amd64 is already present and suitable.

source ~/admin-openrc
openstack image list | grep -i noble
openstack flavor list

Create the supporting cloud-side resources for CAPO:

# Project
openstack project create --domain admin_domain capi-mgmt \
  --description "CAPI management cluster workloads"

# User
openstack user create --domain admin_domain --project capi-mgmt \
  --project-domain admin_domain --password-prompt capo

# Roles
openstack role add --project capi-mgmt --project-domain admin_domain \
  --user capo --user-domain admin_domain member
openstack role add --project capi-mgmt --project-domain admin_domain \
  --user capo --user-domain admin_domain load-balancer_member

# Switch to capo
unset $(env | awk -F= '/^OS_/{print $1}')
export OS_AUTH_URL=https://<keystone-internal>:5000/v3
export OS_IDENTITY_API_VERSION=3
export OS_USERNAME=capo
export OS_USER_DOMAIN_NAME=admin_domain
export OS_PROJECT_NAME=capi-mgmt
export OS_PROJECT_DOMAIN_NAME=admin_domain
export OS_PASSWORD=<the-password-you-set>
export OS_CACERT=<vault-ca-path>/vault-ca.crt

# App credential (record id and secret immediately; the secret is shown only at creation)
mkdir -p ~/capi-mgmt && chmod 0700 ~/capi-mgmt
openstack application credential create capo-app-cred \
  --description "CAPO authentication" \
  -f yaml > ~/capi-mgmt/capo-app-cred.yaml
chmod 0600 ~/capi-mgmt/capo-app-cred.yaml

# Nova keypair: generate on capi-mgmt and upload the public key.
# Keep the public key under $HOME; a snap-confined openstack CLI cannot read /tmp.
ssh ubuntu@<eth0-ip> 'ssh-keygen -t ed25519 -N "" -f ~/.ssh/capi-mgmt-key'
ssh ubuntu@<eth0-ip> 'cat ~/.ssh/capi-mgmt-key.pub' > ~/capi-mgmt/capi-mgmt-key.pub
openstack keypair create --public-key ~/capi-mgmt/capi-mgmt-key.pub capi-mgmt-key
# Also pull the private key back to jumphost for post-rebuild access
scp -p ubuntu@<eth0-ip>:~/.ssh/capi-mgmt-key ~/capi-mgmt/capi-mgmt-key
chmod 0600 ~/capi-mgmt/capi-mgmt-key

Step 8 — Compose clouds.yaml and cloud.conf

Use v3applicationcredential auth — cleaner than user/password.

# Read app credential
APP_CRED_ID=$(yq -r '.id' ~/capi-mgmt/capo-app-cred.yaml)
APP_CRED_SECRET=$(yq -r '.secret' ~/capi-mgmt/capo-app-cred.yaml)

# Compose clouds.yaml for capi-mgmt
cat > /tmp/clouds.yaml << EOC
clouds:
  openstack:
    auth_type: v3applicationcredential
    auth:
      auth_url: https://<keystone-internal>:5000/v3
      application_credential_id: ${APP_CRED_ID}
      application_credential_secret: ${APP_CRED_SECRET}
    region_name: RegionOne
    cacert: /usr/local/share/ca-certificates/vault-ca.crt
    interface: public
    identity_api_version: 3
EOC

scp /tmp/clouds.yaml ubuntu@<eth0-ip>:/home/ubuntu/clouds.yaml
ssh ubuntu@<eth0-ip> 'chmod 0600 ~/clouds.yaml'

# cloud.conf for OCCM — use tls-insecure=true for v1 testcloud
# (v2: ship Vault CA via CK8sConfig files field instead)
cat > /tmp/cloud.conf << EOC
[Global]
auth-url=https://<keystone-internal>:5000/v3
application-credential-id=${APP_CRED_ID}
application-credential-secret=${APP_CRED_SECRET}
region=RegionOne
tls-insecure=true

[LoadBalancer]
floating-network-id=<ext-net-uuid>
EOC
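
If yq cannot find a key it prints null, and an unset shell variable expands to nothing, so either file can end up with a blank credential without any error. A quick check before copying the files is sketched below. The helper name require_values is hypothetical, and the parsing is a deliberately simple line match that handles only flat "key: value" and "key=value" lines.

```shell
# Succeeds only when every KEY named after FILE has a non-empty, non-null value.
# Matches "key: value" (clouds.yaml) and "key=value" (cloud.conf) lines.
require_values() {
  local file=$1 key value rc=0
  shift
  for key in "$@"; do
    value=$(sed -n "s/^[[:space:]]*${key}[:=][[:space:]]*//p" "$file" | head -n 1)
    case "$value" in
      ""|null) echo "[fail] ${key} is empty in ${file}" >&2; rc=1 ;;
    esac
  done
  return "$rc"
}

# Demo on a throwaway file with the secret left blank
demo=$(mktemp)
cat > "$demo" << 'EOF'
clouds:
  openstack:
    auth:
      application_credential_id: 0123456789abcdef
      application_credential_secret:
EOF
require_values "$demo" application_credential_id application_credential_secret \
  || echo "[stop] fix the file before copying it to capi-mgmt"
rm -f "$demo"
```

For cloud.conf the same helper would be called with the hyphenated key names, application-credential-id and application-credential-secret.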

Step 9 — Render and apply the cluster manifest

The canonical-kubernetes cluster template takes 18 substitution variables. Capture them in a cluster-env file, then use envsubst to render. The template is fetched from canonical/cluster-api-k8s.

Variables (verify exact names against the template at apply time). Build the file on the jumphost, where /tmp/clouds.yaml, /tmp/cloud.conf, and the Vault CA exist, so that the three base64 values are computed there; then copy it to /tmp/cluster-env on capi-mgmt:

cat > ~/capi-mgmt/cluster-env << EOC
CLUSTER_NAME=capi-mgmt-cluster
NAMESPACE=default
KUBERNETES_VERSION=v1.32.2
CONTROL_PLANE_MACHINE_COUNT=1
WORKER_MACHINE_COUNT=0
OPENSTACK_CONTROL_PLANE_MACHINE_FLAVOR=capi-mgmt-node
OPENSTACK_NODE_MACHINE_FLAVOR=capi-mgmt-node
OPENSTACK_DNS_NAMESERVERS=<dns-server-ips>
OPENSTACK_EXTERNAL_NETWORK_ID=<ext-net-uuid>
OPENSTACK_FAILURE_DOMAIN=nova
OPENSTACK_IMAGE_NAME=noble-amd64
OPENSTACK_SSH_KEY_NAME=capi-mgmt-key
OPENSTACK_CLOUD_YAML_B64=$(base64 -w0 /tmp/clouds.yaml)
OPENSTACK_CLOUD_CONFIG_B64=$(base64 -w0 /tmp/cloud.conf)
OPENSTACK_CLOUD_CACERT_B64=$(base64 -w0 <vault-ca-path>/vault-ca.crt)
OPENSTACK_CLOUD=openstack
OPENSTACK_NODE_CIDR=10.6.0.0/24
KUBE_CONTROL_PLANE_ENDPOINT_PORT=6443
EOC
chmod 0600 ~/capi-mgmt/cluster-env
scp -p ~/capi-mgmt/cluster-env ubuntu@<eth0-ip>:/tmp/cluster-env

Render and apply:

ssh ubuntu@<eth0-ip> 'bash -s' << 'REMOTE'
set -euo pipefail
export KUBECONFIG=/home/ubuntu/.kube-bootstrap.yaml

curl -fLo /tmp/cluster-template.yaml \
  https://github.com/canonical/cluster-api-k8s/releases/latest/download/cluster-template.yaml

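# Source env vars (operator provides /tmp/cluster-env); set -a exports them,
# because envsubst only substitutes variables present in the environment
# shellcheck disable=SC1091
set -a
source /tmp/cluster-env
set +a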

envsubst < /tmp/cluster-template.yaml > /tmp/cluster-rendered.yaml
kubectl apply -f /tmp/cluster-rendered.yaml
REMOTE
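
envsubst replaces any variable that is missing from the environment with an empty string, so a misspelled or unexported name produces a manifest that looks valid but has blank fields. A pre-render check is sketched below. The helper names template_vars and unset_template_vars are hypothetical, and only plain ${NAME} references are matched, so templates using other expansion forms would need a wider pattern.

```shell
# List the distinct ${NAME} references found in a template file.
template_vars() {
  grep -o '\${[A-Za-z_][A-Za-z0-9_]*}' "$1" | tr -d '${}' | sort -u
}

# Print every referenced variable that is missing from the exported environment
# or exported with an empty value. printenv only sees exported variables, which
# is also all that envsubst can see.
unset_template_vars() {
  local name
  for name in $(template_vars "$1"); do
    [ -n "$(printenv "$name")" ] || echo "$name"
  done
}

# Demo on a two-line stand-in template
tmpl=$(mktemp)
printf 'name: ${CLUSTER_NAME}\nversion: ${KUBERNETES_VERSION}\n' > "$tmpl"
export CLUSTER_NAME=capi-mgmt-cluster
unset KUBERNETES_VERSION
unset_template_vars "$tmpl"
rm -f "$tmpl"
```

Running the check against /tmp/cluster-template.yaml after loading the env file, and aborting when it prints anything, would catch a variable-name mismatch before kubectl apply.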

Step 10 — Poll for cluster Available

ssh ubuntu@<eth0-ip> 'bash -s' << 'REMOTE'
set -euo pipefail
export KUBECONFIG=/home/ubuntu/.kube-bootstrap.yaml

START=$(date +%s)
DEADLINE=$((START + 15*60))

while [[ $(date +%s) -lt $DEADLINE ]]; do
  PHASE=$(kubectl get cluster capi-mgmt-cluster -o jsonpath='{.status.phase}' 2>/dev/null || echo "?")
  AVAILABLE=$(kubectl get cluster capi-mgmt-cluster -o jsonpath='{.status.conditions[?(@.type=="Available")].status}' 2>/dev/null || echo "?")
  ELAPSED=$(($(date +%s) - START))
  printf '[%4ds] Phase=%s Available=%s\n' "$ELAPSED" "$PHASE" "$AVAILABLE"
  [[ "$AVAILABLE" == "True" ]] && break
  sleep 15
done

clusterctl describe cluster capi-mgmt-cluster --show-conditions all
[[ "$AVAILABLE" == "True" ]] || { echo "[fail] cluster not Available after 15 min" >&2; exit 1; }
REMOTE
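
The deadline loop in this step follows a pattern that recurs throughout the runbook. It can be factored into a small helper that returns non-zero on timeout, as in the sketch below. The name wait_until is hypothetical, and the commented usage shows how the cluster check could plug in under that assumption.

```shell
# Run a command every INTERVAL seconds until it succeeds or TIMEOUT seconds
# have passed. Returns 0 on success and 1 on timeout, so a caller running
# under `set -e` stops instead of carrying on with an unready cluster.
wait_until() {
  local timeout=$1 interval=$2 deadline
  shift 2
  deadline=$(( $(date +%s) + timeout ))
  while ! "$@"; do
    [ "$(date +%s)" -lt "$deadline" ] || return 1
    sleep "$interval"
  done
  return 0
}

# Demo with trivial commands
wait_until 5 1 true && echo "[ok] condition met"
wait_until 1 1 false || echo "[fail] timed out"

# Possible use for this step (not executed here):
# cluster_available() {
#   [ "$(kubectl get cluster capi-mgmt-cluster \
#        -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')" = "True" ]
# }
# wait_until 900 15 cluster_available
```

The helper trades the per-poll progress line of the original loop for a clear exit status. A progress message can be added inside the checked function if it is still wanted.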

Step 11 — Export workload kubeconfig to jumphost

ssh ubuntu@<eth0-ip> 'bash -s' << 'REMOTE'
set -euo pipefail
export KUBECONFIG=/home/ubuntu/.kube-bootstrap.yaml
mkdir -p ~/magnum-capi
clusterctl get kubeconfig capi-mgmt-cluster > ~/magnum-capi/capi-mgmt-cluster.kubeconfig
chmod 0600 ~/magnum-capi/capi-mgmt-cluster.kubeconfig
KUBECONFIG=~/magnum-capi/capi-mgmt-cluster.kubeconfig kubectl get nodes
REMOTE

# Copy to jumphost for runbook 05
mkdir -p ~/magnum-capi
scp -p ubuntu@<eth0-ip>:~/magnum-capi/capi-mgmt-cluster.kubeconfig ~/magnum-capi/capi-mgmt-cluster.kubeconfig
chmod 0600 ~/magnum-capi/capi-mgmt-cluster.kubeconfig

Exit criteria

  • capi-mgmt.maas is Deployed in MAAS with k3s + CAPI controllers + ORC running
  • capi-mgmt-cluster workload cluster is Available
  • Workload kubeconfig exists at ~/magnum-capi/capi-mgmt-cluster.kubeconfig on the jumphost
  • Proceed to 05-magnum-capi-driver.md

Recurring pitfalls (apply to execution)

  • juju ssh HANGS when stdout is redirected — use juju exec --unit X -- 'cmd'
  • MAAS-deployed Ubuntu uses ubuntu user, not jessea123
  • k3s --bind-address=X doesn't bind 127.0.0.1 — kubeconfig server URL must be sed-rewritten
  • Snap-confined openstack CLI cannot read /tmp — paths under $HOME only
  • openstack -f value -c X -c Y outputs in alphabetical column order — use single-column queries
  • GitHub API rate limit is 60 unauthenticated requests/hour — cache results, don't refetch on every run
  • .maas DNS may not resolve from jumphost — use IPs directly