Status: Executes after 04-magnum-domain.md (Keystone wiring) and 04a-capi-bootstrap-cluster.md (workload cluster + kubeconfig staged). Final post-deploy step to make Magnum capable of creating CAPI-managed tenant K8s clusters.
Cross-references:
Known doc inconsistency (tracked for cleanup): D-007's Layer B currently states the kubeconfig points at "capi-mgmt.maas bootstrap k3s". That language is correct for Bobcat (no pivot) but obsolete post-workstream-3b (pivot mandatory). This runbook uses the workload cluster kubeconfig as the canonical target. D-007 patch to follow in a workstream-3 cleanup commit.
Graft the CAPI Helm driver onto the Charmed Magnum deployment so that openstack coe cluster create provisions tenant K8s clusters via CAPI (in the workload cluster) instead of via the deprecated Heat driver.
Output of this runbook:
- `magnum-capi-helm==1.1.0` installed on the magnum unit's system Python.
- `/etc/magnum/kubeconfig` populated with the workload cluster's kubeconfig (post-pivot CAPI controller plane).
- `/etc/magnum/magnum.conf.d/99-capi.conf` configured with `enabled_drivers = k8s_capi_helm_v1` and `[capi_helm] kubeconfig_file=`.
- Systemd drop-in overrides for `magnum-api` and `magnum-conductor` that replace the init.d wrapper's ExecStart with explicit `--config-dir` invocation.

Scope: v1 testcloud. Roosevelt deltas in §12.
Out of scope:
| Decision | Choice | Reason |
|---|---|---|
| Driver pin | magnum-capi-helm==1.1.0 from PyPI | D-007 correction (stackhpc fork archived Dec 2024; canonical project on opendev/PyPI; 1.1.0 is last Caracal-cycle release) |
| Install method | pip3 install --break-system-packages | PEP 668: Ubuntu releases that mark system Python as externally managed (23.04 and later, including 24.04 LTS) require an explicit override for system-site-packages install |
| Install scope | System Python on magnum unit (not venv) | Magnum charm uses system-packaged python at /usr/lib/python3/dist-packages/magnum/; driver must import from same site |
| Kubeconfig target | Workload cluster (post-pivot) | Workstream 3b: bootstrap k3s is empty post-pivot; CAPI controllers live in workload |
| Kubeconfig source | $HOME/magnum-capi/capi-mgmt-cluster.kubeconfig (staged by 04a §19) | Documented handoff |
| Driver entry-point name | k8s_capi_helm_v1 | Per upstream magnum-capi-helm 1.1.0; verify in §10 |
| Conf.d filename | 99-capi.conf | Numeric prefix ensures it loads AFTER any charm-managed conf, so enabled_drivers override wins |
| File encoding | ASCII-only | Non-ASCII in conf.d causes silent magnum daemon failures (handoff lesson; cf. Horizon local_settings.d issue) |
| Trustee credential | Existing magnum-shared user (charm-managed) | Roosevelt will use app-credential pattern |
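
The conf.d filename choice depends on parse order. As a minimal sketch, assuming oslo.config's documented behavior of parsing `--config-dir` files in sorted order (so the last file wins for any option set more than once), the helper below reports which file would be parsed last. The name `last_conf_file` is hypothetical and not part of any tool.

```shell
# Sketch: report which *.conf file in a directory sorts last.
# Assumption: oslo.config parses --config-dir files in sorted order.
last_conf_file() {
    local dir="$1"
    # LC_ALL=C gives a byte-wise sort that does not depend on the locale.
    # find -printf is a GNU extension (available on Ubuntu).
    find "$dir" -maxdepth 1 -type f -name '*.conf' -printf '%f\n' \
        | LC_ALL=C sort | tail -n 1
}
```

Run against `/etc/magnum/magnum.conf.d` on the unit, the expected answer is `99-capi.conf`. Note that a file whose name starts with a letter sorts after any digit-prefixed name, so a charm-managed file named that way would still be parsed later.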
| Prereq | Verification |
|---|---|
| Magnum charm active/idle | `juju status magnum \| grep magnum/0` shows `active idle` |
| Magnum domain setup completed (runbook 04) | `openstack domain show magnum \| grep enabled` returns `True` |
| Workload cluster reachable from jumphost | `kubectl --kubeconfig $HOME/magnum-capi/capi-mgmt-cluster.kubeconfig get nodes` returns Ready nodes |
| CAPI controllers running in workload cluster | `kubectl --kubeconfig $HOME/magnum-capi/capi-mgmt-cluster.kubeconfig get pods -n capi-system \| grep -v Running \| grep -v NAME` empty |
| Workload kubeconfig staged at expected path | `test -r $HOME/magnum-capi/capi-mgmt-cluster.kubeconfig && stat -c %a $HOME/magnum-capi/capi-mgmt-cluster.kubeconfig` shows 600 |
| `juju exec` works to magnum/leader (use exec, NOT ssh, for non-interactive; handoff lesson) | `juju exec --unit magnum/leader -- hostname` returns the unit hostname |
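
The kubeconfig prerequisite can be turned into a fail-fast check. This is a sketch only: `check_kubeconfig` is a hypothetical helper name, and it relies on GNU `stat -c %a` as used in the table above.

```shell
# Sketch: fail fast if the staged kubeconfig is missing or too permissive.
check_kubeconfig() {
    local path="$1" mode
    if [ ! -r "$path" ]; then
        echo "FAIL: $path is missing or unreadable" >&2
        return 1
    fi
    mode=$(stat -c %a "$path")   # GNU stat: octal permission bits
    if [ "$mode" != "600" ]; then
        echo "FAIL: $path has mode $mode, expected 600" >&2
        return 1
    fi
    echo "OK: $path"
}
```

Typical use would be `check_kubeconfig "$HOME/magnum-capi/capi-mgmt-cluster.kubeconfig" || exit 1` at the top of a session.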
Set shell context:
export WORK=$HOME/magnum-capi
export WORKLOAD_KUBECONFIG=$WORK/capi-mgmt-cluster.kubeconfig
export DRIVER_VERSION=magnum-capi-helm==1.1.0  # per D-007 correction
cd "$WORK"
`juju ssh` vs `juju exec` choice: the handoff lessons explicitly call out that `juju ssh` hangs when stdout is redirected (PTY allocation issue). This runbook uses `juju exec` for all non-interactive command execution and reserves `juju ssh` for cases where you actually want an interactive shell.
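
Since every non-interactive command in this runbook goes through the same `juju exec` prefix, a small wrapper keeps that choice in one place. A minimal sketch; `on_unit` and `MAGNUM_UNIT` are hypothetical names introduced here, not something the charm or Juju provides.

```shell
# Sketch: one wrapper for every non-interactive command on the magnum unit.
# MAGNUM_UNIT defaults to the unit used throughout this runbook.
MAGNUM_UNIT="${MAGNUM_UNIT:-magnum/leader}"

on_unit() {
    # juju exec does not allocate a PTY, so stdout can be redirected safely.
    juju exec --unit "$MAGNUM_UNIT" -- "$@"
}
```

For example, `on_unit 'sudo systemctl is-active magnum-api magnum-conductor' > state.txt` would be equivalent to the long form used in the steps below.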
Capture the magnum unit's state BEFORE making changes. Useful for diagnosis if anything goes wrong, and as a record of what was changed.
mkdir -p "$WORK/pre-state"

# Service unit files (as managed by charm)
juju exec --unit magnum/leader -- \
  'sudo systemctl cat magnum-api magnum-conductor 2>&1' \
  > "$WORK/pre-state/systemd-units.txt"

# Currently-enabled drivers
juju exec --unit magnum/leader -- \
  'sudo grep -r enabled_drivers /etc/magnum/ 2>/dev/null || echo "(no enabled_drivers found; charm default applies)"' \
  > "$WORK/pre-state/drivers-pre.txt"

# Python site-packages: see what's already installed
juju exec --unit magnum/leader -- \
  'sudo pip3 list 2>/dev/null | grep -iE "magnum|cluster|helm|kubernetes" || true' \
  > "$WORK/pre-state/pip-pre.txt"

# conf.d state
juju exec --unit magnum/leader -- \
  'sudo ls -la /etc/magnum/magnum.conf.d/ 2>/dev/null || echo "(no conf.d directory)"' \
  > "$WORK/pre-state/confd-pre.txt"

# Service running state
juju exec --unit magnum/leader -- \
  'sudo systemctl is-active magnum-api magnum-conductor' \
  > "$WORK/pre-state/service-state-pre.txt"

# Display the captured state
cat "$WORK/pre-state/"*.txt
What to look for in pre-state: the charm-managed `enabled_drivers` value probably includes Heat-based drivers (heat_kubernetes, etc.). The 99-capi.conf override in §7 replaces this with the single CAPI driver. The pre-state capture documents what was active before the override took effect.
juju exec --unit magnum/leader -- \
  "sudo pip3 install $DRIVER_VERSION --break-system-packages"
Verify install:
juju exec --unit magnum/leader -- \
  'sudo pip3 show magnum-capi-helm | head -10'
# Expect: Name: magnum-capi-helm
#         Version: 1.1.0
#         Location: /usr/lib/python3/dist-packages
#         (a sudo pip3 install on Ubuntu may instead report
#         /usr/local/lib/python3.X/dist-packages; both are on the system Python path)

juju exec --unit magnum/leader -- \
  'sudo python3 -c "import magnum_capi_helm; print(magnum_capi_helm.__file__)"'
# Expect: .../dist-packages/magnum_capi_helm/__init__.py under the Location reported above
Check that the driver entry point is registered:
juju exec --unit magnum/leader -- \
'sudo python3 -c "
from stevedore import driver
mgr = driver.DriverManager(
namespace=\"magnum.drivers\",
name=\"k8s_capi_helm_v1\",
invoke_on_load=False
)
print(\"Driver class:\", mgr.driver)
"'
# Expect: Driver class: <class 'magnum_capi_helm.driver.Driver'>
# (or similar — the actual class path is package-version-dependent)
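If the entry point check fails with "No 'k8s_capi_helm_v1' driver found", the driver name in 1.1.0 may differ from what D-007 documented. Inspect the installed package's `entry_points.txt` (the dist-info directory sits under whichever Location `pip3 show` reported, so both candidate paths are listed):

juju exec --unit magnum/leader -- \
  'sudo cat /usr/lib/python3/dist-packages/magnum_capi_helm*.dist-info/entry_points.txt /usr/local/lib/python3*/dist-packages/magnum_capi_helm*.dist-info/entry_points.txt 2>/dev/null'

Find the entry under `[magnum.drivers]` and use that exact name in §7.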
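
Reading the entry-point name by eye is error-prone, so the lookup can be scripted. A minimal sketch, assuming the usual INI-style layout of `entry_points.txt` (a `[group]` header followed by `name = module:attr` lines); `list_entry_points` is a hypothetical helper name.

```shell
# Sketch: print the entry-point names registered under one group.
list_entry_points() {
    local group="$1" file="$2"
    awk -v want="[$group]" '
        /^\[/ { in_group = ($0 == want); next }   # any section header toggles the flag
        in_group && /=/ {
            split($0, kv, "=")
            gsub(/^[ \t]+|[ \t]+$/, "", kv[1])    # trim whitespace around the name
            print kv[1]
        }
    ' "$file"
}
```

Saving the `cat` output above to a local file and running `list_entry_points magnum.drivers <file>` prints one driver name per line; that value is what `enabled_drivers` must match.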
# Transfer kubeconfig from jumphost to magnum unit
juju scp "$WORKLOAD_KUBECONFIG" magnum/leader:/tmp/kubeconfig

# Install with correct ownership/mode in one atomic step
juju exec --unit magnum/leader -- \
  'sudo install -m 0640 -o root -g magnum /tmp/kubeconfig /etc/magnum/kubeconfig && sudo rm /tmp/kubeconfig'
Verify:
juju exec --unit magnum/leader -- \
  'sudo ls -la /etc/magnum/kubeconfig'
# Expect: -rw-r----- 1 root magnum ... /etc/magnum/kubeconfig

# Confirm magnum user can read it
juju exec --unit magnum/leader -- \
  'sudo -u magnum cat /etc/magnum/kubeconfig | head -3'
# Expect: apiVersion: v1 / clusters: / - cluster:

# Confirm kubectl can use it from the magnum unit (sanity check on API reachability)
juju exec --unit magnum/leader -- \
  'sudo -u magnum kubectl --kubeconfig /etc/magnum/kubeconfig get nodes 2>&1 | head -10'
# Expect: NAME ... STATUS=Ready for control plane + workers
# OR: kubectl not installed (acceptable; magnum-capi-helm uses Python client, not kubectl)
Why mode 0640 and group magnum: the kubeconfig contains auth tokens. Mode 0600 (owner-only) wouldn't let the `magnum` system user (which runs magnum-api/conductor) read it. Mode 0640 with `group: magnum` is the minimum-permission setup that works. NOT 0644: that would expose it to every other user on the unit.
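
The reason for `install` rather than `cp` followed by `chmod` can be seen locally. A minimal sketch on throwaway temp files: the ownership flags (`-o root -g magnum`) are left out because they need root, and the file content is a placeholder, not a real kubeconfig.

```shell
# Sketch: install(1) applies the requested mode itself, independent of the
# umask, so no separate chmod step is needed after the copy.
src=$(mktemp)
dst_dir=$(mktemp -d)
echo "apiVersion: v1" > "$src"       # placeholder content only

umask 022                            # a typical permissive default
install -m 0640 "$src" "$dst_dir/kubeconfig"
stat -c '%a' "$dst_dir/kubeconfig"   # prints 640
```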
Generate `/etc/magnum/magnum.conf.d/99-capi.conf` locally first (snap confinement does not apply to plain bash on the jumphost, but we keep paths under $HOME for consistency), then transfer.
ASCII-only verification is critical — the handoff documents non-ASCII characters in conf.d files causing silent daemon failures (cf. Horizon local_settings.d). Use plain straight quotes, ASCII dashes, no smart typography.
# Write locally
cat > "$WORK/99-capi.conf" <<'EOF'
[DEFAULT]
enabled_drivers = k8s_capi_helm_v1

[capi_helm]
kubeconfig_file = /etc/magnum/kubeconfig
EOF

# Verify it is pure ASCII (no UTF-8 sneakers)
file "$WORK/99-capi.conf"
# Expect: ASCII text
# If it reports UTF-8 text, STOP and rewrite by hand; even one stray
# em-dash or smart quote will silently break magnum

# Byte-level check (paranoid mode): list any line containing a byte outside 0x00-0x7F.
# xxd renders non-ASCII bytes as "." in its text column, so grepping xxd output cannot catch them.
LC_ALL=C grep -nP '[^\x00-\x7F]' "$WORK/99-capi.conf"
# Expect: empty output (all bytes are ASCII)
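
The same ASCII check applies to every file this runbook transfers, so it can be wrapped in a reusable function. A sketch under one assumption: GNU grep with `-P` (PCRE) support, as shipped on Ubuntu. `is_ascii_file` is a hypothetical helper name.

```shell
# Sketch: succeed only if every byte in the file is 7-bit ASCII.
is_ascii_file() {
    local path="$1"
    # LC_ALL=C makes grep work byte-wise; -a forces text mode so offending
    # lines are printed instead of a "binary file matches" notice.
    if LC_ALL=C grep -a -nP '[^\x00-\x7F]' "$path"; then
        echo "non-ASCII bytes found in $path (lines listed above)" >&2
        return 1
    fi
    return 0
}
```

A transfer step can then be guarded with `is_ascii_file "$WORK/99-capi.conf" || exit 1` before the `juju scp`.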
Stage and install:
juju scp "$WORK/99-capi.conf" magnum/leader:/tmp/99-capi.conf

juju exec --unit magnum/leader -- \
  'sudo mkdir -p /etc/magnum/magnum.conf.d && sudo install -m 0644 -o root -g root /tmp/99-capi.conf /etc/magnum/magnum.conf.d/99-capi.conf && sudo rm /tmp/99-capi.conf'

# Verify
juju exec --unit magnum/leader -- \
  'sudo ls -la /etc/magnum/magnum.conf.d/ && sudo cat /etc/magnum/magnum.conf.d/99-capi.conf'
# Expect: file listed; content matches what was written
The Charmed Magnum unit files use a wrapper pattern:
ExecStart=/etc/init.d/magnum-api systemd-start
The wrapper does NOT pass --config-dir to magnum-api, so /etc/magnum/magnum.conf.d/ is never loaded. The 99-capi.conf would have no effect.
Override with explicit --config-file + --config-dir invocation.
Generate override files locally:
cat > "$WORK/magnum-api-override.conf" <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/magnum-api --config-file=/etc/magnum/magnum.conf --config-dir=/etc/magnum/magnum.conf.d
EOF

cat > "$WORK/magnum-conductor-override.conf" <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/magnum-conductor --config-file=/etc/magnum/magnum.conf --config-dir=/etc/magnum/magnum.conf.d
EOF

# ASCII check
file "$WORK/magnum-api-override.conf" "$WORK/magnum-conductor-override.conf"
# Expect: ASCII text x2
The empty `ExecStart=` line is critical. ExecStart is a list-valued setting, so a drop-in appends to the inherited value by default; an empty assignment is required to CLEAR the inherited directive before setting the replacement. Without the empty line, the unit would have BOTH the init.d wrapper AND the new direct invocation, and systemd refuses to start a non-oneshot service that has more than one ExecStart.
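
Because a missing reset line is easy to overlook, the generated override files can be checked before they are transferred. A minimal sketch; `dropin_resets_execstart` is a hypothetical helper name, and it only inspects line order, not whether the command itself is valid.

```shell
# Sketch: succeed only if the drop-in clears ExecStart before setting a new one.
dropin_resets_execstart() {
    awk '
        /^ExecStart=[ \t]*$/       { cleared = 1; next }  # the empty reset line
        /^ExecStart=./ && !cleared { bad = 1 }            # command set with no reset first
        /^ExecStart=./             { set = 1 }
        END { exit !(cleared && set && !bad) }
    ' "$1"
}
```

For example, `dropin_resets_execstart "$WORK/magnum-api-override.conf" || echo "missing reset line"`.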
Install on the unit:
juju scp "$WORK/magnum-api-override.conf" magnum/leader:/tmp/magnum-api-override.conf
juju scp "$WORK/magnum-conductor-override.conf" magnum/leader:/tmp/magnum-conductor-override.conf

juju exec --unit magnum/leader -- \
  'sudo mkdir -p /etc/systemd/system/magnum-api.service.d /etc/systemd/system/magnum-conductor.service.d && \
   sudo install -m 0644 -o root -g root /tmp/magnum-api-override.conf /etc/systemd/system/magnum-api.service.d/override.conf && \
   sudo install -m 0644 -o root -g root /tmp/magnum-conductor-override.conf /etc/systemd/system/magnum-conductor.service.d/override.conf && \
   sudo rm /tmp/magnum-api-override.conf /tmp/magnum-conductor-override.conf'

# Reload systemd to pick up the overrides
juju exec --unit magnum/leader -- 'sudo systemctl daemon-reload'

# Verify the overrides are effective (systemctl cat shows combined unit + overrides)
juju exec --unit magnum/leader -- 'sudo systemctl cat magnum-api | grep -A1 ExecStart'
# Expect: the charm's original ExecStart= line from the unit file, followed by the
# drop-in's two lines: the empty clear-line and the new /usr/bin/magnum-api invocation
juju exec --unit magnum/leader -- 'sudo systemctl cat magnum-conductor | grep -A1 ExecStart'
# Expect: the same pattern for magnum-conductor
Charm reconciliation note: the Magnum charm may rewrite its own systemd units on config changes or upgrades. The drop-in override at `/etc/systemd/system/magnum-api.service.d/override.conf` is OUTSIDE the charm's writable zone and should survive. Verify after any `juju refresh` or `juju config magnum` command by re-running the `systemctl cat` check above.
juju exec --unit magnum/leader -- \
  'sudo systemctl restart magnum-api magnum-conductor'

# Wait briefly for services to initialize
sleep 5

# Check active state
juju exec --unit magnum/leader -- \
  'sudo systemctl is-active magnum-api magnum-conductor'
# Expect: active (x2)

# Examine recent journal for errors (the critical step: magnum's silent failure
# mode means we must read logs, not just trust is-active)
juju exec --unit magnum/leader -- \
  'sudo journalctl -u magnum-api --since "2 minutes ago" --no-pager | tail -50'
juju exec --unit magnum/leader -- \
  'sudo journalctl -u magnum-conductor --since "2 minutes ago" --no-pager | tail -50'
Look for these red flags in the logs:
| Symptom | Likely cause | Remediation |
|---|---|---|
| `ImportError: No module named magnum_capi_helm` | §5 pip install failed | Re-run §5; check pip3 output |
| `EntryPointError: No 'k8s_capi_helm_v1' driver` | Driver entry-point name mismatch | Verify name per §5 footnote; update §7 |
| Service repeatedly restarts (look for "Started" appearing twice in 10s) | Likely a config error in 99-capi.conf | Re-check ASCII-only; check magnum.conf.d permissions |
| `kubeconfig_file` not honored | `--config-dir` not being passed | §8 override not active; re-run `systemctl daemon-reload` |
| Silent: no error but driver also not loading | Non-ASCII char snuck into a conf | `file /etc/magnum/magnum.conf.d/99-capi.conf`: if it says UTF-8, regenerate |
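
The first three rows of the table can be approximated with a log scanner. This is a rough sketch: the match strings are taken from the table, the real log wording may differ by Magnum version, and `scan_magnum_log` is a hypothetical helper name. It cannot detect the silent-failure row, which still needs the `file` check.

```shell
# Sketch: scan journal text on stdin for red flags; returns 1 if any is found.
scan_magnum_log() {
    local log hits
    log=$(cat)      # read the whole journal excerpt from stdin
    hits=0
    if printf '%s\n' "$log" | grep -q 'No module named'; then
        echo "driver package not importable: re-run the pip install step"
        hits=1
    fi
    if printf '%s\n' "$log" | grep -q "No 'k8s_capi_helm_v1' driver"; then
        echo "entry-point name mismatch: check entry_points.txt"
        hits=1
    fi
    # Two or more "Started" lines in the excerpt suggest a restart loop.
    if [ "$(printf '%s\n' "$log" | grep -c 'Started')" -ge 2 ]; then
        echo "service started more than once: check 99-capi.conf"
        hits=1
    fi
    return "$hits"
}
```

It is meant to sit at the end of a pipe, for example `juju exec --unit magnum/leader -- 'sudo journalctl -u magnum-conductor --since "2 minutes ago" --no-pager' | scan_magnum_log`.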
Verify the driver is actually loaded by Magnum and reachable via the API.
source $HOME/admin-openrc

# Confirm the Magnum API endpoint responds
openstack coe cluster template list -f json
# (empty templates list is fine; we are only checking that the endpoint responds)

# Direct check on the unit: scan the service's loaded drivers
juju exec --unit magnum/leader -- \
  'sudo journalctl -u magnum-conductor --since "5 minutes ago" --no-pager | grep -iE "driver|enabled" | head -20'
# Expect: a line mentioning k8s_capi_helm_v1 having been loaded
# (Magnum logs the loaded drivers at startup)

# Definitive check: try creating a cluster template that requires the CAPI driver
openstack coe cluster template create magnum-capi-driver-check \
  --image noble-amd64 \
  --keypair capi-workload-key \
  --external-network ext_net \
  --master-flavor capi-mgmt-node \
  --flavor capi-mgmt-node \
  --coe kubernetes \
  --network-driver calico \
  --labels kube_tag=v1.31.4

openstack coe cluster template show magnum-capi-driver-check -c name -c coe -c labels
If template create fails with "driver not enabled" or similar: the Magnum API process is not loading the conf.d. Verify the systemd override took effect: `sudo systemctl show magnum-api -p ExecStart` on the unit should show the explicit `--config-dir` invocation. If it still shows the init.d wrapper, the daemon-reload + restart did not pick up the override.
Clean up the driver-check template:
openstack coe cluster template delete magnum-capi-driver-check
This step is optional. Full validation belongs in runbook 08. Use this smoketest only if you want immediate confirmation that the entire chain (Magnum API -> conductor -> magnum-capi-helm -> CAPI controllers in workload cluster -> tenant K8s cluster on tenant VMs) works end-to-end.
# Create a cluster template tuned for testcloud smoketest
openstack coe cluster template create magnum-smoketest-template \
--image noble-amd64 \
--keypair capi-workload-key \
--external-network ext_net \
--master-flavor capi-mgmt-node \
--flavor capi-mgmt-node \
--coe kubernetes \
--network-driver calico \
--labels boot_volume_size=20,kube_tag=v1.31.4,octavia_provider=ovn
# Create a 1+1 cluster (minimum for smoketest)
openstack coe cluster create magnum-smoketest \
--cluster-template magnum-smoketest-template \
--master-count 1 \
--node-count 1
# Poll for status (15-20 min typical; CAPI provisions tenant VMs end-to-end)
for i in $(seq 1 60); do
  STATUS=$(openstack coe cluster show magnum-smoketest -c status -f value)
  echo "$(date -Is) status=$STATUS"
  case "$STATUS" in
    CREATE_COMPLETE) echo "Smoketest passed"; break ;;
    CREATE_FAILED) echo "Smoketest FAILED"; openstack coe cluster show magnum-smoketest; exit 1 ;;
  esac
  sleep 30
done
# Retrieve the smoketest cluster's kubeconfig
openstack coe cluster config magnum-smoketest --dir "$WORK/smoketest-kubeconfig"
# Sanity-check the smoketest cluster
KUBECONFIG="$WORK/smoketest-kubeconfig/config" kubectl get nodes
KUBECONFIG="$WORK/smoketest-kubeconfig/config" kubectl get pods -A | head -20
# Cleanup the smoketest cluster
openstack coe cluster delete magnum-smoketest
openstack coe cluster template delete magnum-smoketest-template
What success looks like: the CAPI controllers in the workload cluster receive the new Cluster CR (created by magnum-capi-helm in response to the Magnum API call), CAPO talks to OpenStack to provision tenant VMs, the tenant VMs join the new K8s cluster, and the new cluster has 1 control plane + 1 worker Ready. Octavia provides the API server LB (visible as a Floating IP in the tenant project).
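
The polling loop in the smoketest continues to the kubeconfig step if the cluster is still in progress after 60 attempts. A variant that reports a timeout as its own outcome could look like the sketch below. `wait_for_status` is a hypothetical helper; the status command is passed as arguments so the function itself makes no assumption about the CLI.

```shell
# Sketch: poll a status command until it reports success, failure, or times out.
# Returns 0 on CREATE_COMPLETE, 1 on CREATE_FAILED, 2 on timeout.
wait_for_status() {
    local attempts="$1" delay="$2" status i
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        status=$("$@")                           # run the caller-supplied command
        echo "$(date -Is) status=$status" >&2    # progress goes to stderr
        case "$status" in
            CREATE_COMPLETE) return 0 ;;
            CREATE_FAILED)   return 1 ;;
        esac
        sleep "$delay"
        i=$((i + 1))
    done
    echo "timed out after $attempts attempts" >&2
    return 2
}
```

With the values used above this would be called as `wait_for_status 60 30 openstack coe cluster show magnum-smoketest -c status -f value`, and the kubeconfig retrieval would run only when it returns 0.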
| Aspect | Testcloud (v1) | Roosevelt |
|---|---|---|
| Driver pin source | PyPI magnum-capi-helm==1.1.0 | Internal mirror with checksum verification |
| Driver pin record | Implicit in this runbook | Captured in Vault as audit artifact alongside CAPI pins |
| Kubeconfig source | Workload cluster (post-pivot per 04a §17) | Same |
| Kubeconfig rotation | Manual on capi-mgmt rebuild | Automated when workload cluster cert rotates |
| Trustee credential | Charm-default magnum-shared user | Per-tenant app credentials via Vault auth method |
| Magnum HA | num_units=1 (per D-009 testcloud) | num_units=3 with hacluster + provider VIP |
| Driver upgrade discipline | Manual re-run of §5 | Tracked maintenance window; Vault audit log |
| Systemd override | Drop-in at /etc/systemd/system/magnum-*.service.d/override.conf | Same, but provided via a charm overlay package, not manual file install |
| ASCII-only enforcement | Manual check (§7, §8) | Pre-flight lint in scripts/pre-flight-checks.sh |
These gotchas burned cycles during the Bobcat Magnum CAPI work. Each is explicitly handled in this runbook; collecting them here for visibility:
- `--break-system-packages` (§5). Ubuntu releases that enforce PEP 668 (23.04 and later, including 24.04 LTS) refuse pip install against system Python by default. The flag is required for the magnum-capi-helm install path used by Charmed Magnum.
- `juju ssh` hangs on stdout redirect. PTY allocation issue. This runbook uses `juju exec` for all non-interactive command execution.
- Multi-level quoting through `juju ssh` is fragile. This runbook writes conf files locally first and uses `juju scp` + `juju exec` with `install` to transfer: single-level only.
- Non-ASCII characters in conf.d files cause silent daemon failures. §7 and §8 both include `file <path>` ASCII verification before transfer.
- `openstack -f value -c X -c Y` outputs in alphabetical field order, not flag order. This runbook uses single-column queries or `-f json | jq` throughout.
- `enabled_drivers` is overridden, not appended. The `enabled_drivers = k8s_capi_helm_v1` line in 99-capi.conf REPLACES the charm-default value (which would include the deprecated Heat drivers).
- An empty `ExecStart=` line is required to clear the inherited ExecStart before setting the replacement (§8).
- The snap-confined `openstack` CLI cannot read /tmp. This runbook stages files under $WORK=$HOME/magnum-capi. The smoketest in §11 also writes to $WORK/smoketest-kubeconfig.

| Date | Change | Reference |
|---|---|---|
| 2026-05-22 | Document created. magnum-capi-helm 1.1.0 from PyPI; workload-cluster kubeconfig (post-pivot per workstream 3b); systemd override pattern; ASCII-only conf.d. | Workstream 3c |