Status: Third execution document of Batch C. Last document in the Magnum/CAPI stack. Grafts the CAPI Helm driver onto the deployed Magnum so openstack coe cluster create provisions tenant K8s clusters via the workload cluster's CAPI controllers (not via the deprecated Heat driver).
Position in sequence: Runs after v1-do-doc-07-capi-bootstrap.md (workload cluster + kubeconfig staged at $HOME/magnum-capi/capi-mgmt-cluster.kubeconfig). Final document of Batch C. Followed by Batch D (tenant + DNS + validate).
Replaces: runbooks/05-magnum-capi-driver.md — same substantive procedure with fixes applied. The old runbook moves to runbooks/deprecated/ as part of this batch's commits.
Fixes applied vs the prior runbook (runbooks/05-magnum-capi-driver.md):
- ...docs/design-decisions.md; the notice is now obsolete.
- exit 1 on CREATE_FAILED converted to a non-exiting [FAIL] report: the smoketest is optional and a failure should not kill the operator's shell.

Cross-references:
Graft the CAPI Helm driver onto the Charmed Magnum deployment so that openstack coe cluster create provisions tenant K8s clusters via CAPI (in the workload cluster) instead of via the deprecated Heat driver.
Output of this runbook:
- magnum-capi-helm==1.1.0 installed on the magnum unit's system Python.
- /etc/magnum/kubeconfig populated with the workload cluster's kubeconfig (post-pivot CAPI controller plane).
- /etc/magnum/magnum.conf.d/99-capi.conf configured with enabled_drivers = k8s_capi_helm_v1 and [capi_helm] kubeconfig_file=.
- Systemd drop-in overrides for magnum-api and magnum-conductor that replace the init.d wrapper's ExecStart with an explicit --config-dir invocation.

Scope: v1 testcloud. Roosevelt deltas in §12.
Out of scope:
| Decision | Choice | Reason |
|---|---|---|
| Driver pin | magnum-capi-helm==1.1.0 from PyPI | D-007 correction (stackhpc fork archived Dec 2024; canonical project on opendev/PyPI; 1.1.0 is the last Caracal-cycle release) |
| Install method | pip3 install --break-system-packages | PEP 668: Ubuntu 22.04+ requires an explicit override for system-site-packages installs |
| Install scope | System Python on magnum unit (not venv) | Magnum charm uses system-packaged python at /usr/lib/python3/dist-packages/magnum/; the driver must be importable by the same interpreter |
| Kubeconfig target | Workload cluster (post-pivot) | Workstream 3b: bootstrap k3s is empty post-pivot; CAPI controllers live in workload |
| Kubeconfig source | $HOME/magnum-capi/capi-mgmt-cluster.kubeconfig (staged by v1-do-doc-07 §19) | Documented handoff |
| Driver entry-point name | k8s_capi_helm_v1 | Per upstream magnum-capi-helm 1.1.0; verify in §5 |
| Conf.d filename | 99-capi.conf | Numeric prefix ensures it loads AFTER any charm-managed conf, so the enabled_drivers override wins |
| File encoding | ASCII-only | Non-ASCII in conf.d causes silent magnum daemon failures (handoff lesson; cf. Horizon local_settings.d issue) |
| Trustee credential | Existing magnum-shared user (charm-managed) | Roosevelt will use app-credential pattern |
| Prereq | Verification |
|---|---|
| Magnum charm active/idle | `juju status magnum \| grep magnum/0` shows `active idle` |
| Magnum domain setup completed (v1-do-doc-06) | `( source $HOME/admin-openrc; openstack domain show magnum -f value -c enabled )` returns True |
| Workload cluster reachable from jumphost | `kubectl --kubeconfig $HOME/magnum-capi/capi-mgmt-cluster.kubeconfig get nodes` returns Ready nodes |
| CAPI controllers running in workload cluster | `kubectl --kubeconfig $HOME/magnum-capi/capi-mgmt-cluster.kubeconfig get pods -n capi-system \| grep -v Running \| grep -v NAME` is empty |
| Workload kubeconfig staged at expected path | `test -r $HOME/magnum-capi/capi-mgmt-cluster.kubeconfig && stat -c %a $HOME/magnum-capi/capi-mgmt-cluster.kubeconfig` shows 600 |
| juju exec works to magnum/leader (use exec, NOT ssh, for non-interactive commands; handoff lesson) | `juju exec --unit magnum/leader -- hostname` returns the unit hostname |
Set shell context:
export WORK=$HOME/magnum-capi
export WORKLOAD_KUBECONFIG=$WORK/capi-mgmt-cluster.kubeconfig
export DRIVER_VERSION=magnum-capi-helm==1.1.0   # per D-007 correction
cd "$WORK"
juju ssh vs juju exec choice: the handoff lessons explicitly call out that juju ssh hangs when stdout is redirected (PTY allocation issue). This runbook uses juju exec for all non-interactive command execution and reserves juju ssh only for cases where you actually want an interactive shell.
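Because every remote step below follows the same juju exec pattern, a small wrapper can keep the invocations consistent. This is a minimal sketch, not part of the original runbook. The function name magnum_exec and the MAGNUM_UNIT override variable are hypothetical. The wrapper assumes, as the runbook's own examples do, that juju exec hands the quoted command string to a shell on the unit.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original runbook): run one non-interactive
# command on the magnum leader via juju exec, never juju ssh.
# MAGNUM_UNIT is an assumed override variable; it defaults to magnum/leader.
magnum_exec() {
  if [ "$#" -eq 0 ]; then
    echo "usage: magnum_exec '<remote command>'" >&2
    return 2
  fi
  # All arguments are joined into one command string, quoted the same way as
  # the runbook's juju exec examples (single quotes around the remote command).
  juju exec --unit "${MAGNUM_UNIT:-magnum/leader}" -- "$*"
}

# Example:
#   magnum_exec 'sudo systemctl is-active magnum-api magnum-conductor'
```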
Capture the magnum unit's state BEFORE making changes. Useful for diagnosis if anything goes wrong, and as a record of what was changed.
mkdir -p "$WORK/pre-state"

# Service unit files (as managed by charm)
juju exec --unit magnum/leader -- \
  'sudo systemctl cat magnum-api magnum-conductor 2>&1' \
  > "$WORK/pre-state/systemd-units.txt"

# Currently-enabled drivers
juju exec --unit magnum/leader -- \
  'sudo grep -r enabled_drivers /etc/magnum/ 2>/dev/null || echo "(no enabled_drivers found; charm default applies)"' \
  > "$WORK/pre-state/drivers-pre.txt"

# Python site-packages: see what's already installed
juju exec --unit magnum/leader -- \
  'sudo pip3 list 2>/dev/null | grep -iE "magnum|cluster|helm|kubernetes" || true' \
  > "$WORK/pre-state/pip-pre.txt"

# conf.d state
juju exec --unit magnum/leader -- \
  'sudo ls -la /etc/magnum/magnum.conf.d/ 2>/dev/null || echo "(no conf.d directory)"' \
  > "$WORK/pre-state/confd-pre.txt"

# Service running state
juju exec --unit magnum/leader -- \
  'sudo systemctl is-active magnum-api magnum-conductor' \
  > "$WORK/pre-state/service-state-pre.txt"

# Display the captured state
cat "$WORK/pre-state/"*.txt
What to look for in pre-state: the charm-managed enabled_drivers value probably includes Heat-based drivers (heat_kubernetes, etc.). The 99-capi.conf override in §7 replaces this with the single CAPI driver. The pre-state capture documents what was active before the override took effect.
juju exec --unit magnum/leader -- \
  "sudo pip3 install $DRIVER_VERSION --break-system-packages"
Verify install:
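juju exec --unit magnum/leader -- \
  'sudo pip3 show magnum-capi-helm | head -10'
# Expect: Name: magnum-capi-helm
#         Version: 1.1.0
#         Location: typically /usr/local/lib/python3.X/dist-packages (Ubuntu's pip
#         installs to /usr/local when run as root); /usr/lib/python3/dist-packages
#         is also fine, since both are on the system interpreter's sys.path

juju exec --unit magnum/leader -- \
  'sudo python3 -c "import magnum_capi_helm; print(magnum_capi_helm.__file__)"'
# Expect: .../dist-packages/magnum_capi_helm/__init__.py under one of the locations above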
Check that the driver entry point is registered:
juju exec --unit magnum/leader -- \
'sudo python3 -c "
from stevedore import driver
mgr = driver.DriverManager(
namespace=\"magnum.drivers\",
name=\"k8s_capi_helm_v1\",
invoke_on_load=False
)
print(\"Driver class:\", mgr.driver)
"'
# Expect: Driver class: <class 'magnum_capi_helm.driver.Driver'>
# (or similar — the actual class path is package-version-dependent)
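If the entry point check fails with a stevedore NoMatches error ("No ... driver found" naming k8s_capi_helm_v1), the driver name in 1.1.0 may differ from what D-007 documented. Inspect the installed package's entry_points.txt. The glob below covers both the /usr/local and the /usr/lib install locations:
juju exec --unit magnum/leader -- \
  'sudo cat /usr/local/lib/python3*/dist-packages/magnum_capi_helm*.dist-info/entry_points.txt /usr/lib/python3/dist-packages/magnum_capi_helm*.dist-info/entry_points.txt 2>/dev/null'
Find the entry under [magnum.drivers] and use that exact name in §7.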
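Picking the name out of entry_points.txt by eye is error-prone when the package registers several groups. The following is a minimal sketch, assuming the file uses the standard INI-style `name = module:attr` layout. The function name list_magnum_drivers is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the entry-point names registered under the
# [magnum.drivers] section of an entry_points.txt file ("name = module:attr").
list_magnum_drivers() {
  awk '
    # any section header switches tracking on or off
    /^\[/ { in_section = ($0 == "[magnum.drivers]"); next }
    in_section && /=/ {
      name = $0
      sub(/[ \t]*=.*/, "", name)   # keep only the text before "="
      sub(/^[ \t]+/, "", name)     # trim leading whitespace
      if (name != "") print name
    }
  ' "$1"
}
```

Usage (assuming single-unit juju exec output is the command's raw stdout): save the output of the cat command above to "$WORK/entry_points.txt", then run `list_magnum_drivers "$WORK/entry_points.txt"` and use the printed name in §7.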
# Transfer kubeconfig from jumphost to magnum unit
juju scp "$WORKLOAD_KUBECONFIG" magnum/leader:/tmp/kubeconfig

# Install with correct ownership/mode in one step, then remove the staging copy
juju exec --unit magnum/leader -- \
  'sudo install -m 0640 -o root -g magnum /tmp/kubeconfig /etc/magnum/kubeconfig && sudo rm /tmp/kubeconfig'
Verify:
juju exec --unit magnum/leader -- \
  'sudo ls -la /etc/magnum/kubeconfig'
# Expect: -rw-r----- 1 root magnum ... /etc/magnum/kubeconfig

# Confirm magnum user can read it
juju exec --unit magnum/leader -- \
  'sudo -u magnum cat /etc/magnum/kubeconfig | head -3'
# Expect: apiVersion: v1 / clusters: / - cluster:

# Confirm kubectl can use it from the magnum unit (sanity check on API reachability)
juju exec --unit magnum/leader -- \
  'sudo -u magnum kubectl --kubeconfig /etc/magnum/kubeconfig get nodes 2>&1 | head -10'
# Expect: NAME ... STATUS=Ready for control plane + workers
# OR: kubectl not installed (acceptable: magnum-capi-helm uses the Python client, not kubectl)
Why mode 0640 and group magnum: the kubeconfig contains auth tokens. Mode 0600 (owner-only) would not let the magnum system user (which runs magnum-api/conductor) read it. Mode 0640 with group magnum is the minimum-permission setup that works. NOT 0644, which would expose it to every other user on the unit.
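The mode-and-group requirement is easy to assert mechanically. This is a minimal sketch that relies on GNU stat (-c), as the runbook's own prereq checks already do. The function name check_mode_group is hypothetical. Note that stat -c %a prints modes without a leading zero (640, not 0640).

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm a file has the expected octal mode and group.
# Mode is compared in stat's %a form, e.g. 640 rather than 0640.
check_mode_group() {
  local path=$1 want_mode=$2 want_group=$3
  local got_mode got_group
  got_mode=$(stat -c %a "$path") || return 2
  got_group=$(stat -c %G "$path") || return 2
  if [ "$got_mode" = "$want_mode" ] && [ "$got_group" = "$want_group" ]; then
    echo "[OK] $path is $got_mode group=$got_group"
  else
    echo "[FAIL] $path is $got_mode group=$got_group (want $want_mode group=$want_group)"
    return 1
  fi
}
```

On the unit, the target is `check_mode_group /etc/magnum/kubeconfig 640 magnum`. On the jumphost, the same helper can confirm the staged copy: `check_mode_group "$WORKLOAD_KUBECONFIG" 600 "$(id -gn)"`.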
/etc/magnum/magnum.conf.d/99-capi.conf
Generate the conf locally first (keep paths under $HOME for consistency with snap confinement on other steps), then transfer.
ASCII-only verification is critical — the handoff documents non-ASCII characters in conf.d files causing silent daemon failures (cf. Horizon local_settings.d). Use plain straight quotes, ASCII dashes, no smart typography.
# Write locally
cat > "$WORK/99-capi.conf" <<'EOF'
[DEFAULT]
enabled_drivers = k8s_capi_helm_v1

[capi_helm]
kubeconfig_file = /etc/magnum/kubeconfig
EOF

# Verify it is pure ASCII (no UTF-8 sneakers)
file "$WORK/99-capi.conf"
# Expect: ASCII text
# If it reports UTF-8 (e.g. "UTF-8 Unicode text"), STOP and rewrite by hand:
# even one stray em-dash or smart quote will silently break magnum

# Byte-level check (paranoid mode): print any line containing a byte outside
# printable ASCII/whitespace. In the C locale every byte >= 0x80 matches.
LC_ALL=C grep -n '[^[:print:][:space:]]' "$WORK/99-capi.conf"
# Expect: empty output
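The same byte-level check applies to every file this runbook stages (99-capi.conf here and the two systemd overrides in §8). The following is a minimal reusable sketch that reports per file and returns non-zero on any failure, so it can gate a transfer. The function name assert_ascii is hypothetical. The check relies on GNU grep treating each byte as one character under LC_ALL=C.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if any listed file contains a byte outside printable
# ASCII plus whitespace. Under LC_ALL=C each byte is one character, so the UTF-8
# bytes of an em-dash or smart quote (all >= 0x80) match the negated class.
assert_ascii() {
  local f rc=0
  for f in "$@"; do
    if LC_ALL=C grep -q '[^[:print:][:space:]]' "$f"; then
      echo "[FAIL] non-ASCII or control byte in $f:"
      LC_ALL=C grep -n '[^[:print:][:space:]]' "$f" | head -5
      rc=1
    else
      echo "[OK] $f is plain ASCII"
    fi
  done
  return "$rc"
}
```

For example: `assert_ascii "$WORK/99-capi.conf" && juju scp "$WORK/99-capi.conf" magnum/leader:/tmp/99-capi.conf`.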
Stage and install:
juju scp "$WORK/99-capi.conf" magnum/leader:/tmp/99-capi.conf
juju exec --unit magnum/leader -- \
  'sudo mkdir -p /etc/magnum/magnum.conf.d && sudo install -m 0644 -o root -g root /tmp/99-capi.conf /etc/magnum/magnum.conf.d/99-capi.conf && sudo rm /tmp/99-capi.conf'

# Verify
juju exec --unit magnum/leader -- \
  'sudo ls -la /etc/magnum/magnum.conf.d/ && sudo cat /etc/magnum/magnum.conf.d/99-capi.conf'
# Expect: file listed; content matches what was written
The Charmed Magnum unit files use a wrapper pattern:
ExecStart=/etc/init.d/magnum-api systemd-start
The wrapper does NOT pass --config-dir to magnum-api, so /etc/magnum/magnum.conf.d/ is never loaded. The 99-capi.conf would have no effect.
Override with explicit --config-file + --config-dir invocation.
Generate override files locally:
cat > "$WORK/magnum-api-override.conf" <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/magnum-api --config-file=/etc/magnum/magnum.conf --config-dir=/etc/magnum/magnum.conf.d
EOF

cat > "$WORK/magnum-conductor-override.conf" <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/magnum-conductor --config-file=/etc/magnum/magnum.conf --config-dir=/etc/magnum/magnum.conf.d
EOF

# ASCII check
file "$WORK/magnum-api-override.conf" "$WORK/magnum-conductor-override.conf"
# Expect: ASCII text (x2)
The empty ExecStart= line is critical. In a drop-in, an empty assignment CLEARS the ExecStart list inherited from the main unit before the replacement is set. Without it, the unit would carry BOTH the init.d wrapper AND the new direct invocation. Systemd refuses more than one ExecStart= for any service that is not Type=oneshot, so the unit would fail to start.
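The two heredocs differ only in the service name, so they can be generated from a single template. This guarantees that both carry the empty reset line. Below is a minimal sketch: the function name write_exec_override is hypothetical, and the paths and flags are exactly the ones used in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the systemd drop-in used in this section for one
# magnum service. The empty "ExecStart=" clears the inherited init.d wrapper
# command before the direct invocation is set.
write_exec_override() {
  local service=$1 out=$2
  case "$service" in
    magnum-api|magnum-conductor) ;;
    *) echo "unsupported service: $service" >&2; return 2 ;;
  esac
  cat > "$out" <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/${service} --config-file=/etc/magnum/magnum.conf --config-dir=/etc/magnum/magnum.conf.d
EOF
}

# Example (assumes $WORK is set as in the shell-context step):
#   for svc in magnum-api magnum-conductor; do
#     write_exec_override "$svc" "$WORK/$svc-override.conf"
#   done
```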
Install on the unit:
juju scp "$WORK/magnum-api-override.conf" magnum/leader:/tmp/magnum-api-override.conf
juju scp "$WORK/magnum-conductor-override.conf" magnum/leader:/tmp/magnum-conductor-override.conf
juju exec --unit magnum/leader -- \
  'sudo mkdir -p /etc/systemd/system/magnum-api.service.d /etc/systemd/system/magnum-conductor.service.d && \
   sudo install -m 0644 -o root -g root /tmp/magnum-api-override.conf /etc/systemd/system/magnum-api.service.d/override.conf && \
   sudo install -m 0644 -o root -g root /tmp/magnum-conductor-override.conf /etc/systemd/system/magnum-conductor.service.d/override.conf && \
   sudo rm /tmp/magnum-api-override.conf /tmp/magnum-conductor-override.conf'

# Reload systemd to pick up the overrides
juju exec --unit magnum/leader -- 'sudo systemctl daemon-reload'

# Verify the overrides are present (systemctl cat prints the main unit, then each drop-in)
juju exec --unit magnum/leader -- 'sudo systemctl cat magnum-api | grep -A1 ExecStart'
# Expect: the main unit's init.d wrapper ExecStart line, then, from override.conf,
#         TWO ExecStart= lines: the empty clear-line and the new /usr/bin/magnum-api invocation
juju exec --unit magnum/leader -- 'sudo systemctl cat magnum-conductor | grep -A1 ExecStart'
# Expect: the same pattern for magnum-conductor
Charm reconciliation note: the Magnum charm may rewrite its own systemd units on config changes or upgrades. The drop-in override at /etc/systemd/system/magnum-api.service.d/override.conf (and its magnum-conductor counterpart) is OUTSIDE the charm's writable zone and should survive. Verify after any juju refresh or juju config magnum command by re-running the systemctl cat check above.
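The post-refresh re-check can be made mechanical by replaying systemd's list semantics over the systemctl cat output. An empty ExecStart= resets the list, and each non-empty line appends to it. This is a minimal sketch that assumes the unit text arrives on stdin in systemctl cat order (main unit first, drop-ins after). The function name override_effective is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `systemctl cat <unit>` output on stdin, replay the
# ExecStart= list (empty assignment resets it, non-empty lines append), and
# pass only if exactly one command remains and it carries --config-dir.
override_effective() {
  awk '
    /^ExecStart=$/ { n = 0; next }     # empty assignment clears the list
    /^ExecStart=/  { cmd[++n] = $0 }   # each non-empty assignment appends
    END {
      if (n == 1 && cmd[1] ~ /--config-dir=\/etc\/magnum\/magnum\.conf\.d/) {
        print "[OK] " cmd[1]
        exit 0
      }
      printf "[FAIL] %d effective ExecStart line(s) or missing --config-dir\n", n
      exit 1
    }
  '
}
```

For example: `juju exec --unit magnum/leader -- 'sudo systemctl cat magnum-api' | override_effective`. This assumes single-unit juju exec output is the raw command stdout.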
juju exec --unit magnum/leader -- \
  'sudo systemctl restart magnum-api magnum-conductor'

# Wait briefly for services to initialize
sleep 5

# Check active state
juju exec --unit magnum/leader -- \
  'sudo systemctl is-active magnum-api magnum-conductor'
# Expect: active (x2)

# Examine recent journal for errors (the critical step: magnum's silent failure
# mode means we must read logs, not just trust is-active)
juju exec --unit magnum/leader -- \
  'sudo journalctl -u magnum-api --since "2 minutes ago" --no-pager | tail -50'
juju exec --unit magnum/leader -- \
  'sudo journalctl -u magnum-conductor --since "2 minutes ago" --no-pager | tail -50'
Look for these red flags in the logs:
| Symptom | Likely cause | Remediation |
|---|---|---|
| ImportError / ModuleNotFoundError: No module named 'magnum_capi_helm' | §5 pip install failed | Re-run §5; check pip3 output |
| EntryPointError: No 'k8s_capi_helm_v1' driver | Driver entry-point name mismatch | Verify name per §5 footnote; update §7 |
| Service repeatedly restarts (look for "Started" appearing twice in 10s) | Likely a config error in 99-capi.conf | Re-check ASCII-only; check magnum.conf.d permissions |
| kubeconfig_file not honored | --config-dir not being passed | §8 override not active; run systemctl daemon-reload and restart both services |
| Silent: no error but driver also not loading | Non-ASCII char snuck into a conf | Run file /etc/magnum/magnum.conf.d/99-capi.conf; if it reports UTF-8, regenerate |
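Reading two journals by eye after every restart is tedious, so a first-pass scan for the red flags in the table above can help. This is a minimal sketch. The function name scan_magnum_log is hypothetical. The match patterns are approximations of the table's wording plus stevedore's "looking for" phrasing; the exact log text may differ between Python, stevedore, and Magnum versions, so a clean result does not replace reading the logs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scan journal text on stdin for the red flags in the
# table. Patterns are approximations; exact log wording may differ by version.
scan_magnum_log() {
  local log starts flags=0
  log=$(cat)
  if grep -qE "No module named '?magnum_capi_helm" <<<"$log"; then
    echo "[FLAG] magnum_capi_helm not importable: re-run the pip install step (§5)"
    flags=$((flags + 1))
  fi
  if grep -qE "No '?k8s_capi_helm_v1'? driver|looking for '?k8s_capi_helm_v1" <<<"$log"; then
    echo "[FLAG] k8s_capi_helm_v1 entry point not found: check the driver name"
    flags=$((flags + 1))
  fi
  # grep -c prints 0 but exits 1 when nothing matches; keep the count anyway
  starts=$(grep -c 'Started ' <<<"$log" || true)
  if [ "$starts" -ge 2 ]; then
    echo "[FLAG] $starts Started lines: possible restart loop, re-check 99-capi.conf"
    flags=$((flags + 1))
  fi
  if [ "$flags" -eq 0 ]; then
    echo "[OK] no red flags matched"
    return 0
  fi
  return 1
}
```

For example: `juju exec --unit magnum/leader -- 'sudo journalctl -u magnum-conductor --since "2 minutes ago" --no-pager' | scan_magnum_log`.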
Verify the driver is actually loaded by Magnum and reachable via the API.
source $HOME/admin-openrc

# Confirm the Magnum API endpoint responds. This lists cluster templates, not
# drivers; an empty list is fine.
openstack coe cluster template list -f json

# Direct check on the unit: scan the conductor's startup log for driver lines
juju exec --unit magnum/leader -- \
  'sudo journalctl -u magnum-conductor --since "5 minutes ago" --no-pager | grep -iE "driver|enabled" | head -20'
# Expect: a line mentioning k8s_capi_helm_v1 having been loaded
# (Magnum logs the loaded drivers at startup)

# Definitive check: try creating a cluster template that requires the CAPI driver
openstack coe cluster template create magnum-capi-driver-check \
  --image noble-amd64 \
  --keypair capi-workload-key \
  --external-network ext_net \
  --master-flavor capi-mgmt-node \
  --flavor capi-mgmt-node \
  --coe kubernetes \
  --network-driver calico \
  --labels kube_tag=$KUBERNETES_VERSION

openstack coe cluster template show magnum-capi-driver-check -c name -c coe -c labels
If template create fails with "driver not enabled" or similar, the Magnum API process is not loading the conf.d. Verify that the systemd override took effect: sudo systemctl show magnum-api -p ExecStart on the unit should show the explicit --config-dir invocation. If it still shows the init.d wrapper, the daemon-reload + restart did not pick up the override.
$KUBERNETES_VERSION carry-over: if your shell session no longer has $KUBERNETES_VERSION set from v1-do-doc-07 §4, re-read it from $HOME/capi-bootstrap/pins/KUBERNETES_VERSION or substitute the actual version in the --labels kube_tag= flag.
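The re-read can be guarded so that it never clobbers a value already set in the session. This is a minimal sketch, assuming the pin file holds just the version string on its first line. The function name load_k8s_version is hypothetical; the default path is the one named above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KUBERNETES_VERSION from the v1-do-doc-07 pin file,
# but only if the current shell does not already have it. Assumes the pin file
# holds the version string on its first line.
load_k8s_version() {
  local pin=${1:-$HOME/capi-bootstrap/pins/KUBERNETES_VERSION}
  if [ -n "${KUBERNETES_VERSION:-}" ]; then
    echo "KUBERNETES_VERSION already set: $KUBERNETES_VERSION"
    return 0
  fi
  if [ ! -r "$pin" ]; then
    echo "[FAIL] pin file not readable: $pin" >&2
    return 1
  fi
  # Strip whitespace so the value is safe inside --labels kube_tag=...
  KUBERNETES_VERSION=$(head -n 1 "$pin" | tr -d '[:space:]')
  if [ -z "$KUBERNETES_VERSION" ]; then
    echo "[FAIL] pin file is empty: $pin" >&2
    return 1
  fi
  export KUBERNETES_VERSION
  echo "KUBERNETES_VERSION=$KUBERNETES_VERSION (from $pin)"
}
```

Call `load_k8s_version` in the current shell (not in a subshell), so that the exported value is visible to the template-create command.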
Clean up the driver-check template:
openstack coe cluster template delete magnum-capi-driver-check
This step is optional. Full validation belongs in v1-do-doc-11. Use this smoketest only if you want immediate confirmation that the entire chain (Magnum API → conductor → magnum-capi-helm → CAPI controllers in workload cluster → tenant K8s cluster on tenant VMs) works end-to-end.
# Create a cluster template tuned for testcloud smoketest
openstack coe cluster template create magnum-smoketest-template \
--image noble-amd64 \
--keypair capi-workload-key \
--external-network ext_net \
--master-flavor capi-mgmt-node \
--flavor capi-mgmt-node \
--coe kubernetes \
--network-driver calico \
--labels boot_volume_size=20,kube_tag=$KUBERNETES_VERSION,octavia_provider=ovn
# Create a 1+1 cluster (minimum for smoketest)
openstack coe cluster create magnum-smoketest \
--cluster-template magnum-smoketest-template \
--master-count 1 \
--node-count 1
# Poll for status (15-20 min typical; CAPI provisions tenant VMs end-to-end)
SMOKETEST_RESULT=""
for i in $(seq 1 60); do
  STATUS=$(openstack coe cluster show magnum-smoketest -c status -f value 2>/dev/null)
  echo "$(date -Is) status=$STATUS"
  case "$STATUS" in
    CREATE_COMPLETE)
      echo "[OK] Smoketest passed"
      SMOKETEST_RESULT="pass"
      break
      ;;
    CREATE_FAILED)
      echo "[FAIL] Smoketest cluster creation failed. Investigate via:"
      echo " openstack coe cluster show magnum-smoketest"
      echo " openstack stack list # if any Heat stack remained"
      echo " kubectl --kubeconfig \$HOME/magnum-capi/capi-mgmt-cluster.kubeconfig get cluster,machines -A"
      echo " juju exec --unit magnum/leader -- sudo journalctl -u magnum-conductor --since '30 minutes ago'"
      SMOKETEST_RESULT="fail"
      break
      ;;
  esac
  sleep 30
done
if [ "$SMOKETEST_RESULT" = "fail" ]; then
  echo ""
  echo "[FAIL] Smoketest did not complete cleanly. Stop here, investigate, and decide whether to proceed."
  echo " The smoketest cluster may need manual cleanup; see cleanup block below if you want to remove it."
elif [ -z "$SMOKETEST_RESULT" ]; then
  echo "[WARN] Smoketest poll timed out without reaching a terminal state. Cluster may still be provisioning."
  echo " Manually check with: openstack coe cluster show magnum-smoketest"
fi
If the smoketest reached CREATE_COMPLETE:
# Retrieve the smoketest cluster's kubeconfig (create the target directory first)
mkdir -p "$WORK/smoketest-kubeconfig"
openstack coe cluster config magnum-smoketest --dir "$WORK/smoketest-kubeconfig"

# Sanity-check the smoketest cluster
KUBECONFIG="$WORK/smoketest-kubeconfig/config" kubectl get nodes
KUBECONFIG="$WORK/smoketest-kubeconfig/config" kubectl get pods -A | head -20
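Clean up the smoketest cluster (regardless of pass/fail):
openstack coe cluster delete magnum-smoketest 2>/dev/null || echo "(cluster may already be deleting)"
# Magnum refuses to delete a template still referenced by a cluster; if this
# fails while the cluster is still deleting, re-run it once the cluster is gone.
openstack coe cluster template delete magnum-smoketest-template 2>/dev/null || echo "(template may already be deleted)"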
What success looks like: the CAPI controllers in the workload cluster receive the new Cluster CR (created by magnum-capi-helm in response to the Magnum API call), CAPO talks to OpenStack to provision tenant VMs, the tenant VMs join the new K8s cluster, and the new cluster has 1 control plane + 1 worker Ready. Octavia provides the API server LB (visible as a Floating IP in the tenant project).
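The "1 control plane + 1 worker Ready" criterion can be checked against kubectl's tabular output rather than eyeballed. This is a minimal sketch, assuming the `kubectl get nodes --no-headers` layout in which STATUS is the second column. The function name expect_ready_nodes is hypothetical. The check counts Ready nodes only and does not verify roles.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count Ready nodes in `kubectl get nodes --no-headers`
# output on stdin (STATUS is column 2) and compare with the expected total.
# "Ready,SchedulingDisabled" also counts as Ready; roles are not checked.
expect_ready_nodes() {
  local want=$1 got
  got=$(awk '$2 == "Ready" || $2 ~ /^Ready,/ { n++ } END { print n + 0 }')
  if [ "$got" -eq "$want" ]; then
    echo "[OK] $got/$want nodes Ready"
  else
    echo "[FAIL] $got Ready node(s), expected $want"
    return 1
  fi
}
```

For this smoketest: `KUBECONFIG="$WORK/smoketest-kubeconfig/config" kubectl get nodes --no-headers | expect_ready_nodes 2`.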
| Aspect | Testcloud (v1) | Roosevelt |
|---|---|---|
| Driver pin source | PyPI magnum-capi-helm==1.1.0 | Internal mirror with checksum verification |
| Driver pin record | Implicit in this runbook | Captured in Vault as audit artifact alongside CAPI pins |
| Kubeconfig source | Workload cluster (post-pivot per v1-do-doc-07 §17) | Same |
| Kubeconfig rotation | Manual on capi-mgmt rebuild | Automated when workload cluster cert rotates |
| Trustee credential | Charm-default magnum-shared user | Per-tenant app credentials via Vault auth method |
| Magnum HA | num_units=1 (per D-009 testcloud) | num_units=3 with hacluster + provider VIP |
| Driver upgrade discipline | Manual re-run of §5 | Tracked maintenance window; Vault audit log |
| Systemd override | Drop-in at /etc/systemd/system/magnum-*.service.d/override.conf | Same, but provided via a charm overlay package, not manual file install |
| ASCII-only enforcement | Manual check (§7, §8) | Pre-flight lint in scripts/pre-flight-checks.sh |
These gotchas burned cycles during the Bobcat Magnum CAPI work. Each is explicitly handled in this runbook; collecting them here for visibility:
- --break-system-packages (§5). Ubuntu 22.04+ refuses pip install against system Python by default. The flag is required for the magnum-capi-helm install path used by Charmed Magnum.
- juju ssh hangs on stdout redirect. PTY allocation issue. This runbook uses juju exec for all non-interactive command execution.
- ...juju ssh is fragile. This runbook writes conf files locally first and uses juju scp + juju exec install to transfer (single-level only).
- Non-ASCII characters in conf.d files cause silent daemon failures. §7 and §8 both include file <path> ASCII verification before transfer.
- openstack -f value -c X -c Y outputs in alphabetical field order, not flag order. This runbook uses single-column queries or -f json | jq throughout.
- enabled_drivers is overridden, not appended. The enabled_drivers = k8s_capi_helm_v1 line in 99-capi.conf REPLACES the charm-default value (which would include the deprecated Heat drivers).
- An empty ExecStart= line is required to clear the inherited ExecStart before setting the replacement (§8).
- ...openstack CLI cannot read /tmp. This runbook stages files under $WORK=$HOME/magnum-capi. The smoketest in §11 also writes to $WORK/smoketest-kubeconfig.

- Pre-state captured in $WORK/pre-state/
- magnum-capi-helm installed; pip3 show magnum-capi-helm shows version 1.1.0 in a system dist-packages directory (see §5)
- /etc/magnum/kubeconfig present with mode 0640 root:magnum; magnum user can read it
- /etc/magnum/magnum.conf.d/99-capi.conf present; file reports ASCII text
- Overrides installed at /etc/systemd/system/magnum-*.service.d/override.conf; for both units, systemctl cat shows the empty ExecStart= and the direct /usr/bin invocation in override.conf
- magnum-capi-driver-check template creation succeeded; deleted afterwards
- (Optional) Smoketest reached CREATE_COMPLETE and was cleaned up (or skipped intentionally)

If all required (non-optional) items are checked, Batch C is complete. Proceed to Batch D (v1-do-doc-09-tenant.md first).
| Date | Change | Reference |
|---|---|---|
| 2026-05-22 | Original runbook 05 created. magnum-capi-helm 1.1.0 from PyPI; workload-cluster kubeconfig (post-pivot per workstream 3b); systemd override pattern; ASCII-only conf.d. | Workstream 3c |
| 2026-05-27 | Adapted into v1-do-doc-08. Fixes: §1 obsolete D-007 inconsistency notice removed; cross-references updated to v1-do-doc set; §11 smoketest exit 1 converted to non-exiting [FAIL] report (smoketest is optional and should not kill the operator shell). | Batch C drafting |