# Runbook 05 — Magnum CAPI Helm Driver Graft

**Reference:** D-007 Layer B (rescoped per D-017). Runs after `04a-capi-bootstrap-cluster.md`.

**Purpose:** Install the `stackhpc/magnum-capi-helm` driver into the Magnum
charm's Python environment, configure Magnum to load and use it, and verify
end-to-end cluster creation via the driver against the bootstrap k3s
management cluster on `capi-mgmt.maas`.

**Prerequisites:**

- Runbook 04 complete (Magnum trustee domain created)
- Runbook 04a complete (capi-mgmt bootstrap k3s + CAPI controllers + ORC
  running; workload cluster Available; kubeconfig at
  `~/magnum-capi/capi-mgmt-cluster.kubeconfig` on jumphost)
- Authenticated Juju session active

**Key constraint:** Charm-magnum's systemd units invoke
`/etc/init.d/magnum-{api,conductor} systemd-start` (SysV-wrapped). The
init.d script as shipped does NOT consume drop-in config dirs. The Phase 4
graft must therefore REPLACE the systemd ExecStart entirely with a direct
invocation of the Magnum binaries that adds
`--config-dir /etc/magnum/magnum.conf.d/`. This pattern was validated on
Bobcat and is expected to persist on Caracal; verify with `juju exec`
(Step 1) at the start of execution.

## Step 1 — Investigation block (D-017 rehearsal)

Before any grafting, inspect the live charm state. The init.d/systemd
wrapping shape may have shifted between Bobcat and Caracal:

```bash
juju exec --unit magnum/leader -- 'cat /lib/systemd/system/magnum-api.service'
juju exec --unit magnum/leader -- 'cat /lib/systemd/system/magnum-conductor.service'
juju exec --unit magnum/leader -- 'ls /etc/init.d/ | grep magnum'
juju exec --unit magnum/leader -- 'cat /etc/init.d/magnum-api 2>/dev/null | head -40'
juju exec --unit magnum/leader -- 'ls /etc/default/magnum-* 2>/dev/null'
juju exec --unit magnum/leader -- 'python3 -c "import magnum; print(magnum.__file__)"'
```

Record results in execution notes. The Python import path tells us where
to pip-install the driver (Bobcat: `/usr/lib/python3/dist-packages/magnum/`).
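
The key question in this step is whether ExecStart still goes through the init.d wrapper. The sketch below is one way to answer it mechanically from the unit file text. `classify_execstart` is a hypothetical helper name, not part of Juju or the charm, and it assumes the unit file is piped in on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify the first non-empty ExecStart read from stdin.
# Prints "sysv-wrapped" if it calls an /etc/init.d/ script, "direct" if it
# calls anything else, and "missing" if no ExecStart command is present.
classify_execstart() {
  local line
  # '^ExecStart=.' skips the empty "ExecStart=" reset line used in drop-ins.
  line=$(grep -m1 '^ExecStart=.' || true)
  if [ -z "$line" ]; then
    echo "missing"
  elif printf '%s\n' "$line" | grep -q '^ExecStart=[-@:+!]*/etc/init\.d/'; then
    echo "sysv-wrapped"
  else
    echo "direct"
  fi
}

# Example (run from the jumphost):
#   juju exec --unit magnum/leader -- \
#     'cat /lib/systemd/system/magnum-api.service' | classify_execstart
```

A result of `sysv-wrapped` would indicate that the Bobcat shape still holds and that Step 6 applies as written. Any other result is a reason to stop and re-read the unit file by hand.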

## Step 2 — Pre-flight: confirm kubeconfig reachability

The Magnum charm unit must be able to reach the k3s API on
`capi-mgmt.maas:6443`. The charm runs in an LXD container on the metal
network, so reachability is expected over direct L2.

```bash
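# The $(...) substitution is expanded on the jumphost, where the kubeconfig lives; only curl runs on the unit.
juju exec --unit magnum/leader -- "curl -sk --max-time 5 https://$(awk '/server:/ {print $2}' ~/magnum-capi/capi-mgmt-cluster.kubeconfig | head -1 | sed 's|https://||')/healthz"
# Expect: "ok". An HTTP 401 response also proves network reachability if the API server rejects anonymous requests.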
```
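
The one-liner above packs the endpoint extraction into a nested command substitution. As a more readable sketch, the extraction can be split into a small function. `kubeconfig_endpoint` is a hypothetical helper name, and it assumes the `server:` value is an unquoted `https://` URL.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print host:port of the first "server:" entry in a
# kubeconfig. Assumes an unquoted https:// URL.
kubeconfig_endpoint() {
  awk '/server:/ {print $2; exit}' "$1" \
    | sed -e 's|^https://||' -e 's|/*$||'
}

# Example (run from the jumphost; the file is read locally, curl runs on the unit):
#   endpoint=$(kubeconfig_endpoint ~/magnum-capi/capi-mgmt-cluster.kubeconfig)
#   juju exec --unit magnum/leader -- "curl -sk --max-time 5 https://${endpoint}/healthz"
```

Printing the endpoint first also makes it easy to spot a kubeconfig that points at a loopback address, which the charm unit could never reach.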

## Step 3 — Install the driver into the charm Python environment

```bash
juju ssh magnum/leader -- "sudo pip install --break-system-packages \
  'git+https://github.com/stackhpc/magnum-capi-helm@v0.13.0'"

# Verify
juju exec --unit magnum/leader -- 'python3 -c "import magnum_capi_helm; print(magnum_capi_helm.__file__)"'
```

Pin to a specific tag rather than `main` — the driver should not move
under our feet between deploys. Version `v0.13.0` was validated on Bobcat;
verify it remains the chosen tag at Caracal execution time.
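
To confirm the pin later, the installed version can be compared against the expected tag. This is a minimal sketch under two assumptions that should be checked on the unit: that the distribution is registered with pip as `magnum-capi-helm`, and that a build from tag `v0.13.0` reports its version as `0.13.0`. `pip_version_matches` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `pip show` output on stdin and compare its
# "Version:" field with the expected pin (given without the leading "v").
pip_version_matches() {
  local expected="$1" actual
  actual=$(awk -F': ' '$1 == "Version" {print $2; exit}')
  [ "$actual" = "$expected" ]
}

# Example (package name is an assumption, see above):
#   juju exec --unit magnum/leader -- 'pip show magnum-capi-helm' \
#     | pip_version_matches 0.13.0 && echo "pin OK" || echo "pin drifted"
```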

## Step 4 — Deploy the kubeconfig to the charm unit

```bash
# Copy from jumphost to magnum/leader
juju scp ~/magnum-capi/capi-mgmt-cluster.kubeconfig magnum/leader:/tmp/capi-kubeconfig
juju ssh magnum/leader -- "sudo install -o root -g magnum -m 0640 /tmp/capi-kubeconfig /etc/magnum/kubeconfig && sudo rm /tmp/capi-kubeconfig"
juju ssh magnum/leader -- "ls -la /etc/magnum/kubeconfig"
```

## Step 5 — Configure Magnum to use the CAPI Helm driver

Create the conf.d directory and drop-in:

```bash
juju ssh magnum/leader -- "sudo mkdir -p /etc/magnum/magnum.conf.d && sudo chown root:magnum /etc/magnum/magnum.conf.d && sudo chmod 0750 /etc/magnum/magnum.conf.d"

juju ssh magnum/leader -- "sudo tee /etc/magnum/magnum.conf.d/99-capi.conf > /dev/null" << 'EOC'
[DEFAULT]
enabled_drivers = k8s_capi_helm_v1

[capi_helm]
kubeconfig_file = /etc/magnum/kubeconfig
EOC

juju ssh magnum/leader -- "sudo chown root:magnum /etc/magnum/magnum.conf.d/99-capi.conf && sudo chmod 0640 /etc/magnum/magnum.conf.d/99-capi.conf"
```
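
Because non-ASCII bytes in conf.d files can make the daemons fail silently, it may be worth checking the drop-in right after writing it. The sketch below counts bytes above 0x7F in whatever is piped to it. `is_ascii_only` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if stdin contains no bytes above 0x7F.
is_ascii_only() {
  local count
  # Delete every ASCII byte (octal 000-177) and count what is left over.
  count=$(LC_ALL=C tr -d '\000-\177' | wc -c)
  [ "$((count))" -eq 0 ]
}

# Example (run from the jumphost):
#   juju exec --unit magnum/leader -- 'cat /etc/magnum/magnum.conf.d/99-capi.conf' \
#     | is_ascii_only && echo "ASCII only" || echo "non-ASCII bytes found"
```

A typical source of stray bytes is pasting from a document that converted hyphens or quotes into typographic variants.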

## Step 6 — Install the systemd ExecStart override

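Because the charm's systemd units invoke an init.d wrapper that does NOT
honor `--config-dir`, the override must replace the ExecStart entirely
with a direct invocation of the Magnum binaries, passing both the default
config file and our config dir. The empty `ExecStart=` line clears the
command inherited from the packaged unit; without it, systemd rejects a
second `ExecStart` for a service that is not `Type=oneshot`.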

```bash
juju ssh magnum/leader -- "sudo mkdir -p /etc/systemd/system/magnum-api.service.d"
juju ssh magnum/leader -- "sudo tee /etc/systemd/system/magnum-api.service.d/override.conf > /dev/null" << 'EOC'
[Service]
ExecStart=
ExecStart=/usr/bin/magnum-api --config-file /etc/magnum/magnum.conf --config-dir /etc/magnum/magnum.conf.d
EOC

juju ssh magnum/leader -- "sudo mkdir -p /etc/systemd/system/magnum-conductor.service.d"
juju ssh magnum/leader -- "sudo tee /etc/systemd/system/magnum-conductor.service.d/override.conf > /dev/null" << 'EOC'
[Service]
ExecStart=
ExecStart=/usr/bin/magnum-conductor --config-file /etc/magnum/magnum.conf --config-dir /etc/magnum/magnum.conf.d
EOC

juju ssh magnum/leader -- "sudo systemctl daemon-reload"
juju ssh magnum/leader -- "sudo systemctl restart magnum-api magnum-conductor"
juju ssh magnum/leader -- "sudo systemctl status magnum-api magnum-conductor --no-pager"
```

Verify the override took effect:

```bash
juju ssh magnum/leader -- "sudo systemctl cat magnum-api | grep ExecStart"
juju ssh magnum/leader -- "ps -ef | grep magnum-api | grep -v grep"
# Expect: /usr/bin/magnum-api with --config-dir flag
```
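
The same check can be made pass/fail instead of eyeballed. The sketch below reads process listings on stdin and looks for the expected flag. `has_config_dir` is a hypothetical helper name, and the match is a plain substring test, so treat it as a convenience rather than a strict parser.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read process command lines on stdin and succeed if a
# line for the given binary carries the expected --config-dir argument.
has_config_dir() {
  local binary="$1" dir="${2:-/etc/magnum/magnum.conf.d}"
  grep -F -- "$binary" | grep -q -F -- "--config-dir $dir"
}

# Example (run from the jumphost):
#   juju exec --unit magnum/leader -- 'ps -eo args' \
#     | has_config_dir /usr/bin/magnum-api && echo "api OK"
#   juju exec --unit magnum/leader -- 'ps -eo args' \
#     | has_config_dir /usr/bin/magnum-conductor && echo "conductor OK"
```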

## Step 7 — Verify driver loaded

```bash
juju ssh magnum/leader -- "sudo tail -100 /var/log/magnum/magnum-conductor.log | grep -i -E 'driver|capi'"
# Expect: log lines mentioning k8s_capi_helm_v1 driver loaded
```

## Step 8 — Smoke test

Create a cluster template and small cluster to validate end-to-end:

```bash
source ~/admin-openrc

# Cluster template (the name is a positional argument; replace <ext-net>
# with the external network name)
openstack coe cluster template create \
  --image noble-amd64 \
  --keypair capi-mgmt-key \
  --external-network <ext-net> \
  --master-flavor m1.medium \
  --flavor m1.medium \
  --coe kubernetes \
  --network-driver calico \
  --labels driver=k8s_capi_helm_v1,kube_tag=v1.32.2 \
  k8s-capi-test

# Cluster
openstack coe cluster create \
  --cluster-template k8s-capi-test \
  --master-count 1 \
  --node-count 1 \
  --keypair capi-mgmt-key \
  k8s-capi-smoke

# Poll
watch -n 30 'openstack coe cluster show k8s-capi-smoke -c status -c status_reason'
# Expect CREATE_COMPLETE within 15-20 min
```
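
For unattended runs, `watch` can be swapped for a polling loop that exits on its own. The sketch below is one way to do that. `wait_for_cluster` is a hypothetical helper name, and the 30-minute default timeout is an assumption chosen to sit above the 15-20 minute expectation, not a validated value.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a Magnum cluster until it reaches the target
# status (return 0), reports a *_FAILED status (return 1), or the timeout
# expires (return 2).
# Usage: wait_for_cluster <name> [target] [timeout_s] [interval_s]
wait_for_cluster() {
  local name="$1" target="${2:-CREATE_COMPLETE}"
  local timeout="${3:-1800}" interval="${4:-30}"
  local elapsed=0 status=""
  while [ "$elapsed" -lt "$timeout" ]; do
    # -f value -c status prints only the status column's value.
    status=$(openstack coe cluster show "$name" -f value -c status 2>/dev/null || true)
    case "$status" in
      "$target") echo "$name: $status"; return 0 ;;
      *_FAILED)  echo "$name: $status" >&2; return 1 ;;
    esac
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "$name: timed out after ${timeout}s (last status: ${status:-unknown})" >&2
  return 2
}

# Example:
#   source ~/admin-openrc
#   wait_for_cluster k8s-capi-smoke CREATE_COMPLETE 1800 30
```

On a failure return, `openstack coe cluster show k8s-capi-smoke -c status_reason` remains the first place to look.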

Tear down the smoke cluster after validation:

```bash
openstack coe cluster delete k8s-capi-smoke
# Wait until the cluster reports DELETE_COMPLETE or no longer appears in `openstack coe cluster list`
openstack coe cluster template delete k8s-capi-test
```

## Exit criteria

- Magnum services running with `--config-dir /etc/magnum/magnum.conf.d`
  visible in the live process
- `k8s_capi_helm_v1` driver logged at conductor startup
- Smoke-test cluster reached `CREATE_COMPLETE` and was torn down cleanly

## Idempotency and recovery notes

- The systemd override survives the charm's `config-changed` hook (the
  charm rewrites `magnum.conf` but doesn't touch the conf.d dir or systemd
  drop-ins)
- The pip-installed driver may NOT survive `upgrade-charm`: if the unit's
  Python packages are reinstalled, re-run Step 3
- The kubeconfig at `/etc/magnum/kubeconfig` is operator-managed; it
  survives charm hooks, but if Magnum is redeployed, restore it (Step 4)
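
After a charm upgrade or redeploy, a quick way to see which operator-managed pieces need restoring is to check for the files this runbook creates. The sketch below lists the missing ones. `missing_graft_files` is a hypothetical helper name, and its optional root prefix exists only so the check can be tried against a scratch directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each graft file that is missing and return
# non-zero if any are. Run as root on the unit, since the conf.d directory
# is mode 0750.
missing_graft_files() {
  local root="${1:-}" path rc=0
  for path in \
    /etc/magnum/kubeconfig \
    /etc/magnum/magnum.conf.d/99-capi.conf \
    /etc/systemd/system/magnum-api.service.d/override.conf \
    /etc/systemd/system/magnum-conductor.service.d/override.conf
  do
    if [ ! -f "${root}${path}" ]; then
      echo "$path"
      rc=1
    fi
  done
  return "$rc"
}
```

This covers only the files. The driver itself is still checked with the import test from Step 3.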

## Recurring pitfalls

- `juju ssh` HANGS when stdout is redirected; use `juju exec --unit X -- 'cmd'` instead
- Magnum lives in the system Python (`/usr/lib/python3/dist-packages/magnum/`), so `pip install` needs `--break-system-packages` to get past the PEP 668 externally-managed check
- Heredoc nesting in `juju ssh` is fragile; keep heredocs simple, single level
- Non-ASCII characters in conf.d files cause silent daemon failures; keep them ASCII-only
