# Phase 07 -- Magnum Conductor Graft (D-031 / D-037 / D-042)

Graft the magnum-capi-helm CAPI driver onto the charm-managed conductor
(`magnum/0`), point it at the in-cloud management cluster (phase-06) via the
FIP, and land on a CONTRACT-COHERENT driver so `coe cluster` health reports
`HEALTHY`. The driver upgrade (D-042) is part of the v1 baseline here, not a
follow-up -- the as-first-built 1.3.0 read the version-less v1beta2
`infrastructureRef` and reported a cosmetic UNHEALTHY; it is superseded by the
RELEASED `magnum-capi-helm==1.4.0`, which is the v1 end state.

Decisions: D-031 (driver/engine/surface), D-036 (driver/engine/chart coherence),
D-037 (conf.d drop-in + config-dir via /etc/default, NOT a systemd ExecStart drop-in),
D-042 (driver must be contract-coherent with the Layer-A core; amends D-034).
Troubleshooting: appendix-A DOCFIX-021, D-037, D-042, and lessons L-P6-1..4.

---

## Prerequisites (must be true entering phase-07)
- phase-06 EXIT GATE passed: `capi-mgmt-v2` Ready, CAPI stack up (ORC `Image` CRD
  present, no crash-looping CAPO), `~/capi-mgmt.kubeconfig` (server = FIP) works
  from the jumphost.
- Magnum charm live (`magnum/0`). The charm auto-configures the Keystone trustee domain
  via its keystone (identity-credentials) relation, so there is no manual step; just
  verify that `[trust]` (trustee_domain_id / trustee_domain_admin_id /
  trustee_domain_admin_password) is populated in magnum.conf.
- `admin-openrc` on the jumphost; `juju` (model openstack); `jq`.
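
The `[trust]` check above can be scripted. A minimal sketch, assuming magnum.conf uses plain `key = value` lines under a `[trust]` header; `check_trust` is a hypothetical helper name, not part of magnum or the charm. It reports which keys are missing or empty and never prints the values, so the password stays off the terminal.

```bash
# check_trust FILE: report [trust] keys that are missing or empty.
# Returns 0 when all three keys carry a value, 1 otherwise.
check_trust() {
  missing=0
  for key in trustee_domain_id trustee_domain_admin_id trustee_domain_admin_password; do
    # track the current INI section; only look at lines inside [trust]
    value=$(awk -v k="$key" '
      /^\[/ { in_trust = ($0 == "[trust]"); next }
      in_trust && $0 ~ "^[ \t]*" k "[ \t]*=" {
        sub(/^[^=]*=[ \t]*/, ""); print; exit
      }' "$1")
    if [ -z "$value" ]; then
      echo "MISSING: [trust] $key"
      missing=1
    fi
  done
  return "$missing"
}
```

Run it on the conductor itself (as root, against `/etc/magnum/magnum.conf`) so the trustee password never leaves the unit.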

## Constants and env-literals (TAG: confirm per site on rebuild)
- `ENV(conductor-unit)` magnum/0        (LXD 1/lxd/2 on openstack1; addr 10.12.4.76)
- `ENV(conductor-src)`  10.12.4.76/32   (the conductor's provider IP; SG source)
- `ENV(mgmt-fip)`       10.12.7.40       (mgmt apiserver; kubeconfig server)
- `ENV(mgmt-sg)`        capi-mgmt-sg     (in the capi-mgmt project)
- `ENV(project)`        capi-mgmt        (id 674171fd28d446d3a37073b6a761e910)
- `ENV(magnum-ns)`      magnum-674171fd28d446d3a37073b6a761e910  (driver namespace per project)
- `ENV(chart-ver)`      0.25.1           (capi-helm-charts; load-bearing -- driver default is 0.10.1)
- `ENV(helm-ver)`       v3.17.3

## Run-location legend
- `# RUN: jumphost`            -- vopenstack-jesse as jessea123 (admin-openrc).
- `# RUN: jumphost -> magnum/0` -- shipped to the conductor via `juju ssh -m openstack magnum/0 '...' </dev/null`
  (DOCFIX-021: `</dev/null` on every juju ssh / sudo whose stdin is not the payload, so the
  remote command does not eat the heredoc/pipe; Step 7.2 is the one exception).
- Conductor facts: DEB install (magnum 18.0.1, python3.10, container base ubuntu 22.04);
  conductor runs as user `magnum`; daemon launched by an LSB init script wrapped by
  systemd `systemd-start` (NOT a direct ExecStart) -- see Step 7.7.

---

## Step 7.1 -- Authorize the conductor source on the mgmt-cluster SG
`# RUN: jumphost` (scoped to the capi-mgmt project). Idempotent.

```bash
( {
  set -u
  # scope openstack CLI to the capi-mgmt project (id form -- robust to name/domain)
  source ~/admin-openrc
  unset OS_PROJECT_NAME OS_PROJECT_ID OS_TENANT_NAME OS_TENANT_ID
  export OS_PROJECT_ID=674171fd28d446d3a37073b6a761e910      # ENV(project)
  SG=$(openstack security group show capi-mgmt-sg -f value -c id)   # ENV(mgmt-sg)
  echo "SG=$SG"
  echo "=== add ingress tcp/6443 from the conductor 10.12.4.76/32 (if absent) ==="
  openstack security group rule list "$SG" -f value -c "IP Range" -c "Port Range" \
    | grep -qF '10.12.4.76/32 6443:6443' \
    || openstack security group rule create --protocol tcp --dst-port 6443 \
         --remote-ip 10.12.4.76/32 "$SG"
  openstack security group rule list "$SG" -f value -c "IP Protocol" -c "Port Range" -c "IP Range"
} )
```
Then prove conductor -> mgmt apiserver reachability:
```bash
# RUN: jumphost -> magnum/0
juju ssh -m openstack magnum/0 \
  "timeout 6 bash -c 'exec 3<>/dev/tcp/10.12.7.40/6443' && echo TCP-OK || echo TCP-FAIL" </dev/null
```
GATE: require `TCP-OK`. (Pre-existing jumphost rules tcp/22+6443 from 10.12.4.1/32 remain.)

## Step 7.2 -- Place the mgmt kubeconfig on the conductor [SENSITIVE; not batched]
`# RUN: jumphost -> magnum/0`  The source `~/capi-mgmt.kubeconfig` already has its
server rewritten to the FIP (phase-06 6.5). Transfer base64-piped straight into a
root-written 0600 file owned by the conductor user -- never stage the admin
kubeconfig in /tmp (appendix-A: L-P6-4).

```bash
# discover the conductor service user (expect: magnum)
juju ssh -m openstack magnum/0 'systemctl show magnum-conductor -p User --value' </dev/null

# transfer (umask 077; chown to the discovered user; 0600)
# NOTE: NO trailing </dev/null here -- stdin IS the payload. A </dev/null would
# override the pipe (SC2259) and silently write an EMPTY kubeconfig while the
# && chain still exits 0. DOCFIX-021 applies only to commands whose stdin is
# NOT in use; the discovery line above keeps it, this pipe must not.
base64 ~/capi-mgmt.kubeconfig | juju ssh -m openstack magnum/0 \
  "sudo bash -c 'umask 077; base64 -d > /etc/magnum/kubeconfig && \
   getent passwd magnum >/dev/null && chown magnum: /etc/magnum/kubeconfig && \
   chmod 0600 /etc/magnum/kubeconfig'"

# verify byte-exact (hashes must match before proceeding)
sha256sum ~/capi-mgmt.kubeconfig
juju ssh -m openstack magnum/0 'sudo sha256sum /etc/magnum/kubeconfig' </dev/null
```
GATE: the two sha256 hashes are identical (an empty or truncated transfer fails here,
not three steps later as a confusing conductor auth error).
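
Why the hash gate matters can be seen with a local stand-in for the transfer. This is an illustration only: `sh -c` plays the remote `sudo bash -c`, and the payload string and temp paths are made up. It shows that a trailing `</dev/null` replaces the pipe as stdin, leaves an empty file, and still lets the `&&` chain succeed.

```bash
payload='apiVersion: v1'          # stand-in for the kubeconfig bytes
tmp=$(mktemp -d)
receiver='base64 -d > "$1" && echo "chain exit 0"'

# stdin is the pipe: the payload arrives intact
good_msg=$(printf '%s' "$payload" | base64 | sh -c "$receiver" _ "$tmp/good")

# trailing </dev/null replaces the pipe: nothing to decode, chain still succeeds
empty_msg=$(printf '%s' "$payload" | base64 | sh -c "$receiver" _ "$tmp/empty" </dev/null)

echo "good : $good_msg, $(wc -c < "$tmp/good") bytes"
echo "empty: $empty_msg, $(wc -c < "$tmp/empty") bytes"

# the sha256 comparison is what catches the empty file
[ "$(sha256sum < "$tmp/good")" = "$(sha256sum < "$tmp/empty")" ] \
  && echo "hashes match" || echo "hashes differ -> transfer rejected"
```

Both variants report the same success message; only the byte count and the hash tell them apart.
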
End-to-end proof (the conductor user authenticates to the mgmt cluster via the FIP):
```bash
juju ssh -m openstack magnum/0 \
  'sudo -u magnum env HOME=/tmp helm --kubeconfig /etc/magnum/kubeconfig list -A' </dev/null
```
Expect: the mgmt-cluster helm releases listed (cert-manager, ck-dns, ck-network
cilium, cluster-api-addon-provider, cluster-api-janitor-openstack, metrics-server).
GATE: a populated list = reach + auth OK. (Hardening, Roosevelt: replace this
cluster-admin kubeconfig with a scoped ServiceAccount kubeconfig.)

## Step 7.3 -- Confirm the driver target + served CAPI versions (D-042)
`# RUN: jumphost` + jumphost kubectl. The fix is the RELEASED tag
`magnum-capi-helm==1.4.0` (the "generalize-api-resources" feature). 1.3.0 read the
version-less v1beta2 `infrastructureRef` and failed the health GET; 1.4.0 resolves each
resource query as `api_resources.get(<Kind>,{}).get("api_version", <code-default>)`,
where the driver's CODE defaults are v1beta1 for every CAPI core kind (Cluster /
MachineDeployment / Machine -> cluster.x-k8s.io/v1beta1; OpenstackCluster ->
infrastructure.cluster.x-k8s.io/v1beta1; K8sControlPlane ->
controlplane.cluster.x-k8s.io/v1beta1). IMPORTANT: the `api_resources` OPTION itself
defaults to an EMPTY map `{}` -- the v1beta1 values are code-level fallbacks, NOT option
defaults. This cluster serves v1beta1 (CAPI v1.13 still serves it; unserved only in
v1.16), so an empty `api_resources` yields v1beta1 lookups that match -- no per-kind
override needed.
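
The lookup rule above, restated as a small shell model. This is a sketch of the fallback logic as this section describes it, not the driver's source; `code_default` and `resolve_api_version` are hypothetical names, and only the kinds whose defaults are listed above are covered.

```bash
# code_default KIND: the code-level fallback described in this step
code_default() {
  case "$1" in
    Cluster|MachineDeployment|Machine) echo "cluster.x-k8s.io/v1beta1" ;;
    OpenstackCluster)                  echo "infrastructure.cluster.x-k8s.io/v1beta1" ;;
    K8sControlPlane)                   echo "controlplane.cluster.x-k8s.io/v1beta1" ;;
    *) echo "no default listed for kind: $1" >&2; return 1 ;;
  esac
}

# resolve_api_version KIND [OVERRIDE]
# OVERRIDE models an api_version entry for KIND in the api_resources map;
# when it is absent (empty map), the code default wins.
resolve_api_version() {
  if [ -n "${2:-}" ]; then
    echo "$2"
  else
    code_default "$1"
  fi
}

resolve_api_version Cluster                            # empty map -> cluster.x-k8s.io/v1beta1
resolve_api_version Cluster cluster.x-k8s.io/v1beta2   # override  -> cluster.x-k8s.io/v1beta2
```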

Sanity-confirm v1beta1 is served per group before installing:
```bash
( {
  export KUBECONFIG="$HOME/capi-mgmt.kubeconfig"
  # `kubectl api-versions` lists EVERY served group/version. (`kubectl api-resources`
  # prints only the preferred version per group, so a still-served v1beta1 can be
  # hidden behind a preferred v1beta2.)
  for g in cluster.x-k8s.io controlplane.cluster.x-k8s.io infrastructure.cluster.x-k8s.io \
           bootstrap.cluster.x-k8s.io addons.cluster.x-k8s.io; do
    echo "== $g =="; kubectl api-versions 2>/dev/null | grep "^$g/"
  done
} )
#   Expect a <group>/v1beta1 line for: cluster.x-k8s.io (Cluster/MachineDeployment/Machine),
#   controlplane.cluster.x-k8s.io (KubeadmControlPlane), infrastructure.cluster.x-k8s.io
#   (OpenStackCluster -- verified anchor). If a CORE group serves ONLY v1beta2, override
#   just its kinds via api_resources in Step 7.6; otherwise the defaults work as-is.
```

## Step 7.4 -- Install the driver (1.4.0) + helm in the conductor container
`# RUN: jumphost -> magnum/0`  `--no-deps` keeps pip from touching the deb-managed oslo
stack. PEP 668 is not an issue here: the 22.04 container does not mark its system Python
as externally managed.

```bash
# egress pre-check
juju ssh -m openstack magnum/0 \
  'curl -s -o /dev/null -w "pypi:%{http_code}\n" https://pypi.org/simple/ ; \
   curl -s -o /dev/null -w "helm:%{http_code}\n" https://get.helm.sh/' </dev/null

# helm v3.17.3 (if not already present from a prior graft)
juju ssh -m openstack magnum/0 'command -v helm && helm version --short || echo "helm absent -- install v3.17.3 from get.helm.sh tarball to /usr/local/bin/helm"' </dev/null

# install the RELEASED contract-coherent driver (supersedes 1.3.0)
juju ssh -m openstack magnum/0 'sudo python3 -m pip install --no-deps --upgrade "magnum-capi-helm==1.4.0"' </dev/null

# verify the install + entry point
juju ssh -m openstack magnum/0 \
  'pip show magnum-capi-helm | grep -E "Version|Location"; \
   python3 -c "import importlib.metadata as m; print([e.name for e in m.entry_points(group=\"magnum.drivers\")])"' </dev/null
```
Expect: Version 1.4.0; `k8s_capi_helm_v1` present in the entry points.
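
If the helm check reports `helm absent`, the tarball install could look like the sketch below. Assumptions, none of them confirmed by this runbook: the conductor container is x86_64 (`HELM_ARCH=amd64`), `curl`, `tar` and `sudo` are available there, and the function names are hypothetical. The block only defines the functions; `install_helm` is meant to be called in a shell on the conductor.

```bash
HELM_VER=v3.17.3          # ENV(helm-ver)
HELM_ARCH=amd64           # ASSUMPTION: conductor container is x86_64

# helm_tarball_url VERSION ARCH: release tarball location on get.helm.sh
helm_tarball_url() {
  echo "https://get.helm.sh/helm-$1-linux-$2.tar.gz"
}

# install_helm: download, unpack to a scratch dir, install to /usr/local/bin/helm
install_helm() {
  workdir=$(mktemp -d) || return 1
  curl -fsSL "$(helm_tarball_url "$HELM_VER" "$HELM_ARCH")" \
    | tar -xz -C "$workdir" || { rm -rf "$workdir"; return 1; }
  # the tarball unpacks to linux-<arch>/helm
  sudo install -m 0755 "$workdir/linux-$HELM_ARCH/helm" /usr/local/bin/helm
  rc=$?
  rm -rf "$workdir"
  return "$rc"
}
```

After calling `install_helm`, re-run the `helm version --short` check above and expect `v3.17.3`.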

## Step 7.5 -- api_resources (D-042; set EXPLICITLY to an empty map on this cluster)
1.4.0 exposes ONE [capi_helm] option for this -- `api_resources`, a JSON string mapping
CAPI kinds (Cluster, OpenstackCluster, MachineDeployment, K8sControlPlane, Machine,
Manifests, HelmRelease) to `{api_version, plural_name}`. The driver's CODE falls back to
v1beta1 for every CAPI core kind when that kind is absent from the map (Step 7.3), and
this cluster serves v1beta1 -- so the map's CONTENTS are empty here. But set it
EXPLICITLY to `{}` in the drop-in (Step 7.6) rather than omit it: the option's registered
default is a Python dict `{}` and the driver runs `json.loads()` on the value, so an
explicit string `{}` avoids depending on how oslo coerces a non-string default (not
empirically testable in the build environment -- explicit-set is the safe choice).
Override a specific kind ONLY if Step 7.3 showed it serves ONLY v1beta2, e.g.
`api_resources = {"Cluster": {"api_version": "cluster.x-k8s.io/v1beta2"}}`.

## Step 7.6 -- Stage the [capi_helm] conf.d drop-in (D-037)
`# RUN: jumphost -> magnum/0`  0644 root, NO secrets (it points at the 0600
kubeconfig). The `default_helm_chart_version = 0.25.1` line is LOAD-BEARING (driver
built-in default is `0.10.1`, the retired v1alpha6-era chart). `api_resources` is set to
an explicit empty map `{}` (Step 7.5 -- the driver's code falls back to v1beta1 for every
CAPI kind, which this cluster serves; explicit `{}` avoids the dict-default `json.loads`
question). ASCII only.

```bash
juju ssh -m openstack magnum/0 "sudo tee /etc/magnum/magnum.conf.d/00-capi-helm.conf >/dev/null <<'CONF'
[capi_helm]
kubeconfig_file = /etc/magnum/kubeconfig
helm_chart_repo = https://azimuth-cloud.github.io/capi-helm-charts
helm_chart_name = openstack-cluster
default_helm_chart_version = 0.25.1
api_resources = {}
CONF" </dev/null
```
If (and only if) Step 7.3 showed a core kind is v1beta2-only, REPLACE the
`api_resources = {}` line with the override (do not append a second `api_resources`
line) -- ONE line, a JSON value naming just the kinds that need it:
```
    # api_resources = {"Cluster": {"api_version": "cluster.x-k8s.io/v1beta2"}, ...}
```
Re-check ASCII cleanliness:
```bash
juju ssh -m openstack magnum/0 \
  'LC_ALL=C grep -nP "[^\x00-\x7F]" /etc/magnum/magnum.conf.d/00-capi-helm.conf && echo NON-ASCII || echo "ASCII clean"' </dev/null
```
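
Beyond the ASCII check, the staged drop-in can be linted before Step 7.7. A minimal sketch, assuming the exact `key = value` spelling used in the heredoc above; `lint_dropin` is a hypothetical helper, and it checks only shape (header, keys present, pinned chart version, printable ASCII), not whether the values are reachable.

```bash
# lint_dropin FILE EXPECTED_CHART_VERSION
# Prints one FAIL line per problem; returns 0 only when the file is clean.
lint_dropin() {
  file=$1; want_ver=$2; rc=0
  [ "$(head -n 1 "$file")" = "[capi_helm]" ] \
    || { echo "FAIL: first line is not [capi_helm]"; rc=1; }
  for key in kubeconfig_file helm_chart_repo helm_chart_name \
             default_helm_chart_version api_resources; do
    grep -q "^$key = ." "$file" || { echo "FAIL: $key missing or empty"; rc=1; }
  done
  # the chart version is load-bearing: require an exact match
  grep -qxF "default_helm_chart_version = $want_ver" "$file" \
    || { echo "FAIL: default_helm_chart_version is not $want_ver"; rc=1; }
  # anything outside printable ASCII (space..tilde) is rejected
  if LC_ALL=C grep -q '[^ -~]' "$file"; then
    echo "FAIL: non-ASCII or control bytes present"; rc=1
  fi
  return "$rc"
}
```

On the conductor this would be called as `lint_dropin /etc/magnum/magnum.conf.d/00-capi-helm.conf 0.25.1`.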

## Step 7.7 -- Wire config-dir injection via /etc/default (D-037 REVISED; NOT a systemd drop-in)
`# RUN: jumphost -> magnum/0`  These OpenStack debs run the daemon through an LSB
init script wrapped by systemd `systemd-start`; a systemd `ExecStart` drop-in is
INERT (appendix-A: D-037, L-P6-1/L-P6-2). The sanctioned extension point is
`/etc/default/magnum-conductor`, sourced inside the init script AFTER the base
`--config-file` is assembled. The charm does not manage that file.

```bash
# confirm the daemon currently has NO --config-dir (the problem we are fixing)
juju ssh -m openstack magnum/0 'ps -ww -C magnum-conductor -o args=' </dev/null

# create the per-service extension (literal $DAEMON_ARGS -- it expands at source time)
juju ssh -m openstack magnum/0 \
  "echo 'DAEMON_ARGS=\"\$DAEMON_ARGS --config-dir /etc/magnum/magnum.conf.d\"' \
   | sudo tee /etc/default/magnum-conductor >/dev/null && \
   sudo chmod 0644 /etc/default/magnum-conductor" </dev/null

# DRY-RUN verify WITHOUT restarting: the init script's own show-args echoes the assembled cmdline
juju ssh -m openstack magnum/0 '/etc/init.d/magnum-conductor show-args' </dev/null
```
GATE: `show-args` must show BOTH `--config-file=/etc/magnum/magnum.conf` AND
`--config-dir /etc/magnum/magnum.conf.d`. Do not restart until this passes.
RESIDUAL (logged): if a future charm hook ever writes /etc/default/magnum-conductor,
the append is lost and [capi_helm] silently stops being read -- detect via show-args/ps.
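
The append mechanism and the gate can be rehearsed locally. This sketch simulates the init script with two lines (set the base args, then source the defaults file); the real init script does more, and `has_both_flags` is a hypothetical helper for checking a `show-args` or `ps` line.

```bash
# what the init script has assembled before it sources /etc/default/<service>
DAEMON_ARGS="--config-file=/etc/magnum/magnum.conf"

# a scratch copy of the one-line extension file from this step
defaults_file=$(mktemp)
printf '%s\n' 'DAEMON_ARGS="$DAEMON_ARGS --config-dir /etc/magnum/magnum.conf.d"' > "$defaults_file"

# sourcing expands the literal $DAEMON_ARGS and appends --config-dir
. "$defaults_file"
rm -f "$defaults_file"
echo "$DAEMON_ARGS"

# has_both_flags CMDLINE: the Step 7.7 gate as a predicate
has_both_flags() {
  case "$1" in *"--config-file=/etc/magnum/magnum.conf"*) ;; *) return 1 ;; esac
  case "$1" in *"--config-dir /etc/magnum/magnum.conf.d"*) ;; *) return 1 ;; esac
  return 0
}

has_both_flags "$DAEMON_ARGS" && echo "GATE OK" || echo "GATE FAIL"
```

The same predicate can be fed the live `ps -ww -C magnum-conductor -o args=` output later to detect the residual case where the file was overwritten.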

## Step 7.8 -- Restart conductor + verify driver + HEALTHY (P6e + D-042 Stage 6)
`# RUN: jumphost -> magnum/0`, then jumphost health poll.

```bash
juju ssh -m openstack magnum/0 \
  'sudo systemctl restart magnum-conductor && sleep 3 && systemctl is-active magnum-conductor && \
   ps -ww -C magnum-conductor -o args=' </dev/null
# expect: active; live cmdline carries --config-dir.

juju ssh -m openstack magnum/0 'sudo magnum-driver-manage list-drivers 2>/dev/null | grep capi || \
   { echo "driver list (full):"; sudo magnum-driver-manage list-drivers; }' </dev/null
# expect: k8s_capi_helm_v1 listed.
```
Health poll (the D-042 fix target -- this is what 1.3.0 reported UNHEALTHY):

FRESH DEPLOY ROUTING: on a clean redeploy NO cluster exists yet, so there is nothing
to poll -- SKIP this poll; the gate is discharged in phase-08 step 8.2
(`capi-test-1` reaching `health_status = HEALTHY`). The poll below applies when
grafting onto a cloud that already has a CAPI-driver cluster: substitute that
cluster's name and the current `ENV(project)` id (both are run-specific).
```bash
( {
  source ~/admin-openrc
  unset OS_PROJECT_NAME OS_PROJECT_ID OS_TENANT_NAME OS_TENANT_ID
  export OS_PROJECT_ID=674171fd28d446d3a37073b6a761e910       # ENV(project)
  for i in $(seq 1 10); do
    echo "[$i] health=$(openstack coe cluster show capi-test-1 -f value -c health_status 2>/dev/null)"
    echo "    reason=$(openstack coe cluster show capi-test-1 -f value -c health_status_reason 2>/dev/null)"
    sleep 20
  done
} )
```
GATE (existing-cluster graft only): `health_status -> HEALTHY`, with the
`infrastructure` sub-check now `Ready` (it was the only failing axis under 1.3.0).
On a FRESH DEPLOY this gate is deferred to phase-08 step 8.2 -- do not block here.
If it does not clear on an existing-cluster graft, go to Rollback.
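
The poll above always runs its full ten iterations and leaves the verdict to the reader. A variant that stops at the first `HEALTHY` and returns a status the caller can branch on might look like this; `poll_health`, `POLL_TRIES` and `POLL_SLEEP` are hypothetical names, and the defaults simply mirror the 10 x 20 s loop.

```bash
# poll_health CMD [ARGS...]
# Runs CMD until it prints HEALTHY or POLL_TRIES attempts are used up.
# Returns 0 on HEALTHY, 1 otherwise.
poll_health() {
  tries=${POLL_TRIES:-10}
  pause=${POLL_SLEEP:-20}
  i=1
  while [ "$i" -le "$tries" ]; do
    status=$("$@" 2>/dev/null)
    echo "[$i] health=${status:-<none>}"
    [ "$status" = "HEALTHY" ] && return 0
    # no point sleeping after the final attempt
    [ "$i" -lt "$tries" ] && sleep "$pause"
    i=$((i + 1))
  done
  return 1
}
```

With the project scope exported as in the block above, a call such as `poll_health openstack coe cluster show capi-test-1 -f value -c health_status || echo "not HEALTHY -> Rollback"` would apply the gate.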

## Step 7.9 -- Regression check (confirm create/manage path intact)
`# RUN: jumphost` (capi-mgmt scope). Prove the upgraded driver still creates+deletes.

FRESH DEPLOY ROUTING: SKIP this step -- the `capi-k8s-v1-32` template does not exist
yet (phase-08 step 8.0 creates it), and phase-08 itself (create `capi-test-1` to
CREATE_COMPLETE, full acceptance, then 8.5 delete) is a superset of this check. Run
7.9 as written only when grafting onto an existing cloud where the template is present.
```bash
openstack coe cluster create capi-fix-check --cluster-template capi-k8s-v1-32 \
  --keypair capi-mgmt-key --master-count 1 --node-count 1
# watch to CREATE_COMPLETE, then:
openstack coe cluster delete capi-fix-check    # watch to gone
```

## Rollback (TEMPORARY holding state only -- if 7.8 health does not clear or 7.9 regresses)
`# RUN: jumphost -> magnum/0`  Reverts to the as-first-built functional
(cosmetic-UNHEALTHY) state on 1.3.0 -- a TEMPORARY holding state to keep the conductor
serving while the 1.4.0 issue is diagnosed, NOT a v1 end state. v1 is NOT complete until
`magnum-capi-helm==1.4.0` is installed and `health_status = HEALTHY` (D-011). Re-attempt
7.3-7.9 after diagnosis.
```bash
juju ssh -m openstack magnum/0 'sudo python3 -m pip install --no-deps --force-reinstall "magnum-capi-helm==1.3.0"' </dev/null
# restore the config backup if you snapshotted one, then:
juju ssh -m openstack magnum/0 'sudo systemctl restart magnum-conductor' </dev/null
```
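
The rollback mentions a config backup, but no step in this phase takes one. A minimal sketch of a snapshot helper that could be run on the conductor before Steps 7.6 and 7.7; `snapshot_file` and the `.pre-graft.<timestamp>` suffix are hypothetical, not an existing convention of this build.

```bash
# snapshot_file FILE
# Copies FILE to FILE.pre-graft.<UTC timestamp>, preserving mode and owner
# where permitted, and prints the backup path. A missing FILE is not an error
# (on a first graft the drop-in does not exist yet).
snapshot_file() {
  [ -f "$1" ] || { echo "skip: $1 does not exist" >&2; return 0; }
  backup="$1.pre-graft.$(date -u +%Y%m%dT%H%M%SZ)"
  cp -p "$1" "$backup" && echo "$backup"
}
```

Candidates are `/etc/magnum/magnum.conf.d/00-capi-helm.conf` and `/etc/default/magnum-conductor`. Because the suffix means the backup name no longer ends in `.conf`, it should not be picked up from the config-dir, which to my understanding only reads `*.conf` files; confirm with `show-args` and the live log if in doubt.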

---

## EXIT GATE (phase-07 complete)
- Conductor reaches the mgmt apiserver via the FIP (TCP-OK); kubeconfig 0600/magnum; helm list OK.
- magnum-capi-helm 1.4.0 installed (contract-coherent, RELEASED); `k8s_capi_helm_v1` enumerated.
- [capi_helm] drop-in read by the conductor (`--config-dir` present in the live cmdline).
- `health_status = HEALTHY` (infrastructure Ready) on a CAPI-driver cluster -- D-042
  issue eliminated. FRESH DEPLOY: no cluster exists yet; this item is DEFERRED to
  phase-08 step 8.2 (existing-cluster graft: verify here on that cluster).
- Regression create/delete passed (FRESH DEPLOY: deferred -- phase-08 8.1-8.5 is the
  superset proof).
- Proceed to phase-08 (workload-cluster acceptance + D-011).

## As-built reference (2026-06-08/09 graft -- audit trail)
- magnum/0: LXD 1/lxd/2 on openstack1, addr 10.12.4.76, charm magnum 2024.1/stable rev 70,
  DEB magnum 18.0.1, python3.10, container ubuntu 22.04; conductor user `magnum`.
- As-FIRST-built driver: 1.3.0 (pip --no-deps) -> read the version-less v1beta2 ref -> health UNHEALTHY (D-042).
  PHASE-07 BASELINE supersedes this with the RELEASED magnum-capi-helm==1.4.0 (api_resources; default v1beta1).
- kubeconfig: /etc/magnum/kubeconfig, -rw------- magnum, ~5657 bytes, server = FIP 10.12.7.40:6443.
- conf.d drop-in /etc/magnum/magnum.conf.d/00-capi-helm.conf: kubeconfig_file, helm_chart_repo
  (azimuth), helm_chart_name openstack-cluster, default_helm_chart_version 0.25.1 (api_resources
  left default -- v1beta1 served by CAPI v1.13.2 / CAPO v0.14.4).
- config-dir injection: /etc/default/magnum-conductor `DAEMON_ARGS="$DAEMON_ARGS --config-dir
  /etc/magnum/magnum.conf.d"`; verified live via `ps` and the init script `show-args`.
- helm v3.17.3 at /usr/local/bin/helm.
- Driver internals (reference, from installed source): routes on (server_type vm, os ubuntu,
  coe kubernetes); k8s version comes from the IMAGE `kube_version` property (NOT a template label),
  os_distro=ubuntu; flavor floor 2048 MB / 2 vCPU; auto-mints an app credential (workload nodes use
  the PUBLIC keystone interface); apiServer ALWAYS provisions an Octavia LB (+FIP default).

## Next
phase-08 -- workload-cluster acceptance: create a tenant cluster from template
`capi-k8s-v1-32`, confirm CREATE_COMPLETE + Ready nodes + Calico + LB, and run the
D-011 (amended per D-019) acceptance criteria.
