# Design Decisions — VR0 DC0 Omega Cloud

This document is the architectural record for the VR0 DC0 testcloud rebuild. Every decision listed here was discussed and made deliberately; this document is not a wishlist or a brainstorm. If a decision is changed, this document is updated and the change is committed with a referencing message.

**Scope split:** This repository implements **v1 (IPv4-only)**. Several decisions below are tagged with **[v2-scope]** — they remain valid design intent but are deferred to a future v2 deployment when upstream router infrastructure supports IPv6. See **D-015** for the v1/v2 fork record.

---


## D-001: Deployment target paradigm

**Decision:** Path 2A — Charmed OpenStack Caracal (2024.1) via Juju bundle.

**Alternatives considered:**

- Path 2B — Canonical Sunbeam (microk8s-based). Rejected: discards most of the test-cloud experience accumulated to date; different operator paradigm.
- Path 1 — Stay on Bobcat 2023.2. Rejected: defeats the purpose of a Caracal rehearsal ahead of Roosevelt bare-metal.


**Consequences:**

- Bundle-based deployment, suitable for both KVM testcloud and bare-metal scale.
- Caracal-stable channel matrix applies (see D-002).
- EOL date is April 2027 (Caracal upstream support window).

---


## D-002: Channel pinning matrix

**Decision:** Pin every charm to a Caracal-stable channel. No OVN pinning on testcloud (Roosevelt will pin via `ovn-source`).

| Charm group                                                                                                             | Channel                    |
| ----------------------------------------------------------------------------------------------------------------------- | -------------------------- |
| OpenStack core (keystone, glance, nova-\*, neutron-api, cinder, placement, octavia, barbican, designate, magnum)        | `2024.1/stable`            |
| OVN (ovn-central, ovn-chassis, ovn-dedicated-chassis-octavia)                                                           | `24.03/stable`             |
| Ceph (ceph-mon, ceph-osd, ceph-radosgw if used)                                                                         | `squid/stable` (see D-005) |
| MySQL (mysql-innodb-cluster, mysql-router subordinates)                                                                 | `8.0/stable`               |
| RabbitMQ                                                                                                                | `3.9/stable`               |
| Vault                                                                                                                   | `1.8/stable`               |
| hacluster                                                                                                               | `2.4/stable`               |
| etcd, easyrsa                                                                                                           | `latest/stable`            |


**Verification source:** Caracal channel matrix per Canonical Charmed OpenStack Charm Delivery table, verified against Charmhub 2026-05-22 (keystone rev 778, ovn-central rev 332, hacluster `2.4/stable`). Canonical advises against `latest/stable` for OS-stack charms due to non-determinism on charm upgrades; hacluster is therefore pinned to its stable channel. Verify against Charmhub before each deploy via `scripts/pre-flight-checks.sh`.
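
The matrix above lends itself to a mechanical check. The following is a minimal sketch in Python, assuming the bundle's `applications` section has already been loaded into a dict. `EXPECTED_CHANNELS`, `channel_mismatches`, and the sample bundle (including the `keystone-hacluster` application name) are hypothetical names for illustration, not the contents of `scripts/pre-flight-checks.sh`.

```python
# Expected channels copied from the D-002 table (one representative charm per row).
EXPECTED_CHANNELS = {
    "keystone": "2024.1/stable",
    "ovn-central": "24.03/stable",
    "ceph-mon": "squid/stable",
    "mysql-innodb-cluster": "8.0/stable",
    "rabbitmq-server": "3.9/stable",
    "vault": "1.8/stable",
    "hacluster": "2.4/stable",
}


def channel_mismatches(applications):
    """Return {app: (actual, expected)} for every app whose channel is off-matrix."""
    mismatches = {}
    for name, spec in applications.items():
        # Fall back to the application name when no explicit charm is given.
        expected = EXPECTED_CHANNELS.get(spec.get("charm", name))
        if expected is not None and spec.get("channel") != expected:
            mismatches[name] = (spec.get("channel"), expected)
    return mismatches


# Hypothetical bundle fragment: the hacluster subordinate drifted to latest/stable.
sample_bundle = {
    "keystone": {"charm": "keystone", "channel": "2024.1/stable"},
    "keystone-hacluster": {"charm": "hacluster", "channel": "latest/stable"},
    "vault": {"charm": "vault", "channel": "1.8/stable"},
}

print(channel_mismatches(sample_bundle))
```

Charms absent from `EXPECTED_CHANNELS` are skipped rather than flagged, so the table would need to be completed before a check like this could gate a deploy.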

---


## D-003: Network architecture — Option B

**Decision:** Provider network carries BOTH ext_net (tenant FIPs + SNAT egress) AND OpenStack public API VIPs on the same L2 segment.

**Rationale:** During Magnum CAPI Phase 3 on the Bobcat testcloud, OCCM crashloop was traced to tenant networks being unable to reach OpenStack API endpoints — the libvirt FORWARD chain rejected cross-bridge packets between provider (virbr1) and metal (virbr2) bridges. With API VIPs on metal, tenant workloads cannot reach them. Putting API VIPs on the same network as the FIPs makes the API path tenant-reachable by construction.

**Address space layout for v1 (IPv4-only):**

| Range                       | Purpose                                                  |
| --------------------------- | -------------------------------------------------------- |
| `10.12.4.10 – 10.12.4.223`  | Neutron FIP pool                                         |
| `10.12.4.224 – 10.12.4.254` | Charm API VIPs (excluded from Neutron allocation\_pools) |


The Provider `/22` (`10.12.4.0/22`) carries both ranges within a single Neutron subnet. Neutron `allocation_pools` MUST exclude the API VIP range.
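
The exclusion rule can be checked with the standard-library `ipaddress` module. This is a sketch under the address layout in the table above; the function names are illustrative and nothing here talks to Neutron.

```python
import ipaddress

PROVIDER = ipaddress.ip_network("10.12.4.0/22")
# (first, last) address pairs taken from the v1 layout table.
FIP_POOL = (ipaddress.ip_address("10.12.4.10"), ipaddress.ip_address("10.12.4.223"))
API_VIPS = (ipaddress.ip_address("10.12.4.224"), ipaddress.ip_address("10.12.4.254"))


def in_subnet(rng, network):
    """True when both ends of the range sit inside the network."""
    return rng[0] in network and rng[1] in network


def overlaps(a, b):
    """True when two inclusive (first, last) ranges share any address."""
    return a[0] <= b[1] and b[0] <= a[1]


def check_layout(allocation_pool, vip_range, network):
    """Raise ValueError if the pool leaves the subnet or touches the VIP range."""
    if not (in_subnet(allocation_pool, network) and in_subnet(vip_range, network)):
        raise ValueError("range falls outside the provider subnet")
    if overlaps(allocation_pool, vip_range):
        raise ValueError("allocation pool overlaps the API VIP range")
    return True


print(check_layout(FIP_POOL, API_VIPS, PROVIDER))
```

An allocation pool ending at `10.12.4.230`, for example, would raise because it reaches into the VIP range.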

**v2-scope extension:** IPv6 Provider subnet adds parallel FIP and API VIP IPv6 IP Ranges within a single `/64`. See D-004.

---


## D-004 [v2-scope]: Dual-stack vs IPv6-only matrix

**Decision (v2-scope, NOT for v1):** Network role determines address family. IPv6 preferred; IPv6-only where the network has no external clients.

**v1 reality:** All networks are IPv4-only on the existing MAAS-provisioned layout. This matrix becomes active in v2.

| Role                       | IPv4 (v1)     | IPv4 (v2) | IPv6 (v2) | Reasoning                                           |
| -------------------------- | ------------- | --------- | --------- | --------------------------------------------------- |
| Metal                      | ✓             | ✓         | ✓         | Charm-to-charm; MAAS PXE IPv4-first                 |
| Provider                   | ✓             | ✓         | ✓         | Tenant FIPs need IPv4; API VIPs reachable from both |
| Data (Geneve underlay)     | ✓             | —         | ✓         | v2: no external clients; underlay agnostic          |
| Storage (Ceph public)      | ✓             | —         | ✓         | v2: `ms-bind-ipv6: true`; no external clients       |
| Replication (Ceph cluster) | ✓             | —         | ✓         | v2: internal OSD↔OSD only                           |
| LBaaS Management           | ✓             | ✓         | ✓         | Amphora image compatibility                         |
| OOB                        | n/a           | n/a       | n/a       | Bare-metal-only concern                             |
| OpenStack Tenant pool      | ✓ (v1: D-016) | —         | ✓         | v1 IPv4 hybrid; v2 IPv6 modeled                     |

---


## D-004a [v2-scope]: Host management → Metal

**Decision (v2-scope, NOT for v1):** Under v2, openstack0-3 host management IPs move from storage (`10.12.16.40-.43`) to Metal (`10.12.8.0/22`) when Storage becomes IPv6-only. v1 keeps host management on storage.

---


## D-005: Ceph release

**Decision:** Squid (Ceph 19, released October 2024).

**Rationale:** Matches Caracal default; one fewer source override in bundle; rehearses what Roosevelt will run. If Squid has rough edges, the testcloud is the place to find them, not production.

**Alternatives considered:**

- Reef (Ceph 18) — current on Bobcat testcloud; lower risk; would require `source: cloud:jammy-caracal` override on ceph-mon/ceph-osd while keeping `reef/stable` channel. Rejected: defeats the rehearsal purpose.

---


## D-006: Vault HA backend

**Decision:** etcd + easyrsa, per Canonical Charmed Vault HA docs.

**Rationale:** This is the documented charm path. The chicken-and-egg TLS dependency (Vault needs certs to start, but Vault issues certs) is resolved by easyrsa bootstrapping the etcd cluster's TLS, after which Vault relations to etcd come up cleanly.

**Topology on testcloud (v1):** Vault num_units=1 + hacluster relation (decorative; documents the relation pattern). Vault HA quorum is not actually exercised at testcloud scale.

**Topology on Roosevelt:** Vault num_units=3 + hacluster on metal space; etcd num_units=3; easyrsa num_units=1.

---


## D-007: Magnum inclusion

**Decision:** Magnum in bundle from day one. Two-layer install.

**Layer A — Bundle:**

- `magnum` charm
- `magnum-mysql-router` subordinate
- `magnum-dashboard` subordinate
- Standard relations: keystone, mysql-innodb-cluster (via router), rabbitmq-server, vault (certificates), openstack-dashboard
- Binding: `public: provider` with VIP on provider API VIP range
- Hacluster relation included (decorative on testcloud)


**Layer B — Post-deploy runbook (`runbooks/05-magnum-capi-driver.md`):**

- `juju run magnum/leader domain-setup --wait=10m`
- pip install `magnum-capi-helm==1.1.0` from PyPI into the magnum charm venv with `--break-system-packages`. The `stackhpc/magnum-capi-helm` fork was archived in Dec 2024 and the canonical project moved to `openstack/magnum-capi-helm` on opendev/PyPI; 1.1.0 is the last Caracal-cycle release. Upstream tests against Magnum 2023.1 and later, which covers Caracal (2024.1).
- Deploy `/etc/magnum/kubeconfig` pointing at the **workload cluster** (the post-pivot home of CAPI controllers per **runbook 04a §17** `clusterctl move`). Staged on jumphost at `$HOME/magnum-capi/capi-mgmt-cluster.kubeconfig` by runbook 04a §19, transferred to the magnum unit by runbook 05 §6. Bobcat had this pointing at bootstrap k3s because the pivot was never executed; workstream 3b (2026-05-22) made the pivot mandatory.
- Systemd override replacing init.d ExecStart to load `--config-dir`
- `/etc/magnum/magnum.conf.d/99-capi.conf` setting `enabled_drivers=k8s_capi_helm_v1` and `[capi_helm] kubeconfig_file=/etc/magnum/kubeconfig` (ASCII-only; non-ASCII characters in conf.d cause silent daemon failures)


**CAPI mgmt plane:** Post-pivot, the workload cluster IS the CAPI management plane (per **runbook 04a §17**, `clusterctl move` pivots cluster state from the `capi-mgmt.maas` bootstrap k3s into the workload cluster, which becomes self-managing). Per **D-017**, both the bootstrap k3s and the workload cluster are rebuilt from scratch every deployment cycle; no artifact is preserved across rebuilds. The bootstrap install + pivot procedure lives in `runbooks/04a-capi-bootstrap-cluster.md` and runs **before** `runbooks/05-magnum-capi-driver.md`. This pattern transfers to Roosevelt unchanged.

**Superseded portions:** The "preserved across rebuild" stance in earlier drafts of this decision is **superseded by D-017**. See D-017 for rationale. The earlier `stackhpc/magnum-capi-helm` v0.13.0 driver pin is superseded by the `openstack/magnum-capi-helm` 1.1.0 pin above (driver source repo moved + archived).

---


## D-008: DNS architecture

**Decision:** Layered — static /etc/hosts for bootstrap + Designate (in bundle from day one) for tenant-level resolution.

**Naming convention:**

```
<service>.<cloud>.<dc>.<region>.cloud.neumatrix.local
```

Examples:

- `keystone.omega.dc0.vr0.cloud.neumatrix.local`
- `nova.omega.dc0.vr0.cloud.neumatrix.local`
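
The naming convention can be expressed as a small helper. This is an illustrative sketch: `service_fqdn` and its defaults are hypothetical and only mirror the pattern and examples above.

```python
import re

# One DNS label: lowercase letters, digits, and interior hyphens.
_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")


def service_fqdn(service, cloud="omega", dc="dc0", region="vr0"):
    """Build <service>.<cloud>.<dc>.<region>.cloud.neumatrix.local."""
    labels = [service, cloud, dc, region]
    for label in labels:
        if not _LABEL.fullmatch(label):
            raise ValueError(f"invalid DNS label: {label!r}")
    return ".".join(labels + ["cloud", "neumatrix", "local"])


print(service_fqdn("keystone"))
```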


**Bootstrap order:**

1. Static `/etc/hosts` on jumphost + all openstack0-3 hosts + all LXD containers
2. Bundle deploys with `os-public-hostname: <fqdn>` per API charm
3. Vault issues certs with FQDN in SAN
4. Post-deploy: Designate zone created, A records populated (v1: A records only; v2 adds AAAA records)
5. Neutron `default_dns_domain` and `dns_servers` configured to point at Designate
6. Tenant subnets created with `--dns-nameserver <designate-vip>`

---


## D-009: Hacluster modeling at testcloud scale

**Decision:** Include hacluster + VIP relations at num_units=1 across all HA-eligible API charms.

**Rationale:** Decorative at testcloud scale (a single unit can't form a real HA quorum). Documents the relation pattern so Roosevelt scale-up is mechanical: change `num_units: 1` → `num_units: 3` and rerun.

**Charms with hacluster relation:** keystone, glance, neutron-api, nova-cloud-controller, placement, openstack-dashboard, cinder, octavia, barbican, magnum, vault, designate.

---


## D-010: NetBox-upstream policy

**Decision:** NetBox is the single source of truth for IPAM at the **role and cloud-level pool** layer. Per-project tenant subnets are exempt under the hybrid model (D-016).

**Workflow:** Update NetBox → update bundle/overlay → commit both with cross-reference.

**Standing imports for v1 (gating the bundle):**

- VR0 DC0 site exists in NetBox ✓
- IPv4 prefixes for v1: Metal /22, Provider /22, LBaaS Mgmt /22 (via `netbox/ipv4-prefixes-import.py`) — **pending**
- Provider IP Ranges for FIPs and API VIPs (same script) — **pending**
- IPv4 tenant pool /16 (same script, per D-016) — **pending**
- IPv6 entries marked as Reservation status (via `netbox/ipv6-mark-reserved.py`) — **pending**


**Deferred to v2 (per Q2):** VR0 DC0-VLANs group additions beyond VID 240 (already imported during prior session work). MAAS currently uses untagged-per-fabric; modeling additional VLANs in NetBox without corresponding network-side tagging would be misleading documentation.

---


## D-011: Validation bar — Roosevelt-rehearsal level

**Decision:** Deployment is not considered successful until all of the following pass:

1. All charms `active/idle` in `juju status`
2. API reachability from jumphost (all public VIPs respond on hostname)
3. API reachability from a tenant VM (Option B verification)
4. Octavia LB pattern re-passes (round-robin, failover, recovery — per Bobcat v3 work)
5. End-to-end Magnum CAPI cluster creation succeeds, including OCCM not crash-looping
6. Vault unseal + auto-unseal-after-reboot pattern verified
7. KVM snapshot baseline taken (Phase 5)
8. Designate zones populated and tenant VMs resolve API hostnames (criterion dropped for v1 per D-019)


Validation script: `scripts/validate.sh` (TBD).
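
Criterion 1 is the easiest to automate. The sketch below assumes the output of `juju status --format=json` has been parsed into a dict and that units expose `workload-status` and `juju-status` with a `current` field, with subordinates nested under their principal unit. Verify that layout against real output before relying on it; the function name and sample data are hypothetical and this is not the contents of `scripts/validate.sh`.

```python
def _collect(units, bad):
    """Walk units (and nested subordinates), recording any not active/idle."""
    for name, unit in units.items():
        workload = unit.get("workload-status", {}).get("current")
        agent = unit.get("juju-status", {}).get("current")
        if (workload, agent) != ("active", "idle"):
            bad.append((name, workload, agent))
        _collect(unit.get("subordinates", {}), bad)


def units_not_ready(status):
    """Return a sorted list of (unit, workload, agent) tuples that fail criterion 1."""
    bad = []
    for app in status.get("applications", {}).values():
        _collect(app.get("units", {}), bad)
    return sorted(bad)


# Hypothetical status fragment with one blocked subordinate.
sample_status = {
    "applications": {
        "keystone": {
            "units": {
                "keystone/0": {
                    "workload-status": {"current": "active"},
                    "juju-status": {"current": "idle"},
                    "subordinates": {
                        "keystone-hacluster/0": {
                            "workload-status": {"current": "blocked"},
                            "juju-status": {"current": "idle"},
                        }
                    },
                }
            }
        }
    }
}

print(units_not_ready(sample_status))
```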

---


## D-012: Snapshot strategy

**Decision:** Two baseline snapshots.

- **Snapshot 1:** Post-deploy, post-validation, pre-tenant-resources. Clean cloud state — what a fresh install looks like.
- **Snapshot 2:** Post-tenant-setup. Includes domain1, project1, user1, openrc, flavors, base images (noble-amd64), keypair. Restore point for tenant work.


Snapshots are taken per VM at the KVM/qcow2 level on the jumphost hypervisor.

---


## D-013: ~~Clean teardown of existing capi-mgmt~~ (SUPERSEDED by D-018)

**Original decision:** Before destroying the OpenStack model, gracefully delete the CAPI workload cluster on capi-mgmt.maas to allow OpenStack resources (LBs, FIPs, volumes) to be cleaned up properly by CAPI controllers.

**Original steps:** `kubectl delete cluster capi-mgmt-cluster` → wait for CAPI to clean up tenant-side OpenStack resources → `juju destroy-model openstack --destroy-storage --no-prompt`.

**Original "preserved across rebuild" claim:** capi-mgmt.maas bootstrap k3s + CAPI controllers re-used as the Magnum CAPI mgmt plane post-deploy.

**Status:** Superseded. See **D-018** for the replacement teardown strategy (MAAS-release-direct, skip graceful) and **D-017** for the replacement bootstrap cluster lifecycle (full rebuild every cycle, nothing preserved).

---


## D-014: Repository storage location and naming

**Decision:** Self-hosted GitBucket at `git.baldurkeep.com`.

**Repo path:** `jesse.austin/openstack-caracal-ipv4` (v1; IPv4-only).

**v2 repository:** TBD when v2 work begins. Two viable paths: sibling repo `openstack-caracal-ipv6` or `openstack-caracal-dualstack`, OR `v2` branch in this repo with an `overlays/v2-dualstack.yaml`. The single-repo-with-branch approach preserves history of what changed v1→v2 together; the sibling-repo approach keeps v1 frozen as a reference once v2 is in motion.

**Branching strategy:** `main` is canonical. Per-phase work in feature branches when a deploy is in progress; merge back to `main` at successful validation.

---


## D-015: v1 / v2 Fork

**Decision:** Caracal testcloud ships in two iterations.

**v1 (this repository, `openstack-caracal-ipv4`):** IPv4-only Caracal on existing MAAS-provisioned network layout. Proves the bundle, Option B binding fix, Magnum CAPI graft, Designate-from-day-one, hacluster relation pattern, and validation framework. Ships first.

**v2 (deferred):** Adds IPv6 / dual-stack per D-004. Requires upstream router infrastructure to be IPv6-capable, which is not currently the case in this environment. v2 work begins after v1 validation passes AND router-side IPv6 is in place.

**Rationale:** Decoupling the OpenStack-side rebuild from the network-side IPv6 readiness lets us prove the more-important architectural fix (Option B) without waiting on infrastructure work outside the OpenStack deployment's control. The IPv6 design intent is preserved as NetBox Reservation-status entries (per D-010 and `netbox/ipv6-mark-reserved.py`).

**v1→v2 migration scope (forward-look):**

- Re-IP roles per D-004 (add IPv6 sibling to Metal/Provider/LBaaS; move Data/Storage/Replication to IPv6-only)
- Move host management IPs from storage to Metal (D-004a)
- Re-bind charms to listen on both families where dual-stack
- Add AAAA records to Designate zones
- Add tenant IPv6 pool carve-outs

---


## D-016: IPv4 tenant pool — hybrid model (v1)

**Decision:** NetBox owns one upstream IPv4 tenant pool prefix for VR0 DC0. Per-project tenant subnets are Neutron-managed within that pool and are NOT modeled in NetBox.

**Pool allocation:** `10.20.0.0/16` (default; configurable in `netbox/ipv4-prefixes-import.py`). 65,536 addresses; 256 `/24`s available for per-project tenant subnets. Modeled under VR0 DC0 with role `openstack-tenant`.

**Per-project allocation pattern (operationally):**

When a project is created, allocate a /24 from the pool. The operator records the allocation in the tenant-setup runbook output but does NOT create a NetBox prefix entry for it. Suggested convention: `10.20.<project-index>.0/24`, starting with `10.20.1.0/24` for project1.

**Rationale (Option C from the discussion):**

- Option A (NetBox-modeled per-project) — full IPAM rigor; high friction for tenant lifecycle; round-trips to NetBox for ephemeral tenants.
- Option B (Neutron-only, no NetBox standing) — minimum friction; loses upstream visibility of total tenant footprint; violates D-010 in spirit.
- Option C (hybrid, chosen) — NetBox documents what space is reserved for tenants and prevents accidental collision with infra ranges; Neutron owns the lifecycle of individual tenant subnets without NetBox round-trips.


**Constraint:** Tenant CIDRs MUST be within the pool. The pre-flight checklist (`scripts/pre-flight-checks.sh`) should assert that proposed tenant subnets fall within the modeled pool.
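
A minimal sketch of both the suggested per-project convention and the containment constraint, using the standard-library `ipaddress` module. The function names are hypothetical; this is not the actual content of `scripts/pre-flight-checks.sh`.

```python
import ipaddress

TENANT_POOL = ipaddress.ip_network("10.20.0.0/16")


def project_subnet(project_index, pool=TENANT_POOL):
    """Return the /24 for a project, following 10.20.<project-index>.0/24."""
    subnets = list(pool.subnets(new_prefix=24))
    if not 0 <= project_index < len(subnets):
        raise ValueError(f"project index {project_index} does not fit in {pool}")
    return subnets[project_index]


def assert_in_pool(cidr, pool=TENANT_POOL):
    """Raise ValueError unless the proposed tenant CIDR lies inside the pool."""
    net = ipaddress.ip_network(cidr)
    if not net.subnet_of(pool):
        raise ValueError(f"{net} is outside the tenant pool {pool}")
    return net


print(project_subnet(1))               # /24 for project1
print(assert_in_pool("10.20.7.0/24"))  # accepted: inside the pool
```

A proposed subnet such as `10.12.4.0/24`, which collides with the Provider range, would be rejected by `assert_in_pool`.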

**v2-scope counterpart:** IPv6 tenant pool `2602:f3e2:ff:0::/56` (NetBox-modeled, Reservation status in v1) becomes active in v2 with the same hybrid model — pool has NetBox standing, per-project IPv6 subnets Neutron-managed.

---


## D-017: CAPI bootstrap cluster lifecycle

**Decision:** L3 full teardown and rebuild every deployment cycle. The `capi-mgmt.maas` MAAS VM is released back to Ready state on teardown; on rebuild, it is re-deployed from scratch with Ubuntu 24.04, k3s, CAPI controllers, and ORC. **Nothing is preserved across cycles.**

**Rationale:**

- Rehearsal-first principle. If the bootstrap-cluster install procedure isn't documented and rehearsed, the runbook doesn't exist; if the runbook doesn't exist, surprises surface on Roosevelt.
- Self-imposed forcing function. Every rebuild exercises the full path: MAAS deploy → Ubuntu cloud-init → Vault CA install → k3s install with correct bind-address/SAN flags → kubeconfig server-URL rewrite → helm + clusterctl install → clusterctl init with canonical-kubernetes provider URLs → ORC install → cloud-side prep → cluster manifest render → apply → poll-to-Ready → kubeconfig copy.
- Disposability test. The Bobcat experience proved no critical state lives on capi-mgmt that isn't reproducible from the runbook and the OpenStack cloud. Wiping is safe.


**Runbook:** `runbooks/04a-capi-bootstrap-cluster.md` documents the install sequence in full. It runs **after** `02-deploy.md` (OpenStack cloud up) and **before** `05-magnum-capi-driver.md` (driver graft, which needs the kubeconfig that runbook 04a stages; per D-007 that kubeconfig points at the post-pivot workload cluster, not the bootstrap k3s).

**Supersedes:** the "preserved across rebuild" stance in earlier drafts of D-007 and D-013.

**Alternatives considered:**

- L1: Wipe just the cluster CRs, keep k3s + controllers. Rejected: skips the install rehearsal that's the whole point.
- L2: Wipe just the controllers, keep k3s. Rejected: same reason; the `clusterctl init` step is exactly the surface that needs rehearsing.
- L3 (chosen): Full wipe including the VM.

---


## D-018: Teardown strategy — skip graceful, release MAAS directly

**Decision:** On teardown, do not pursue graceful CAPI workload deletion or graceful OpenStack model destroy. Instead:

1. (Optional) Capture pre-destroy state for reference
2. `juju destroy-model openstack --force --no-wait --destroy-storage --no-prompt` (background)
3. MAAS release all 5 VMs (openstack0, openstack1, openstack2, openstack3, capi-mgmt) → Ready (parallel)
4. Verify both sides


**Rationale:**

- The rebuild's goal is rehearsing the Roosevelt deploy path. Roosevelt starts from MAAS-Ready bare-metal machines. The most faithful rehearsal is teardown-to-MAAS-Ready.
- Graceful CAPI workload teardown rehearses a different procedure (production cluster decommissioning) that doesn't transfer to Roosevelt's initial deploy.
- `juju destroy-model --destroy-storage` can hang on stuck hooks and leave partial state. `--force --no-wait` plus MAAS release is more reliable.
- Cloud-side OpenStack data (Keystone projects, Neutron networks, Glance images, app credentials) lives in MySQL on the openstack0-3 hosts. MAAS release wipes those hosts, so no separate cloud-side cleanup is needed.


**What is lost vs. graceful path:** verified-clean release path for CAPI workload resources (Octavia LBs, FIPs, CAPO-managed networks). All of these are destined for obliteration anyway; the loss is theoretical.

**What is gained:** 30+ minutes saved; cleaner end-state guarantee; better Roosevelt rehearsal fidelity.

**Supersedes:** D-013.

**Runbook:** `runbooks/01-destroy-model.md` documents the four phases.

---


## D-019: Cloud DNS (Designate) deferred to v2 / Roosevelt

**Decision:** v1 ships with NO cloud-internal DNS; Designate is not deployed. Public service endpoints use FQDNs (`os-public-hostname`) that resolve to the provider VIPs via external/corporate DNS; internal and admin endpoints stay IP-based on the metal VIPs. Tenant instances use upstream resolvers (1.1.1.1 / 1.0.0.1). The D-011 acceptance bar is amended to drop the cloud-DNS criterion, and the planned `v1-do-doc-10-dns` runbook is dropped.

**Consequence (documented, not a blocker):** metal-only charm units that make catalog-based client calls pull the PUBLIC (FQDN) endpoint and cannot resolve or route it (the internal-endpoint certs carry no FQDN SAN). This is the root of the amphora-pipeline constraint on gss (`glance-simplestreams-sync`) and its retrofit subordinate, recorded in D-021. The proper fix (cloud-internal DNS + FQDN-valid certs, or charms consuming internal endpoints) is a Roosevelt item.

**Status:** Decided (v1). Reconstructed into this doc from the deploy record (no standalone D-019 file).

**Related:** D-008 (DNS architecture), D-021 (amphora-pipeline consequence), D-011 (acceptance bar amended).

---

## D-020: Dual provider + metal API VIPs on clustered charms

**Decision:** Every clustered OpenStack API application (keystone, glance, nova-cloud-controller, neutron-api, cinder, placement, barbican, octavia, openstack-dashboard, magnum, vault) is configured with BOTH a provider VIP and a metal VIP, as a space-separated pair: `vip: "10.12.4.X 10.12.8.X"` (Option B).

**Rationale:** with a provider-only VIP, `charms_openstack/ip.py:resolve_address(INTERNAL)` finds no internal address (the lookup resolves to `None`) and raises `ValueError`, breaking `identity-service-relation-joined` (and the analogous internal-endpoint registration on every clustered API charm). Supplying a metal-network VIP alongside the provider VIP gives `resolve_address` an internal address to return, and keeps east-west service traffic on the metal network rather than the provider network.
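
The failure mode can be illustrated without the charm code. The sketch below is NOT the charm's `resolve_address` implementation; it only mimics the observable behavior described above, assuming the Provider (`10.12.4.0/22`) and Metal (`10.12.8.0/22`) prefixes from D-003 and D-004a. The function names and the `10.12.8.50` address are hypothetical.

```python
import ipaddress

PROVIDER_NET = ipaddress.ip_network("10.12.4.0/22")
METAL_NET = ipaddress.ip_network("10.12.8.0/22")


def split_vips(vip_option):
    """Classify each address in a space-separated `vip` value by network."""
    result = {"provider": None, "metal": None}
    for token in vip_option.split():
        addr = ipaddress.ip_address(token)
        if addr in PROVIDER_NET:
            result["provider"] = addr
        elif addr in METAL_NET:
            result["metal"] = addr
    return result


def internal_vip(vip_option):
    """Return the metal VIP, or raise ValueError when none is configured."""
    vip = split_vips(vip_option)["metal"]
    if vip is None:
        raise ValueError("no metal-network VIP available for the internal endpoint")
    return vip


print(internal_vip("10.12.4.50 10.12.8.50"))  # dual VIP: an internal address exists
```

Calling `internal_vip("10.12.4.50")` with a provider-only value raises `ValueError`, which is the condition the dual-VIP configuration avoids.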

**Status:** Decided (v1). Reconstructed into this doc from the deploy record (no standalone D-020 file).

**Related:** D-003 (network architecture), D-002 (channels).

---

## D-021: Octavia amphora image pipeline on the no-DNS dual-endpoint deploy

**Decision:** build the amphora image with the charm-native `octavia-diskimage-retrofit`, configured with `use-internal-endpoints: true` and seeded by a manually uploaded stock Ubuntu base image carrying the five Glance properties the retrofit reads (architecture, os_distro, os_version, version_name, product_name). Park `glance-simplestreams-sync` (gss) for the amphora pipeline. The amphora image is `image-format: raw`, tagged `octavia-amphora` to match octavia's `amp-image-tag`.
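
Because the seed image is uploaded by hand, a quick check that it carries all five properties can catch a missed one before the retrofit runs. This is a sketch operating on a plain dict of image properties; the function name and the sample values are illustrative assumptions, not values taken from the deploy record.

```python
# The five Glance properties the retrofit reads, per D-021.
REQUIRED_PROPERTIES = (
    "architecture",
    "os_distro",
    "os_version",
    "version_name",
    "product_name",
)


def missing_retrofit_properties(image_properties):
    """Return the required property names that are absent or empty."""
    return [key for key in REQUIRED_PROPERTIES if not image_properties.get(key)]


# Hypothetical property set for a seed image with two properties forgotten.
seed_image = {
    "architecture": "x86_64",
    "os_distro": "ubuntu",
    "os_version": "22.04",
}

print(missing_retrofit_properties(seed_image))
```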

**Root cause:** on the dual-endpoint, no-DNS topology (D-019), metal-net catalog-callers (gss + its retrofit subordinate) cannot reach Glance: the public Glance FQDN does not resolve/route from the metal net, and the internal-endpoint cert carries no FQDN SAN (so an `/etc/hosts` FQDN->metal-VIP mapping fails TLS). gss `use-internal-endpoints` steers only its Keystone auth to internal; its glance/swift clients still use the public FQDN and there is no further charm-native lever -- a charm gap on the no-DNS topology. The retrofit's `use-internal-endpoints` lever DOES cover its build path, so it is the charm-native amphora builder here.

**Status:** Decided + validated end-to-end (v1): the retrofit, over internal endpoints, reads the seeded base and writes the amphora; gss parked; octavia + subordinates active/idle.

**Roosevelt:** cloud-internal DNS + FQDN-valid certs removes the manual seed and fixes gss end to end.

**Related:** D-007 (Magnum inclusion), D-019 (no-DNS root cause).

---

## D-028: Defer the CAPI v1beta2-contract cutover (deploy the single-contract v1beta1 stack)

**Decision:** defer adopting the CAPI v1beta2-CONTRACT generation until upstream ships it correctly for this path; deploy the clean single-contract v1beta1 stack now.

**Context:** while grounding the (then-current) Canonical CK8s workload chart, the chart referenced control-plane/bootstrap kinds at apiVersion v1beta1 while the pinned provider served them only at v1beta2 (DOCFIX-022). The broader question -- is the v1beta2-contract generation available and correct for long-term support on this path -- resolved to "not yet."

**Status:** Decided (v1). The CK8s-chart-specific particulars were subsequently retired when D-031 replaced the direct-CAPI CK8s path with Magnum + the azimuth kubeadm charts; the single-contract principle carries forward, and D-042 later made the driver-side contract axis concrete.

**Builds on:** D-022 / D-023 (do-07-era CAPI/CRD work). **Related:** D-031, D-042.

---

## D-029: Defer Keystone SSO (k8s-keystone-auth) to Roosevelt

**Decision:** Keystone SSO for the workload clusters (the chart's `k8s-keystone-auth` addon) is deferred to the next deployment and folded into the Roosevelt cloud-internal-DNS + trusted-cert foundation. v1 workload clusters run the Kubernetes Dashboard with standard token auth; the `k8sKeystoneAuth` addon stays OFF; SSO is not validated on v1.

**Rationale:** enabling it on v1 would produce a non-functional SSO path (TLS failure to the private-CA Keystone endpoint) plus apiserver webhook error noise -- a checked box that does not work -- and forcing it would require forking the addon or fighting CAAPH, neither of which carries forward to Roosevelt.

**Finding (verified 2026-06-05):** k8s-keystone-auth 1.5.1 exposes no keystone-CA option, so it cannot trust a private-CA Keystone endpoint.

**Status:** Decided (v1). **Related:** D-028 (same "land it on the proper foundation later" principle).

---

## D-030: Management-cluster placement -- in-cloud (superseded twice; see D-033, D-035)

**Decision (as taken 2026-06-06):** run the CAPI management plane IN-CLOUD for the v1 rehearsal (CAPI core + CAPO + cluster-api-addon-provider as VMs on the OpenStack cloud, following an Azimuth seed + HA pattern with a `clusterctl move` pivot to a self-hosted in-cloud management cluster). Out-of-cloud was recorded as a deferred alternative for Roosevelt.

**Status:** SUPERSEDED. First by D-033 (out-of-cloud Canonical `k8s`-charm on MAAS); then -- after D-033's dual-homed node hit an unfixable pod-egress fault -- placement returned in-cloud in a simpler single-homed form under D-035. Retained here for lineage.

**Related:** D-031, D-033, D-035.

---

## D-031: Cluster-creation surface + engine -- Magnum + magnum-capi-helm + azimuth kubeadm charts

**Decision:** the tenant Kubernetes service is built from three layers (surface, driver, engine), plus a management-cluster placement:
- Surface: OpenStack Magnum (`openstack coe cluster ...`), so tenants and operators manage clusters through the OpenStack API.
- Driver: the out-of-tree Cluster API Helm driver `magnum-capi-helm` (opendev.org/openstack/magnum-capi-helm), pip-installed into the Magnum conductor and pointed at a CAPI management cluster via `[capi_helm] kubeconfig_file`.
- Engine: the azimuth-cloud `capi-helm-charts` `openstack-cluster` chart (kubeadm-based: KubeadmControlPlane / KubeadmConfigTemplate + CAPO OpenStackCluster / OpenStackMachineTemplate + MachineDeployment), with addons (Cilium CNI, OpenStack CCM, Cinder CSI, and so on) installed by the cluster-api-addon-provider.
- Management-cluster placement: in-cloud for v1 (D-030, later refined by D-035).

**Status:** Decided. Supersedes the do-07 direct-CAPI Canonical CK8s chart path; the CK8s-chart-specific findings (DOCFIX-022 ref patch, etc.) are retired for this path.

**Related:** D-030 / D-035 (placement), D-034 (version constellation), D-036 / D-042 (driver/chart/core coherence).

---

## D-033: Management cluster -- out-of-cloud Canonical k8s-charm on MAAS (superseded by D-035)

**Decision (as taken 2026-06-07):** management cluster = a Canonical Kubernetes cluster deployed with the `k8s` / `k8s-worker` machine charms on MAAS, OUTSIDE OpenStack, made HA by the charms; CAPI layer via `clusterctl init --infrastructure openstack` + cluster-api-addon-provider, version-pinned to the capi-helm-charts release (NOT the D-022 do-07 pins); the management cluster does not run the OpenStack CCM for itself (CAPO reaches OpenStack through a `clouds.yaml` pointed at the public API endpoints); lifecycle via Juju.

**Status:** SUPERSEDED by D-035. The chosen node (capi-mgmt MAAS VM) is necessarily dual-homed (MAAS PXE on metal, API VIPs on provider), and pod egress from that multi-NIC node to the API VIPs failed (the Cilium reverse-NAT reply was mis-forwarded out the wrong NIC instead of redirected into the pod). Retained here for lineage.

**Supersedes:** D-030 (placement) + D-032 (azimuth-config tooling). **Builds on:** D-031.

---

## D-034: CAPI version constellation pinned to capi-helm-charts dependencies.json

**Decision:** pin the management-cluster CAPI constellation to the `dependencies.json` published with a chosen `capi-helm-charts` RELEASE TAG, read at deploy time on the jumphost with `jq` (dynamic lookup, no hand-picked versions). Retire D-022 "Option A" (driver 1.3.0 / CAPO v0.10.x / v1alpha6) as obsolete.

**Rationale:** the magnum-capi-helm driver does not hand-pick component versions; its own CI installs the management CAPI stack by reading the per-release `dependencies.json` and running a fixed install sequence -- that file is the single matched-and-tested set. Hand-picking fights the upstream model, and v1alpha6 has been removed from current cluster-api-provider-openstack. (At tag 0.25.1 the set is CAPI v1.13.2, CAPO v0.14.4, cert-manager v1.20.2, ORC v2.5.0, addon-provider 0.12.0, janitor 0.11.0, helm v3.17.3; appendix-B carries the as-built snapshot.)
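
The pin-then-compare idea can be sketched as follows. The decision itself uses `jq` on the jumphost; this Python version is only an illustration. The JSON key names below are HYPOTHETICAL placeholders (the real `dependencies.json` layout should be read from the chosen release tag), while the version strings are the ones quoted above for tag 0.25.1.

```python
import json

# Hypothetical key names; versions as quoted in D-034 for tag 0.25.1.
SAMPLE_DEPENDENCIES_JSON = """
{
  "cluster-api": "v1.13.2",
  "cluster-api-provider-openstack": "v0.14.4",
  "cert-manager": "v1.20.2"
}
"""


def version_drift(pinned, installed):
    """Return {component: (installed, pinned)} wherever the two sets disagree."""
    drift = {}
    for component, want in pinned.items():
        have = installed.get(component)
        if have != want:
            drift[component] = (have, want)
    return drift


pinned = json.loads(SAMPLE_DEPENDENCIES_JSON)
# Hypothetical as-built state with one component on a different version.
installed = dict(pinned, **{"cluster-api-provider-openstack": "v0.10.0"})

print(version_drift(pinned, installed))
```

Treating the published file as the only source of versions keeps the comparison free of hand-picked values, which is the point of the decision.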

**Status:** Adopted 2026-06-08. **Supersedes:** D-022. **Amended by:** D-042 (adds the driver<->core contract-coherence rule). **Related:** D-031, D-028 (CRD-contract note, now subsumed).

---

## D-035: Management-cluster placement -- in-cloud single-homed tenant VM

**Decision:** run the CAPI management cluster as a single-homed in-cloud tenant VM (`capi-mgmt-v2`): one NIC on the management tenant subnet (10.20.0.0/24), reached via a floating IP (10.12.7.40); k8s-snap (channel `1.32-classic/stable`), Cilium CNI; not CAPI-self-managed (no `clusterctl move`).

**Rationale:** D-033's out-of-cloud node was necessarily dual-homed and its pod egress to the OpenStack API VIPs failed -- the Cilium reverse-NAT reply was emitted back out the second NIC instead of being redirected into the pod via `cilium_host` (a multi-NIC reverse-path fault; the `k8s` charm exposes too few Cilium annotations to repair it). A single-homed VM removes the second NIC and the fault entirely. The single-NIC pod-egress premise was then proven by the Phase 4 hard gate (an agnhost pod TCP probe to the Keystone VIP 10.12.4.50:5000 returning exitCode 0).
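
The hard gate is a plain TCP reachability probe. The recorded gate ran as an agnhost pod; the sketch below shows the same kind of check in Python with the standard `socket` module. `tcp_probe` is a hypothetical helper, not the command used in Phase 4.

```python
import socket


def tcp_probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable routes, and timeouts.
        return False
```

Run from inside a pod on the management cluster, `tcp_probe("10.12.4.50", 5000)` would exercise the same path as the gate: pod egress through the single NIC to the Keystone VIP.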

**Status:** Adopted 2026-06-08; pod-egress premise validated. **Supersedes:** D-033 (revisits D-030 in simpler form). **Unaffected:** D-031, D-034.

**Trade-off:** a single-node management cluster is a SPOF with no self-heal -- see D-041 (manual-start policy) and D-040 (the OOM that surfaced it).

---

## D-036: magnum-capi-helm driver / chart / CAPO coherence (resolved)

**Decision / correction:** a mid-session "rebuild Phase 5 on chart 0.10.1" framing -- premised on the GA driver (1.3.0) emitting the v1alpha6 OpenStackCluster CRD and clashing with the modern v1beta1 stack -- is WRONG and is retired. Chart 0.10.1 is the retired v1alpha6 path that D-034 superseded; rebuilding on it would have reversed D-034.

**Verification:** the 1.3.0 driver is api_version-AGNOSTIC (driver.py has zero v1alpha6/v1beta1/apiVersion references; it helm-installs the chart and watches the CAPI `Cluster`, never writing OpenStackCluster directly). The OpenStackCluster apiVersion is set by the CHART: chart 0.25.1 emits `infrastructure.cluster.x-k8s.io/v1beta1`, matching the installed CAPO v0.14.4. The driver's built-in default chart is 0.10.1 (the v1alpha6-era chart); overriding `default_helm_chart_version` to 0.25.1 yields v1beta1. The "1.3.0 emits v1alpha6" claim was true only of the driver's DEFAULT chart, not of the driver pinned to chart 0.25.1.
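
The reasoning above reduces to a small lookup: the chart, not the driver, decides the OpenStackCluster apiVersion. The Python sketch below models only the facts stated in this section; the function names are hypothetical, and the served-version set lists just the one version this document records as matching CAPO v0.14.4.

```python
# Built-in default chart of the 1.3.0 driver, per D-036.
DRIVER_DEFAULT_CHART = "0.10.1"

# apiVersion each chart templates for OpenStackCluster, per D-036.
CHART_INFRA_API = {
    "0.10.1": "infrastructure.cluster.x-k8s.io/v1alpha6",
    "0.25.1": "infrastructure.cluster.x-k8s.io/v1beta1",
}

# Only the version D-036 records as matching the installed CAPO v0.14.4.
CAPO_SERVED = {"infrastructure.cluster.x-k8s.io/v1beta1"}


def emitted_api(chart_override=None):
    """apiVersion that results from the driver default or an explicit override."""
    chart = chart_override or DRIVER_DEFAULT_CHART
    return CHART_INFRA_API[chart]


def is_coherent(chart_override=None):
    """True when the chart's OpenStackCluster apiVersion is served by CAPO."""
    return emitted_api(chart_override) in CAPO_SERVED


print(is_coherent())          # driver default chart
print(is_coherent("0.25.1"))  # default_helm_chart_version overridden
```

With no override the check fails, and with the 0.25.1 override it passes, which is the distinction the retired "1.3.0 emits v1alpha6" claim missed.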

**Status:** Resolved 2026-06-08. Implements D-031 Phase 3 under the D-034 constellation. NOTE: a SEPARATE axis -- the driver-vs-core CONTRACT, not the chart's CRD string -- is what later required the 1.4.0 driver pin; see D-042. **Related:** D-031, D-034.

---

## D-037: [capi_helm] config persistence on the charm-managed conductor

**Decision:** keep the `[capi_helm]` section in an oslo.config drop-in directory and point the conductor at it: `/etc/magnum/magnum.conf.d/00-capi-helm.conf` (0644, no secrets; it references the 0600 kubeconfig by path), with magnum-conductor launched with `--config-dir /etc/magnum/magnum.conf.d` so oslo.config merges the drop-in over the charm-rendered `magnum.conf`. The charm manages neither the .conf.d directory nor the launch extension, so this survives charm hooks and reproduces on Roosevelt.

**Problem:** the magnum charm (2024.1/stable rev 70) re-renders `magnum.conf` wholesale on hooks and exposes no conf-override option, so a `[capi_helm]` section written into `magnum.conf` would be clobbered.

**Mechanism (load-bearing correction):** the conductor's ExecStart is NOT a direct binary -- it is `/etc/init.d/magnum-conductor systemd-start` (an LSB init script wrapped by systemd), so a systemd ExecStart drop-in appending `--config-dir` is inert (the flag reaches the init script as an ignored positional). The adopted method instead creates `/etc/default/magnum-conductor` (0644; the charm does not manage it) containing `DAEMON_ARGS="$DAEMON_ARGS --config-dir /etc/magnum/magnum.conf.d"`; the init script sources `/etc/default/$NAME` after setting the base `DAEMON_ARGS`, then runs `exec $DAEMON $DAEMON_ARGS`. Verify behaviorally with `/etc/init.d/magnum-conductor show-args` and `ps -ww -C magnum-conductor -o args`: confirm the flag is on the running process's command line, not merely that the string is present in a file.
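
The behavioral check can be automated by parsing the process command line rather than searching files. The Python sketch below is one way to do that, under stated assumptions: the helper names are hypothetical, and the sample command lines are invented for illustration and are not captured `ps` output.

```python
import shlex


def config_dirs(cmdline):
    """Return every value passed via --config-dir in a process command line."""
    argv = shlex.split(cmdline)
    dirs = []
    for i, arg in enumerate(argv):
        if arg == "--config-dir" and i + 1 < len(argv):
            dirs.append(argv[i + 1])
        elif arg.startswith("--config-dir="):
            dirs.append(arg.split("=", 1)[1])
    return dirs


def dropin_is_live(cmdline, expected="/etc/magnum/magnum.conf.d"):
    """True only if the running process was actually given the drop-in dir."""
    return expected in config_dirs(cmdline)


# HYPOTHETICAL command lines, shaped like `ps -ww -C magnum-conductor -o args`.
with_flag = (
    "/usr/bin/python3 /usr/bin/magnum-conductor "
    "--config-file=/etc/magnum/magnum.conf "
    "--config-dir /etc/magnum/magnum.conf.d"
)
without_flag = (
    "/usr/bin/python3 /usr/bin/magnum-conductor "
    "--config-file=/etc/magnum/magnum.conf"
)

print(dropin_is_live(with_flag))     # True
print(dropin_is_live(without_flag))  # False
```

Feeding this the real `ps` output after each charm upgrade gives a repeatable test for whether `/etc/default/magnum-conductor` is still in effect.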

**Status:** Adopted 2026-06-08 (mechanism revised mid-implementation). **Residual:** breaks silently if a future charm hook writes `/etc/default/magnum-conductor` -- detect via the same show-args/ps check. **Related:** D-031 Phase 3, D-036.

---

## D-040: Raise nova-compute reserved-host-memory on the hyperconverged hosts

**Decision:** set `nova-compute reserved-host-memory` to 8192 MB (from the default 512) so Nova placement accounts for the non-Nova memory co-located on each hyperconverged host. Charm config -> survives redeploy.

**Trigger / root cause:** during the first end-to-end Magnum workload-cluster create, the kernel OOM-killer fired on openstack1 (no reboot; single boot since 2026-06-03) and killed a tenant qemu worker VM. The host co-locates nova-compute AND roughly 6 GiB of services invisible to Nova placement (mysqld [innodb-cluster member] ~2.9G, ceph-osd + ceph-mon ~1.2G, neutron workers ~0.7G, nova/apache/cinder/ovs ~1.4G), while Nova reserved only the default 512 MB. Under the resulting memory pressure the host swap-thrashed; an ovsdb inactivity-probe storm made the workload API and Juju agent look "down" when the host was in fact thrashing, not down.
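
The sizing can be checked with simple arithmetic. The Python sketch below uses only the approximate figures quoted above and treats the charm's MB as MiB for the conversion, which is an assumption; the helper `shortfall_mb` is a name introduced here for illustration.

```python
# Resident memory of services co-located with nova-compute on openstack1,
# in GiB, as recorded in D-040 (approximate figures).
NON_NOVA_GIB = {
    "mysqld (innodb-cluster member)": 2.9,
    "ceph-osd + ceph-mon": 1.2,
    "neutron workers": 0.7,
    "nova/apache/cinder/ovs": 1.4,
}

DEFAULT_RESERVED_MB = 512
NEW_RESERVED_MB = 8192

# 6.2 GiB expressed in MiB, rounded to a whole number.
non_nova_mb = round(sum(NON_NOVA_GIB.values()) * 1024)


def shortfall_mb(reserved_mb):
    """Memory Nova may offer to guests that host services are already using."""
    return max(0, non_nova_mb - reserved_mb)


print(non_nova_mb)                        # 6349
print(shortfall_mb(DEFAULT_RESERVED_MB))  # 5837
print(shortfall_mb(NEW_RESERVED_MB))      # 0
print(NEW_RESERVED_MB - non_nova_mb)      # 1843
```

At the default, Nova could place guests into roughly 5.7 GiB that the host services already occupy; at 8192 MB the measured footprint is covered with about 1.8 GiB of headroom for growth.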

**Status:** Adopted + APPLIED 2026-06-09. **Related:** D-035 (the mgmt-VM SPOF the OOM hit), D-041.

---

## D-041: Non-HA deployments default to manual start

**Decision:** non-HA deployments default to MANUAL START -- no automatic VM power-on / auto-recovery is configured by default. Any non-HA deployment must be documented as non-HA, with the rationale that manual-down surfaces incidents (auto-restart masks capacity/health defects). Auto-recovery is an explicit, out-of-band exception, never the silent default.

**Trigger:** after the openstack1 OOM (D-040), CAPI's MachineHealthCheck self-healed the workload worker VMs automatically, but the single-node management VM (capi-mgmt-v2, D-035) was OOM-killed and stayed SHUTOFF. It does not self-heal or auto-restart, so magnum reconcile/health broke silently and workload nodes kept the CAPI uninitialized taint until the VM was started by hand. The cost (downtime) was real, but the manual-down is also what forced the investigation that found the OOM root cause before it could carry over to Roosevelt.

**Status:** Adopted 2026-06-09 (policy/governance). **Related:** D-035 (the SPOF), D-040 (the OOM).

---

## D-042: magnum-capi-helm driver must be contract-coherent with the CAPI core

**Decision (amends D-034):** the magnum-capi-helm driver pin (Layer B) MUST be contract-coherent with the CAPI core that `dependencies.json` installs (Layer A). When the Layer-A lockfile is a v1beta2-contract core (CAPI v1.13), the driver pin must be a build that understands v1beta2 references; verify this intersection at deploy.

**Symptom / root cause:** capi-test-1 reached CREATE_COMPLETE with every real component healthy (3 Ready nodes, Calico, CCM/CSI/CoreDNS, API LB ACTIVE), yet magnum reported `health_status = UNHEALTHY` deterministically -- only the `infrastructure` sub-check failed ("Infrastructure resource not found"). The 1.3.0 driver reads `apiVersion` off the Cluster's `spec.infrastructureRef`, but under the v1beta2 contract that ref is version-less, so the health GET resolves nothing. The create path is unaffected (the chart templates the resource versions), so this is a cosmetic health false-negative. The governing axis is the CAPI CONTRACT a provider implements toward core, not the CRD apiVersion string (per D-028). Rolling back to a v1beta1 core would mean pinning an EOL CAPI for a Roosevelt rehearsal -- the wrong direction.

**Fix:** pin a driver build carrying the per-kind `[capi_helm] api_resources` override and set it so the health lookups use the served versions. When this decision was first recorded (2026-06-09), the capability was UNRELEASED (development series only; the released line was then 1.1.0/1.2.0/1.3.0), so the interim plan was a current-series commit for the testcloud, with a released-tag pin deferred to Roosevelt.

**Subsequent update (driver-fix work):** the released `magnum-capi-helm==1.4.0` was then confirmed to ship the `api_resources` feature, so the released-tag pin is now available -- v1 pins 1.4.0 with an explicit `api_resources` and targets `health_status = HEALTHY` (installed in phase-07; as-built in appendix-B). This replaces D-042's interim dev-commit path.
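
The failure and the fix can be modeled in a few lines. The Python sketch below is illustrative only, not the driver's code: the function name, the `apiGroup` field name on the version-less ref, the kind-to-apiVersion shape given to `api_resources`, and the resource name are all assumptions. Only the behavior (a version-less ref resolves nothing unless an override supplies the served version) comes from this section.

```python
def resolve_infra_api(infra_ref, api_resources=None):
    """Return the apiVersion for the health GET, or None if unresolvable.

    api_resources maps kind -> apiVersion. That shape is an ILLUSTRATIVE
    assumption, not the documented format of the driver option.
    """
    kind = infra_ref["kind"]
    if api_resources and kind in api_resources:
        return api_resources[kind]
    # 1.3.0-style behavior per D-042: rely on apiVersion carried by the ref.
    return infra_ref.get("apiVersion")


# v1beta1-contract style ref: carries a full apiVersion.
ref_v1beta1 = {
    "apiVersion": "infrastructure.cluster.x-k8s.io/v1beta1",
    "kind": "OpenStackCluster",
    "name": "capi-test-1",  # placeholder name
}

# v1beta2-contract style ref: version-less (field name assumed).
ref_v1beta2 = {
    "apiGroup": "infrastructure.cluster.x-k8s.io",
    "kind": "OpenStackCluster",
    "name": "capi-test-1",  # placeholder name
}

override = {"OpenStackCluster": "infrastructure.cluster.x-k8s.io/v1beta1"}

print(resolve_infra_api(ref_v1beta1))            # resolves from the ref
print(resolve_infra_api(ref_v1beta2))            # None -> "resource not found"
print(resolve_infra_api(ref_v1beta2, override))  # resolves from the override
```

The middle case is the false UNHEALTHY: the resource exists, but the lookup has no version to query. The override restores the lookup without touching the create path.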

**Operational caveat (while any health false-negative persists):** do NOT wire magnum auto-healing to `health_status` -- a persistent false UNHEALTHY could misfire; CAPI MachineHealthCheck handles node healing independently.

**Status:** Adopted 2026-06-09; fix landed via the 1.4.0 pin. **Amends:** D-034. **Related:** D-028 (the contract axis made concrete), D-031, D-035.

---


The following anti-patterns come from prior bundle review work:

- `magnum-shared-db` missing colon — causes a relation endpoint syntax error, deploy-blocking. Bundle must use `- - magnum:shared-db` (with the colon).
- Empty `osd-devices` YAML anchor referenced by multiple ceph-osd applications.
- `ovn-chassis` binding `overlay-suffix` — invalid binding name. Correct value is `data`.
- GUI annotation collision between NUMA-split ceph-osd apps (not applicable to testcloud since we don't NUMA-split, but flagged for Roosevelt).
- Hardcoded NIC name in `bridge-interface-mappings`. Use MAC where possible.
- `openstack -f value` column ordering — column order is not guaranteed; use `-c <column> -f value` for single-column output.
- Snap confinement: `openstackclients` snap has home-only interface; commands cannot read paths under `/tmp`. File paths must resolve under `$HOME`.
- Non-ASCII characters in `local_settings.d` overrides cause silent daemon failures in Horizon.
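
The last item in the list above is easy to catch before deploy. The Python sketch below scans override text for non-ASCII characters and reports where they are; the function name and the sample content are hypothetical, and how the check is wired into review (pre-commit hook, CI step) is left open.

```python
def non_ascii_positions(text):
    """Return (line_number, column, character) for every non-ASCII character."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((lineno, col, ch))
    return hits


# HYPOTHETICAL override content; \u2014 stands in for a dash pasted from a doc.
sample = 'SITE_BRANDING = "VR0 DC0"\n# note \u2014 pasted from a wiki page\n'

for lineno, col, ch in non_ascii_positions(sample):
    print(f"line {lineno}, col {col}: U+{ord(ch):04X}")
# prints: line 2, col 8: U+2014
```

Reporting the code point rather than the glyph helps with look-alike characters such as curly quotes, which are hard to spot by eye in a terminal.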

---


## Change log

| Date       | Change                                                                                                                                                                                                                            | Reference                                            |
| ---------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------- |
| 2026-05-22 | Initial document captured                                                                                                                                                                                                         | Caracal rebuild planning session                     |
| 2026-05-22 | D-015 v1/v2 fork added; D-004 and D-004a marked v2-scope; D-016 IPv4 tenant pool hybrid model added; D-014 updated with new repo name                                                                                             | v1/v2 fork session                                   |
| 2026-05-22 | D-017 CAPI bootstrap full-rebuild lifecycle added; D-018 MAAS-release-direct teardown added. D-013 marked superseded by D-018. D-007 Layer B updated to reference D-017 and `runbooks/04a-capi-bootstrap-cluster.md`.             | Teardown planning + handoff session                  |
| 2026-05-22 | D-002 hacluster row added (channel `2.4/stable`) per Canonical Charm Delivery table, verified against Charmhub. D-007 Layer B driver pin updated: `stackhpc/magnum-capi-helm` v0.13.0 → `openstack/magnum-capi-helm` 1.1.0 (PyPI; stackhpc fork archived Dec 2024). | Caracal channel verification + driver pin correction |
| 2026-05-22 | D-007 Layer B kubeconfig target corrected: bootstrap k3s → workload cluster (post-pivot per workstream 3b mandatory `clusterctl move`). CAPI mgmt plane paragraph updated accordingly. | Workstream 3 cleanup (post-pivot semantics) |
| 2026-05-29 | D-019 (Designate deferral) and D-020 (dual provider+metal API VIPs) recorded as already-taken; folded into this doc in the 2026-06-09 consolidation. | Deploy execution / handoff |
| 2026-05-30 | D-021 Octavia amphora pipeline (charm-native retrofit over internal endpoints; gss parked) added. | Octavia enablement |
| 2026-06-05 | D-028 (defer v1beta2-contract cutover) and D-029 (defer Keystone SSO) added. | CAPI path research |
| 2026-06-06 | D-030 (mgmt-cluster placement: in-cloud) and D-031 (Magnum + magnum-capi-helm + azimuth kubeadm engine) added. | Magnum/CAPI surface decisions |
| 2026-06-07 | D-033 (mgmt cluster: out-of-cloud k8s-charm on MAAS) added; supersedes D-030 and D-032. | Mgmt-cluster shape |
| 2026-06-08 | D-034 (CAPI constellation pinned to dependencies.json; supersedes D-022), D-035 (in-cloud single-homed mgmt VM; supersedes D-033), D-036 (driver/chart/CAPO coherence resolved), D-037 ([capi_helm] via /etc/default DAEMON_ARGS) added. | In-cloud mgmt pivot |
| 2026-06-09 | D-040 (reserved-host-memory 8192), D-041 (non-HA manual-start policy), D-042 (driver<->core contract coherence; 1.4.0 pin) added. | OOM incident + driver fix |
| 2026-06-09 | D-019..D-042 consolidated into this document (15 decisions). Existing D-001..D-018 left intact (em-dash style preserved); the new entries are ASCII. | Repo sanitation / doc refresh |
