diff --git a/docs/design-decisions.md b/docs/design-decisions.md
index 6e54eaf..3e3aab3 100644
--- a/docs/design-decisions.md
+++ b/docs/design-decisions.md
@@ -153,7 +153,9 @@
 
 ## D-008: DNS architecture
 
-**Decision:** Layered — static /etc/hosts for bootstrap + Designate (in bundle from day one) for tenant-level resolution.
+**Status:** Superseded by D-019 (2026-05-27). v2-scope. Original decision text preserved below for audit.
+
+**Decision (original; superseded):** Layered — static /etc/hosts for bootstrap + Designate (in bundle from day one) for tenant-level resolution.
 
 **Naming convention:**
 
@@ -220,11 +222,12 @@
 5. End-to-end Magnum CAPI cluster creation succeeds, including OCCM not crash-looping
 6. Vault unseal + auto-unseal-after-reboot pattern verified
 7. KVM snapshot baseline taken (Phase 5)
-8. Designate zones populated and tenant VMs resolve API hostnames
 
 Validation script: `scripts/validate.sh` (TBD).
 
 
+**Amendment (2026-05-27):** Per D-019, the "Designate resolves" criterion (former item 8) is removed for v1. Designate is deferred to v2, and tenant subnets resolve via public DNS. v2 will reinstate a DNS-resolution validation criterion calibrated to whatever DNS mechanism is then in place (NS delegation from corporate DNS or otherwise).
+
 ---
 
 
@@ -369,6 +372,50 @@
 ---
 
 
+## D-019: DNS scope reduction for v1 — Designate deferred to v2
+
+**Decision (2026-05-27):** Designate is removed from the v1 testcloud bundle and deferred to v2, alongside the corporate DNS / NS delegation work. v1 tenant subnets resolve directly against public DNS (`1.1.1.1`, `1.0.0.1`), configured with the `--dns-nameserver` option at subnet-create time.
+
+**Supersedes:** D-008 (DNS architecture).
+
+**Amends:** D-011 (validation bar; removes the "Designate resolves" criterion).
+
+### Rationale
+
+Three findings from the 2026-05-27 testcloud topology investigation:
+
+1. **Outside-in DNS** (corporate clients resolving `*.cloud.neumatrix.local`) is not needed for v1. Corporate access to the cloud already flows through the existing `openstack.baldurkeep.com → 10.17.4.20 → 10.12.x` HTTPS proxy chain (handled by the edge nginx at `10.17.8.7`), which does not depend on corporate-side resolution of cloud-internal FQDNs.
+
+2. **The edge nginx cannot route to `10.12.x` directly.** Inspection confirmed the edge has only `10.17.8.7/22` plus a Tailscale interface; reaching `10.12.4.x` requires the libvirt-host NAT path. Adding DNS to the testcloud would therefore require parallel NAT/proxy plumbing for port 53 (UDP, plus TCP for truncated responses) across three hosts (edge nginx, libvirt host, internal nginx), for a feature that has no v1 consumer.
+
+3. **Inside-out DNS** (tenant VMs resolving external names) is satisfied by pointing each tenant subnet's `--dns-nameserver` at public DNS (`1.1.1.1`, `1.0.0.1`). Designate is not needed in the inside-out path either, since:
+   - Tenant VMs do not need to resolve cloud-internal FQDNs (their API access goes through documented IPs / `--cloud` configs in `cloud.conf`)
+   - Cross-tenant DNS visibility is not a v1 requirement
+
+The remaining v1 use case for Designate (FIP DNS auto-registration via the `neutron-api ↔ designate` integration) is informational only: nothing in v1 consumes those records.
+
+### v1 implementation
+
+- Tenant subnets are created with `--dns-nameserver 1.1.1.1 --dns-nameserver 1.0.0.1` (or via the `OS_DNS_NAMESERVERS` environment variable from the openrc)
+- The CAPI workload cluster template variable `OPENSTACK_DNS_NAMESERVERS` is set to `1.1.1.1,1.0.0.1` (per `v1-do-doc-07-capi-bootstrap.md` §13)
+- The cloud-internal `*.cloud.neumatrix.local` FQDN tree is resolved via static `/etc/hosts` on bootstrap-relevant hosts (jumphost, openstack0-3, LXD containers per charm bootstrap, capi-mgmt); entries are staged in `v1-do-doc-05-vault-init.md` §11 and `v1-do-doc-07-capi-bootstrap.md` §6
+- Charms continue to use FQDN-based `os-public-hostname` (certificate SANs depend on it); internal resolution via `/etc/hosts` is sufficient
+
+### v2 plan
+
+- Re-introduce Designate (charm + designate-bind + relations + hacluster subordinate)
+- NS delegation from corporate DNS to designate-bind on a real (non-NAT) network VIP
+- Tenant subnets switch to the Designate VIP as their resolver (after corporate DNS delegation lands)
+- Designate is deployed on a real-network topology (Roosevelt or a v2 testcloud), where the bridging-host complexity of the v1 testcloud does not apply
+- D-011 validation re-introduces a calibrated DNS-resolution criterion (mechanism TBD: NS delegation working end-to-end vs. static A records at corporate DNS)
+
+### v2-residency note
+
+The IPv6 prefixes already imported into NetBox (with Reserved status) include allocations suitable for Designate's VIPs in a v2 design. These stay in NetBox as Reserved until v2 work begins.
+
+---
+
+
 From prior bundle review work — these are anti-patterns:
 
 - `magnum-shared-db` missing colon — causes a relation endpoint syntax error, deploy-blocking. Bundle must use `- - magnum:shared-db` (with the colon).
@@ -391,3 +438,4 @@
 | 2026-05-22 | D-015 v1/v2 fork added; D-004 and D-004a marked v2-scope; D-016 IPv4 tenant pool hybrid model added; D-014 updated with new repo name | v1/v2 fork session |
 | 2026-05-22 | D-017 CAPI bootstrap full-rebuild lifecycle added; D-018 MAAS-release-direct teardown added. D-013 marked superseded by D-018. D-007 Layer B updated to reference D-017 and `runbooks/04a-capi-bootstrap-cluster.md`. | Teardown planning + handoff session |
 | 2026-05-22 | D-002 hacluster row added (channel `2.4/stable`) per Canonical Charm Delivery table, verified against Charmhub. D-007 Layer B driver pin updated: `stackhpc/magnum-capi-helm` v0.13.0 → `openstack/magnum-capi-helm` 1.1.0 (PyPI; stackhpc fork archived Dec 2024). | Caracal channel verification + driver pin correction |
+| 2026-05-27 | D-019 added (DNS scope reduction; Designate deferred to v2). D-008 marked superseded by D-019. D-011 amended to remove "Designate resolves" criterion. | Testcloud topology investigation + v1 scope refinement |
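The D-019 v1 implementation bullets cover three things: public resolvers on tenant subnets, the comma-joined CAPI variable, and static `/etc/hosts` entries for `*.cloud.neumatrix.local`. The Python sketch below shows how they fit together. It is a minimal illustration under stated assumptions, not the project's tooling. The helper names are hypothetical, as are the comma-separated format assumed for the openrc's `OS_DNS_NAMESERVERS` variable and the `keystone` / `10.12.4.50` host entry. Only the `openstack subnet create` options used here (`--network`, `--subnet-range`, `--dns-nameserver`) are standard OpenStack CLI flags.

```python
import ipaddress
import os
import shlex

# Public resolvers used by v1 tenant subnets (D-019).
V1_DNS_NAMESERVERS = ("1.1.1.1", "1.0.0.1")

# Cloud-internal domain resolved via static /etc/hosts in v1 (D-019).
CLOUD_DOMAIN = "cloud.neumatrix.local"


def parse_nameservers(value):
    """Split a comma-separated resolver list (format is an assumption) and validate each entry."""
    servers = tuple(s.strip() for s in value.split(",") if s.strip())
    for server in servers:
        ipaddress.ip_address(server)  # raises ValueError on a malformed address
    return servers


def subnet_create_argv(name, network, cidr, nameservers=V1_DNS_NAMESERVERS):
    """Build an `openstack subnet create` argv with one --dns-nameserver flag per resolver."""
    ipaddress.ip_network(cidr)  # raises ValueError on a malformed CIDR
    argv = ["openstack", "subnet", "create",
            "--network", network, "--subnet-range", cidr]
    for server in nameservers:
        ipaddress.ip_address(server)
        argv += ["--dns-nameserver", server]
    argv.append(name)
    return argv


def capi_dns_variable(nameservers=V1_DNS_NAMESERVERS):
    """Comma-joined value for the CAPI template variable OPENSTACK_DNS_NAMESERVERS."""
    return ",".join(nameservers)


def render_hosts(entries, domain=CLOUD_DOMAIN):
    """Render /etc/hosts lines mapping each short name to its FQDN under `domain`."""
    lines = []
    for short, ip in sorted(entries.items()):
        ipaddress.ip_address(ip)
        lines.append(f"{ip}\t{short}.{domain} {short}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Fall back to the D-019 public pair when OS_DNS_NAMESERVERS is unset or empty.
    servers = parse_nameservers(os.environ.get("OS_DNS_NAMESERVERS", "")) or V1_DNS_NAMESERVERS
    # Subnet, network, and CIDR values here are hypothetical placeholders.
    print(shlex.join(subnet_create_argv("tenant-subnet", "tenant-net", "192.0.2.0/24", servers)))
    print(f"OPENSTACK_DNS_NAMESERVERS={capi_dns_variable(servers)}")
    # Hypothetical host/IP pair for illustration only.
    print(render_hosts({"keystone": "10.12.4.50"}), end="")
```

Emitting one `--dns-nameserver` per resolver matches how the CLI option is repeated in the v1 implementation notes. Validating each address up front means a typo fails before any subnet is created.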