
v1 Pre-Deploy Fixes (v2 — includes Designate deferral)

Purpose: Single-pass repo hygiene before v1 deployment execution begins. Apply these fixes as one logical commit per group (nine commits total) before any execution document runs.

Status: Authoritative for the v1 deploy track. Supersedes the v1 draft of this document.

Scope: Repo-only changes. No cloud state is touched by this document. All changes are reviewed locally, committed to main, and pushed before the next do-document runs.

What changed in v2 of this change list (2026-05-27):

  • Added commits 7-9 implementing the Designate-deferral decision (D-019).
  • Amended commit 5 (deprecated runbook moves) — 07-dns-zones.md is now permanently deprecated per D-019, not "replaced by v1-do-doc-10-dns."
  • Amended commit 6 (deprecated README content) — same.
  • Updated commit 4 (README.md refresh) — adds language reflecting Designate deferral to the v1 scope section.
  • Updated §10 verification commands to expect 11 VIPs, not 12.

Cross-references

  • D-002 (channel matrix) — Vault row cleanup
  • D-005 (Ceph Squid release)
  • D-008 (DNS architecture) — superseded by D-019; v2-scope
  • D-011 (validation bar) — amended by D-019 (Designate criterion dropped)
  • D-014 (repo path) — stale path correction
  • D-017 (CAPI bootstrap cluster lifecycle) — supersedes runbook 00 Phase 5
  • D-018 (teardown strategy) — supersedes runbook 00 Phase 4
  • D-019 (NEW) — Designate deferral to v2; tenant resolvers use public DNS
  • Charmed Ceph charm-ceph-osd config.yaml and metadata.yaml (osd-devices semantics)
  • Ceph BlueStore configuration reference (single-device co-located OSD pattern)

1. Bundle: remove ceph-osd storage: block

What

In bundle.yaml, under the ceph-osd application, delete the entire storage: block. The options.osd-devices: /dev/vdb line stays.

Why

The osd-devices declared under storage: is additive to the options.osd-devices value per the charm-ceph-osd config.yaml: "These devices are the range of devices that will be checked for and used across all service units, in addition to any volumes attached via the --storage flag during deployment."

Concretely, with the current bundle:

  • options.osd-devices: /dev/vdb → one OSD per unit using the 512 GB libvirt-attached disk
  • storage.osd-devices: loop,1024M → an additional 1 GB loopback OSD per unit

Total: 2 OSDs per unit × 4 units = 8 OSDs, against expected-osd-count: 4 on ceph-mon. The 1 GB OSDs are below practical minimums, asymmetric with the 512 GB primaries (CRUSH-weighting anti-pattern), and provide no operational value.
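The arithmetic above can be checked with a minimal sketch. It assumes the charm treats options.osd-devices as a space-separated device list and creates one OSD per listed device plus one per osd-devices storage volume on every unit. The function name and parameters are illustrative and are not part of the charm or the bundle.

```python
def total_osds(units: int, option_devices: str, storage_osd_volumes: int) -> int:
    """OSDs the deployment would create, assuming one OSD per listed device
    and one per 'osd-devices' storage volume on every unit (assumption)."""
    per_unit = len(option_devices.split()) + storage_osd_volumes
    return units * per_unit


EXPECTED_OSD_COUNT = 4  # expected-osd-count on ceph-mon

# Current bundle: /dev/vdb plus one loop-backed osd-devices volume per unit.
before = total_osds(units=4, option_devices="/dev/vdb", storage_osd_volumes=1)
# After removing the storage: block only /dev/vdb remains.
after = total_osds(units=4, option_devices="/dev/vdb", storage_osd_volumes=0)

print(f"before: {before} OSDs, after: {after} OSDs, expected: {EXPECTED_OSD_COUNT}")
```

With the storage: block gone, the computed total matches expected-osd-count.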

The remaining bluestore-db, bluestore-wal, cache-devices, and osd-journals loopback entries are also being removed — not because they break anything, but because:

  1. BlueStore co-locates DB and WAL on the data device when no separate volume is supplied (Ceph Reef BlueStore reference).
  2. Loopback files on the same backing storage as /dev/vdb are not "faster than the primary device," so the standard rationale for separate DB/WAL devices doesn't apply.
  3. osd-journals is unused under BlueStore (default since Luminous).

Diff

# BEFORE (in bundle.yaml, applications.ceph-osd)
  ceph-osd:
    charm: ceph-osd
    channel: squid/stable
    num_units: 4
    to: ["8", "9", "10", "11"]
    options:
      source: *ceph-source
      osd-devices: /dev/vdb            # libvirt-attached, MAAS-untracked, wiped 2026-05-22
    bindings: *internal-bindings
    constraints: arch=amd64 tags=openstack
    storage:                            # Loop-backed auxiliaries (testcloud has no real SSDs)
      bluestore-db:  loop,1024M
      bluestore-wal: loop,1024M
      cache-devices: loop,10240M
      osd-devices:   loop,1024M
      osd-journals:  loop,1024M         # Legacy storage name still in squid metadata; benign

# AFTER
  ceph-osd:
    charm: ceph-osd
    channel: squid/stable
    num_units: 4
    to: ["8", "9", "10", "11"]
    options:
      source: *ceph-source
      osd-devices: /dev/vdb            # libvirt-attached, MAAS-untracked, wiped 2026-05-22
    bindings: *internal-bindings
    constraints: arch=amd64 tags=openstack

Commit message

bundle: remove ceph-osd storage block to match expected-osd-count

The storage: block declared a second osd-devices entry (loop,1024M) which
is additive to options.osd-devices per the charm config. That produced
8 OSDs against expected-osd-count: 4 on ceph-mon, with 1 GB loopback
OSDs as the asymmetric secondaries — a CRUSH-weighting anti-pattern.

Real production storage (DB/WAL on actual SSDs) will be declared on
Roosevelt. For the testcloud, BlueStore co-locates DB/WAL on the data
device which is the documented default for single-device setups.

osd-journals is unused under BlueStore.

2. design-decisions.md: D-002 — remove Vault from OpenStack-core row

What

In docs/design-decisions.md under D-002 (channel matrix), the OpenStack-core row currently ends its component list with ", vault", which places Vault on 2024.1/stable. Remove that token.

Why

Vault has its own track per Canonical's charm delivery table — it runs on 1.8/stable, not 2024.1/stable. The D-002 table elsewhere (and the actual bundle.yaml) already reflects this; the OpenStack-core row description was a leftover from earlier drafting.

Diff

-| OpenStack core API charms (keystone, glance, nova-cloud-controller, neutron-api, cinder, placement, octavia, barbican, magnum, designate, openstack-dashboard, vault) | `2024.1/stable` |
+| OpenStack core API charms (keystone, glance, nova-cloud-controller, neutron-api, cinder, placement, octavia, barbican, magnum, designate, openstack-dashboard) | `2024.1/stable` |

Commit message

docs/design-decisions: D-002 — drop vault from OpenStack-core channel row

Vault uses 1.8/stable per the Canonical charm delivery table, not the
2024.1/stable OpenStack-core track. The bundle.yaml already reflects
this; the design-decisions D-002 description had a stale token.

Note: the designate token in the row above is correct as of this commit (Designate is still on the 2024.1/stable channel). The Designate row is removed entirely by commit 8 (D-019).


3. design-decisions.md: D-014 — update repo path

What

In docs/design-decisions.md under D-014 (repo location), the path currently shows the per-user namespace from before the 2026-05-27 move. Update to the OpenStack-group path.

Why

Per the user-memory pinned note: "Caracal rebuild repo (moved to OpenStack group 2026-05-27): https://git.baldurkeep.com/OpenStack/openstack-caracal-ipv4 (web), https://git.baldurkeep.com/git/OpenStack/openstack-caracal-ipv4.git (clone). Old jesse.austin/openstack-caracal-ipv4 path no longer exists; GitBucket does not redirect."

Diff

 ## D-014: Repository location

-**Decision:** `git.baldurkeep.com/jesse.austin/openstack-caracal-ipv4` for v1.
+**Decision:** `git.baldurkeep.com/OpenStack/openstack-caracal-ipv4` for v1.
+
+- Web: `https://git.baldurkeep.com/OpenStack/openstack-caracal-ipv4`
+- Clone: `https://git.baldurkeep.com/git/OpenStack/openstack-caracal-ipv4.git`
+- Moved from `jesse.austin/openstack-caracal-ipv4` to the `OpenStack` group on 2026-05-27. GitBucket does not redirect from the old path.

 **Rationale:** Establishes a single repo per cloud lifecycle. v2 path TBD.

Commit message

docs/design-decisions: D-014 — update repo path after move to OpenStack group

Repository moved from jesse.austin/openstack-caracal-ipv4 to the
OpenStack group on 2026-05-27. GitBucket does not redirect, so the
prior path is dead.

4. README.md: refresh stale references

What

Four hunks in README.md:

  1. Update the design-decisions reference range from "D-001 through D-016" to "D-001 through D-019" (post-D-017/D-018/D-019 additions).
  2. Replace the inline runbook 00 description (which still mentions backups + capi-mgmt graceful teardown — both invalidated by D-017/D-018).
  3. Replace the 12-step deploy order with a pointer to the do-document set.
  4. Add a v1 scope reduction note for Designate (per D-019).

Why

The README's v1 deployment order block reflects the original 12-step runbook plan; the actual deploy path now flows through the v1-do-doc-NN execution documents. Keeping the README in sync prevents new operators from following the stale path.

The D-019 note in the v1 scope section makes the Designate deferral explicit for anyone reading the README to understand v1 scope.

Diff (four hunks)

Hunk A — D-range:

-└── docs/
-    └── design-decisions.md          # architectural record (D-001 through D-016)
+└── docs/
+    ├── design-decisions.md          # architectural record (D-001 through D-019)
+    └── netbox-vip-queue.md          # post-deploy NetBox imports (workstream 2)

Hunk B — runbook 00 description (in the layout block):

-│   ├── 00-pre-deploy.md             # backups, capi-mgmt graceful teardown
+│   # (deprecated; see runbooks/deprecated/ — superseded by D-017 + D-018 + v1-do-doc-NN set)

Hunk C — replace the deploy-order block:

-## v1 deployment order
-
-1. Verify NetBox state — run NetBox imports if not already applied
-  - `netbox/ipv4-prefixes-import.py` — required
-  - `netbox/ipv6-mark-reserved.py` — required (Q3: tag existing IPv6 entries)
-2. Run pre-flight checks (`scripts/pre-flight-checks.sh`)
-3. Backup current cloud state (`runbooks/00-pre-deploy.md`)
-4. Destroy existing OpenStack model (`runbooks/01-destroy-model.md`)
-5. Deploy new bundle (`runbooks/02-deploy.md`)
-6. Initialize Vault (`runbooks/03-vault-init.md`)
-7. Set up Magnum domain (`runbooks/04-magnum-domain.md`)
-8. Stand up CAPI bootstrap cluster on `capi-mgmt.maas` (`runbooks/04a-capi-bootstrap-cluster.md`)
-9. Install Magnum CAPI Helm driver (`runbooks/05-magnum-capi-driver.md`)
-10. Recreate tenant resources (`runbooks/06-tenant-setup.md`)
-11. Populate DNS zones (`runbooks/07-dns-zones.md`)
-12. Run validation (`runbooks/08-validate.md` + `scripts/validate.sh`)
+## v1 deployment order
+
+The deploy is executed via the `runbooks/v1-do-doc-NN-*.md` execution documents in numeric order:
+
+| Doc | Purpose |
+|---|---|
+| `v1-do-doc-01-prep.md` | Pre-flight state check (repo, openrc, MAAS state of 5 VMs) |
+| `v1-do-doc-02-pki.md` | Octavia PKI overlay generation |
+| `v1-do-doc-03-destroy.md` | Conditional model + MAAS teardown (clean state for rebuild) |
+| `v1-do-doc-04-deploy.md` | `juju deploy` + settle wait + on-disk PKI verification |
+| `v1-do-doc-05-vault-init.md` | Vault initialization + cert cascade + admin-openrc regeneration |
+| `v1-do-doc-06-magnum-domain.md` | Magnum Keystone domain setup |
+| `v1-do-doc-07-capi-bootstrap.md` | CAPI bootstrap cluster + workload pivot |
+| `v1-do-doc-08-magnum-driver.md` | Magnum CAPI Helm driver graft |
+| `v1-do-doc-09-tenant.md` | Tenant project/user/openrc + Snapshot 2 |
+| `v1-do-doc-10-validate.md` | D-011 acceptance criteria + Snapshot 3 |
+
+NetBox imports are run separately (gated on external NetBox engineer review; see `netbox/README.md`).

Hunk D — v1 scope note about Designate deferral:

 ## v1-specific design decisions (summary; see docs/design-decisions.md for full record)

 - **D-015 v1/v2 fork** — IPv4-only v1; IPv6/dual-stack v2 deferred
 - **D-016 IPv4 tenant pool hybrid model** — NetBox owns upstream `/16` pool; Neutron owns per-project subnets within it
 - **D-003 Option B network architecture** — Provider `/22` carries both ext_net FIPs (`10.12.4.10–.223`) and OpenStack public API VIPs (`10.12.4.224–.254`) on the same L2 segment; fixes the tenant→API unreachability that caused Magnum OCCM crashloop on Bobcat testcloud
 - **D-005 Ceph Squid** — matches Caracal default; rehearses Roosevelt
 - **D-006 Vault HA backend = etcd + easyrsa**
 - **D-007 Magnum from day one** — charm in bundle + CAPI Helm driver graft
-- **D-008 DNS via Designate from day one** — static /etc/hosts for bootstrap; Designate handles tenant-level resolution (A records only for v1)
+- **D-019 (supersedes D-008) DNS scope reduction for v1** — Designate deferred to v2 alongside corporate DNS / NS-delegation work. Tenant subnets use public DNS (`1.1.1.1` / `1.0.0.1`) directly via `--dns-nameserver`. `*.cloud.neumatrix.local` FQDN tree remains internal-only, resolved via static `/etc/hosts` on bootstrap-relevant hosts.
 - **D-009 Hacluster relations included at num_units=1** — decorative on testcloud; documents the relation pattern for Roosevelt scale-up
 - **No OVN pinning on testcloud** — Roosevelt bare-metal will pin via `ovn-source`

Commit message

README: refresh stale runbook references, reflect D-019 scope reduction

The v1 deployment order block referenced runbooks 00-08 which are
being moved to deprecated/ in favor of v1-do-doc-NN execution documents.
Replaces the order block with a pointer to the do-document set.

Also reflects D-019: Designate deferred to v2; v1 tenant resolvers use
public DNS. Adds the netbox-vip-queue.md reference. Updates the design-
decisions D-range from D-001-D-016 to D-001-D-019.

5. Move superseded runbooks to runbooks/deprecated/

What

git mv each superseded runbook into a new runbooks/deprecated/ directory. Add runbooks/deprecated/README.md explaining the deprecation in commit 6.

Why

The v1-do-doc-NN set replaces the prior runbook 00-08 work. Keeping the originals in runbooks/deprecated/ preserves the audit trail without misleading new operators into following the old paths.

Files to move

| From | To | Replacement |
|---|---|---|
| runbooks/00-pre-deploy.md | runbooks/deprecated/00-pre-deploy.md | superseded by D-017 + D-018 (no per-cycle backups; teardown direct-to-MAAS); v1-do-doc-01 covers prep |
| runbooks/01a-octavia-pki-generation.md | runbooks/deprecated/01a-octavia-pki-generation.md | v1-do-doc-02-pki.md |
| runbooks/02-deploy.md | runbooks/deprecated/02-deploy.md | v1-do-doc-04-deploy.md |
| runbooks/03-vault-init.md | runbooks/deprecated/03-vault-init.md | v1-do-doc-05-vault-init.md |
| runbooks/04-magnum-domain.md | runbooks/deprecated/04-magnum-domain.md | v1-do-doc-06-magnum-domain.md |
| runbooks/04a-capi-bootstrap-cluster.md | runbooks/deprecated/04a-capi-bootstrap-cluster.md | v1-do-doc-07-capi-bootstrap.md |
| runbooks/05-magnum-capi-driver.md | runbooks/deprecated/05-magnum-capi-driver.md | v1-do-doc-08-magnum-driver.md |
| runbooks/06-tenant-setup.md | runbooks/deprecated/06-tenant-setup.md | v1-do-doc-09-tenant.md |
| runbooks/07-dns-zones.md | runbooks/deprecated/07-dns-zones.md | deferred to v2 per D-019 (no v1 replacement) |
| runbooks/08-validate.md | runbooks/deprecated/08-validate.md | v1-do-doc-10-validate.md |

Files NOT moved

| File | Why kept in runbooks/ |
|---|---|
| runbooks/01-destroy-model.md | Referenced by v1-do-doc-03 as a conditional sub-procedure; still active |

Git commands

cd "$HOME/openstack-caracal-ipv4"

mkdir -p runbooks/deprecated

git mv runbooks/00-pre-deploy.md                   runbooks/deprecated/
git mv runbooks/01a-octavia-pki-generation.md      runbooks/deprecated/
git mv runbooks/02-deploy.md                       runbooks/deprecated/
git mv runbooks/03-vault-init.md                   runbooks/deprecated/
git mv runbooks/04-magnum-domain.md                runbooks/deprecated/
git mv runbooks/04a-capi-bootstrap-cluster.md      runbooks/deprecated/
git mv runbooks/05-magnum-capi-driver.md           runbooks/deprecated/
git mv runbooks/06-tenant-setup.md                 runbooks/deprecated/
git mv runbooks/07-dns-zones.md                    runbooks/deprecated/
git mv runbooks/08-validate.md                     runbooks/deprecated/

git status
# Expect: 10 renames staged, runbooks/01-destroy-model.md untouched
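
As a cross-check on the move list, the sketch below rebuilds the git mv commands from the runbook names and confirms that ten files are planned for the move while 01-destroy-model.md is left out. It only prints the commands and runs nothing; the helper name plan_moves is hypothetical.

```python
from pathlib import PurePosixPath

# Runbooks superseded by the v1-do-doc set (or deferred per D-019).
SUPERSEDED = [
    "00-pre-deploy.md",
    "01a-octavia-pki-generation.md",
    "02-deploy.md",
    "03-vault-init.md",
    "04-magnum-domain.md",
    "04a-capi-bootstrap-cluster.md",
    "05-magnum-capi-driver.md",
    "06-tenant-setup.md",
    "07-dns-zones.md",
    "08-validate.md",
]


def plan_moves(names):
    """Return (source, destination) pairs for the move into deprecated/."""
    src_dir = PurePosixPath("runbooks")
    dst_dir = src_dir / "deprecated"
    return [(src_dir / name, dst_dir / name) for name in names]


moves = plan_moves(SUPERSEDED)
for src, dst in moves:
    print(f"git mv {src} {dst}")
```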

Commit message

runbooks: move superseded files to runbooks/deprecated/

These are replaced by v1-do-doc-NN-*.md execution documents (added in
follow-up commits). The 01-destroy-model.md runbook stays in place — it's
referenced by v1-do-doc-03 as a conditional sub-procedure.

07-dns-zones.md is deferred to v2 per D-019 with no v1 replacement
(Designate is no longer in v1 scope).

History preserved via git mv.

6. Add a deprecation banner to runbooks/deprecated/README.md

What

Create a new file runbooks/deprecated/README.md with a banner explaining the deprecation scope and a replacement map.

Content

# Deprecated v1 Runbooks

The runbooks in this directory have been superseded by the
`runbooks/v1-do-doc-NN-*.md` execution documents (or, in the case of
`07-dns-zones.md`, deferred to v2 entirely per D-019).

They are preserved here so the audit trail from the early v1 drafting
phase remains accessible. **Do not execute them.** The v1 deploy is
gated through the do-document set.

## Replacement map

| Deprecated runbook | Replacement |
|---|---|
| `00-pre-deploy.md` | superseded by D-017 + D-018 (no per-cycle backups; direct MAAS teardown); `v1-do-doc-01-prep.md` covers prep |
| `01a-octavia-pki-generation.md` | `v1-do-doc-02-pki.md` |
| `02-deploy.md` | `v1-do-doc-04-deploy.md` |
| `03-vault-init.md` | `v1-do-doc-05-vault-init.md` |
| `04-magnum-domain.md` | `v1-do-doc-06-magnum-domain.md` |
| `04a-capi-bootstrap-cluster.md` | `v1-do-doc-07-capi-bootstrap.md` |
| `05-magnum-capi-driver.md` | `v1-do-doc-08-magnum-driver.md` |
| `06-tenant-setup.md` | `v1-do-doc-09-tenant.md` |
| `07-dns-zones.md` | **deferred to v2 per D-019** (no v1 replacement) |
| `08-validate.md` | `v1-do-doc-10-validate.md` |

`01-destroy-model.md` is **not** in this directory — it remains active in
`runbooks/` and is referenced as a conditional sub-procedure by
`v1-do-doc-03-destroy.md`.

Commit message

runbooks/deprecated: add README explaining the deprecation scope

Includes the deprecated → replacement mapping for operators who arrive
via git log searches or stale internal references.

7. Bundle: remove Designate (per D-019)

What

In bundle.yaml, remove four applications and seven relations, and update two header comments to reflect Designate deferral to v2. VIP 10.12.4.227 becomes unused space in the 10.12.4.224-.254 range (same status as 10.12.4.225 reserved for v2 ceph-radosgw HA).

Why

Per D-019 (added by commit 8 of this change list): Designate is deferred to v2 alongside corporate-DNS / NS-delegation work. v1 testcloud topology investigation (2026-05-27 session) confirmed:

  1. Outside-in DNS isn't needed for v1 — corporate clients reach the cloud through the existing openstack.baldurkeep.com → 10.17.4.20 → 10.12.x HTTPS proxy chain, not via the *.cloud.neumatrix.local FQDN tree. The edge nginx (neumatrix-nginx at 10.17.8.7) cannot route to 10.12.x directly anyway.
  2. Inside-out DNS doesn't require Designate — tenant subnets can specify public DNS (1.1.1.1, 1.0.0.1) directly via --dns-nameserver at subnet-create time.
  3. FIP DNS auto-registration (the remaining v1 use case for Designate) is nice-to-have, not load-bearing for any v1 acceptance criterion.

Removing Designate now takes four applications (designate and designate-bind, plus the designate-mysql-router and designate-hacluster subordinates), seven relations, and one VIP out of the v1 deploy surface, which narrows first-deploy troubleshooting scope.
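
Finding 2 can be sketched as a helper that assembles the subnet-create command with one --dns-nameserver flag per resolver. The network name, subnet name, and CIDR below are placeholders, and the helper itself is hypothetical; only the flags (--network, --subnet-range, --dns-nameserver) are those of the openstack subnet create command.

```python
import shlex

PUBLIC_RESOLVERS = ("1.1.1.1", "1.0.0.1")


def subnet_create_argv(name, network, cidr, resolvers=PUBLIC_RESOLVERS):
    """Build the argv for `openstack subnet create`, repeating
    --dns-nameserver once per resolver."""
    argv = ["openstack", "subnet", "create",
            "--network", network,
            "--subnet-range", cidr]
    for ip in resolvers:
        argv += ["--dns-nameserver", ip]
    argv.append(name)
    return argv


# Placeholder names and CIDR (not taken from the bundle or the do-documents).
argv = subnet_create_argv("tenant-subnet", "tenant-net", "192.0.2.0/24")
print(shlex.join(argv))
```

The flag is repeated rather than comma-joined because the client accepts one resolver per --dns-nameserver occurrence.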

Diff

Remove three application blocks (search-and-delete each block in bundle.yaml); the fourth application, designate-hacluster, is handled in the next step:

  # =====================================================================
  # DNS: Designate (NEW for Caracal v1 per D-008)
  # =====================================================================
  # Naming convention: <service>.omega.dc0.vr0.cloud.neumatrix.local

  designate:
    charm: designate
    channel: 2024.1/stable
    num_units: 1
    to: [lxd:8]
    options:
      openstack-origin: *openstack-origin
      nameservers: "ns1.omega.dc0.vr0.cloud.neumatrix.local. ns2.omega.dc0.vr0.cloud.neumatrix.local."
      vip: 10.12.4.227
      os-public-hostname: designate.omega.dc0.vr0.cloud.neumatrix.local
    bindings: *api-bindings
    constraints: arch=amd64

  designate-mysql-router:
    charm: mysql-router
    channel: 8.0/stable

  designate-bind:
    charm: designate-bind
    channel: 2024.1/stable
    num_units: 1
    to: [lxd:8]
    bindings:
      "": provider                       # unit on provider so bind9:53 reachable from tenants (D-003)
      cluster: metal                     # peer traffic stays internal (decorative with num_units=1)
    constraints: arch=amd64

Comment out the designate-hacluster: line in the hacluster subordinate block, marking it v2-deferred:

   keystone-hacluster:              { charm: hacluster, channel: 2.4/stable }
   glance-hacluster:                { charm: hacluster, channel: 2.4/stable }
   neutron-api-hacluster:           { charm: hacluster, channel: 2.4/stable }
   nova-cloud-controller-hacluster: { charm: hacluster, channel: 2.4/stable }
   placement-hacluster:             { charm: hacluster, channel: 2.4/stable }
   openstack-dashboard-hacluster:   { charm: hacluster, channel: 2.4/stable }
   cinder-hacluster:                { charm: hacluster, channel: 2.4/stable }
   octavia-hacluster:               { charm: hacluster, channel: 2.4/stable }
   barbican-hacluster:              { charm: hacluster, channel: 2.4/stable }
   magnum-hacluster:                { charm: hacluster, channel: 2.4/stable }
   vault-hacluster:                 { charm: hacluster, channel: 2.4/stable }
   # v2-deferred: ceph-radosgw-hacluster:    { charm: hacluster, channel: 2.4/stable }
-  designate-hacluster:             { charm: hacluster, channel: 2.4/stable }
+  # v2-deferred (D-019): designate-hacluster: { charm: hacluster, channel: 2.4/stable }

Remove seven relations from the relations: block:

  # ---- Designate (DNS) — NEW for Caracal v1 per D-008
  - [designate-mysql-router:db-router, mysql-innodb-cluster:db-router]
  - [designate-mysql-router:shared-db, designate:shared-db]
  - [designate:identity-service, keystone:identity-service]
  - [designate:amqp, rabbitmq-server:amqp]
  - [designate:certificates, vault:certificates]
  - [designate:dns-backend, designate-bind:dns-backend]
  - [designate:ha, designate-hacluster:ha]
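
A quick way to confirm the count is to filter a relation list for endpoints whose application name starts with designate. The sketch below uses the seven relations above plus one unrelated pair added purely for illustration; that extra pair is an assumption and is not quoted from bundle.yaml.

```python
RELATIONS = [
    ["designate-mysql-router:db-router", "mysql-innodb-cluster:db-router"],
    ["designate-mysql-router:shared-db", "designate:shared-db"],
    ["designate:identity-service", "keystone:identity-service"],
    ["designate:amqp", "rabbitmq-server:amqp"],
    ["designate:certificates", "vault:certificates"],
    ["designate:dns-backend", "designate-bind:dns-backend"],
    ["designate:ha", "designate-hacluster:ha"],
    # Illustrative non-Designate relation (assumed, not from the bundle).
    ["keystone:certificates", "vault:certificates"],
]


def app_name(endpoint: str) -> str:
    """'designate:shared-db' -> 'designate'."""
    return endpoint.split(":", 1)[0]


def touches_designate(relation) -> bool:
    """True if either endpoint belongs to a designate* application."""
    return any(app_name(ep).startswith("designate") for ep in relation)


to_remove = [r for r in RELATIONS if touches_designate(r)]
to_keep = [r for r in RELATIONS if not touches_designate(r)]
print(f"remove {len(to_remove)}, keep {len(to_keep)}")
```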

Update the header comment block (decision references in the bundle's top-of-file block):

     D-006 Vault HA via etcd + easyrsa
     D-007 Magnum Layer A + Layer B graft
-    D-008 Designate day-one
+    D-019 (supersedes D-008) Designate deferred to v2
     D-009 hacluster subordinates (decorative on testcloud)

Update the HA subordinate header comment:

   # =====================================================================
-  # HA Cluster Subordinates (12 active for v1; ceph-radosgw deferred to v2)
+  # HA Cluster Subordinates (11 active for v1; ceph-radosgw + designate deferred to v2)
   # =====================================================================

Verification post-edit

cd "$HOME/openstack-caracal-ipv4"

# 1. No designate application remains
grep -E "^  designate" bundle.yaml \
  && echo "[FAIL] designate-related application still present" \
  || echo "[OK] no designate applications"

# 2. No designate relations remain
grep -E "designate" bundle.yaml | grep -vE "^[[:space:]]*#"
# Expect: no output (the only remaining 'designate' tokens should be commented)

# 3. VIP count is now 11
VIP_COUNT=$(grep -cE "^[[:space:]]+vip: 10\.12\.4\." bundle.yaml)
echo "VIPs: $VIP_COUNT (expect 11)"

# 4. YAML still parses
python3 -c "import yaml; yaml.safe_load(open('bundle.yaml')); print('[OK] YAML parses')"
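
The VIP count in check 3 can also be taken in Python. This sketch mirrors the grep pattern against an inline sample; the sample text is made up for the example and is not the real bundle.

```python
import re

# Same shape as the grep pattern: indented, uncommented "vip: 10.12.4.x".
VIP_LINE = re.compile(r"^[ \t]+vip: 10\.12\.4\.", re.MULTILINE)

SAMPLE = """\
  keystone:
    options:
      vip: 10.12.4.229
  glance:
    options:
      vip: 10.12.4.228
  # designate (deferred, D-019):
  #   vip: 10.12.4.227
"""


def count_vips(text: str) -> int:
    """Count uncommented 'vip: 10.12.4.x' option lines."""
    return len(VIP_LINE.findall(text))


print(count_vips(SAMPLE))
```

Commented-out vip: lines are skipped because the pattern requires vip: to be the first non-blank token on the line.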

Commit message

bundle: remove Designate per D-019 (deferred to v2)

Removes the designate, designate-bind, designate-mysql-router, and
designate-hacluster applications, plus all seven designate-related
relations. Updates header comments to reflect the deferral.

Rationale per D-019: v1 testcloud topology investigation confirmed
outside-in DNS is not needed (corporate clients reach the cloud via
the openstack.baldurkeep.com HTTPS proxy chain, not via
*.cloud.neumatrix.local FQDNs). Tenant subnets use public DNS directly
via --dns-nameserver. FIP DNS auto-registration is not load-bearing
for any v1 acceptance criterion.

VIP 10.12.4.227 becomes unused space in 10.12.4.224-.254 (same
status as 10.12.4.225 reserved for v2 ceph-radosgw HA).

Reduces v1 deploy surface by four applications (two principals, two
subordinates) + seven relations + one VIP.

8. design-decisions.md: add D-019 + amend D-008 and D-011

What

Three coordinated edits in docs/design-decisions.md:

  1. Add a new D-019 entry with the full deferral rationale and v2 deltas
  2. Mark D-008 as "superseded by D-019 — v2-scope"
  3. Amend D-011 (validation bar) to remove the "Designate resolves" criterion

Why

Captures the decision in the authoritative design-decisions record. Preserves the audit trail (D-008 stays with its original content but with the superseded status flag).

Diff — Hunk A: amend D-008 status

 ## D-008: DNS architecture

-**Decision:** Layered — static /etc/hosts for bootstrap + Designate (in bundle from day one) for tenant-level resolution.
+**Status:** Superseded by D-019 (2026-05-27). v2-scope. Original decision text preserved below for audit.
+
+**Decision (original; superseded):** Layered — static /etc/hosts for bootstrap + Designate (in bundle from day one) for tenant-level resolution.

Diff — Hunk B: amend D-011 validation bar

In the D-011 section, the testcloud validation criteria currently list (among others) a Designate resolution check. Remove the relevant bullet.

 ## D-011: Roosevelt-rehearsal validation bar

 **Decision:** v1 testcloud must pass these criteria before being declared "deploy-equivalent" to Roosevelt:

 - All charms `active/idle` (per `juju status`)
 - Tenant subnet → OpenStack API reachability (Bobcat Magnum OCCM crashloop regression test)
 - Octavia LBaaS end-to-end (LB create + member health + failover + recovery)
 - Magnum CAPI end-to-end (cluster template + cluster create + CREATE_COMPLETE)
 - Vault unseal-after-reboot survives a power cycle
-- Designate resolves API FQDNs via the Designate VIP
 - Snapshots 1, 2, 3 captured at the appropriate stages (per D-012)

+**Amendment (2026-05-27):** Per D-019, the "Designate resolves" criterion is removed for v1. Designate is deferred to v2; tenant subnets resolve via public DNS. v2 will reinstate a DNS-resolution validation criterion calibrated to whatever DNS mechanism is in place (NS delegation from corporate DNS, or otherwise).

Diff — Hunk C: add D-019 (new entry, append to end of design-decisions.md)

---


## D-019: DNS scope reduction for v1 — Designate deferred to v2

**Decision (2026-05-27):** Designate is removed from the v1 testcloud bundle and deferred to v2 alongside corporate DNS / NS delegation work. v1 tenant subnets resolve via public DNS (`1.1.1.1`, `1.0.0.1`) directly via the `--dns-nameserver` option at subnet-create time.

**Supersedes:** D-008 (DNS architecture).

**Amends:** D-011 (validation bar — removes "Designate resolves" criterion).

### Rationale

Three findings from the 2026-05-27 testcloud topology investigation:

1. **Outside-in DNS** (corporate clients resolving `*.cloud.neumatrix.local`) is not needed for v1. Corporate access to the cloud already flows through the existing `openstack.baldurkeep.com → 10.17.4.20 → 10.12.x` HTTPS proxy chain (handled by the edge nginx at `10.17.8.7`), which does not depend on corporate-side resolution of cloud-internal FQDNs.

2. **The edge nginx cannot route to `10.12.x` directly.** Inspection confirmed the edge has only `10.17.8.7/22` plus a tailscale interface; reaching `10.12.4.x` requires the libvirt-host NAT path. Adding DNS to the testcloud would require parallel UDP/53 NAT/proxy plumbing across three hosts (edge nginx, libvirt host, internal nginx) for a feature that has no v1 consumer.

3. **Inside-out DNS** (tenant VMs resolving external names) is satisfied by tenant subnets pointing `--dns-nameserver` at public DNS (`1.1.1.1`, `1.0.0.1`). Designate is not needed in the inside-out path either, since:
   - Tenant VMs don't need to resolve cloud-internal FQDNs (their API access goes through documented IPs / `--cloud` configs in cloud.conf)
   - Cross-tenant DNS visibility is not a v1 requirement

The remaining v1 use case for Designate (FIP DNS auto-registration via the `neutron-api ↔ designate` integration) is informational only — nothing in v1 consumes those records.

### v1 implementation

- Tenant subnets created with `--dns-nameserver 1.1.1.1 --dns-nameserver 1.0.0.1` (or via the openrc `OS_DNS_NAMESERVERS` env)
- CAPI workload cluster template variable `OPENSTACK_DNS_NAMESERVERS` set to `1.1.1.1,1.0.0.1` (per `v1-do-doc-07-capi-bootstrap.md` §13)
- Cloud-internal `*.cloud.neumatrix.local` FQDN tree resolved via static `/etc/hosts` on bootstrap-relevant hosts (jumphost, openstack0-3, LXD containers per charm bootstrap, capi-mgmt — staged in `v1-do-doc-05-vault-init.md` §11 and `v1-do-doc-07-capi-bootstrap.md` §6)
- Charms continue to use FQDN-based `os-public-hostname` (cert SANs depend on it) — internal resolution via `/etc/hosts` is sufficient

### v2 plan

- Re-introduce Designate (charm + designate-bind + relations + hacluster sub)
- NS delegation from corporate DNS to designate-bind on a real (non-NAT) network VIP
- Tenant subnets transitioning to use Designate VIP as their resolver (after corporate DNS delegation lands)
- Designate v2 deploy on a real-network Roosevelt or v2-testcloud topology where the bridging-host complexity from v1 testcloud doesn't apply
- D-011 validation re-introduces a calibrated DNS-resolution criterion (mechanism TBD: NS delegation working end-to-end vs static A records at corporate DNS)

### v2-residency note

The IPv6 prefixes already imported into NetBox (and marked Reservation status) include allocations that would be appropriate for Designate's VIPs in a v2 design — these stay in NetBox as Reservation until v2 work begins.

Commit message

docs/design-decisions: add D-019, amend D-008/D-011 (DNS scope reduction)

D-019 captures the Designate deferral to v2 with rationale grounded in
the 2026-05-27 testcloud topology investigation: outside-in DNS not
needed (corporate clients use openstack.baldurkeep.com HTTPS chain);
edge nginx can't route to cloud-internal anyway; inside-out is
satisfied by tenant --dns-nameserver pointing at public DNS.

D-008 (DNS architecture) marked superseded; original text preserved
for audit trail.

D-011 validation bar amended to remove "Designate resolves" criterion;
v2 will reinstate a calibrated DNS criterion.

9. netbox-vip-queue.md: drop the Designate VIP row

What

In docs/netbox-vip-queue.md, remove the row for Designate's VIP 10.12.4.227. The queue goes from 12 entries to 11.

Why

Per D-019, Designate has no VIP in v1. 10.12.4.227 becomes unused space within the 10.12.4.224-.254 range — same status as 10.12.4.225 (reserved for v2 ceph-radosgw HA).

Diff

 # NetBox VIP Queue (post-deploy)

-The following 12 IPAddress entries should be added to NetBox after the
-v1 deploy completes and the engineer-review of `netbox/ipv4-prefixes-
-import.py` has landed.
-
-| VIP | Service | Notes |
-|---|---|---|
-| 10.12.4.224 | barbican | per D-003 |
-| 10.12.4.226 | cinder | per D-003 |
-| 10.12.4.227 | designate | per D-003 + D-008 |
-| 10.12.4.228 | glance | per D-003 |
-| 10.12.4.229 | keystone | per D-003 |
-| 10.12.4.230 | magnum | per D-003 |
-| 10.12.4.231 | neutron-api | per D-003 |
-| 10.12.4.232 | nova-cloud-controller | per D-003 |
-| 10.12.4.233 | octavia | per D-003 |
-| 10.12.4.234 | openstack-dashboard | per D-003 |
-| 10.12.4.235 | placement | per D-003 |
-| 10.12.4.236 | vault | per D-003 |
+The following 11 IPAddress entries should be added to NetBox after the
+v1 deploy completes and the engineer-review of
+`netbox/ipv4-prefixes-import.py` has landed.
+
+| VIP | Service | Notes |
+|---|---|---|
+| 10.12.4.224 | barbican | per D-003 |
+| 10.12.4.226 | cinder | per D-003 |
+| 10.12.4.228 | glance | per D-003 |
+| 10.12.4.229 | keystone | per D-003 |
+| 10.12.4.230 | magnum | per D-003 |
+| 10.12.4.231 | neutron-api | per D-003 |
+| 10.12.4.232 | nova-cloud-controller | per D-003 |
+| 10.12.4.233 | octavia | per D-003 |
+| 10.12.4.234 | openstack-dashboard | per D-003 |
+| 10.12.4.235 | placement | per D-003 |
+| 10.12.4.236 | vault | per D-003 |
+
+**Reserved (unused in v1):**
+
+- `10.12.4.225` — reserved for v2 ceph-radosgw HA (workstream-2 decision)
+- `10.12.4.227` — reserved for v2 designate (D-019 deferral)

(The exact line counts may differ if the existing file has more preamble — adjust to the actual structure when editing.)
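
The two reserved slots follow from the table: they are the addresses between the lowest and highest queued VIP that no service claims. A short sketch using the standard ipaddress module, with the eleven queued VIPs copied from the table above (the function name gaps is illustrative):

```python
import ipaddress

QUEUED = {
    "10.12.4.224": "barbican",
    "10.12.4.226": "cinder",
    "10.12.4.228": "glance",
    "10.12.4.229": "keystone",
    "10.12.4.230": "magnum",
    "10.12.4.231": "neutron-api",
    "10.12.4.232": "nova-cloud-controller",
    "10.12.4.233": "octavia",
    "10.12.4.234": "openstack-dashboard",
    "10.12.4.235": "placement",
    "10.12.4.236": "vault",
}


def gaps(queued):
    """Addresses between the lowest and highest queued VIP with no entry."""
    taken = {ipaddress.ip_address(addr) for addr in queued}
    low, high = int(min(taken)), int(max(taken))
    return [
        str(ipaddress.ip_address(n))
        for n in range(low, high + 1)
        if ipaddress.ip_address(n) not in taken
    ]


print(gaps(QUEUED))
```

Addresses above .236 in the .224-.254 range are also unassigned; the sketch reports only holes inside the span the queue actually uses.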

Commit message

docs/netbox-vip-queue: drop designate row per D-019

VIP 10.12.4.227 is no longer in v1; designate deferred to v2. Adds a
"reserved (unused in v1)" section to make the unused slots in the
.224-.254 range explicit for future reference.

10. Verification (read-only) after the nine commits land

cd "$HOME/openstack-caracal-ipv4"
git pull
git log --oneline -10

echo "=== 1. ceph-osd no longer has a storage block ==="
grep -A 12 "^  ceph-osd:" bundle.yaml | grep "storage:" \
  && echo "[FAIL] storage: block still present in ceph-osd" \
  || echo "[OK] no storage: in ceph-osd"

echo "=== 2. D-002 vault token removed ==="
grep -E "OpenStack core API charms" docs/design-decisions.md | grep -v "vault)" \
  && echo "[OK]" \
  || echo "[FAIL] D-002 still contains vault token"

echo "=== 3. D-014 reflects new repo path ==="
grep "OpenStack/openstack-caracal-ipv4" docs/design-decisions.md \
  && echo "[OK]" \
  || echo "[FAIL]"

echo "=== 4. README reflects v1-do-doc set ==="
grep "v1-do-doc-NN" README.md \
  && echo "[OK]" \
  || echo "[FAIL]"

echo "=== 5. Designate gone from bundle ==="
grep -E "^  designate" bundle.yaml \
  && echo "[FAIL] designate-related app still present" \
  || echo "[OK] no designate apps"

grep -E "designate" bundle.yaml | grep -vE "^[[:space:]]*#"
# Expect: empty (only commented references remain)

echo "=== 6. VIP count is 11 ==="
VIP_COUNT=$(grep -cE "^[[:space:]]+vip: 10\.12\.4\." bundle.yaml)
echo "VIPs: $VIP_COUNT (expect 11)"

echo "=== 7. D-019 added ==="
grep "^## D-019" docs/design-decisions.md \
  && echo "[OK]" \
  || echo "[FAIL]"

echo "=== 8. D-008 marked superseded ==="
grep -A 2 "^## D-008" docs/design-decisions.md | grep "Superseded by D-019" \
  && echo "[OK]" \
  || echo "[FAIL]"

echo "=== 9. netbox-vip-queue.md has 11 entries ==="
QUEUE_COUNT=$(grep -cE "^\| 10\.12\.4\." docs/netbox-vip-queue.md)
echo "VIP queue entries: $QUEUE_COUNT (expect 11)"

echo "=== 10. runbooks/deprecated/ has 10 files + README ==="
ls runbooks/deprecated/ | wc -l
# Expect: 11 (10 deprecated runbooks + README.md)

echo "=== 11. YAML still parses ==="
python3 -c "import yaml; yaml.safe_load(open('bundle.yaml')); print('[OK] YAML parses')"
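
Check 1 relies on a fixed 12-line window after ceph-osd:, which could spill into the next application. A minimal alternative, assuming applications sit at two-space indentation as in the diffs above, is to cut the block by indentation and look for the key inside it. The function names and the sample text are illustrative and not taken from the real bundle.

```python
def app_block(bundle_text: str, app: str) -> list:
    """Lines belonging to one application, assuming application keys are
    indented by exactly two spaces and their bodies by more."""
    block, inside = [], False
    for line in bundle_text.splitlines():
        if line.startswith(f"  {app}:"):
            inside = True
            continue
        if inside:
            indent = len(line) - len(line.lstrip())
            # A non-blank line at indent <= 2 means the next key has started.
            if line.strip() and indent <= 2:
                break
            block.append(line)
    return block


def has_key(block: list, key: str) -> bool:
    """True if any line in the block starts with '<key>:' after indentation."""
    return any(ln.strip().startswith(f"{key}:") for ln in block)


SAMPLE = """\
applications:
  ceph-osd:
    charm: ceph-osd
    options:
      osd-devices: /dev/vdb
    constraints: arch=amd64 tags=openstack
  some-other-app:
    storage:
      data: loop,1024M
"""

print(has_key(app_block(SAMPLE, "ceph-osd"), "storage"))
```

Because the block ends at the next two-space key, a storage: entry in a neighbouring application does not produce a false failure.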

11. Acceptance criteria

  • All 9 commits landed and pushed to main
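  • Verification section 10 passes all 11 checks: [OK] wherever a check prints a status, and the expected count (11 VIPs, 11 queue entries, 11 files) or empty output otherwise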
  • git log --oneline shows the 9 commits in order
  • bundle.yaml parses cleanly via python3 -c "import yaml; yaml.safe_load(open('bundle.yaml'))"
  • No uncommitted changes (git status clean except for any local scratch)

Once all checked, proceed to v1-do-doc-01-prep.md execution.


12. Change log

| Date | Change | Reference |
|---|---|---|
| 2026-05-27 | v1 (six commits): ceph-osd storage block; D-002/D-014 cleanups; README refresh; deprecated runbook moves; deprecation README | Initial drafting |
| 2026-05-27 | v2 (nine commits): + Designate deferral commits 7/8/9; amended commits 5/6 to reflect 07-dns-zones permanent deprecation; updated commit 4 README hunks; §10 VIP count expectations updated from 12 to 11 | D-019 decision; testcloud topology investigation |