# Environment - Omega Cloud (VR0 DC0 testcloud)

Facts here are ANCHORS, not command inputs. Anything marked (verify) must be
re-measured or re-read from the repo/live cloud before use in a command -
hard rule 2 applies. Snapshot date: 2026-07. The repo is fresher than this file.

## The two deployments

- **Testcloud (now):** VR0 DC0, four KVM host VMs (openstack0-3) on a single
  hypervisor, managed by MAAS + Juju. Single-DC virtual rehearsal.
- **Roosevelt (future):** bare-metal, multi-DC, commercial production
  (3310 Roosevelt Blvd, Eugene OR). Dedicated node roles (gateway/controller/
  compute split) - unlike the hyperconverged testcloud. Every design choice is
  judged by its transfer to Roosevelt.

## Stack (verify against appendix-B for pins)

Charmed OpenStack Caracal 2024.1 - Juju 3.6, MAAS 3.7.2, Vault TLS (charm-pki
root CA), OVN 24.03, Ceph Squid, Octavia (amphora), Barbican,
mysql-innodb-cluster, RabbitMQ, Magnum + magnum-capi-helm driver + azimuth
capi-helm-charts (kubeadm engine), in-cloud single-homed CAPI mgmt VM
(capi-mgmt-v2, k8s-snap, D-035). NetBox is the IPAM apex: never hand-edit
downstream MAAS or overlays for network values.

## Control points

- **Jumphost:** `vopenstack-jesse` - all live commands run here. Has juju,
  the openstack CLI (installed as a snap - it cannot read /tmp; keep any file
  you pass to it under $HOME), jq, kubectl.
- **Repo:** `https://git.baldurkeep.com/OpenStack/openstack-caracal-ipv4`
  (web) / `.../git/OpenStack/openstack-caracal-ipv4.git` (clone). Operator
  commits from Windows (PowerShell / GitHub Desktop - strips exec bits;
  `.gitattributes` pins LF); the jumphost only pulls.
- **Juju model:** `openstack`. **MAAS profile:** `admin` (call
  `maas admin ...` directly; NEVER `maas list` - it prints the API key).
- **Management substrate (verify; NEVER touch in teardown):** the MAAS
  machines hosting juju, lxd, and tailscale are hard-excluded from teardown
  scripts. Resolve system_ids live via `scripts/lib-hosts.sh` - system_ids
  are re-minted on every re-enrollment (DOCFIX-040).
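
The /tmp restriction on the snap-packaged openstack CLI is easy to trip over
from helper scripts. A minimal sketch, assuming a hypothetical helper named
`stage_for_snap_cli` (not part of the repo) that writes scratch files under
$HOME instead of the system temp directory:

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def stage_for_snap_cli(content: str,
                       base_dir: Optional[str] = None,
                       suffix: str = ".yaml") -> Path:
    """Write content to a new scratch file and return its path.

    base_dir defaults to the caller's home directory, because the snap
    openstack CLI cannot read /tmp. The "omega-" prefix is illustrative.
    """
    base = Path(base_dir) if base_dir is not None else Path.home()
    # mkstemp creates the file with owner-only permissions and a unique name.
    fd, name = tempfile.mkstemp(prefix="omega-", suffix=suffix, dir=str(base))
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(content)
    return Path(name)
```

The caller is responsible for removing the file afterwards; the helper only
guarantees that the path is readable by a snap-confined CLI.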

## The six network planes (D-052 / D-053; verify against scripts/lib-net.sh)

| Plane | CIDR | Carries | Notes |
|---|---|---|---|
| provider-public | 10.12.4.0/22 | Public API VIPs + tenant FIPs (Pattern A, D-060) | gw .4.1; untagged |
| metal-admin | 10.12.8.0/22 | MAAS PXE, operator/admin endpoint, default binding | gw .8.1; DC-local |
| metal-internal | 10.12.12.0/22 | ALL service-to-service control (internal API, DB, MQ, Vault, peers) | tagged VID 103 via br-internal; no gw |
| data-tenant | 10.12.16.0/22 | Tenant Geneve overlay | no gw |
| storage | 10.12.32.0/22 | Ceph public | no gw |
| replication | 10.12.36.0/22 | Ceph cluster (OSD replication) | no gw |
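
The table can be sanity-checked offline before any value is trusted. A
minimal sketch, assuming the CIDRs above are copied by hand (the live source
is `scripts/lib-net.sh`, which this does not read) and that "gw .4.1" and
"gw .8.1" expand to 10.12.4.1 and 10.12.8.1; `check_planes` is a hypothetical
name:

```python
import ipaddress
from itertools import combinations

# Hand-copied from the table; gateway is None where the plane has no gw.
PLANES = {
    "provider-public": ("10.12.4.0/22", "10.12.4.1"),
    "metal-admin": ("10.12.8.0/22", "10.12.8.1"),
    "metal-internal": ("10.12.12.0/22", None),
    "data-tenant": ("10.12.16.0/22", None),
    "storage": ("10.12.32.0/22", None),
    "replication": ("10.12.36.0/22", None),
}


def check_planes(planes):
    """Return a list of problem strings; an empty list means the table is sane."""
    problems = []
    nets = {}
    for name, (cidr, gateway) in planes.items():
        try:
            # strict=True rejects CIDRs with host bits set (e.g. 10.12.5.0/22).
            net = ipaddress.ip_network(cidr, strict=True)
        except ValueError as exc:
            problems.append(f"{name}: {exc}")
            continue
        nets[name] = net
        if gateway is not None and ipaddress.ip_address(gateway) not in net:
            problems.append(f"{name}: gateway {gateway} is outside {net}")
    # Every pair of planes must be disjoint.
    for (name_a, net_a), (name_b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} overlaps {name_b}")
    return problems
```

Calling `check_planes(PLANES)` on the values as written returns an empty list.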

- API VIPs: one triple per clustered charm (provider/admin/internal); the
  three VIPs in a triple share the same last octet, taken from the .50-.60
  band. 11 clustered charms (verify count live).
- Tenant pool: 10.20.0.0/16 (hybrid model D-016 - pool in NetBox, per-project
  /24s Neutron-managed). Avoid collisions with capi-mgmt (10.20.0.0/24) and
  existing tenant /24s - list live before allocating.
- Provider NIC rule (D-057/D-060): the provider uplink must land in OVS
  `br-ex`, never enslaved to a Linux bridge, and `br-ex` carries no L3 config.
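
The VIP and tenant-pool rules above can be sketched as two small checks. This
is illustrative only: it assumes the provider/admin/internal VIPs live on
provider-public, metal-admin, and metal-internal respectively (a reading of
the table, to be verified), and the function names and sample addresses are
hypothetical. The list of used /24s must come from the live cloud, not from
this file.

```python
import ipaddress

# Assumed role -> plane mapping; verify against the bundle before relying on it.
VIP_PLANES = {
    "provider": ipaddress.ip_network("10.12.4.0/22"),
    "admin": ipaddress.ip_network("10.12.8.0/22"),
    "internal": ipaddress.ip_network("10.12.12.0/22"),
}
VIP_BAND = range(50, 61)  # .50 through .60 inclusive
TENANT_POOL = ipaddress.ip_network("10.20.0.0/16")


def check_vip_triple(vips):
    """Return problems for one charm's {role: address} VIP triple."""
    problems = []
    last_octets = set()
    for role, net in VIP_PLANES.items():
        if role not in vips:
            problems.append(f"missing {role} VIP")
            continue
        addr = ipaddress.ip_address(vips[role])
        if addr not in net:
            problems.append(f"{role} VIP {addr} is outside {net}")
        last = int(addr) & 0xFF
        if last not in VIP_BAND:
            problems.append(f"{role} VIP {addr} is outside the .50-.60 band")
        last_octets.add(last)
    if len(last_octets) > 1:
        problems.append("last octets differ across the triple")
    return problems


def next_free_tenant_24(used_cidrs):
    """Return the first /24 in the tenant pool that overlaps nothing in use."""
    taken = [ipaddress.ip_network(cidr) for cidr in used_cidrs]
    for candidate in TENANT_POOL.subnets(new_prefix=24):
        if not any(candidate.overlaps(net) for net in taken):
            return candidate
    raise RuntimeError("tenant pool exhausted")
```

With only capi-mgmt (10.20.0.0/24) in use, `next_free_tenant_24` returns
10.20.1.0/24. Treat the result as a proposal to record in NetBox, not as an
allocation.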

## Repo map (what lives where)

- `bundle.yaml` - the canonical bundle; VIPs/units baked in for testcloud.
- `runbooks/phase-00..08-*.md` - the gated deploy sequence, in order, each
  ending in a hard gate. `runbooks/README.md` has the label conventions.
- `runbooks/appendix-A-troubleshooting.md` - symptom->cause->fix index keyed
  by D-NNN/DOCFIX-NNN. First stop for any known-looking failure.
- `runbooks/appendix-B` - version lock. `appendix-C` - identity/RBAC.
  `appendix-D` - Magnum trust model. `ops-capi-recovery.md` - CAPI/Magnum
  post-deploy operations.
- `docs/design-decisions.md` - the D-NNN architectural record (append-only
  discipline; superseded entries stay, marked).
- `scripts/` - phase scripts + `lib-net.sh` / `lib-hosts.sh` (pinned values)
  + tenant onboarding/acceptance. `tests/<script>/` - offline fakebin
  regression harnesses.
- `policies/domain-manager-policy.yaml` + `policies/overrides.zip` - the SCS
  Domain Manager RBAC override (D-051/D-064); the zip ships IN the bundle
  (keystone resources, DOCFIX-071) and provider-bundle-check drift-guards it.
- Operational tooling (2026-07 hardening set): `scripts/preflight.sh` (single
  pre-deploy gate: lint -> bundle invariants -> Charmhub channel assert -> live
  MAAS pre-flight), `scripts/repo-lint.sh`/`repo_lint.py` (static hygiene,
  L1-L6), `scripts/cloud-assert.sh` (behavioral verifier + `--capture` BOM to
  `asbuilt/<date>/`), `scripts/run-logged.sh` (as-executed session logger),
  `scripts/channel_assert.py`. `runbooks/ops-restart-procedure.md` (full-cloud
  restart). `docs/security-ledger.md` (exposure/obligation rows).
  `logs/as-executed-index.md` (committed index; log content stays jumphost-only).
- No KVM snapshot restore path exists (D-070 superseded D-012):
  rebuild-from-runbooks IS the restore path; baselines come from cloud-assert
  `--capture`.

## Identity / tenancy model (see appendix-C/D and D-051, D-064, D-066)

Domain-per-client. Operator provisions: domain + a domain `manager` (SCS
Domain Manager persona - the plain `admin` role is NOT domain-confinable) +
quotas. The tenant self-services everything inside: projects, users, roles
(only member + load-balancer_member assignable - never admin/manager),
app credentials, networks, templates, clusters. Magnum mints per-cluster
trust app-creds carrying the trustor's roles frozen at mint time (D-039:
trustor needs load-balancer_member or CAPO 403s on Octavia). Cluster create
must run as a password identity, not an app-cred (trust-creation block,
D-066). Every identity command is DOMAIN-QUALIFIED (`--domain`,
`--user-domain`, `--project-domain`) - without these flags, name resolution
falls back to the default scope, silently lands in the wrong domain, and
returns a misleading 404.
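
Two of the rules above lend themselves to a mechanical guard: the assignable
role allow-list and domain qualification. A minimal sketch, assuming
hypothetical helper names; it only builds and inspects an argument list for
`openstack role add` and never calls the CLI:

```python
# Roles a tenant may assign, per the paragraph above; never admin/manager.
ASSIGNABLE_ROLES = frozenset({"member", "load-balancer_member"})
DOMAIN_FLAGS = ("--domain", "--user-domain", "--project-domain")


def is_domain_qualified(argv):
    """True if the command carries at least one domain flag.

    Accepts both "--flag value" and "--flag=value" spellings.
    """
    return any(arg.split("=", 1)[0] in DOMAIN_FLAGS for arg in argv)


def role_add_argv(user, project, domain, role):
    """Build a fully domain-qualified `openstack role add` argument list."""
    if role not in ASSIGNABLE_ROLES:
        raise ValueError(f"role {role!r} is not tenant-assignable")
    return [
        "openstack", "role", "add",
        "--user", user, "--user-domain", domain,
        "--project", project, "--project-domain", domain,
        role,
    ]
```

A wrapper could refuse to run any identity command for which
`is_domain_qualified` is false, turning the misleading 404 into an early,
explicit failure.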
