diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..b2e8939
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,37 @@
+# Local artifacts / secrets — must never be committed
+*-openrc
+*.tar.gz
+*.tar.zst
+backups/
+.vault-keys/
+*.pem
+*.key
+*.crt
+# Keep CA certs that are NOT secrets if explicitly added in docs/
+!docs/**/*.crt
+
+# Python
+__pycache__/
+*.pyc
+*.pyo
+.venv/
+venv/
+
+# Editor
+.vscode/
+.idea/
+*.swp
+*~
+
+# Logs
+*.log
+deploy-*.log
+
+# Juju local artifacts
+*.bundle
+bundle-pre-rebuild.yaml
+juju-status-pre-rebuild.yaml
+
+# NetBox token files
+.netbox-env
+.env
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..3052ddc
--- /dev/null
+++ b/README.md
@@ -0,0 +1,112 @@
+# openstack-caracal-ipv4 — VR0 DC0 Omega Cloud (v1)
+
+**Scope:** Charmed OpenStack Caracal (2024.1) IPv4-only testcloud deployment
+on the 4-VM KVM lab, modeled in NetBox as **VR0 DC0 Omega Cloud**.
+
+## v1 vs. v2 — read this first
+
+This repository is the **v1 deliverable** — IPv4-only Caracal Charmed
+OpenStack on the existing MAAS-provisioned network layout. v1 ships first
+because the upstream router infrastructure is not yet IPv6-ready; deferring
+IPv6 lets v1 prove the bundle, Option B binding fix, Magnum CAPI graft,
+Designate-from-day-one, and the hacluster relation pattern at testcloud scale
+without waiting on network-side IPv6 readiness.
+
+**v2** adds IPv6 / dual-stack per the address-family matrix retained as
+v2-scope decisions in `docs/design-decisions.md` (D-004, D-004a). v2 will
+ship either as a sibling overlay in this repository (`overlays/v2-dualstack.yaml`
+on a `v2` branch) or as a separate repository — TBD when v2 work begins.
+
+The IPv6 prefixes already imported into NetBox under VR0 DC0 remain in
+NetBox with **Reserved** status to document the v2 intent without
+implying they are active. See `netbox/ipv6-mark-reserved.py`.
+
+## Repository purpose
+
+This repository is the deployment method. Bundle, overlays, runbooks, and
+validation scripts together describe everything required to bring up the
+cloud from a clean MAAS-managed Juju model. Anyone with NetBox read access,
+MAAS access, and the Juju controller can clone this repository and reproduce
+the cloud.
+
+## Source of truth
+
+**NetBox is authoritative for IPAM.** Any IP, prefix, or VLAN value
+referenced in this repository traces back to NetBox. The exception is
+tenant per-project subnets, which under the v1 hybrid model (D-016) are
+Neutron-managed within a NetBox-modeled upstream tenant pool — i.e., the
+pool has NetBox standing, individual tenant subnets do not.
+
+## Repository layout
+
+```
+openstack-caracal-ipv4/
+├── README.md                       # this file
+├── bundle.yaml                     # canonical Charmed OpenStack bundle (IPv4)
+├── overlays/
+│   └── vr0-dc0-testcloud.yaml      # 4-VM lab specifics; num_units=1 + hacluster
+├── runbooks/
+│   ├── 00-pre-deploy.md            # backups, capi-mgmt graceful teardown
+│   ├── 01-destroy-model.md         # destroy openstack model + verify
+│   ├── 02-deploy.md                # juju deploy + settle wait
+│   ├── 03-vault-init.md            # vault unseal + cert auth
+│   ├── 04-magnum-domain.md         # domain-setup action + keystone wiring
+│   ├── 05-magnum-capi-driver.md    # pip install driver + kubeconfig + systemd
+│   ├── 06-tenant-setup.md          # project, user, openrc, app credentials
+│   ├── 07-dns-zones.md             # Designate zones + API VIP A records (v1)
+│   └── 08-validate.md              # Roosevelt-rehearsal validation criteria
+├── scripts/
+│   ├── pre-flight-checks.sh        # pre-deploy sanity checks
+│   └── validate.sh                 # end-to-end validation runner
+├── netbox/
+│   ├── README.md                   # what's here vs. what's deferred to v2
+│   ├── ipv4-prefixes-import.py     # adds IPv4 prefixes + IPv4 tenant pool
+│   └── ipv6-mark-reserved.py       # marks IPv6 entries as Reserved (Q3)
+└── docs/
+    └── design-decisions.md         # architectural record (D-001 through D-016)
+```
+
+## v1 deployment order
+
+1. Verify NetBox state — run NetBox imports if not already applied
+   - `netbox/ipv4-prefixes-import.py` — required
+   - `netbox/ipv6-mark-reserved.py` — required (Q3: tag existing IPv6 entries)
+2. Run pre-flight checks (`scripts/pre-flight-checks.sh`)
+3. Backup current cloud state (`runbooks/00-pre-deploy.md`)
+4. Destroy existing OpenStack model (`runbooks/01-destroy-model.md`)
+5. Deploy new bundle (`runbooks/02-deploy.md`)
+6. Initialize Vault (`runbooks/03-vault-init.md`)
+7. Set up Magnum domain (`runbooks/04-magnum-domain.md`)
+8. Install Magnum CAPI Helm driver (`runbooks/05-magnum-capi-driver.md`)
+9. Recreate tenant resources (`runbooks/06-tenant-setup.md`)
+10. Populate DNS zones (`runbooks/07-dns-zones.md`)
+11. Run validation (`runbooks/08-validate.md` + `scripts/validate.sh`)
+
+## v1-specific design decisions (summary; see docs/design-decisions.md for full record)
+
+- **D-015 v1/v2 fork** — IPv4-only v1; IPv6/dual-stack v2 deferred
+- **D-016 IPv4 tenant pool hybrid model** — NetBox owns upstream `/16` pool;
+  Neutron owns per-project subnets within it
+- **D-003 Option B network architecture** — Provider `/22` carries both
+  ext_net FIPs (`10.12.4.10–.223`) and OpenStack public API VIPs
+  (`10.12.4.224–.254`) on the same L2 segment; fixes the tenant→API
+  unreachability that caused Magnum OCCM crashloop on Bobcat testcloud
+- **D-005 Ceph Squid** — matches Caracal default; rehearses Roosevelt
+- **D-006 Vault HA backend = etcd + easyrsa**
+- **D-007 Magnum from day one** — charm in bundle + CAPI Helm driver graft
+- **D-008 DNS via Designate from day one** — static /etc/hosts for bootstrap;
+  Designate handles tenant-level resolution (A records only for v1)
+- **D-009 Hacluster relations included at num_units=1** — decorative on
+  testcloud; documents the relation pattern for Roosevelt scale-up
+- **No OVN pinning on testcloud** — Roosevelt bare-metal will pin via `ovn-source`
+
+## v2-scope decisions (deferred — read but do not action in v1)
+
+- **D-004 Dual-stack/IPv6-only matrix** — applies in v2 only
+- **D-004a Host management → Metal (Option A)** — applies in v2 only;
+  v1 keeps openstack0-3 host management IPs on the storage fabric
+- **VLAN modeling in NetBox** (Q2) — the VR0 DC0-VLANs group remains with
+  only VID 240 (OS-Provider) imported during prior session work; remaining
+  VLAN entries are deferred to v2 when actual VLAN tagging is in play.
+  Currently MAAS uses untagged-per-fabric, so the additional VLAN entries
+  would be misleading documentation
diff --git a/bundle.yaml b/bundle.yaml
new file mode 100644
index 0000000..731359a
--- /dev/null
+++ b/bundle.yaml
@@ -0,0 +1,48 @@
+# Charmed OpenStack Caracal (2024.1) bundle for VR0 DC0 Omega Cloud
+#
+# STATUS: PLACEHOLDER — drafting begins after NetBox IPv4 import completes.
+#
+# This file will contain the canonical Juju bundle. The overlay
+# (overlays/vr0-dc0-testcloud.yaml) will pin testcloud-specific values
+# (num_units, machine constraints).
+#
+# Source-of-truth references (per D-010):
+#   - Channel pins per D-002 (channel matrix)
+#   - VIPs from NetBox VR0 DC0 Provider /22 API VIP range (10.12.4.224-.254)
+#     and IPv6 Provider API VIP /64 (2602:f3e2:e02:11::/64) [v2-scope]
+#   - Hostnames per D-008 DNS convention
+#     (.omega.dc0.vr0.cloud.neumatrix.local)
+#   - Endpoint bindings per D-003 Option B
+#
+# Expected sections (final ordering when drafted):
+#   - name, description, series
+#   - variables (anchors for shared values)
+#   - machines (TBD post-MAAS inventory)
+#   - applications (Charmed OpenStack services + subordinates)
+#   - relations (per the documented Caracal relation graph)
+#
+# Required relations to verify before drafting:
+#   - keystone <-> mysql-router <-> mysql-innodb-cluster
+#   - keystone <-> vault:certificates
+#   - Each API charm <-> hacluster + vault:certificates (D-009)
+#   - Octavia <-> ovn-chassis-octavia (its OWN subordinate, NOT main ovn-chassis)
+#   - Barbican <-> barbican-vault <-> vault:secrets
+#   - Magnum: shared-db (with colon — see known bugs in design-decisions.md)
+#   - Vault HA: vault:ha <-> vault-hacluster:ha; etcd:db <-> vault:etcd;
+#     etcd:certificates <-> easyrsa:client (D-006)
+#   - Designate (D-008): standard keystone/mysql-router/rabbit + dns-mdns/dns-frontend
+#
+# Known anti-patterns to avoid (from docs/design-decisions.md):
+#   - magnum-shared-db missing colon (deploy-blocker)
+#   - empty/shared osd-devices YAML anchor (deploy-blocker; not applicable
+#     at testcloud scale but pattern matters for Roosevelt)
+#   - ovn-chassis binding `overlay-suffix` (invalid; use `data`)
+#   - hardcoded NIC names in bridge-interface-mappings
+#   - GUI annotation collisions
+#
+# TODO before drafting:
+#   - [ ] NetBox IPv4 prefixes imported (netbox/ipv4-prefixes-import.py)
+#   - [ ] [v2-scope] NetBox VLANs imported (deferred; see netbox/README.md)
+#   - [ ] MAAS hardware inventory of openstack0-3 (NIC names, disk paths)
+#   - [ ] Decide MAAS tags for testcloud machines
+#   - [ ] Verify Caracal channel matrix on Charmhub (D-002) is current
diff --git a/docs/design-decisions.md b/docs/design-decisions.md
new file mode 100644
index 0000000..df57d6d
--- /dev/null
+++ b/docs/design-decisions.md
@@ -0,0 +1,409 @@
+# Design Decisions — VR0 DC0 Omega Cloud
+
+This document is the architectural record for the VR0 DC0 testcloud rebuild.
+Every decision listed here has been deliberately made and discussed; it is
+not a wishlist or a brainstorm. If a decision is changed, this document is
+updated and the change is committed with a referencing message.
+
+**Scope split:** This repository implements **v1 (IPv4-only)**. Several
+decisions below are tagged with **[v2-scope]** — they remain valid design
+intent but are deferred to a future v2 deployment when upstream router
+infrastructure supports IPv6. See **D-015** for the v1/v2 fork record.
+
+---
+
+## D-001: Deployment target paradigm
+
+**Decision:** Path 2A — Charmed OpenStack Caracal (2024.1) via Juju bundle.
+
+**Alternatives considered:**
+
+- Path 2B — Canonical Sunbeam (microk8s-based). Rejected: discards most of
+  the test-cloud experience accumulated to date; different operator paradigm.
+- Path 1 — Stay on Bobcat 2023.2. Rejected: defeats the purpose of a Caracal
+  rehearsal ahead of Roosevelt bare-metal.
+
+**Consequences:**
+
+- Bundle-based deployment, suitable for both KVM testcloud and bare-metal scale.
+- Caracal-stable channel matrix applies (see D-002).
+- EOL date is April 2027 (Caracal upstream support window).
+
+---
+
+## D-002: Channel pinning matrix
+
+**Decision:** Pin every charm to a Caracal-stable channel. No `ovn-source` pinning on
+testcloud (Roosevelt will pin); this is separate from the OVN charm channel below.
+
+| Charm group | Channel |
+|---|---|
+| OpenStack core (keystone, glance, nova-*, neutron-api, cinder, placement, octavia, barbican, designate, magnum) | `2024.1/stable` |
+| OVN (ovn-central, ovn-chassis, ovn-dedicated-chassis-octavia) | `24.03/stable` |
+| Ceph (ceph-mon, ceph-osd, ceph-radosgw if used) | `squid/stable` (see D-005) |
+| MySQL (mysql-innodb-cluster, mysql-router subordinates) | `8.0/stable` |
+| RabbitMQ | `3.9/stable` |
+| Vault | `1.8/stable` |
+| etcd, easyrsa | `latest/stable` |
+
+**Verification source:** Caracal channel matrix per Canonical Charmed
+OpenStack docs, current as of design date. Verify against Charmhub before
+deploy via `scripts/pre-flight-checks.sh`.
+
+---
+
+## D-003: Network architecture — Option B
+
+**Decision:** Provider network carries BOTH ext_net (tenant FIPs + SNAT
+egress) AND OpenStack public API VIPs on the same L2 segment.
+
+**Rationale:** During Magnum CAPI Phase 3 on the Bobcat testcloud, OCCM
+crashloop was traced to tenant networks being unable to reach OpenStack API
+endpoints — the libvirt FORWARD chain rejected cross-bridge packets between
+provider (virbr1) and metal (virbr2) bridges. With API VIPs on metal, tenant
+workloads cannot reach them. Putting API VIPs on the same network as the
+FIPs makes the API path tenant-reachable by construction.
+
+**Address space layout for v1 (IPv4-only):**
+
+| Range | Purpose |
+|---|---|
+| `10.12.4.10 – 10.12.4.223` | Neutron FIP pool |
+| `10.12.4.224 – 10.12.4.254` | Charm API VIPs (excluded from Neutron allocation_pools) |
+
+The Provider `/22` (`10.12.4.0/22`) carries both ranges within a single
+Neutron subnet. Neutron `allocation_pools` MUST exclude the API VIP range.
+
+**v2-scope extension:** IPv6 Provider subnet adds parallel FIP and API VIP
+IPv6 IP Ranges within a single `/64`. See D-004.
+
+---
+
+## D-004 [v2-scope]: Dual-stack vs IPv6-only matrix
+
+**Decision (v2-scope, NOT for v1):** Network role determines address family.
+IPv6 preferred; IPv6-only where the network has no external clients.
+
+**v1 reality:** All networks are IPv4-only on the existing MAAS-provisioned
+layout. This matrix becomes active in v2.
+
+| Role | IPv4 (v1) | IPv4 (v2) | IPv6 (v2) | Reasoning |
+|---|---|---|---|---|
+| Metal | ✓ | ✓ | ✓ | Charm-to-charm; MAAS PXE IPv4-first |
+| Provider | ✓ | ✓ | ✓ | Tenant FIPs need IPv4; API VIPs reachable from both |
+| Data (Geneve underlay) | ✓ | — | ✓ | v2: no external clients; underlay agnostic |
+| Storage (Ceph public) | ✓ | — | ✓ | v2: `ms-bind-ipv6: true`; no external clients |
+| Replication (Ceph cluster) | ✓ | — | ✓ | v2: internal OSD↔OSD only |
+| LBaaS Management | ✓ | ✓ | ✓ | Amphora image compatibility |
+| OOB | n/a | n/a | n/a | Bare-metal-only concern |
+| OpenStack Tenant pool | ✓ (v1: D-016) | — | ✓ | v1 IPv4 hybrid; v2 IPv6 modeled |
+
+---
+
+## D-004a [v2-scope]: Host management → Metal
+
+**Decision (v2-scope, NOT for v1):** Under v2, openstack0-3 host management
+IPs move from storage (`10.12.16.40-.43`) to Metal (`10.12.8.0/22`) when
+Storage becomes IPv6-only. v1 keeps host management on storage.
+
+---
+
+## D-005: Ceph release
+
+**Decision:** Squid (Ceph 19, released September 2024).
+
+**Rationale:** Matches Caracal default; one fewer source override in bundle;
+rehearses what Roosevelt will run. If Squid has rough edges, the testcloud
+is the place to find them, not production.
+
+**Alternatives considered:**
+
+- Reef (Ceph 18) — current on Bobcat testcloud; lower risk; would require
+  `source: cloud:jammy-caracal` override on ceph-mon/ceph-osd while keeping
+  `reef/stable` channel. Rejected: defeats the rehearsal purpose.
+
+---
+
+## D-006: Vault HA backend
+
+**Decision:** etcd + easyrsa, per Canonical Charmed Vault HA docs.
+
+**Rationale:** This is the documented charm path. The chicken-and-egg TLS
+dependency (Vault needs certs to start, but Vault issues certs) is resolved
+by easyrsa bootstrapping the etcd cluster's TLS, after which Vault relations
+to etcd come up cleanly.
+
+**Topology on testcloud (v1):** Vault num_units=1 + hacluster relation
+(decorative; documents the relation pattern). Vault HA quorum is not
+actually exercised at testcloud scale.
+
+**Topology on Roosevelt:** Vault num_units=3 + hacluster on metal space;
+etcd num_units=3; easyrsa num_units=1.
+
+---
+
+## D-007: Magnum inclusion
+
+**Decision:** Magnum in bundle from day one. Two-layer install.
+
+**Layer A — Bundle:**
+
+- `magnum` charm
+- `magnum-mysql-router` subordinate
+- `magnum-dashboard` subordinate
+- Standard relations: keystone, mysql-innodb-cluster (via router),
+  rabbitmq-server, vault (certificates), openstack-dashboard
+- Binding: `public: provider` with VIP on provider API VIP range
+- Hacluster relation included (decorative on testcloud)
+
+**Layer B — Post-deploy runbook (`runbooks/05-magnum-capi-driver.md`):**
+
+- `juju run magnum/leader domain-setup --wait=10m`
+- pip install `stackhpc/magnum-capi-helm` v0.13.0 into the magnum charm venv
+  with `--break-system-packages`
+- Deploy `/etc/magnum/kubeconfig` pointing at `capi-mgmt.maas` bootstrap k3s
+- Systemd override replacing init.d ExecStart to load `--config-dir`
+- `/etc/magnum/magnum.conf.d/99-capi.conf` setting
+  `enabled_drivers=k8s_capi_helm_v1` and
+  `[capi_helm] kubeconfig_file=/etc/magnum/kubeconfig`
+
+**CAPI mgmt plane:** stays on `capi-mgmt.maas` bootstrap k3s. Not in-cloud.
+This pattern transfers to Roosevelt unchanged.
+
+---
+
+## D-008: DNS architecture
+
+**Decision:** Layered — static /etc/hosts for bootstrap + Designate (in bundle
+from day one) for tenant-level resolution.
+
+**Naming convention:**
+
+```
+....cloud.neumatrix.local
+```
+
+Examples:
+- `keystone.omega.dc0.vr0.cloud.neumatrix.local`
+- `nova.omega.dc0.vr0.cloud.neumatrix.local`
+
+**Bootstrap order:**
+
+1. Static `/etc/hosts` on jumphost + all openstack0-3 hosts + all LXD containers
+2. Bundle deploys with `os-public-hostname: ` per API charm
+3. Vault issues certs with FQDN in SAN
+4. Post-deploy: Designate zone created, A records populated
+   (v1: A records only; v2 adds AAAA records)
+5. Neutron `default_dns_domain` and `dns_servers` configured to point at Designate
+6. Tenant subnets created with `--dns-nameserver `
+
+---
+
+## D-009: Hacluster modeling at testcloud scale
+
+**Decision:** Include hacluster + VIP relations at num_units=1 across all
+HA-eligible API charms.
+
+**Rationale:** Decorative at testcloud scale (a single unit can't form a
+real HA quorum). Documents the relation pattern so Roosevelt scale-up is
+mechanical: change `num_units: 1` → `num_units: 3` and rerun.
+
+**Charms with hacluster relation:** keystone, glance, neutron-api,
+nova-cloud-controller, placement, openstack-dashboard, cinder, octavia,
+barbican, magnum, vault, designate.
+
+---
+
+## D-010: NetBox-upstream policy
+
+**Decision:** NetBox is the single source of truth for IPAM at the **role and
+cloud-level pool** layer. Per-project tenant subnets are exempt under the
+hybrid model (D-016).
+
+**Workflow:** Update NetBox → update bundle/overlay → commit both with
+cross-reference.
+
+**Standing imports for v1 (gating the bundle):**
+
+- VR0 DC0 site exists in NetBox ✓
+- IPv4 prefixes for v1: Metal /22, Provider /22, LBaaS Mgmt /22
+  (via `netbox/ipv4-prefixes-import.py`) — **pending**
+- Provider IP Ranges for FIPs and API VIPs (same script) — **pending**
+- IPv4 tenant pool /16 (same script, per D-016) — **pending**
+- IPv6 entries marked with Reserved status
+  (via `netbox/ipv6-mark-reserved.py`) — **pending**
+
+**Deferred to v2 (per Q2):** VR0 DC0-VLANs group additions beyond
+VID 240 (already imported during prior session work). MAAS currently uses
+untagged-per-fabric; modeling additional VLANs in NetBox without
+corresponding network-side tagging would be misleading documentation.
+
+---
+
+## D-011: Validation bar — Roosevelt-rehearsal level
+
+**Decision:** Deployment is not considered successful until all of the
+following pass:
+
+1. All charms `active/idle` in `juju status`
+2. API reachability from jumphost (all public VIPs respond on hostname)
+3. API reachability from a tenant VM (Option B verification)
+4. Octavia LB pattern re-passes (round-robin, failover, recovery — per
+   Bobcat v3 work)
+5. End-to-end Magnum CAPI cluster creation succeeds, including OCCM not
+   crash-looping
+6. Vault unseal + auto-unseal-after-reboot pattern verified
+7. KVM snapshot baseline taken (Phase 5)
+8. Designate zones populated and tenant VMs resolve API hostnames
+
+Validation script: `scripts/validate.sh` (TBD).
+
+---
+
+## D-012: Snapshot strategy
+
+**Decision:** Two baseline snapshots.
+
+- **Snapshot 1:** Post-deploy, post-validation, pre-tenant-resources. Clean
+  cloud state — what a fresh install looks like.
+- **Snapshot 2:** Post-tenant-setup. Includes domain1, project1, user1,
+  openrc, flavors, base images (noble-amd64), keypair. Restore point for
+  tenant work.
+
+Snapshots are KVM/qcow2-level on the jumphost hypervisor. Per-VM.
+
+---
+
+## D-013: Clean teardown of existing capi-mgmt
+
+**Decision:** Before destroying the OpenStack model, gracefully delete the
+CAPI workload cluster on capi-mgmt.maas to allow OpenStack resources (LBs,
+FIPs, volumes) to be cleaned up properly by CAPI controllers.
+
+**Steps:** `kubectl delete cluster capi-mgmt-cluster` → wait for CAPI to
+clean up tenant-side OpenStack resources → `juju destroy-model openstack
+--destroy-storage --no-prompt`.
+
+**Preserved across rebuild:** capi-mgmt.maas bootstrap k3s + CAPI controllers
+themselves. Re-used as the Magnum CAPI mgmt plane post-deploy.
+
+---
+
+## D-014: Repository storage location and naming
+
+**Decision:** Self-hosted GitBucket at `git.baldurkeep.com`.
+
+**Repo path:** `jesse.austin/openstack-caracal-ipv4` (v1; IPv4-only).
+
+**v2 repository:** TBD when v2 work begins. Two viable paths:
+sibling repo `openstack-caracal-ipv6` or `openstack-caracal-dualstack`, OR
+`v2` branch in this repo with an `overlays/v2-dualstack.yaml`. The
+single-repo-with-branch approach preserves history of what changed v1→v2
+together; the sibling-repo approach keeps v1 frozen as a reference once v2
+is in motion.
+
+**Branching strategy:** `main` is canonical. Per-phase work in feature
+branches when a deploy is in progress; merge back to `main` at successful
+validation.
+
+---
+
+## D-015: v1 / v2 Fork
+
+**Decision:** Caracal testcloud ships in two iterations.
+
+**v1 (this repository, `openstack-caracal-ipv4`):** IPv4-only Caracal on
+existing MAAS-provisioned network layout. Proves the bundle, Option B
+binding fix, Magnum CAPI graft, Designate-from-day-one, hacluster relation
+pattern, and validation framework. Ships first.
+
+**v2 (deferred):** Adds IPv6 / dual-stack per D-004. Requires upstream
+router infrastructure to be IPv6-capable, which is not currently the case
+in this environment. v2 work begins after v1 validation passes AND
+router-side IPv6 is in place.
+
+**Rationale:** Decoupling the OpenStack-side rebuild from the network-side
+IPv6 readiness lets us prove the more-important architectural fix (Option
+B) without waiting on infrastructure work outside the OpenStack
+deployment's control. The IPv6 design intent is preserved as NetBox
+Reserved-status entries (per D-010 and `netbox/ipv6-mark-reserved.py`).
+
+**v1→v2 migration scope (forward-look):**
+
+- Re-IP roles per D-004 (add IPv6 sibling to Metal/Provider/LBaaS; move
+  Data/Storage/Replication to IPv6-only)
+- Move host management IPs from storage to Metal (D-004a)
+- Re-bind charms to listen on both families where dual-stack
+- Add AAAA records to Designate zones
+- Add tenant IPv6 pool carve-outs
+
+---
+
+## D-016: IPv4 tenant pool — hybrid model (v1)
+
+**Decision:** NetBox owns one upstream IPv4 tenant pool prefix for VR0 DC0.
+Per-project tenant subnets are Neutron-managed within that pool and are NOT
+modeled in NetBox.
+
+**Pool allocation:** `10.20.0.0/16` (default; configurable in
+`netbox/ipv4-prefixes-import.py`). 65,536 addresses; 256 `/24`s available
+for per-project tenant subnets. Modeled under VR0 DC0 with role
+`openstack-tenant`.
+
+**Per-project allocation pattern (operationally):**
+
+When a project is created, allocate a /24 from the pool. Operator records
+the allocation in tenant-setup runbook output but does NOT create a NetBox
+prefix entry for it. Suggested convention: `10.20..0/24`,
+starting with `10.20.1.0/24` for project1, etc.
+
+**Rationale (Option C from the discussion):**
+
+- Option A (NetBox-modeled per-project) — full IPAM rigor; high friction
+  for tenant lifecycle; round-trips to NetBox for ephemeral tenants.
+- Option B (Neutron-only, no NetBox standing) — minimum friction; loses
+  upstream visibility of total tenant footprint; violates D-010 in spirit.
+- Option C (hybrid, chosen) — NetBox documents what space is reserved for
+  tenants and prevents accidental collision with infra ranges; Neutron
+  owns the lifecycle of individual tenant subnets without NetBox round-trips.
+
+**Constraint:** Tenant CIDRs MUST be within the pool. The pre-flight
+checklist (`scripts/pre-flight-checks.sh`) should assert that proposed
+tenant subnets fall within the modeled pool.
+
+**v2-scope counterpart:** IPv6 tenant pool `2602:f3e2:ff:0::/56`
+(NetBox-modeled, Reserved status in v1) becomes active in v2 with the
+same hybrid model — pool has NetBox standing, per-project IPv6 subnets
+Neutron-managed.
+
+---
+
+## Known bugs to avoid in bundle drafting
+
+From prior bundle review work — these are anti-patterns:
+
+- `magnum-shared-db` missing colon — causes a relation endpoint syntax
+  error, deploy-blocking. Bundle must use `- - magnum:shared-db` (with the
+  colon).
+- Empty `osd-devices` YAML anchor referenced by multiple ceph-osd applications.
+- `ovn-chassis` binding `overlay-suffix` — invalid binding name. Correct
+  value is `data`.
+- GUI annotation collision between NUMA-split ceph-osd apps (not applicable
+  to testcloud since we don't NUMA-split, but flagged for Roosevelt).
+- Hardcoded NIC name in `bridge-interface-mappings`. Use MAC where possible.
+- `openstack -f value` column ordering — column order is not guaranteed;
+  use `-c -f value` for single-column output.
+- Snap confinement: `openstackclients` snap has home-only interface;
+  commands cannot read paths under `/tmp`. File paths must resolve under
+  `$HOME`.
+- Non-ASCII characters in `local_settings.d` overrides cause silent daemon
+  failures in Horizon.
+
+---
+
+## Change log
+
+| Date | Change | Reference |
+|---|---|---|
+| 2026-05-22 | Initial document captured | Caracal rebuild planning session |
+| 2026-05-22 | D-015 v1/v2 fork added; D-004 and D-004a marked v2-scope; D-016 IPv4 tenant pool hybrid model added; D-014 updated with new repo name | v1/v2 fork session (this update) |
diff --git a/netbox/README.md b/netbox/README.md
new file mode 100644
index 0000000..dea1732
--- /dev/null
+++ b/netbox/README.md
@@ -0,0 +1,80 @@
+# NetBox imports — v1 scope
+
+This directory contains the NetBox import scripts required for v1 (IPv4-only)
+deployment. Each script is idempotent; re-running is safe.
+
+## Scripts
+
+### `ipv4-prefixes-import.py`
+
+Adds the IPv4 prefixes required for v1:
+
+- Metal `/22` (10.12.8.0/22)
+- Provider `/22` (10.12.4.0/22) with Provider IP Ranges (FIP pool + API VIPs)
+- LBaaS Mgmt `/22` (10.12.32.0/22)
+- IPv4 tenant pool `/16` (default 10.20.0.0/16; configurable via
+  `TENANT_POOL_CIDR` env var)
+
+Run order: first.
+
+```
+NETBOX_URL=https://netbox.baldurkeep.com NETBOX_TOKEN= \
+  python3 ipv4-prefixes-import.py
+```
+
+### `ipv6-mark-reserved.py`
+
+Sets existing IPv6 prefixes scoped to VR0 DC0 to **Reserved** status per
+**D-015** (v1/v2 fork). The IPv6 entries from earlier session work are NOT
+deleted — they're preserved to document v2 design intent without implying
+they are active during v1.
+
+Run order: second (after IPv4 prefixes are in place).
+
+```
+NETBOX_URL=https://netbox.baldurkeep.com NETBOX_TOKEN= \
+  python3 ipv6-mark-reserved.py
+```
+
+Use `--dry-run` to preview without changes. Use `--revert` only when v2
+work begins (sets IPv6 prefixes back to active).
+
+## Deferred to v2
+
+### VLAN imports (`vlans-import.py` — removed)
+
+Per the v1/v2 fork session Q2, additional VR0 DC0-VLANs group entries
+(VIDs 50, 200, 220, 221, 222, 230, 260, 270) are **not imported in v1**.
+
+Rationale: MAAS currently uses untagged-per-fabric on the existing network
+layout. Modeling additional VLAN entries in NetBox without corresponding
+network-side VLAN tagging would be misleading documentation. The VID 240
+(OS-Provider) entry imported during prior session work is sufficient for v1
+since it pairs with the Provider /22 prefix.
+
+When v2 work begins, the VLAN import script will be re-introduced under
+the v2 design with VLAN tagging actually in play.
+
+### IPv6 active prefixes
+
+The IPv6 prefix entries previously imported (Provider /60 + sub-/64s, Metal
+/60, Data /60, Storage /60, Replication /60, LBaaS Mgmt /60, OOB /60) are
+**reserved** in NetBox via `ipv6-mark-reserved.py`. They become active
+again under the planned `/60 → /64` refactor and full re-import when v2
+work begins.
+
+## NetBox version
+
+These scripts assume NetBox 4.x:
+
+- Prefix scope: `scope_type="dcim.site"` + `scope_id=` (not legacy `site=` field)
+- Prefix read: `p.scope` (not `p.site`)
+- Status field: `"reserved"` / `"active"` as lowercase choice slugs
+
+If your NetBox is older, the scripts will need adjustment.
+
+## Sequence for v1 deploy gate
+
+1. `python3 ipv4-prefixes-import.py` — succeeds; verification block clean
+2. `python3 ipv6-mark-reserved.py` — succeeds; all IPv6 prefixes now reserved
+3. Bundle drafting can proceed; values traced to NetBox are now authoritative
diff --git a/netbox/ipv4-prefixes-import.py b/netbox/ipv4-prefixes-import.py
new file mode 100644
index 0000000..08566b1
--- /dev/null
+++ b/netbox/ipv4-prefixes-import.py
@@ -0,0 +1,299 @@
+#!/usr/bin/env python3
+"""
+NetBox IPv4 prefix import for VR0 DC0 Omega Cloud (v1).
+
+Adds the IPv4 prefixes required for v1 (IPv4-only) deployment:
+  - Metal        10.12.8.0/22   (charm-to-charm relations)
+  - Provider     10.12.4.0/22   (ext_net FIPs + API VIPs per Option B)
+  - LBaaS Mgmt   10.12.32.0/22  (Octavia controller-to-amphora)
+  - Tenant pool  10.20.0.0/16   (v1 IPv4 hybrid per D-016; configurable)
+
+Within the Provider /22, two IP Ranges are created:
+  - 10.12.4.10  - 10.12.4.223  (FIP pool — Neutron allocates from here)
+  - 10.12.4.224 - 10.12.4.254  (API VIPs — exclude from Neutron pools)
+
+Per D-015 (v1/v2 fork), this script is v1-scope only and adds NO IPv6 prefixes.
+Existing IPv6 entries (Provider /60 carve-outs, etc.) are handled separately
+by netbox/ipv6-mark-reserved.py (sets them to Reserved status).
+
+Per D-016 (IPv4 tenant pool hybrid), the tenant pool default is 10.20.0.0/16.
+Override via the TENANT_POOL_CIDR environment variable if a different range
+is preferred.
+
+NetBox version: 4.x (uses scope_type / scope_id, not legacy 'site' field).
+
+Usage:
+    NETBOX_URL=https://netbox.baldurkeep.com NETBOX_TOKEN= \\
+        python3 ipv4-prefixes-import.py
+
+    # Override tenant pool:
+    TENANT_POOL_CIDR=10.30.0.0/16 \\
+        NETBOX_URL=... NETBOX_TOKEN=... python3 ipv4-prefixes-import.py
+
+Idempotent: re-running is safe; existing prefixes/ranges are detected and
+skipped with a message. To force update, pass --update on the command line.
+
+Verification block at end prints final state.
+"""
+
+from __future__ import annotations
+
+import argparse
+import os
+import sys
+
+try:
+    import pynetbox
+except ImportError:
+    sys.stderr.write("ERROR: pynetbox not installed. pip install pynetbox\n")
+    sys.exit(1)
+
+# -----------------------------------------------------------------------------
+# Configuration — values traced to NetBox via docs/design-decisions.md (D-003, D-004)
+# -----------------------------------------------------------------------------
+
+SITE_SLUG = "vr0-dc0"  # VR0 DC0 site
+SITE_NAME = "VR0 DC0"  # human-readable for verification output
+
+# Role slugs as they appear in NetBox. Verify with: nb.ipam.roles.all()
+ROLE_PROVIDER = "provider"
+ROLE_METAL = "metal"
+ROLE_LBAAS_MGMT = "lbaas-management"
+ROLE_OPENSTACK_TENANT = "openstack-tenant"
+
+# VLAN: OS-Provider VID 240 in VR0 DC0-VLANs group (already exists from prior import)
+PROVIDER_VLAN_VID = 240
+PROVIDER_VLAN_GROUP_SLUG = "vr0-dc0-vlans"
+
+# Default IPv4 tenant pool (per D-016). Configurable via TENANT_POOL_CIDR env var.
+# Default 10.20.0.0/16 — 256 /24s available for per-project Neutron-managed tenant subnets.
+DEFAULT_TENANT_POOL_CIDR = "10.20.0.0/16"
+TENANT_POOL_CIDR = os.environ.get("TENANT_POOL_CIDR", DEFAULT_TENANT_POOL_CIDR)
+
+# Prefixes to create (CIDR -> {role_slug, description, optional vlan})
+# All entries are v1-scope IPv4. v2 IPv6 entries are NOT created here;
+# see netbox/ipv6-mark-reserved.py for handling of existing IPv6 entries.
+IPV4_PREFIXES = [
+    {
+        "prefix": "10.12.4.0/22",
+        "role_slug": ROLE_PROVIDER,
+        "description": "VR0 DC0 Provider (ext_net FIPs + API VIPs per Option B / D-003)",
+        "vlan_vid": PROVIDER_VLAN_VID,
+        "vlan_group_slug": PROVIDER_VLAN_GROUP_SLUG,
+    },
+    {
+        "prefix": "10.12.8.0/22",
+        "role_slug": ROLE_METAL,
+        "description": "VR0 DC0 Metal (charm-to-charm relations)",
+        "vlan_vid": None,
+        "vlan_group_slug": None,
+    },
+    {
+        "prefix": "10.12.32.0/22",
+        "role_slug": ROLE_LBAAS_MGMT,
+        "description": "VR0 DC0 LBaaS Management (Octavia controller to amphora)",
+        "vlan_vid": None,
+        "vlan_group_slug": None,
+    },
+    {
+        "prefix": TENANT_POOL_CIDR,
+        "role_slug": ROLE_OPENSTACK_TENANT,
+        "description": (
+            "VR0 DC0 OpenStack Tenant pool (v1 IPv4 hybrid per D-016) — "
+            "Neutron-managed per-project /24 carve-outs within this pool"
+        ),
+        "vlan_vid": None,
+        "vlan_group_slug": None,
+    },
+]
+
+# IP Ranges to create within Provider /22 (start, end -> description)
+PROVIDER_IP_RANGES = [
+    {
+        "start": "10.12.4.10/22",
+        "end": "10.12.4.223/22",
+        "role_slug": ROLE_PROVIDER,
+        "description": "FIP pool — Neutron allocates floating IPs from this range",
+    },
+    {
+        "start": "10.12.4.224/22",
+        "end": "10.12.4.254/22",
+        "role_slug": ROLE_PROVIDER,
+        "description": "API VIPs — charm hacluster VIPs (exclude from Neutron allocation_pools)",
+    },
+]
+
+# -----------------------------------------------------------------------------
+# Helpers
+# -----------------------------------------------------------------------------
+
+
+def die(msg: str, code: int = 1) -> None:
+    sys.stderr.write(f"ERROR: {msg}\n")
+    sys.exit(code)
+
+
+def get_nb() -> "pynetbox.api":
+    url = os.environ.get("NETBOX_URL")
+    token = os.environ.get("NETBOX_TOKEN")
+    if not url:
+        die("NETBOX_URL environment variable not set")
+    if not token:
+        die("NETBOX_TOKEN environment variable not set")
+    nb = pynetbox.api(url, token=token)
+    # Sanity check
+    try:
+        _ = nb.status()
+    except Exception as exc:  # noqa: BLE001
+        die(f"Could not reach NetBox at {url}: {exc}")
+    return nb
+
+
+def find_site(nb, slug: str):
+    site = nb.dcim.sites.get(slug=slug)
+    if site is None:
+        die(f"Site with slug '{slug}' not found in NetBox")
+    return site
+
+
+def find_role(nb, slug: str):
+    role = nb.ipam.roles.get(slug=slug)
+    if role is None:
+        die(f"IPAM role with slug '{slug}' not found in NetBox")
+    return role
+
+
+def find_vlan(nb, vid: int, group_slug: str):
+    group = nb.ipam.vlan_groups.get(slug=group_slug)
+    if group is None:
+        die(f"VLAN group with slug '{group_slug}' not found")
+    vlan = nb.ipam.vlans.get(vid=vid, group_id=group.id)
+    if vlan is None:
+        die(f"VLAN VID {vid} not found in group '{group_slug}'")
+    return vlan
+
+
+def create_or_report_prefix(nb, cfg: dict, site, update: bool = False) -> None:
+    cidr = cfg["prefix"]
+    role = find_role(nb, cfg["role_slug"])
+    vlan = None
+    if cfg.get("vlan_vid") is not None:
+        vlan = find_vlan(nb, cfg["vlan_vid"], cfg["vlan_group_slug"])
+
+    existing = nb.ipam.prefixes.get(prefix=cidr)
+    payload = {
+        "prefix": cidr,
+        "role": role.id,
+        "description": cfg["description"],
+        "scope_type": "dcim.site",
+        "scope_id": site.id,
+    }
+    if vlan is not None:
+        payload["vlan"] = vlan.id
+
+    if existing is None:
+        created = nb.ipam.prefixes.create(**payload)
+        print(f"  CREATED prefix {cidr} (id={created.id}) role={cfg['role_slug']}")
+    else:
+        if update:
+            existing.update(payload)
+            print(f"  UPDATED prefix {cidr} (id={existing.id}) role={cfg['role_slug']}")
+        else:
+            print(f"  EXISTS prefix {cidr} (id={existing.id}) — skipped (use --update to overwrite)")
+
+
+def create_or_report_iprange(nb, cfg: dict, update: bool = False) -> None:
+    start = cfg["start"]
+    end = cfg["end"]
+    role = find_role(nb, cfg["role_slug"])
+
+    # Look for an existing range with these endpoints
+    existing_list = list(nb.ipam.ip_ranges.filter(start_address=start, end_address=end))
+    payload = {
+        "start_address": start,
+        "end_address": end,
+        "role": role.id,
+        "description": cfg["description"],
+    }
+
+    if not existing_list:
+        created = nb.ipam.ip_ranges.create(**payload)
+        print(f"  CREATED IP Range {start} - {end} (id={created.id})")
+    else:
+        existing = existing_list[0]
+        if update:
+            existing.update(payload)
+            print(f"  UPDATED IP Range {start} - {end} (id={existing.id})")
+        else:
+            print(f"  EXISTS IP Range {start} - {end} (id={existing.id}) — skipped")
+
+
+def verify(nb, site) -> None:
+    print()
+    print("=" * 72)
+    print(f"Verification — final state for site {SITE_NAME} (id={site.id})")
+    print("=" * 72)
+
+    print("\nIPv4 Prefixes:")
+    for cfg in IPV4_PREFIXES:
+        cidr = cfg["prefix"]
+        p = nb.ipam.prefixes.get(prefix=cidr)
+        if p is None:
+            print(f"  MISSING {cidr}")
+            continue
+        scope_id = getattr(p, "scope_id", None)
+        role_slug = p.role.slug if p.role else "(none)"
+        vlan_str = f"vlan={p.vlan.vid}" if p.vlan else "vlan=none"
+        site_ok = "OK" if scope_id == site.id else f"SCOPE-MISMATCH(id={scope_id})"
+        print(f"  {cidr.ljust(18)} role={role_slug.ljust(20)} {vlan_str.ljust(12)} {site_ok}")
+
+    print("\nIP Ranges within Provider /22:")
+    for cfg in PROVIDER_IP_RANGES:
+        ranges = list(nb.ipam.ip_ranges.filter(start_address=cfg["start"], end_address=cfg["end"]))
+        if not ranges:
+            print(f"  MISSING {cfg['start']} - {cfg['end']}")
+            continue
+        r = ranges[0]
+        role_slug = r.role.slug if r.role else "(none)"
+        print(f"  {cfg['start']} - {cfg['end']} role={role_slug} (id={r.id})")
+
+
+# -----------------------------------------------------------------------------
+# Main
+# -----------------------------------------------------------------------------
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(description=__doc__.split("\n\n", 1)[0])
+    parser.add_argument(
+        "--update",
+        action="store_true",
+        help="Update existing prefixes/ranges in place (default: skip if exists)",
+    )
+    parser.add_argument(
+        "--verify-only",
+        action="store_true",
+        help="Skip writes; only print verification block",
+    )
+    args = parser.parse_args()
+
nb = get_nb() + site = find_site(nb, SITE_SLUG) + print(f"Connected. Site '{SITE_NAME}' (id={site.id}).") + + if not args.verify_only: + print("\nIPv4 Prefixes:") + for cfg in IPV4_PREFIXES: + create_or_report_prefix(nb, cfg, site, update=args.update) + + print("\nProvider IP Ranges:") + for cfg in PROVIDER_IP_RANGES: + create_or_report_iprange(nb, cfg, update=args.update) + + verify(nb, site) + print("\nDone.") + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/netbox/ipv6-mark-reserved.py b/netbox/ipv6-mark-reserved.py new file mode 100644 index 0000000..10477e1 --- /dev/null +++ b/netbox/ipv6-mark-reserved.py @@ -0,0 +1,196 @@ +#!/usr/bin/env python3 +""" +NetBox IPv6 entries — mark as Reservation status (v1/v2 fork support). + +Per D-015 (v1/v2 fork) and Q3 from the v1/v2 fork session: existing IPv6 +prefixes scoped to VR0 DC0 are NOT decommissioned. They remain in NetBox to +document v2 design intent, but their status is set to "reserved" rather +than "active" so it is clear they are not in use during the v1 deployment. + +The expected behavior: + - Find all IPv6 prefixes scoped to the VR0 DC0 site. + - For each, set status = "reserved" (NetBox's standard reservation state). + - Optionally append a description suffix noting v2-scope. + - Print a verification block. + +Idempotent: prefixes already in "reserved" status are detected and reported +without modification. Pass --revert to set them back to "active" (use only +when v2 work begins). + +NetBox version: 4.x. + +Usage: + NETBOX_URL=https://netbox.baldurkeep.com NETBOX_TOKEN= \\ + python3 ipv6-mark-reserved.py + + # Preview without changes: + NETBOX_URL=... NETBOX_TOKEN=... python3 ipv6-mark-reserved.py --dry-run + + # Revert to active (only when v2 work begins): + NETBOX_URL=... NETBOX_TOKEN=... python3 ipv6-mark-reserved.py --revert + +WARNING: This script touches only IPv6 prefixes (those with ':' in the CIDR) +scoped to the VR0 DC0 site. IPv4 prefixes are NEVER modified. 
+""" + +from __future__ import annotations + +import argparse +import os +import sys + +try: + import pynetbox +except ImportError: + sys.stderr.write("ERROR: pynetbox not installed. pip install pynetbox\n") + sys.exit(1) + +SITE_SLUG = "vr0-dc0" +SITE_NAME = "VR0 DC0" + +# NetBox status value for reservation. NetBox 4.x exposes these as choices +# on the prefix.status field. The canonical lowercase slug is "reserved". +STATUS_RESERVED = "reserved" +STATUS_ACTIVE = "active" + +# Description suffix appended when marking as reserved. Idempotent: only +# appended if not already present. +V2_SUFFIX = " [v2-scope; reserved per D-015]" + + +def die(msg: str, code: int = 1) -> None: + sys.stderr.write(f"ERROR: {msg}\n") + sys.exit(code) + + +def get_nb(): + url = os.environ.get("NETBOX_URL") + token = os.environ.get("NETBOX_TOKEN") + if not url: + die("NETBOX_URL environment variable not set") + if not token: + die("NETBOX_TOKEN environment variable not set") + nb = pynetbox.api(url, token=token) + try: + _ = nb.status() + except Exception as exc: + die(f"Could not reach NetBox at {url}: {exc}") + return nb + + +def find_site(nb, slug: str): + site = nb.dcim.sites.get(slug=slug) + if site is None: + die(f"Site with slug '{slug}' not found in NetBox") + return site + + +def is_ipv6(prefix_str: str) -> bool: + """Cheap and safe check — IPv6 prefixes contain ':' in the CIDR.""" + return ":" in prefix_str + + +def get_status_value(p) -> str: + """ + Extract a comparable lowercase status slug from a pynetbox prefix. + pynetbox's status field can be a string or a Choices object depending + on the API and library version. We normalize to lowercase string. 
+ """ + s = getattr(p, "status", None) + if s is None: + return "" + # pynetbox Choices object has .value; string status is just the string + val = getattr(s, "value", None) + if val is None: + val = str(s) + return str(val).lower() + + +def find_ipv6_prefixes_at_site(nb, site): + """Return list of IPv6 prefixes scoped to the given site.""" + all_prefixes = list(nb.ipam.prefixes.filter(scope_id=site.id, scope_type="dcim.site")) + return [p for p in all_prefixes if is_ipv6(p.prefix)] + + +def update_status(nb, p, target_status: str, dry_run: bool) -> str: + """Update prefix status, optionally appending/removing the v2 suffix. + + Returns one of: "CREATED-RESERVED", "ALREADY-RESERVED", "REVERTED-ACTIVE", + "ALREADY-ACTIVE", "DRY-RUN-WOULD-CHANGE", "DRY-RUN-NO-CHANGE". + """ + current = get_status_value(p) + cur_desc = p.description or "" + + if target_status == STATUS_RESERVED: + if current == STATUS_RESERVED: + return "ALREADY-RESERVED" + new_desc = cur_desc if V2_SUFFIX in cur_desc else (cur_desc + V2_SUFFIX) + payload = {"status": STATUS_RESERVED, "description": new_desc.strip()} + elif target_status == STATUS_ACTIVE: + if current == STATUS_ACTIVE: + return "ALREADY-ACTIVE" + new_desc = cur_desc.replace(V2_SUFFIX, "").strip() + payload = {"status": STATUS_ACTIVE, "description": new_desc} + else: + die(f"Unknown target_status: {target_status}") + + if dry_run: + return "DRY-RUN-WOULD-CHANGE" + + p.update(payload) + return "CREATED-RESERVED" if target_status == STATUS_RESERVED else "REVERTED-ACTIVE" + + +def main() -> int: + parser = argparse.ArgumentParser(description=__doc__.split("\n\n", 1)[0]) + parser.add_argument( + "--revert", + action="store_true", + help="Set IPv6 prefixes back to 'active' (use only when v2 work begins)", + ) + parser.add_argument( + "--dry-run", + action="store_true", + help="Preview changes without modifying NetBox", + ) + args = parser.parse_args() + + target_status = STATUS_ACTIVE if args.revert else STATUS_RESERVED + + nb = get_nb() + 
site = find_site(nb, SITE_SLUG) + print(f"Connected. Site '{SITE_NAME}' (id={site.id}).") + print(f"Target status: {target_status}") + print(f"Dry-run: {args.dry_run}") + + ipv6_prefixes = find_ipv6_prefixes_at_site(nb, site) + if not ipv6_prefixes: + print(f"\nNo IPv6 prefixes found scoped to site '{SITE_NAME}'.") + print("Nothing to do.") + return 0 + + print(f"\nFound {len(ipv6_prefixes)} IPv6 prefix(es) at this site:") + for p in ipv6_prefixes: + print(f" - {p.prefix.ljust(32)} status={get_status_value(p).ljust(12)} (id={p.id})") + + print("\nProcessing:") + for p in ipv6_prefixes: + result = update_status(nb, p, target_status, args.dry_run) + print(f" {p.prefix.ljust(32)} {result}") + + # Verification block + print() + print("=" * 72) + print(f"Verification — IPv6 prefixes at site {SITE_NAME}") + print("=" * 72) + final = find_ipv6_prefixes_at_site(nb, site) + for p in final: + status_str = get_status_value(p) + print(f" {p.prefix.ljust(32)} status={status_str.ljust(12)} (id={p.id})") + print(f"\nTotal: {len(final)} IPv6 prefix(es).") + + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/overlays/vr0-dc0-testcloud.yaml b/overlays/vr0-dc0-testcloud.yaml new file mode 100644 index 0000000..f7c8c79 --- /dev/null +++ b/overlays/vr0-dc0-testcloud.yaml @@ -0,0 +1,20 @@ +# Testcloud overlay for VR0 DC0 Omega Cloud +# +# STATUS: PLACEHOLDER — drafted alongside bundle.yaml. +# +# This overlay pins values specific to the 4-VM KVM testcloud at jumphost +# vopenstack-jesse. Roosevelt bare-metal would use a different overlay +# (overlays/roosevelt-prod.yaml — not in this repo) that swaps num_units to 3+, +# adjusts machine constraints to MAAS tags, and removes any KVM-specific +# config tuned for libvirt bridges. +# +# Per D-009, hacluster relations remain in the main bundle.yaml even though +# num_units=1 on testcloud. The overlay only changes num_units, not the +# relation graph. 
+# +# TODO during bundle drafting: +# - [ ] num_units=1 overrides per API charm +# - [ ] machine constraints (system-id pinning for openstack0-3) +# - [ ] bridge-interface-mappings for libvirt virbr1 (provider) +# - [ ] storage-backend config for cinder/glance pointing at Ceph +# - [ ] Octavia lb-mgmt-* network values (per LBaaS Management VLAN/prefix) diff --git a/runbooks/00-pre-deploy.md b/runbooks/00-pre-deploy.md new file mode 100644 index 0000000..aa07d86 --- /dev/null +++ b/runbooks/00-pre-deploy.md @@ -0,0 +1,142 @@ +# Runbook 00 — Pre-Deploy + +## Purpose + +Prepare for a clean Caracal rebuild of the VR0 DC0 Omega Cloud. Capture all +state needed for rollback, gracefully tear down dependent workloads, and verify +the destination environment is ready before destroying the existing OpenStack +model. + +## Prerequisites + +- SSH access to jumphost `vopenstack-jesse` as `jessea123` +- `admin-openrc` and `user1-openrc` available in `$HOME` +- Access to the Juju controller hosting the `openstack` model +- Access to the capi-mgmt.maas k3s cluster (kubeconfig present) +- NetBox IPv4 imports completed (per `netbox/ipv4-prefixes-import.py`) +- NetBox VLAN imports completed (per `netbox/vlans-import.py`) + +## Phase 1 — Verify NetBox readiness (gating) + +Run the verification path of the NetBox import scripts. Confirm all entries +appear correctly scoped to VR0 DC0. + +```bash +cd ~/vr0-dc0-caracal +NETBOX_URL=https://netbox.baldurkeep.com NETBOX_TOKEN= \ + python3 netbox/ipv4-prefixes-import.py --verify-only +NETBOX_URL=https://netbox.baldurkeep.com NETBOX_TOKEN= \ + python3 netbox/vlans-import.py --verify-only +``` + +Expected: all prefixes and VLANs report scope-OK, no MISSING entries. 
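The gating condition above can also be checked mechanically. The following is a minimal sketch, not part of the repository: it assumes the `MISSING` and `SCOPE-MISMATCH(id=...)` markers printed by the `verify()` block of `netbox/ipv4-prefixes-import.py`, and it assumes (unverified) that `netbox/vlans-import.py` prints the same markers. The function name `find_verification_problems` is hypothetical.

```python
def find_verification_problems(output: str) -> list[str]:
    """Return the verification lines that flag a missing or mis-scoped entry.

    The marker strings are assumptions taken from the verify() output of
    netbox/ipv4-prefixes-import.py; adjust them if the scripts change.
    """
    problems = []
    for line in output.splitlines():
        text = line.strip()
        # "MISSING <cidr>" lines and any line carrying a scope mismatch flag
        if text.startswith("MISSING") or "SCOPE-MISMATCH(" in text:
            problems.append(text)
    return problems
```

One possible use is to capture the `--verify-only` output to a file, pass its contents to this function, and treat a non-empty result as a no-go for Phase 2.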
+ +## Phase 2 — Capture current state + +Backups needed for potential rollback: + +```bash +# Create today's backup directory first; the redirects below fail without it +mkdir -p ~/backups/$(date +%F) + +# Vault unseal keys and root CA cert +juju ssh vault/0 -- sudo cat /var/snap/vault/common/vault.crt > ~/backups/$(date +%F)/vault-root-ca.crt +# (Unseal keys MUST be on file from initial Vault setup; verify presence) +ls -la ~/.vault-keys + +# Export current bundle +juju export-bundle --model openstack > ~/backups/$(date +%F)/bundle-pre-rebuild.yaml + +# Snapshot of current 'juju status' +juju status --model openstack --format=yaml > ~/backups/$(date +%F)/juju-status-pre-rebuild.yaml + +# Inventory of FIPs and tenant resources we might want to recreate +source ~/admin-openrc +openstack floating ip list -c "Floating IP Address" -c "Fixed IP Address" \ + -c "Project" -f csv > ~/backups/$(date +%F)/floating-ips.csv +openstack server list --all-projects -c ID -c Name -c Project -c Status -f csv \ + > ~/backups/$(date +%F)/servers.csv +openstack network list --all-projects -c ID -c Name -c Project -f csv \ + > ~/backups/$(date +%F)/networks.csv +openstack loadbalancer list -c id -c name -c project_id -c vip_address -f csv \ + > ~/backups/$(date +%F)/loadbalancers.csv +``` + +## Phase 3 — KVM snapshots of openstack0-3 + +From the jumphost (which is the hypervisor): + +```bash +for vm in openstack0 openstack1 openstack2 openstack3; do + sudo virsh snapshot-create-as --domain "$vm" \ + --name "pre-caracal-rebuild-$(date +%F)" \ + --description "Pre-Caracal rebuild baseline" \ + --atomic +done +sudo virsh snapshot-list openstack0 +``` + +These snapshots are the disaster-recovery point. + +## Phase 4 — Graceful CAPI workload teardown (D-013) + +Delete the CAPI workload cluster cleanly so its OpenStack resources (LBs, FIPs, +volumes, Octavia members) are released by CAPI controllers before model destroy.
+ +```bash +export KUBECONFIG=~/magnum-capi/phase3/capi-mgmt-cluster.kubeconfig +# (Adjust path if kubeconfig has moved) + +# Delete the workload cluster — CAPI handles tenant OpenStack cleanup +kubectl delete cluster capi-mgmt-cluster -n default +# Wait for finalizers; this may take ~10 minutes +kubectl wait --for=delete cluster/capi-mgmt-cluster -n default --timeout=15m +``` + +Verify on the OpenStack side that resources were released: + +```bash +source ~/admin-openrc +openstack server list --all-projects | grep -i capi || echo "No CAPI servers remaining" +openstack loadbalancer list | grep -i capi || echo "No CAPI LBs remaining" +openstack floating ip list -c "Floating IP Address" -c "Fixed IP Address" -f csv +``` + +## Phase 5 — Preserve capi-mgmt.maas itself + +The bootstrap k3s + CAPI controllers on `capi-mgmt.maas` are NOT destroyed — +they will be re-used post-rebuild as the Magnum CAPI mgmt plane. Verify the +controllers are still healthy: + +```bash +ssh capi-mgmt.maas -- sudo kubectl --kubeconfig /etc/rancher/k3s/k3s.yaml \ + get pods -A +``` + +Confirm: +- `capi-system` namespace pods Running +- `capo-system` (CAPI OpenStack provider) pods Running +- `cert-manager` pods Running +- `orc-system` (OpenStack Resource Controller) pods Running + +## Phase 6 — Final go/no-go checklist + +Do not proceed to `runbooks/01-destroy-model.md` until all of the following pass: + +- [ ] NetBox verification clean +- [ ] Vault unseal keys backed up and verified readable +- [ ] `bundle-pre-rebuild.yaml` exists and is non-empty +- [ ] `juju-status-pre-rebuild.yaml` shows desired-pre-destroy state captured +- [ ] All four KVM snapshots created (`virsh snapshot-list` confirms) +- [ ] CAPI workload cluster deletion completed (`kubectl get cluster` returns + "no resources found") +- [ ] OpenStack-side resources from CAPI workload are released (no orphaned LBs, + FIPs, volumes) +- [ ] capi-mgmt.maas k3s cluster controllers all Running + +## Notes + +- Snapshot disk space 
consumption can grow significantly during the rebuild + window. Verify free space on `/var/lib/libvirt/images` prior to running + the rebuild deploy. +- If Vault unseal keys cannot be located, STOP. A failed Vault re-init without + the original keys means lost issued certificates and is destructive to any + data sealed under the existing root key. This MUST be confirmed before model + destroy. diff --git a/runbooks/01-destroy-model.md b/runbooks/01-destroy-model.md new file mode 100644 index 0000000..5a41968 --- /dev/null +++ b/runbooks/01-destroy-model.md @@ -0,0 +1,23 @@ +# Runbook 01 — Destroy Existing OpenStack Model + +**STATUS: PLACEHOLDER** — drafted alongside bundle drafting. + +## Purpose + +Cleanly destroy the existing Bobcat `openstack` model, freeing the Juju +controller to host the new Caracal model. + +## Prerequisites + +- All steps in `00-pre-deploy.md` completed including go/no-go checklist +- Vault unseal keys backed up +- KVM snapshots in place +- CAPI workload cluster gracefully torn down + +## TODO + +- [ ] `juju destroy-model openstack --destroy-storage --no-prompt` +- [ ] Verify storage cleanup (no orphaned LXD storage pools, no orphaned + volumes on Ceph) +- [ ] Verify MAAS-side machine state (machines back to Ready, not Deployed) +- [ ] Clean up any stale Juju agent state on KVM hosts if needed diff --git a/runbooks/02-deploy.md b/runbooks/02-deploy.md new file mode 100644 index 0000000..4a52845 --- /dev/null +++ b/runbooks/02-deploy.md @@ -0,0 +1,23 @@ +# Runbook 02 — Deploy New Caracal Bundle + +**STATUS: PLACEHOLDER** — drafted alongside bundle.yaml. + +## Purpose + +Deploy the new Charmed OpenStack Caracal bundle and wait for the cloud to +settle in `active/idle`. 
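The `active/idle` settle condition can be expressed as a small check over the parsed output of `juju status --format=json`. This is a hedged sketch only: the function name `units_not_settled` is hypothetical, the key names (`applications`, `units`, `workload-status`, `juju-status`, `current`) are assumed from the usual Juju status JSON layout, and subordinate units are not walked.

```python
def units_not_settled(status: dict) -> list[str]:
    """List units whose workload is not 'active' or whose agent is not 'idle'.

    `status` is the dict parsed from `juju status --format=json`.
    Key names are assumptions about that layout; subordinates are skipped.
    """
    pending = []
    for app in status.get("applications", {}).values():
        for name, unit in app.get("units", {}).items():
            workload = unit.get("workload-status", {}).get("current")
            agent = unit.get("juju-status", {}).get("current")
            if workload != "active" or agent != "idle":
                pending.append(f"{name}: {workload}/{agent}")
    return sorted(pending)
```

The sketch does not special-case Vault. While Vault is still sealed, it and any charms waiting on certificates would be expected to appear in the returned list.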
+ +## Prerequisites + +- Runbook 01 complete (model destroyed, MAAS state clean) +- `bundle.yaml` and `overlays/vr0-dc0-testcloud.yaml` drafted and reviewed +- `scripts/pre-flight-checks.sh` passes + +## TODO + +- [ ] `juju add-model openstack` +- [ ] `juju deploy ./bundle.yaml --overlay overlays/vr0-dc0-testcloud.yaml --trust` +- [ ] Wait for settle (`juju-wait` or `juju status --watch 30s`) +- [ ] Pause-points for Vault init (per Runbook 03) +- [ ] Acceptance: all charms `active/idle` modulo Vault (sealed) and any + charms waiting on Vault certificates diff --git a/runbooks/03-vault-init.md b/runbooks/03-vault-init.md new file mode 100644 index 0000000..40db379 --- /dev/null +++ b/runbooks/03-vault-init.md @@ -0,0 +1,24 @@ +# Runbook 03 — Vault Initialization + +**STATUS: PLACEHOLDER** — drafted during deploy phase. + +## Purpose + +Initialize the Vault instance(s), unseal, authorize, and let certificate +relations resolve so dependent charms reach `active/idle`. + +## Prerequisites + +- Bundle deployed; Vault charm in `blocked` waiting for init +- etcd cluster in `active/idle` (Vault HA backend per D-006) +- easyrsa active (TLS bootstrap) + +## TODO + +- [ ] `juju run vault/leader generate-root-ca` — capture root CA cert +- [ ] `vault operator init -key-shares=5 -key-threshold=3` — capture keys +- [ ] Unseal with 3 of 5 keys +- [ ] `juju run vault/leader authorize-charm token=` +- [ ] Verify all `:certificates` relations complete (no charms stuck + waiting on certs) +- [ ] Store unseal keys in `~/.vault-keys/` (chmod 600); back up diff --git a/runbooks/04-magnum-domain.md b/runbooks/04-magnum-domain.md new file mode 100644 index 0000000..b400a18 --- /dev/null +++ b/runbooks/04-magnum-domain.md @@ -0,0 +1,21 @@ +# Runbook 04 — Magnum Keystone Domain Setup + +**STATUS: PLACEHOLDER** — drafted post-deploy. 
+ +## Purpose + +Run the magnum charm's `domain-setup` action to create the Keystone domain, +trust role, and service user that Magnum requires for cluster operations. + +## Prerequisites + +- Magnum charm reached `active/idle` post Vault init +- Keystone reachable from jumphost via FQDN + +## TODO + +- [ ] `juju run magnum/leader domain-setup --wait=10m` +- [ ] Verify creation in Keystone: + `openstack domain show magnum` + `openstack user show magnum_domain_admin --domain magnum` +- [ ] Acceptance: domain present, trust role assigned, charm in active/idle diff --git a/runbooks/05-magnum-capi-driver.md b/runbooks/05-magnum-capi-driver.md new file mode 100644 index 0000000..cbfe158 --- /dev/null +++ b/runbooks/05-magnum-capi-driver.md @@ -0,0 +1,35 @@ +# Runbook 05 — Magnum CAPI Helm Driver Graft + +**STATUS: PLACEHOLDER** — drafted post-deploy. Per D-007 Layer B. + +## Purpose + +Install the stackhpc/magnum-capi-helm driver into the magnum charm venv, +configure Magnum to use it, and verify cluster-template creation succeeds. 
+ +## Prerequisites + +- Runbook 04 complete (magnum domain setup done) +- capi-mgmt.maas k3s cluster healthy (CAPI/CAPO/cert-manager/ORC pods Running) +- Per D-007 Layer B, kubeconfig from capi-mgmt.maas accessible + +## TODO + +- [ ] `juju ssh magnum/leader` and `pip install --break-system-packages \ + "git+https://github.com/stackhpc/magnum-capi-helm@v0.13.0"` + into the charm venv +- [ ] Place `/etc/magnum/kubeconfig` pointing at capi-mgmt.maas bootstrap k3s +- [ ] Systemd override for magnum services to load `--config-dir /etc/magnum/magnum.conf.d/` +- [ ] Create `/etc/magnum/magnum.conf.d/99-capi.conf`: + ``` + [DEFAULT] + enabled_drivers = k8s_capi_helm_v1 + + [capi_helm] + kubeconfig_file = /etc/magnum/kubeconfig + ``` +- [ ] Restart magnum-api and magnum-conductor +- [ ] Verify driver loaded: `openstack coe cluster template list` + should show capi_helm_v1 driver option available +- [ ] Smoke test: create a test cluster template + 1-node cluster; + verify it reaches CREATE_COMPLETE diff --git a/runbooks/06-tenant-setup.md b/runbooks/06-tenant-setup.md new file mode 100644 index 0000000..3915229 --- /dev/null +++ b/runbooks/06-tenant-setup.md @@ -0,0 +1,41 @@ +# Runbook 06 — Tenant Resource Recreation + +**STATUS: PLACEHOLDER** — drafted post-deploy. + +## Purpose + +Recreate the standard testcloud tenant resources (domain, project, user, +networks, images, keypairs, flavors) using a proper IPAM-aligned design +per D-010 + D-016 (not the ad-hoc `user1` pattern from the original test +cloud). 
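The IPAM-aligned carve-out referenced by D-016 can be illustrated with Python's standard `ipaddress` module. This is a minimal sketch under stated assumptions, not the repository's method: the pool `10.20.0.0/16`, the /24 size, and the mapping of a numeric project index to the Nth /24 are illustrative, and the function name `tenant_subnet` is hypothetical.

```python
import ipaddress


def tenant_subnet(pool_cidr: str, project_index: int,
                  new_prefix: int = 24) -> ipaddress.IPv4Network:
    """Return the project_index-th subnet of size /new_prefix inside the pool.

    Illustrative only: the index-to-subnet mapping is an assumption.
    """
    pool = ipaddress.IPv4Network(pool_cidr)
    if new_prefix < pool.prefixlen:
        raise ValueError("new_prefix must not be shorter than the pool prefix")
    subnet_count = 2 ** (new_prefix - pool.prefixlen)
    if not 0 <= project_index < subnet_count:
        raise ValueError(f"project_index must be in 0..{subnet_count - 1}")
    # Each subnet spans 2**(32 - new_prefix) addresses; step from the pool base.
    base = int(pool.network_address) + project_index * 2 ** (32 - new_prefix)
    return ipaddress.IPv4Network((base, new_prefix))
```

For example, `tenant_subnet("10.20.0.0/16", 1)` yields `10.20.1.0/24`. Under the hybrid model the resulting subnet would be created in Neutron only and not written back to NetBox.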
+ +## Prerequisites + +- Cloud fully deployed and validated +- DNS zones populated (Runbook 07 may precede this if Designate-via-tenant + DNS is in scope at tenant create time) +- NetBox IPv4 tenant pool prefix present (per D-016; default `10.20.0.0/16`) + +## TODO + +- [ ] Create domain `domain1` +- [ ] Create project `project1` in domain `domain1` +- [ ] Create user `user1` in project1 (member role + load-balancer_member + role for Octavia) +- [ ] Tenant network with CIDR carved from NetBox IPv4 tenant pool + - Suggested convention: `10.20..0/24` per D-016 + - project1 → `10.20.1.0/24` + - Per D-016 hybrid model, the per-project /24 is Neutron-managed and + NOT added back to NetBox +- [ ] Tenant router connected to ext_net (Provider 10.12.4.0/22) +- [ ] Glance image: noble-amd64 (cloud-init enabled) +- [ ] Flavor m1.small (1 vCPU, 2 GiB RAM, 20 GiB root) +- [ ] Keypair for user1 +- [ ] openrc files: `~/admin-openrc`, `~/user1-openrc` +- [ ] Application credentials for user1 (audit trail) +- [ ] Take second KVM snapshot (per D-012 Snapshot 2) + +## v1 vs. v2 note + +In v1, tenant networks are IPv4-only. v2 adds IPv6 tenant subnets carved +from the v2 IPv6 tenant pool (currently reservation status in NetBox). diff --git a/runbooks/07-dns-zones.md b/runbooks/07-dns-zones.md new file mode 100644 index 0000000..3b780de --- /dev/null +++ b/runbooks/07-dns-zones.md @@ -0,0 +1,36 @@ +# Runbook 07 — Designate Zones and Records (v1: A records only) + +**STATUS: PLACEHOLDER** — drafted post-deploy. + +## Purpose + +Create the cloud's DNS zones in Designate, populate API VIP A records +(v1: IPv4 only), and configure Neutron defaults to push Designate as +tenant DNS resolver. 
+ +## Prerequisites + +- Designate charm in `active/idle` +- Keystone, Neutron API reachable +- API VIP hostnames already in `/etc/hosts` on all OpenStack nodes + (per D-008 Layer 0 bootstrap) + +## TODO + +- [ ] Create primary zone: + `openstack zone create --email admin@neumatrix.local \ + omega.dc0.vr0.cloud.neumatrix.local.` +- [ ] Populate API VIP **A** records for each public service: + - keystone, glance, nova, neutron, cinder, placement, octavia, + barbican, magnum, horizon, designate + - **v1: A records only** (IPv4 VIPs from the Provider API VIP range + 10.12.4.224-.254) + - **v2 will add AAAA records when IPv6 Provider VIPs become active** +- [ ] Configure Neutron defaults: + `juju config neutron-api default-dns-domain=omega.dc0.vr0.cloud.neumatrix.local.` + `juju config neutron-api dns-domain=omega.dc0.vr0.cloud.neumatrix.local.` +- [ ] Configure Neutron DHCP to push Designate as resolver: + `juju config neutron-api dns-servers=` +- [ ] Verify from a test tenant VM: + `nslookup keystone.omega.dc0.vr0.cloud.neumatrix.local` + resolves to Provider API VIP diff --git a/runbooks/08-validate.md b/runbooks/08-validate.md new file mode 100644 index 0000000..7f7acfc --- /dev/null +++ b/runbooks/08-validate.md @@ -0,0 +1,33 @@ +# Runbook 08 — Validation (Roosevelt-Rehearsal Bar) + +**STATUS: PLACEHOLDER** — drafted with scripts/validate.sh. + +## Purpose + +Execute the validation criteria from D-011 and confirm the cloud is ready to +be considered a successful rebuild. 
+ +## Prerequisites + +- All prior runbooks complete + +## Validation criteria (per D-011) + +- [ ] All charms `active/idle` in `juju status` +- [ ] All public API VIPs respond on FQDN from jumphost +- [ ] All public API VIPs respond on FQDN from a tenant VM (Option B path) +- [ ] Octavia LB pattern passes: create LB, two members, round-robin verified, + failover verified, recovery verified +- [ ] Magnum CAPI cluster create end-to-end: cluster template + cluster create, + OCCM does not crash-loop, cluster reaches CREATE_COMPLETE +- [ ] Vault unseal + auto-unseal-after-reboot pattern: reboot vault unit, + confirm auto-unseal via etcd (or manual unseal per HA pattern) +- [ ] Designate resolves API hostnames from tenant subnet +- [ ] Snapshot 1 (post-deploy, pre-tenant) taken (per D-012) +- [ ] Snapshot 2 (post-tenant) taken (per D-012) + +## TODO + +- [ ] Run `scripts/validate.sh` and capture output +- [ ] Document any divergences from validation criteria in + `docs/design-decisions.md` change log diff --git a/scripts/pre-flight-checks.sh b/scripts/pre-flight-checks.sh new file mode 100644 index 0000000..84eefdc --- /dev/null +++ b/scripts/pre-flight-checks.sh @@ -0,0 +1,51 @@ +#!/usr/bin/env bash +# scripts/pre-flight-checks.sh +# +# STATUS: PLACEHOLDER — drafted alongside deploy runbook. +# +# Pre-deploy sanity check. Read-only; no state changes. Run before +# `juju deploy` to surface issues that would cause the deploy to fail +# during settle.
+# +# Exit codes: +# 0 all checks pass +# 1 fatal — do not deploy +# 2 warning — review then decide + +set -euo pipefail + +# Strict mode hardening +shopt -s inherit_errexit 2>/dev/null || true +IFS=$'\n\t' + +FATAL=0 +WARN=0 + +fail() { echo "FAIL: $*" >&2; FATAL=$((FATAL+1)); } +warn() { echo "WARN: $*" >&2; WARN=$((WARN+1)); } +pass() { echo "PASS: $*"; } +note() { echo "NOTE: $*"; } + +# TODO during drafting: +# - Juju controller reachable +# - MAAS API reachable; machines in expected state +# - NetBox reachable; VR0 DC0 prefixes/VLANs present (use --verify-only on imports) +# - jumphost /etc/hosts contains all expected API VIP hostnames +# - All KVM VMs (openstack0-3) reachable and Ready in MAAS +# - capi-mgmt.maas k3s healthy +# - Vault unseal keys present and readable +# - Disk space on /var/lib/libvirt/images sufficient for snapshots +# - bundle.yaml parses as valid YAML +# - overlay parses as valid YAML +# - Channel pins in bundle resolvable on Charmhub + +note "Placeholder pre-flight script — not yet implemented." + +echo +echo "Summary: ${FATAL} fatal, ${WARN} warning" +if [[ $FATAL -gt 0 ]]; then + exit 1 +elif [[ $WARN -gt 0 ]]; then + exit 2 +fi +exit 0 diff --git a/scripts/validate.sh b/scripts/validate.sh new file mode 100644 index 0000000..91bfb95 --- /dev/null +++ b/scripts/validate.sh @@ -0,0 +1,35 @@ +#!/usr/bin/env bash +# scripts/validate.sh +# +# STATUS: PLACEHOLDER — drafted post-deploy. +# +# Roosevelt-rehearsal validation runner per D-011. Executes the validation +# criteria sequentially and produces a structured report. 
+ +set -euo pipefail +shopt -s inherit_errexit 2>/dev/null || true +IFS=$'\n\t' + +FAIL=0 +PASS=0 +SKIP=0 + +result_fail() { echo "FAIL: $*" >&2; FAIL=$((FAIL+1)); } +result_pass() { echo "PASS: $*"; PASS=$((PASS+1)); } +result_skip() { echo "SKIP: $*"; SKIP=$((SKIP+1)); } + +# TODO during drafting: +# - all charms active/idle assertion (juju status --format=json | jq) +# - public API VIP reachability from jumphost (per service hostname) +# - public API VIP reachability from a test tenant VM (Option B verify) +# - Octavia LB pattern test (create -> two members -> round-robin -> failover -> recovery) +# - Magnum CAPI cluster create end-to-end +# - Vault unseal/reseal pattern +# - Designate resolves API hostnames from tenant VM +# - Snapshot 1 + Snapshot 2 existence verified + +echo "Placeholder validate.sh — not yet implemented." + +echo +echo "Summary: ${PASS} pass, ${FAIL} fail, ${SKIP} skip" +[[ $FAIL -gt 0 ]] && exit 1 || exit 0 diff --git a/setup-gitbucket-repo.sh b/setup-gitbucket-repo.sh new file mode 100644 index 0000000..35a6136 --- /dev/null +++ b/setup-gitbucket-repo.sh @@ -0,0 +1,304 @@ +#!/usr/bin/env bash +# setup-gitbucket-repo.sh +# +# Initialize this repository locally and push it to a self-hosted GitBucket +# instance at git.baldurkeep.com (or any GitBucket-compatible host). +# +# Usage: +# ./setup-gitbucket-repo.sh # interactive prompts +# ./setup-gitbucket-repo.sh --dry-run # show what would happen +# +# Environment overrides (skip the prompts when set): +# GITBUCKET_HOST e.g. git.baldurkeep.com +# GITBUCKET_USER GitBucket username +# GITBUCKET_OWNER Repo owner (user or group). Defaults to GITBUCKET_USER. +# GITBUCKET_REPO Repo name. Default: openstack-caracal-ipv4 +# GITBUCKET_TOKEN API token for creating the repo (if it does not exist yet) +# GIT_USER_NAME Local git author name (e.g. "Jesse Austin") +# GIT_USER_EMAIL Local git author email (e.g. 
jesse.austin@neumatrix.com)
+# GIT_REMOTE_PROTO ssh|https — default https
+# GIT_BRANCH default branch name — default main
+#
+# Idempotency:
+# - Detects existing .git directory and skips `git init`
+# - Detects existing remote 'origin' and adjusts URL if it differs (with confirmation)
+# - Will not push if there are no commits (nothing to push)
+#
+# What this script does NOT do:
+# - Store credentials. You will be prompted by git/SSH for auth at push time.
+# - Create groups/organizations on GitBucket — the owner must exist already.
+# - Force-push or rewrite history.
+
+set -euo pipefail
+shopt -s inherit_errexit 2>/dev/null || true
+IFS=$'\n\t'
+
+# ----- Helpers --------------------------------------------------------------
+
+err() { printf '\033[1;31mERROR\033[0m %s\n' "$*" >&2; }
+warn() { printf '\033[1;33mWARN\033[0m %s\n' "$*" >&2; }
+info() { printf '\033[1;36mINFO\033[0m %s\n' "$*"; }
+ok() { printf '\033[1;32mOK\033[0m %s\n' "$*"; }
+
+die() { err "$*"; exit 1; }
+
+prompt() {
+  # prompt VAR "Question text" "default"
+  local __var=$1 __q=$2 __default=${3:-}
+  local __reply
+  if [[ -n "${!__var:-}" ]]; then
+    # already set via env; skip
+    return 0
+  fi
+  if [[ -n "$__default" ]]; then
+    read -r -p "$__q [$__default]: " __reply || true
+    __reply=${__reply:-$__default}
+  else
+    read -r -p "$__q: " __reply || true
+  fi
+  if [[ -z "$__reply" ]]; then
+    die "Empty response for $__var"
+  fi
+  printf -v "$__var" '%s' "$__reply"
+}
+
+prompt_secret() {
+  local __var=$1 __q=$2
+  local __reply
+  if [[ -n "${!__var:-}" ]]; then
+    return 0
+  fi
+  read -r -s -p "$__q: " __reply || true
+  echo
+  printf -v "$__var" '%s' "$__reply"
+}
+
+confirm() {
+  local __q=$1
+  local __reply
+  read -r -p "$__q [y/N]: " __reply || true
+  [[ "${__reply,,}" == "y" || "${__reply,,}" == "yes" ]]
+}
+
+# ----- Argument parsing -----------------------------------------------------
+
+DRY_RUN=0
+for arg in "$@"; do
+  case "$arg" in
+    --dry-run) DRY_RUN=1 ;;
+    -h|--help)
+      sed -n '1,40p' "$0" | grep -E '^#'
+      exit 0
+      ;;
+    *) die "Unknown argument: $arg" ;;
+  esac
+done
+
+run() {
+  # Wrapper that echoes and (in dry-run) skips execution.
+  # Subshell with IFS=' ' so $* joins args with a space for display only.
+  (IFS=' '; printf '\033[1;90m+ %s\033[0m\n' "$*")
+  if [[ "$DRY_RUN" -eq 0 ]]; then
+    "$@"
+  fi
+}
+
+# ----- Repo root sanity -----------------------------------------------------
+
+SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &> /dev/null && pwd)"
+cd "$SCRIPT_DIR"
+
+# Sanity check that we're in the expected repo structure.
+for required in README.md bundle.yaml runbooks docs netbox; do
+  [[ -e "$required" ]] || die "Required path '$required' missing — are you in the repo root?"
+done
+
+ok "Running from repo root: $SCRIPT_DIR"
+
+# ----- Tool checks ----------------------------------------------------------
+
+command -v git >/dev/null || die "git not installed"
+command -v curl >/dev/null || warn "curl not installed — repo auto-create on GitBucket disabled"
+
+# ----- Gather configuration -------------------------------------------------
+
+prompt GITBUCKET_HOST "GitBucket host" "git.baldurkeep.com"
+prompt GITBUCKET_USER "GitBucket username"
+prompt GITBUCKET_OWNER "Repo owner (user or group)" "$GITBUCKET_USER"
+prompt GITBUCKET_REPO "Repo name" "openstack-caracal-ipv4"
+prompt GIT_USER_NAME "Git author name"
+prompt GIT_USER_EMAIL "Git author email"
+prompt GIT_REMOTE_PROTO "Remote protocol (ssh|https)" "https"
+prompt GIT_BRANCH "Default branch" "main"
+
+case "$GIT_REMOTE_PROTO" in
+  ssh)
+    REMOTE_URL="git@${GITBUCKET_HOST}:${GITBUCKET_OWNER}/${GITBUCKET_REPO}.git"
+    ;;
+  https)
+    REMOTE_URL="https://${GITBUCKET_HOST}/git/${GITBUCKET_OWNER}/${GITBUCKET_REPO}.git"
+    ;;
+  *)
+    die "GIT_REMOTE_PROTO must be 'ssh' or 'https' (got: $GIT_REMOTE_PROTO)"
+    ;;
+esac
+
+cat <<EOF
+Remote URL : $REMOTE_URL
+Default branch : $GIT_BRANCH
+Dry-run : $([[ $DRY_RUN -eq 1 ]] && echo YES || echo no)
+----------------------------------------------------------------------------
+EOF
+
+confirm "Proceed?" || die "Aborted by user"
+
+# ----- Create repo on GitBucket via API (if token provided) -----------------
+
+GITBUCKET_API_BASE="https://${GITBUCKET_HOST}/api/v3"
+REPO_API_URL="${GITBUCKET_API_BASE}/repos/${GITBUCKET_OWNER}/${GITBUCKET_REPO}"
+
+create_repo_via_api() {
+  if [[ -z "${GITBUCKET_TOKEN:-}" ]]; then
+    info "GITBUCKET_TOKEN not set — skipping API repo creation"
+    info "If the repo does not exist on GitBucket, create it manually now:"
+    info " https://${GITBUCKET_HOST}/${GITBUCKET_OWNER}"
+    info " → New repository → name: ${GITBUCKET_REPO}, do NOT initialize with README"
+    return 0
+  fi
+
+  if ! command -v curl >/dev/null; then
+    warn "curl missing; cannot call API"
+    return 0
+  fi
+
+  info "Checking if repo already exists on GitBucket..."
+  local http_code
+  http_code=$(curl -sS -o /dev/null -w '%{http_code}' \
+    -H "Authorization: token ${GITBUCKET_TOKEN}" \
+    "$REPO_API_URL" || true)
+
+  case "$http_code" in
+    200)
+      ok "Repo ${GITBUCKET_OWNER}/${GITBUCKET_REPO} already exists on GitBucket"
+      return 0
+      ;;
+    404)
+      info "Repo does not exist — creating via API"
+      ;;
+    *)
+      warn "Unexpected API response code: $http_code (continuing)"
+      return 0
+      ;;
+  esac
+
+  local create_url
+  if [[ "$GITBUCKET_OWNER" == "$GITBUCKET_USER" ]]; then
+    # User-owned repo
+    create_url="${GITBUCKET_API_BASE}/user/repos"
+  else
+    # Group-owned repo
+    create_url="${GITBUCKET_API_BASE}/orgs/${GITBUCKET_OWNER}/repos"
+  fi
+
+  local payload
+  payload=$(printf '{"name":"%s","description":"%s","private":true,"auto_init":false}' \
+    "$GITBUCKET_REPO" \
+    "Charmed OpenStack Caracal 2024.1 — IPv4-only testcloud deployment (VR0 DC0 v1). IPv6/dual-stack tracked separately as v2.")
+
+  if [[ "$DRY_RUN" -eq 1 ]]; then
+    info "[dry-run] would POST to $create_url with payload: $payload"
+    return 0
+  fi
+
+  http_code=$(curl -sS -o /tmp/gitbucket-create.json -w '%{http_code}' \
+    -X POST \
+    -H "Authorization: token ${GITBUCKET_TOKEN}" \
+    -H "Content-Type: application/json" \
+    -d "$payload" \
+    "$create_url" || true)
+
+  case "$http_code" in
+    200|201)
+      ok "Repo created: ${GITBUCKET_OWNER}/${GITBUCKET_REPO}"
+      ;;
+    *)
+      err "Repo creation failed (HTTP $http_code). Response:"
+      cat /tmp/gitbucket-create.json >&2 || true
+      die "Aborting before git operations"
+      ;;
+  esac
+}
+
+create_repo_via_api
+
+# ----- Git init -------------------------------------------------------------
+
+if [[ -d .git ]]; then
+  info ".git exists — skipping git init"
+else
+  run git init -b "$GIT_BRANCH"
+fi
+
+run git config user.name "$GIT_USER_NAME"
+run git config user.email "$GIT_USER_EMAIL"
+
+# ----- Remote setup ---------------------------------------------------------
+
+if git remote get-url origin >/dev/null 2>&1; then
+  EXISTING_URL=$(git remote get-url origin)
+  if [[ "$EXISTING_URL" != "$REMOTE_URL" ]]; then
+    warn "Existing 'origin' remote URL: $EXISTING_URL"
+    warn "Desired URL: $REMOTE_URL"
+    if confirm "Update remote URL?"; then
+      run git remote set-url origin "$REMOTE_URL"
+    fi
+  else
+    ok "Remote 'origin' already set correctly"
+  fi
+else
+  run git remote add origin "$REMOTE_URL"
+fi
+
+# ----- Stage + commit -------------------------------------------------------
+
+run git add .
+
+if [[ "$DRY_RUN" -eq 0 ]]; then
+  if git diff --staged --quiet; then
+    info "No staged changes — nothing to commit"
+  else
+    if [[ -z "$(git log --oneline -1 2>/dev/null || true)" ]]; then
+      MSG="Initial commit — VR0 DC0 Omega Cloud Caracal repo scaffolding"
+    else
+      MSG="Update repo content"
+    fi
+    run git commit -m "$MSG"
+  fi
+else
+  info "[dry-run] would commit (skipping)"
+fi
+
+# ----- Push ----------------------------------------------------------------
+
+if [[ "$DRY_RUN" -eq 1 ]]; then
+  info "[dry-run] would push to origin/$GIT_BRANCH"
+  exit 0
+fi
+
+if [[ -z "$(git log --oneline -1 2>/dev/null || true)" ]]; then
+  info "No commits to push"
+  exit 0
+fi
+
+info "Pushing to origin/$GIT_BRANCH (you may be prompted for credentials)..."
+run git push -u origin "$GIT_BRANCH"
+
+ok "Push complete"
+ok "Repository ready at: https://${GITBUCKET_HOST}/${GITBUCKET_OWNER}/${GITBUCKET_REPO}"