name: openstack-cloud-ops
Operating skill for the Omega Cloud: a commercial, multi-tenant, tenant self-administered OpenStack cloud. Current phase: single-DC virtual rehearsal ("testcloud", VR0 DC0) on four KVM hosts, rehearsing a future bare-metal multi-datacenter deployment ("Roosevelt"). The governing design constraint is MINIMIZE DELTA TO ROOSEVELT: the runbooks and scripts are primary deliverables alongside the running cloud, so transferable answers beat quick fixes.
The repository openstack-caracal-ipv4 (GitBucket, git.baldurkeep.com) is authoritative for everything: bundle, runbooks, scripts, design decisions, as-built values. This skill is a discipline-and-routing layer OVER that repo, not a substitute for it.
Locating the repo: look for a local checkout first (~/openstack-caracal-ipv4, a repo dir in the working tree, /home/claude/repo). If found, run git log -1 to note HEAD and work from it. If not found, clone it (https://git.baldurkeep.com/git/OpenStack/openstack-caracal-ipv4.git). The repo may be private; if the clone fails, ask the operator to provide access or the relevant files.

Divergence rule: if this skill and repo HEAD disagree, the repo wins - but FLAG the divergence to the operator rather than silently following either. The repo is a living draft; this skill's invariants (discipline, hardening) change slowly, while its facts (IPs, versions, phase status) go stale fast.
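The repo lookup order can be sketched as a small shell helper. This is a minimal sketch, not part of the repo: the function name find_repo is hypothetical, and it treats any directory with a .git entry as a checkout.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): print the first candidate directory
# that contains a .git entry, or return non-zero so the caller can fall back
# to cloning or to asking the operator.
find_repo() {
  local dir
  for dir in "$@"; do
    if [ -e "$dir/.git" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Usage, with the candidate paths named in the text:
#   if repo=$(find_repo "$HOME/openstack-caracal-ipv4" /home/claude/repo); then
#     git -C "$repo" log -1     # note HEAD, then work from it
#   else
#     git clone https://git.baldurkeep.com/git/OpenStack/openstack-caracal-ipv4.git \
#       || echo "clone failed: ask the operator for access or the relevant files" >&2
#   fi
```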
With a live shell (vopenstack-jesse or similar): you may RUN read-only audits directly. Every mutation remains individually human-gated - present the command, state what it changes, wait for approval. A live shell relaxes the transport, never the discipline.

Read references/operating-discipline.md before doing either.
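For an operator working interactively, the present / state / wait pattern might look like the wrapper below. It is only an illustration of the gate: the name gated and its argument order are hypothetical, and references/operating-discipline.md remains the authority on how approval is actually requested and recorded.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not in the repo): show one command and its stated
# effect, then run it only if the operator types exactly "yes".
# Usage: gated "<what it changes>" <command> [args...]
gated() {
  local effect=$1
  shift
  printf 'COMMAND: %s\n' "$*" >&2
  printf 'CHANGES: %s\n' "$effect" >&2
  printf 'Type "yes" to approve: ' >&2
  local answer=""
  read -r answer || true          # EOF counts as "not approved"
  if [ "$answer" = "yes" ]; then
    "$@"
  else
    printf '\nNot approved; nothing was run.\n' >&2
    return 1
  fi
}
```

Each call gates exactly one command, which keeps mutations from being approved in a batch.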
scripts/lib-net.sh, lib-hosts.sh), keyed by stable identity (CIDR, hostname) - never by drifting IDs.

Corollary that governs everything: verify before mutate. A read-only audit precedes every mutation; destructive and secret-handling steps are gated individually, never batched.
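Keying by stable identity means resolving the ID at the moment of use instead of storing it. A minimal sketch, assuming a listing of "ID CIDR" pairs on stdin; the helper name id_for_cidr is hypothetical and is not the lib-net.sh interface, and the CIDRs shown are documentation ranges, not values from this cloud.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): read "ID CIDR" pairs on stdin and
# print the ID whose CIDR equals $1. Fails when there is no match or more
# than one, so an ambiguous lookup never yields an ID.
id_for_cidr() {
  awk -v want="$1" '
    $2 == want { id = $1; n++ }
    END {
      if (n == 1) { print id; exit 0 }
      exit 1
    }'
}

# Usage (read-only; assumes the listing's CIDR column is named "Subnet"):
#   openstack subnet list -f value -c ID -c Subnet | id_for_cidr 192.0.2.0/24
```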
| Task | Read first |
|---|---|
| Any command block, script, or paste block you are about to write | references/script-authoring.md |
| Deploy / redeploy / teardown | repo runbooks/README.md, then the phase-NN runbook; conventions in references/operating-discipline.md |
| Something is broken (triage, incidents) | references/troubleshooting.md, then repo runbooks/appendix-A-troubleshooting.md |
| CAPI / Magnum / mgmt-VM recovery | repo runbooks/ops-capi-recovery.md |
| Deliver ANY repo change (script, runbook, doc) | run bash scripts/repo-lint.sh + the touched script's tests/<name>/run-tests.sh BEFORE handing it over |
| Pre-deploy gate (before add-model) | bash scripts/preflight.sh -- THE single entry; do not run the sub-gates piecemeal |
| Is the cloud actually healthy? (post-deploy, post-restart, pre-change baseline, incident) | bash scripts/cloud-assert.sh (add --capture at deploy completion for the committed BOM) |
| Full-cloud restart after outage/maintenance | repo runbooks/ops-restart-procedure.md |
| Starting any consequential live session | bash scripts/run-logged.sh <label> first (as-executed log; docs/as-executed-log-convention.md) |
| Credential exposures / security TODOs | repo docs/security-ledger.md -- add a row, never only a script comment |
| Tenant onboarding / tenant self-service | repo scripts/tenant-onboard.sh + runbooks/tenant-onboarding-v2-DRAFT.md + appendix-C/D |
| Network / plane / IPAM questions | references/environment.md, repo scripts/lib-net.sh, NetBox (the IPAM apex) |
| ANY change request to a built surface | grep repo docs/design-decisions.md for the governing D-NNN FIRST - PROPOSED/OPEN means the operator has not ruled: present options, do not implement |
| Why is it built this way? / proposing changes | repo docs/design-decisions.md (D-NNN); grep before assigning a new number |
| Versions, channels, pins | repo runbooks/appendix-B-asbuilt-version-lock.md |
| Environment facts (hosts, repo, planes) | references/environment.md |
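
The "grep for the governing D-NNN first" rows can be sketched as a check that refuses to call a decision ruled when its entry is marked PROPOSED or OPEN. The layout of docs/design-decisions.md is not shown here, so this assumes the ID and its status share one line; the function name decision_gate and its output words are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repo). Assumes the first line mentioning
# the ID also carries its status. A stray "OPEN" elsewhere on that line is
# read as "not ruled", so the heuristic errs toward presenting options.
# Usage: decision_gate D-NNN docs/design-decisions.md
decision_gate() {
  local id=$1 file=$2 line
  # The trailing group stops a short ID from matching a longer one.
  line=$(grep -E -- "${id}([^0-9]|\$)" "$file" | head -n 1)
  if [ -z "$line" ]; then
    echo "UNKNOWN"
    return 2
  fi
  case $line in
    *PROPOSED*|*OPEN*) echo "NOT-RULED"; return 1 ;;
    *) echo "RULED"; return 0 ;;
  esac
}
```

A NOT-RULED or UNKNOWN result means stop and present options to the operator; only RULED permits implementation.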
Session bootstrap (jumphost): git -C ~/openstack-caracal-ipv4 pull -> bash scripts/repo-lint.sh (0 fail expected) -> if touching the live cloud, bash scripts/run-logged.sh <label> to open the logged shell. Repo HEAD and a clean lint are the preconditions for everything else.
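The bootstrap order can be written as a fail-fast function. This is a sketch under two assumptions: the name session_bootstrap and the RUN dry-run variable are hypothetical, and repo-lint.sh is assumed to exit non-zero when any check fails.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not in the repo). Set RUN=echo to print the steps
# instead of executing them.
# Usage: session_bootstrap <repo-dir> [label]   (label only for live sessions)
session_bootstrap() {
  local repo=$1 label=${2:-}
  ${RUN:-} git -C "$repo" pull || return 1
  ( cd "$repo" && ${RUN:-} bash scripts/repo-lint.sh ) || {
    echo "repo-lint is not clean; stop before doing anything else" >&2
    return 1
  }
  if [ -n "$label" ]; then
    # Touching the live cloud: open the logged shell.
    ( cd "$repo" && ${RUN:-} bash scripts/run-logged.sh "$label" )
  fi
}
```

Running it with RUN=echo first shows exactly what a real invocation would execute.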
Change-delivery loop: grep for prior art (zeroth decision) -> grep design-decisions for the governing D-NNN -> edit -> bash scripts/repo-lint.sh -> run the touched script's harness (create one if missing -- no script change ships without its harness) -> deliver as repo-relative ZIP + a changelog entry with a per-item revert. Under blanket approval, the changelog IS the review surface: every item states what, why (evidence), and how to revert.
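The lint-then-harness part of this loop can be sketched as one pre-delivery check. The function name predelivery_check is hypothetical, and mapping scripts/foo.sh to tests/foo/run-tests.sh is an assumption about how tests/<name> is derived.

```shell
#!/usr/bin/env bash
# Hypothetical check (not in the repo): lint the repo, then run the touched
# script's harness, refusing to pass when the harness does not exist.
# Usage: predelivery_check <repo-dir> scripts/<name>.sh
predelivery_check() {
  local repo=$1 script=$2 name harness
  name=$(basename "$script" .sh)
  harness="tests/$name/run-tests.sh"   # assumed naming convention
  (
    cd "$repo" || exit 1
    bash scripts/repo-lint.sh || { echo "FAIL: repo-lint" >&2; exit 1; }
    [ -f "$harness" ] || { echo "FAIL: no harness at $harness; create one first" >&2; exit 1; }
    bash "$harness" || { echo "FAIL: $harness" >&2; exit 1; }
    echo "OK: $script"
  )
}
```

The subshell keeps the caller's working directory unchanged, and its exit status becomes the function's return value.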
Deploy loop: phase-00 runbook (D-061 destroy path) -> bash scripts/preflight.sh PASS -> phase-01..08 gated -> bash scripts/cloud-assert.sh --capture -> commit the asbuilt/ BOM.
Incident loop: capture the verbatim error -> bash scripts/cloud-assert.sh (the service-own-verdict sweep localizes the layer) -> appendix-A by exact message -> recorded fix, gated -> log the finding (new root causes become appendix-A/DOCFIX material).
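Looking up appendix-A "by exact message" is best done as a fixed-string search, so that brackets, dots, and quotes in the error are not interpreted as a pattern. A minimal sketch; the function name lookup_error is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): find a verbatim error message in a
# runbook. -F disables regex interpretation; -n prints line numbers.
# Usage: lookup_error '<verbatim message>' runbooks/appendix-A-troubleshooting.md
lookup_error() {
  grep -n -F -- "$1" "$2"
}
```

No match (exit status 1) suggests a new root cause, which the loop above treats as appendix-A/DOCFIX material.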