# Change log -- 2026-07-02/03 redeploy-readiness + process-hardening patchset

CUMULATIVE: this covers BOTH work blocks (the DOCFIX-066..072 sweep patches and
the process-improvement build). One ZIP, extract over repo root. Every item
lists its revert. Executed under blanket approval; nothing here touched live
infrastructure -- all changes are repo content. Validation state at packaging:
merged-tree repo lint PASS (1 documented WARN), bundle gate PASS, four test
harnesses 33/33 cases, all Python files compile. Identifier numbering verified
next-free at HEAD 690779a; RE-GREP at commit (per discipline).

================================================================================
## Block 1 -- redeploy-readiness fixes (DOCFIX-066..072)
================================================================================

### 1. DOCFIX-066 -- teardown runbook rebuilt around the D-061 fork
FILE: runbooks/phase-00-teardown-maas-reset.md (REWRITE)
WHY: the old spine drove the DEPRECATED phase-00-teardown.sh, whose "releases
  to Ready" premise decomposed machines 3x. New spine: destroy path as the
  validated redeploy route (with the previously MISSING reenroll step), release
  path documented with the honest re-acquisition caveat, all invocations
  bash-prefixed.
REVERT: git checkout HEAD~ -- runbooks/phase-00-teardown-maas-reset.md
  (reverting re-arms the decompose trap; recommend not).

### 2. DOCFIX-067 -- octavia PKI SAN derived, not baked
FILE: runbooks/phase-01-bundle-deploy.md (CNF block + one prose line)
WHY: IP.1 was the pre-R14 literal 10.12.4.233; current VIP is .57. Now derived
  from the bundle at generation time, echoed, regex-gated. VERIFY-LIVE still
  queued: read the DEPLOYED cert SAN before deciding whether any live action
  is ever needed (none recommended -- regenerates next redeploy).
REVERT: restore the literal block from git history.
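
The derive / echo / gate sequence can be sketched roughly as follows. This is
an illustration, not the runbook block: the `vip:` key layout, the sample
address (192.0.2.57, a documentation address) and the helper name `is_ipv4` are
assumptions, and the pattern checks dotted-quad shape only, not octet range.

```bash
# Sketch only: bundle layout, sample address and helper name are assumptions.

# Regex gate: succeed only if $1 looks like a dotted-quad IPv4 address.
# grep reads a here-document, so no pipe is involved.
is_ipv4() {
  grep -Eqx '([0-9]{1,3}\.){3}[0-9]{1,3}' <<EOF
$1
EOF
}

# Derive: take the first "vip:" value from a bundle-like YAML fragment.
# In a real run the input would be the bundle file, not inline sample text.
san_ip=$(awk '$1 == "vip:" { print $2; exit }' <<'EOF'
applications:
  octavia:
    options:
      vip: 192.0.2.57
EOF
)

# Echo, then gate, before the value is written into any cert config.
echo "derived IP.1 = ${san_ip}"
if is_ipv4 "$san_ip"; then
  gate=PASS
else
  gate=FAIL
fi
echo "SAN gate: ${gate}"
```

A FAIL here would be the point to stop before generating anything; a stale
literal cannot pass unnoticed because no literal is carried.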

### 3. DOCFIX-068 -- phase-01 constants de-staled
FILE: runbooks/phase-01-bundle-deploy.md ("Constants and env-literals")
WHY: carried the pre-D-052 plane map (incl. retired lbaas), hardcoded MAAS
  subnet ids (violates PATTERN-1) and system_ids (violates DOCFIX-040). Now
  points at lib-net.sh / lib-hosts.sh; only verified-current literals remain.
REVERT: git history (not recommended; values were wrong).

### 4. DOCFIX-069 -- exec-bit reality encoded
FILES: phase-00 runbook (folded into #1); apply-notes optional
  `git update-index --chmod=+x scripts/*.sh`.
WHY: all 37 scripts are mode 100644 (Windows commit path); bare invocation
  fails on a fresh clone. bash-prefix is the durable invariant; repo-lint L6
  now guards regressions.
REVERT: n/a (wording only) -- to reject the rule, remove lint check L6.

### 5. DOCFIX-070 -- one bundle gate instead of two disagreeing ones
FILES: scripts/provider-bundle-check.py (checks 4-7 added);
  scripts/review-bundle.py (git rm -- manual action, see apply notes);
  tests/provider-bundle-check/ (NEW: 8-case harness).
WHY: review-bundle.py was pre-D-052; FAIL=71 pure noise against HEAD (alarm
  fatigue). Its still-valid checks (relation syntax/existence) are absorbed;
  added: D-062 unit count, VIP-octet uniqueness, DOCFIX-071 policy wiring +
  zip/source content drift guard.
REVERT: git checkout the old scripts/provider-bundle-check.py; restore
  scripts/review-bundle.py from history. (Harness T-cases will then fail --
  delete tests/provider-bundle-check/ too.)

### 6. DOCFIX-071 -- keystone policy ships IN the bundle
FILES: bundle.yaml (keystone resources: stanza, +6 lines);
  policies/overrides.zip (NEW, binary; built from committed source);
  .gitattributes (+ *.zip binary); runbooks/phase-03-core-verify.md
  (NEW Step 3.4: PO: gate + behavioral G3 gate);
  runbooks/appendix-C-identity-rbac.md (attach block: subshell-wrapped,
  /tmp -> repo path [snap trap], reframed as live-update path).
WHY: the D-064 attach was unreachable from the schedule -- a clean redeploy
  shipped WITHOUT the SCS Domain Manager RBAC; the only written procedure used
  the snap-blocked /tmp form.
REVERT: remove the 6 bundle lines + the zip; restore appendix-C block from
  history. (Reverting re-opens the blocker; the phase-03 gate would then FAIL
  by design -- that is the guard working.)

### 7. DOCFIX-072 / D-043 -- decision brought level with the bundle
FILE: docs/design-decisions.md (D-043 RESOLVED: option (a) ADOPTED).
WHY: bundle set resume-guests-state-on-host-boot=True while D-043 was still
  PROPOSED. Adopted: auto-resume is the tenant-VM norm; D-041 unchanged for
  control-plane. JUDGMENT CALL FLAGGED EARLIER: capi-mgmt-v2 WILL auto-resume;
  manual-start policy now governs deliberate stops only. If you want a real
  exclusion, that needs a mechanism -- say so and I will draft options.
REVERT: append a reversal entry (append-only doc) + remove the bundle option.

### 8. D-doc alignment amendments (appended, append-only discipline)
FILE: docs/design-decisions.md -- D-002 (channel matrix reconciled: etcd/easyrsa
  dead, memcached latest-only, vault 1.16 per D-068), D-009 (rabbitmq
  min-cluster-size Roosevelt delta), D-011 (item 6 = manual unseal, matching
  phase-08), D-051 (bundle-native delivery), D-061 (validation scope:
  survival vs re-acquisition; destroy = validated spine).
REVERT: each is a discrete appended section -- delete the section.

================================================================================
## Block 2 -- process improvements (DOCFIX-073..078, D-069, D-070)
================================================================================

### 9. DOCFIX-073 -- preflight.sh: THE single pre-deploy gate
FILES: scripts/preflight.sh (NEW), scripts/channel_assert.py (NEW),
  tests/preflight/ (NEW, 7 cases), runbooks/phase-01-bundle-deploy.md
  (prerequisites GATE added -- pre-flight was previously invoked by NO runbook).
WHY: the gate surface was six artifacts remembered independently. preflight.sh
  orchestrates repo-lint -> bundle gate -> channel assert -> live pre-flight
  with worst-exit aggregation and stage-2 reminders. channel_assert fulfils
  D-002's "verify against Charmhub each deploy" claim (previously
  unimplemented): every pinned channel must exist on Charmhub;
  typo/retired-track = FAIL; offline = WARN.
REVERT: delete the three new paths + the phase-01 GATE paragraph.
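
Worst-exit aggregation can be sketched as below. The stage commands are stubs,
`run_stage` is a hypothetical helper, and reading "worst" as "highest exit
code" is an assumption about the convention, not a quote from preflight.sh.

```bash
# Sketch only: stage commands are stubs; "worst" is assumed to mean the
# highest exit code seen.
worst=0

run_stage() {
  stage=$1
  shift
  rc=0
  "$@" || rc=$?          # record the failure instead of aborting the sweep
  echo "stage ${stage}: exit ${rc}"
  if [ "$rc" -gt "$worst" ]; then
    worst=$rc
  fi
}

# Stubs standing in for the four stages, in the order the entry lists them.
run_stage repo-lint       true
run_stage bundle-gate     sh -c 'exit 1'
run_stage channel-assert  true
run_stage live-preflight  sh -c 'exit 2'

echo "preflight result: exit ${worst}"
```

In this sketch every stage runs even after an earlier one fails, so one pass
reports all failures and the final code reflects the most severe of them.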

### 10. DOCFIX-074 -- repo-lint: the drift sweep, productized
FILES: scripts/repo_lint.py + scripts/repo-lint.sh (NEW),
  tests/repo-lint/ (NEW, 9 cases). Plus SANITIZE remediation of every ASCII-rule
  violation it found: docs/v1-pre-deploy-fixes.md, docs/netbox-vip-queue.md,
  .gitignore, netbox/README.md, netbox/ipv4-prefixes-import.py,
  netbox/ipv6-mark-reserved.py (punctuation transliteration only; python files
  re-compiled OK). Plus two catches it made against MY OWN morning patch and
  bundle HEAD, both fixed: a missed .233 prose line in phase-01; a stale
  storage/replication CIDR comment in bundle.yaml.
CHECKS: L1 encoding, L2 stale tokens (with explicit per-file opt-out marker for
  guard scripts), L3 ghost script refs, L4 deprecated refs, L5 numbering
  collisions + next-free report, L6 bare invocations.
REVERT: delete scripts + tests; restore pre-sanitize files from history.

### 11. DOCFIX-075 -- cloud-assert.sh: the behavioral verifier + BOM capture
FILES: scripts/cloud-assert.sh (NEW), tests/cloud-assert/ (NEW, 9 fakebin
  cases), runbooks/ops-restart-procedure.md (NEW -- the restart procedure,
  previously operator-local, adapted to current references with [REVALIDATE]
  markers and committed).
WHY: the D-045/D-046/D-051/D-042 family = "juju green, service broken". One
  idempotent read-only sweep of every service-own-verdict gate (vault seal,
  mysql 1xR/W, OVN unity, chassis, hypervisors, LBs, PO:, trustee domain, coe
  403, conductor LIVE args). Missing admin scope = HELD exit 2, never a silent
  pass. --capture writes a committed asbuilt/<date>/ BOM (Roosevelt drift
  baseline). Replaces the never-committed post-maintenance-health-check.sh.
REVERT: delete the three paths.
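
The "HELD, never a silent pass" rule amounts to a three-way verdict. A minimal
sketch, assuming a hypothetical `gate_verdict` helper and FAIL = exit 1 (the
entry itself states only that missing admin scope is HELD with exit 2):

```bash
# Sketch only: function name, arguments and the FAIL=1 code are assumptions.
gate_verdict() {
  # usage: gate_verdict <have_admin_scope: yes|no> <observed> <expected>
  if [ "$1" != "yes" ]; then
    echo "HELD"          # could not look, so no claim is made either way
    return 2
  fi
  if [ "$2" = "$3" ]; then
    echo "PASS"
    return 0
  fi
  echo "FAIL"
  return 1
}

held_rc=0
held_out=$(gate_verdict no "" "unsealed") || held_rc=$?
pass_rc=0
pass_out=$(gate_verdict yes "unsealed" "unsealed") || pass_rc=$?
fail_rc=0
fail_out=$(gate_verdict yes "sealed" "unsealed") || fail_rc=$?

echo "no scope:  ${held_out} (exit ${held_rc})"
echo "matching:  ${pass_out} (exit ${pass_rc})"
echo "mismatch:  ${fail_out} (exit ${fail_rc})"
```

The point of the third state is that "could not check" never collapses into
exit 0.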

### 12. DOCFIX-076 -- as-executed log convention
FILES: scripts/run-logged.sh (NEW), docs/as-executed-log-convention.md (NEW),
  logs/as-executed-index.md (NEW, committed index; content stays jumphost-only).
WHY: the verbatim-retrieval rule for one-shot steps depended on an artifact
  with no defined location/format (DOCFIX-006 is the cautionary tale).
REVERT: delete the three paths.

### 13. DOCFIX-077 -- appendix-A completions
FILE: runbooks/appendix-A-troubleshooting.md (two appended sections):
  mysql-innodb-cluster recovery (D-062 signatures + the destructive-action
  guard) and the point-of-use identifier index (DOCFIX-027/028/029/034/037,
  BUNDLEFIX-001..006) closing the dangling-reference gap.
REVERT: delete the two appended sections.

### 14. DOCFIX-078 -- security exposure ledger
FILE: docs/security-ledger.md (NEW). Seeded: SEC-001 libvirt credential
  exposure (was living only in a script header), SEC-002 juju action-log
  token rule, SEC-003 vault custody, SEC-004 repo-public flag.
REVERT: delete the file.

### 15. D-069 -- vault unseal-key custody (ADOPTED, policy)
FILE: docs/design-decisions.md (appended).
WHAT: split custody (no individual holds threshold), second-person unseal
  rehearsal as an acceptance item, custody re-cut review at every re-init.
  Custodian ASSIGNMENT deliberately left as operator input (SEC-003).
REVERT: append a reversal entry.

### 16. D-070 -- supersedes D-012 (no KVM snapshot restore path)
FILE: docs/design-decisions.md (appended).
WHAT: D-012 was never exercised (no virsh snapshot step exists anywhere);
  rebuild-from-runbooks (D-017/D-018) is THE restore path; baseline-capture
  role moves to cloud-assert --capture. Counter-argument recorded in the entry
  (mid-rehearsal rollback convenience) with the revisit condition.
REVERT: append a reversal entry (this one is the most opinion-weighted item
  in the set -- if you disagree anywhere, it is probably here).

================================================================================
## Manual git actions (cannot ship in a zip)
================================================================================
  git rm scripts/review-bundle.py          # DOCFIX-070
  git rm scripts/phase-00-teardown.sh      # deprecated; runbook no longer
                                           # references it by path (repo-lint
                                           # L3/L4 assume it gone)
  # optional: git update-index --chmod=+x scripts/*.sh

## Pre-commit verify (jumphost or Windows+python3)
  python3 scripts/repo_lint.py .                       # expect: 0 fail, 1 legacy WARN
  python3 scripts/provider-bundle-check.py bundle.yaml # expect: PASS, 6 [ok]
  for t in tests/*/run-tests.sh; do bash "$t"; done    # expect: 4x ALL PASS (33 cases)

## Still queued (unchanged by this patchset)
  Verify-live: octavia deployed-cert SAN read; `juju info` channel probe on the
  jumphost (channel_assert now automates it at preflight); second-person unseal
  rehearsal (D-069). Operator inputs: vault custodian assignment (SEC-003);
  ruling if a real capi-mgmt auto-resume exclusion is wanted (item 7).

================================================================================
## Block 3 -- code hardening sweep (all repo scripts + harness estate)
================================================================================
Method: automated audit of every script and runbook paste block against the
house hardening rules (SIGPIPE, capture-die, inner-ssh stdin, /tmp-snap, gawk,
sed -i, column-order, juju-run capture, bare-exit-in-paste), then MANUAL
verification of every hit before any change. Audit honesty notes: H3 (inner
ssh) and H7 (paste-block exit) closed at ZERO true positives -- the rc/rcap/J
wrapper convention and the subshell/remote-quote discipline held everywhere;
H2/H4/H6/H8 clean.

### 17. DOCFIX-079 -- seven SIGPIPE (H1) fixes: capture-then-test conversions
FILES: scripts/phase-02-vault-preflight.sh, phase-03-core-verify.sh (model-
  presence gate: race read a PRESENT model as absent -> false HOLD);
  phase-06-bootstrap.sh (role verify-first: race re-ran grants);
  phase-06-net-setup.sh (SG-rule verify-first: race -> duplicate create -> 409
  aborts the run under set -e); phase-07-conductor-graft.sh + its runbook twin
  (helm version check: converted to substring param-expansion, no pipe at all);
  tenant-onboard.sh x2 -- the serious pair: the manager-grant verify could
  FALSE-DIE on a successful grant, and the duplicate-CIDR guard FAILED OPEN
  (match -> SIGPIPE 141 -> '&& die' skipped -> colliding subnet proceeds).
NEW TEST: tests/tenant-onboard/run-tests.sh proves the CIDR guard fires on a
  collision and stays quiet on a clear CIDR.
REVERT: each is a commented, anchored block -- git history per file.
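
The capture-then-test shape, applied to a duplicate-CIDR guard, looks roughly
like this. It is a sketch rather than the tenant-onboard.sh code:
`list_subnets`, `cidr_in_use` and the CIDRs are hypothetical stand-ins for the
real CLI listing.

```bash
# Sketch only: list_subnets and the CIDRs stand in for a real CLI listing.
list_subnets() {
  printf '%s\n' 192.0.2.0/24 198.51.100.0/24 203.0.113.0/24
}

# Fragile pipe form (not executed here), under `set -o pipefail`:
#   list_subnets | grep -q "$cidr" && die "CIDR already in use"
# grep -q exits on the first match; a producer still writing can die of
# SIGPIPE (141), the pipeline reports failure, and `&& die` is skipped.

cidr_in_use() {
  # Capture: the producer runs to completion and its own status is checked.
  existing=$(list_subnets) || return 2
  # Test: whole-line fixed-string match against the captured text, no pipe.
  grep -Fxq -- "$1" <<EOF
$existing
EOF
}

if cidr_in_use 198.51.100.0/24; then collision=yes; else collision=no; fi
if cidr_in_use 10.99.0.0/24; then clear_hit=yes; else clear_hit=no; fi
echo "198.51.100.0/24 in use: ${collision}"
echo "10.99.0.0/24 in use:    ${clear_hit}"
```

Because the listing finishes before any matching starts, an early-exiting
matcher can no longer turn a hit into a non-zero pipeline status.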

### 18. DOCFIX-080 -- tenant-onboard image-ID resolution off column-order luck
FILE: scripts/tenant-onboard.sh (2 sites) + a jq presence gate added.
WHY: `-f value -c ID -c Name | awk '{print $1}'` rode alphabetical column
  ordering (ID<Name today). Converted to `-f json | jq` per the house rule.
  Display-only multi-column -f value uses elsewhere (phase-05/06 confirms,
  tenant-acceptance echo) were left as-is deliberately -- output is read by
  eyes, not parsed; converting them changes operator-facing format for no
  robustness gain. ADVISORY: do not parse those lines positionally later.
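
Name-keyed resolution can be sketched as below. The JSON imitates the shape of
`openstack image list -f json` output using the `ID` and `Name` columns named
above; the image names and IDs are made up, and a real script would more
likely abort at the jq gate than record a flag.

```bash
# Sketch only: sample JSON, image names and IDs are hypothetical.
image_json='[
  {"ID": "11111111-2222-3333-4444-555555555555", "Name": "ubuntu-24.04", "Status": "active"},
  {"ID": "66666666-7777-8888-9999-000000000000", "Name": "debian-12", "Status": "active"}
]'

# Presence gate: report a missing jq rather than fall back to positional parsing.
if command -v jq >/dev/null 2>&1; then
  jq_ok=yes
else
  jq_ok=no
  echo "jq not found: refusing to resolve image IDs" >&2
fi

image_id=""
if [ "$jq_ok" = yes ]; then
  # Select by key name, so column order in the CLI output cannot matter.
  image_id=$(jq -r --arg name "debian-12" \
    '.[] | select(.Name == $name) | .ID' <<EOF
$image_json
EOF
  )
fi
echo "resolved image ID: ${image_id:-<unresolved>}"
```

If two images share a name this prints two IDs, so a caller that needs exactly
one should assert on the count as well.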

### 19. DOCFIX-081 -- cert-san verifier: dead SKIP branch / silent non-TLS drop
FILE: scripts/phase-04-internal-cert-san-verify.sh.
WHY: a jq https-only pre-filter made the non-TLS SKIP branch unreachable --
  an unexpectedly plain-HTTP internal endpoint was silently hidden, which is
  precisely a finding the operator must see. Feed now carries ALL internal
  endpoints; the existing SKIP line reports non-TLS visibly. This also
  un-breaks tests/phase-04-internal-cert-san (the harness encoded the correct
  behavior; the script had regressed under it).
REVERT: restore the jq select() -- not recommended.

### 20. NEW -- tests/phase-00-teardown-d061: harness for the D-061 pair
The most destructive scripts in the repo (teardown-release/-destroy) shipped
with NO tests despite a --no-prompt flag documented "tested automation only".
Stateful fakebin (maas machine state advances on remove/destroy; instant
sleep): 8 cases incl. canary-stops-after-one, DECOMPOSE-detection fails loud
and blocks destroy-model, substrate-sid collision aborts pre-mutation,
--release-storage vs --destroy-storage flag assertions, orphan deletion.
REVERT: delete the test dir (coverage loss only).

### 21. Harness-estate findings (pre-existing at HEAD; verified against pristine)
RETIRE (manual git actions -- harnesses for retired scripts test nothing):
  git rm -r tests/phase-00-teardown        # script git rm'd this patchset;
                                           # replaced by tests/phase-00-teardown-d061
  git rm -r tests/provider-vip-standup     # script already gone at HEAD
  git rm -r tests/phase-00-maas-recidr     # script already gone at HEAD
REPAIR SPECS (stale-but-live; harnesses still test the D-058 world their
scripts left at D-060 -- red-at-HEAD today, which trains ignoring red):
  - tests/phase-00-maas-standup: regenerate make_fixtures.py fixtures and the
    WOULD/SKIP/DRIFT expectation strings for the D-052/D-053 six-plane table
    (source of truth: scripts/lib-net.sh) -- drop provider-vip/VID-104/
    10.12.20.0/22 expectations; fresh-cloud case expects the six-plane create
    plan + current reserved ranges; drift cases keyed to the current wording
    ("occupied by the wrong plane", "re-CIDR/migration").
  - tests/carve-host-interfaces: rebuild fix/*.json interface trees and
    expectations to Pattern A (br-ex OVS on enp1s0, br-metal/.103/br-internal,
    raw data/storage/replication statics; metal-internal fixtures must carry
    VID 103) -- drop enp1s0.104 / br-prov-api expectations.
  Process lesson encoded for the future: a script migration commit MUST carry
  its harness (these went stale in the D-060 revert commit).

### 22. Audit-clean attestations (for the record)
Runbook paste blocks: every `exit` verified nested in a subshell or a remote-
quoted command -- zero operator-shell-killing exits. Inner ssh/juju-ssh: all
argv-style invocations route through </dev/null-appending wrappers or terminate
with </dev/null; heredoc-payload ssh (bash -s) correctly exempt. No /tmp-snap,
gawk-ism, unasserted sed -i, or capture-die instances anywhere.
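
Why the subshell wrapping matters for paste blocks, as a minimal sketch (the
failing check is a stub):

```bash
# Sketch only: the failing check is a stub.
gate_rc=0
(
  echo "gate: checking"
  false || exit 3        # ends the subshell only
  echo "not reached"
) || gate_rc=$?

# Pasted into an interactive shell, the session is still alive at this point.
echo "gate returned ${gate_rc}; operator shell still running"
```

The same `exit 3` pasted without the parentheses would close the operator's
terminal session.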

================================================================================
## Block 4 -- gauntlet runner, harness-estate completion, skill source
================================================================================

### 23. NEW -- scripts/run-tests-all.sh (the tooling gauntlet)
One command runs every tests/*/run-tests.sh: summary line each, worst exit,
optional substring filter, exit 2 on no match. Deliberately NOT folded into
preflight.sh: preflight gates the DEPLOY TARGET, this gates the TOOLS --
different cadence, different failure meaning (run it after any script/harness
change per the change-delivery loop).
REVERT: delete the file.
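
The runner contract (summary line each, worst exit, substring filter, exit 2 on
no match) can be sketched as follows. `run_all` and the two throwaway
harnesses are hypothetical; this is not the contents of run-tests-all.sh, and
matching the filter against the harness directory name is an assumption.

```bash
# Sketch only: run_all and the throwaway harnesses are hypothetical.
run_all() {
  # usage: run_all <tests_root> [name_substring]
  root=$1
  filter=${2:-}
  worst=0
  matched=0
  for t in "$root"/*/run-tests.sh; do
    [ -f "$t" ] || continue
    name=$(basename "$(dirname "$t")")
    case "$name" in
      *"$filter"*) ;;            # an empty filter matches every harness
      *) continue ;;
    esac
    matched=$((matched + 1))
    rc=0
    bash "$t" >/dev/null 2>&1 || rc=$?
    echo "${name}: exit ${rc}"
    if [ "$rc" -gt "$worst" ]; then
      worst=$rc
    fi
  done
  if [ "$matched" -eq 0 ]; then
    echo "no harness matched '${filter}'" >&2
    return 2
  fi
  return "$worst"
}

# Two throwaway harnesses: one green, one red.
demo=$(mktemp -d)
mkdir -p "$demo/alpha" "$demo/beta"
echo 'exit 0' > "$demo/alpha/run-tests.sh"
echo 'exit 1' > "$demo/beta/run-tests.sh"

all_rc=0
run_all "$demo" || all_rc=$?
one_rc=0
run_all "$demo" alpha || one_rc=$?
none_rc=0
run_all "$demo" no-such-harness || none_rc=$?
rm -rf "$demo"
echo "all=${all_rc} alpha-only=${one_rc} no-match=${none_rc}"
```

Harnesses are invoked bash-prefixed, so the sketch does not depend on the
exec bit being set.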

### 24. tests/phase-00-maas-standup -- rewritten to the D-052/D-053 six-plane scheme
make_fixtures.py + run-tests.sh regenerated from the CURRENT script vocabulary
and lib-net plane table (was: D-058 provider-vip/VID-104/10.12.20 expectations,
red at HEAD). Four scenarios: fresh full create plan (asserts the retired
provider-vip can never reappear), all-SKIP done, wrong-plane DRIFT + refuse,
wrong-VID drift.
REVERT: git history (restores a red harness).

### 25. tests/carve-host-interfaces -- rewritten to Pattern A (D-060)
Fixture generator (new make_fixtures.py) + expectations rebuilt: fresh host
full Pattern-A plan (uplink L3 unlink -> br-ex OVS + static; br-metal/.103/
br-internal chain; raw data/storage/replication statics; enp11s0 idle),
missing-subnet and wrong-VID hard gates, all-SKIP idempotent re-run; forbids
br-prov-api/enp1s0.104/provider-vip forever.
REVERT: git history (red harness).

### 26. scripts/validate.sh -- placeholder TODO aligned to amended decisions
The D-011 runner's drafting list still targeted DROPPED features (Designate
resolution -- D-019; snapshot existence -- D-012, superseded by D-070) and
predated the manual-unseal ruling. List now points at the existing building
blocks (cloud-assert / tenant-acceptance / run-tests-all) and the genuinely
missing items, so the eventual implementation is built to the real spec.

### 27. skills/openstack-cloud-ops/ -- unpacked skill SOURCE committed
The repo carried only the packaged .skill zip (opaque to grep/diff/repo-lint).
The unpacked SKILL.md + references/ (v1.2) now live beside it as the reviewable
source of truth; regenerate the .skill from this folder on any change (or drop
the zip -- operator's call). Divergence rule applies: repo source wins over any
installed copy.

### Review attestations (Block 4 sweep)
Runbook->script contract cross-check: CLEAN (both automated hits verified
false -- reenroll --check and carve --apply exist under parser styles the
checker missed). Loop-efficiency scan: clean (flagged sites are watchers/polls
by design). Exit-code audit: clean (tenant-acceptance's 11-14 are a declared
per-phase contract). deploy-watch, maas-fabric-prune, juju-spaces-check,
osd-blank-check: read; disciplined; no findings.
