How work is executed on this cloud. These conventions exist because the runbooks and scripts ARE the product: a fix that works but is undocumented, unrepeatable, or Roosevelt-incompatible is a failure.
Every command block is labeled so a command line is never mistaken for prose, and so mutation risk is explicit (same convention as the repo runbooks):
RUN -- <loc> - the block CHANGES state; run it at (jumphost, a unit via juju ssh, the mgmt VM...).CHECK (read-only) -- <loc> - safe to re-run.GATE: - hard stop. Do not proceed unless the stated condition holds.Expect: - what a passing result looks like. Always state it: the operator should never have to guess whether output is good.> CAUTION: - destructive, secret-handling, or irreversible.Sequence discipline: read-only audit -> present the mutation -> operator approves/runs -> verify the result -> next step. One gated mutation at a time; never batch destructive steps. If output comes back unexpected, STOP and re-derive from the live state - do not improvise a correction inline.
Chat (no shell): you prepare blocks, the operator runs and pastes output. Treat pasted output as the only evidence; a block you wrote but saw no output for did not happen.
Live shell: you may execute CHECK (read-only) blocks yourself. RUN and > CAUTION: blocks still get presented and human-approved first - state what will change and why it is the minimal correct action.
bash scripts/run-logged.sh <label> (opens a script(1)-logged shell to ~/as-executed/; index the session in logs/as-executed-index.md -- content NEVER commits, the index always does).vault operator init loses the unseal keys forever.)umask 077; unseal via hidden prompt (L4); transfer via base64 pipe into a root-written 0600 file (L-P6-4). Never echo a secret to verify it - verify by length/format from the file.maas list (prints the API key - DOCFIX-016). Never trust a juju action's human-formatted output for a captured secret or cert - pull from --format json (indented YAML block-scalars corrupt PEMs; DOCFIX-021).juju run persists action params in the operation log (DOCFIX-011).D-NNN - design decisions (docs/design-decisions.md). Append-only: superseded entries stay in place, marked, with the superseding entry appended. DOCFIX-NNN - runbook fixes. BUNDLEFIX-NNN - bundle fixes.scripts/ledger-scan.sh derives next-free from decision HEADERS (not prose mentions -- a "next-free D-072" pointer must not inflate the count). Keep "Next-free:" pointer lines on ONE line (a word-wrapped continuation escapes the line-based exclusion); or just rely on the scan, which is the next-free authority.grep -nP '[^\x00-\x7F]' (or a Python byte read); CR bytes with a Python data.count(b"\x0d") - a grep $'\r' false-positives on $r... tokens. Non-ASCII in OpenStack config has caused silent daemon failures (mod_wsgi UnicodeDecodeError).juju ssh goes as a base64 pipe decoded to a file, then the FILE is executed - never a raw heredoc (paste-mangling and stdin-consumption both bite; see script-authoring on read vs pipes).Windows-side steps are PowerShell-native (no bash heredocs, no backslash continuations). The operator commits from Windows; the jumphost only pulls.
Credential exposures and security obligations get a ROW in docs/security-ledger.md at discovery time (owner + status) - never only a script-comment note; the ledger is reviewed at every phase-00 and handoff.
Long sessions on this cloud get COMPACTED, often several times. Work that lives only in chat scrollback is lost at compaction. docs/session-ledger.md is the durable record of what is IN FLIGHT (active build, backlog, operator-gated items, open decisions, numbering state, state facts). Standing practice:
scripts/ledger-scan.sh at session start; reconcile.When tightening a security boundary (SG rules, firewall, policy) that a live flow depends on, NEVER remove the permissive rule before the specific replacements are proven present. Create the narrow rules, READ THEM BACK (fail if any is missing), and only THEN remove the wide rule by its MEASURED id. There must be no window in which a required flow (e.g. the magnum conductor -> mgmt apiserver poll) is unauthorized. Verify the tightened boundary by FUNCTIONAL outcome afterwards, not by assuming the rule text is sufficient.
Some acceptance items cannot be scripted -- e.g. a SECOND person performing the Vault unseal (D-069). Never auto-pass such an item. Gate it on a ledger ROW (docs/security-ledger.md): report PASS_PENDING_MANUAL while the row is OPEN, PASS only when it is attested CLOSED/rehearsed, HOLD if the row is missing. Auto-passing an undone safety item is the exact failure D-070 retired.
Before any server-side hypothesis for "X used to work, now it doesn't" on a web UI: eliminate client state first (incognito window, ~10 seconds), then a server-side curl, THEN hypothesize. Before any command that acts on tenant resources: confirm which project/domain scope the shell holds (openstack token issue-level certainty) - server create and friends use ambient scope silently. Full triage method: references/troubleshooting.md.