Operating discipline

How work is executed on this cloud. These conventions exist because the runbooks and scripts ARE the product: a fix that works but is undocumented, unrepeatable, or Roosevelt-incompatible is a failure.

The gated execution model

Every command block is labeled so a command line is never mistaken for prose, and so mutation risk is explicit (same convention as the repo runbooks):

RUN -- <loc> - the block CHANGES state; run it at (jumphost, a unit via juju ssh, the mgmt VM...).
CHECK (read-only) -- <loc> - safe to re-run.
GATE: - hard stop. Do not proceed unless the stated condition holds.
Expect: - what a passing result looks like. Always state it: the operator should never have to guess whether output is good.
> CAUTION: - destructive, secret-handling, or irreversible.

Sequence discipline: read-only audit -> present the mutation -> operator approves/runs -> verify the result -> next step. One gated mutation at a time; never batch destructive steps. If output comes back unexpected, STOP and re-derive from the live state - do not improvise a correction inline.

Chat (no shell): you prepare blocks, the operator runs and pastes output. Treat pasted output as the only evidence; a block you wrote but saw no output for did not happen.

Live shell: you may execute CHECK (read-only) blocks yourself. RUN and > CAUTION: blocks still get presented and human-approved first - state what will change and why it is the minimal correct action.

Irreversible / one-shot / secret steps

Start every consequential session with bash scripts/run-logged.sh <label> (opens a script(1)-logged shell to ~/as-executed/; index the session in logs/as-executed-index.md -- content NEVER commits, the index always does).
Retrieve the exact prior working command from the as-executed log or runbook VERBATIM. Never improvise vault init, unseal, authorize-charm, CA issuance, or anything one-shot. (DOCFIX-006: a mis-redirected vault operator init loses the unseal keys forever.)
Secrets never transit argv, clipboard, scrollback, or /tmp. Capture straight to a 0600 file under $HOME with umask 077; unseal via hidden prompt (L4); transfer via base64 pipe into a root-written 0600 file (L-P6-4). Never echo a secret to verify it - verify by length/format from the file.
Never run maas list (prints the API key - DOCFIX-016). Never trust a juju action's human-formatted output for a captured secret or cert - pull from --format json (indented YAML block-scalars corrupt PEMs; DOCFIX-021).
Authorize charms with short-lived child tokens: juju run persists action params in the operation log (DOCFIX-011).

Record-keeping: D-NNN / DOCFIX-NNN / BUNDLEFIX-NNN

D-NNN - design decisions (docs/design-decisions.md). Append-only: superseded entries stay in place, marked, with the superseding entry appended. DOCFIX-NNN - runbook fixes. BUNDLEFIX-NNN - bundle fixes.
ALWAYS grep the repo for the next free number before assigning one - collisions have happened. State "next-free verified" when you assign.
Mid-task findings are logged as proposals, not acted on (hard rule 1). A finding that changes a runbook becomes a DOCFIX; one that changes architecture becomes a D-NNN with status PROPOSED until the operator rules.
Status vocabulary: PROPOSED/OPEN -> ADOPTED/DECIDED -> SUPERSEDED (by D-MMM). Do not implement PROPOSED items.
Before acting on any change request, grep design-decisions for the governing decision on that surface (and its dependents - e.g. a runbook step may rely on the current state). Finding a PROPOSED/OPEN decision means presenting its recorded options for a ruling, not picking one.

Delivery rules (files handed to the operator)

Multi-file changes ship as repo-relative ZIPs, never loose files (the Windows/GitHub Desktop loose-file workflow misplaces them).
Everything committed is ASCII-only, LF-only. Validate before delivery: non-ASCII with grep -nP '[^\x00-\x7F]' (or a Python byte read); CR bytes with a Python data.count(b"\x0d") - a grep $'\r' false-positives on $r... tokens. Non-ASCII in OpenStack config has caused silent daemon failures (mod_wsgi UnicodeDecodeError).
On-box script delivery over juju ssh goes as a base64 pipe decoded to a file, then the FILE is executed - never a raw heredoc (paste-mangling and stdin-consumption both bite; see script-authoring on read vs pipes).
Windows-side steps are PowerShell-native (no bash heredocs, no backslash continuations). The operator commits from Windows; the jumphost only pulls.
Credential exposures and security obligations get a ROW in docs/security-ledger.md at discovery time (owner + status) - never only a script-comment note; the ledger is reviewed at every phase-00 and handoff.
Under operator-granted blanket approval, the delivery contract is: implement, then hand over ONE cumulative repo-relative ZIP plus a changelog where every item states what changed, the evidence for why, and a per-item revert. The changelog is the review surface; opinion-weighted calls are flagged as such.

Debate and correction norms

Challenge weak reasoning with sources; the operator prefers industry best practice over the quick fix and will push back hard on unverified claims. When challenged, verify (docs, source, live probe) rather than defend.
Own mistakes plainly, in one or two sentences, then fix. No self-flagellation.
When a prior decision looks wrong, propose superseding it through the D-NNN process - do not quietly deviate from it.

Troubleshooting entry discipline

Before any server-side hypothesis for "X used to work, now it doesn't" on a web UI: eliminate client state first (incognito window, ~10 seconds), then a server-side curl, THEN hypothesize. Before any command that acts on tenant resources: confirm which project/domain scope the shell holds (openstack token issue-level certainty) - server create and friends use ambient scope silently. Full triage method: references/troubleshooting.md.