# Operating discipline

How work is executed on this cloud. These conventions exist because the
runbooks and scripts ARE the product: a fix that works but is undocumented,
unrepeatable, or Roosevelt-incompatible is a failure.

## The gated execution model

Every command block is labeled so a command line is never mistaken for prose,
and so mutation risk is explicit (same convention as the repo runbooks):

- `RUN -- <loc>` - the block CHANGES state; run it at <loc> (jumphost, a unit
  via `juju ssh`, the mgmt VM...).
- `CHECK (read-only) -- <loc>` - safe to re-run.
- `GATE:` - hard stop. Do not proceed unless the stated condition holds.
- `Expect:` - what a passing result looks like. Always state it: the operator
  should never have to guess whether output is good.
- `> CAUTION:` - destructive, secret-handling, or irreversible.

Sequence discipline: read-only audit -> present the mutation -> operator
approves/runs -> verify the result -> next step. One gated mutation at a time;
never batch destructive steps. If output comes back unexpected, STOP and
re-derive from the live state - do not improvise a correction inline.

**Chat (no shell):** you prepare blocks, the operator runs and pastes output.
Treat pasted output as the only evidence; a block you wrote but saw no output
for did not happen.

**Live shell:** you may execute `CHECK (read-only)` blocks yourself. `RUN` and
`> CAUTION:` blocks still get presented and human-approved first - state what
will change and why it is the minimal correct action.

## Irreversible / one-shot / secret steps

- Start every consequential session with `bash scripts/run-logged.sh <label>`
  (opens a script(1)-logged shell to ~/as-executed/; index the session in
  logs/as-executed-index.md -- content NEVER commits, the index always does).
- Retrieve the exact prior working command from the as-executed log or runbook
  VERBATIM. Never improvise vault init, unseal, authorize-charm, CA issuance,
  or anything one-shot. (DOCFIX-006: a mis-redirected `vault operator init`
  loses the unseal keys forever.)
- Secrets never transit argv, clipboard, scrollback, or /tmp. Capture straight
  to a 0600 file under $HOME with `umask 077`; unseal via hidden prompt (L4);
  transfer via base64 pipe into a root-written 0600 file (L-P6-4). Never echo
  a secret to verify it - verify by length/format from the file.
- Never run `maas list` (prints the API key - DOCFIX-016). Never trust a juju
  action's human-formatted output for a captured secret or cert - pull from
  `--format json` (indented YAML block-scalars corrupt PEMs; DOCFIX-021).
- Authorize charms with short-lived child tokens: `juju run` persists action
  params in the operation log (DOCFIX-011).

## Record-keeping: D-NNN / DOCFIX-NNN / BUNDLEFIX-NNN

- `D-NNN` - design decisions (`docs/design-decisions.md`). Append-only:
  superseded entries stay in place, marked, with the superseding entry
  appended. `DOCFIX-NNN` - runbook fixes. `BUNDLEFIX-NNN` - bundle fixes.
- ALWAYS grep the repo for the next free number before assigning one -
  collisions have happened. State "next-free verified" when you assign.
- Mid-task findings are logged as proposals, not acted on (hard rule 1).
  A finding that changes a runbook becomes a DOCFIX; one that changes
  architecture becomes a D-NNN with status PROPOSED until the operator rules.
- Status vocabulary: PROPOSED/OPEN -> ADOPTED/DECIDED -> SUPERSEDED (by
  D-MMM). Do not implement PROPOSED items.
- Before acting on any change request, grep design-decisions for the
  governing decision on that surface (and its dependents - e.g. a runbook
  step may rely on the current state). Finding a PROPOSED/OPEN decision
  means presenting its recorded options for a ruling, not picking one.

## Delivery rules (files handed to the operator)

- Multi-file changes ship as repo-relative ZIPs, never loose files (the
  Windows/GitHub Desktop loose-file workflow misplaces them).
- Everything committed is ASCII-only, LF-only. Validate before delivery:
  non-ASCII with `grep -nP '[^\x00-\x7F]'` (or a Python byte read); CR bytes
  with a Python `data.count(b"\x0d")` - a `grep $'\r'` false-positives on
  `$r...` tokens. Non-ASCII in OpenStack config has caused silent daemon
  failures (mod_wsgi UnicodeDecodeError).
- On-box script delivery over `juju ssh` goes as a base64 pipe decoded to a
  file, then the FILE is executed - never a raw heredoc (paste-mangling and
  stdin-consumption both bite; see script-authoring on `read` vs pipes).
- Windows-side steps are PowerShell-native (no bash heredocs, no backslash
  continuations). The operator commits from Windows; the jumphost only pulls.

- Credential exposures and security obligations get a ROW in
  docs/security-ledger.md at discovery time (owner + status) - never only a
  script-comment note; the ledger is reviewed at every phase-00 and handoff.
- Under operator-granted blanket approval, the delivery contract is: implement,
  then hand over ONE cumulative repo-relative ZIP plus a changelog where every
  item states what changed, the evidence for why, and a per-item revert. The
  changelog is the review surface; opinion-weighted calls are flagged as such.

## Debate and correction norms

- Challenge weak reasoning with sources; the operator prefers industry best
  practice over the quick fix and will push back hard on unverified claims.
  When challenged, verify (docs, source, live probe) rather than defend.
- Own mistakes plainly, in one or two sentences, then fix. No self-flagellation.
- When a prior decision looks wrong, propose superseding it through the D-NNN
  process - do not quietly deviate from it.

## Troubleshooting entry discipline

Before any server-side hypothesis for "X used to work, now it doesn't" on a
web UI: eliminate client state first (incognito window, ~10 seconds), then a
server-side curl, THEN hypothesize. Before any command that acts on tenant
resources: confirm which project/domain scope the shell holds
(`openstack token issue`-level certainty) - `server create` and friends use
ambient scope silently. Full triage method: references/troubleshooting.md.
