# Phase 03 -- Core Verify (settle, admin-openrc, Horizon)

After vault's cert cascade (phase-02), confirm the cloud settled to active/idle
(except the expected post-deploy blocks), build the IP-only `admin-openrc`, verify
API reachability, and repoint the external Horizon reverse proxy.

Decisions: B5 (IP-only endpoints; no FQDN), D-021 (octavia stays BLOCKED awaiting
configure-resources -- expected, cleared in phase-05), D-044 (Horizon Secure-cookie
override on the plain-HTTP proxy leg; Step 3.3, PER-REBUILD), D-045 / DOCFIX-031 (haproxy
backends confirmed LOADED via a functional sweep, NOT juju status; Step 3.1). Troubleshooting:
appendix-A -- DOCFIX-021 (action human-output corrupts captured artifacts), DOCFIX-018 (IP-only
OS_AUTH_URL), DOCFIX-022 (admin project discovered, not hardcoded), D-045/DOCFIX-031 (haproxy
plaintext-check-vs-SSL backend DOWN), nginx reverse-proxy lessons.

---

## Prerequisites (must be true entering phase-03)
- phase-02 done: vault unsealed + authorized; root CA generated; the cert cascade is
  running/settling.
- The Vault root CA is available via the vault charm action (pulled below).

## Constants and env-literals (TAG: confirm per site on rebuild)
- `ENV(keystone-vip)` 10.12.4.50      (keystone PUBLIC endpoint = provider VIP; verify vs bundle)
- `ENV(admin-domain)` admin_domain    (charmed-keystone admin user + project domain)
- `ENV(dashboard-vip)` 10.12.4.58     (Horizon provider VIP; was .234 pre-R14)
- admin project: DISCOVERED at runtime (do not hardcode -- DOCFIX-022).

## Run-location legend
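- `# RUN: jumphost` -- vopenstack-jesse as jessea123; `juju` + `openstack` + `openssl`.
- `# RUN: operator` -- on the nginx reverse-proxy host 10.12.4.7 (outside the Juju model); used in Step 3.3.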

---

## Step 3.1 -- Settle the cert cascade + acceptance walk
`# RUN: jumphost`  The cascade here is NARROW (mysql bootstrapped before vault init,
so only the Vault consumers clear: ovn-central x3, ovn-chassis x3,
ovn-chassis-octavia, neutron-api-plugin-ovn, barbican-vault). Watch, then walk units
AND subordinates.
```bash
juju status --color --watch 30s -m openstack     # Ctrl-C once settled
```
Acceptance walk (counts non-active/idle across units + subordinates):
```bash
juju status -m openstack --format=yaml | python3 -c "
import yaml,sys
d=yaml.safe_load(sys.stdin); apps=d.get('applications',{}); bad=[]
def chk(n,u):
    ws=(u.get('workload-status') or {}).get('current',''); js=(u.get('juju-status') or {}).get('current','')
    msg=(u.get('workload-status') or {}).get('message','')
    if ws!='active' or js!='idle': bad.append('%s: workload=%s juju=%s msg=%s'%(n,ws,js,msg))
for app,info in apps.items():
    for un,ud in (info.get('units') or {}).items():
        chk(un,ud)
        for sn,sd in (ud.get('subordinates') or {}).items(): chk(sn,sd)
print('Non-active/idle units: %d'%len(bad))
for b in bad: print('  '+b)
"
```
GATE: expected non-active/idle = **1** (octavia/0 BLOCKED "Awaiting configure-resources",
the D-021 next step) or briefly **2** (+ glance-simplestreams-sync, normal pre-run).
Any TLS consumer (the five above) persisting waiting/error past ~15 min is the concern
-- STOP and read its log + relations (do NOT assume TLS; a prior stall was a MySQL 1045
desync):
```bash
juju status --relations -m openstack ovn-central ovn-chassis ovn-chassis-octavia neutron-api-plugin-ovn barbican-vault
# juju ssh -m openstack <unit> -- 'sudo tail -120 /var/log/juju/unit-<unit-dashed>.log' </dev/null
```
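The GATE above can be reduced to an exit code so the walk chains into a script. This is a minimal sketch, assuming the walk output format shown above and an allowlist of exactly the two expected applications; `gate_walk` is a hypothetical helper name, not part of the runbook tooling.

```bash
# gate_walk: read the acceptance-walk output on stdin; succeed only when every
# listed unit belongs to an expected application.
# ASSUMPTION: the allowlist is the two expected blocks named in the GATE
# (octavia per D-021, glance-simplestreams-sync pre-run).
gate_walk() {
  local allow='^(octavia|glance-simplestreams-sync)/[0-9]+:'
  local input unexpected
  input=$(cat)
  # Refuse to pass on empty or foreign input (e.g. juju status failed upstream).
  case "$input" in
    "Non-active/idle units: "*) ;;
    *) echo "GATE FAIL: no acceptance-walk output on stdin"; return 1 ;;
  esac
  # Unit rows are the two-space-indented "name: workload=..." lines.
  unexpected=$(printf '%s\n' "$input" \
    | sed -n 's/^  \([^ ]*: workload=.*\)$/\1/p' \
    | grep -Ev "$allow" || true)
  if [ -n "$unexpected" ]; then
    printf 'GATE FAIL: unexpected non-active/idle units:\n%s\n' "$unexpected"
    return 1
  fi
  echo "GATE OK: only expected blocks remain"
}
```

Append `| gate_walk` to the acceptance-walk pipeline; a non-zero return means a unit outside the allowlist needs a look. The ~15 min persistence judgment stays with the operator.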

### Step 3.1 backend-health gate (DOCFIX-031 / D-045) -- juju status is BLIND to a dead backend
`# RUN: jumphost`  The acceptance walk above gates on juju active/idle. That is NOT
sufficient: a unit can be active/idle while a charm-rendered haproxy backend is silently
DOWN -- observed 2026-06-12, nova-cc nova-api down ~3.2 days with juju green (root cause
D-045: haproxy not reloaded after the cert cascade -> plaintext checks vs the SSL backend).
Probe haproxy's own verdict on every unit:
```bash
( {
  echo "=== POST-TLS GATE: haproxy backend health sweep across all units ==="
  for unit in $(juju status -m openstack --format=json | python3 -c 'import json,sys; d=json.load(sys.stdin); [print(u) for a in d.get("applications",{}).values() for u in (a.get("units") or {})]'); do
    juju ssh -m openstack "$unit" -- "test -S /var/run/haproxy/admin.sock || exit 0; sudo python3 -c 'import socket;s=socket.socket(socket.AF_UNIX);s.connect(\"/var/run/haproxy/admin.sock\");s.sendall(b\"show stat\n\");print(s.makefile().read())' | grep -vE 'FRONTEND|BACKEND' | grep ',DOWN,'" </dev/null 2>/dev/null | sed "s|^|[$unit] DOWN: |"
  done
  echo "=== sweep complete -- no DOWN lines above means every haproxy backend is UP ==="
} )
```
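GATE: zero `[unit] DOWN:` lines. The sweep discards stderr, so a unit that `juju ssh` could not
reach also prints nothing; treat silence from an unreachable unit as unverified, not UP.
On a DOWN line (check token L7STS/400 == plaintext-vs-SSL),
remediate the flagged unit (set U, then validate-and-reload):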
```bash
U=nova-cloud-controller/0
juju ssh -m openstack "$U" -- 'sudo haproxy -c -f /etc/haproxy/haproxy.cfg' </dev/null   # gate: must say valid
juju ssh -m openstack "$U" -- 'sudo systemctl reload haproxy' </dev/null                 # graceful master-worker
```
Re-run the sweep until clean. (Signature confirm if needed: plaintext `curl http://SVC-IP:876x/`
returns 400, TLS `curl -k https://SVC-IP:876x/` returns 200 -- the transport is the difference.)
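When a DOWN line appears, the check token is easier to read if the raw `show stat` CSV is parsed by column name rather than grepped. The sketch below assumes the standard haproxy CSV header (`# pxname,svname,...,status,...,check_status,check_code,...`); `haproxy_down_rows` is a hypothetical helper name. Matching on `^DOWN` also catches transitional states such as `DOWN 1/2`, which a literal `,DOWN,` match skips.

```bash
# haproxy_down_rows: read raw `show stat` CSV on stdin and print one line per
# DOWN server as "proxy/server status check_status/check_code".
# Columns are located by header name, not by fixed position.
haproxy_down_rows() {
  awk -F, '
    /^# / {                                   # header row: "# pxname,svname,..."
      sub(/^# /, "")                          # editing $0 re-splits the fields
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    !("status" in col) || NF < 2 { next }     # no header seen yet, or blank line
    $(col["svname"]) == "FRONTEND" || $(col["svname"]) == "BACKEND" { next }
    $(col["status"]) ~ /^DOWN/ {
      printf "%s/%s %s %s/%s\n", $(col["pxname"]), $(col["svname"]),
             $(col["status"]), $(col["check_status"]), $(col["check_code"])
    }
  '
}
```

Feed it the unfiltered socket output from one unit (the same `show stat` query as the sweep, without the two `grep` stages). A row ending in `L7STS/400` is the plaintext-vs-SSL signature described above.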

## Step 3.2 -- Build admin-openrc (IP-only; canonical block)
`# RUN: jumphost`  Keystone PUBLIC = the provider VIP IP over HTTPS with the vault
CA (no FQDN, no /etc/hosts -- B5). This canonical block folds in three fixes:
the CA is pulled via `--format json` + jq because the action's human output wraps the
PEM in an INDENTED YAML block that is not valid PEM (appendix-A: DOCFIX-021); the
OS_AUTH_URL is the VIP IP (DOCFIX-018); and the admin project is DISCOVERED by a
scope-test loop rather than hardcoded, because the scoping project name varies by
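charm rev (DOCFIX-022 -- the cause of a prior HTTP 401). The `( set -e ... )` subshell keeps
OS_PASSWORD out of the calling shell and aborts at the first failing command. `set -e`
alone does not see a failed `juju run` on the left of a pipe; those are caught by the
check that follows each extract (the `openssl x509` parse, the empty-password test).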

```bash
KEYSTONE_VIP=10.12.4.50              # keystone PUBLIC endpoint = provider VIP (verify vs bundle on rebuild)
ADMIN_DOMAIN=admin_domain            # charmed-keystone admin user + project domain
PROJECT_CANDIDATES="admin admin_domain"   # tried in order; first that SCOPES wins (DOCFIX-022 variance)
CA="$HOME/vault-init/vault-ca-root.pem"
RC="$HOME/admin-openrc"

( set -e
  mkdir -p "$HOME/vault-init"
  # 1. Vault root CA -> file (JSON extract; DOCFIX-021 -- human output indents the PEM)
  juju run vault/leader get-root-ca -m openstack --format json \
    | jq -r '[.. | strings | select(test("-----BEGIN CERTIFICATE-----"))][0]' > "$CA"
  openssl x509 -in "$CA" -noout -subject -dates
  # 2. Admin password -> var (JSON extract, not human output)
  ADMIN_PASS=$(juju run keystone/leader get-admin-password -m openstack --format json | python3 -c "
import json,sys
d=json.load(sys.stdin)
def f(o):
    if isinstance(o,dict):
        for k in ('admin-password','password','Stdout'):
            if k in o and o[k]: return str(o[k]).strip()
        for v in o.values():
            r=f(v)
            if r: return r
    elif isinstance(o,list):
        for v in o:
            r=f(v)
            if r: return r
    return ''
print(f(d))
")
  [ -n "$ADMIN_PASS" ] || { echo "FATAL: password extract failed"; exit 1; }
  # 3. PROJECT LOOKUP: first candidate that issues a SCOPED token wins (DOCFIX-022)
  export OS_AUTH_URL="https://${KEYSTONE_VIP}:5000/v3" OS_USERNAME=admin OS_PASSWORD="$ADMIN_PASS"
  export OS_USER_DOMAIN_NAME="$ADMIN_DOMAIN" OS_PROJECT_DOMAIN_NAME="$ADMIN_DOMAIN"
  export OS_IDENTITY_API_VERSION=3 OS_REGION_NAME=RegionOne OS_CACERT="$CA"
  ADMIN_PROJECT=""
  for P in $PROJECT_CANDIDATES; do
    if OS_PROJECT_NAME="$P" openstack token issue >/dev/null 2>&1; then ADMIN_PROJECT="$P"; break; fi
  done
  [ -n "$ADMIN_PROJECT" ] || { echo "FATAL: no candidate project scoped (tried: $PROJECT_CANDIDATES)"; exit 1; }
  echo "[OK] admin project = $ADMIN_PROJECT ; password len ${#ADMIN_PASS}"
  # 4. Write ~/admin-openrc (backs up any existing one first)
  [ -f "$RC" ] && mv "$RC" "$RC.pre-$(date -u +%Y%m%dT%H%M%SZ)"
  umask 077                            # create the rc 0600 from the start (no readable window before chmod)
  cat > "$RC" <<EOF
export OS_AUTH_URL=https://${KEYSTONE_VIP}:5000/v3
export OS_USERNAME=admin
export OS_PASSWORD='$ADMIN_PASS'
export OS_PROJECT_NAME=$ADMIN_PROJECT
export OS_USER_DOMAIN_NAME=$ADMIN_DOMAIN
export OS_PROJECT_DOMAIN_NAME=$ADMIN_DOMAIN
export OS_IDENTITY_API_VERSION=3
export OS_REGION_NAME=RegionOne
export OS_CACERT=$CA
EOF
  chmod 600 "$RC"
)
# 5. Verify from the written file (password stayed inside the subshell above)
( source "$RC"; echo "auth -> $OS_AUTH_URL  project=$OS_PROJECT_NAME"; openstack token issue 2>&1 | head -6 )
( source "$RC"; openstack endpoint list -f value -c "Service Name" -c Interface -c URL 2>&1 | sort )
```
GATE: `token issue` returns a SCOPED token; `endpoint list` is IP-only across all
services (public on the provider VIP `.5x`, internal+admin on the metal VIP `.8.5x`,
keystone admin on `:35357`). Two non-blocking notes for later: s3/swift is registered
on the radosgw VIP `.60:443` (re-check vs the radosgw `:80` listener during any
Swift/S3 smoke); the gss image-stream is HTTP on a metal unit IP that changes per
rebuild (`10.12.8.172` in the 06-03 snapshot, `10.12.8.196` later; see As-built reference).
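The "IP-only" half of the GATE can be checked mechanically instead of by eye. A minimal sketch, assuming the URL is the last whitespace-separated field of each `endpoint list -f value` row and that all endpoints are IPv4 (as in this build); `non_ip_endpoints` is a hypothetical helper name.

```bash
# non_ip_endpoints: read endpoint rows on stdin and print every URL whose host
# is not an IPv4 literal. Returns non-zero if any such URL is found.
# ASSUMPTION: IPv4 only; a bracketed IPv6 host would be reported as non-IP.
non_ip_endpoints() {
  awk '
    NF == 0 { next }                          # skip blank lines
    {
      url = $NF                               # URL is the last field of the row
      host = url
      sub("^[a-z]+://", "", host)             # strip the scheme
      sub("[:/].*$", "", host)                # strip port and path
      if (host !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) { print url; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

Pipe the `openstack endpoint list -f value ...` output from step 5 into it; any printed URL is an FQDN or otherwise non-IP endpoint to chase against B5.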

## Step 3.3 -- Horizon access via the external nginx reverse proxy
`# RUN: operator (outside the Juju model) + jumphost`  Horizon is fronted by an
operator-managed nginx reverse proxy. On each rebuild / VIP relocation: (1) repoint the
upstream to the CURRENT dashboard provider VIP (now `https://10.12.4.58`, was `.234`
pre-R14), and (2) reapply the Horizon Secure-cookie override (DOCFIX-030 / D-044,
PER-REBUILD -- below). Two interplays:
- ALLOWED_HOSTS: Horizon (bundle B5) must permit the Host header that reaches it, else
  HTTP 400 DisallowedHost. As-built keeps the client Host (`proxy_set_header Host $http_host`);
  rewriting it to the VIP would emit redirects corporate clients may not route.
- Upstream TLS name-match: the dashboard cert is vault-signed and embeds the unit HOSTNAME
  as a DNS SAN (e.g. `juju-ffe3b8-2-lxd-2`) alongside IP SANs. nginx upstream verification
  is DNS-only (X509_check_host), so the proxy_pass IP NEVER matches -- `proxy_ssl_verify on`
  requires `proxy_ssl_name` set to the cert's DNS SAN. (B5 is IP-only for ENDPOINTS, not certs.)
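
Because `proxy_ssl_name` must equal a DNS SAN, it helps to pull only the DNS entries out of the certificate's SAN extension. A minimal sketch, assuming the usual `openssl x509 -noout -ext subjectAltName` text layout (comma-separated `DNS:` and `IP Address:` items); `dns_sans` is a hypothetical helper name.

```bash
# dns_sans: read `openssl x509 -noout -ext subjectAltName` output on stdin and
# print each DNS SAN on its own line (IP SANs and the extension title are dropped).
dns_sans() {
  tr ',' '\n' | sed -n 's/^[[:space:]]*DNS:\([^[:space:]]*\).*$/\1/p'
}
```

Usage, from any host that can reach the dashboard VIP: `openssl s_client -connect 10.12.4.58:443 </dev/null 2>/dev/null | openssl x509 -noout -ext subjectAltName | dns_sans`. The printed name is the per-site value for `proxy_ssl_name`.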

Proxy topology (as-executed 2026-06-12 -- confirm/refresh per site):
- Proxy host `nginx` 10.12.4.7 (Ubuntu 24.04, nginx 1.24.0 native/systemd); also fronts
  MAAS (listen 80 -> 10.12.4.10:5240). Horizon vhost `/etc/nginx/sites-available/openstack`
  (symlinked into sites-enabled), listen 81; corporate clients reach it via 10.17.11.246:81.

As-executed change set (gate every edit -- `sed -i` exits 0 on zero matches, so grep-assert
the expected line after any mutation):
```bash
# RUN: jumphost -- ship the vault root CA to the proxy
scp ~/vault-init/vault-ca-root.pem jessea123@10.12.4.7:/tmp/
```
```bash
# RUN: operator ON 10.12.4.7 -- install CA, back up + edit the Horizon vhost, validate, restart.
sudo install -o root -g root -m 644 /tmp/vault-ca-root.pem /etc/nginx/vault-ca-root.pem && rm -f /tmp/vault-ca-root.pem
sudo cp -a /etc/nginx/sites-available/openstack "/etc/nginx/sites-available/openstack.bak-$(date -u +%Y%m%dT%H%M%SZ)"
# Set in the Horizon server block (then `grep` to confirm each landed):
#   proxy_pass https://10.12.4.58:443;
#   proxy_ssl_trusted_certificate /etc/nginx/vault-ca-root.pem;
#   proxy_ssl_verify on;
#   proxy_ssl_name juju-ffe3b8-2-lxd-2;   # the dashboard cert's DNS SAN -- per site (discover: openssl s_client -connect 10.12.4.58:443 </dev/null 2>/dev/null | openssl x509 -noout -ext subjectAltName)
#   proxy_redirect https://$http_host/ http://$http_host/;   # unwind the scheme-mismatch redirect loop (Horizon emits absolute https:// on the client Host -> browser then speaks TLS to the :81 plaintext listener)
sudo nginx -t                       # GATE: configuration ok
sudo systemctl restart nginx        # prefer restart over reload for a definitive cutover (a curl ~2s after `reload` can be served by a draining old worker; ~2s blip incl. the co-hosted MAAS proxy)
```
GATE (on the proxy): `curl -sI http://127.0.0.1:81/horizon/` -> 302 to .../auth/login; no TLS errors in error.log.
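
Since `sed -i` reports success even when nothing matched, each directive can be asserted after the edit. A minimal sketch of such a grep-assert; `assert_directive` is a hypothetical helper name, and the regex in the usage note is an illustration built from the directives listed above.

```bash
# assert_directive FILE PATTERN: succeed only if FILE contains a non-comment
# line matching the extended regex PATTERN; complain on stderr otherwise.
assert_directive() {
  local file=$1 pattern=$2
  # Drop comment lines first so a commented-out directive does not count.
  if grep -Ev '^[[:space:]]*#' "$file" | grep -Eq -- "$pattern"; then
    echo "OK: $pattern"
  else
    echo "MISSING in $file: $pattern" >&2
    return 1
  fi
}
```

For example, `assert_directive /etc/nginx/sites-available/openstack 'proxy_pass[[:space:]]+https://10\.12\.4\.58:443;'` before `nginx -t`, repeated once per directive. `nginx -t` only proves the syntax is valid, not that the intended line landed.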

### DOCFIX-030 -- Horizon Secure-cookie override (D-044; PER-REBUILD)
The charm renders `CSRF_COOKIE_SECURE`/`SESSION_COOKIE_SECURE = True` (vault:certificates).
On the plain-HTTP client leg the browser drops the Secure csrftoken and login fails with
"CSRF cookie not set", so following Step 3.3 without this override stalls at the browser login.
Drop an ASCII-only post-load override on the dashboard unit, then graceful-reload apache2:
```bash
# RUN: jumphost -- D-044 cookie override on the dashboard unit (ASCII-only; PER-REBUILD)
juju ssh -m openstack openstack-dashboard/leader -- "printf 'CSRF_COOKIE_SECURE = False\nSESSION_COOKIE_SECURE = False\n' | sudo tee /usr/share/openstack-dashboard/openstack_dashboard/local/local_settings.d/_99_internal_http_cookies.py >/dev/null && sudo systemctl reload apache2" </dev/null
```
Verify the csrftoken Set-Cookie carries NO Secure attribute (over the VIP, vault CA):
```bash
# RUN: jumphost
CK=$(curl -s -o /dev/null -D - --cacert ~/vault-init/vault-ca-root.pem https://10.12.4.58/horizon/auth/login/ | grep -i 'set-cookie:.*csrftoken')
# Branch explicitly: an empty $CK must not fall through to a false "OK".
if [ -z "$CK" ]; then echo "WARN: no csrftoken Set-Cookie on this GET -- confirm via the browser login"
elif printf '%s\n' "$CK" | grep -iq 'secure'; then echo "FAIL: csrftoken still Secure"
else echo "OK: csrftoken not Secure"; fi
```
Then confirm an external browser login over the proxy succeeds.
PER-REBUILD: teardown wipes the unit; reapply each rebuild until edge TLS (the Roosevelt
access/DNS workstream) makes the override unnecessary. The client-facing leg (browser ->
nginx `:81`) stays PLAIN HTTP (as-built); the abandoned upstream-TLS and
self-signed-client-TLS approaches are NOT part of v1 (D-044 rationale). Diagnostic lessons
(reload race, proxy_ssl_name DNS-SAN, sed no-op, scheme-mismatch redirect loop) are in appendix-A.

---

## EXIT GATE (phase-03 complete)
- Cloud settled: acceptance walk shows only the expected block(s) (octavia; maybe gss).
- Every haproxy backend UP/L7OK by the functional sweep (DOCFIX-031), not merely juju active/idle.
- `~/admin-openrc` (0600) authenticates and returns a SCOPED token; endpoint list IP-only.
- Vault root CA at `~/vault-init/vault-ca-root.pem` validates TLS to the keystone VIP.
- Horizon reachable through the repointed reverse proxy AND login works (D-044 cookie override
  applied). Dashboard webroot is the charm default `/horizon` (the root path 404s) -- probe
  `/horizon/auth/login/`. Two VIPs per bundle B1: 10.12.4.58 (provider) + 10.12.8.58 (metal).

## As-built reference (2026-06-03 run -- audit trail)
- Cascade settled ~04:15Z: all five Vault consumers active/idle; only expected
  non-active/idle = octavia (blocked, D-021) + gss (pre-run). mysql primary on
  mysql-innodb-cluster/1 (R/W), /0+/2 R/O (normal innodb-cluster).
- admin-openrc IP-only: OS_AUTH_URL=https://10.12.4.50:5000/v3, OS_USERNAME=admin,
  OS_PROJECT_NAME=admin (scoped; project_id 65ce73e6798e4d1e8dd066609b7033ef),
  domains admin_domain, OS_CACERT=~/vault-init/vault-ca-root.pem.
- Vault root CA: subject "Vault Root Certificate Authority (charm-pki-local)",
  notBefore 2026-06-03, notAfter 2036-05-31; TLS to 10.12.4.50:5000 OK (B5 IP-SAN holds).
- Dashboard VIP 10.12.4.58 (provider) + 10.12.8.58 (metal); nginx upstream repoint + D-044
  cookie override captured in Step 3.3 (as-executed 2026-06-12).
- gss image-stream public endpoint is HTTP on the unit IP, which changed this rebuild
  (10.12.8.196; was .172 in the 06-03 snapshot). This is a rebuild-variable container primary,
  not a defect; it is off the critical path, and the jumphost has no route to the container
  space. Refresh the snapshot per rebuild.
- Exit-gate re-confirm (2026-06-16, read-only, all green): settle (only octavia D-021 + gss
  between-runs); admin-openrc scoped + Nova compute service list up (D-045 holds end-to-end);
  vault-CA -> keystone VIP TLS verify rc 0; haproxy backend sweep zero DOWN cloud-wide.

## Next
phase-04 -- network carve (external provider network).
