diff --git a/docs/DOCFIX-064-phase08-changelist.md b/docs/DOCFIX-064-phase08-changelist.md new file mode 100644 index 0000000..d7224b7 --- /dev/null +++ b/docs/DOCFIX-064-phase08-changelist.md @@ -0,0 +1,81 @@ +# DOCFIX-064 -- phase-08 runbook change-list (DRAFT, 2026-07-01) + +RESERVED number: DOCFIX-064 (per changelog next-free note). This is the accumulated +phase-08 operator-runbook (single-consumer acceptance) sweep. Written as a change-LIST with +exact anchors + evidence so the edit is mechanical when phase-08 is finalized. NOT yet applied +to runbooks/phase-08-workload-cluster-acceptance.md. + +Scope note: these are fixes to the OPERATOR single-consumer acceptance path (capi-test-1 in +capi-mgmt scope). The multi-tenant tenant->cluster flow is a SEPARATE deliverable +(tenant-onboarding-v2-DRAFT.md). Some items overlap (image --public, image-by-UUID, template +ownership scope) because both paths hit them. + +-------------------------------------------------------------------------------- +## Items +-------------------------------------------------------------------------------- + +1. IMAGE SEED MUST create the image `--public` [Step 8.0, image create] + Evidence: a shared/owner-only kube image causes magnum cluster/template create to fail with + `Cluster type (vm, Unset, kubernetes) not supported` -- a non-owner (or the driver acting in + another project) cannot read `os_distro`, so type-derivation returns Unset. Fix: the seed + `openstack image create ... --public` (and re-verify visibility=public post-create). + +2. SEED HARDENING [Step 8.0] + - curl with retry + connect/max timeout (fail loud on partial/hung download). + - sha512 verify against the published manifest is a hard GATE (already present -- keep). + - poll image to `active` as a hard-gate loop (not a fixed sleep). + - POST-active property re-verify: kube_version, os_distro, visibility=public, disk_format. + +3. 
IMAGE-ABSENT PRESENCE GUARD [Step 8.0] + Explicitly branch "image present -> verify props" vs "absent -> seed", so a re-run does not + double-seed and a present-but-wrong-visibility image is caught (ties to item 1). + +4. IMAGE BY UUID, not name [Step 8.0 template create; 8.1] + Evidence: a doubled-quoted image NAME resolved to the literal `'name'` (no image) -> Unset + type -> 400. Passing the resolved UUID removes the quoting/resolution surface. Gate the UUID + with `grep -qE '^[0-9a-f-]{36}$'` before use. + +5. TEMPLATE CREATE -- OWNER PROJECT SCOPE [Step 8.0] + Evidence: `coe cluster template create/show` and `cluster create --cluster-template ` + resolve the template within the CALLER'S project (templates are visible by ownership). + A private template created in capi-mgmt is NOT selectable by name from admin scope (create + 404s while `template list` still shows it). Fix: run the template create AND the cluster + create in the SAME project scope that owns the template (capi-mgmt for the operator path). + Add the capi-mgmt scope preamble (resolve `capi-mgmt` --domain capi dynamically; export + OS_PROJECT_ID) before both. + +6. FLAVOR-FLOOR PRE-CHECK [Step 8.0 template create] + Magnum requires master/node flavors >= 2 vcpu and >= 2048 MB. Pre-check the chosen flavors + against the floor and fail loud, rather than surfacing an opaque driver error later. + +7. OCTAVIA PREREQ -- CAPTURE REAL EXIT [Prerequisites / Step 8.0] + The octavia-healthy probe must capture the actual command result and test it, NOT + `... | head || echo` (which masks failure -- head succeeds on empty input). Same + capture-and-test-result discipline applied across the onboarding v2 blocks. + +8. 8.1 PRE-CHECKS -- D-039 role + keypair [Step 8.1] + Before cluster create, assert (a) the trustor holds member + load-balancer_member (+ reader) + on the cluster project (D-039 -- else CAPO 403s at the Octavia LB step), and (b) the keypair + exists in the creating scope. Fail loud pre-create. + +9. 
POLICYD ZIP PATH UNDER $HOME (snap confinement) [appendix-C section C.3] + Evidence: `juju attach-resource ... /tmp/overrides.zip` failed "no such file or directory" + though the shell saw the file -- the confined juju snap cannot read /tmp. Build the zip under + $HOME. Also: `zip` is absent on the jumphost -- build via python3 zipfile (arcname=top-level). + Fix appendix-C C.3 to use a $HOME path and the python3 zipfile method (currently shows + `zip -j /tmp/overrides.zip`). + +-------------------------------------------------------------------------------- +## Cross-doc corrections (already staged in this package) +-------------------------------------------------------------------------------- +- appendix-C: manager domain-enumeration is own-domain-only on this cloud (2.5d finding); + the cloud-wide names-only leak does NOT manifest. (Applied in appendix-C-identity-rbac.md here.) +- appendix-D: cluster-create trust model; D.7 status updated (Stages 1-4 validated, Stage 6 + create_trust outstanding). Needs committing (was packaged, not yet in repo). + +-------------------------------------------------------------------------------- +## Sequencing +-------------------------------------------------------------------------------- +Apply items 1-8 to phase-08 and item 9 to appendix-C only AFTER Stage 6 (create_trust) is +resolved -- if the multi-tenant trust step surfaces a further phase-08-relevant fix (e.g. a +CONF.trust.roles pin), fold it into the same DOCFIX-064 sweep rather than reopening. diff --git a/runbooks/appendix-C-identity-rbac.md b/runbooks/appendix-C-identity-rbac.md new file mode 100644 index 0000000..e8a1f55 --- /dev/null +++ b/runbooks/appendix-C-identity-rbac.md @@ -0,0 +1,125 @@ +# Appendix C -- Identity / RBAC reference + +Authoritative reference for the cloud's identity model: which keystone roles and personas exist, +what each is for, and which accounts should receive which assignments. 
Governed by D-051 (SCS +Domain Manager persona) as reconciled by D-064 (scs-0302 alignment + create-op templating fix). + +The persona is delivered by a keystone policy override (`policies/domain-manager-policy.yaml`) +attached as a `policyd-override` resource. Provenance: SCS standard scs-0302-w1 (Domain Manager). +This is a TRANSITIONAL policy for pre-2024.2 keystone; on any upgrade to 2024.2+ the override MUST +be removed in favour of the native domain-manager persona (see "Removal" below). + +Do NOT hand-assign roles ad hoc. Provision accounts per the tables below so the model stays +auditable and the tenant isolation guarantees hold. + +--- + +## C.1 Role / persona catalog + +| Role / persona | Keystone rule | What it is for | Typical scope | Notes | +|----------------|---------------|----------------|---------------|-------| +| Admin / admin | `admin_required` (role:Admin); becomes `cloud_admin` when scoped to the admin domain/project | Full cloud authority (as cloud_admin), or domain-scoped admin over a single service domain | admin domain + admin project (cloud_admin); or a service domain (e.g. magnum) | "Admin" and "admin" are the same role (case-insensitive alias). After D-064 an admin-on-a-domain can create users/projects in THAT domain. NOT given to tenants -- an admin role is not safely domain-confinable; tenants get `manager` instead. | +| manager (SCS Domain Manager) | `is_domain_manager` = role:manager, plus the D-051/D-064 override | Domain-confined IAM self-service: create/manage users, projects, groups, and role assignments WITHIN the manager's own domain | a tenant's domain | Requires the policy override attached. Can assign ONLY the roles in `is_domain_managed_role` (below); cannot grant `admin` or `manager` (anti-escalation). | +| member | standard project role | Operate project resources (compute, network, volumes, images) | a project | The baseline tenant working role. 
| +| load-balancer_member | Octavia project role | Create / manage Octavia load balancers | a project | Required for tenant LBs and for Magnum (the cluster apiserver LB). Must be held by the Magnum trustor on capi-mgmt (D-039). | +| reader | standard read role | Read-only visibility | project or domain | Optional. Part of the Magnum trustor role set on capi-mgmt (D-039). | + +Manager-assignable roles (`is_domain_managed_role`): **member** and **load-balancer_member** only. +This is the anti-escalation boundary -- a compromised or careless domain manager cannot mint an +admin or another manager, and cannot reach outside its own domain. + +--- + +## C.2 Account -> role assignment + +| Account | Role(s) | Scope | Why / reference | +|---------|---------|-------|-----------------| +| admin (operator super-admin) | Admin | admin domain + admin project (= cloud_admin) | Cloud operator; full authority. Bootstrap identity. | +| admin (as Magnum trustor) | member + load-balancer_member + reader | capi-mgmt project | So the app-cred Magnum mints per cluster carries Octavia authority for the apiserver LB (D-039). These are the frozen trustor roles delegated into each cluster trust. | +| magnum_domain_admin | Admin | magnum domain | Magnum trustee domain admin; creates the per-cluster trustee USER at cluster-create (D-046; Magnum docs). Works via the D-064 create-op fix -- no extra grant needed. Recreated by the `domain-setup` charm action after every teardown/redeploy (D-046). | +| -domain-admin | manager | the tenant's domain | SCS Domain Manager persona (D-051/D-064). Operator provisions the domain + this one account; the tenant self-services users, projects, and member/load-balancer_member grants from there. | +| human users | member (+ load-balancer_member if they use LBs or Magnum) | the tenant's project(s) | Created and assigned by the tenant's own domain-manager via Horizon/CLI. Operator is not in the loop. 
| +| -ci / service accounts | member + load-balancer_member | the tenant's project | Backing identity for the application credential that CI/automation authenticates with. load-balancer_member so tenant CI can drive Magnum/LBs. | +| per-cluster trustee | (delegated via trust -- not a direct grant) | -- | Magnum mints this at cluster-create and deletes it at cluster-delete. It carries the trustor's frozen roles through the trust (D-039). Never assign roles to it by hand. | + +Provisioning direction: the operator creates a tenant's DOMAIN and its single `manager` account, +then hands off. Everything below the domain (users, projects, member/LB grants) is tenant +self-service. This is the whole point of the persona -- it removes the operator from routine +tenant IAM while keeping a hard domain boundary. + +--- + +## C.3 Attaching the policy override + +Prerequisite: keystone deployed with `use-policyd-override=true` (already set in the bundle). +Note D-050: `use-policyd-override=true` with NO resource attached is a silent no-op -- the zip +must actually be supplied, or the override does nothing while reporting healthy. 
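DOCFIX-064 item 9 records that the confined juju snap could not read /tmp and that `zip` is absent on the jumphost, and proposes building the zip under $HOME with python3 zipfile. A minimal sketch of that packaging step follows. It is illustrative only: the function name `build_override_zip` and the output location are assumptions, not part of the validated procedure.

```python
import zipfile
from pathlib import Path


def build_override_zip(policy_file, out_zip):
    """Package one policy file at the TOP LEVEL of a zip archive.

    Hypothetical helper: mirrors the 'python3 zipfile (arcname=top-level)'
    note in DOCFIX-064 item 9. The caller chooses out_zip; per that item it
    should live under $HOME, not /tmp.
    """
    policy_file = Path(policy_file)
    out_zip = Path(out_zip)
    if not policy_file.is_file():
        raise FileNotFoundError(f"policy file not found: {policy_file}")

    out_zip.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        # arcname drops any directory prefix so the file sits at the zip root
        zf.write(policy_file, arcname=policy_file.name)

    # Re-open and verify: exactly one member, at the top level.
    with zipfile.ZipFile(out_zip) as zf:
        names = zf.namelist()
    if names != [policy_file.name]:
        raise RuntimeError(f"unexpected zip contents: {names}")
    return out_zip
```

A call such as `build_override_zip("policies/domain-manager-policy.yaml", Path.home() / "overrides.zip")` would produce a file that can then be passed to `juju attach-resource`. The re-open step is there so an empty or mis-nested archive fails loud before it is attached.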
+ +Pre-attach re-check on the jumphost (from the repo working copy), then attach: + +``` +# 1) validate the file before attaching (YAML + ASCII + connector lint) +python3 -c 'import yaml,sys; d=yaml.safe_load(open("policies/domain-manager-policy.yaml")); print("YAML OK:",len(d),"rules")' +LC_ALL=C grep -nP "[^\x00-\x7F]" policies/domain-manager-policy.yaml && { echo "NON-ASCII -- STOP"; exit 1; } || echo "ASCII clean" +grep -nE "\)\s+(rule:|role:|token|project_id|domain_id|user_id|None:)" policies/domain-manager-policy.yaml && { echo "MALFORMED connector -- STOP"; exit 1; } || echo "connector lint clean" + +# 2) package and attach (the zip's top-level file name is what keystone reads) +cd policies && zip -j /tmp/overrides.zip domain-manager-policy.yaml && cd - +juju attach-resource -m openstack keystone policyd-override=/tmp/overrides.zip +``` + +Gate: `juju status -m openstack keystone` must move from `PO (broken)` to `PO:` and settle +active/idle, with no other charm disturbed. If the charm rejects the file it stays `PO (broken)`. + +Rollback (immediate): `juju config -m openstack keystone use-policyd-override=false` +(the override stops applying; keystone reverts to shipped defaults on the next hook). + +The charm validates YAML only. It does NOT parse the oslo.policy rule grammar, so a syntactically +malformed rule can pass validation and silently no-op. Always run the oslo.policy parse in the +sandbox before delivering a change to this file. + +--- + +## C.4 Tenant self-service validation (G3) + +Run after the override is active. Confirms the persona works AND is properly bounded. 
+ +PASS cases (a `manager`-on-domain account, scoped to its own domain, must succeed): +- create a user in its own domain +- create a project in its own domain +- grant `member` and `load-balancer_member` to a user on a project in its own domain + +DENY cases (the same account must be refused): +- grant `admin` or `manager` to anyone (anti-escalation) +- create/read/modify anything in a DIFFERENT domain (cross-domain isolation) + +Unaffected: +- cloud_admin (the operator admin) retains full authority everywhere + +Only when all three groups hold is the persona accepted for that release. + +--- + +## C.5 Known limitations (carried from scs-0302) + +- CORRECTION (verified live 2026-07-01): on THIS deployment a domain manager's `domain list` + returns ONLY its own domain -- keystone scope-filters the result even though the D-064 policy + authorizes `list_domains` for a manager. So the cloud-wide domain-name enumeration described + in the SCS worst-case does NOT manifest here; isolation is tighter than the standard warns. + Role names (`list_roles`) may still be broadly visible (the persona needs to resolve role + names); this remains names-only and confers no resource access. The original SCS caveat is + retained below as version/config-dependent, NOT asserted as this cloud's behavior: +- [SCS worst-case, not observed here] A domain manager could enumerate ALL domain names/ids + (`list_domains`) and ALL role names (`list_roles`) cloud-wide. This is names/ids only -- + no access to other domains' resources -- + and is required for the manager to resolve domains/roles by name. It is inherent to the + pre-2024.2 transitional policy; upstream RBAC-scoping of domain listing is a pending fix. +- The persona relies on `enforce_scope=False` (old-style policy). It is a bridge, not the + destination. + +## C.6 Removal (on upgrade to keystone 2024.2+) + +2024.2 ships a native domain-manager persona and secure-RBAC scope enforcement. 
On that upgrade: +detach the `policyd-override` resource (or set `use-policyd-override=false`), adopt the native +persona, and retire this file. Leaving the old-style override in place on a secure-RBAC keystone +is unsupported and will conflict. Tracked by D-051/D-064. diff --git a/runbooks/appendix-D-magnum-trust-model.md b/runbooks/appendix-D-magnum-trust-model.md new file mode 100644 index 0000000..b6d193c --- /dev/null +++ b/runbooks/appendix-D-magnum-trust-model.md @@ -0,0 +1,176 @@ +# Appendix D -- Magnum cluster-create trust model (multi-tenant) + +Fills the gap the onboarding runbook Stage 7 marks [PENDING]: exactly which identity +creates a Magnum cluster, and why the Keystone trust delegation constrains that choice. +Grounded in the magnum source (magnum/common/keystone.py, read live 2026-07-01) and the +D-039 / D-051 / D-064 identity model. Supersedes the single-consumer shortcut used on +2026-06-09 (admin creates in the admin-owned capi-mgmt project), which sidesteps -- rather +than exercises -- the trust constraint and therefore does NOT validate the tenant path. + +-------------------------------------------------------------------------------- +## D.1 What magnum does at cluster-create (the mechanism) +-------------------------------------------------------------------------------- + +Two Keystone writes happen before any infrastructure is touched +(magnum/conductor/handlers/common/trust_manager.py -> create_trustee_and_trust): + +1. create_trustee -> `identity:create_user` + Magnum's trustee_domain_admin (magnum_domain_admin, Admin on the magnum domain) + creates a per-cluster service user in the magnum domain. This is the step D-064 + unblocked (the create_user policy templating fix). VALIDATED live 2026-07-01: + trustee user is created successfully. + +2. create_trust -> `identity:create_trust` + Magnum creates a Keystone trust delegating the CALLER's roles to that trustee. 
+ From magnum/common/keystone.py: + + def create_trust(self, trustee_user): + trustor_user_id = self.session.get_user_id() # the CALLER's user + trustor_project_id = self.session.get_project_id() # the CALLER's project + if CONF.trust.roles: + roles = CONF.trust.roles # (unset on this deploy) + else: + roles = self.context.roles # -> the roles in the CALLER's token + self.client.trusts.create( + trustor_user=trustor_user_id, project=trustor_project_id, + trustee_user=trustee_user, impersonation=True, role_names=roles) + +Two facts follow directly from that code, and they are the whole model: + + A. The TRUSTOR is the identity that issued `openstack coe cluster create` + (`self.session` is the request-context client). The Keystone policy + `identity:create_trust = "user_id:%(trust.trustor_user_id)s"` is therefore + satisfied by construction -- caller == trustor. (So the create_trust 403 is + NOT a trustor-identity policy failure.) + + B. The DELEGATED ROLES are `self.context.roles` -- the roles present in the + CALLER's token on `trustor_project_id`. Keystone's create_trust REFUSES to + delegate any role the trustor does not actually hold on that project + (a trust cannot grant more than the trustor has). `CONF.trust.roles` is unset + here, so magnum delegates the caller's token roles verbatim -- whatever they are. + +-------------------------------------------------------------------------------- +## D.2 Why the 2026-06-09 single-consumer path "worked" (and why we retired it) +-------------------------------------------------------------------------------- + +On 2026-06-09 the cluster was created by ADMIN, scoped to the admin-owned capi-mgmt +project. Admin trivially holds (or cloud-admin-bypasses) every role it delegates to +itself, so create_trust never exercised the delegation constraint. That is a +SINGLE-CONSUMER shortcut: one privileged operator standing in for the tenant. 
It +proves the driver/CAPI plumbing but NOT the multi-tenant identity path, because in +the real product the cluster creator is a TENANT, not the cloud operator. + +The admin-in-capi-mgmt attempt on 2026-07-01 then 403'd at create_trust because that +mixed scope (admin user, capi-mgmt project) is not a clean delegatable-role identity +on capi-mgmt -- and, under D-064, admin scoped to capi-mgmt is a RESTRICTED identity +there (it is not cloud_admin outside the admin domain; `list_role_assignments` 403s +in that scope, confirmed live). It is the wrong identity for the tenant model on two +counts: it is the operator, and its token roles are not the tenant delegatable set. + +-------------------------------------------------------------------------------- +## D.3 The multi-tenant rule (what identity must create the cluster) +-------------------------------------------------------------------------------- + +RULE: a Magnum cluster is created by the TENANT's own project-scoped identity, whose +token carries EXACTLY the delegatable tenant roles -- `member` and +`load-balancer_member` (and `reader` where used) -- and NOT `admin`. + +Rationale, straight from D.1.B: + - The trust delegates `context.roles`. If the creator's token carries `admin`, + magnum tries to delegate `admin` into the trust; Keystone refuses a trust that + grants a role the trustor does not properly hold as a delegatable project grant, + and even if it did, delegating `admin` into a long-lived cluster credential is a + privilege-escalation footgun (the trustee impersonates the trustor with + impersonation=True). The tenant set (member + load-balancer_member) is the + correct, least-privilege delegation. + - `load-balancer_member` MUST be in the creator's token: the magnum-capi-helm + driver provisions an Octavia LB for the apiserver, and the trust must carry + Octavia authority or CAPO 403s at LB reconcile (D-039). 
This is exactly why + D-039 grants the trustor `load-balancer_member` on the cluster project. + - `member` provides the compute/network/volume authority the cluster's CCM/CSI + need via the trust. + +WHO THIS IS, per the onboarding model (tenant-onboarding-runbook Stage 2/4): + - The tenant's SERVICE identity: `-ci` / `-svc`, holding + `member` + `load-balancer_member` on `-prod`, authenticating with its + UNRESTRICTED application credential (the app cred is required so the driver can + mint the per-cluster CAPO child cred -- D-039 / onboarding Stage 4). + - Equivalently a tenant human user with `member` + `load-balancer_member` on the + project, but the service/app-cred identity is the production path (Jenkins/CI). + +The operator (admin / cloud_admin) does NOT create tenant clusters. The capi-mgmt +project is the MANAGEMENT-plane project (where the CAPI mgmt cluster VM and the +operator's own D-039 roles live for the mgmt cluster itself); tenant clusters are +created in the TENANT's project by the TENANT's identity. + +-------------------------------------------------------------------------------- +## D.4 Trustor role-set validation (run before the create) +-------------------------------------------------------------------------------- + +Confirm the creating identity's TOKEN carries the delegatable set and nothing that +cannot be delegated. Run AS the tenant creator identity (app cred or password): + + # as the tenant service identity, project-scoped to -prod + openstack token issue -f value -c user_id -c project_id # confirm scope + # roles in THIS token == what magnum will delegate (context.roles): + openstack role assignment list --user \ + --project --effective --names -f value -c Role | sort + +GATE: the role set is a subset of { member, load-balancer_member, reader }, and +INCLUDES load-balancer_member. If `admin` appears, this is the wrong identity -- +do not create with it. 
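The gate above can be expressed as a small check. This is a minimal sketch, not one of the runbook's validated blocks: it assumes the role names arrive one per line (for example, the `-c Role` output of the command above), and the function name `check_trustor_roles` is hypothetical. Names are compared case-insensitively because appendix-C treats `Admin` and `admin` as the same role.

```python
# Delegatable tenant role set from D.3 / D.4.
ALLOWED_ROLES = frozenset({"member", "load-balancer_member", "reader"})
# Must be present or CAPO 403s at the Octavia LB step (D-039).
REQUIRED_ROLES = frozenset({"load-balancer_member"})


def check_trustor_roles(role_lines):
    """Return a list of problems; an empty list means the gate passes.

    role_lines: iterable of role names, one per entry (blank entries ignored).
    """
    roles = {line.strip().lower() for line in role_lines if line.strip()}
    problems = []

    extra = sorted(roles - ALLOWED_ROLES)
    if extra:
        problems.append("non-delegatable roles in token: " + ", ".join(extra))

    missing = sorted(REQUIRED_ROLES - roles)
    if missing:
        problems.append("required roles missing: " + ", ".join(missing))

    return problems
```

An empty return value corresponds to the GATE passing. Any entry in the list (for instance `admin` present, or `load-balancer_member` absent) means the identity should not be used for the create.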
+ +Note: a tenant/app-cred identity cannot run `role assignment list` for other users +(policy 403, by design). Query only its own assignment, or read it as admin +beforehand during onboarding. + +-------------------------------------------------------------------------------- +## D.5 The create (tenant identity), and the trust it produces +-------------------------------------------------------------------------------- + + # authenticate as the tenant service identity via its app cred (onboarding Stage 4) + # OS_AUTH_TYPE=v3applicationcredential + the app cred id/secret from the 0600 file + # then, project-scoped to the tenant project: + openstack coe cluster create \ + --cluster-template \ + --keypair \ + --master-count 1 --node-count 2 + + # verify the trust was created and carries the tenant roles: + openstack coe cluster show -f value -c status -c trustee_user_id + # status -> CREATE_IN_PROGRESS (past trustee+trust), NOT CREATE_FAILED at ~3s. + +Expected: create_user (D-064) AND create_trust both pass, because the creator is the +trustor and its token roles (member + load-balancer_member) are cleanly delegatable +on the tenant project. The driver then proceeds to helm/CAPI provisioning. + +-------------------------------------------------------------------------------- +## D.6 Recommendation +-------------------------------------------------------------------------------- + + - Cluster-create is a TENANT self-service operation, performed by the tenant's + app-cred identity carrying member + load-balancer_member on the tenant project. + Wire it into the tenant CI (Jenkins) path (onboarding Stage 7), never the + operator admin. + - Optionally pin `CONF.trust.roles = member,load-balancer_member` in magnum.conf + (via the D-037 conf.d mechanism) to make the delegated set EXPLICIT and + independent of whatever roles happen to be in the caller's token -- a hardening + that removes the "wrong token roles" failure mode entirely. 
Decide as a tracked + item; unset (inherit context.roles) is the upstream default and works when the + creator identity is correct. + - The management-plane capi-mgmt project + the operator's D-039 roles there remain + for the MGMT cluster; they are not the tenant cluster-create path. + +-------------------------------------------------------------------------------- +## D.7 Open validation item +-------------------------------------------------------------------------------- + +This appendix establishes the model from the magnum source and the identity design. +The live behavioral confirmation on THIS cloud -- create a cluster as a tenant +app-cred identity (member + load-balancer_member) and observe create_trust succeed -- +is the acceptance step, and folds into onboarding Stage 7 (currently [PENDING]) and +the D-011 gate. Until run, D.3 is design-derived-from-source, not yet live-verified +on the multi-tenant path. (UPDATE 2026-07-01: onboarding Stages 1-4 VALIDATED live as tenant acme -- manager +self-service, app-cred cluster-creator with member+load-balancer_member, tenant L3. Stage 5 +template = corrected-pending (image-by-UUID). Stage 6 create_trust = the outstanding item; +the create_user half (D-064) is confirmed live.) diff --git a/runbooks/tenant-onboarding-v2-DRAFT.md b/runbooks/tenant-onboarding-v2-DRAFT.md new file mode 100644 index 0000000..cdda314 --- /dev/null +++ b/runbooks/tenant-onboarding-v2-DRAFT.md @@ -0,0 +1,451 @@ +# Tenant Onboarding v2 -- multi-tenant self-service + cluster creation (DRAFT) + +STATUS: DRAFT, built 2026-07-01 from the live multi-tenant validation run (tenant `acme`). +SUPERSEDES the 2026-06-22 tenant-onboarding-runbook.md identity model: that draft used an +`admin`-on-domain tenant administrator (which needed an out-of-band bandaid and could not +self-service reliably). This v2 uses the SCS Domain Manager persona (`manager` role) per +D-051 / D-064, validated end-to-end via CLI this session. 
+ +VALIDATION LEGEND (honesty markers -- do not finalize past what is proven): + [VALIDATED 2026-07-01] confirmed live this session (output captured) + [CORRECTED-PENDING] a failure was root-caused and the corrected block is staged, but + the corrected form has NOT been re-run yet + [PROCEDURE-PENDING] design-derived (appendix-D from magnum source); NOT yet live-verified + +DESIGN REFS: D-051 (Domain Manager persona), D-064 (policy reconciliation to scs-0302 + +create-op templating fix), D-039 (per-cluster app-cred carries load-balancer_member), +appendix-C (identity/RBAC reference), appendix-D (Magnum cluster-create trust model). + +-------------------------------------------------------------------------------- +## Validation status (this draft) +-------------------------------------------------------------------------------- + +| Stage | What | Status | +|-------|--------------------------------------------------|--------| +| 0 | Operator pre-flight (perimeters exist) | [VALIDATED 2026-07-01] | +| 1 | Operator provisions domain + manager + quotas | [VALIDATED 2026-07-01] | +| 2 | Manager self-services project + svc-user + grants| [VALIDATED 2026-07-01] (D-064 G3) | +| 2.5 | Tenant isolation (anti-escalation + cross-domain)| [VALIDATED 2026-07-01] (+ finding, below) | +| 3 | Service user mints app cred + keypair | [VALIDATED 2026-07-01] | +| 4 | Tenant builds L3 (net/subnet/router/ext-gw) | [VALIDATED 2026-07-01] | +| 5 | Tenant creates its own cluster template | [CORRECTED-PENDING] (image-by-UUID fix staged) | +| 6/7 | Tenant creates cluster (trustee + trust) | [PROCEDURE-PENDING] (appendix-D; trust step not yet confirmed) | + +FINDING (Stage 2.5d, 2026-07-01): on THIS deployment the manager's `domain list` returns +ONLY its own domain -- keystone scope-filters the result even though the D-064 policy +authorizes `list_domains` for a manager. 
So the "manager can enumerate all domain names +cloud-wide" limitation documented in appendix-C / D-064 does NOT manifest here; isolation is +TIGHTER than documented. appendix-C must be corrected to state the observed own-domain-only +behavior (see the appendix-C change in this package). + +-------------------------------------------------------------------------------- +## Model (operator provides vs. tenant self-services) -- v2 +-------------------------------------------------------------------------------- + +OPERATOR PROVIDES (minimal): + - A Keystone DOMAIN per client. + - ONE domain-admin account holding the `manager` role on that domain (SCS Domain Manager). + - Domain/project quotas (the envelope). + - The shared perimeters: public flavor catalog, public Magnum-ready image, external network + (provider-ext) + FIP pool, the Vault root CA. + +TENANT SELF-SERVICES (via the manager, then a manager-created service identity): + - Their own projects, users, and role assignments WITHIN their domain (member + + load-balancer_member only -- `manager` cannot grant admin or manager; anti-escalation). + - Their own application credential (the cluster-creator identity). + - Their own network / subnet / router / external gateway. + - Their own Magnum cluster template (visible by OWNERSHIP -- created in their project). + - Their own clusters. + +TRUST BOUNDARY: the operator never holds the tenant's working credentials. The manager owns +its domain's users (including resetting the service user's password). Admin never mutates +tenant resources. + +CONVENTIONS (carried; all exercised this session): + - env-clean before every identity switch: `for v in $(env|awk -F= '/^OS_/{print $1}'); do unset "$v"; done` + - OS_CACERT MUST be threaded into every session (Vault-issued keystone cert). A stripped CA + yields an opaque SSL error -- the Stage 2 pre-auth guard fails loud instead. + - Subshell-isolate identity switches `( ... )` so the operator admin shell is untouched. 
+ - Dynamic resolution only -- never hardcode ids (domain/project/image regenerate per rebuild). + - Verify-before-mutate; capture real command output and test the RESULT (not `head||echo`). + - Secrets straight to 0600 files under $HOME (snap confinement: never /tmp). + - ASCII-only committed files; LF endings. + +AS-RUN REFERENCE (tenant `acme`, 2026-07-01 -- ids are per-run, shown for traceability only): + domain acme=7b65248e33e041c78793b7d0939ef631; project acme-prod=780fa2f0761541ba8bc283c346b6af4d; + svc-user acme-svc=af73b67aa8b24b07904f4d463c1528b2; app-cred=930b0b027b0e465f89f04ef53c4db18c + (unrestricted, 86-char secret); net=193a8915..., subnet 10.20.24.0/24, router SNAT 10.12.7.194. + +================================================================================ +## STAGE 0 -- Operator pre-flight (READ-ONLY) [VALIDATED 2026-07-01] +================================================================================ +Confirms the shared perimeters exist and the D-064 override is live BEFORE onboarding. 
+ +--- BEGIN block: onboard-v2-00-preflight (RUN -- jumphost, admin) --- +for v in $(env|awk -F= '/^OS_/{print $1}'); do unset "$v"; done +source ~/admin-openrc +CLIENT=; fail=0 + +# D-064 override MUST be live (manager self-service depends on it) +juju status keystone -m openstack --format=yaml 2>/dev/null | python3 -c ' +import sys,yaml; m=yaml.safe_load(sys.stdin)["applications"]["keystone"]["units"]["keystone/0"].get("workload-status",{}).get("message","") +print("keystone:",m); sys.exit(0 if m.startswith("PO:") else 1)' || { echo "*** keystone not PO: active -- STOP ***"; fail=1; } + +# roles present +for R in manager member load-balancer_member reader; do + openstack role show "$R" -f value -c id </dev/null 2>&1 | grep -qE '^[0-9a-f]{32}$' \ + && echo "role $R ok" || { echo "*** role $R MISSING ***"; fail=1; } +done + +# public Magnum-ready image (kube_version + os_distro + public) +IMG=$(openstack image list --public -f value -c ID -c Name </dev/null 2>&1 | awk '/kube/{print $1;exit}') +openstack image show "$IMG" -f json </dev/null 2>&1 | python3 -c ' +import sys,json; d=json.load(sys.stdin); p=d.get("properties",{}) +ok=d.get("visibility")=="public" and p.get("kube_version") and p.get("os_distro") +print("image:",d.get("name"),d.get("visibility"),p.get("kube_version"),p.get("os_distro")) +sys.exit(0 if ok else 1)' || { echo "*** kube image not public / missing props -- STOP ***"; fail=1; } + +# a Magnum-capable public flavor (>=2 vcpu, >=2048 MB) +openstack flavor list --public -f json </dev/null 2>&1 | python3 -c ' +import sys,json; fs=json.load(sys.stdin) +ok=[f for f in fs if (f.get("VCPUs") or 0)>=2 and (f.get("RAM") or 0)>=2048] +print("magnum-capable flavors:",[f["Name"] for f in ok]); sys.exit(0 if ok else 1)' || fail=1 + +# external net + FIP capacity (note: ip availability columns print alphabetically -- total then used) +openstack network show provider-ext -f value -c id </dev/null 2>&1 | grep -qE '^[0-9a-f-]{36}$' \ + && echo "provider-ext ok" || { echo "*** provider-ext missing ***"; fail=1; } + +# clean 
slate for this client +openstack domain show "$CLIENT" -f value -c id &1 | grep -qE '^[0-9a-f]{32}$' \ + && { echo "*** $CLIENT domain EXISTS -- decide reuse/clean ***"; fail=1; } || echo "no $CLIENT domain (clean)" + +echo "=== PRE-FLIGHT $([ $fail -eq 0 ] && echo PASS || echo FAIL) ===" +--- END block --- +GATE: PRE-FLIGHT PASS. + +================================================================================ +## STAGE 1 -- Operator provisions domain + manager + quotas [VALIDATED 2026-07-01] +================================================================================ +The ONLY operator-side tenant provisioning. Everything after is tenant self-service. + +--- BEGIN block: onboard-v2-01-operator-domain (RUN -- jumphost, admin) --- +for v in $(env|awk -F= '/^OS_/{print $1}'); do unset "$v"; done +source ~/admin-openrc +CLIENT=; fail=0 + +openstack domain create --description "Client: ${CLIENT}" "$CLIENT" /dev/null 2>&1 +DOM=$(openstack domain show "$CLIENT" -f value -c id &1) +echo "$DOM" | grep -qE '^[0-9a-f]{32}$' && echo "domain $CLIENT id=$DOM" || { echo "*** domain FAIL ***"; fail=1; } + +# manager account (NOT admin -- admin is not safely domain-confinable; the persona is `manager`) +MPW=$(python3 -c 'import secrets;print(secrets.token_urlsafe(24))') +openstack user create --domain "$DOM" --password "$MPW" \ + --description "${CLIENT} domain manager (SCS Domain Manager; D-051/D-064)" \ + "${CLIENT}-domain-admin" /dev/null 2>&1 +MUID=$(openstack user show "${CLIENT}-domain-admin" --domain "$DOM" -f value -c id &1) +echo "$MUID" | grep -qE '^[0-9a-f]{32}$' && echo "manager user id=$MUID" || { echo "*** user FAIL ***"; fail=1; } + +openstack role add --domain "$DOM" --user "$MUID" manager &1 +# confine check: EXACTLY one assignment (manager on this domain), nothing else +echo "manager assignments (expect exactly: manager on $CLIENT):" +openstack role assignment list --user "$MUID" --names -f value &1 | sed 's/^/ /' + +# stash the manager credential (tenant 
handoff) -> 0600 file +MF="$HOME/${CLIENT}-domain-admin-cred.txt"; umask 077; : > "$MF"; chmod 600 "$MF" +printf 'domain=%s\ndomain_id=%s\nusername=%s-domain-admin\nuser_id=%s\npassword=%s\nauth_url=https://:5000/v3\n' \ + "$CLIENT" "$DOM" "$CLIENT" "$MUID" "$MPW" > "$MF"; chmod 600 "$MF"; unset MPW +echo "credential -> $MF" + +# quotas (the envelope) -- span nova/neutron/cinder; set explicitly (documents the record) +openstack quota set "${CLIENT}-prod" --instances 10 --cores 20 --ram 51200 /dev/null || true +# NOTE: run quota AFTER the manager creates the project (Stage 2), OR set on the project once it exists. +echo "=== STAGE 1 $([ $fail -eq 0 ] && echo PASS || echo FAIL) ===" +--- END block --- +GATE: STAGE 1 PASS -- domain created, manager holds EXACTLY `manager` on the domain (no +other assignment), credential stashed 0600. + +DELIVER TO CLIENT: Horizon URL, domain name, `-domain-admin` + password, Vault CA. +Nothing else. (Quotas: set on `-prod` after Stage 2 creates it, or pre-create the +project as operator; the 2026-07-01 run set quotas post-project.) + +================================================================================ +## STAGE 2 -- Manager self-service (project + service user + grants) [VALIDATED 2026-07-01] +================================================================================ +This is the D-064 G3 acceptance: the manager performs, VIA CLI, exactly the identity +operations the pre-D-064 policy rejected (create_project / create_user / create_grant), and +is correctly DENIED admin-grant and cross-domain access. 
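
Both the Stage 1 confinement gate (EXACTLY `manager`) and the Stage 2 admin-grant denial are judged from the resulting role assignments, not from error text. A minimal Python sketch of that ground-truth test follows. It assumes rows shaped like the JSON form of `openstack role assignment list --names -f json` with a `Role` key per row; that key name and the sample rows are assumptions for illustration, so check them against real output on the target cloud.

```python
def held_roles(rows):
    """Set of role names present in the assignment rows."""
    return {row["Role"] for row in rows}

def manager_confined(rows):
    """True only if there is exactly one assignment and it is `manager`."""
    return len(rows) == 1 and held_roles(rows) == {"manager"}

def escalated(rows):
    """True if an admin assignment actually exists (ground truth)."""
    return "admin" in held_roles(rows)

# Illustrative rows only (hypothetical shape, not captured output).
manager_rows = [{"Role": "manager", "User": "acme-domain-admin@acme", "Domain": "acme"}]
svc_rows = [
    {"Role": "member", "User": "acme-svc@acme", "Project": "acme-prod@acme"},
    {"Role": "load-balancer_member", "User": "acme-svc@acme", "Project": "acme-prod@acme"},
]

print(manager_confined(manager_rows), escalated(svc_rows))
```

The point of the design is that a denied `role add` and a silently successful one are distinguished by what the assignment list contains afterwards, whatever the CLI printed.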
+
+--- BEGIN block: onboard-v2-02-manager-selfservice (RUN -- jumphost, AS manager) ---
+CLIENT=
+MF="$HOME/${CLIENT}-domain-admin-cred.txt"
+CA="$HOME/vault-init/vault-ca-root.pem" # confirm current path from admin-openrc
+[ -s "$MF" ] || { echo "missing $MF -- run Stage 1"; return 2>/dev/null||exit 1; }
+
+( for v in $(env|awk -F= '/^OS_/{print $1}'); do unset "$v"; done
+  DOM=$(awk -F= '/^domain_id=/{print $2}' "$MF")
+  export OS_AUTH_URL="https://:5000/v3" OS_IDENTITY_API_VERSION=3 OS_CACERT="$CA"
+  export OS_USERNAME="${CLIENT}-domain-admin" OS_USER_DOMAIN_ID="$DOM" OS_DOMAIN_ID="$DOM"
+  export OS_PASSWORD="$(awk -F= '/^password=/{print $2}' "$MF")"
+  fail=0
+
+  # HARDENING GUARD: TLS trust material present before ANY call (fails loud, not opaque SSL)
+  [ -n "$OS_CACERT" ] && [ -s "$OS_CACERT" ] || { echo "*** OS_CACERT unset/missing -- STOP ***"; exit 3; }
+  openssl x509 -in "$OS_CACERT" -noout -checkend 0 >/dev/null 2>&1 || { echo "*** CA expired/unreadable -- STOP ***"; exit 3; }
+
+  scope=$(openstack token issue -f value -c domain_id </dev/null 2>&1)
+  [ "$scope" = "$DOM" ] && echo "manager authenticated, domain-scoped" || { echo "*** auth FAIL: $scope ***"; exit 1; }
+
+  # 2.1 create_project (PASS)
+  openstack project create --domain "$DOM" --description "${CLIENT} production" "${CLIENT}-prod" </dev/null >/dev/null 2>&1
+  PID=$(openstack project show "${CLIENT}-prod" --domain "$DOM" -f value -c id </dev/null 2>&1)
+  echo "$PID" | grep -qE '^[0-9a-f]{32}$' && echo "project ${CLIENT}-prod=$PID (create_project PASS)" || { echo "*** create_project FAIL ***"; fail=1; }
+
+  # 2.2 create_user (service account) (PASS)
+  SPW=$(python3 -c 'import secrets;print(secrets.token_urlsafe(24))')
+  openstack user create --domain "$DOM" --password "$SPW" --description "${CLIENT} CI/service (cluster creator)" "${CLIENT}-svc" </dev/null >/dev/null 2>&1
+  SUID=$(openstack user show "${CLIENT}-svc" --domain "$DOM" -f value -c id </dev/null 2>&1)
+  echo "$SUID" | grep -qE '^[0-9a-f]{32}$' && echo "user ${CLIENT}-svc=$SUID (create_user PASS)" || { echo "*** create_user FAIL ***"; fail=1; }
+
+  # 2.3 grant member + load-balancer_member on the project (create_grant) -- capture RESULT, not exit
+  for R in member load-balancer_member; do
+    openstack role add --project "$PID" --user "$SUID" "$R" </dev/null 2>&1
+    openstack role assignment list --project "$PID" --user "$SUID" --names -f value -c Role </dev/null 2>&1 | grep -qw "$R" \
+      && echo "granted $R" || { echo "*** grant $R FAIL ***"; fail=1; }
+  done
+
+  # 2.4 anti-escalation: admin grant MUST be denied (verify it did NOT take -- ground truth, not error text)
+  openstack role add --project "$PID" --user "$SUID" admin </dev/null >/dev/null 2>&1
+  if openstack role assignment list --project "$PID" --user "$SUID" --names -f value -c Role </dev/null 2>&1 | grep -qw admin; then
+    echo "*** ESCALATION: manager granted admin -- POLICY FAILURE, STOP ***"; fail=1
+  else echo "admin grant DENIED (anti-escalation holds)"; fi
+
+  echo "=== STAGE 2 $([ $fail -eq 0 ] && echo PASS || echo FAIL) ==="
+  unset SPW OS_PASSWORD
+)
+--- END block ---
+GATE: STAGE 2 PASS -- 2.1/2.2/2.3 succeed, 2.4 DENIED.
+
+NOTE (service-user password): the Stage-2 password is ephemeral inside the subshell. Stage 3
+resets it (as the MANAGER) to a fresh 0600-stashed value -- the manager owns its domain's users.
+
+--- BEGIN block: onboard-v2-02b-isolation (CHECK read-only -- AS manager) ---
+# Cross-domain isolation. Resolve REAL other-domain targets AS ADMIN first (prove they exist),
+# then confirm the manager is refused. Keystone returns "does not exist" for cross-scope reads
+# (no-enumeration-oracle design) -- treat does-not-exist on a proven-real resource as isolation-holding.
+source ~/admin-openrc
+OTHER_DOM=$(openstack domain show admin_domain -f value -c id </dev/null 2>&1)
+OTHER_USER=$(openstack user list --domain "$OTHER_DOM" -f value -c ID </dev/null 2>&1 | head -1)
+OTHER_PROJ=$(openstack project show admin --domain admin_domain -f value -c id </dev/null 2>&1)
+( for v in $(env|awk -F= '/^OS_/{print $1}'); do unset "$v"; done
+  DOM=$(awk -F= '/^domain_id=/{print $2}' "$MF")
+  export OS_AUTH_URL="https://:5000/v3" OS_IDENTITY_API_VERSION=3 OS_CACERT="$CA"
+  export OS_USERNAME="${CLIENT}-domain-admin" OS_USER_DOMAIN_ID="$DOM" OS_DOMAIN_ID="$DOM"
+  export OS_PASSWORD="$(awk -F= '/^password=/{print $2}' "$MF")"
+  refused(){ echo "$1" | grep -qiE 'forbidden|not authorized|403|could not be found|does not exist|No .* exists|HTTP 40[34]'; }
+  out=$(openstack user show "$OTHER_USER" </dev/null 2>&1); refused "$out" && echo "cross-domain user read DENIED/hidden" || echo "*** GAP ***"
+  out=$(openstack project show "$OTHER_PROJ" </dev/null 2>&1); refused "$out" && echo "admin project read DENIED/hidden" || echo "*** GAP ***"
+  echo "manager domain list (observed own-domain-only on this cloud, 2026-07-01):"
+  openstack domain list -f value -c Name </dev/null 2>&1 | sed 's/^/ /' )
+--- END block ---
+GATE: both cross-domain reads DENIED/hidden; domain list shows only the client's own domain
+(the appendix-C names-only-leak does NOT manifest here -- tighter isolation; see appendix-C fix).
+
+================================================================================
+## STAGE 3 -- Service user mints app cred + keypair (cluster-creator identity) [VALIDATED 2026-07-01]
+================================================================================
+`-svc` is the cluster creator. Its token roles are EXACTLY member + load-balancer_member
+(from Stage 2.3) -- the clean delegatable set the trust needs (appendix-D). App cred MUST be
+unrestricted (the driver mints a per-cluster CAPO child cred; D-039). Secrets -> 0600 files.
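
Stage 3 stashes the app cred as `-f shell` output (lines of `key="value"`) and then gates on `unrestricted`, `project_id`, and a MEASURED secret length. A minimal Python sketch of reading such a file is below. `parse_shell_vars` is a hypothetical helper and the sample text is invented for illustration (the secret is a dummy, not a real credential).

```python
import shlex

def parse_shell_vars(text):
    """Parse key="value" lines (shell-formatter style) into a dict."""
    out = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)          # removes the surrounding quotes
        out[key] = parts[0] if parts else ""
    return out

# Invented sample in the key="value" shape; the secret is a dummy value.
sample = '''id="930b0b027b0e465f89f04ef53c4db18c"
project_id="780fa2f0761541ba8bc283c346b6af4d"
secret="s3cr3t-not-real"
unrestricted="True"
'''

cred = parse_shell_vars(sample)
# Report the measured length only; never print the secret itself.
print("unrestricted:", cred["unrestricted"], "secret length:", len(cred["secret"]))
```

Measuring rather than asserting the length matches the gate's wording: the 86-character figure is an observation from one cloud, not a contract.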
+
+--- BEGIN block: onboard-v2-03-appcred-keypair (RUN -- jumphost) ---
+CLIENT=; CA="$HOME/vault-init/vault-ca-root.pem"
+MF="$HOME/${CLIENT}-domain-admin-cred.txt"; SF="$HOME/${CLIENT}-svc-cred.txt"; ACF="$HOME/${CLIENT}-svc-appcred.txt"
+source ~/admin-openrc
+DOM=$(openstack domain show "$CLIENT" -f value -c id </dev/null 2>&1)
+PID=$(openstack project show "${CLIENT}-prod" --domain "$DOM" -f value -c id </dev/null 2>&1)
+SUID=$(openstack user show "${CLIENT}-svc" --domain "$DOM" -f value -c id </dev/null 2>&1)
+
+# 3.1 manager sets the svc password -> 0600 (manager owns its domain's users; admin does NOT)
+SPW=$(python3 -c 'import secrets;print(secrets.token_urlsafe(24))')
+( for v in $(env|awk -F= '/^OS_/{print $1}'); do unset "$v"; done
+  export OS_AUTH_URL="https://:5000/v3" OS_IDENTITY_API_VERSION=3 OS_CACERT="$CA"
+  export OS_USERNAME="${CLIENT}-domain-admin" OS_USER_DOMAIN_ID="$DOM" OS_DOMAIN_ID="$DOM"
+  export OS_PASSWORD="$(awk -F= '/^password=/{print $2}' "$MF")"
+  openstack user set --password "$SPW" "$SUID" </dev/null 2>&1 && echo "svc password set" || echo "*** set FAIL ***" )
+umask 077; : > "$SF"; chmod 600 "$SF"
+printf 'username=%s-svc\nuser_id=%s\nuser_domain_id=%s\nproject_id=%s\nauth_url=https://:5000/v3\npassword=%s\n' \
+  "$CLIENT" "$SUID" "$DOM" "$PID" "$SPW" > "$SF"; chmod 600 "$SF"; unset SPW
+
+# 3.2 svc self-mints UNRESTRICTED app cred (project-scoped)
+( for v in $(env|awk -F= '/^OS_/{print $1}'); do unset "$v"; done
+  export OS_AUTH_URL="https://:5000/v3" OS_IDENTITY_API_VERSION=3 OS_CACERT="$CA"
+  export OS_USERNAME="${CLIENT}-svc" OS_USER_DOMAIN_ID="$DOM" OS_PROJECT_ID="$PID"
+  export OS_PASSWORD="$(awk -F= '/^password=/{print $2}' "$SF")"
+  chk=$(openstack token issue -f value -c project_id </dev/null 2>&1)
+  [ "$chk" = "$PID" ] && echo "svc authenticated, project-scoped" || { echo "*** svc auth FAIL: $chk ***"; exit 1; }
+  umask 077; : > "$ACF"; chmod 600 "$ACF"
+  openstack application credential create "${CLIENT}-cluster-cred" --unrestricted \
+    --description "${CLIENT} cluster-creator" -f shell </dev/null >"$ACF" 2>&1
+  grep -qE '^id=' "$ACF" && { chmod 600 "$ACF"; echo "app cred minted -> $ACF"; \
+    awk -F'"' '/^secret=/{print " secret length (measured): "length($2)}' "$ACF"; \
+    grep -E '^unrestricted=|^project_id=' "$ACF" | sed 's/^/ /'; } || { echo "*** appcred FAIL ***"; cat "$ACF"; } )
+
+# 3.3 svc creates keypair -> 0600
+( for v in $(env|awk -F= '/^OS_/{print $1}'); do unset "$v"; done
+  export OS_AUTH_URL="https://:5000/v3" OS_IDENTITY_API_VERSION=3 OS_CACERT="$CA"
+  export OS_USERNAME="${CLIENT}-svc" OS_USER_DOMAIN_ID="$DOM" OS_PROJECT_ID="$PID"
+  export OS_PASSWORD="$(awk -F= '/^password=/{print $2}' "$SF")"
+  KF="$HOME/${CLIENT}-key.pem"; umask 077
+  openstack keypair create "${CLIENT}-key" </dev/null >"$KF" 2>&1
+  head -1 "$KF" | grep -q 'PRIVATE KEY' && { chmod 600 "$KF"; echo "keypair -> $KF"; } || { echo "*** keypair FAIL ***"; cat "$KF"; } )
+--- END block ---
+GATE: app cred unrestricted, project_id = -prod, secret length measured (86 on this
+cloud -- do NOT assert; measure); keypair present, 0600.
+
+================================================================================
+## STAGE 4 -- Tenant builds L3 (net/subnet/router/ext-gw) [VALIDATED 2026-07-01]
+================================================================================
+The tenant (app-cred identity) self-serves its own L3, INCLUDING the external gateway.
+FINDING (2026-07-01): a non-admin app-cred identity CAN set the external gateway on this cloud
+(confirms the onboarding Stage-5 finding for the automation identity, not just a Horizon human).
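
The Stage 4 collision pre-check (4.0) matches the candidate CIDR as a whole word against the existing subnet list, which catches an identical range but not a partially overlapping one (for example a /24 inside an existing /16). A minimal Python sketch of an overlap-aware check, using the standard `ipaddress` module, is below. `colliding` is a hypothetical helper, the ranges are illustrative, and the sketch assumes all inputs are IPv4.

```python
import ipaddress

def colliding(candidate, existing):
    """Return the existing CIDRs that overlap the candidate (IPv4 assumed)."""
    cand = ipaddress.ip_network(candidate)
    return [c for c in existing if cand.overlaps(ipaddress.ip_network(c))]

# Illustrative ranges only.
in_use = ["10.20.0.0/16", "10.12.7.0/24"]
print(colliding("10.20.24.0/24", in_use))   # contained in the /16
print(colliding("10.30.1.0/24", in_use))    # no overlap
```

An exact-string match would report `10.20.24.0/24` as free against `10.20.0.0/16`; whether that matters depends on how the tenant pool is carved up, which this draft does not state.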
+
+--- BEGIN block: onboard-v2-04-network (RUN -- jumphost; L3 as app cred, checks as admin) ---
+CLIENT=; CA="$HOME/vault-init/vault-ca-root.pem"; ACF="$HOME/${CLIENT}-svc-appcred.txt"
+TENANT_CIDR=10.20..0/24 # pick from the tenant pool; MUST NOT collide (checked below)
+
+# 4.0 CIDR collision pre-check (operator IPAM concern; read-only as admin)
+source ~/admin-openrc
+if openstack subnet list -f value -c Subnet </dev/null 2>&1 | grep -qw "$TENANT_CIDR"; then
+  echo "*** $TENANT_CIDR IN USE -- pick another, STOP ***"; COLL=1
+else echo "$TENANT_CIDR free"; COLL=0; fi
+
+# 4.1-4.5 build L3 as the app-cred identity
+[ "$COLL" = 0 ] && ( for v in $(env|awk -F= '/^OS_/{print $1}'); do unset "$v"; done
+  export OS_AUTH_TYPE=v3applicationcredential OS_AUTH_URL="https://:5000/v3" OS_IDENTITY_API_VERSION=3 OS_CACERT="$CA"
+  export OS_APPLICATION_CREDENTIAL_ID=$(awk -F'"' '/^id=/{print $2}' "$ACF")
+  export OS_APPLICATION_CREDENTIAL_SECRET=$(awk -F'"' '/^secret=/{print $2}' "$ACF")
+  openstack token issue -f value -c project_id </dev/null 2>&1 | grep -qE '^[0-9a-f]{32}$' || { echo "*** app-cred auth FAIL ***"; exit 1; }
+  openstack network create "${CLIENT}-net" </dev/null >/dev/null 2>&1 && echo "net ok"
+  openstack subnet create "${CLIENT}-subnet" --network "${CLIENT}-net" --subnet-range "$TENANT_CIDR" --dns-nameserver 8.8.8.8 </dev/null >/dev/null 2>&1 && echo "subnet ok"
+  openstack router create "${CLIENT}-router" </dev/null >/dev/null 2>&1 && echo "router ok"
+  openstack router set "${CLIENT}-router" --external-gateway provider-ext </dev/null 2>&1 && echo "ext-gw set" || echo "*** ext-gw FAIL (operator may need to attach) ***"
+  openstack router add subnet "${CLIENT}-router" "${CLIENT}-subnet" </dev/null 2>&1 && echo "interface added" )
+
+# 4.6 verify SNAT (proof egress) -- read-only as admin
+source ~/admin-openrc
+P=$(openstack project show "${CLIENT}-prod" --domain "$CLIENT" -f value -c id </dev/null 2>&1)
+RID=$(openstack router list --project "$P" -f value -c ID </dev/null 2>&1 | head -1)
+openstack router show "$RID" -f json </dev/null 2>&1 | python3 -c '
+import sys,json; d=json.load(sys.stdin); g=d.get("external_gateway_info") or {}
+print("router",d.get("name"),d.get("status"),"snat",g.get("enable_snat"),
+      "snat_ip",(g.get("external_fixed_ips") or [{}])[0].get("ip_address","none"))'
+--- END block ---
+GATE: router ACTIVE, snat=True, snat_ip allocated from provider-ext.
+
+================================================================================
+## STAGE 5 -- Tenant creates its OWN cluster template [CORRECTED-PENDING]
+================================================================================
+Templates are visible by OWNERSHIP -- the tenant creates its own in its project (it cannot use
+another project's private template). IMAGE PASSED BY UUID (not name): a name is subject to a
+quoting/resolution hazard -- the first Stage-5 attempt on 2026-07-01 failed with
+`Cluster type (vm, Unset, kubernetes) not supported` because a doubly quoted name resolved to
+no image, so magnum could not derive the type. UUID removes the failure surface entirely.
+STATUS: the corrected (UUID) block below is staged but was NOT re-run before this draft.
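
The UUID gate used in this runbook, `grep -qE '^[0-9a-f-]{36}$'`, checks length and alphabet only, so a malformed string such as 36 dashes would pass it. A minimal Python sketch of a stricter shape check with the standard `uuid` module is below. `is_uuid` is a hypothetical helper and the sample UUID is a generic example value, not an id from this cloud.

```python
import re
import uuid

LOOSE = re.compile(r"[0-9a-f-]{36}")   # same shape as the runbook's grep gate

def is_uuid(value):
    """True only for a canonical lowercase 8-4-4-4-12 UUID string."""
    try:
        return str(uuid.UUID(value)) == value
    except (ValueError, AttributeError, TypeError):
        return False

sample = "123e4567-e89b-12d3-a456-426614174000"   # generic example value
dashes = "-" * 36
quoted_name = "'name'"                             # the doubly quoted failure case

print(is_uuid(sample), is_uuid(dashes), is_uuid(quoted_name))
```

Either gate rejects the literal `'name'` that caused the Stage 5 failure; the stricter form only matters for strings that happen to be 36 characters of hex and dashes in the wrong layout.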
+
+--- BEGIN block: onboard-v2-05-template (RUN -- jumphost; template as app cred) ---
+CLIENT=; CA="$HOME/vault-init/vault-ca-root.pem"; ACF="$HOME/${CLIENT}-svc-appcred.txt"
+source ~/admin-openrc
+IMG_ID=$(openstack image list --public -f value -c ID -c Name </dev/null 2>&1 | awk '/kube/{print $1;exit}')
+# actually stop on a bad UUID (same guard idiom as Stage 2), instead of only printing STOP
+echo "$IMG_ID" | grep -qE '^[0-9a-f-]{36}$' || { echo "*** image uuid resolve FAIL -- STOP ***"; return 2>/dev/null||exit 1; }
+
+( for v in $(env|awk -F= '/^OS_/{print $1}'); do unset "$v"; done
+  export OS_AUTH_TYPE=v3applicationcredential OS_AUTH_URL="https://:5000/v3" OS_IDENTITY_API_VERSION=3 OS_CACERT="$CA"
+  export OS_APPLICATION_CREDENTIAL_ID=$(awk -F'"' '/^id=/{print $2}' "$ACF")
+  export OS_APPLICATION_CREDENTIAL_SECRET=$(awk -F'"' '/^secret=/{print $2}' "$ACF")
+  openstack token issue -f value -c project_id </dev/null 2>&1 | grep -qE '^[0-9a-f]{32}$' || { echo "*** auth FAIL ***"; exit 1; }
+  # idempotent pre-clean
+  openstack coe cluster template show "${CLIENT}-k8s" -f value -c uuid </dev/null >/dev/null 2>&1 \
+    && openstack coe cluster template delete "${CLIENT}-k8s" </dev/null 2>&1
+  openstack coe cluster template create "${CLIENT}-k8s" \
+    --image "$IMG_ID" \
+    --external-network provider-ext \
+    --master-flavor gp.mid --flavor capi.node \
+    --coe kubernetes --network-driver calico --docker-storage-driver overlay2 \
+    --master-lb-enabled --floating-ip-enabled \
+    --fixed-network "${CLIENT}-net" --fixed-subnet "${CLIENT}-subnet" \
+    --keypair "${CLIENT}-key" </dev/null 2>&1
+  TID=$(openstack coe cluster template show "${CLIENT}-k8s" -f value -c uuid </dev/null 2>&1)
+  echo "$TID" | grep -qE '^[0-9a-f-]{36}$' && echo "template ${CLIENT}-k8s=$TID" || echo "*** template FAIL ***" )
+--- END block ---
+GATE: template created, coe=kubernetes, network_driver=calico, master_lb+floating_ip enabled,
+image_id = the public kube image.
+OPEN QUESTION (one variable at a time): the `--fixed-network/--fixed-subnet` pin is the strict
+tenant-isolation posture. If the corrected create fails on the network params (image now moot),
+drop those two flags and let the capi-helm driver manage the cluster network -- and record which
+model this driver expects.
+
+================================================================================
+## STAGE 6/7 -- Tenant creates the cluster (trustee + trust) [PROCEDURE-PENDING]
+================================================================================
+THE MULTI-TENANT TRUST TEST. Design basis: appendix-D (magnum/common/keystone.py, read live).
+create_trust delegates `context.roles` (the CALLER's token roles) from the caller (trustor) to
+the per-cluster trustee. The caller MUST be the tenant service identity whose token carries
+EXACTLY member + load-balancer_member -- NOT admin (a trust cannot delegate a role the trustor
+does not hold; and delegating admin is a privilege-escalation footgun). This is why the creator
+is `-svc` via app cred, not the operator.
+
+STATUS: NOT YET LIVE-VERIFIED on the multi-tenant path as of this draft. D-064 fixed the
+`create_user` step (trustee user creation -- confirmed live earlier this session). The
+`create_trust` step under a clean tenant identity is the specific thing this stage confirms.
+If it 403s despite a clean delegatable-role identity, that is a genuine finding (look at the
+conductor's trust-session construction), NOT a policy gap -- do not loosen create_trust.
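
The identity rule stated above (the caller's token must carry EXACTLY member + load-balancer_member) reduces to a set comparison that can be run before attempting the create. The Python sketch below models only this runbook's rule; it is not Magnum's or Keystone's code, and `trust_precheck` is a hypothetical helper.

```python
REQUIRED = frozenset({"member", "load-balancer_member"})

def trust_precheck(token_roles):
    """Compare the caller's token roles with the exact set the runbook expects."""
    roles = set(token_roles)
    return {
        "missing": sorted(REQUIRED - roles),   # roles the trust would lack
        "extra": sorted(roles - REQUIRED),     # roles that would be over-delegated
        "clean": roles == REQUIRED,
    }

print(trust_precheck(["member", "load-balancer_member"]))
print(trust_precheck(["admin", "member"]))
```

Reporting `missing` and `extra` separately keeps the two failure modes apart: a missing role points at the Stage 2.3 grants, while an extra role such as admin means the wrong identity is creating the cluster.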
+
+--- BEGIN block: onboard-v2-06-cluster-create (RUN -- jumphost; create as app cred) ---
+CLIENT=; CA="$HOME/vault-init/vault-ca-root.pem"; ACF="$HOME/${CLIENT}-svc-appcred.txt"
+
+# 6.0 mark conductor log (numeric-or-STOP guard)
+source ~/admin-openrc
+MARK=$(juju ssh -m openstack magnum/0 'sudo cat /var/log/magnum/magnum-conductor.log | wc -l' </dev/null 2>/dev/null | tr -dc '0-9')
+[ -n "$MARK" ] || { echo "MARK empty -- STOP"; return 2>/dev/null||exit 1; }
+echo "MARK=$MARK"
+
+# 6.1/6.2 create as the tenant app-cred identity
+( for v in $(env|awk -F= '/^OS_/{print $1}'); do unset "$v"; done
+  export OS_AUTH_TYPE=v3applicationcredential OS_AUTH_URL="https://:5000/v3" OS_IDENTITY_API_VERSION=3 OS_CACERT="$CA"
+  export OS_APPLICATION_CREDENTIAL_ID=$(awk -F'"' '/^id=/{print $2}' "$ACF")
+  export OS_APPLICATION_CREDENTIAL_SECRET=$(awk -F'"' '/^secret=/{print $2}' "$ACF")
+  openstack token issue -f value -c project_id </dev/null 2>&1 | grep -qE '^[0-9a-f]{32}$' || { echo "*** auth FAIL ***"; exit 1; }
+  openstack coe cluster create "${CLIENT}-cluster" --cluster-template "${CLIENT}-k8s" \
+    --keypair "${CLIENT}-key" --master-count 1 --node-count 1 </dev/null 2>&1
+  sleep 12
+  openstack coe cluster show "${CLIENT}-cluster" -f value -c uuid -c status -c status_reason </dev/null 2>&1 | sed 's/^/ /' )
+
+# 6.3 conductor log since MARK -- trustee + trust outcome (the verdict)
+juju ssh -m openstack magnum/0 "sudo tail -n +$((MARK+1)) /var/log/magnum/magnum-conductor.log 2>/dev/null | grep -iE 'trustee|create_user|create_trust|403|forbidden|created trust|CREATE_|ERROR' | tail -30" </dev/null 2>/dev/null
+--- END block ---
+GATE (expected if the model holds): status CREATE_IN_PROGRESS (not a ~3s CREATE_FAILED); log
+shows trustee created and NO create_trust 403; driver proceeds to helm/CAPI. Then watch to
+CREATE_COMPLETE (phase-08 Step 8.2 pattern) and verify nodes/CNI/CCM (Step 8.3).
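
Step 6.3 reads only the conductor log lines written after MARK and keeps the ones that bear on the trustee/trust outcome. The Python sketch below mirrors that `tail -n +$((MARK+1)) | grep -iE ... | tail -30` pipeline on an in-memory list. `since_mark` is a hypothetical helper, and the sample lines are synthetic placeholders written for this example, NOT real Magnum log output.

```python
import re

# Same alternation as the grep -iE pattern in step 6.3.
PATTERN = re.compile(
    r"trustee|create_user|create_trust|403|forbidden|created trust|CREATE_|ERROR",
    re.IGNORECASE,
)

def since_mark(lines, mark, limit=30):
    """Skip the first `mark` lines, keep matching lines, return the last `limit`."""
    fresh = lines[mark:]                       # tail -n +$((MARK+1))
    hits = [ln for ln in fresh if PATTERN.search(ln)]
    return hits[-limit:]                       # tail -30

# Synthetic placeholder lines (not real log output).
log = [
    "old line: ERROR from a previous run",     # before MARK, must be ignored
    "old line: unrelated",
    "new line: unrelated housekeeping",
    "new line: placeholder mentioning create_trust",
    "new line: placeholder mentioning Forbidden",
]

for ln in since_mark(log, mark=2):
    print(ln)
```

Taking the mark before the create is what keeps an ERROR from an earlier run out of the verdict.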
+
+================================================================================
+## Changes folded from the 2026-07-01 session
+================================================================================
+- Identity model: `manager` persona (D-051/D-064) replaces `admin`-on-domain (2026-06-22).
+  Manager CLI self-service VALIDATED (G3) -- the 2026-06-22 out-of-band bandaid is retired.
+- appendix-C correction: manager domain-enumeration is own-domain-only on this cloud (the
+  documented names-only cloud-wide leak does NOT manifest); isolation tighter than documented.
+- Cluster-creator identity + trust model documented in appendix-D (magnum source-derived).
+- Template: create in owner project; image by UUID (quoting/resolution hazard).
+- Hardening throughout: OS_CACERT pre-auth guard; numeric MARK guard; capture-and-test-result
+  (not head||echo); subshell isolation; dynamic id resolution; secrets 0600 under $HOME.
+
+## Open items (before this DRAFT becomes VALIDATED)
+1. Re-run Stage 5 (corrected UUID form) -- confirm template creates.
+2. Run Stage 6 -- confirm create_trust succeeds under the tenant identity (or capture the
+   finding if it does not). THIS IS THE OUTSTANDING TRUST VALIDATION.
+3. Clean-room pass ("beta"): operate from ONLY the handed-over tenant credentials (zero
+   admin fallback), logging every point where admin is currently used for a read -- classify
+   each as legitimate operator-perimeter vs. a tenant-accessible lookup.
+4. On completion: fold this into tenant-onboarding-runbook.md (Stage 2 rewrite + Stage 7 fill),
+   commit appendix-D, apply the appendix-C correction, and assign the D-06x number for the
+   manager-persona-validated onboarding model.