diff --git a/docs/design-decisions.md b/docs/design-decisions.md index f5d2a02..45452c2 100644 --- a/docs/design-decisions.md +++ b/docs/design-decisions.md @@ -1084,3 +1084,78 @@ **Related:** D-035 (the mgmt VM), D-039 (per-cluster app-creds), phase-06 (SG creation), phase-07 Step 7.1 (DOCFIX-063 verify-first). + + +--- + +## D-064: Reconcile D-051 to the SCS Domain Manager standard (scs-0302); fix create-op policy templating + +**Status:** ADOPTED 2026-07-01 (phase-08 keystone-policy blocker). Reconciles D-051 to the +authoritative SCS reference and discharges D-051's [LIVE-READ PENDING] base_* alignment gate. +Behavioral acceptance = Magnum trustee create + tenant G3 (below). + +**Context / trigger:** phase-08 workload-cluster create failed at the FIRST step -- +create_trustee_and_trust -> identity:create_user HTTP 403. magnum_domain_admin holds Admin on +the magnum domain (correct per Magnum docs + D-046) and was authenticated (401 would be auth; +403 = authorization). Root cause: this cloud's charm-rendered /etc/keystone/policy.json +(old-style defaults, enforce_scope=False) defines the create-op helper +admin_and_matching_user_domain_id with legacy templating domain_id:%(user.domain_id)s, which +does NOT resolve for create_user on Caracal (keystone populates target.user.domain_id, not the +bare user.domain_id). The admin-on-domain create branch silently evaluates false and only +cloud_admin passes -- the same defect that blocked tenant domain-admin self-service in the +2026-06-22 northwind rehearsal. One bug, two symptoms (Magnum trustee + tenant self-service). + +**Finding (why D-051 alone did not fix it):** D-051 deliberately deviated from scs-0302 -- it +dropped Section A (base_*) and pointed fallthroughs at the LIVE helpers by name, including the +broken admin_and_matching_user_domain_id, and dropped the `or rule:admin_required` tail. 
+Consequently D-051's create_user fallthrough inherited the broken helper; only its manager +branch (correct token.domain.id:%(target.user.domain_id)s templating) worked. Attaching D-051 +as-staged would have fixed tenant self-service (manager) but NOT Magnum (admin-on-domain trustee). + +**Decision:** reconcile to scs-0302 by correcting the create-op fallthroughs to the cloud's +CORRECT %(target.*.domain_id)s helpers (mirroring the standard's base_create_*) and complete the +base_* alignment against the now fully-read live policy. Exactly 7 rules change: +- create_user / create_project / create_group: fallthrough admin_and_matching_{user,project,group}_domain_id + (broken %({user,project,group}.domain_id)s) -> admin_and_matching_target_{user,project,group}_domain_id (correct + %(target.{user,project,group}.domain_id)s). BUG-FIX; restores the documented admin-on-domain create path and + unblocks Magnum. Confined (domain match); does NOT touch tenants (they hold manager, not + domain-admin). +- add_user_to_group: admin_and_matching_group_domain_id -> admin_and_matching_target_group_domain_id. + ALIGNMENT (restore live default; D-051 had drifted to the broken helper). +- list_users / list_projects / list_groups: fallthrough admin_required (D-051 conservative + [PENDING-LIVE-READ] guess, BROADER than live) -> cloud_admin or admin_and_matching_domain_id + (the live default). ALIGNMENT; behavior-preserving; removes an unintended widening. +list_roles keeps admin_required (its live default). The manager persona (Section B), +is_domain_manager = role:manager, and is_domain_managed_role = member + load-balancer_member +(no admin, no manager -- anti-escalation) are UNCHANGED from D-051 / scs-0302. + +**Consequence:** with create_user's admin fallthrough fixed, magnum_domain_admin creates +per-cluster trustees via the standard admin-on-domain path -- NO manager grant on the magnum +service account is required (an in-session Option A now moot). Magnum keeps its documented +trustee setup; the policy fix alone unblocks it. 
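The templating defect described in D-064 can be illustrated with a minimal sketch. It assumes an old-style `kind:%(template)s` check resolves its right-hand side against a flat target dict and fails closed when the template names a key that is absent. This is a toy model in plain Python, not keystone or oslo.policy code: `generic_check`, the dict contents, the domain id, and the use of `domain_id` as the left-hand side of the corrected helper are all hypothetical.

```python
# Toy model of an old-style "kind:%(template)s" policy check.
# Assumption: a template key missing from the target makes the check
# return False instead of raising. All names and values are illustrative.

def generic_check(kind, match_template, target, creds):
    """Return True if creds[kind] equals the template resolved against target."""
    try:
        expected = match_template % target
    except KeyError:
        # The template names a key that the target dict does not carry.
        return False
    return str(creds.get(kind)) == expected


# Hypothetical create_user request: the target carries the dotted key
# "target.user.domain_id", not the bare "user.domain_id".
target = {"target.user.domain_id": "magnum-domain-id"}
creds = {"domain_id": "magnum-domain-id", "roles": ["Admin"]}

# Legacy helper templating: the key is missing, so the branch is false.
legacy = generic_check("domain_id", "%(user.domain_id)s", target, creds)

# Target-style templating: the key resolves and the domain ids match.
fixed = generic_check("domain_id", "%(target.user.domain_id)s", target, creds)

print("legacy helper matches:", legacy)
print("target helper matches:", fixed)
```

Under these assumptions the legacy branch evaluates false without any error being raised, which matches the "silently evaluates false" symptom: the request falls through to the remaining branches and only cloud_admin passes.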
+ +**Validation:** oslo.policy parses all 37 rules (the charm validates YAML only -- a malformed +rule passes YAML and silently no-ops, cf. D-046 "reports OK while broken"); YAML + ASCII clean; +malformed-connector lint clean; every fallthrough tail diffed against the live effective default. +Behavioral gate (the real acceptance): (a) Magnum -- re-run phase-08 8.1, trustee create_user +passes, cluster converges; (b) tenant G3 -- a manager-on-domain user self-services user/project ++ member/load-balancer_member within its domain (PASS), is denied admin/cross-domain (DENY), +cloud-admin unaffected. + +**Known SCS limitation (carried, per standard):** the manager persona grants list_domains and +list_roles cloud-wide (needed to resolve domain/role names). A tenant manager can ENUMERATE +domain names/ids + role names (not access resources). Inherent to the scs-0302 transitional +policy (upstream RBAC-scoping of domain list is a pending fix); the native 2024.2 persona closes +it. On upgrade to 2024.2+ this override MUST be removed (scs-0302 + D-051 caution). + +**Mechanism:** zip overrides.zip domain-manager-policy.yaml -> juju attach-resource keystone +policyd-override=overrides.zip (use-policyd-override already true). keystone PO (broken) -> PO:. +Rollback: juju config keystone use-policyd-override=false. Full attach + G3 procedure in +runbooks/appendix-C-identity-rbac.md. + +**Roosevelt:** single portable policy file replicated per-DC keystone; the attach + G3 gate +becomes a deploy step. Remove on any 2024.2+ upgrade. + +**Related:** D-051 (reconciled here), D-046 (magnum trustee domain), D-039 (per-cluster app-cred +roles), D-050 (resolved by supplying the zip), scs-0302-w1 (the authoritative standard), +appendix-C (identity/RBAC reference). **Supersedes:** D-051's [LIVE-READ PENDING] gate (discharged). 
diff --git a/docs/v1-redeploy-changelog.md b/docs/v1-redeploy-changelog.md index e6cbce6..b32e51a 100644 --- a/docs/v1-redeploy-changelog.md +++ b/docs/v1-redeploy-changelog.md @@ -1278,7 +1278,38 @@ and /v2.0 inputs -> /v3 in both sections); install -d precedes the tee; api-versions probe; the sha256-mismatch gate fails loud; k8s_capi_helm_v1 enabled gate; precondition failures. ALL PASS. +## 2026-07-01 -- phase-08 keystone-policy blocker -> D-064 (reconcile D-051 to scs-0302) + +phase-08 8.1 cluster create failed at create_trustee_and_trust: identity:create_user 403 +(magnum_domain_admin has Admin on the magnum domain, authenticated -> authorization gap, not +auth). Root cause: charm-rendered old-style policy.json defines the create-op helper +admin_and_matching_user_domain_id with legacy domain_id:%(user.domain_id)s templating that does +not resolve for create_user on Caracal (keystone populates target.user.domain_id). Same defect +blocked the 2026-06-22 northwind tenant self-service rehearsal. D-051 as-staged would NOT have +fixed Magnum (its create_user admin fallthrough inherited the broken helper; only its manager +branch worked). + +Reconciled the staged policies/domain-manager-policy.yaml to scs-0302 (D-064). 7 rules changed: +- create_user / create_project / create_group: broken %({user,project,group}.domain_id)s helper -> + %(target.{user,project,group}.domain_id)s helper (BUG-FIX; unblocks Magnum via documented admin-on-domain path). +- add_user_to_group: -> admin_and_matching_target_group_domain_id (ALIGN; restore live default; + D-051 drift). +- list_users / list_projects / list_groups: admin_required (conservative, broader than live) -> + cloud_admin or admin_and_matching_domain_id (ALIGN to live default; discharges D-051 + [LIVE-READ PENDING]). list_roles kept at admin_required (its live default). +Manager persona (is_domain_manager=role:manager; is_domain_managed_role=member + +load-balancer_member) unchanged. Consequence: NO manager grant on magnum_domain_admin needed. 
+Validated: oslo.policy parses all 37 rules (charm checks YAML only); YAML+ASCII+connector lint +clean; every fallthrough tail diffed vs the live effective policy. Behavioral gate pending live +attach: Magnum 8.1 re-run + tenant G3. New reference: runbooks/appendix-C-identity-rbac.md +(role/policy/account assignment tables + attach + G3 procedure). Known SCS limitation carried: +manager can enumerate domain + role names cloud-wide (list_domains/list_roles); remove override +on any 2024.2+ upgrade. + ### Next-free numbers -Design decision: D-064. Doc fix: DOCFIX-064. (D-063 ASSIGNED above = capi-mgmt-sg 0.0.0.0/0 -hardening, PROPOSED/OPEN. DOCFIX-063 ASSIGNED above = phase-07 runbook reconciliation, six fixes. -Prior: DOCFIX-062 phase-06 kubeconfig-server-rewrite; D-061 teardown, D-062 mysql.) +Design decision: D-065. Doc fix: DOCFIX-065. (D-064 ASSIGNED above = reconcile D-051 to scs-0302 ++ create-op templating fix. DOCFIX-064 RESERVED = phase-08 runbook sweep (image --public; seed +retry/timeout + poll hard-gate + post-active property re-verify; image-absent guard; template +capi-mgmt scope preamble + flavor floor; 8.1 D-039 role + keypair pre-checks; octavia prereq +real-exit capture), to be written at phase-08 close. D-063 = capi-mgmt-sg 0.0.0.0/0 hardening, +PROPOSED/OPEN. DOCFIX-063 = phase-07 reconciliation, six fixes.) 
diff --git a/policies/domain-manager-policy.yaml b/policies/domain-manager-policy.yaml index 667275b..893b9df 100644 --- a/policies/domain-manager-policy.yaml +++ b/policies/domain-manager-policy.yaml @@ -81,17 +81,17 @@ # --- Users (manager branch + verbatim live default) --- # [PENDING-LIVE-READ] list_users default not explicit in dump -> conservative admin_required -"identity:list_users": "(rule:is_domain_manager and token.domain.id:%(target.domain_id)s) or rule:admin_required" +"identity:list_users": "(rule:is_domain_manager and token.domain.id:%(target.domain_id)s) or rule:cloud_admin or rule:admin_and_matching_domain_id" "identity:get_user": "(rule:is_domain_manager and token.domain.id:%(target.user.domain_id)s) or rule:cloud_admin or rule:admin_and_matching_target_user_domain_id or rule:owner" -"identity:create_user": "(rule:is_domain_manager and token.domain.id:%(target.user.domain_id)s) or rule:cloud_admin or rule:admin_and_matching_user_domain_id" +"identity:create_user": "(rule:is_domain_manager and token.domain.id:%(target.user.domain_id)s) or rule:cloud_admin or rule:admin_and_matching_target_user_domain_id" "identity:update_user": "(rule:is_domain_manager and token.domain.id:%(target.user.domain_id)s) or rule:cloud_admin or rule:admin_and_matching_target_user_domain_id" "identity:delete_user": "(rule:is_domain_manager and token.domain.id:%(target.user.domain_id)s) or rule:cloud_admin or rule:admin_and_matching_target_user_domain_id" # --- Projects (manager branch + verbatim live default) --- # [PENDING-LIVE-READ] list_projects default not explicit in dump -> conservative admin_required -"identity:list_projects": "(rule:is_domain_manager and token.domain.id:%(target.domain_id)s) or rule:admin_required" +"identity:list_projects": "(rule:is_domain_manager and token.domain.id:%(target.domain_id)s) or rule:cloud_admin or rule:admin_and_matching_domain_id" "identity:get_project": "(rule:is_domain_manager and token.domain.id:%(target.project.domain_id)s) or 
rule:cloud_admin or rule:admin_and_matching_target_project_domain_id or project_id:%(target.project.id)s" -"identity:create_project": "(rule:is_domain_manager and token.domain.id:%(target.project.domain_id)s) or rule:cloud_admin or rule:admin_and_matching_project_domain_id" +"identity:create_project": "(rule:is_domain_manager and token.domain.id:%(target.project.domain_id)s) or rule:cloud_admin or rule:admin_and_matching_target_project_domain_id" "identity:update_project": "(rule:is_domain_manager and token.domain.id:%(target.project.domain_id)s) or rule:cloud_admin or rule:admin_and_matching_target_project_domain_id" "identity:delete_project": "(rule:is_domain_manager and token.domain.id:%(target.project.domain_id)s) or rule:cloud_admin or rule:admin_and_matching_target_project_domain_id" "identity:list_user_projects": "(rule:is_domain_manager and token.domain.id:%(target.user.domain_id)s) or rule:owner or rule:admin_and_matching_domain_id" @@ -105,13 +105,13 @@ # --- Groups (manager branch + verbatim live default) --- # [PENDING-LIVE-READ] list_groups default not explicit in dump -> conservative admin_required -"identity:list_groups": "(rule:is_domain_manager and token.domain.id:%(target.group.domain_id)s) or rule:admin_required" +"identity:list_groups": "(rule:is_domain_manager and token.domain.id:%(target.group.domain_id)s) or rule:cloud_admin or rule:admin_and_matching_domain_id" "identity:get_group": "(rule:is_domain_manager and token.domain.id:%(target.group.domain_id)s) or rule:cloud_admin or rule:admin_and_matching_target_group_domain_id" -"identity:create_group": "(rule:is_domain_manager and token.domain.id:%(target.group.domain_id)s) or rule:cloud_admin or rule:admin_and_matching_group_domain_id" +"identity:create_group": "(rule:is_domain_manager and token.domain.id:%(target.group.domain_id)s) or rule:cloud_admin or rule:admin_and_matching_target_group_domain_id" "identity:update_group": "(rule:is_domain_manager and 
token.domain.id:%(target.group.domain_id)s) or rule:cloud_admin or rule:admin_and_matching_target_group_domain_id" "identity:delete_group": "(rule:is_domain_manager and token.domain.id:%(target.group.domain_id)s) or rule:cloud_admin or rule:admin_and_matching_target_group_domain_id" "identity:list_groups_for_user": "(rule:is_domain_manager and token.domain.id:%(target.user.domain_id)s) or rule:owner or rule:admin_and_matching_target_user_domain_id" "identity:list_users_in_group": "(rule:is_domain_manager and token.domain.id:%(target.group.domain_id)s) or rule:cloud_admin or rule:admin_and_matching_target_group_domain_id" "identity:remove_user_from_group": "(rule:is_domain_manager and token.domain.id:%(target.group.domain_id)s and token.domain.id:%(target.user.domain_id)s) or rule:cloud_admin or rule:admin_and_matching_target_group_domain_id" "identity:check_user_in_group": "(rule:is_domain_manager and token.domain.id:%(target.group.domain_id)s and token.domain.id:%(target.user.domain_id)s) or rule:cloud_admin or rule:admin_and_matching_target_group_domain_id" -"identity:add_user_to_group": "(rule:is_domain_manager and token.domain.id:%(target.group.domain_id)s and token.domain.id:%(target.user.domain_id)s) or rule:cloud_admin or rule:admin_and_matching_group_domain_id" +"identity:add_user_to_group": "(rule:is_domain_manager and token.domain.id:%(target.group.domain_id)s and token.domain.id:%(target.user.domain_id)s) or rule:cloud_admin or rule:admin_and_matching_target_group_domain_id" diff --git a/runbooks/appendix-C-identity-rbac.md b/runbooks/appendix-C-identity-rbac.md new file mode 100644 index 0000000..5c8a95a --- /dev/null +++ b/runbooks/appendix-C-identity-rbac.md @@ -0,0 +1,117 @@ +# Appendix C -- Identity / RBAC reference + +Authoritative reference for the cloud's identity model: which keystone roles and personas exist, +what each is for, and which accounts should receive which assignments. 
Governed by D-051 (SCS +Domain Manager persona) as reconciled by D-064 (scs-0302 alignment + create-op templating fix). + +The persona is delivered by a keystone policy override (`policies/domain-manager-policy.yaml`) +attached as a `policyd-override` resource. Provenance: SCS standard scs-0302-w1 (Domain Manager). +This is a TRANSITIONAL policy for pre-2024.2 keystone; on any upgrade to 2024.2+ the override MUST +be removed in favour of the native domain-manager persona (see "Removal" below). + +Do NOT hand-assign roles ad hoc. Provision accounts per the tables below so the model stays +auditable and the tenant isolation guarantees hold. + +--- + +## C.1 Role / persona catalog + +| Role / persona | Keystone rule | What it is for | Typical scope | Notes | +|----------------|---------------|----------------|---------------|-------| +| Admin / admin | `admin_required` (role:Admin); becomes `cloud_admin` when scoped to the admin domain/project | Full cloud authority (as cloud_admin), or domain-scoped admin over a single service domain | admin domain + admin project (cloud_admin); or a service domain (e.g. magnum) | "Admin" and "admin" are the same role (case-insensitive alias). After D-064 an admin-on-a-domain can create users/projects in THAT domain. NOT given to tenants -- an admin role is not safely domain-confinable; tenants get `manager` instead. | +| manager (SCS Domain Manager) | `is_domain_manager` = role:manager, plus the D-051/D-064 override | Domain-confined IAM self-service: create/manage users, projects, groups, and role assignments WITHIN the manager's own domain | a tenant's domain | Requires the policy override attached. Can assign ONLY the roles in `is_domain_managed_role` (below); cannot grant `admin` or `manager` (anti-escalation). | +| member | standard project role | Operate project resources (compute, network, volumes, images) | a project | The baseline tenant working role. 
| +| load-balancer_member | Octavia project role | Create / manage Octavia load balancers | a project | Required for tenant LBs and for Magnum (the cluster apiserver LB). Must be held by the Magnum trustor on capi-mgmt (D-039). | +| reader | standard read role | Read-only visibility | project or domain | Optional. Part of the Magnum trustor role set on capi-mgmt (D-039). | + +Manager-assignable roles (`is_domain_managed_role`): **member** and **load-balancer_member** only. +This is the anti-escalation boundary -- a compromised or careless domain manager cannot mint an +admin or another manager, and cannot reach outside its own domain. + +--- + +## C.2 Account -> role assignment + +| Account | Role(s) | Scope | Why / reference | +|---------|---------|-------|-----------------| +| admin (operator super-admin) | Admin | admin domain + admin project (= cloud_admin) | Cloud operator; full authority. Bootstrap identity. | +| admin (as Magnum trustor) | member + load-balancer_member + reader | capi-mgmt project | So the app-cred Magnum mints per cluster carries Octavia authority for the apiserver LB (D-039). These are the frozen trustor roles delegated into each cluster trust. | +| magnum_domain_admin | Admin | magnum domain | Magnum trustee domain admin; creates the per-cluster trustee USER at cluster-create (D-046; Magnum docs). Works via the D-064 create-op fix -- no extra grant needed. Recreated by the `domain-setup` charm action after every teardown/redeploy (D-046). | +| `<tenant>-domain-admin` | manager | the tenant's domain | SCS Domain Manager persona (D-051/D-064). Operator provisions the domain + this one account; the tenant self-services users, projects, and member/load-balancer_member grants from there. | +| human users | member (+ load-balancer_member if they use LBs or Magnum) | the tenant's project(s) | Created and assigned by the tenant's own domain-manager via Horizon/CLI. Operator is not in the loop. 
| +| `<tenant>-ci` / service accounts | member + load-balancer_member | the tenant's project | Backing identity for the application credential that CI/automation authenticates with. load-balancer_member so tenant CI can drive Magnum/LBs. | +| per-cluster trustee | (delegated via trust -- not a direct grant) | -- | Magnum mints this at cluster-create and deletes it at cluster-delete. It carries the trustor's frozen roles through the trust (D-039). Never assign roles to it by hand. | + +Provisioning direction: the operator creates a tenant's DOMAIN and its single `manager` account, +then hands off. Everything below the domain (users, projects, member/LB grants) is tenant +self-service. This is the whole point of the persona -- it removes the operator from routine +tenant IAM while keeping a hard domain boundary. + +--- + +## C.3 Attaching the policy override + +Prerequisite: keystone deployed with `use-policyd-override=true` (already set in the bundle). +Note D-050: `use-policyd-override=true` with NO resource attached is a silent no-op -- the zip +must actually be supplied, or the override does nothing while reporting healthy. 
+ +Pre-attach re-check on the jumphost (from the repo working copy), then attach: + +``` +# 1) validate the file before attaching (YAML + ASCII + connector lint) +python3 -c 'import yaml,sys; d=yaml.safe_load(open("policies/domain-manager-policy.yaml")); print("YAML OK:",len(d),"rules")' +LC_ALL=C grep -nP "[^\x00-\x7F]" policies/domain-manager-policy.yaml && { echo "NON-ASCII -- STOP"; exit 1; } || echo "ASCII clean" +grep -nE "\)\s+(rule:|role:|token|project_id|domain_id|user_id|None:)" policies/domain-manager-policy.yaml && { echo "MALFORMED connector -- STOP"; exit 1; } || echo "connector lint clean" + +# 2) package and attach (the zip's top-level file name is what keystone reads) +cd policies && zip -j /tmp/overrides.zip domain-manager-policy.yaml && cd - +juju attach-resource -m openstack keystone policyd-override=/tmp/overrides.zip +``` + +Gate: `juju status -m openstack keystone` must move from `PO (broken)` to `PO:` and settle +active/idle, with no other charm disturbed. If the charm rejects the file it stays `PO (broken)`. + +Rollback (immediate): `juju config -m openstack keystone use-policyd-override=false` +(the override stops applying; keystone reverts to shipped defaults on the next hook). + +The charm validates YAML only. It does NOT parse the oslo.policy rule grammar, so a syntactically +malformed rule can pass validation and silently no-op. Always run the oslo.policy parse in the +sandbox before delivering a change to this file. + +--- + +## C.4 Tenant self-service validation (G3) + +Run after the override is active. Confirms the persona works AND is properly bounded. 
+ +PASS cases (a `manager`-on-domain account, scoped to its own domain, must succeed): +- create a user in its own domain +- create a project in its own domain +- grant `member` and `load-balancer_member` to a user on a project in its own domain + +DENY cases (the same account must be refused): +- grant `admin` or `manager` to anyone (anti-escalation) +- create/read/modify anything in a DIFFERENT domain (cross-domain isolation) + +Unaffected: +- cloud_admin (the operator admin) retains full authority everywhere + +Only when all three groups hold is the persona accepted for that release. + +--- + +## C.5 Known limitations (carried from scs-0302) + +- A domain manager can enumerate ALL domain names/ids (`list_domains`) and ALL role names + (`list_roles`) cloud-wide. This is names/ids only -- no access to other domains' resources -- + and is required for the manager to resolve domains/roles by name. It is inherent to the + pre-2024.2 transitional policy; upstream RBAC-scoping of domain listing is a pending fix. +- The persona relies on `enforce_scope=False` (old-style policy). It is a bridge, not the + destination. + +## C.6 Removal (on upgrade to keystone 2024.2+) + +2024.2 ships a native domain-manager persona and secure-RBAC scope enforcement. On that upgrade: +detach the `policyd-override` resource (or set `use-policyd-override=false`), adopt the native +persona, and retire this file. Leaving the old-style override in place on a secure-RBAC keystone +is unsupported and will conflict. Tracked by D-051/D-064.
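The ASCII and connector checks from the C.3 pre-attach step can also be sketched in plain Python, for example to run them from a test suite instead of a shell. This is a minimal sketch under stated assumptions: it reuses the regular expression from the `grep -nE` line in C.3, the function name `lint_policy_text` is hypothetical, and the two sample rules are shortened illustrations rather than lines from the staged file. It does not replace the oslo.policy parse, which is still the only check that exercises the rule grammar.

```python
import re

# Same pattern as the connector grep in C.3: a closing parenthesis followed
# by whitespace and then a check, with no "and"/"or" connector in between.
CONNECTOR_RE = re.compile(
    r"\)\s+(rule:|role:|token|project_id|domain_id|user_id|None:)"
)


def lint_policy_text(text):
    """Return a list of (line_number, problem) tuples; an empty list means clean."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(ord(ch) > 0x7F for ch in line):
            problems.append((number, "non-ASCII character"))
        if CONNECTOR_RE.search(line):
            problems.append((number, "missing and/or connector"))
    return problems


# Hypothetical, shortened sample rules (not copied from the staged file).
good = (
    '"identity:create_user": "(rule:is_domain_manager and '
    'token.domain.id:%(target.user.domain_id)s) or rule:cloud_admin"'
)
bad = (
    '"identity:create_user": "(rule:is_domain_manager and '
    'token.domain.id:%(target.user.domain_id)s) rule:cloud_admin"'
)

print(lint_policy_text(good))
print(lint_policy_text(bad))
```

A non-empty result should be treated the same way as the STOP branches in the shell version: do not package or attach the file until the reported lines are fixed.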