diff --git a/runbooks/appendix-D-magnum-trust-model.md b/runbooks/appendix-D-magnum-trust-model.md
new file mode 100644
index 0000000..f161763
--- /dev/null
+++ b/runbooks/appendix-D-magnum-trust-model.md
@@ -0,0 +1,173 @@
+# Appendix D -- Magnum cluster-create trust model (multi-tenant)
+
+Fills the gap the onboarding runbook Stage 7 marks [PENDING]: exactly which identity
+creates a Magnum cluster, and why the Keystone trust delegation constrains that choice.
+Grounded in the magnum source (magnum/common/keystone.py, read live 2026-07-01) and the
+D-039 / D-051 / D-064 identity model. Supersedes the single-consumer shortcut used on
+2026-06-09 (admin creates in the admin-owned capi-mgmt project), which sidesteps -- rather
+than exercises -- the trust constraint and therefore does NOT validate the tenant path.
+
+--------------------------------------------------------------------------------
+## D.1 What magnum does at cluster-create (the mechanism)
+--------------------------------------------------------------------------------
+
+Two Keystone writes happen before any infrastructure is touched
+(magnum/conductor/handlers/common/trust_manager.py -> create_trustee_and_trust):
+
+1. create_trustee -> `identity:create_user`
+   Magnum's trustee_domain_admin (magnum_domain_admin, Admin on the magnum domain)
+   creates a per-cluster service user in the magnum domain. This is the step D-064
+   unblocked (the create_user policy templating fix). VALIDATED live 2026-07-01:
+   trustee user is created successfully.
+
+2. create_trust -> `identity:create_trust`
+   Magnum creates a Keystone trust delegating the CALLER's roles to that trustee.
+   From magnum/common/keystone.py:
+
+       def create_trust(self, trustee_user):
+           trustor_user_id = self.session.get_user_id()        # the CALLER's user
+           trustor_project_id = self.session.get_project_id()  # the CALLER's project
+           if CONF.trust.roles:
+               roles = CONF.trust.roles    # (unset on this deploy)
+           else:
+               roles = self.context.roles  # -> the roles in the CALLER's token
+           self.client.trusts.create(
+               trustor_user=trustor_user_id, project=trustor_project_id,
+               trustee_user=trustee_user, impersonation=True, role_names=roles)
+
+Two facts follow directly from that code, and they are the whole model:
+
+  A. The TRUSTOR is the identity that issued `openstack coe cluster create`
+     (`self.session` is the request-context client). The Keystone policy
+     `identity:create_trust = "user_id:%(trust.trustor_user_id)s"` is therefore
+     satisfied by construction -- caller == trustor. (So the create_trust 403 is
+     NOT a trustor-identity policy failure.)
+
+  B. The DELEGATED ROLES are `self.context.roles` -- the roles present in the
+     CALLER's token on `trustor_project_id`. Keystone's create_trust REFUSES to
+     delegate any role the trustor does not actually hold on that project
+     (a trust cannot grant more than the trustor has). `CONF.trust.roles` is unset
+     here, so magnum delegates the caller's token roles verbatim -- whatever they are.
+
+--------------------------------------------------------------------------------
+## D.2 Why the 2026-06-09 single-consumer path "worked" (and why we retired it)
+--------------------------------------------------------------------------------
+
+On 2026-06-09 the cluster was created by ADMIN, scoped to the admin-owned capi-mgmt
+project. Admin trivially holds (or cloud-admin-bypasses) every role it delegates to
+itself, so create_trust never exercised the delegation constraint. That is a
+SINGLE-CONSUMER shortcut: one privileged operator standing in for the tenant.
+It proves the driver/CAPI plumbing but NOT the multi-tenant identity path, because in
+the real product the cluster creator is a TENANT, not the cloud operator.
+
+The admin-in-capi-mgmt attempt on 2026-07-01 then 403'd at create_trust because that
+mixed scope (admin user, capi-mgmt project) is not a clean delegatable-role identity
+on capi-mgmt -- and, under D-064, admin scoped to capi-mgmt is a RESTRICTED identity
+there (it is not cloud_admin outside the admin domain; `list_role_assignments` 403s
+in that scope, confirmed live). It is the wrong identity for the tenant model on two
+counts: it is the operator, and its token roles are not the tenant delegatable set.
+
+--------------------------------------------------------------------------------
+## D.3 The multi-tenant rule (what identity must create the cluster)
+--------------------------------------------------------------------------------
+
+RULE: a Magnum cluster is created by the TENANT's own project-scoped identity, whose
+token carries EXACTLY the delegatable tenant roles -- `member` and
+`load-balancer_member` (and `reader` where used) -- and NOT `admin`.
+
+Rationale, straight from D.1.B:
+  - The trust delegates `context.roles`. If the creator's token carries `admin`,
+    magnum tries to delegate `admin` into the trust; Keystone refuses a trust that
+    grants a role the trustor does not properly hold as a delegatable project grant,
+    and even if it did, delegating `admin` into a long-lived cluster credential is a
+    privilege-escalation footgun (the trustee impersonates the trustor with
+    impersonation=True). The tenant set (member + load-balancer_member) is the
+    correct, least-privilege delegation.
+  - `load-balancer_member` MUST be in the creator's token: the magnum-capi-helm
+    driver provisions an Octavia LB for the apiserver, and the trust must carry
+    Octavia authority or CAPO 403s at LB reconcile (D-039).
+    This is exactly why D-039 grants the trustor `load-balancer_member` on the cluster project.
+  - `member` provides the compute/network/volume authority the cluster's CCM/CSI
+    need via the trust.
+
+WHO THIS IS, per the onboarding model (tenant-onboarding-runbook Stage 2/4):
+  - The tenant's SERVICE identity: `-ci` / `-svc`, holding
+    `member` + `load-balancer_member` on `-prod`, authenticating with its
+    UNRESTRICTED application credential (the app cred is required so the driver can
+    mint the per-cluster CAPO child cred -- D-039 / onboarding Stage 4).
+  - Equivalently a tenant human user with `member` + `load-balancer_member` on the
+    project, but the service/app-cred identity is the production path (Jenkins/CI).
+
+The operator (admin / cloud_admin) does NOT create tenant clusters. The capi-mgmt
+project is the MANAGEMENT-plane project (where the CAPI mgmt cluster VM and the
+operator's own D-039 roles live for the mgmt cluster itself); tenant clusters are
+created in the TENANT's project by the TENANT's identity.
+
+--------------------------------------------------------------------------------
+## D.4 Trustor role-set validation (run before the create)
+--------------------------------------------------------------------------------
+
+Confirm the creating identity's TOKEN carries the delegatable set and nothing that
+cannot be delegated. Run AS the tenant creator identity (app cred or password):
+
+    # as the tenant service identity, project-scoped to -prod
+    openstack token issue -f value -c user_id -c project_id   # confirm scope
+    # roles in THIS token == what magnum will delegate (context.roles):
+    openstack role assignment list --user \
+        --project --effective --names -f value -c Role | sort
+
+GATE: the role set is a subset of { member, load-balancer_member, reader }, and
+INCLUDES load-balancer_member. If `admin` appears, this is the wrong identity --
+do not create with it.
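
The GATE above can be checked mechanically. What follows is a minimal sketch, assuming the role names arrive one per line (as the `-f value -c Role | sort` output would provide them). The helper name `check_trustor_gate` and its constants are hypothetical; they are not part of magnum or the openstack CLI.

```python
# Hypothetical helper for the D.4 gate; not part of magnum or python-openstackclient.
ALLOWED_ROLES = frozenset({"member", "load-balancer_member", "reader"})
REQUIRED_ROLES = frozenset({"load-balancer_member"})


def check_trustor_gate(cli_output):
    """Return a list of gate violations; an empty list means the gate passes.

    cli_output is assumed to hold one role name per line.
    """
    roles = {line.strip() for line in cli_output.splitlines() if line.strip()}
    problems = []
    extra = sorted(roles - ALLOWED_ROLES)
    if extra:
        problems.append("roles outside the gate set: " + ", ".join(extra))
    missing = sorted(REQUIRED_ROLES - roles)
    if missing:
        problems.append("required roles missing: " + ", ".join(missing))
    return problems


# A tenant service identity with the expected set passes the gate.
print(check_trustor_gate("load-balancer_member\nmember\n"))
# An identity carrying admin and lacking load-balancer_member fails on both counts.
print(check_trustor_gate("admin\nmember\n"))
```

Returning the list of violations, rather than a bare boolean, lets a CI step print why an identity was rejected before any cluster create is attempted.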
+
+Note: a tenant/app-cred identity cannot run `role assignment list` for other users
+(policy 403, by design). Query only its own assignment, or read it as admin
+beforehand during onboarding.
+
+--------------------------------------------------------------------------------
+## D.5 The create (tenant identity), and the trust it produces
+--------------------------------------------------------------------------------
+
+    # authenticate as the tenant service identity via its app cred (onboarding Stage 4)
+    # OS_AUTH_TYPE=v3applicationcredential + the app cred id/secret from the 0600 file
+    # then, project-scoped to the tenant project:
+    openstack coe cluster create \
+        --cluster-template \
+        --keypair \
+        --master-count 1 --node-count 2
+
+    # verify the trust was created and carries the tenant roles:
+    openstack coe cluster show -f value -c status -c trustee_user_id
+    # status -> CREATE_IN_PROGRESS (past trustee+trust), NOT CREATE_FAILED at ~3s.
+
+Expected: create_user (D-064) AND create_trust both pass, because the creator is the
+trustor and its token roles (member + load-balancer_member) are cleanly delegatable
+on the tenant project. The driver then proceeds to helm/CAPI provisioning.
+
+--------------------------------------------------------------------------------
+## D.6 Recommendations
+--------------------------------------------------------------------------------
+
+  - Cluster-create is a TENANT self-service operation, performed by the tenant's
+    app-cred identity carrying member + load-balancer_member on the tenant project.
+    Wire it into the tenant CI (Jenkins) path (onboarding Stage 7), never the
+    operator admin.
+  - Optionally pin `CONF.trust.roles = member,load-balancer_member` in magnum.conf
+    (via the D-037 conf.d mechanism) to make the delegated set EXPLICIT and
+    independent of whatever roles happen to be in the caller's token -- a hardening
+    that removes the "wrong token roles" failure mode entirely.
+    Decide as a tracked item; unset (inherit context.roles) is the upstream default
+    and works when the creator identity is correct.
+  - The management-plane capi-mgmt project + the operator's D-039 roles there remain
+    for the MGMT cluster; they are not the tenant cluster-create path.
+
+--------------------------------------------------------------------------------
+## D.7 Open validation item
+--------------------------------------------------------------------------------
+
+This appendix establishes the model from the magnum source and the identity design.
+The live behavioral confirmation on THIS cloud -- create a cluster as a tenant
+app-cred identity (member + load-balancer_member) and observe create_trust succeed --
+is the acceptance step, and folds into onboarding Stage 7 (currently [PENDING]) and
+the D-011 gate. Until run, D.3 is design-derived-from-source, not yet live-verified
+on the multi-tenant path.
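
Until that live run happens, the model derived in D.1 (role selection, then fact B) can at least be written down as a toy sketch. This is an illustration under stated assumptions, not magnum or Keystone code: the function names `roles_to_delegate` and `trust_is_grantable` are hypothetical, and the role lists are made-up inputs rather than observations from this cloud.

```python
# Toy model only: hypothetical helpers, not magnum or Keystone code.

def roles_to_delegate(conf_trust_roles, token_roles):
    """Pick the delegated set the way the create_trust excerpt in D.1 does:
    the configured list if it is set (non-empty), else the caller's token roles."""
    return list(conf_trust_roles) if conf_trust_roles else list(token_roles)


def trust_is_grantable(delegated_roles, trustor_project_roles):
    """Model of fact B: every delegated role must be one the trustor
    actually holds on the trust's project."""
    return set(delegated_roles) <= set(trustor_project_roles)


tenant_roles = ["member", "load-balancer_member"]

# Tenant creator with CONF.trust.roles unset: token roles are delegated verbatim.
delegated = roles_to_delegate([], tenant_roles)
print(delegated, trust_is_grantable(delegated, tenant_roles))

# Assumed mismatch: the token lists a role that is not a held grant on the project.
delegated = roles_to_delegate([], ["admin", "member"])
print(delegated, trust_is_grantable(delegated, ["member"]))
```

The sketch only restates the design-derived rule; it does not replace the acceptance step, since whether a given role counts as held on the project is decided by Keystone on the live cloud.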