
Appendix D -- Magnum cluster-create trust model (multi-tenant)

Fills the gap the onboarding runbook Stage 7 marks [PENDING]: exactly which identity creates a Magnum cluster, and why the Keystone trust delegation constrains that choice. Grounded in the magnum source (magnum/common/keystone.py, read live 2026-07-01) and the D-039 / D-051 / D-064 identity model. Supersedes the single-consumer shortcut used on 2026-06-09 (admin creates in the admin-owned capi-mgmt project), which sidesteps -- rather than exercises -- the trust constraint and therefore does NOT validate the tenant path.


D.1 What magnum does at cluster-create (the mechanism)


Two Keystone writes happen before any infrastructure is touched (magnum/conductor/handlers/common/trust_manager.py -> create_trustee_and_trust):

  1. create_trustee -> identity:create_user. Magnum's trustee_domain_admin (magnum_domain_admin, Admin on the magnum domain) creates a per-cluster service user in the magnum domain. This is the step D-064 unblocked (the create_user policy templating fix). VALIDATED live 2026-07-01: the trustee user is created successfully.

  2. create_trust -> identity:create_trust. Magnum creates a Keystone trust delegating the CALLER's roles to that trustee. From magnum/common/keystone.py:

    def create_trust(self, trustee_user):
        trustor_user_id   = self.session.get_user_id()      # the CALLER's user
        trustor_project_id = self.session.get_project_id()  # the CALLER's project
        if CONF.trust.roles:
            roles = CONF.trust.roles      # (unset on this deploy)
        else:
            roles = self.context.roles    # -> the roles in the CALLER's token
        self.client.trusts.create(
            trustor_user=trustor_user_id, project=trustor_project_id,
            trustee_user=trustee_user, impersonation=True, role_names=roles)

Two facts follow directly from that code, and they are the whole model:

A. The TRUSTOR is the identity that issued openstack coe cluster create (self.session is the request-context client). The Keystone policy identity:create_trust = "user_id:%(trust.trustor_user_id)s" is therefore satisfied by construction -- caller == trustor. (So the create_trust 403 is NOT a trustor-identity policy failure.)

B. The DELEGATED ROLES are self.context.roles -- the roles present in the CALLER's token on trustor_project_id. Keystone's create_trust REFUSES to delegate any role the trustor does not actually hold on that project (a trust cannot grant more than the trustor has). CONF.trust.roles is unset here, so magnum delegates the caller's token roles verbatim -- whatever they are.
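
Facts A and B can be modelled in a few lines of Python. This is a minimal sketch, not magnum or Keystone code: select_delegated_roles mirrors the if/else in create_trust above, and keystone_would_accept is a hypothetical stand-in for Keystone's rule that a trust cannot carry a role the trustor does not hold on the project. Both function names and the example role lists are assumptions made for illustration.

```python
def select_delegated_roles(conf_trust_roles, context_roles):
    """Mirror the branch in create_trust: explicit config wins, else token roles."""
    return list(conf_trust_roles) if conf_trust_roles else list(context_roles)


def keystone_would_accept(delegated_roles, trustor_project_roles):
    """Hypothetical stand-in for Keystone's rule: every delegated role must be
    one the trustor actually holds on the trust's project."""
    return set(delegated_roles) <= set(trustor_project_roles)


# CONF.trust.roles unset: the caller's token roles are delegated verbatim.
tenant_roles = ["member", "load-balancer_member"]
delegated = select_delegated_roles([], tenant_roles)
print(delegated, keystone_would_accept(delegated, tenant_roles))

# A token role that is not a grant on the project makes the trust fail.
delegated = select_delegated_roles([], ["admin", "member"])
print(delegated, keystone_would_accept(delegated, ["member"]))
```

The second call is the shape of the create_trust refusal discussed in D.2: the delegated set is taken from the token, but acceptance is judged against the trustor's grants on the project.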


D.2 Why the 2026-06-09 single-consumer path "worked" (and why we retired it)


On 2026-06-09 the cluster was created by ADMIN, scoped to the admin-owned capi-mgmt project. Admin trivially holds (or cloud-admin-bypasses) every role it delegates to itself, so create_trust never exercised the delegation constraint. That is a SINGLE-CONSUMER shortcut: one privileged operator standing in for the tenant. It proves the driver/CAPI plumbing but NOT the multi-tenant identity path, because in the real product the cluster creator is a TENANT, not the cloud operator.

The admin-in-capi-mgmt attempt on 2026-07-01 then 403'd at create_trust. That mixed scope (admin user, capi-mgmt project) does not yield a token whose roles are cleanly delegatable on capi-mgmt -- and, under D-064, admin scoped to capi-mgmt is a RESTRICTED identity there (it is not cloud_admin outside the admin domain; list_role_assignments 403s in that scope, confirmed live). It is the wrong identity for the tenant model on two counts: it is the operator, and its token roles are not the tenant delegatable set.


D.3 The multi-tenant rule (what identity must create the cluster)


RULE: a Magnum cluster is created by the TENANT's own project-scoped identity, whose token carries EXACTLY the delegatable tenant roles -- member and load-balancer_member (and reader where used) -- and NOT admin.

Rationale, straight from D.1.B:

  • The trust delegates context.roles. If the creator's token carries admin, magnum tries to delegate admin into the trust, and Keystone refuses a trust that grants a role the trustor does not hold as a delegatable grant on the project. Even if Keystone accepted it, delegating admin into a long-lived cluster credential is a privilege-escalation footgun (the trustee impersonates the trustor with impersonation=True). The tenant set (member + load-balancer_member) is the correct, least-privilege delegation.
  • load-balancer_member MUST be in the creator's token: the magnum-capi-helm driver provisions an Octavia LB for the apiserver, and the trust must carry Octavia authority or CAPO 403s at LB reconcile (D-039). This is exactly why D-039 grants the trustor load-balancer_member on the cluster project.
  • member provides the compute/network/volume authority the cluster's CCM/CSI need via the trust.

WHO THIS IS, per the onboarding model (tenant-onboarding-runbook Stage 2/4):

  • The tenant's SERVICE identity: <client>-ci / <client>-svc, holding member + load-balancer_member on <client>-prod, authenticating with its UNRESTRICTED application credential (the app cred is required so the driver can mint the per-cluster CAPO child cred -- D-039 / onboarding Stage 4).
  • Equivalently a tenant human user with member + load-balancer_member on the project, but the service/app-cred identity is the production path (Jenkins/CI).
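
For the service-identity path, the environment handed to the openstack client can be sketched as below. This is an illustration under assumptions, not the onboarding tooling: the helper names are hypothetical, and how the id and secret are read from the 0600 file is left out because that file's format is not specified here. The OS_* variable names are the standard client ones for application-credential auth. No OS_PROJECT_* variables are set, on the understanding that an application credential is already bound to the project it was created in.

```python
import os
import stat


def is_owner_only(path):
    """True if the file mode is exactly 0600 (owner read/write, nothing else)."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o600


def app_cred_env(auth_url, cred_id, cred_secret):
    """Build the OS_* variables for application-credential authentication."""
    return {
        "OS_AUTH_TYPE": "v3applicationcredential",
        "OS_AUTH_URL": auth_url,
        "OS_APPLICATION_CREDENTIAL_ID": cred_id,
        "OS_APPLICATION_CREDENTIAL_SECRET": cred_secret,
    }
```

A CI job would typically refuse to proceed when is_owner_only returns False for the credential file, then merge app_cred_env(...) into the environment of the openstack command.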

The operator (admin / cloud_admin) does NOT create tenant clusters. The capi-mgmt project is the MANAGEMENT-plane project (where the CAPI mgmt cluster VM and the operator's own D-039 roles live for the mgmt cluster itself); tenant clusters are created in the TENANT's project by the TENANT's identity.


D.4 Trustor role-set validation (run before the create)


Confirm the creating identity's TOKEN carries the delegatable set and nothing that cannot be delegated. Run AS the tenant creator identity (app cred or password):

# as the tenant service identity, project-scoped to <client>-prod
openstack token issue -f value -c user_id -c project_id   # confirm scope
# roles in THIS token == what magnum will delegate (context.roles):
openstack role assignment list --user <this-user-id> \
  --project <tenant-project-id> --effective --names -f value -c Role | sort

GATE: the role set is a subset of { member, load-balancer_member, reader }, and INCLUDES load-balancer_member. If admin appears, this is the wrong identity -- do not create with it.
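
The gate can be expressed as a small check over the role names printed by the command above. A minimal sketch, assuming the names arrive one per line; check_trustor_roles and the two constants are hypothetical names, not part of any OpenStack tool.

```python
ALLOWED = frozenset({"member", "load-balancer_member", "reader"})
REQUIRED = frozenset({"load-balancer_member"})


def check_trustor_roles(role_names):
    """Return a list of gate violations; an empty list means the gate passes."""
    roles = {r.strip() for r in role_names if r.strip()}
    problems = []
    extra = sorted(roles - ALLOWED)
    if extra:
        problems.append("outside the tenant delegatable set: " + ", ".join(extra))
    missing = sorted(REQUIRED - roles)
    if missing:
        problems.append("missing required role: " + ", ".join(missing))
    return problems


print(check_trustor_roles(["member", "load-balancer_member"]))
print(check_trustor_roles(["admin", "member"]))
```

Returning the violations rather than a bare boolean lets a CI step print exactly why an identity was rejected before any create is attempted.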

Note: a tenant/app-cred identity cannot run role assignment list for other users (policy 403, by design). Query only its own assignment, or read it as admin beforehand during onboarding.


D.5 The create (tenant identity), and the trust it produces


# authenticate as the tenant service identity via its app cred (onboarding Stage 4)
#   OS_AUTH_TYPE=v3applicationcredential + the app cred id/secret from the 0600 file
# then, project-scoped to the tenant project:
openstack coe cluster create <cluster-name> \
  --cluster-template <tenant-template> \
  --keypair <tenant-key> \
  --master-count 1 --node-count 2

# verify the trust was created and carries the tenant roles:
openstack coe cluster show <cluster-name> -f value -c status -c trustee_user_id
#   status -> CREATE_IN_PROGRESS (past trustee+trust), NOT CREATE_FAILED at ~3s.

Expected: create_user (D-064) AND create_trust both pass, because the creator is the trustor and its token roles (member + load-balancer_member) are cleanly delegatable on the tenant project. The driver then proceeds to helm/CAPI provisioning.
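
The "not CREATE_FAILED at ~3s" check can be automated with a short poll. The sketch below is an assumption-laden illustration: get_status is a caller-supplied function that returns the cluster status string (for example by wrapping the cluster show command above), and the 30-second window and 3-second interval are arbitrary choices, not values taken from magnum.

```python
import time


def wait_past_trust_step(get_status, timeout_s=30.0, interval_s=3.0,
                         sleep=time.sleep):
    """Poll until the cluster fails, completes, or survives the window.

    Returns False as soon as a *_FAILED status is seen, True otherwise.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status.endswith("_FAILED"):
            return False
        if status == "CREATE_COMPLETE" or waited >= timeout_s:
            return True
        sleep(interval_s)
        waited += interval_s
```

A True result only means the cluster got past the early trustee and trust steps within the window; later helm/CAPI failures still have to be watched for separately.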


D.6 Recommendations


  • Cluster-create is a TENANT self-service operation, performed by the tenant's app-cred identity carrying member + load-balancer_member on the tenant project. Wire it into the tenant CI (Jenkins) path (onboarding Stage 7), never into an operator admin workflow.
  • Optionally pin CONF.trust.roles = member,load-balancer_member in magnum.conf (via the D-037 conf.d mechanism) to make the delegated set EXPLICIT and independent of whatever roles happen to be in the caller's token -- a hardening that removes the "wrong token roles" failure mode entirely. Decide as a tracked item; unset (inherit context.roles) is the upstream default and works when the creator identity is correct.
  • The management-plane capi-mgmt project + the operator's D-039 roles there remain for the MGMT cluster; they are not the tenant cluster-create path.
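
If the pinning option is taken up, the fragment and its effect might look like the sketch below. The fragment text is hypothetical (the actual file name and placement depend on the D-037 conf.d mechanism), and configparser is used here only to approximate how the comma-separated roles option in the [trust] section would be read; magnum itself parses its configuration through oslo.config.

```python
import configparser

# Hypothetical conf.d fragment; file name and location depend on D-037.
FRAGMENT = """
[trust]
roles = member,load-balancer_member
"""


def pinned_trust_roles(ini_text):
    """Return the [trust] roles list from an INI fragment, or [] if unset."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    raw = parser.get("trust", "roles", fallback="")
    return [r.strip() for r in raw.split(",") if r.strip()]


print(pinned_trust_roles(FRAGMENT))  # ['member', 'load-balancer_member']
```

An empty result corresponds to the upstream default described above, where magnum falls back to the roles in the caller's token.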

D.7 Open validation item


This appendix establishes the model from the magnum source and the identity design. The live behavioral confirmation on THIS cloud -- create a cluster as a tenant app-cred identity (member + load-balancer_member) and observe create_trust succeed -- is the acceptance step, and folds into onboarding Stage 7 (currently [PENDING]) and the D-011 gate. Until run, D.3 is design-derived-from-source, not yet live-verified on the multi-tenant path.

UPDATE 2026-07-01: onboarding Stages 1-4 VALIDATED live as tenant acme -- manager self-service, app-cred cluster-creator with member+load-balancer_member, tenant L3. Stage 5 template = corrected-pending (image-by-UUID). Stage 6 create_trust = the outstanding item; the create_user half (D-064) is confirmed live.