openstack-caracal-ipv4 / runbooks / tenant-onboarding-v2-DRAFT.md

Tenant Onboarding v2 -- multi-tenant self-service + cluster creation (DRAFT)

STATUS: DRAFT, built 2026-07-01 from the live multi-tenant validation run (tenant acme). SUPERSEDES the 2026-06-22 tenant-onboarding-runbook.md identity model: that draft used an admin-on-domain tenant administrator (which needed an out-of-band bandaid and could not self-service reliably). This v2 uses the SCS Domain Manager persona (manager role) per D-051 / D-064, validated end-to-end via CLI this session.

VALIDATION LEGEND (honesty markers -- do not finalize past what is proven):

  • [VALIDATED 2026-07-01] confirmed live this session (output captured)
  • [CORRECTED-PENDING] a failure was root-caused and the corrected block is staged, but the corrected form has NOT been re-run yet
  • [PROCEDURE-PENDING] design-derived (appendix-D from magnum source); NOT yet live-verified

DESIGN REFS: D-051 (Domain Manager persona), D-064 (policy reconciliation to scs-0302 + create-op templating fix), D-039 (per-cluster app-cred carries load-balancer_member), appendix-C (identity/RBAC reference), appendix-D (Magnum cluster-create trust model).


Validation status (this draft)


| Stage | What | Status |
|-------|------|--------|
| 0 | Operator pre-flight (perimeters exist) | [VALIDATED 2026-07-01] |
| 1 | Operator provisions domain + manager + quotas | [VALIDATED 2026-07-01] |
| 2 | Manager self-services project + svc-user + grants | [VALIDATED 2026-07-01] (D-064 G3) |
| 2.5 | Tenant isolation (anti-escalation + cross-domain) | [VALIDATED 2026-07-01] (+ finding, below) |
| 3 | Service user mints app cred + keypair | [VALIDATED 2026-07-01] |
| 4 | Tenant builds L3 (net/subnet/router/ext-gw) | [VALIDATED 2026-07-01] |
| 5 | Tenant creates its own cluster template | [CORRECTED-PENDING] (image-by-UUID fix staged) |
| 6/7 | Tenant creates cluster (trustee + trust) | [PROCEDURE-PENDING] (appendix-D; trust step not yet confirmed) |

FINDING (Stage 2.5d, 2026-07-01): on THIS deployment the manager's domain list returns ONLY its own domain -- keystone scope-filters the result even though the D-064 policy authorizes list_domains for a manager. So the "manager can enumerate all domain names cloud-wide" limitation documented in appendix-C / D-064 does NOT manifest here; isolation is TIGHTER than documented. appendix-C must be corrected to state the observed own-domain-only behavior (see the appendix-C change in this package).
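The own-domain-only observation can be turned into a pass/fail check instead of being read off the screen. The following is a minimal sketch, not part of the validated run: only_own_domain is a hypothetical helper name, and it assumes the domain names arrive one per line on stdin (as `openstack domain list -f value -c Name` prints them).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the validated run).
# Reads domain names, one per line, on stdin. Succeeds only when the
# non-empty lines are exactly one entry equal to the expected domain.
only_own_domain() {
  local expected="$1" line count=0 bad=0
  # "|| [ -n "$line" ]" keeps a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    [ -n "$line" ] || continue
    count=$((count + 1))
    [ "$line" = "$expected" ] || bad=1
  done
  [ "$count" -eq 1 ] && [ "$bad" -eq 0 ]
}
```

As the manager identity this would be used as `openstack domain list -f value -c Name | only_own_domain "$CLIENT" && echo "own-domain-only holds"`. An empty list fails the check on purpose, so an auth error is not mistaken for tight isolation.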


Model (operator provides vs. tenant self-services) -- v2


OPERATOR PROVIDES (minimal):

  • A Keystone DOMAIN per client.
  • ONE domain-admin account holding the manager role on that domain (SCS Domain Manager).
  • Domain/project quotas (the envelope).
  • The shared perimeters: public flavor catalog, public Magnum-ready image, external network (provider-ext) + FIP pool, the Vault root CA.

TENANT SELF-SERVICES (via the manager, then a manager-created service identity):

  • Their own projects, users, and role assignments WITHIN their domain (member + load-balancer_member only -- manager cannot grant admin or manager; anti-escalation).
  • Their own application credential (the cluster-creator identity).
  • Their own network / subnet / router / external gateway.
  • Their own Magnum cluster template (visible by OWNERSHIP -- created in their project).
  • Their own clusters.

TRUST BOUNDARY: the operator never holds the tenant's working credentials. The manager owns its domain's users (including resetting the service user's password). Admin never mutates tenant resources.

CONVENTIONS (carried; all exercised this session):

  • env-clean before every identity switch: for v in $(env|awk -F= '/^OS_/{print $1}'); do unset "$v"; done
  • OS_CACERT MUST be threaded into every session (Vault-issued keystone cert). A stripped CA yields an opaque SSL error -- the Stage 2 pre-auth guard fails loud instead.
  • Subshell-isolate identity switches ( ... ) so the operator admin shell is untouched.
  • Dynamic resolution only -- never hardcode ids (domain/project/image regenerate per rebuild).
  • Verify-before-mutate; capture real command output and test the RESULT (not head||echo).
  • Secrets straight to 0600 files under $HOME (snap confinement: never /tmp).
  • ASCII-only committed files; LF endings.
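The env-clean and subshell-isolation conventions combine into one small wrapper. This is a sketch under assumptions: clean_os_env and run_isolated are hypothetical names that the blocks in this runbook do not use; the blocks inline the same loop instead.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the conventions above.

# Unset every OS_* variable in the current shell context.
clean_os_env() {
  local v
  for v in $(env | awk -F= '/^OS_/{print $1}'); do unset "$v"; done
}

# Run a command with a scrubbed OS_* environment inside a subshell, so the
# caller's variables (for example the operator admin session) stay untouched.
run_isolated() {
  ( clean_os_env; "$@" )
}
```

Because the unset happens inside the parentheses, the operator shell keeps its admin variables after the call returns. That is the property the identity switches in Stages 2 to 6 depend on.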

AS-RUN REFERENCE (tenant acme, 2026-07-01 -- ids are per-run, shown for traceability only): domain acme=7b65248e33e041c78793b7d0939ef631; project acme-prod=780fa2f0761541ba8bc283c346b6af4d; svc-user acme-svc=af73b67aa8b24b07904f4d463c1528b2; app-cred=930b0b027b0e465f89f04ef53c4db18c (unrestricted, 86-char secret); net=193a8915..., subnet 10.20.24.0/24, router SNAT 10.12.7.194.

================================================================================

STAGE 0 -- Operator pre-flight (READ-ONLY) [VALIDATED 2026-07-01]

================================================================================

Confirms the shared perimeters exist and the D-064 override is live BEFORE onboarding.

--- BEGIN block: onboard-v2-00-preflight (RUN -- jumphost, admin) ---
for v in $(env|awk -F= '/^OS_/{print $1}'); do unset "$v"; done
source ~/admin-openrc
CLIENT=; fail=0

# D-064 override MUST be live (manager self-service depends on it)
juju status keystone -m openstack --format=yaml 2>/dev/null | python3 -c '
import sys,yaml; m=yaml.safe_load(sys.stdin)["applications"]["keystone"]["units"]["keystone/0"].get("workload-status",{}).get("message","")
print("keystone:",m); sys.exit(0 if m.startswith("PO:") else 1)' || { echo " keystone not PO: active -- STOP "; fail=1; }

# roles present
for R in manager member load-balancer_member reader; do
  openstack role show "$R" -f value -c id 2>&1 | grep -qE '^[0-9a-f]{32}$' \
    && echo "role $R ok" || { echo " role $R MISSING "; fail=1; }
done

# public Magnum-ready image (kube_version + os_distro + public)
IMG=$(openstack image list --public -f value -c ID -c Name 2>&1 | awk '/kube/{print $1;exit}')
openstack image show "$IMG" -f json 2>&1 | python3 -c '
import sys,json; d=json.load(sys.stdin); p=d.get("properties",{})
ok=d.get("visibility")=="public" and p.get("kube_version") and p.get("os_distro")
print("image:",d.get("name"),d.get("visibility"),p.get("kube_version"),p.get("os_distro"))
sys.exit(0 if ok else 1)' || { echo " kube image not public / missing props -- STOP "; fail=1; }

# a Magnum-capable public flavor (>=2 vcpu, >=2048 MB)
openstack flavor list --public -f json 2>&1 | python3 -c '
import sys,json; fs=json.load(sys.stdin)
ok=[f for f in fs if (f.get("VCPUs") or 0)>=2 and (f.get("RAM") or 0)>=2048]
print("magnum-capable flavors:",[f["Name"] for f in ok]); sys.exit(0 if ok else 1)' || fail=1

# external net + FIP capacity (note: ip availability columns print alphabetically -- total then used)
openstack network show provider-ext -f value -c id 2>&1 | grep -qE '^[0-9a-f-]{36}$' \
  && echo "provider-ext ok" || { echo " provider-ext missing "; fail=1; }

# clean slate for this client
openstack domain show "$CLIENT" -f value -c id 2>&1 | grep -qE '^[0-9a-f]{32}$' \
  && { echo " $CLIENT domain EXISTS -- decide reuse/clean "; fail=1; } || echo "no $CLIENT domain (clean)"

echo "=== PRE-FLIGHT $([ $fail -eq 0 ] && echo PASS || echo FAIL) ==="
--- END block ---

GATE: PRE-FLIGHT PASS.

================================================================================

STAGE 1 -- Operator provisions domain + manager + quotas [VALIDATED 2026-07-01]

================================================================================

The ONLY operator-side tenant provisioning. Everything after is tenant self-service.

--- BEGIN block: onboard-v2-01-operator-domain (RUN -- jumphost, admin) ---
for v in $(env|awk -F= '/^OS_/{print $1}'); do unset "$v"; done
source ~/admin-openrc
CLIENT=; fail=0

openstack domain create --description "Client: ${CLIENT}" "$CLIENT" >/dev/null 2>&1
DOM=$(openstack domain show "$CLIENT" -f value -c id 2>&1)
echo "$DOM" | grep -qE '^[0-9a-f]{32}$' && echo "domain $CLIENT id=$DOM" || { echo " domain FAIL "; fail=1; }

# manager account (NOT admin -- admin is not safely domain-confinable; the persona is manager)
MPW=$(python3 -c 'import secrets;print(secrets.token_urlsafe(24))')
openstack user create --domain "$DOM" --password "$MPW" \
  --description "${CLIENT} domain manager (SCS Domain Manager; D-051/D-064)" \
  "${CLIENT}-domain-admin" >/dev/null 2>&1
MUID=$(openstack user show "${CLIENT}-domain-admin" --domain "$DOM" -f value -c id 2>&1)
echo "$MUID" | grep -qE '^[0-9a-f]{32}$' && echo "manager user id=$MUID" || { echo " user FAIL "; fail=1; }

openstack role add --domain "$DOM" --user "$MUID" manager 2>&1

# confine check: EXACTLY one assignment (manager on this domain), nothing else
echo "manager assignments (expect exactly: manager on $CLIENT):"
openstack role assignment list --user "$MUID" --names -f value 2>&1 | sed 's/^/ /'

# stash the manager credential (tenant handoff) -> 0600 file
MF="$HOME/${CLIENT}-domain-admin-cred.txt"; umask 077; : > "$MF"; chmod 600 "$MF"
printf 'domain=%s\ndomain_id=%s\nusername=%s-domain-admin\nuser_id=%s\npassword=%s\nauth_url=https://:5000/v3\n' \
  "$CLIENT" "$DOM" "$CLIENT" "$MUID" "$MPW" > "$MF"; chmod 600 "$MF"; unset MPW
echo "credential -> $MF"

# quotas (the envelope) -- span nova/neutron/cinder; set explicitly (documents the record)
openstack quota set "${CLIENT}-prod" --instances 10 --cores 20 --ram 51200 2>/dev/null || true
# NOTE: run quota AFTER the manager creates the project (Stage 2), OR set on the project once it exists.

echo "=== STAGE 1 $([ $fail -eq 0 ] && echo PASS || echo FAIL) ==="
--- END block ---

GATE: STAGE 1 PASS -- domain created, manager holds EXACTLY manager on the domain (no other assignment), credential stashed 0600.

DELIVER TO CLIENT: Horizon URL, domain name, <client>-domain-admin + password, Vault CA. Nothing else. (Quotas: set on <client>-prod after Stage 2 creates it, or pre-create the project as operator; the 2026-07-01 run set quotas post-project.)
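Later stages read fields back out of the 0600 credential file with `awk -F= '{print $2}'`, which would truncate a value that itself contains "=". The generated passwords come from token_urlsafe and never do, but a reader that splits only on the first "=" removes the assumption. A minimal sketch; cred_get is a hypothetical helper name, and the key=value layout is the one the Stage 1 printf writes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not used by the runbook blocks).
# cred_get <key> <file>: print everything after the first "=" on the
# first line that starts with "<key>=". Prints nothing if the key is absent.
cred_get() {
  local key="$1" file="$2"
  awk -v k="$key" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }' "$file"
}
```

For example, `DOM=$(cred_get domain_id "$MF")` returns the same value as the awk one-liner used in Stage 2. The prefix match on "key=" also keeps `domain` from matching the `domain_id` line.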

================================================================================

STAGE 2 -- Manager self-service (project + service user + grants) [VALIDATED 2026-07-01]

================================================================================

This is the D-064 G3 acceptance: the manager performs, VIA CLI, exactly the identity operations the pre-D-064 policy rejected (create_project / create_user / create_grant), and is correctly DENIED admin-grant and cross-domain access.

--- BEGIN block: onboard-v2-02-manager-selfservice (RUN -- jumphost, AS manager) ---
CLIENT=
MF="$HOME/${CLIENT}-domain-admin-cred.txt"
CA="$HOME/vault-init/vault-ca-root.pem"   # confirm current path from admin-openrc
[ -s "$MF" ] || { echo "missing $MF -- run Stage 1"; return 2>/dev/null||exit 1; }

(
for v in $(env|awk -F= '/^OS_/{print $1}'); do unset "$v"; done
DOM=$(awk -F= '/^domain_id=/{print $2}' "$MF")
export OS_AUTH_URL="https://:5000/v3" OS_IDENTITY_API_VERSION=3 OS_CACERT="$CA"
export OS_USERNAME="${CLIENT}-domain-admin" OS_USER_DOMAIN_ID="$DOM" OS_DOMAIN_ID="$DOM"
export OS_PASSWORD="$(awk -F= '/^password=/{print $2}' "$MF")"
fail=0

# HARDENING GUARD: TLS trust material present before ANY call (fails loud, not opaque SSL)
[ -n "$OS_CACERT" ] && [ -s "$OS_CACERT" ] || { echo " OS_CACERT unset/missing -- STOP "; exit 3; }
openssl x509 -in "$OS_CACERT" -noout -checkend 0 >/dev/null 2>&1 || { echo " CA expired/unreadable -- STOP "; exit 3; }

scope=$(openstack token issue -f value -c domain_id 2>&1)
[ "$scope" = "$DOM" ] && echo "manager authenticated, domain-scoped" || { echo " auth FAIL: $scope "; exit 1; }

# 2.1 create_project (PASS)
openstack project create --domain "$DOM" --description "${CLIENT} production" "${CLIENT}-prod" >/dev/null 2>&1
PID=$(openstack project show "${CLIENT}-prod" --domain "$DOM" -f value -c id 2>&1)
echo "$PID" | grep -qE '^[0-9a-f]{32}$' && echo "project ${CLIENT}-prod=$PID (create_project PASS)" || { echo " create_project FAIL "; fail=1; }

# 2.2 create_user (service account) (PASS)
SPW=$(python3 -c 'import secrets;print(secrets.token_urlsafe(24))')
openstack user create --domain "$DOM" --password "$SPW" --description "${CLIENT} CI/service (cluster creator)" "${CLIENT}-svc" >/dev/null 2>&1
SUID=$(openstack user show "${CLIENT}-svc" --domain "$DOM" -f value -c id 2>&1)
echo "$SUID" | grep -qE '^[0-9a-f]{32}$' && echo "user ${CLIENT}-svc=$SUID (create_user PASS)" || { echo " create_user FAIL "; fail=1; }

# 2.3 grant member + load-balancer_member on the project (create_grant) -- capture RESULT, not exit
for R in member load-balancer_member; do
  openstack role add --project "$PID" --user "$SUID" "$R" 2>&1
  openstack role assignment list --project "$PID" --user "$SUID" --names -f value -c Role 2>&1 | grep -qw "$R" \
    && echo "granted $R" || { echo " grant $R FAIL "; fail=1; }
done

# 2.4 anti-escalation: admin grant MUST be denied (verify it did NOT take -- ground truth, not error text)
openstack role add --project "$PID" --user "$SUID" admin >/dev/null 2>&1
if openstack role assignment list --project "$PID" --user "$SUID" --names -f value -c Role 2>&1 | grep -qw admin; then
  echo " ESCALATION: manager granted admin -- POLICY FAILURE, STOP "; fail=1
else
  echo "admin grant DENIED (anti-escalation holds)"
fi

echo "=== STAGE 2 $([ $fail -eq 0 ] && echo PASS || echo FAIL) ==="
unset SPW OS_PASSWORD
)
--- END block ---

GATE: STAGE 2 PASS -- 2.1/2.2/2.3 succeed, 2.4 DENIED.

NOTE (service-user password): the Stage-2 password is ephemeral inside the subshell. Stage 3 resets it (as the MANAGER) to a fresh 0600-stashed value -- the manager owns its domain's users.

--- BEGIN block: onboard-v2-02b-isolation (CHECK read-only -- AS manager) ---
# Cross-domain isolation. Resolve REAL other-domain targets AS ADMIN first (prove they exist),
# then confirm the manager is refused. Keystone returns "does not exist" for cross-scope reads
# (no-enumeration-oracle design) -- treat does-not-exist on a proven-real resource as isolation-holding.
source ~/admin-openrc
OTHER_DOM=$(openstack domain show admin_domain -f value -c id 2>&1)
OTHER_USER=$(openstack user list --domain "$OTHER_DOM" -f value -c ID 2>&1 | head -1)
OTHER_PROJ=$(openstack project show admin --domain admin_domain -f value -c id 2>&1)
(
for v in $(env|awk -F= '/^OS_/{print $1}'); do unset "$v"; done
DOM=$(awk -F= '/^domain_id=/{print $2}' "$MF")
export OS_AUTH_URL="https://:5000/v3" OS_IDENTITY_API_VERSION=3 OS_CACERT="$CA"
export OS_USERNAME="${CLIENT}-domain-admin" OS_USER_DOMAIN_ID="$DOM" OS_DOMAIN_ID="$DOM"
export OS_PASSWORD="$(awk -F= '/^password=/{print $2}' "$MF")"
refused(){ echo "$1" | grep -qiE 'forbidden|not authorized|403|could not be found|does not exist|No .* exists|HTTP 40[34]'; }
out=$(openstack user show "$OTHER_USER" 2>&1); refused "$out" && echo "cross-domain user read DENIED/hidden" || echo "*** GAP ***"
out=$(openstack project show "$OTHER_PROJ" 2>&1); refused "$out" && echo "admin project read DENIED/hidden" || echo "*** GAP ***"
echo "manager domain list (observed own-domain-only on this cloud, 2026-07-01):"
openstack domain list -f value -c Name 2>&1 | sed 's/^/ /'
)
--- END block ---

GATE: both cross-domain reads DENIED/hidden; domain list shows only the client's own domain (the appendix-C names-only-leak does NOT manifest here -- tighter isolation; see appendix-C fix).
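The isolation check treats two different outcomes as acceptable: an explicit refusal (403) and a "does not exist" answer for a resource that admin has proven is real. A sketch that keeps the two apart in the captured record, under assumptions: classify_read is a hypothetical name, and the patterns are a split of the ones the block's refused() already uses, not a complete list of client error texts.

```shell
#!/usr/bin/env bash
# Hypothetical classifier for captured CLI output of a cross-domain read.
#   DENIED -- explicit policy refusal
#   HIDDEN -- not-found style answer (isolation holding without an oracle)
#   GAP    -- anything else, i.e. the read may have succeeded
classify_read() {
  local out="$1"
  if printf '%s\n' "$out" | grep -qiE 'forbidden|not authorized|HTTP 403'; then
    echo DENIED
  elif printf '%s\n' "$out" | grep -qiE 'could not be found|does not exist|No .* exists|HTTP 404'; then
    echo HIDDEN
  else
    echo GAP
  fi
}
```

HIDDEN only counts as isolation-holding when the target id was resolved as admin first, which is why the block looks up OTHER_USER and OTHER_PROJ before switching identity.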

================================================================================

STAGE 3 -- Service user mints app cred + keypair (cluster-creator identity) [VALIDATED 2026-07-01]

================================================================================

<client>-svc is the cluster creator. Its token roles are EXACTLY member + load-balancer_member (from Stage 2.3) -- the clean delegatable set the trust needs (appendix-D). App cred MUST be unrestricted (the driver mints a per-cluster CAPO child cred; D-039). Secrets -> 0600 files.

--- BEGIN block: onboard-v2-03-appcred-keypair (RUN -- jumphost) ---
CLIENT=; CA="$HOME/vault-init/vault-ca-root.pem"
MF="$HOME/${CLIENT}-domain-admin-cred.txt"; SF="$HOME/${CLIENT}-svc-cred.txt"; ACF="$HOME/${CLIENT}-svc-appcred.txt"
source ~/admin-openrc
DOM=$(openstack domain show "$CLIENT" -f value -c id 2>&1)
PID=$(openstack project show "${CLIENT}-prod" --domain "$DOM" -f value -c id 2>&1)
SUID=$(openstack user show "${CLIENT}-svc" --domain "$DOM" -f value -c id 2>&1)

# 3.1 manager sets the svc password -> 0600 (manager owns its domain's users; admin does NOT)
SPW=$(python3 -c 'import secrets;print(secrets.token_urlsafe(24))')
(
for v in $(env|awk -F= '/^OS_/{print $1}'); do unset "$v"; done
export OS_AUTH_URL="https://:5000/v3" OS_IDENTITY_API_VERSION=3 OS_CACERT="$CA"
export OS_USERNAME="${CLIENT}-domain-admin" OS_USER_DOMAIN_ID="$DOM" OS_DOMAIN_ID="$DOM"
export OS_PASSWORD="$(awk -F= '/^password=/{print $2}' "$MF")"
openstack user set --password "$SPW" "$SUID" 2>&1 && echo "svc password set" || echo " set FAIL "
)
umask 077; : > "$SF"; chmod 600 "$SF"
printf 'username=%s-svc\nuser_id=%s\nuser_domain_id=%s\nproject_id=%s\nauth_url=https://:5000/v3\npassword=%s\n' \
  "$CLIENT" "$SUID" "$DOM" "$PID" "$SPW" > "$SF"; chmod 600 "$SF"; unset SPW

# 3.2 svc self-mints UNRESTRICTED app cred (project-scoped)
(
for v in $(env|awk -F= '/^OS_/{print $1}'); do unset "$v"; done
export OS_AUTH_URL="https://:5000/v3" OS_IDENTITY_API_VERSION=3 OS_CACERT="$CA"
export OS_USERNAME="${CLIENT}-svc" OS_USER_DOMAIN_ID="$DOM" OS_PROJECT_ID="$PID"
export OS_PASSWORD="$(awk -F= '/^password=/{print $2}' "$SF")"
chk=$(openstack token issue -f value -c project_id 2>&1)
[ "$chk" = "$PID" ] && echo "svc authenticated, project-scoped" || { echo " svc auth FAIL: $chk "; exit 1; }
umask 077; : > "$ACF"; chmod 600 "$ACF"
openstack application credential create "${CLIENT}-cluster-cred" --unrestricted \
  --description "${CLIENT} cluster-creator" -f shell > "$ACF" 2>&1
grep -qE '^id=' "$ACF" && { chmod 600 "$ACF"; echo "app cred minted -> $ACF"; \
  awk -F'"' '/^secret=/{print " secret length (measured): "length($2)}' "$ACF"; \
  grep -E '^unrestricted=|^project_id=' "$ACF" | sed 's/^/ /'; } || { echo " appcred FAIL "; cat "$ACF"; }
)

# 3.3 svc creates keypair -> 0600
(
for v in $(env|awk -F= '/^OS_/{print $1}'); do unset "$v"; done
export OS_AUTH_URL="https://:5000/v3" OS_IDENTITY_API_VERSION=3 OS_CACERT="$CA"
export OS_USERNAME="${CLIENT}-svc" OS_USER_DOMAIN_ID="$DOM" OS_PROJECT_ID="$PID"
export OS_PASSWORD="$(awk -F= '/^password=/{print $2}' "$SF")"
KF="$HOME/${CLIENT}-key.pem"; umask 077
openstack keypair create "${CLIENT}-key" > "$KF" 2>&1
head -1 "$KF" | grep -q 'PRIVATE KEY' && { chmod 600 "$KF"; echo "keypair -> $KF"; } || { echo " keypair FAIL "; cat "$KF"; }
)
--- END block ---

GATE: app cred unrestricted, project_id = -prod, secret length measured (86 on this cloud -- do NOT assert; measure); keypair present, 0600.
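The "measure, do not assert" rule for the secret can be done without echoing the secret itself. A minimal sketch; appcred_field and appcred_secret_length are hypothetical helper names, and the key="value" layout is the `-f shell` format the Stage 3 block already parses with awk -F'"'.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the app-cred file written by "-f shell".

# appcred_field <key> <file>: print the double-quoted value of <key>.
appcred_field() {
  awk -F'"' -v k="$1" 'index($0, k "=\"") == 1 { print $2; exit }' "$2"
}

# appcred_secret_length <file>: print only the LENGTH of the secret,
# so the secret itself never reaches the terminal or a log.
appcred_secret_length() {
  local s
  s=$(appcred_field secret "$1")
  printf '%s\n' "${#s}"
}
```

`appcred_secret_length "$ACF"` would print the measured length for the record (86 on the 2026-07-01 run, per the gate above). A result of 0 indicates the create did not return a secret.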

================================================================================

STAGE 4 -- Tenant builds L3 (net/subnet/router/ext-gw) [VALIDATED 2026-07-01]

================================================================================

The tenant (app-cred identity) self-serves its own L3, INCLUDING the external gateway. FINDING (2026-07-01): a non-admin app-cred identity CAN set the external gateway on this cloud (confirms the onboarding Stage-5 finding for the automation identity, not just a Horizon human).

--- BEGIN block: onboard-v2-04-network (RUN -- jumphost; L3 as app cred, checks as admin) ---
CLIENT=; CA="$HOME/vault-init/vault-ca-root.pem"; ACF="$HOME/${CLIENT}-svc-appcred.txt"
TENANT_CIDR=10.20..0/24   # pick from the tenant pool; MUST NOT collide (checked below)

# 4.0 CIDR collision pre-check (operator IPAM concern; read-only as admin)
source ~/admin-openrc
if openstack subnet list -f value -c Subnet 2>&1 | grep -qw "$TENANT_CIDR"; then
  echo " $TENANT_CIDR IN USE -- pick another, STOP "; COLL=1
else
  echo "$TENANT_CIDR free"; COLL=0
fi

# 4.1-4.5 build L3 as the app-cred identity
[ "$COLL" = 0 ] && (
for v in $(env|awk -F= '/^OS_/{print $1}'); do unset "$v"; done
export OS_AUTH_TYPE=v3applicationcredential OS_AUTH_URL="https://:5000/v3" OS_IDENTITY_API_VERSION=3 OS_CACERT="$CA"
export OS_APPLICATION_CREDENTIAL_ID=$(awk -F'"' '/^id=/{print $2}' "$ACF")
export OS_APPLICATION_CREDENTIAL_SECRET=$(awk -F'"' '/^secret=/{print $2}' "$ACF")
openstack token issue -f value -c project_id 2>&1 | grep -qE '^[0-9a-f]{32}$' || { echo " app-cred auth FAIL "; exit 1; }
openstack network create "${CLIENT}-net" >/dev/null 2>&1 && echo "net ok"
openstack subnet create "${CLIENT}-subnet" --network "${CLIENT}-net" --subnet-range "$TENANT_CIDR" --dns-nameserver 8.8.8.8 >/dev/null 2>&1 && echo "subnet ok"
openstack router create "${CLIENT}-router" >/dev/null 2>&1 && echo "router ok"
openstack router set "${CLIENT}-router" --external-gateway provider-ext 2>&1 && echo "ext-gw set" || echo " ext-gw FAIL (operator may need to attach) "
openstack router add subnet "${CLIENT}-router" "${CLIENT}-subnet" 2>&1 && echo "interface added"
)

# 4.6 verify SNAT (proof egress) -- read-only as admin
source ~/admin-openrc
P=$(openstack project show "${CLIENT}-prod" --domain "$CLIENT" -f value -c id 2>&1)
RID=$(openstack router list --project "$P" -f value -c ID 2>&1 | head -1)
openstack router show "$RID" -f json 2>&1 | python3 -c '
import sys,json; d=json.load(sys.stdin); g=d.get("external_gateway_info") or {}
print("router",d.get("name"),d.get("status"),"snat",g.get("enable_snat"), "snat_ip",(g.get("external_fixed_ips") or [{}])[0].get("ip_address","none"))'
--- END block ---

GATE: router ACTIVE, snat=True, snat_ip allocated from provider-ext.
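The 4.0 pre-check greps for the exact CIDR string, so it catches an identical subnet but not an overlapping one of a different size (for example an existing /16 that contains the chosen /24). A range-overlap test closes that gap. This is a minimal pure-bash sketch, not part of the validated run; the function names are hypothetical, and it assumes well-formed IPv4 CIDRs with no leading zeros in the octets.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: IPv4 CIDR overlap test using shell arithmetic.

# Convert dotted-quad IPv4 to an integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Print "first last" integer bounds of a CIDR such as 10.20.24.0/24.
cidr_range() {
  local ip="${1%/*}" len="${1#*/}" base mask
  base=$(ip_to_int "$ip")
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  echo "$(( base & mask )) $(( (base & mask) | (~mask & 0xFFFFFFFF) ))"
}

# Succeed when the two CIDRs share at least one address.
cidrs_overlap() {
  local a1 a2 b1 b2
  read -r a1 a2 <<< "$(cidr_range "$1")"
  read -r b1 b2 <<< "$(cidr_range "$2")"
  [ "$a1" -le "$b2" ] && [ "$b1" -le "$a2" ]
}
```

One way to apply it is to loop over the output of `openstack subnet list -f value -c Subnet` as admin and stop on the first `cidrs_overlap "$TENANT_CIDR" "$existing"` hit. Whether overlapping tenant ranges are actually a problem on this cloud is an IPAM policy question the sketch does not answer.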

================================================================================

STAGE 5 -- Tenant creates its OWN cluster template [CORRECTED-PENDING]

================================================================================

Templates are visible by OWNERSHIP -- the tenant creates its own in its project (it cannot use another project's private template).

IMAGE PASSED BY UUID (not name): a name is subject to a quoting/resolution hazard -- the first Stage-5 attempt 2026-07-01 failed with "Cluster type (vm, Unset, kubernetes) not supported" because a doubled-quoted name resolved to no image, so magnum could not derive the type. UUID removes the failure surface entirely.

STATUS: the corrected (UUID) block below is staged but was NOT re-run before this draft.

--- BEGIN block: onboard-v2-05-template (RUN -- jumphost; template as app cred) ---
CLIENT=; CA="$HOME/vault-init/vault-ca-root.pem"; ACF="$HOME/${CLIENT}-svc-appcred.txt"
source ~/admin-openrc
IMG_ID=$(openstack image list --public -f value -c ID -c Name 2>&1 | awk '/kube/{print $1;exit}')
echo "$IMG_ID" | grep -qE '^[0-9a-f-]{36}$' || { echo " image uuid resolve FAIL -- STOP "; return 2>/dev/null||exit 1; }

(
for v in $(env|awk -F= '/^OS_/{print $1}'); do unset "$v"; done
export OS_AUTH_TYPE=v3applicationcredential OS_AUTH_URL="https://:5000/v3" OS_IDENTITY_API_VERSION=3 OS_CACERT="$CA"
export OS_APPLICATION_CREDENTIAL_ID=$(awk -F'"' '/^id=/{print $2}' "$ACF")
export OS_APPLICATION_CREDENTIAL_SECRET=$(awk -F'"' '/^secret=/{print $2}' "$ACF")
openstack token issue -f value -c project_id 2>&1 | grep -qE '^[0-9a-f]{32}$' || { echo " auth FAIL "; exit 1; }

# idempotent pre-clean
openstack coe cluster template show "${CLIENT}-k8s" -f value -c uuid >/dev/null 2>&1 \
  && openstack coe cluster template delete "${CLIENT}-k8s" 2>&1
openstack coe cluster template create "${CLIENT}-k8s" \
  --image "$IMG_ID" \
  --external-network provider-ext \
  --master-flavor gp.mid --flavor capi.node \
  --coe kubernetes --network-driver calico --docker-storage-driver overlay2 \
  --master-lb-enabled --floating-ip-enabled \
  --fixed-network "${CLIENT}-net" --fixed-subnet "${CLIENT}-subnet" \
  --keypair "${CLIENT}-key" 2>&1
TID=$(openstack coe cluster template show "${CLIENT}-k8s" -f value -c uuid 2>&1)
echo "$TID" | grep -qE '^[0-9a-f-]{36}$' && echo "template ${CLIENT}-k8s=$TID" || echo " template FAIL "
)
--- END block ---

GATE: template created, coe=kubernetes, network_driver=calico, master_lb+floating_ip enabled, image_id = the public kube image.

OPEN QUESTION (one variable at a time): the --fixed-network/--fixed-subnet pin is the strict tenant-isolation posture. If the corrected create fails on the network params (image now moot), drop those two flags and let the capi-helm driver manage the cluster network -- and record which model this driver expects.
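Since the whole point of the correction is that the image reference must be a real UUID, the guard can be stricter than the 36-character class used in the block. A sketch, assuming bash: is_uuid is a hypothetical helper name, and it checks the canonical 8-4-4-4-12 lowercase-hex layout against the entire captured string, so error text on a second line cannot slip through.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the WHOLE argument is a canonical
# lowercase UUID (8-4-4-4-12 hex groups). Uses bash's [[ =~ ]], which
# anchors ^ and $ to the full string rather than to each line.
is_uuid() {
  local re='^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
  [[ "$1" =~ $re ]]
}
```

In the template block this would read `is_uuid "$IMG_ID" || { echo " image uuid resolve FAIL -- STOP "; ...; }`. A value such as a quoted image name, or 36 dashes, fails here although the looser character-class test would accept the latter.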

================================================================================

STAGE 6/7 -- Tenant creates the cluster (trustee + trust) [PROCEDURE-PENDING]

================================================================================

THE MULTI-TENANT TRUST TEST. Design basis: appendix-D (magnum/common/keystone.py, read live). create_trust delegates context.roles (the CALLER's token roles) from the caller (trustor) to the per-cluster trustee. The caller MUST be the tenant service identity whose token carries EXACTLY member + load-balancer_member -- NOT admin (a trust cannot delegate a role the trustor does not hold; and delegating admin is a privilege-escalation footgun). This is why the creator is <client>-svc via app cred, not the operator.

STATUS: NOT YET LIVE-VERIFIED on the multi-tenant path as of this draft. D-064 fixed the create_user step (trustee user creation -- confirmed live earlier this session). The create_trust step under a clean tenant identity is the specific thing this stage confirms. If it 403s despite a clean delegatable-role identity, that is a genuine finding (look at the conductor's trust-session construction), NOT a policy gap -- do not loosen create_trust.
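Because the trust delegates whatever roles the caller holds, it is worth confirming the creator's role set is exactly the intended pair before attempting the create. A minimal sketch; roles_exactly is a hypothetical helper name. It compares direct assignments only, so whether implied roles also appear in the token is outside what it checks.

```shell
#!/usr/bin/env bash
# Hypothetical helper.
# roles_exactly <newline-separated held roles> <expected role>...
# Succeeds only when the held set equals the expected set (order and
# duplicates ignored); an extra role such as admin makes it fail.
roles_exactly() {
  local have want
  have=$(printf '%s\n' "$1" | sed '/^$/d' | sort -u)
  shift
  want=$(printf '%s\n' "$@" | sort -u)
  [ "$have" = "$want" ]
}
```

The held list could come from the same query Stage 2.3 uses: `roles_exactly "$(openstack role assignment list --project "$PID" --user "$SUID" --names -f value -c Role)" member load-balancer_member`.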

--- BEGIN block: onboard-v2-06-cluster-create (RUN -- jumphost; create as app cred) ---
CLIENT=; CA="$HOME/vault-init/vault-ca-root.pem"; ACF="$HOME/${CLIENT}-svc-appcred.txt"

# 6.0 mark conductor log (numeric-or-STOP guard)
source ~/admin-openrc
MARK=$(juju ssh -m openstack magnum/0 'sudo cat /var/log/magnum/magnum-conductor.log | wc -l' 2>/dev/null | tr -dc '0-9')
[ -n "$MARK" ] || { echo "MARK empty -- STOP"; return 2>/dev/null||exit 1; }
echo "MARK=$MARK"

# 6.1/6.2 create as the tenant app-cred identity
(
for v in $(env|awk -F= '/^OS_/{print $1}'); do unset "$v"; done
export OS_AUTH_TYPE=v3applicationcredential OS_AUTH_URL="https://:5000/v3" OS_IDENTITY_API_VERSION=3 OS_CACERT="$CA"
export OS_APPLICATION_CREDENTIAL_ID=$(awk -F'"' '/^id=/{print $2}' "$ACF")
export OS_APPLICATION_CREDENTIAL_SECRET=$(awk -F'"' '/^secret=/{print $2}' "$ACF")
openstack token issue -f value -c project_id 2>&1 | grep -qE '^[0-9a-f]{32}$' || { echo " auth FAIL "; exit 1; }
openstack coe cluster create "${CLIENT}-cluster" --cluster-template "${CLIENT}-k8s" \
  --keypair "${CLIENT}-key" --master-count 1 --node-count 1 2>&1
sleep 12
openstack coe cluster show "${CLIENT}-cluster" -f value -c uuid -c status -c status_reason 2>&1 | sed 's/^/ /'
)

# 6.3 conductor log since MARK -- trustee + trust outcome (the verdict)
juju ssh -m openstack magnum/0 "sudo tail -n +$((MARK+1)) /var/log/magnum/magnum-conductor.log 2>/dev/null | grep -iE 'trustee|create_user|create_trust|403|forbidden|created trust|CREATE|ERROR' | tail -30" 2>/dev/null
--- END block ---

GATE (expected if the model holds): status CREATE_IN_PROGRESS (not a ~3s CREATE_FAILED); log shows trustee created and NO create_trust 403; driver proceeds to helm/CAPI. Then watch to CREATE_COMPLETE (phase-08 Step 8.2 pattern) and verify nodes/CNI/CCM (Step 8.3).
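The "watch to CREATE_COMPLETE" step can be bounded so a stuck cluster does not hang the session. The phase-08 Step 8.2 pattern is not reproduced in this draft, so what follows is an independent sketch under assumptions: wait_for_status is a hypothetical name, and it takes the status-printing command as arguments so the loop can be exercised without a cloud.

```shell
#!/usr/bin/env bash
# Hypothetical helper.
# wait_for_status <max_tries> <sleep_seconds> <command...>
# <command...> must print the current cluster status on stdout.
# Returns 0 on CREATE_COMPLETE, 1 on CREATE_FAILED, 2 when tries run out.
wait_for_status() {
  local tries="$1" pause="$2" status="" i
  shift 2
  for ((i = 1; i <= tries; i++)); do
    status=$("$@" 2>/dev/null)
    case "$status" in
      CREATE_COMPLETE) echo "$status"; return 0 ;;
      CREATE_FAILED)   echo "$status"; return 1 ;;
    esac
    sleep "$pause"
  done
  echo "TIMEOUT (last status: ${status:-unknown})"
  return 2
}
```

Run inside the app-cred subshell, a call might look like `wait_for_status 60 30 openstack coe cluster show "${CLIENT}-cluster" -f value -c status`. The try count and interval are placeholders, not measured values from this deployment.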

================================================================================

Changes folded from the 2026-07-01 session

================================================================================

  • Identity model: manager persona (D-051/D-064) replaces admin-on-domain (2026-06-22). Manager CLI self-service VALIDATED (G3) -- the 2026-06-22 out-of-band bandaid is retired.
  • appendix-C correction: manager domain-enumeration is own-domain-only on this cloud (the documented names-only cloud-wide leak does NOT manifest); isolation tighter than documented.
  • Cluster-creator identity + trust model documented in appendix-D (magnum source-derived).
  • Template: create in owner project; image by UUID (quoting/resolution hazard).
  • Hardening throughout: OS_CACERT pre-auth guard; numeric MARK guard; capture-and-test-result (not head||echo); subshell isolation; dynamic id resolution; secrets 0600 under $HOME.

Open items (before this DRAFT becomes VALIDATED)

  1. Re-run Stage 5 (corrected UUID form) -- confirm template creates.
  2. Run Stage 6 -- confirm create_trust succeeds under the tenant identity (or capture the finding if it does not). THIS IS THE OUTSTANDING TRUST VALIDATION.
  3. Clean-room pass ("beta"): operate from ONLY the handed-over tenant credentials (zero admin fallback), logging every point where admin is currently used for a read -- classify each as legitimate operator-perimeter vs. a tenant-accessible lookup.
  4. On completion: fold this into tenant-onboarding-runbook.md (Stage 2 rewrite + Stage 7 fill), commit appendix-D, apply the appendix-C correction, and assign the D-06x number for the manager-persona-validated onboarding model.