
provider-vip MAAS stand-up (D-057)

NOTE (end-of-deployment review -- R1, see D-057-REVIEW-ITEMS.md): the PHASE 2 create blocks below are now SUPERSEDED by the tested, idempotent, dry-run-default scripts/provider-vip-standup.sh, which is the preferred execution path. They are retained here (not trimmed) for reference and as a manual fallback. PHASE 1 (audit), the virbr1 vlan_filtering gate, and the deferred jumphost-gateway reads remain uniquely useful. Reconcile (trim-to-pointer or annotate) in the post-D-011 sweep.

Builds the provider-vip plane in MAAS so the carve and bundle have something to resolve against. New plane: space provider-vip, VLAN VID 104 on the provider fabric (the fabric that owns 10.12.4.0/22), subnet 10.12.8.0/22, reserved VIP band .8.2-.100.

RUN ON: the jumphost (ssh jessea123@10.17.11.246), MAAS admin profile already logged in (maas admin ...). All values are resolved live by name/CIDR -- no hardcoded MAAS ids (PATTERN-1).

PRINCIPLE: PHASE 1 is read-only -- run it and report the output back BEFORE any create. The audit is designed to surface anything that would change the carve or bundle (VID 104 already taken, unexpected provider fabric, vlan_filtering=1 on virbr1, metal-internal mtu/dns differing from assumptions). Do not run PHASE 2 until PHASE 1 is reviewed.

SCOPE NOTE: this stands up the MAAS side only. The jumphost L3 gateway (virbr1.104 = 10.12.8.1) that makes .4<->.8 routing real is a separate host step (libvirt/netplan, persistence-method TBD from a live read) -- deferred to its own block, and NOT required for the MAAS plane to be created or for the carve to resolve. It is required before D-011 #3 (tenant -> API reachability).

================================================================================

PHASE 1 -- AUDIT (read-only; run, then PASTE OUTPUT BACK)

================================================================================

--- BEGIN runbook block: pvip-01-audit (RUN ON jumphost) ---
echo "=== A1: provider fabric (owns 10.12.4.0/22) -- VID 104 must live HERE ==="
maas admin subnets read | jq -r '.[]|select(.cidr=="10.12.4.0/22")
  |{cidr, vid:.vlan.vid, fabric:.vlan.fabric, fabric_id:.vlan.fabric_id, space, gateway_ip, dns_servers}'

echo "=== A2: metal-internal VID 103 -- the TEMPLATE to mirror (mtu/managed/dns/dhcp) ==="
maas admin subnets read | jq -r '.[]|select(.cidr=="10.12.16.0/22")
  |{cidr, vid:.vlan.vid, fabric:.vlan.fabric, vlan_mtu:.vlan.mtu, vlan_dhcp_on:.vlan.dhcp_on, space, managed, gateway_ip, dns_servers, allow_dns, allow_proxy, rdns_mode}'

echo "=== A3: VID 104 collision check -- expect EMPTY on every fabric ==="
maas admin subnets read | jq -r '[.[].vlan|{vid,fabric,id}]|unique_by(.id)[]|select(.vid==104)'
PROV_FAB=$(maas admin subnets read | jq -r '.[]|select(.cidr=="10.12.4.0/22")|.vlan.fabric_id')
echo " provider fabric_id = $PROV_FAB ; VLANs already on it:"
maas admin vlans read "$PROV_FAB" | jq -r '.[]|{vid,name,id,space}'

echo "=== A4: provider-vip must NOT already exist -- expect EMPTY for both ==="
maas admin subnets read | jq -r '.[]|select(.cidr=="10.12.8.0/22")|.cidr'
maas admin spaces read | jq -r '.[]|select(.name=="provider-vip")|.name'

echo "=== A5: provider-public reserved ranges -- mirror this pattern for .8 ==="
maas admin ipranges read | jq -r '.[]|select(.start_ip|startswith("10.12.4."))
  |{type, start_ip, end_ip, comment}'

echo "=== A6: GATE -- virbr1 must pass tagged frames (VID 104). MUST print 0 ==="
cat /sys/class/net/virbr1/bridge/vlan_filtering 2>/dev/null \
  || echo "WARN: virbr1 has no bridge/vlan_filtering node (not a bridge?) -- investigate"

echo "=== A7: jumphost must not already carry .8 -- expect 'clean' ==="
ip -br addr show | grep -E 'virbr1.104|10.12.8.' || echo "clean: no .8 on jumphost yet"
--- END runbook block: pvip-01-audit ---

STOP. Report A1-A7. Expected / change-triggers:

  • A1: provider fabric_id is the home for VID 104. Note its value; C2/C3 use it.
  • A2: gives mtu (almost certainly 1500) + dns_servers + managed for provider-vip to mirror. If mtu != 1500, C2 must match it. If dns_servers differ from a prior assumption, C4b uses the value read here -- not a guessed 10.12.12.1.
  • A3: must be EMPTY. If VID 104 is already in use, STOP -- pick another VID and update lib-net.sh PROVIDER_VIP_VID + the carve assert + this runbook in lockstep.
  • A4: both EMPTY. If either exists, a prior partial run happened -- reconcile, do not blind-create.
  • A6: MUST be 0. If 1, STOP -- VID 104 will not traverse virbr1 and the whole tagged-secondary approach needs rework (per-port VLAN membership, or a different bridge). This is the make-or-break gate; flag it loudly.
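The A6 gate above can be turned into an exit code so it is harder to skim past. This is a minimal sketch, not part of scripts/provider-vip-standup.sh: the function name `check_vlan_filtering` is hypothetical, and the sysfs path is taken as an argument only so the function can be tried against a fixture file.

```shell
# Hypothetical helper (an assumption, not from the runbook scripts):
# converts the A6 read into PASS (0), STOP (1), or WARN (2).
check_vlan_filtering() {
  local node="${1:-/sys/class/net/virbr1/bridge/vlan_filtering}"
  local value
  if [ ! -r "$node" ]; then
    echo "WARN: $node not readable (not a bridge?) -- investigate" >&2
    return 2
  fi
  value=$(cat "$node")
  if [ "$value" = "0" ]; then
    echo "PASS: vlan_filtering=0"
    return 0
  fi
  echo "STOP: vlan_filtering=$value -- VID 104 will not traverse the bridge" >&2
  return 1
}
```

On the jumphost it would be called with no argument, and any non-zero return means PHASE 2 should not start.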

================================================================================

PHASE 2 -- CREATE (only after PHASE 1 reviewed; run ONE block at a time)

================================================================================

Re-resolve the helpers at the top of every shell you run these in (they are not persisted between SSH sessions):

PROV_FAB=$(maas admin subnets read | jq -r '.[]|select(.cidr=="10.12.4.0/22")|.vlan.fabric_id')
MTU_PROV=$(maas admin subnets read | jq -r '.[]|select(.cidr=="10.12.4.0/22")|.vlan.mtu')   # VID 104 parent = provider fabric
DNS103=$(maas admin subnets read | jq -r '.[]|select(.cidr=="10.12.16.0/22")|.dns_servers|join(",")')
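Because these helpers are re-resolved in every shell, an empty result is easy to miss: `jq -r` prints nothing when no subnet matches and prints the literal string `null` when the field is absent. The guard below is a sketch under that assumption; `require_set` is a hypothetical name, not something the runbook scripts define.

```shell
# Hypothetical guard: fail if any named variable is unset, empty,
# or the literal "null" that jq -r prints for a missing field.
require_set() {
  local name value missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is in $name.
    eval "value=\${$name:-}"
    case "$value" in
      ""|null)
        echo "ERROR: $name is empty or null -- re-run the resolver lines" >&2
        missing=1 ;;
    esac
  done
  return "$missing"
}
```

A call such as `require_set PROV_FAB MTU_PROV || echo "do not proceed"` before each create block would catch a stale shell. DNS103 is deliberately left out of that call, since the subnet step already tolerates an empty value.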

GATE C1 -- create the space.

--- BEGIN runbook block: pvip-02-space ---
maas admin spaces create name=provider-vip
maas admin spaces read | jq -r '.[]|select(.name=="provider-vip")|{name,id}'
--- END runbook block: pvip-02-space ---

GATE C2 -- create VLAN 104 on the PROVIDER fabric, mtu mirroring the PROVIDER untagged VLAN (VID 104 is a child of enp1s0 on the provider fabric -- its MTU must track that parent, NOT metal-internal which lives on a different fabric). (Confirm flags first if unsure: maas admin vlans create --help.)

--- BEGIN runbook block: pvip-03-vlan ---
PROV_FAB=$(maas admin subnets read | jq -r '.[]|select(.cidr=="10.12.4.0/22")|.vlan.fabric_id')
MTU_PROV=$(maas admin subnets read | jq -r '.[]|select(.cidr=="10.12.4.0/22")|.vlan.mtu')   # provider parent, not metal-internal
maas admin vlans create "$PROV_FAB" name=provider-vip vid=104 mtu="$MTU_PROV"
maas admin vlans read "$PROV_FAB" | jq -r '.[]|select(.vid==104)|{vid,name,id,fabric_id,mtu,space}'
--- END runbook block: pvip-03-vlan ---

GATE C3 -- assign the VID-104 VLAN to the provider-vip space. (If space=<id> is rejected, retry with space=provider-vip.)

--- BEGIN runbook block: pvip-04-assign-space ---
PROV_FAB=$(maas admin subnets read | jq -r '.[]|select(.cidr=="10.12.4.0/22")|.vlan.fabric_id')
SPACE_ID=$(maas admin spaces read | jq -r '.[]|select(.name=="provider-vip")|.id')
maas admin vlan update "$PROV_FAB" 104 space="$SPACE_ID"
maas admin vlans read "$PROV_FAB" | jq -r '.[]|select(.vid==104)|{vid,id,space}'
--- END runbook block: pvip-04-assign-space ---

GATE C4 -- create the subnet on the VID-104 VLAN, then set gateway/dns/managed. Split into create (minimal) + update (confirmed subnet update form) to avoid guessing which flags subnets create accepts.

--- BEGIN runbook block: pvip-05-subnet ---
PROV_FAB=$(maas admin subnets read | jq -r '.[]|select(.cidr=="10.12.4.0/22")|.vlan.fabric_id')
VID104_VLANID=$(maas admin vlans read "$PROV_FAB" | jq -r '.[]|select(.vid==104)|.id')
DNS103=$(maas admin subnets read | jq -r '.[]|select(.cidr=="10.12.16.0/22")|.dns_servers|join(",")')

# 5a: create (minimal -- cidr + vlan)
maas admin subnets create cidr=10.12.8.0/22 vlan="$VID104_VLANID"

# 5b: routed-plane gateway (D-057) + managed; dns mirrors VID 103 if set
maas admin subnet update 10.12.8.0/22 gateway_ip=10.12.8.1 managed=true
[ -n "$DNS103" ] && maas admin subnet update 10.12.8.0/22 dns_servers="$DNS103"

maas admin subnets read | jq -r '.[]|select(.cidr=="10.12.8.0/22")
  |{cidr, vid:.vlan.vid, fabric:.vlan.fabric, space, managed, gateway_ip, dns_servers}'
--- END runbook block: pvip-05-subnet ---

GATE C5 -- reserved VIP band .8.2-.100 (VIPs .50-.60 live in it; mirrors .4.2-.100).

--- BEGIN runbook block: pvip-06-range ---
SUB8=$(maas admin subnets read | jq -r '.[]|select(.cidr=="10.12.8.0/22")|.id')
maas admin ipranges create type=reserved subnet="$SUB8" \
  start_ip=10.12.8.2 end_ip=10.12.8.100 comment="provider-vip API VIP band (D-057)"
maas admin ipranges read | jq -r '.[]|select(.start_ip|startswith("10.12.8."))
  |{type, start_ip, end_ip, comment}'
--- END runbook block: pvip-06-range ---
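Before creating the range it can help to confirm by arithmetic that both ends of the band sit inside 10.12.8.0/22 (which spans 10.12.8.0 through 10.12.11.255). The sketch below does that in plain shell; `ip_to_int` and `in_cidr` are hypothetical helper names, IPv4 only, and nothing here talks to MAAS.

```shell
# Hypothetical helpers (assumptions, not from lib-net.sh).
# Fold a dotted quad into a single integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeeds if address $1 falls inside CIDR $2 (for example 10.12.8.0/22).
in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  # Build the netmask for the prefix length, clamped to 32 bits.
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}
```

For example, `in_cidr 10.12.8.2 10.12.8.0/22 && in_cidr 10.12.8.100 10.12.8.0/22` succeeds, while `in_cidr 10.12.12.1 10.12.8.0/22` fails because the metal-admin address lies outside the plane.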

================================================================================

PHASE 3 -- VERIFY (read-only) -- proves the carve will resolve + Juju sees it

================================================================================

--- BEGIN runbook block: pvip-07-verify ---
echo "=== carve resolvers, simulated against live MAAS (must match the script) ==="
SUB=$(maas admin subnets read)
echo "subid_of 10.12.8.0/22 = $(echo "$SUB" | jq -r '.[]|select(.cidr=="10.12.8.0/22")|.id') (expect non-empty)"
echo "vlanid_of 10.12.8.0/22 = $(echo "$SUB" | jq -r '.[]|select(.cidr=="10.12.8.0/22")|(.vlan.id // .vlan)') (the VLAN obj id)"
echo "vlanvid_of 10.12.8.0/22 = $(echo "$SUB" | jq -r '.[]|select(.cidr=="10.12.8.0/22")|.vlan.vid') (MUST be 104)"
echo "space = $(echo "$SUB" | jq -r '.[]|select(.cidr=="10.12.8.0/22")|.space') (MUST be provider-vip)"

echo "=== Juju visibility (real consumption is at redeploy; this pre-validates) ==="
juju reload-spaces
juju spaces | grep -E 'provider-vip|provider-public' || echo "WARN: provider-vip not visible to Juju"
--- END runbook block: pvip-07-verify ---

PASS CRITERIA:

  • vlanvid_of == 104, space == provider-vip, subid_of non-empty.
  • juju spaces lists provider-vip. At this point the carve's PHASE-1 asserts (no MAAS subnet for 10.12.8.0/22, provider-vip ... expected 104) will pass, and the files are safe to drop in.
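The MAAS half of the pass criteria can be checked mechanically once the three values have been read. The function below is only a sketch: `check_pass_criteria` is a hypothetical name, it takes the values as plain arguments, and it does not cover the `juju spaces` criterion, which still needs a manual look.

```shell
# Hypothetical helper: applies the MAAS-side PASS CRITERIA to
# already-resolved values. Arguments: vlan vid, space name, subnet id.
check_pass_criteria() {
  local vid="$1" space="$2" subid="$3" rc=0
  [ "$vid" = "104" ] \
    || { echo "FAIL: vlanvid_of=$vid (expected 104)" >&2; rc=1; }
  [ "$space" = "provider-vip" ] \
    || { echo "FAIL: space=$space (expected provider-vip)" >&2; rc=1; }
  # jq -r prints "null" for a missing field, so treat that as empty too.
  case "$subid" in
    ""|null) echo "FAIL: subid_of is empty" >&2; rc=1 ;;
  esac
  if [ "$rc" -eq 0 ]; then
    echo "PASS: MAAS side of provider-vip resolves"
  fi
  return "$rc"
}
```

It reports every failing criterion rather than stopping at the first, so one run shows the full gap.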

================================================================================

DEFERRED (separate step; needs a live read first) -- jumphost L3 gateway

================================================================================

virbr1.104 = 10.12.8.1 on the jumphost makes .4<->.8 routing real (ip_forward is already on). NOT needed for the MAAS plane or the carve; needed before D-011 #3. Before writing it I want a live read of how the jumphost defines virbr1 so the persistence method is correct (libvirt network XML vs netplan vs a systemd unit -- virbr1 is libvirt-managed, so a naive netplan vlan-on-virbr1 may race libvirt at boot):

ip -d link show virbr1
virsh net-dumpxml 1_provider 2>/dev/null | sed -n '1,40p'
ls /etc/netplan/ ; sudo grep -RnE 'virbr1|10.12.4.1' /etc/netplan/ 2>/dev/null

Report that and I will write the gated gateway block (with a clean rollback).

POST-DEPLOY WATCH-ITEM (gateway_ip safeguard): after redeploy, confirm every API container's DEFAULT route is still via metal-admin 10.12.12.1, NOT 10.12.8.1:

juju exec --all -- ip route show default

If any unit defaults via .8.1, drop the subnet gateway_ip (set to "") or pin the node default gateway to the metal-admin subnet; provider-vip reachability does not depend on its own gateway in v1.
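Scanning that output by eye across many units is error-prone, so a small filter can flag the bad case. This is a sketch under assumptions: `flag_bad_defaults` is a hypothetical name, and it only looks for lines containing `default via <gateway>` on stdin, without parsing which unit each line belongs to.

```shell
# Hypothetical filter: reads route lines on stdin and flags any default
# route via the given gateway (defaults to the provider-vip gateway).
# Returns 1 if at least one bad route was seen, 0 otherwise.
flag_bad_defaults() {
  local bad_gw="${1:-10.12.8.1}" line found=0
  while IFS= read -r line; do
    # The trailing space stops 10.12.8.1 from matching 10.12.8.10.
    case "$line " in
      *"default via $bad_gw "*) echo "BAD: $line"; found=1 ;;
    esac
  done
  return "$found"
}
```

It could be fed as `juju exec --all -- ip route show default | flag_bad_defaults`; a non-zero exit means at least one unit needs the gateway_ip remedy described above.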