openstack-caracal-ipv4 / runbooks / jumphost-provider-vip-gateway.md

jumphost provider-vip L3 gateway (virbr1.104 = 10.12.8.1) -- D-057

Provisions the L3 gateway that makes provider <-> provider-vip routing real on the jumphost (vopenstack-jesse, 10.17.11.246). provider-vip (10.12.8.0/22, VID 104) rides the SAME libvirt bridge as provider (virbr1), tagged; the jumphost already routes between its directly-connected planes (ip_forward=1), so once virbr1.104 = 10.12.8.1 exists, tenant SNAT (on provider 10.12.5-7) reaches the API VIPs on 10.12.8.50-60 and back.
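The address arithmetic behind that paragraph can be checked mechanically. The following is a minimal sketch in POSIX sh; the helper names ip_to_int and ip_in_cidr are illustrative and not part of the project. It shows that 10.12.8.0/22 spans 10.12.8.0 through 10.12.11.255 (so the gateway and the 10.12.8.50-60 VIPs are inside it), while the tenant SNAT addresses sit in 10.12.4.0/22.

```sh
# Hypothetical helpers (not from the project): membership test for IPv4 CIDRs.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
# Runs in a subshell so the caller's IFS and positional params are untouched.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed (exit 0) if address $1 falls inside CIDR $2, e.g. 10.12.8.0/22.
ip_in_cidr() (
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
)

# Example:
#   ip_in_cidr 10.12.8.50 10.12.8.0/22 && echo "VIP is on provider-vip"
#   ip_in_cidr 10.12.7.20 10.12.8.0/22 || echo "SNAT address is on provider, needs routing"
```

Because the two /22 networks do not overlap, traffic between them has to be routed, which is the job virbr1.104 takes on.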

WHY A RUNBOOK, NOT A SCRIPT: this is a one-time, consequential host change. The real risk is how persistence interacts with libvirt (virbr1 is libvirt-managed, created at libvirtd start) -- which a fixture test cannot exercise. It is also NOT portable to Roosevelt (no virbr1 there; the provider-vip gateway is a physical router/SVI). So it is gated and human-run, per the project's "human gates own consequential mutations".

NOT required for: the MAAS plane stand-up, or the carve. The MAAS subnet records gateway_ip=10.12.8.1 as metadata regardless. REQUIRED before: D-011 #3 (tenant -> API reachability) and any provider<->provider-vip traffic test.

================================================================================

PHASE 1 -- AUDIT (read-only). Run, paste back; this picks the persistence method.

================================================================================

--- BEGIN runbook block: gw-01-audit (RUN ON jumphost) ---
echo "=== G1: virbr1 must pass tagged frames (VID 104). MUST be 0 ==="
cat /sys/class/net/virbr1/bridge/vlan_filtering 2>/dev/null \
  || echo "WARN: virbr1 has no bridge/vlan_filtering node -- investigate before proceeding"

echo "=== G2: ip_forward must be 1 ==="
cat /proc/sys/net/ipv4/ip_forward

echo "=== G3: virbr1 detail (is it a bridge? up? who owns it?) ==="
ip -d link show virbr1 | sed -n '1,6p'
ip -br addr show virbr1

echo "=== G4: libvirt 1_provider net -- autostart + forward mode (NAT double-NAT note) ==="
sudo virsh net-info 1_provider 2>/dev/null
sudo virsh net-dumpxml 1_provider 2>/dev/null | sed -n '1,30p'

echo "=== G5: is virbr1 touched by netplan? (decides systemd-vs-netplan persistence) ==="
ls -1 /etc/netplan/ 2>/dev/null
sudo grep -RnE 'virbr1|10\.12\.4\.1|10\.12\.8\.' /etc/netplan/ 2>/dev/null \
  || echo "netplan: no virbr1 / .8 references"

echo "=== G6: must NOT already exist ==="
ip -br addr show | grep -E 'virbr1\.104|10\.12\.8\.' || echo "clean: no virbr1.104 / .8 yet"
--- END runbook block: gw-01-audit ---

STOP. Decision from the audit:

  • G1 != 0 -> STOP. VID 104 will not traverse virbr1; the tagged-secondary approach needs rework. This is the same hard gate as the MAAS stand-up.
  • G5 shows virbr1 already managed in netplan -> prefer the NETPLAN persistence variant (Phase 3B) to avoid two managers fighting.
  • G5 shows virbr1 is purely libvirt (the expected case) -> use the SYSTEMD ONESHOT variant (Phase 3A): it orders cleanly after libvirtd and won't race a netplan that doesn't manage virbr1.
  • G4 autostart != yes -> enable it (sudo virsh net-autostart 1_provider) so virbr1 exists at boot before the gateway unit runs.
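The decision rules above can be written down as a small function, which makes the gate order explicit (G1 is checked first and is the only hard stop). This is a sketch only: gw_decide and its three arguments are hypothetical names, and the human reading the audit output still owns the decision.

```sh
# gw_decide VLAN_FILTERING NETPLAN_MANAGES_VIRBR1 AUTOSTART
#   VLAN_FILTERING          contents of virbr1/bridge/vlan_filtering (G1)
#   NETPLAN_MANAGES_VIRBR1  "yes" if G5 found virbr1 in /etc/netplan, else "no"
#   AUTOSTART               "yes"/"no" from `virsh net-info 1_provider` (G4)
# Hypothetical helper: prints the action(s) and returns 1 on the hard stop.
gw_decide() {
  if [ "$1" != "0" ]; then
    # An empty value (missing sysfs node) also lands here: investigate first.
    echo "STOP: vlan_filtering='$1', VID 104 will not traverse virbr1"
    return 1
  fi
  if [ "$3" != "yes" ]; then
    echo "FIX: sudo virsh net-autostart 1_provider"
  fi
  if [ "$2" = "yes" ]; then
    echo "USE: 3B (netplan)"
  else
    echo "USE: 3A (systemd oneshot)"
  fi
}

# Example wiring on the jumphost (commented out; needs sudo and libvirt):
#   gw_decide \
#     "$(cat /sys/class/net/virbr1/bridge/vlan_filtering 2>/dev/null)" \
#     "$(sudo grep -Rq 'virbr1' /etc/netplan/ 2>/dev/null && echo yes || echo no)" \
#     "$(sudo virsh net-info 1_provider 2>/dev/null | awk '/^Autostart:/ {print $2}')"
```

In the expected case (vlan_filtering 0, no netplan references, autostart yes) this prints only "USE: 3A (systemd oneshot)".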

================================================================================

PHASE 2 -- RUNTIME (reversible; proves it works before persisting)

================================================================================

GATE. Brings the gateway up immediately (lost on reboot -- Phase 3 persists it). Fully reversible via the rollback block.

--- BEGIN runbook block: gw-02-runtime ---
sudo ip link add link virbr1 name virbr1.104 type vlan id 104
sudo ip addr add 10.12.8.1/22 dev virbr1.104
sudo ip link set virbr1.104 up
ip -br addr show virbr1.104
ip route show 10.12.8.0/22
--- END runbook block: gw-02-runtime ---

ROLLBACK (if anything looks wrong): sudo ip link del virbr1.104

TEST (after the MAAS plane exists and a host carries a .8 static, e.g. post-carve):
  ping -c2 10.12.8.1     # the gateway itself
  ping -c2 10.12.8.40    # a host's br-prov-api static (if carved)
  # from a provider-plane host, confirm .8 is reachable via the jumphost route

NOTE (libvirt NAT, cosmetic): 1_provider is forward mode=nat, so .4<->.8 traffic may be masqueraded to the jumphost's address. It still works statefully (the API does not care about source IP). If you later want symmetric, un-NATed provider<->provider-vip routing, add an iptables RETURN rule ahead of the libvirt masquerade for 10.12.4.0/22 <-> 10.12.8.0/22 -- optional, not needed for v1.
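For reference, the optional RETURN rule could look roughly like the sketch below. It only prints the commands so they can be reviewed first. Assumptions, none of which the audit confirms: libvirt on this host programs its masquerade through iptables and reaches it from the nat table's POSTROUTING chain (a libvirt build using its native nftables backend would not be affected by these rules); the rules are not persistent; and libvirt may re-insert its own jump ahead of them when the network restarts. nat_exempt_cmds is a hypothetical helper name.

```sh
# Print (do NOT run) iptables commands that let provider <-> provider-vip
# traffic skip the rest of nat/POSTROUTING, and so skip the masquerade.
# Defaults are the two planes named in this runbook.
nat_exempt_cmds() {
  a=${1:-10.12.4.0/22}
  b=${2:-10.12.8.0/22}
  # RETURN in a built-in chain ends traversal and applies the chain policy.
  echo "iptables -t nat -I POSTROUTING 1 -s $a -d $b -j RETURN"
  echo "iptables -t nat -I POSTROUTING 1 -s $b -d $a -j RETURN"
}

# Review first:          nat_exempt_cmds
# Apply after review:    nat_exempt_cmds | sudo sh
# Inspect the result:    sudo iptables -t nat -L POSTROUTING -n --line-numbers
```

To undo, delete the same rules with -D in place of "-I POSTROUTING 1" (for example: sudo iptables -t nat -D POSTROUTING -s 10.12.4.0/22 -d 10.12.8.0/22 -j RETURN).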

================================================================================

PHASE 3 -- PERSISTENCE (pick ONE per the Phase-1 decision)

================================================================================

3A -- systemd oneshot (RECOMMENDED for libvirt-managed virbr1)

Orders after libvirtd; idempotent (deletes any stale virbr1.104 first).

--- BEGIN runbook block: gw-03a-systemd ---
sudo tee /etc/systemd/system/provider-vip-gw.service >/dev/null <<'UNIT'
[Unit]
Description=provider-vip L3 gateway (virbr1.104 = 10.12.8.1) -- D-057
After=libvirtd.service network-online.target
Wants=network-online.target
Requires=libvirtd.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=-/sbin/ip link del virbr1.104
ExecStart=/sbin/ip link add link virbr1 name virbr1.104 type vlan id 104
ExecStart=/sbin/ip addr add 10.12.8.1/22 dev virbr1.104
ExecStart=/sbin/ip link set virbr1.104 up
ExecStop=/sbin/ip link del virbr1.104

[Install]
WantedBy=multi-user.target
UNIT
sudo systemctl daemon-reload
sudo systemctl enable --now provider-vip-gw.service
systemctl --no-pager status provider-vip-gw.service | sed -n '1,6p'
ip -br addr show virbr1.104
--- END runbook block: gw-03a-systemd ---

Persistence test (the real proof): run sudo reboot; once the host is back, ip -br addr show virbr1.104 must show 10.12.8.1/22 UP. (libvirt 1_provider must be autostart -- see G4 -- so virbr1 exists when the unit runs.)
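If the reboot test fails because virbr1 is not there yet when the unit starts (an assumption worth testing, not something this runbook has observed), one option is to poll for the bridge before creating the VLAN. A minimal sketch follows; wait_for_iface and the SYSFS_NET override are hypothetical, the latter existing only so the function can be exercised without a real bridge.

```sh
# wait_for_iface NAME [TIMEOUT_SECONDS]
# Poll sysfs once per second until the interface exists.
# Returns 0 as soon as it appears, 1 if it is still missing after the timeout.
wait_for_iface() {
  name=$1
  tries=${2:-30}
  root=${SYSFS_NET:-/sys/class/net}   # hypothetical override, for testing only
  while [ "$tries" -gt 0 ]; do
    [ -e "$root/$name" ] && return 0
    sleep 1
    tries=$((tries - 1))
  done
  [ -e "$root/$name" ]
}

# Example: wait up to 30s for the libvirt bridge, then report.
#   wait_for_iface virbr1 30 && echo "virbr1 present" || echo "virbr1 missing"
```

Installed as a small script, it could be called from an extra ExecStartPre= line in the unit; the script path and that extra line would be additions to the unit shown in 3A, not part of it.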

ROLLBACK 3A:
  sudo systemctl disable --now provider-vip-gw.service
  sudo rm -f /etc/systemd/system/provider-vip-gw.service && sudo systemctl daemon-reload
  sudo ip link del virbr1.104 2>/dev/null || true

3B -- netplan (ONLY if G5 showed virbr1 already managed by netplan)

Add a vlans stanza. Risk: if virbr1 is NOT up when netplan runs at boot, the vlan fails -- which is exactly why 3A is preferred for a libvirt bridge. Use only if your jumphost already manages virbr1 via netplan.

In the relevant /etc/netplan/*.yaml, under network:

    vlans:
      virbr1.104:
        id: 104
        link: virbr1
        addresses: [10.12.8.1/22]

Then: sudo netplan try (auto-reverts in 120s if unreachable), then sudo netplan apply.

ROLLBACK 3B: remove the stanza; sudo netplan apply.

================================================================================

PHASE 4 -- VERIFY

================================================================================

--- BEGIN runbook block: gw-04-verify ---
ip -br addr show virbr1.104          # 10.12.8.1/22, UP
ip route show 10.12.8.0/22           # directly-connected via virbr1.104
cat /proc/sys/net/ipv4/ip_forward    # 1
--- END runbook block: gw-04-verify ---

DONE when virbr1.104 = 10.12.8.1/22 is UP, survives a reboot (3A), and a provider-plane host can reach 10.12.8.x through the jumphost.
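The first DONE criterion can be turned into a pass/fail check by parsing the brief address output instead of eyeballing it. A minimal sketch, assuming the usual `ip -br addr` column layout (name, state, then addresses); gw_is_up is a hypothetical helper name.

```sh
# gw_is_up [ADDR/PREFIX]
# Reads `ip -br addr show virbr1.104` output on stdin. Succeeds only if the
# state column is UP and the wanted address (default 10.12.8.1/22) is listed.
gw_is_up() {
  awk -v want="${1:-10.12.8.1/22}" '
    $2 == "UP" { for (i = 3; i <= NF; i++) if ($i == want) ok = 1 }
    END { exit !ok }
  '
}

# Example, on the jumphost after a reboot:
#   ip -br addr show virbr1.104 | gw_is_up && echo "gateway OK" || echo "gateway NOT ready"
```

This covers only the local interface state; reachability from a provider-plane host still needs the ping test from Phase 2.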