diff --git a/docs/v1-redeploy-changelog.md b/docs/v1-redeploy-changelog.md
index 16fc7bc..2887155 100644
--- a/docs/v1-redeploy-changelog.md
+++ b/docs/v1-redeploy-changelog.md
@@ -929,5 +929,54 @@
 denied, keypair/sg/network not-visible, existing-ERROR abort,
 existing-ERROR+RECREATE). shellcheck clean; ASCII+0CR.
 
+### Phase-06 6.3 BLOCKED -- provider FIP range dark cloud-wide; root-caused as D-057 (2026-06-27)
+
+Phase-06 Step 6.3 (SSH to capi-mgmt-v2 via FIP 10.12.7.107) is blocked: every
+provider-ext FIP (pool 10.12.5.0-.7.254) is unreachable -- "No route to host"
+(EHOSTUNREACH; L2/ARP fails before TCP) from the jumphost, and TCP-FAIL /
+ARP-INCOMPLETE from the cloud nodes themselves. The 10.12.4.234 Horizon HA VIP
+is dark in the same class. The .4.x control VIPs answer normally.
+
+Root cause CONFIRMED (measured, not inferred): on all three ovn-chassis nodes
+the provider NIC enp1s0 is untagged + enslaved to a Linux bridge br-enp1s0 that
+carries the host provider static L3 (openstack2: inet 10.12.4.42/22 on
+br-enp1s0), while also being a port of OVS br-ex (carrier DOWN on all 3). The
+provider Neutron net is flat/untagged (physnet1, no segmentation), so the FIP
+range is untagged provider traffic -- which the L3-carrying Linux bridge
+intercepts at the kernel before it reaches OVS br-ex, where OVN's gateway
+ARP-responder lives. This violates the Canonical charm-guide rule that the OVS
+uplink must carry NO L3 and host/container workloads must ride a separate VLAN
+sub-interface (docs.openstack.org/charm-guide/latest/admin/networking/interface-config.html).
+
+The OpenStack control plane, OVN logical config (port_binding up, gateway
+scheduled, FIP NAT programmed), Neutron provider segment, ovn-chassis
+bridge-mappings, and the MAAS provider subnet are ALL correct/healthy --
+verified this session. carve-host-interfaces.sh is correct per the as-built
+reference; the L3 + LXD container auto-bridge land on top of the raw NIC at
+deploy. A .5.x FIP (10.12.5.19) was inbound-reachable on the PRIOR cloud (chat
+4586faa7, 2026-06-05) before the D-052/D-053 teardown/re-IP/rebuild, so
+reachability is achievable; this rebuild never re-established it.
+
+Logged as design decision D-057 (PROPOSED/OPEN): separate the host/API plane
+from the flat FIP plane per the Canonical shared-NIC VLAN pattern; raw untagged
+enp1s0 reserved for OVS br-ex with NO L3. NIC-limited by design (no spare NIC;
+ex-lbaas enp11s0 to be REMOVED). OPEN blocking question: the interaction with
+D-003B (API VIPs must stay tenant-reachable) -- next session must choose between
+(a) API VIPs on a tagged routed VLAN + re-validate Option-B, or (b) remove the
+host provider static from enp1s0 (host may not need provider L3; API VIPs live
+in containers) and confirm that alone clears the interception. Full root-cause
+chain in docs/fip-edge-diagnosis-checkpoint-20260627.md. NO live ip/ovs-vsctl/
+ovn-nbctl writes -- remediation is a MAAS-carve + bundle + redeploy change.
+
+DIAGNOSTIC DISCIPLINE (carry forward; cost many turns this session): read
+source-of-truth/history before live probing; query OpenStack objects BY ID not
+name across scopes; never filter diagnostics through 2>/dev/null+jq (it hid
+scope/empty errors); ICMP from nodes is a weak signal (test the real open port);
+compare across all nodes before declaring a one-host anomaly.
+
+(No mutations to the cloud this session -- all diagnosis was read-only. Phase-06
+6.0-BOOT / 6.0 / 6.1 / 6.2 remain as-built from the prior session: capi-mgmt-v2
+ACTIVE, FIP 10.12.7.107, tenant 10.20.0.84, persisted to ~/capi-mgmt-net.env.)
+
 ### Next-free numbers
-Design decision: D-057. Doc fix: DOCFIX-056.
+Design decision: D-058 (D-057 coined above). Doc fix: DOCFIX-056.
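
Reviewer note on the address arithmetic behind the root cause: the host static quoted in the entry (10.12.4.42/22 on br-enp1s0) and the FIP pool (10.12.5.0-.7.254) share one /22. The sketch below only checks that overlap with Python's standard `ipaddress` module. It does not probe the cloud and it does not demonstrate the interception itself. The helper names `on_link` and `in_fip_pool` are illustrative and do not come from the repository.

```python
import ipaddress

# Values copied from the changelog entry; nothing here is measured live.
HOST_IFACE = ipaddress.ip_interface("10.12.4.42/22")  # openstack2 static on br-enp1s0
FIP_POOL_FIRST = ipaddress.ip_address("10.12.5.0")
FIP_POOL_LAST = ipaddress.ip_address("10.12.7.254")


def on_link(addr: str) -> bool:
    """True if addr lies inside the subnet of the host's provider static."""
    return ipaddress.ip_address(addr) in HOST_IFACE.network


def in_fip_pool(addr: str) -> bool:
    """True if addr lies inside the provider-ext FIP pool range."""
    return FIP_POOL_FIRST <= ipaddress.ip_address(addr) <= FIP_POOL_LAST


if __name__ == "__main__":
    print("host subnet:", HOST_IFACE.network)  # host subnet: 10.12.4.0/22
    samples = [
        ("capi-mgmt-v2 FIP", "10.12.7.107"),
        ("Horizon HA VIP", "10.12.4.234"),
        ("prior-cloud FIP", "10.12.5.19"),
        ("capi-mgmt-v2 tenant IP", "10.20.0.84"),
    ]
    for label, addr in samples:
        print(f"{label:24} {addr:12} on_link={on_link(addr)} fip_pool={in_fip_pool(addr)}")
```

Every address in the FIP pool, and the Horizon VIP, falls inside 10.12.4.0/22, so a host holding that static would treat them as directly connected. That is consistent with the ARP-level failures recorded in the entry, although the overlap alone does not establish the cause.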