
Phase 05 -- Octavia Enablement (D-021)

Bring Octavia from its post-deploy BLOCKED state to fully enabled: run the configure-resources action (control plane + lb-mgmt overlay), then build and tag the amphora image. End state: octavia active/idle with an ACTIVE amphora image whose tag matches octavia amp-image-tag. (The end-to-end LB build + round-robin + failover validation is D-011 criterion 4, run in phase-08 once tenant scaffolding exists.)

Decisions: D-021 (amphora pipeline; control-plane then image). Troubleshooting: appendix-A -- L7 (snap cannot read /tmp), octavia-configure-resources (long-running / o-hm0 transient), amp-image-tag-mismatch (LP#1937003).

IP-ONLY NOTE (supersedes the 2026-05-30 octavia capture): the 05-30 /etc/hosts FQDN prereq DOES NOT APPLY here. This deploy is IP-only (R18 catalog) and octavia is multi-homed (reaches the provider VIPs over its eth1), so configure-resources needs no hosts/FQDN prep -- just the action.
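You can confirm the multi-homing this note relies on by asking octavia/0 which interface it would use to reach a provider VIP. The sketch below is an optional helper and was not part of the verified run. `route_dev` is a hypothetical name, and `PROVIDER_VIP` is a placeholder for any provider-side VIP from the R18 catalog. It assumes `juju exec` against a single unit prints the command's stdout unchanged.

```bash
# route_dev (hypothetical helper): print the interface named after "dev" in
# `ip route get` output, e.g. "192.0.2.10 dev eth1 src ... uid 0" -> eth1.
# Prints nothing if no "dev" field is present.
route_dev() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Usage (jumphost). PROVIDER_VIP is a placeholder, not a site value:
#   PROVIDER_VIP=<provider VIP>
#   juju exec --unit octavia/0 -m openstack -- "ip route get $PROVIDER_VIP" </dev/null | route_dev
#   # expect: eth1
```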


Prerequisites (must be true entering phase-05)

  • phase-04 (core/network) done; vault cert cascade complete so octavia:certificates is satisfied. octavia/0 sits BLOCKED "Awaiting ... configure-resources" (the expected post-deploy state, D-021) -- this phase clears it.
  • Glance reachable from the jumphost (provider VIP) to seed the amphora base.
  • Bundle-baked octavia config (verify in 5.2 gate): octavia-diskimage-retrofit use-internal-endpoints=true + image-format=raw + amp-image-tag=octavia-amphora, and octavia amp-image-tag=octavia-amphora (the two MUST match -- LP#1937003).
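The BLOCKED entry state can be asserted instead of eyeballed. This is a minimal sketch. It assumes the tabular `juju status` unit line contains both the workload status word `blocked` and a message that mentions `configure-resources`. It does not pin the exact message wording, and `expect_awaiting_configure` is a hypothetical helper name.

```bash
# expect_awaiting_configure (hypothetical helper): succeed only if the given
# `juju status` line is octavia/0 in a blocked state awaiting configure-resources.
# Substring matching is an assumption about the status layout, not a juju API.
expect_awaiting_configure() {
  local line="$1"
  case "$line" in
    *octavia/0*) ;;
    *) echo "no octavia/0 unit line"; return 1 ;;
  esac
  if printf '%s\n' "$line" | grep -qiw 'blocked' &&
     printf '%s\n' "$line" | grep -q 'configure-resources'; then
    echo "[OK] octavia/0 blocked, awaiting configure-resources"
  else
    echo "[WARN] unexpected pre-phase state: $line"
    return 1
  fi
}

# Usage (jumphost):
#   expect_awaiting_configure "$(juju status octavia -m openstack | grep 'octavia/0')"
```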

Constants and env-literals (TAG: confirm per site on rebuild)

  • ENV(octavia-tag) octavia-amphora (octavia + retrofit amp-image-tag; must match)
  • ENV(base-name) jammy-amphora-base (uploaded base; NOT amphora-tagged)
  • ENV(retrofit) octavia-diskimage-retrofit
  • ENV(internal-glance-vip) 10.12.8.53 (retrofit is metal-only 10.12.8.x -> internal glance)
  • run-specific: base image id, amphora image id, op/task ids (capture at run).
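One way to keep these literals consistent across 5.1 and 5.2 is a small env file that you source before either step. This is a sketch. The variable names and the `check_constants` helper are hypothetical, and the values are the ones listed above. The 10.12.8.x test encodes the note that the retrofit reaches glance only over the metal leg.

```bash
# phase-05 env-literals (hypothetical file, e.g. ~/phase-05.env); confirm per site.
OCTAVIA_TAG="octavia-amphora"        # ENV(octavia-tag): octavia amp-image-tag
RETROFIT_TAG="octavia-amphora"       # retrofit amp-image-tag; MUST equal OCTAVIA_TAG
BASE_NAME="jammy-amphora-base"       # ENV(base-name): uploaded base, never amphora-tagged
RETRO="octavia-diskimage-retrofit"   # ENV(retrofit)
INTERNAL_GLANCE_VIP="10.12.8.53"     # ENV(internal-glance-vip)

# check_constants (hypothetical): catch an edited literal before it reaches juju.
check_constants() {
  [ -n "$OCTAVIA_TAG" ] && [ "$RETROFIT_TAG" = "$OCTAVIA_TAG" ] ||
    { echo "amp-image-tag mismatch: '$RETROFIT_TAG' vs '$OCTAVIA_TAG' (LP#1937003)"; return 1; }
  case "$INTERNAL_GLANCE_VIP" in
    10.12.8.*) ;;
    *) echo "internal glance VIP $INTERNAL_GLANCE_VIP is not on the metal 10.12.8.x leg"; return 1 ;;
  esac
  echo "[OK] constants consistent (tag=$OCTAVIA_TAG, glance=$INTERNAL_GLANCE_VIP)"
}
```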

Run-location legend

  • # RUN: jumphost -- vopenstack-jesse as jessea123, admin-openrc sourced; all octavia work is juju run / juju config / openstack from the jumphost.
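Every command in this phase assumes admin-openrc is already sourced. A guard at the top of each subshell fails fast, rather than surfacing as an auth error halfway through a block. This is a sketch only. `require_openrc` is a hypothetical helper, and checking these three OS_* variables (standard python-openstackclient environment names) assumes your admin-openrc exports them.

```bash
# require_openrc (hypothetical): fail if the core openstackclient variables are unset.
require_openrc() {
  local v missing=0
  for v in OS_AUTH_URL OS_USERNAME OS_PROJECT_NAME; do
    if [ -z "${!v:-}" ]; then
      echo "missing $v -- source ~/admin-openrc first"
      missing=1
    fi
  done
  return "$missing"
}

# Usage, inside the ( ... ) subshells so `exit` leaves only the subshell:
#   source ~/admin-openrc; require_openrc || exit 1
```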

Step 5.1 -- configure-resources (D-021 Phase 1; control plane + lb-mgmt overlay)

# RUN: jumphost. Read-only pre-check, then the argument-free action with a bounded wait, then authoritative completion via show-operation (NOT the streamed log).

( {
  source ~/admin-openrc
  echo "=== pre-check (verify-before-mutate) ==="
  juju status octavia -m openstack | grep -E 'octavia/0' || true   # expect BLOCKED Awaiting configure-resources
  juju actions octavia -m openstack | grep -i configure-resources  # action exists; takes NO params
  echo "--- idempotency: charm-octavia-tagged resources should be EMPTY pre-run ---"
  openstack network list --tags charm-octavia -f value -c Name        # expect empty
  openstack security group list --tags charm-octavia -f value -c Name # expect empty
  openstack loadbalancer provider list                                # expect amphora present (API reachable)
} )
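The idempotency lines above depend on reading "expect empty" by eye. A small filter can turn them into hard checks. This is a sketch: `expect_empty` is a hypothetical helper and not part of the verified pre-check. A non-empty result means charm-octavia resources already exist, so inspect them before proceeding.

```bash
# expect_empty (hypothetical): read a command's output on stdin and fail if it
# contains any non-blank line. $1 labels the check in the message.
expect_empty() {
  local what="$1" out
  out=$(grep -v '^[[:space:]]*$' || true)
  if [ -n "$out" ]; then
    printf 'NOT EMPTY (%s):\n%s\n' "$what" "$out"
    return 1
  fi
  echo "[OK] $what: empty"
}

# Usage (jumphost, admin-openrc sourced):
#   openstack network list --tags charm-octavia -f value -c Name | expect_empty "charm-octavia networks"
#   openstack security group list --tags charm-octavia -f value -c Name | expect_empty "charm-octavia security groups"
```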

Run the action. It is long-running: juju's default wait may time out while the hook keeps going, so use a bounded --wait and tee the output. Do NOT re-fire on a wait timeout (appendix-A: octavia-configure-resources):

juju run octavia/leader configure-resources -m openstack --wait=20m 2>&1 | tee ~/octavia-configure-resources.out
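show-operation needs the operation id, which appears at the top of the tee'd log. The sketch below extracts it. It assumes the Juju 3.x banner form `Running operation <N> with <M> task(s)`; the 2026-06-03 run recorded op 15 / task 16. `op_id_from_log` is a hypothetical helper, so adjust the pattern if your juju prints something else. The same call works on ~/retrofit-image.out in 5.2.

```bash
# op_id_from_log (hypothetical): print the first operation id found in a saved
# `juju run` log, or nothing if the assumed banner is absent.
op_id_from_log() {
  sed -n 's/^Running operation \([0-9][0-9]*\) with.*/\1/p' "$1" | head -n 1
}

# Usage (jumphost):
#   OP=$(op_id_from_log ~/octavia-configure-resources.out)
#   [ -n "$OP" ] && juju show-operation "$OP" -m openstack
```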

Authoritative completion + A/B/C verify:

( {
  source ~/admin-openrc
  echo "=== 0. authoritative status (use show-operation, not the streamed log) ==="
  # juju show-operation <N> -m openstack   # N = operation id from ~/octavia-configure-resources.out; expect operation status: completed AND its task: completed
  echo "=== A. octavia active/idle 'Unit is ready' (blocked cleared) ==="
  juju status octavia -m openstack | grep -E 'octavia/0'
  echo "=== B. resources created by the action (were empty pre-run) ==="
  openstack network list --tags charm-octavia -f value -c Name        # lb-mgmt-net
  openstack subnet list  --tags charm-octavia -f value -c Name        # lb-mgmt-subnetv6 (IPv6 geneve design)
  openstack security group list --tags charm-octavia -f value -c Name # lb-mgmt-sec-grp
  echo "=== C. o-hm0 up (IPv6-ULA on lb-mgmt prefix; a br-int port) ==="
  juju exec --unit octavia/0 -m openstack -- 'ip -br addr show o-hm0; sudo ovs-vsctl get Interface o-hm0 external_ids' </dev/null
} )

GATE: octavia/0 active/idle; lb-mgmt-net + lb-mgmt-subnetv6 + lb-mgmt-sec-grp present; o-hm0 has an IPv6 ULA address (fc00::/7) and is a br-int port. NORMAL, not faults: the lb-mgmt-net is IPv6-ULA by design; a "Virtual network ... down" transient during o-hm0 bring-up self-heals; the lb-mgmt network:distributed port shows DOWN.
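This GATE can be checked mechanically from the same command output. The sketch uses two hypothetical helpers. `has_all_names` reads a Name column on stdin and requires every expected name as an exact line. `has_ula_addr` scans an `ip -br addr show o-hm0` line for an address in fc00::/7, meaning a first hextet of fcXX or fdXX.

```bash
# has_all_names (hypothetical): every argument must appear as an exact stdin line.
has_all_names() {
  local got name rc=0
  got=$(cat)
  for name in "$@"; do
    printf '%s\n' "$got" | grep -qxF -- "$name" || { echo "missing: $name"; rc=1; }
  done
  return "$rc"
}

# has_ula_addr (hypothetical): succeed if any token in $1 is an fc00::/7 address.
has_ula_addr() {
  local tok
  local -a toks
  read -r -a toks <<< "$1"
  for tok in "${toks[@]}"; do
    case "$tok" in
      [fF][cCdD][0-9a-fA-F][0-9a-fA-F]:*) echo "[OK] ULA $tok"; return 0 ;;
    esac
  done
  echo "no fc00::/7 address in: $1"
  return 1
}

# Usage (jumphost, admin-openrc sourced):
#   openstack network list --tags charm-octavia -f value -c Name | has_all_names lb-mgmt-net
#   openstack subnet list --tags charm-octavia -f value -c Name | has_all_names lb-mgmt-subnetv6
#   openstack security group list --tags charm-octavia -f value -c Name | has_all_names lb-mgmt-sec-grp
#   has_ula_addr "$(juju exec --unit octavia/0 -m openstack -- 'ip -br addr show o-hm0' </dev/null)"
```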

Step 5.2 -- Amphora image pipeline (D-021 Phase 2; canonical block)

# RUN: jumphost. This is the verified canonical block (2026-06-03 run). One ( set -e ) subshell: config GATE -> idempotent seed (base staged in $HOME, NOT /tmp, because the openstack snap cannot read /tmp; appendix-A: L7) -> retrofit build -> confirm. Fully idempotent: amphora present -> skip to confirm; base present -> retrofit only; fresh -> download + checksum + upload + retrofit. For a FIRST live run in a new environment, you may stop after the seed and eyeball the base before the multi-minute build.

# Tunables (operator-confirm the first two for your environment):
BASE_IMG_URL="https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img"
BASE_SUM_URL="https://cloud-images.ubuntu.com/jammy/current/SHA256SUMS"
BASE_IMG_FILE="jammy-server-cloudimg-amd64.img"
BASE_NAME="jammy-amphora-base"                     # ENV(base-name); NOT amphora-tagged (only the retrofit OUTPUT is)
VERSION_NAME="$(date -u +%Y%m%d)"                  # cosmetic (D-021): feeds the retrofit OUTPUT name
PRODUCT_NAME="com.ubuntu.cloud:server:22.04:amd64" # cosmetic (D-021): metadata only
RETRO=octavia-diskimage-retrofit                   # ENV(retrofit)
STAGE="$HOME/amphora-base"                          # snap-READABLE (home iface); NOT /tmp (L7)

( set -e
  source ~/admin-openrc

  # ---- Phase 0: config GATE (abort if the cloud is not in the expected state) ----
  UIE=$(juju config $RETRO use-internal-endpoints)
  IMGFMT=$(juju config $RETRO image-format)
  RTAG=$(juju config $RETRO amp-image-tag)
  OTAG=$(juju config octavia amp-image-tag)
  [ "$UIE" = true ]   || { echo "GATE FAIL: $RETRO use-internal-endpoints=$UIE (need true; retrofit is metal-only)"; exit 1; }
  [ "$IMGFMT" = raw ] || { echo "GATE FAIL: $RETRO image-format=$IMGFMT (need raw; Ceph RBD fast-clone)"; exit 1; }
  [ -n "$RTAG" ] && [ "$RTAG" = "$OTAG" ] || { echo "GATE FAIL: amp-image-tag mismatch retrofit='$RTAG' octavia='$OTAG' (LP#1937003)"; exit 1; }
  echo "[OK] config gate: use-internal-endpoints=true image-format=raw amp-image-tag=$OTAG"

  # ---- Phase 1: idempotency + seed the jammy base (only if no amphora AND no base) ----
  AMPH=$(openstack image list --tag "$OTAG" -f value -c ID | head -1)
  if [ -n "$AMPH" ]; then
    echo "[SKIP] image already tagged $OTAG ($AMPH) -- pipeline complete; jumping to confirm"
  else
    BASE_ID=$(openstack image list --name "$BASE_NAME" -f value -c ID | head -1)
    if [ -z "$BASE_ID" ]; then
      mkdir -p "$STAGE"; LOCAL="$STAGE/$BASE_IMG_FILE"
      EXP=$(curl -fsSL "$BASE_SUM_URL" | awk -v f="$BASE_IMG_FILE" '$2=="*"f || $2==f {print $1}')
      [ -n "$EXP" ] || { echo "GATE FAIL: no published checksum for $BASE_IMG_FILE"; exit 1; }
      if [ -f "$LOCAL" ] && [ "$(sha256sum "$LOCAL" | awk '{print $1}')" = "$EXP" ]; then
        echo "[OK] staged base present + checksum-valid; skipping download"
      else
        echo "[..] downloading jammy base to $LOCAL (snap-readable; NOT /tmp)"
        wget -q -O "$LOCAL" "$BASE_IMG_URL"
        GOT=$(sha256sum "$LOCAL" | awk '{print $1}')
        [ "$EXP" = "$GOT" ] || { echo "GATE FAIL: checksum mismatch exp='$EXP' got='$GOT'"; exit 1; }
        echo "[OK] checksum verified ($GOT)"
      fi
      echo "[..] uploading base to glance (qcow2; 5 retrofit props; NO amphora tag on the base)"
      BASE_ID=$(openstack image create "$BASE_NAME" \
        --file "$LOCAL" --disk-format qcow2 --container-format bare \
        --property architecture=x86_64 --property os_distro=ubuntu --property os_version=22.04 \
        --property version_name="$VERSION_NAME" --property product_name="$PRODUCT_NAME" \
        -f value -c id)
    fi
    [ -n "$BASE_ID" ] || { echo "GATE FAIL: base image id empty after seed"; exit 1; }
    echo "[OK] base image: $BASE_ID"

    # ---- Phase 2: retrofit (long-running build; bounded wait; tee the result) ----
    echo "-- retrofit-image action schema (informational; confirm source-image is honored) --"
    juju actions $RETRO --schema --format yaml 2>&1 | sed -n '/retrofit-image:/,/^[a-zA-Z]/p' | head -30 || true
    echo "[..] running retrofit-image (multi-minute build)"
    juju run $RETRO/leader retrofit-image source-image="$BASE_ID" --wait=30m 2>&1 | tee ~/retrofit-image.out
  fi

  # ---- Phase 3: confirm (amphora present + active + tagged == octavia's tag) ----
  echo "=== CONFIRM: images tagged $OTAG ==="
  openstack image list --tag "$OTAG" -f value -c ID -c Name -c Status
  ACT=$(openstack image list --tag "$OTAG" -f value -c Status | grep -xc active || true)
  [ "$ACT" -ge 1 ] || { echo "CONFIRM FAIL: no ACTIVE image tagged $OTAG"; exit 1; }
  echo "[OK] amphora present + active + tagged $OTAG (matches octavia amp-image-tag) -- D-021 complete"
)

GATE: an ACTIVE image tagged octavia-amphora whose tag matches octavia amp-image-tag.
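Before a re-run, it can help to see which idempotent path the canonical block would take, without mutating anything. The block branches three ways: amphora present, base present, or fresh. This sketch reproduces that decision from the same two read-only lookups the block performs. `pipeline_path` is a hypothetical helper.

```bash
# pipeline_path (hypothetical): mirror the canonical block's branching.
#   $1 = first image id tagged with octavia's amp-image-tag (may be empty)
#   $2 = first image id named jammy-amphora-base (may be empty)
pipeline_path() {
  if [ -n "$1" ]; then
    echo "confirm"    # amphora present: skip straight to Phase 3
  elif [ -n "$2" ]; then
    echo "retrofit"   # base present: run retrofit-image only
  else
    echo "fresh"      # neither: download + checksum + upload + retrofit
  fi
}

# Usage (jumphost, admin-openrc sourced; read-only):
#   pipeline_path "$(openstack image list --tag octavia-amphora -f value -c ID | head -1)" \
#                 "$(openstack image list --name jammy-amphora-base -f value -c ID | head -1)"
```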


EXIT GATE (phase-05 complete)

  • octavia/0 active/idle; lb-mgmt-net / lb-mgmt-subnetv6 / lb-mgmt-sec-grp present; o-hm0 up.
  • An ACTIVE amphora image tagged octavia-amphora, tag matching octavia amp-image-tag.
  • Octavia is fully enabled (control plane + amphora). The end-to-end LB validation (build -> listener -> pool -> health-monitor -> 2 members -> FIP; round-robin; admin-scope failover) is D-011 criterion 4 -- run in phase-08 (needs tenant scaffolding + the external provider network from phase-04).
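To leave a single pass/fail record for the exit gate, you can wrap each criterion in a small collector. This is a sketch only. `gate` and `GATE_FAILS` are hypothetical names, and the probes in the usage comment are simple greps over the same commands used in 5.1 and 5.2.

```bash
# gate (hypothetical): run a probe command, print PASS/FAIL, count failures.
GATE_FAILS=0
gate() {
  local desc="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $desc"
  else
    echo "FAIL  $desc"
    GATE_FAILS=$((GATE_FAILS + 1))
  fi
}

# Usage (jumphost, admin-openrc sourced):
#   gate "octavia/0 active" sh -c "juju status octavia -m openstack | grep 'octavia/0' | grep -qw active"
#   gate "lb-mgmt-net present" sh -c "openstack network list --tags charm-octavia -f value -c Name | grep -qx lb-mgmt-net"
#   gate "amphora ACTIVE + tagged" sh -c "openstack image list --tag octavia-amphora -f value -c Status | grep -qx active"
#   [ "$GATE_FAILS" -eq 0 ] && echo "phase-05 exit gate: PASS"
```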

As-built reference (2026-06-03 run -- audit trail)

  • octavia/0: octavia 14.0.0, charm rev 441 2024.1/stable, on 3/lxd/3, data leg 10.12.12.1; multi-homed (reaches provider VIPs over eth1).
  • configure-resources op 15 / task 16 completed (--wait=20m). Created lb-mgmt-net (d1ee4bca-...), lb-mgmt-subnetv6 (1c1f50df-..., IPv6 geneve), lb-mgmt-sec-grp (acbacb21-...). o-hm0 fc00:9c49:5b4e:cf23:f816:3eff:fead:56df/64, br-int port.
  • amphora: retrofit is metal-only (10.12.8.172) -> internal glance VIP 10.12.8.53. base jammy-amphora-base uploaded (f8b48cdb-...); retrofit op 19/task 20 built amphora-haproxy-x86_64-ubuntu-22.04-20260603 (4e4a94ac-...), ACTIVE, tag octavia-amphora (matches octavia amp-image-tag). image-format raw.
  • Charm gap (parked): glance-simplestreams-sync is metal-only and cannot reach glance on a no-DNS deploy (use-internal-endpoints steers keystone auth but not the glance/swift client) -> gss does NOT seed the base. The base is seeded manually (above) and the amphora BUILD stays charm-native via the retrofit over internal endpoints. Roosevelt root-fix: cloud DNS + FQDN-valid certs (also fixes gss).

Next

phase-06 -- in-cloud management cluster (D-035).