sdk/go,controller: add TopologyInfo deserialization and flex-algo controller support#3515
sdk/go,controller: add TopologyInfo deserialization and flex-algo controller support#3515ben-malbeclabs merged 5 commits intomainfrom
Conversation
3272eb1 to
72ca31d
Compare
b102bd9 to
33ab59c
Compare
…y processors (#3497) RFC-18 flex-algo · PR 1 of 5 · see `rfcs/rfc-0018-flex-algo.md` Series: #3497 · #3512 · #3513 · #3514 · #3515 ## Summary of Changes - Adds `TopologyInfo` onchain account (RFC-18 flex-algo) with `admin_group_bit`, `flex_algo_number`, and `TopologyConstraint`; creates `AdminGroupBits` resource extension singleton for bit allocation - Adds four new foundation-only instructions (107–110): `CreateTopology`, `DeleteTopology`, `ClearTopology`, `BackfillTopology` - Extends `Link` with `link_topologies` (Vec<Pubkey>) and `link_flags` (LINK_FLAG_UNICAST_DRAINED); extends `Tenant` with `include_topologies`; adds `flex_algo_node_segments` to Interface V2 - `CreateLink` now requires the unicast-default topology account and auto-tags the link into it - `UpdateLink` gains foundation-gated `link_topologies` update and contributor-gated `unicast_drained` flag ## Diff Breakdown | Category | Files | Lines (+/-) | Net | |-------------|-------|---------------|-------| | Core logic | 22 | +1,306 / -38 | +1,268| | Scaffolding | 49 | +133 / -3 | +130 | | Tests | 27 | +4,592 / -74 | +4,518| The scaffolding is mechanical struct field additions (`link_topologies: vec![]`, `include_topologies: vec![]`, `flex_algo_node_segments: vec![]`) across CLI, SDK, activator, and client to keep the workspace compiling. Core logic is concentrated in the new topology processors, state types, and link/activate updates. <details> <summary>Key files (click to expand)</summary> - `smartcontract/programs/doublezero-serviceability/src/state/interface.rs` — updates Interface V2 (discriminant 1) to include `flex_algo_node_segments: Vec<FlexAlgoNodeSegment>`; V1 accounts (pre-CYOA format) are left as-is; pre-RFC-18 V2 accounts on mainnet are upgraded in-place by `MigrateDeviceInterfaces` before this deserialization path is used - `smartcontract/programs/doublezero-serviceability/src/state/topology.rs` — new `TopologyInfo` account: `admin_group_bit` (u8), `flex_algo_number` (128+bit), `TopologyConstraint` (IncludeAny/Exclude) - `smartcontract/programs/doublezero-serviceability/src/processors/topology/` — five new processors: create (allocates AdminGroupBit, validates IdRange 1–127), delete, clear (removes topology from all links), backfill (allocates FlexAlgoNodeSegment on all device interfaces) - `smartcontract/programs/doublezero-serviceability/src/processors/link/update.rs` — adds foundation-gated `link_topologies` update (validates each topology account onchain) and contributor-gated `unicast_drained` flag (LINK_FLAG_UNICAST_DRAINED bit 0) - `smartcontract/programs/doublezero-serviceability/src/processors/link/activate.rs` — requires `unicast_default_topology_account` as a mandatory account; auto-tags new link into the unicast-default topology on activation - `smartcontract/programs/doublezero-serviceability/src/processors/globalconfig/set.rs` — creates `AdminGroupBits` ResourceExtension PDA on global config initialization - `smartcontract/programs/doublezero-serviceability/src/state/link.rs` — adds `link_topologies: Vec<Pubkey>`, `link_flags: u64`, and `LINK_FLAG_UNICAST_DRAINED = 0x01` - `smartcontract/programs/doublezero-serviceability/src/instructions.rs` — registers four new instruction variants (CreateTopology=107, DeleteTopology=108, ClearTopology=109, BackfillTopology=110) </details> ## Testing Verification - `make rust-lint` and `make rust-test` pass clean on this branch - `topology_test.rs` (~2,400 lines) covers all four topology processors: create/delete/clear/backfill, including error paths (duplicate names, out-of-range bits, unauthorized signers, invalid topology accounts), segment allocation, and multi-topology backfill - Existing `link_wan_test.rs`, `tenant_test.rs`, and telemetry tests updated to account for the new mandatory `unicast_default_topology_account` in `ActivateLink` --------- Co-authored-by: Greg Mitchell <greg@malbeclabs.com>
RFC-18 flex-algo · PR 2 of 5 · see `rfcs/rfc-0018-flex-algo.md` Series: #3497 · #3512 · #3513 · #3514 · #3515 ## Summary of Changes - Adds five `doublezero link topology` subcommands: `create`, `delete`, `clear`, `backfill`, `list` - `topology create` allocates the admin-group bit and flex-algo number onchain, then automatically backfills `FlexAlgoNodeSegment` entries on all activated Vpnv4 loopback interfaces - `topology backfill` and `topology clear` batch at 16 accounts per transaction to stay within Solana's 32-account limit; zero accounts → zero transactions sent - `topology list` displays each topology's name, admin-group bit, flex-algo number, BGP color community, constraint, and count of tagged links - Backs each subcommand with a Rust SDK command in `doublezero_sdk::commands::topology` - Simplifies the `BackfillTopology` onchain processor: removes the two-pass SR ID pre-marking loop; keeping the `SegmentRoutingIds` resource in sync with off-chain allocations is the activator's responsibility ## Diff Breakdown | Category | Files | Lines (+/-) | Net | |-------------|-------|----------------|--------| | Core logic | 11 | +1,572 / -32 | +1,540 | | Scaffolding | 8 | +91 / -6 | +85 | | Tests | 1 | +25 / -16 | +9 | Core logic includes inline unit tests in the SDK command files. <details> <summary>Key files (click to expand)</summary> - `smartcontract/sdk/rs/src/commands/topology/create.rs` — creates topology, then enumerates devices and calls `BackfillTopologyCommand` for those with Vpnv4 loopbacks; returns `CreateTopologyResult { signature, topology_pda, backfill_signatures }` - `smartcontract/sdk/rs/src/commands/topology/backfill.rs` — batches devices in chunks of 16; returns `Vec<Signature>`; empty device list sends no transactions - `smartcontract/sdk/rs/src/commands/topology/clear.rs` — batches links in chunks of 16; returns `Vec<Signature>`; empty link list sends no transactions - `smartcontract/sdk/rs/src/commands/topology/list.rs` — fetches all `TopologyInfo` accounts and cross-references link accounts to compute per-topology link counts - `smartcontract/cli/src/topology/create.rs` — CLI wrapper; prints PDA and backfill transaction count - `smartcontract/cli/src/topology/list.rs` — tabular display of name, bit, algo number, BGP color, constraint, link count - `smartcontract/programs/doublezero-serviceability/src/processors/topology/backfill.rs` — removes two-pass SR ID pre-marking; single-pass allocation only </details> ## Testing Verification - `cargo test -p doublezero_sdk -p doublezero_cli` passes - `cargo clippy -- -D warnings` clean - `make rust-fmt` applied
migrate command (#3513) RFC-18 flex-algo · PR 3 of 5 · see `rfcs/rfc-0018-flex-algo.md` Depends on: #3512 (PR 2) Series: #3497 · #3512 · #3513 · #3514 · #3515 ## Summary of Changes - Extends `doublezero link get/list/update` to display topology assignments (`link_topologies`) and drain status; adds `--link-topology` (comma-separated topology names, `default` to clear all) and `--unicast-drained` flags to `link update`; adds `--topology` filter to `link list` - Extends `doublezero tenant get/list/update` to display included topologies (`include_topologies`) and adds `--include-topologies` (comma-separated topology names, `default` to clear) to `tenant update` - Adds `doublezero-admin migrate flex-algo [--dry-run]` command that backfills link topology assignments and VPNv4 loopback flex-algo node segments across all existing devices and links ## Diff Breakdown | Category | Files | Lines (+/-) | Net | |-------------|-------|-------------|------| | Core logic | 11 | +471 / -57 | +414 | | Scaffolding | 7 | +16 / -5 | +11 | Mostly core logic — the migrate command alone is 180 lines of backfill and dry-run orchestration. <details> <summary>Key files (click to expand)</summary> - `controlplane/doublezero-admin/src/cli/migrate.rs` — new: `migrate flex-algo` command; validates UNICAST-DEFAULT PDA exists, backfills link topologies for all links, backfills flex-algo node segments for all VPNv4 loopback interfaces on all devices; supports `--dry-run` - `smartcontract/cli/src/link/list.rs` — adds `--topology <name>` filter and displays `link_topologies`/`unicast_drained` columns, resolving topology pubkeys to human-readable names - `smartcontract/cli/src/link/get.rs` — displays topology assignments and drain status; fetches topology map to resolve pubkeys to names - `smartcontract/cli/src/link/update.rs` — adds `--link-topology` (accepts comma-separated list of topology names; use `default` to clear) and `--unicast-drained` flags - `smartcontract/cli/src/tenant/list.rs` — displays `include_topologies` column - `smartcontract/cli/src/tenant/get.rs` — displays included topology names - `smartcontract/cli/src/tenant/update.rs` — adds `--include-topologies` (accepts comma-separated list; use `default` to clear) </details> ## Testing Verification - `cargo test -p doublezero_sdk -p doublezero_cli -p doublezero-admin` passes - `cargo clippy -- -D warnings` clean - `make rust-fmt` applied before commit --------- Co-authored-by: Greg Mitchell <greg@malbeclabs.com>
RFC-18 flex-algo · PR 4 of 5 · see [`rfcs/rfc-0018-flex-algo.md`](../tree/main/rfcs/rfc-0018-flex-algo.md) Depends on: #3512 (PR 2), #3513 (PR 3) Series: #3497 · #3512 · #3513 · #3514 · #3515 ## Summary of Changes - Python SDK: deserialize `TopologyConstraint`, `TopologyInfo`, and `FlexAlgoNodeSegment`; read `flex_algo_node_segments` from the RFC-18 `InterfaceV3` account format - TypeScript SDK: deserialize `FlexAlgoNodeSegment` from `InterfaceV3` account format; guard the segments loop against pre-RFC-18 mainnet accounts where the segment count reads garbage bytes - SDK fixtures: regenerate device, link, tenant, and user fixtures to include the new `InterfaceV3` fields - TypeScript RPC client: remove the `AbortController` timeout wrapper around `fetch` (was surfacing as spurious `TimeoutError` on slow `getProgramData` responses); bump the compat test's request timeout to 120s - `tryReadString` in `borsh-incremental`: check remaining buffer length before reading to avoid out-of-bounds reads - CHANGELOG / e2e compatibility test: document the mandatory CLI upgrade boundary for RFC-18 `InterfaceV3` (devices written by the new program cannot be deserialized by pre-RFC-18 CLI versions) Note: activator changes that were originally planned for this PR have been dropped — the doublezero activator is deprecated. ## Diff Breakdown | Category | Files | Lines (+/-) | |------------|-------|-------------| | Python SDK | 1 | +61 / -1 | | TS SDK | 3 | +27 / -3 | | Fixtures | 5 | +4 / -1 + bin | | Docs/e2e | 2 | +2 / -2 | ## Testing Verification - Python SDK: `uv run pytest` passes under `sdk/serviceability/python/`, including fixture deserialization - TypeScript SDK: `bun test` passes under `sdk/serviceability/typescript/`, including compat fixture deserialization with the new `InterfaceV3` fields - `cargo check --workspace` clean
Completes the Go SDK catch-up for PR3's onchain writes: Link now has link_topologies and link_flags, Tenant has include_topologies. Also adds TopologyInfo account type (IndexType=16, TopologyType=17) with TopologyConstraint enum, ListTopologies client method, and fixture bytes for the new trailing fields in the unit tests.
Adds RFC-18 flex-algo rendering to the controller's Arista EOS template, gated behind a --features-config YAML toggle (flex_algo.enabled) that is off by default. When enabled, the controller populates topology data into the state cache, resolves per-tenant BGP color communities from Tenant.include_topologies, and emits IS-IS flex-algo node segments and color stamping blocks into tunnel.tmpl. Includes golden-config fixtures for both enabled and disabled paths (base.config.flex-algo.txt / base.config.flex-algo-disabled.txt) and render tests covering each.
nikw9944
left a comment
There was a problem hiding this comment.
Code Review — PR #3515: sdk/go,controller: add TopologyInfo deserialization and flex-algo controller support
Related: rfcs/rfc-0018-flex-algo.md
Tests pass, lint is clean. Overall this is solid work — the feature-gate pattern is well-designed, the cleanup/rollback path (disabled flex-algo emitting no commands) is thoughtful, and the test coverage on render paths is good.
High
1. CHANGELOG is stale after the latest commit
CHANGELOG.md:14 — Still says Add --features-config flag accepting a YAML file but the latest commit (7cb9ba85) removed the --features-config flag in favor of auto-loading /etc/doublezero-controller/features.yaml. The CHANGELOG should reflect the current behavior (auto-load, no flag).
2. CHANGELOG references nonexistent ListTopologies client method
CHANGELOG.md:16 — Says "Go serviceability SDK adds … ListTopologies client method" but no ListTopologies function exists anywhere in the codebase. The SDK adds the TopologyInfo type, deserialization, and the dispatch case in GetProgramData — but there is no standalone ListTopologies method. This should be corrected to avoid confusing readers.
Medium
3. LinkFlags bit 0 is a magic number
server.go:415 — link.LinkFlags&0x01 != 0 for UnicastDrained. A named constant (e.g., const LinkFlagUnicastDrained = 0x01) in the SDK's state.go would be more self-documenting and avoid magic bitmask values scattered across consumers.
4. No direct unit test for resolveTenantColors
server.go:780-798 — The resolveTenantColors function has meaningful branching logic (empty vs. non-empty includeTopologies, fallback to unicast-default) but is only tested indirectly through render tests that pre-compute TenantTopologyColors. A focused unit test would catch regressions in the fallback behavior (e.g., what happens when the topology map has no unicast-default entry, or when includeTopologies contains a pubkey not in the map).
5. Redundant condition in template
tunnel.tmpl:135 — The inner condition {{- if and .Ip.IsValid .IsPhysical .Metric .IsLink (not .IsSubInterfaceParent) (not .IsCYOA) (not .IsDIA) }} is identical to the outer condition at line 123. The inner block is always entered when the outer block is entered, so this is a no-op guard. Consider removing the redundant inner check for clarity, or add a comment explaining if it's intentional (e.g., defensive against future template restructuring).
Low
6. resolveTenantColors default fallback uses string matching
server.go:791 — strings.EqualFold(topo.Name, "unicast-default") — the name "unicast-default" is a well-known topology name used in multiple places. Consider extracting it as a named constant (e.g., const DefaultTopologyName = "unicast-default") to avoid string literal duplication.
7. exclude constraint topologies silently produce no flex-algo definition
tunnel.tmpl:349 — The {{- if eq .ConstraintStr "include-any" }} guard means topologies with exclude constraint skip the flex-algo N NAME definition block entirely. This is probably correct behavior, but a brief comment in the template explaining why exclude topologies don't get a flex-algo definition would help future readers.
|
In my opinion as a followup you should add an e2e test showing that the correct config is generated in the following scenarios:
|
- Fix CHANGELOG: auto-load description and correct SDK method name - Add LinkFlagUnicastDrained constant to sdk/go, use in controller - Remove redundant inner if guard in tunnel.tmpl - Extract defaultTopologyName constant in resolveTenantColors - Add exclude constraint branch to flex-algo definition block - Add unit tests for resolveTenantColors and flex-algo config rendering
7cb9ba8 to
7181cb5
Compare
ben-malbeclabs
left a comment
There was a problem hiding this comment.
Thanks for the thorough review. Addressed in the latest commit:
1 & 2 — CHANGELOG: Both entries corrected — auto-load description replaces the flag reference, and GetProgramData dispatch case replaces the nonexistent ListTopologies method.
3 — LinkFlags magic number: Added LinkFlagUnicastDrained constant to sdk/go/serviceability/state.go and updated server.go to use it.
4 — Unit test for resolveTenantColors: Added Test_resolveTenantColors with 5 table-driven cases covering: empty include_topologies falling back to unicast-default, empty with no unicast-default (returns empty string), single known pubkey, multiple known pubkeys in slice order, and unknown pubkey silently skipped.
5 — Redundant template condition: Removed the duplicate inner guard.
6 — "unicast-default" string literal: Extracted as const defaultTopologyName.
7 — exclude constraint topologies: exclude is a valid user-creatable topology type — operators can create one via doublezero topology create --constraint exclude, at which point the activator allocates a flex-algo number and color for it. Without a corresponding flex-algo definition pushed to the device, the color community would be stamped on BGP routes but the device would have no algorithm to resolve them against — a silent misconfiguration. Added the else if eq .ConstraintStr "exclude" branch and validated the syntax in the physical lab. TestGetConfig_FlexAlgo/flex_algo_enabled now verifies both constraint types render correctly.
|
The latest commit also adds |
RFC-18 flex-algo · PR 5 of 5 · see
rfcs/rfc-0018-flex-algo.mdDepends on: #3497 (PR 1) — does not require PRs 2–4; can merge after PR 1
Series: #3497 · #3512 · #3513 · #3514 · #3515
Summary of Changes
TopologyInfoaccount type to the Go SDK (state.go,deserialize.go,client.go), includingTopologyType,TopologyConstraint,LinkFlagUnicastDrained,LinkTopologies/LinkFlagsonLink, andIncludeTopologiesonTenant/etc/doublezero-controller/features.yamlat startup if present (silently skips if absent); whenflex_algo.enabled: true, populates topology data into the state cache, resolves tenant color communities viaresolveTenantColors, and emits IS-IS flex-algo node segment and link configuration into the Arista EOS templateinclude-anyconstraint (administrative-group include any N exclude 0) andexcludeconstraint (administrative-group exclude N,0) topologies; gated per-link vialink_tagging.excludeand per-tenant viacommunity_stampinge2e/compatibility_test.goto set the mandatory upgrade boundary for RFC-18 interface/link operations tobefore: "0.18.0"Diff Breakdown
Most of the weight is tests and fixtures; core logic is the controller cache/template and Go SDK deserialization.
Key files (click to expand)
controlplane/controller/internal/controller/server.go— populatesTopologiesmap in state cache; resolvesLinkTopologiespubkeys to topology names andUnicastDrainedfromLinkFlags; computesTenantTopologyColors;resolveTenantColorsfalls back tounicast-defaultwhen tenant has no explicit topology assignmentscontrolplane/controller/internal/controller/templates/tunnel.tmpl— IS-IS flex-algo node segment config, link admin-group tagging, BGPnext-hop resolution ribsandset extcommunitystamping, and full rollback (nocommands) when disabled; handles bothinclude-anyandexcludeconstraint topologiescontrolplane/controller/internal/controller/features_config.go—FeaturesConfigstruct withflex_algo.enabled,link_tagging.exclude, andcommunity_stampingcontrols; auto-loaded from/etc/doublezero-controller/features.yamlcontrolplane/controller/internal/controller/models.go— addsLinkTopologies,UnicastDrained,FlexAlgoNodeSegments,TenantTopologyColorsto controller model types;TopologyModelandFlexAlgoEnabled()for template renderingsmartcontract/sdk/go/serviceability/state.go— addsTopologyType,TopologyConstraint,TopologyInfo,LinkFlagUnicastDrained; extendsLinkwithLinkTopologies/LinkFlagsandTenantwithIncludeTopologiessmartcontract/sdk/go/serviceability/deserialize.go— addsDeserializeTopologyInfo; extendsDeserializeLinkwithLinkTopologies/LinkFlagsandDeserializeTenantwithIncludeTopologiesTesting Verification
TestGetConfig_FlexAlgo(new): render tests for flex-algo disabled, enabled with both constraint types (include-anyandexclude), and link excluded from tagging — all passTest_resolveTenantColors(new): 5 table-driven cases covering fallback tounicast-default, missing topology, single/multiple known pubkeys, and unknown pubkey — all passTestE2E_IBRL,TestE2E_IBRL_WithAllocatedAddr,TestE2E_Multicast— all pass; flex-algo config is suppressed in e2e (nofeatures.yamlpresent)excludeconstraint flex-algo syntax validated on physical lab switches (chi-dn-dzd5–dzd8)