Skip to content

sdk/go,controller: add TopologyInfo deserialization and flex-algo controller support#3515

Merged
ben-malbeclabs merged 5 commits intomainfrom
bc/rfc18-pr5
Apr 29, 2026
Merged

sdk/go,controller: add TopologyInfo deserialization and flex-algo controller support#3515
ben-malbeclabs merged 5 commits intomainfrom
bc/rfc18-pr5

Conversation

@ben-malbeclabs
Copy link
Copy Markdown
Contributor

@ben-malbeclabs ben-malbeclabs commented Apr 10, 2026

RFC-18 flex-algo · PR 5 of 5 · see rfcs/rfc-0018-flex-algo.md
Depends on: #3497 (PR 1) — does not require PRs 2–4; can merge after PR 1
Series: #3497 · #3512 · #3513 · #3514 · #3515

Summary of Changes

  • Adds TopologyInfo account type to the Go SDK (state.go, deserialize.go, client.go), including TopologyType, TopologyConstraint, LinkFlagUnicastDrained, LinkTopologies/LinkFlags on Link, and IncludeTopologies on Tenant
  • Auto-loads /etc/doublezero-controller/features.yaml at startup if present (silently skips if absent); when flex_algo.enabled: true, populates topology data into the state cache, resolves tenant color communities via resolveTenantColors, and emits IS-IS flex-algo node segment and link configuration into the Arista EOS template
  • Template emits correct flex-algo definitions for both include-any constraint (administrative-group include any N exclude 0) and exclude constraint (administrative-group exclude N,0) topologies; gated per-link via link_tagging.exclude and per-tenant via community_stamping
  • Updates e2e/compatibility_test.go to set the mandatory upgrade boundary for RFC-18 interface/link operations to before: "0.18.0"

Diff Breakdown

Category Files Lines (+/-) Net
Core logic 6 +368 / -6 +362
Scaffolding 2 +22 / -0 +22
Tests 4 +482 / -2 +480
Fixtures 2 +415 / -0 +415
Docs 1 +4 / -0 +4

Most of the weight is tests and fixtures; core logic is the controller cache/template and Go SDK deserialization.

Note on diff size: The full diff vs main is large because this branch is stacked on PR 1 (#3497), which has not yet merged. The table above reflects only PR 5's contribution. After PR 1 merges and this branch is rebased, the diff shrinks to the ~15 files shown here.

Key files (click to expand)

Testing Verification

  • TestGetConfig_FlexAlgo (new): render tests for flex-algo disabled, enabled with both constraint types (include-any and exclude), and link excluded from tagging — all pass
  • Test_resolveTenantColors (new): 5 table-driven cases covering fallback to unicast-default, missing topology, single/multiple known pubkeys, and unknown pubkey — all pass
  • TestE2E_IBRL, TestE2E_IBRL_WithAllocatedAddr, TestE2E_Multicast — all pass; flex-algo config is suppressed in e2e (no features.yaml present)
  • exclude constraint flex-algo syntax validated on physical lab switches (chi-dn-dzd5–dzd8)

@ben-malbeclabs ben-malbeclabs changed the base branch from bc/rfc18-pr1 to main April 10, 2026 17:52
@ben-malbeclabs ben-malbeclabs marked this pull request as ready for review April 10, 2026 17:52
@ben-malbeclabs ben-malbeclabs marked this pull request as draft April 10, 2026 17:53
@ben-malbeclabs ben-malbeclabs force-pushed the bc/rfc18-pr5 branch 2 times, most recently from b102bd9 to 33ab59c Compare April 15, 2026 00:43
@ben-malbeclabs ben-malbeclabs marked this pull request as ready for review April 15, 2026 04:00
elitegreg added a commit that referenced this pull request Apr 18, 2026
…y processors (#3497)

RFC-18 flex-algo · PR 1 of 5 · see `rfcs/rfc-0018-flex-algo.md`
Series: #3497 · #3512 · #3513 · #3514 · #3515

## Summary of Changes
- Adds `TopologyInfo` onchain account (RFC-18 flex-algo) with
`admin_group_bit`, `flex_algo_number`, and `TopologyConstraint`; creates
`AdminGroupBits` resource extension singleton for bit allocation
- Adds four new foundation-only instructions (107–110):
`CreateTopology`, `DeleteTopology`, `ClearTopology`, `BackfillTopology`
- Extends `Link` with `link_topologies` (Vec<Pubkey>) and `link_flags`
(LINK_FLAG_UNICAST_DRAINED); extends `Tenant` with `include_topologies`;
adds `flex_algo_node_segments` to Interface V2
- `CreateLink` now requires the unicast-default topology account and
auto-tags the link into it
- `UpdateLink` gains foundation-gated `link_topologies` update and
contributor-gated `unicast_drained` flag

## Diff Breakdown
| Category    | Files | Lines (+/-)   | Net   |
|-------------|-------|---------------|-------|
| Core logic  |    22 | +1,306 / -38  | +1,268|
| Scaffolding |    49 | +133 / -3     |  +130 |
| Tests       |    27 | +4,592 / -74  | +4,518|

The scaffolding is mechanical struct field additions (`link_topologies:
vec![]`, `include_topologies: vec![]`, `flex_algo_node_segments:
vec![]`) across CLI, SDK, activator, and client to keep the workspace
compiling. Core logic is concentrated in the new topology processors,
state types, and link/activate updates.

<details>
<summary>Key files (click to expand)</summary>

-
`smartcontract/programs/doublezero-serviceability/src/state/interface.rs`
— updates Interface V2 (discriminant 1) to include
`flex_algo_node_segments: Vec<FlexAlgoNodeSegment>`; V1 accounts
(pre-CYOA format) are left as-is; pre-RFC-18 V2 accounts on mainnet are
upgraded in-place by `MigrateDeviceInterfaces` before this
deserialization path is used
-
`smartcontract/programs/doublezero-serviceability/src/state/topology.rs`
— new `TopologyInfo` account: `admin_group_bit` (u8), `flex_algo_number`
(128+bit), `TopologyConstraint` (IncludeAny/Exclude)
-
`smartcontract/programs/doublezero-serviceability/src/processors/topology/`
— five new processors: create (allocates AdminGroupBit, validates
IdRange 1–127), delete, clear (removes topology from all links),
backfill (allocates FlexAlgoNodeSegment on all device interfaces)
-
`smartcontract/programs/doublezero-serviceability/src/processors/link/update.rs`
— adds foundation-gated `link_topologies` update (validates each
topology account onchain) and contributor-gated `unicast_drained` flag
(LINK_FLAG_UNICAST_DRAINED bit 0)
-
`smartcontract/programs/doublezero-serviceability/src/processors/link/activate.rs`
— requires `unicast_default_topology_account` as a mandatory account;
auto-tags new link into the unicast-default topology on activation
-
`smartcontract/programs/doublezero-serviceability/src/processors/globalconfig/set.rs`
— creates `AdminGroupBits` ResourceExtension PDA on global config
initialization
- `smartcontract/programs/doublezero-serviceability/src/state/link.rs` —
adds `link_topologies: Vec<Pubkey>`, `link_flags: u64`, and
`LINK_FLAG_UNICAST_DRAINED = 0x01`
- `smartcontract/programs/doublezero-serviceability/src/instructions.rs`
— registers four new instruction variants (CreateTopology=107,
DeleteTopology=108, ClearTopology=109, BackfillTopology=110)

</details>

## Testing Verification
- `make rust-lint` and `make rust-test` pass clean on this branch
- `topology_test.rs` (~2,400 lines) covers all four topology processors:
create/delete/clear/backfill, including error paths (duplicate names,
out-of-range bits, unauthorized signers, invalid topology accounts),
segment allocation, and multi-topology backfill
- Existing `link_wan_test.rs`, `tenant_test.rs`, and telemetry tests
updated to account for the new mandatory
`unicast_default_topology_account` in `ActivateLink`

---------

Co-authored-by: Greg Mitchell <greg@malbeclabs.com>
ben-malbeclabs added a commit that referenced this pull request Apr 21, 2026
RFC-18 flex-algo · PR 2 of 5 · see `rfcs/rfc-0018-flex-algo.md`
Series: #3497 · #3512 · #3513 · #3514 · #3515

## Summary of Changes
- Adds five `doublezero link topology` subcommands: `create`, `delete`, `clear`, `backfill`, `list`
- `topology create` allocates the admin-group bit and flex-algo number onchain, then automatically backfills `FlexAlgoNodeSegment` entries on all activated Vpnv4 loopback interfaces
- `topology backfill` and `topology clear` batch at 16 accounts per transaction to stay within Solana's 32-account limit; zero accounts → zero transactions sent
- `topology list` displays each topology's name, admin-group bit, flex-algo number, BGP color community, constraint, and count of tagged links
- Backs each subcommand with a Rust SDK command in `doublezero_sdk::commands::topology`
- Simplifies the `BackfillTopology` onchain processor: removes the two-pass SR ID pre-marking loop; keeping the `SegmentRoutingIds` resource in sync with off-chain allocations is the activator's responsibility

## Diff Breakdown
| Category    | Files | Lines (+/-)    | Net    |
|-------------|-------|----------------|--------|
| Core logic  |    11 | +1,572 / -32   | +1,540 |
| Scaffolding |     8 |   +91 /  -6    |   +85  |
| Tests       |     1 |   +25 / -16    |    +9  |

Core logic includes inline unit tests in the SDK command files.

<details>
<summary>Key files (click to expand)</summary>

- `smartcontract/sdk/rs/src/commands/topology/create.rs` — creates topology, then enumerates devices and calls `BackfillTopologyCommand` for those with Vpnv4 loopbacks; returns `CreateTopologyResult { signature, topology_pda, backfill_signatures }`
- `smartcontract/sdk/rs/src/commands/topology/backfill.rs` — batches devices in chunks of 16; returns `Vec<Signature>`; empty device list sends no transactions
- `smartcontract/sdk/rs/src/commands/topology/clear.rs` — batches links in chunks of 16; returns `Vec<Signature>`; empty link list sends no transactions
- `smartcontract/sdk/rs/src/commands/topology/list.rs` — fetches all `TopologyInfo` accounts and cross-references link accounts to compute per-topology link counts
- `smartcontract/cli/src/topology/create.rs` — CLI wrapper; prints PDA and backfill transaction count
- `smartcontract/cli/src/topology/list.rs` — tabular display of name, bit, algo number, BGP color, constraint, link count
- `smartcontract/programs/doublezero-serviceability/src/processors/topology/backfill.rs` — removes two-pass SR ID pre-marking; single-pass allocation only

</details>

## Testing Verification
- `cargo test -p doublezero_sdk -p doublezero_cli` passes
- `cargo clippy -- -D warnings` clean
- `make rust-fmt` applied
elitegreg added a commit that referenced this pull request Apr 22, 2026
  migrate command (#3513)

RFC-18 flex-algo · PR 3 of 5 · see `rfcs/rfc-0018-flex-algo.md`
Depends on: #3512 (PR 2)
Series: #3497 · #3512 · #3513 · #3514 · #3515

## Summary of Changes
- Extends `doublezero link get/list/update` to display topology
assignments (`link_topologies`) and drain status; adds `--link-topology`
(comma-separated topology names, `default` to clear all) and
`--unicast-drained` flags to `link update`; adds `--topology` filter to
`link list`
- Extends `doublezero tenant get/list/update` to display included
topologies (`include_topologies`) and adds `--include-topologies`
(comma-separated topology names, `default` to clear) to `tenant update`
- Adds `doublezero-admin migrate flex-algo [--dry-run]` command that
backfills link topology assignments and VPNv4 loopback flex-algo node
segments across all existing devices and links

## Diff Breakdown
| Category    | Files | Lines (+/-) | Net  |
|-------------|-------|-------------|------|
| Core logic  |    11 | +471 / -57  | +414 |
| Scaffolding |     7 | +16 / -5    |  +11 |

Mostly core logic — the migrate command alone is 180 lines of backfill
and dry-run orchestration.

<details>
<summary>Key files (click to expand)</summary>

- `controlplane/doublezero-admin/src/cli/migrate.rs` — new: `migrate
flex-algo` command; validates UNICAST-DEFAULT PDA exists, backfills link
topologies for all links, backfills flex-algo node segments for all
VPNv4 loopback interfaces on all devices; supports `--dry-run`
- `smartcontract/cli/src/link/list.rs` — adds `--topology <name>` filter
and displays `link_topologies`/`unicast_drained` columns, resolving
topology pubkeys to human-readable names
- `smartcontract/cli/src/link/get.rs` — displays topology assignments
and drain status; fetches topology map to resolve pubkeys to names
- `smartcontract/cli/src/link/update.rs` — adds `--link-topology`
(accepts comma-separated list of topology names; use `default` to clear)
and `--unicast-drained` flags
- `smartcontract/cli/src/tenant/list.rs` — displays `include_topologies`
column
- `smartcontract/cli/src/tenant/get.rs` — displays included topology
names
- `smartcontract/cli/src/tenant/update.rs` — adds `--include-topologies`
(accepts comma-separated list; use `default` to clear)

</details>

## Testing Verification
- `cargo test -p doublezero_sdk -p doublezero_cli -p doublezero-admin`
passes
- `cargo clippy -- -D warnings` clean
- `make rust-fmt` applied before commit

---------

Co-authored-by: Greg Mitchell <greg@malbeclabs.com>
Comment thread controlplane/controller/internal/controller/features_config.go
elitegreg pushed a commit that referenced this pull request Apr 23, 2026
RFC-18 flex-algo · PR 4 of 5 · see
[`rfcs/rfc-0018-flex-algo.md`](../tree/main/rfcs/rfc-0018-flex-algo.md)
Depends on: #3512 (PR 2), #3513 (PR 3)
Series: #3497 · #3512 · #3513 · #3514 · #3515

## Summary of Changes

- Python SDK: deserialize `TopologyConstraint`, `TopologyInfo`, and
`FlexAlgoNodeSegment`; read `flex_algo_node_segments` from the RFC-18
`InterfaceV3` account format
- TypeScript SDK: deserialize `FlexAlgoNodeSegment` from `InterfaceV3`
account format; guard the segments loop against pre-RFC-18 mainnet
accounts where the segment count reads garbage bytes
- SDK fixtures: regenerate device, link, tenant, and user fixtures to
include the new `InterfaceV3` fields
- TypeScript RPC client: remove the `AbortController` timeout wrapper
around `fetch` (was surfacing as spurious `TimeoutError` on slow
`getProgramData` responses); bump the compat test's request timeout to
120s
- `tryReadString` in `borsh-incremental`: check remaining buffer length
before reading to avoid out-of-bounds reads
- CHANGELOG / e2e compatibility test: document the mandatory CLI upgrade
boundary for RFC-18 `InterfaceV3` (devices written by the new program
cannot be deserialized by pre-RFC-18 CLI versions)

Note: activator changes that were originally planned for this PR have
been dropped — the doublezero activator is deprecated.

## Diff Breakdown

| Category   | Files | Lines (+/-) |
|------------|-------|-------------|
| Python SDK |     1 | +61 / -1    |
| TS SDK     |     3 | +27 / -3    |
| Fixtures   |     5 | +4 / -1 + bin |
| Docs/e2e   |     2 | +2 / -2     |

## Testing Verification

- Python SDK: `uv run pytest` passes under `sdk/serviceability/python/`,
including fixture deserialization
- TypeScript SDK: `bun test` passes under
`sdk/serviceability/typescript/`, including compat fixture
deserialization with the new `InterfaceV3` fields
- `cargo check --workspace` clean
Completes the Go SDK catch-up for PR3's onchain writes: Link now has
link_topologies and link_flags, Tenant has include_topologies. Also adds
TopologyInfo account type (IndexType=16, TopologyType=17) with
TopologyConstraint enum, ListTopologies client method, and fixture bytes
for the new trailing fields in the unit tests.
Adds RFC-18 flex-algo rendering to the controller's Arista EOS template,
gated behind a --features-config YAML toggle (flex_algo.enabled) that is
off by default. When enabled, the controller populates topology data
into the state cache, resolves per-tenant BGP color communities from
Tenant.include_topologies, and emits IS-IS flex-algo node segments and
color stamping blocks into tunnel.tmpl.

Includes golden-config fixtures for both enabled and disabled paths
(base.config.flex-algo.txt / base.config.flex-algo-disabled.txt) and
render tests covering each.
Copy link
Copy Markdown
Contributor

@nikw9944 nikw9944 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #3515: sdk/go,controller: add TopologyInfo deserialization and flex-algo controller support

Related: rfcs/rfc-0018-flex-algo.md

Tests pass, lint is clean. Overall this is solid work — the feature-gate pattern is well-designed, the cleanup/rollback path (disabled flex-algo emitting no commands) is thoughtful, and the test coverage on render paths is good.


High

1. CHANGELOG is stale after the latest commit
CHANGELOG.md:14 — Still says Add --features-config flag accepting a YAML file but the latest commit (7cb9ba85) removed the --features-config flag in favor of auto-loading /etc/doublezero-controller/features.yaml. The CHANGELOG should reflect the current behavior (auto-load, no flag).

2. CHANGELOG references nonexistent ListTopologies client method
CHANGELOG.md:16 — Says "Go serviceability SDK adds … ListTopologies client method" but no ListTopologies function exists anywhere in the codebase. The SDK adds the TopologyInfo type, deserialization, and the dispatch case in GetProgramData — but there is no standalone ListTopologies method. This should be corrected to avoid confusing readers.


Medium

3. LinkFlags bit 0 is a magic number
server.go:415link.LinkFlags&0x01 != 0 for UnicastDrained. A named constant (e.g., const LinkFlagUnicastDrained = 0x01) in the SDK's state.go would be more self-documenting and avoid magic bitmask values scattered across consumers.

4. No direct unit test for resolveTenantColors
server.go:780-798 — The resolveTenantColors function has meaningful branching logic (empty vs. non-empty includeTopologies, fallback to unicast-default) but is only tested indirectly through render tests that pre-compute TenantTopologyColors. A focused unit test would catch regressions in the fallback behavior (e.g., what happens when the topology map has no unicast-default entry, or when includeTopologies contains a pubkey not in the map).

5. Redundant condition in template
tunnel.tmpl:135 — The inner condition {{- if and .Ip.IsValid .IsPhysical .Metric .IsLink (not .IsSubInterfaceParent) (not .IsCYOA) (not .IsDIA) }} is identical to the outer condition at line 123. The inner block is always entered when the outer block is entered, so this is a no-op guard. Consider removing the redundant inner check for clarity, or add a comment explaining if it's intentional (e.g., defensive against future template restructuring).


Low

6. resolveTenantColors default fallback uses string matching
server.go:791strings.EqualFold(topo.Name, "unicast-default") — the name "unicast-default" is a well-known topology name used in multiple places. Consider extracting it as a named constant (e.g., const DefaultTopologyName = "unicast-default") to avoid string literal duplication.

7. exclude constraint topologies silently produce no flex-algo definition
tunnel.tmpl:349 — The {{- if eq .ConstraintStr "include-any" }} guard means topologies with exclude constraint skip the flex-algo N NAME definition block entirely. This is probably correct behavior, but a brief comment in the template explaining why exclude topologies don't get a flex-algo definition would help future readers.

Comment thread controlplane/controller/internal/controller/templates/tunnel.tmpl
Comment thread controlplane/controller/internal/controller/templates/tunnel.tmpl
Comment thread controlplane/controller/internal/controller/templates/tunnel.tmpl
@nikw9944
Copy link
Copy Markdown
Contributor

In my opinion as a followup you should add an e2e test showing that the correct config is generated in the following scenarios:

  • flex algo disabled
  • flex algo enabled
  • flex algo enabled but device is excluded

- Fix CHANGELOG: auto-load description and correct SDK method name
- Add LinkFlagUnicastDrained constant to sdk/go, use in controller
- Remove redundant inner if guard in tunnel.tmpl
- Extract defaultTopologyName constant in resolveTenantColors
- Add exclude constraint branch to flex-algo definition block
- Add unit tests for resolveTenantColors and flex-algo config rendering
Copy link
Copy Markdown
Contributor Author

@ben-malbeclabs ben-malbeclabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough review. Addressed in the latest commit:

1 & 2 — CHANGELOG: Both entries corrected — auto-load description replaces the flag reference, and GetProgramData dispatch case replaces the nonexistent ListTopologies method.

3 — LinkFlags magic number: Added LinkFlagUnicastDrained constant to sdk/go/serviceability/state.go and updated server.go to use it.

4 — Unit test for resolveTenantColors: Added Test_resolveTenantColors with 5 table-driven cases covering: empty include_topologies falling back to unicast-default, empty with no unicast-default (returns empty string), single known pubkey, multiple known pubkeys in slice order, and unknown pubkey silently skipped.

5 — Redundant template condition: Removed the duplicate inner guard.

6 — "unicast-default" string literal: Extracted as const defaultTopologyName.

7 — exclude constraint topologies: exclude is a valid user-creatable topology type — operators can create one via doublezero topology create --constraint exclude, at which point the activator allocates a flex-algo number and color for it. Without a corresponding flex-algo definition pushed to the device, the color community would be stamped on BGP routes but the device would have no algorithm to resolve them against — a silent misconfiguration. Added the else if eq .ConstraintStr "exclude" branch and validated the syntax in the physical lab. TestGetConfig_FlexAlgo/flex_algo_enabled now verifies both constraint types render correctly.

@ben-malbeclabs
Copy link
Copy Markdown
Contributor Author

The latest commit also adds TestGetConfig_FlexAlgo with 3 render test cases covering the scenarios you mentioned: flex-algo disabled (asserts no router traffic-engineering and no traffic-engineering administrative-group), flex-algo enabled (asserts the full TE block including both include-any and exclude constraint definitions), and flex-algo enabled with a link in the exclude list (asserts no traffic-engineering administrative-group on that interface). These are config render tests rather than full cEOS container tests — the latter is tracked in #3355.

@ben-malbeclabs ben-malbeclabs changed the title sdk/go,controller: add TopologyInfo deserialization and flex-algo controller support sdk/go,controller: add TopologyInfo deserialization and flex-algo controller support Apr 29, 2026
@ben-malbeclabs ben-malbeclabs merged commit af55413 into main Apr 29, 2026
36 checks passed
@ben-malbeclabs ben-malbeclabs deleted the bc/rfc18-pr5 branch April 29, 2026 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants