Skip to content

Adding AAA TACACS Support#176

Open
weneghawi wants to merge 10 commits intomainfrom
feature/aaa-tacacs-support
Open

Adding AAA TACACS Support#176
weneghawi wants to merge 10 commits intomainfrom
feature/aaa-tacacs-support

Conversation

@weneghawi
Copy link
Copy Markdown
Contributor

@weneghawi weneghawi commented Feb 4, 2026

Summary

Restructured the core AAA API to align with the OpenConfig system/aaa YANG model, making it vendor-agnostic and suitable for multi-vendor support (Nokia, Juniper, Arista, etc.). All Cisco NX-OS specific configuration has been moved to a dedicated AAAConfig provider CRD. RADIUS server group support has been added alongside the existing TACACS+ implementation.

Core API Changes (api/core/v1alpha1/aaa_types.go)

  • Server Groups: Replaced flat TACACSServers + TACACSGroup with ServerGroups []AAAServerGroup — protocol-agnostic containers with nested servers, following OpenConfig /system/aaa/server-groups/server-group. Supports both TACACS and RADIUS group types.
  • RADIUS Support: Added AAAServerRADIUS struct with authPort (default 1812), acctPort (default 1813), and keySecretRef
  • Method Lists: Flattened Authentication, Authorization, and Accounting to simple method lists (removed NX-OS specific nesting like Login.Default/Console and ConfigCommands)
  • Field Renames: VRF -> VrfName, SourceInterface -> SourceInterfaceName (leaves room for future object references)
  • Removed Cisco-specific fields: KeyEncryption, LoginErrorEnable moved to Cisco AAAConfig CRD
  • Added XValidation rules:
    • At least one of serverGroups, authentication, authorization, or accounting must be set
    • Servers in a TACACS group must have tacacs config
    • Servers in a RADIUS group must have radius config
    • groupName is required when method type is Group
    • deviceRef is immutable

Cisco AAAConfig CRD (api/cisco/nx/v1alpha1/aaaconfig_types.go)

  • Added ConsoleAuthentication *NXOSMethodList — NX-OS: aaa authentication login console
  • Added ConfigCommandsAuthorization *NXOSMethodList — NX-OS: aaa authorization config-commands default
  • Added RADIUSKeyEncryption type (Type6/Type7/Clear) with radiusKeyEncryption field (default Type7)
  • Retains KeyEncryption (Type6/Type7/Clear) and LoginErrorEnable

Controller (internal/controller/core/aaa_controller.go)

  • Updated secret loading to iterate nested ServerGroups[].Servers[].TACACS.KeySecretRef and ServerGroups[].Servers[].RADIUS.KeySecretRef
  • Updated secretToAAA watch mapping to trigger reconciliation on changes to both TACACS and RADIUS key secrets

NX-OS Provider (internal/provider/cisco/nxos/)

  • aaa.go: Added RadiusProvider, RadiusProviderGroup, RadiusProviderRef NX-OS DME structs. Added MapRADIUSKeyEncryption helper. Added groupTypeByName and MapRealmFromGroup to correctly resolve realm as "radius" or "tacacs" based on the referenced server group type. Removed read-only Name and Realm fields from AAADefaultAuthor (NX-OS rejects writes to these). Added MapNXOSRealm, MapNXOSLocal, MapNXOSFallback helpers. Note: RADIUS on NX-OS requires no feature flag (unlike TACACS+ which requires feature tacacs+).
  • provider.go: Rewrote EnsureAAA to iterate ServerGroups with a switch on group type covering both TACACS and RADIUS. Rewrote DeleteAAA with batched resets and RADIUS group/server cleanup. Changed from Patch to Update (netconf replace).

Sample YAML (config/samples/networking_v1alpha1_aaa.yaml)

  • Updated to demonstrate the new structure with serverGroups, nested servers, flat method lists, and separate Cisco AAAConfig with console/config-commands authorization.

Test Plan

  • go build ./... — compiles cleanly
  • go test ./api/... ./internal/provider/... ./internal/clientutil/... — all pass
  • make run-golangci-lint — 0 issues
  • make generate — CRDs and deepcopy regenerated
  • Verified full AAA lifecycle on NX-OS switch via gNMI (gnmic):
    • Enable TACACS+ feature, create servers + server group with VRF, source interface, server refs
    • Set default authentication, console authentication, config-commands authorization, accounting
    • Read back and verify all TACACS+ configuration
    • Reset all auth/authz/acct to local, delete group and servers, disable feature, verify clean state
    • Create RADIUS server + server group, set default auth realm to radius
    • Read back and verify all RADIUS configuration (confirmed: no feature flag needed on NX-OS)
    • Delete RADIUS group and server, verify clean state

@weneghawi weneghawi requested a review from a team as a code owner February 4, 2026 16:43
@hardikdr hardikdr added the area/metal-automation Automation processes within the Metal project. label Feb 5, 2026
@hardikdr hardikdr added this to Roadmap Feb 5, 2026
Copy link
Copy Markdown
Contributor

@felix-kaestner felix-kaestner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weneghawi Please have a look at the openconfig-system:system/aaa yang model. Only configurations that are part of this or are otherwise commonly found on all vendors (Nokia, Juniper, Arista & Co.) should be part of the core api. All Cisco NX-OS specific configuration should be refactored into a vendor specific provider config, see e.g. the ManagementAccess resource on how this is done. There is a separate api package for cisco specific CRDs.

@weneghawi
Copy link
Copy Markdown
Contributor Author

@weneghawi Please have a look at the openconfig-system:system/aaa yang model. Only configurations that are part of this or are otherwise commonly found on all vendors (Nokia, Juniper, Arista & Co.) should be part of the core api. All Cisco NX-OS specific configuration should be refactored into a vendor specific provider config, see e.g. the ManagementAccess resource on how this is done. There is a separate api package for cisco specific CRDs.

Done. The core API (api/core/v1alpha1/aaa_types.go) has been completely restructured to follow the OpenConfig system/aaa YANG model:

  • ServerGroups with nested Servers maps to /system/aaa/server-groups/server-group
  • AAAServerGroupType (TACACS/RADIUS) maps to server-group/config/type
  • Flat Authentication, Authorization, Accounting method lists map to /system/aaa/authentication, /authorization, /accounting

All Cisco NX-OS specific config has been moved to the AAAConfig CRD in api/cisco/nx/v1alpha1/, including: KeyEncryption, LoginErrorEnable, ConsoleAuthentication, and ConfigCommandsAuthorization.

@weneghawi weneghawi force-pushed the feature/aaa-tacacs-support branch from 14c783c to c4937c1 Compare February 17, 2026 23:35
@weneghawi weneghawi force-pushed the feature/aaa-tacacs-support branch 3 times, most recently from ce3748b to 9b58290 Compare February 18, 2026 15:53
}

// MapNXOSRealm maps an NX-OS method type string to NX-OS realm.
func MapNXOSRealm(methodType string) string {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please leave out the 'NXOS' in the naming here, as this is already clear from the package nxos.

Ref/ https://google.github.io/styleguide/go/decisions.html#package-vs-exported-symbol-name

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to MapRealm, MapLocal, and MapFallback

Copy link
Copy Markdown
Contributor

@felix-kaestner felix-kaestner Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was not consistently done. I still see NXOSMethodList, NXOSMethody, ...

@hardikdr hardikdr added area/switch-automation Automation processes for network switch management and operations. and removed area/metal-automation Automation processes within the Metal project. labels Mar 23, 2026
@weneghawi weneghawi force-pushed the feature/aaa-tacacs-support branch from 98588a4 to b202cf6 Compare March 26, 2026 11:09
@weneghawi weneghawi force-pushed the feature/aaa-tacacs-support branch from 5c5f3d9 to 3fb343d Compare March 27, 2026 13:40
@weneghawi weneghawi force-pushed the feature/aaa-tacacs-support branch from f177bb2 to cc8c8bd Compare April 6, 2026 14:32
Enable or disable the DHCP feature based on AdminState. When enabled,
configure DHCP relay on each referenced interface with the specified
server addresses.

The provider uses the VRF context from VrfRef (or the NXOS default
"!unspecified" if no VRF is specified) when configuring server
addresses.

The implementation uses the Update operation to ensure stale DHCP relay
entries are removed when the configuration changes. This also affects
entries referencing interfaces not managed by the operator.

The entire tree is removed on deletion, affecting non-managed interfaces.,
It leaves the DHCP feature in its current state.

GetDHCPRelayStatus queries the device for all interfaces with DHCP relay
configured and returns their names.
@weneghawi weneghawi force-pushed the feature/aaa-tacacs-support branch from cc8c8bd to 9e64d28 Compare April 6, 2026 14:38
@weneghawi
Copy link
Copy Markdown
Contributor Author

AAA TACACS/RADIUS gNMI Testing

Note: AAA default authentication (defaultauth-items), full end-to-end authentication flows, and controller reconciliation were tested previously on a local VM setup. The tests below cover the remaining gNMI paths validated directly on a QA device.

Device: 10.114.235.140 — Cisco Nexus 9332D-GX2B, NX-OS 10.6(2)
Access: SSH tunnel via netsec-admincluster-int.wdf.sap.corp, gNMI on localhost:9339


Baseline Read

gnmic -a localhost:9339 -u mooapi -p '***' --skip-verify \
  get --path 'System/fm-items/tacacsplus-items' \
      --path 'System/userext-items/authrealm-items/defaultauth-items' \
      --path 'System/userext-items/authrealm-items/consoleauth-items' \
      --path 'System/userext-items/tacacsext-items'

Result: TACACS disabled, all auth realm: local.


TACACS

Enable feature

gnmic -a localhost:9339 -u mooapi -p '***' --skip-verify \
  set --update-path 'System/fm-items/tacacsplus-items/adminSt' \
      --update-value '"enabled"'

Add server

gnmic -a localhost:9339 -u mooapi -p '***' --skip-verify \
  set --update-path 'System/userext-items/tacacsext-items/tacacsplusprovider-items/TacacsPlusProvider-list[name=192.0.2.1]' \
      --update-value '{"name":"192.0.2.1","port":49,"keyEnc":"7","key":"testkey","timeout":5}'

Create server group

gnmic -a localhost:9339 -u mooapi -p '***' --skip-verify \
  set --update-path 'System/userext-items/tacacsext-items/tacacsplusprovidergroup-items/TacacsPlusProviderGroup-list[name=test-group]' \
      --update-value '{"name":"test-group","providerref-items":{"ProviderRef-list":[{"name":"192.0.2.1"}]}}'

Cleanup

# Clear providerGroup reference first (required before group delete)
gnmic -a localhost:9339 -u mooapi -p '***' --skip-verify \
  set --update-path 'System/userext-items/authrealm-items/defaultauth-items' \
      --update-value '{"realm":"local","providerGroup":"","fallback":"yes","local":"yes"}'

gnmic -a localhost:9339 -u mooapi -p '***' --skip-verify \
  set --delete 'System/userext-items/tacacsext-items/tacacsplusprovidergroup-items/TacacsPlusProviderGroup-list[name=test-group]' \
      --delete 'System/userext-items/tacacsext-items/tacacsplusprovider-items/TacacsPlusProvider-list[name=192.0.2.1]'

gnmic -a localhost:9339 -u mooapi -p '***' --skip-verify \
  set --update-path 'System/fm-items/tacacsplus-items/adminSt' \
      --update-value '"disabled"'

All operations: ✅


RADIUS

Add server

gnmic -a localhost:9339 -u mooapi -p '***' --skip-verify \
  set --update-path 'System/userext-items/radiusext-items/radiusprovider-items/RadiusProvider-list[name=192.0.2.2]' \
      --update-value '{"name":"192.0.2.2","authPort":1812,"acctPort":1813,"keyEnc":"7","key":"testkey","timeout":5}'

Create server group

gnmic -a localhost:9339 -u mooapi -p '***' --skip-verify \
  set --update-path 'System/userext-items/radiusext-items/radiusprovidergroup-items/RadiusProviderGroup-list[name=test-radius-group]' \
      --update-value '{"name":"test-radius-group","providerref-items":{"ProviderRef-list":[{"name":"192.0.2.2"}]}}'

Cleanup

gnmic -a localhost:9339 -u mooapi -p '***' --skip-verify \
  set --delete 'System/userext-items/radiusext-items/radiusprovidergroup-items/RadiusProviderGroup-list[name=test-radius-group]' \
      --delete 'System/userext-items/radiusext-items/radiusprovider-items/RadiusProvider-list[name=192.0.2.2]'

All operations: ✅


AAA Authorization

gnmic -a localhost:9339 -u mooapi -p '***' --skip-verify \
  set --update-path 'System/userext-items/authrealm-items/defaultauthor-items/DefaultAuthor-list[cmdType=config]' \
      --update-value '{"cmdType":"config","localRbac":true}'

Result: ✅ — NX-OS correctly populates read-only name and realm fields server-side; confirmed these must not be sent in write requests.


AAA Accounting

gnmic -a localhost:9339 -u mooapi -p '***' --skip-verify \
  set --update-path 'System/userext-items/authrealm-items/defaultacc-items' \
      --update-value '{"realm":"local","localRbac":true}'

Result: ✅


AAA Console Auth

gnmic -a localhost:9339 -u mooapi -p '***' --skip-verify \
  set --update-path 'System/userext-items/authrealm-items/consoleauth-items' \
      --update-value '{"realm":"local","fallback":"yes","local":"yes"}'

Result: ✅


Notes

  • TACACS group delete fails if still referenced by defaultauth-items — must clear providerGroup first.
  • RADIUS requires no feature flag (unlike TACACS which needs feature tacacs+).
  • AAADefaultAuthor name and realm fields are read-only on NX-OS — sending them causes a write error.
  • AAA default auth (defaultauth-items) was intentionally not tested to avoid risk of lockout.

}

// NXOSMethod represents a single AAA method in an NX-OS context.
type NXOSMethod struct {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please reuse the AAAMethod type from the core/v1alpha1 package.

if err := (&corecontroller.AAAReconciler{
Client: mgr.GetClient(),
Scheme: mgr.GetScheme(),
Recorder: mgr.GetEventRecorderFor("aaa-controller"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Recorder: mgr.GetEventRecorderFor("aaa-controller"),
Recorder: Recorder: mgr.GetEventRecorder("aaa-controller"),


// Recorder is used to record events for the controller.
// More info: https://book.kubebuilder.io/reference/raising-events
Recorder record.EventRecorder
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Recorder record.EventRecorder
Recorder events.EventRecorder

// +kubebuilder:rbac:groups=networking.metal.ironcore.dev,resources=aaa,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=networking.metal.ironcore.dev,resources=aaa/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=networking.metal.ironcore.dev,resources=aaa/finalizers,verbs=update
// +kubebuilder:rbac:groups=core,resources=events,verbs=create;patch
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// +kubebuilder:rbac:groups=core,resources=events,verbs=create;patch
// +kubebuilder:rbac:groups=events.k8s.io,resources=events,verbs=create;patch

// - https://ahmet.im/blog/controller-pitfalls/#reconcile-method-shape
func (r *AAAReconciler) Reconcile(ctx context.Context, req ctrl.Request) (_ ctrl.Result, reterr error) {
log := ctrl.LoggerFrom(ctx)
log.Info("Reconciling resource")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please adapt this controller to the logging scheme introduced in 4e1b59c.

if err := r.Locker.AcquireLock(ctx, device.Name, "aaa-controller"); err != nil {
if errors.Is(err, resourcelock.ErrLockAlreadyHeld) {
log.Info("Device is already locked, requeuing reconciliation")
return ctrl.Result{RequeueAfter: time.Second * 5}, nil
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return ctrl.Result{RequeueAfter: time.Second * 5}, nil
return ctrl.Result{RequeueAfter: Jitter(time.Second), Priority: new(LockWaitPriorityDefault)}, nil

see 10af9d7.

Comment on lines +3132 to +3136
for _, server := range group.Servers {
if err := p.client.Delete(ctx, &TacacsPlusProvider{Name: server.Address}); err != nil {
return err
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do a single delete call here with all the servers, instead of doing so many individual gnmi requests?
Also you shouldn't assume that the spec contains all servers, that were ever created. So this DeleteAAA func should also take care of removing any potential leftover from previous reconciles.

Name: server.Address,
KeyEnc: MapKeyEncryption(cfg.Spec.KeyEncryption),
}
if server.TACACS != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still append the server to the updates later on despite server.TACACS == nil? This doesn't seem right.
Isn't it enforced that when the group is tacacs, all servers have to have that setting? Making this check here redundant?

Same also applies to the RADIUS servers.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add a test file, as done with the other configuration items including a golden cli script under the testdata/ directory, i.e. testdata/aaa.json.txt and the expected YANG json payload of that as testdata/aaa.json.

See how this is done on other items, e.g. the user.go/user_test.go/user.json/user.json.txt.

@weneghawi weneghawi force-pushed the feature/aaa-tacacs-support branch from 28c67ae to a0af584 Compare April 9, 2026 14:46
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 9, 2026

Merging this branch will decrease overall coverage

Impacted Packages Coverage Δ 🤖
github.com/ironcore-dev/network-operator/api/cisco/nx/v1alpha1 0.00% (ø)
github.com/ironcore-dev/network-operator/api/core/v1alpha1 0.00% (ø)
github.com/ironcore-dev/network-operator/cmd 0.00% (ø)
github.com/ironcore-dev/network-operator/hack/provider 0.00% (ø)
github.com/ironcore-dev/network-operator/internal/controller/core 61.73% (-1.87%) 👎
github.com/ironcore-dev/network-operator/internal/deviceutil 40.79% (ø)
github.com/ironcore-dev/network-operator/internal/provider 52.00% (ø)
github.com/ironcore-dev/network-operator/internal/provider/cisco/nxos 9.63% (-0.58%) 👎
github.com/ironcore-dev/network-operator/test/e2e 0.00% (ø)
github.com/ironcore-dev/network-operator/test/lab 0.00% (ø)

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/ironcore-dev/network-operator/api/cisco/nx/v1alpha1/aaaconfig_types.go 0.00% (ø) 0 0 0
github.com/ironcore-dev/network-operator/api/cisco/nx/v1alpha1/zz_generated.deepcopy.go 0.00% (ø) 0 0 0
github.com/ironcore-dev/network-operator/api/core/v1alpha1/aaa_types.go 0.00% (ø) 0 0 0
github.com/ironcore-dev/network-operator/api/core/v1alpha1/zz_generated.deepcopy.go 0.00% (ø) 0 0 0
github.com/ironcore-dev/network-operator/cmd/main.go 0.00% (ø) 0 0 0
github.com/ironcore-dev/network-operator/hack/provider/main.go 0.00% (ø) 0 0 0
github.com/ironcore-dev/network-operator/internal/controller/core/aaa_controller.go 0.00% (ø) 139 (+139) 0 139 (+139)
github.com/ironcore-dev/network-operator/internal/deviceutil/deviceutil.go 40.79% (ø) 76 31 45
github.com/ironcore-dev/network-operator/internal/provider/cisco/nxos/aaa.go 0.00% (ø) 52 (+52) 0 52 (+52)
github.com/ironcore-dev/network-operator/internal/provider/cisco/nxos/ascii.go 100.00% (ø) 7 7 0
github.com/ironcore-dev/network-operator/internal/provider/cisco/nxos/provider.go 0.06% (-0.00%) 1720 (+89) 1 1719 (+89) 👎
github.com/ironcore-dev/network-operator/internal/provider/cisco/nxos/user.go 40.00% (ø) 40 16 24
github.com/ironcore-dev/network-operator/internal/provider/provider.go 52.00% (ø) 25 13 12
github.com/ironcore-dev/network-operator/test/e2e/util.go 0.00% (ø) 0 0 0

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/ironcore-dev/network-operator/test/lab/main_test.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/switch-automation Automation processes for network switch management and operations.

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

4 participants