Skip to content

Log selected model in Agent.Model() for alloy observability#2134

Open
derekmisler wants to merge 2 commits intodocker:mainfrom
derekmisler:log-selected-model-in-agent-model-for-alloy-obse
Open

Log selected model in Agent.Model() for alloy observability#2134
derekmisler wants to merge 2 commits intodocker:mainfrom
derekmisler:log-selected-model-in-agent-model-for-alloy-obse

Conversation

@derekmisler
Copy link
Contributor

@derekmisler derekmisler commented Mar 16, 2026

Summary

  • Adds slog.Info("Model selected", ...) to Agent.Model() so the chosen provider/model ID and pool size are logged on every model selection
  • Enables correlating which model handled each agent turn in the verbose log artifact, without needing API provider dashboard access
  • Particularly useful for alloy configurations where models are randomly selected — pool_size > 1 indicates alloy was in play

Motivation

Model usage metrics (model name, tokens, cost) are already tracked via telemetry events through the Marlin → Snowflake → Sigma pipeline, but that data is aggregate — useful for dashboarding trends, not for debugging a specific run.

This change surfaces model selection inline with the agent conversation in the verbose log artifact, making it possible to see exactly which model handled each agent turn during a particular workflow run. Different use case: debugging a specific run vs. dashboarding trends.

Test plan

  • Existing TestModelOverride and TestModelOverride_ConcurrentAccess pass
  • New TestModel_LogsSelection verifies the log line contains agent name, "Model selected" message, and correct pool size
  • Verify log lines appear in verbose artifact on a real workflow run

Closes https://github.com/docker/gordon/issues/236

Add slog.Info to Model() so the chosen provider/model ID and pool size
are emitted on every model selection. This surfaces in the verbose log
artifact, making it possible to see which model handled each agent turn
without needing API provider dashboard access.
@derekmisler derekmisler requested a review from a team as a code owner March 16, 2026 18:30
Copy link

@docker-agent docker-agent bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Assessment: 🔴 CRITICAL

This PR adds valuable observability logging to track model selection, but introduces a critical bug that can cause runtime panics.

Critical Issues

1 HIGH severity bug found:

  • Missing validation for empty models slice can cause panic

2 MEDIUM severity issues found:

  • Incorrect pool_size logging when overrides are active
  • Insufficient test coverage for override scenarios

Recommendation

The logging feature is useful for alloy observability, but the panic risk must be addressed before merging. The pool_size logging issue should also be fixed to ensure accurate telemetry.


This review focuses only on code added or modified in this PR.

When model overrides are active, pool_size now reports the override
count instead of always reporting len(a.models). Also extends the
test to cover the override scenario.
@derekmisler
Copy link
Contributor Author

/review

Copy link

@docker-agent docker-agent bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Assessment: 🔴 CRITICAL

Summary

Found 1 CRITICAL issue in the changed code that requires attention before merging.

The PR adds valuable observability by logging model selection details. However, the refactoring exposes a pre-existing panic risk when the models slice is empty.

Critical Issues

  • Panic risk: rand.Intn(0) will panic if a.models is empty (no models configured and no overrides)

Recommendation

Add defensive checks in the Model() method to handle empty model slices gracefully, or ensure validation at agent construction time prevents this scenario entirely.

@derekmisler derekmisler self-assigned this Mar 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant