
Wire slm.evaluate() into the train loop / studio / server #2

@patel-lyzr


slm.evaluate() (#1) ships the SDK + CLI surface, but nothing inside the product calls it yet — it's a leaf reachable only via slm.evaluate(...) and shadowlm eval. The task-quality number it produces is the "eval gate" the product thesis depends on ("run the shadow until it does the job as well as the frontier, then switch"), so it should be wired into the loop:

  • finetune() eval-on-holdout — pass an eval set, attach the task-quality score to TrainingRun alongside eval_loss. This is the actual eval gate.
  • Studio — surface the score in Runs / Playground.
  • /v1/evaluate endpoint — so the studio can evaluate.
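To make the first bullet concrete, here is a minimal sketch of what eval-on-holdout could look like. Everything in it is an assumption for illustration: the issue does not give the real signatures of `finetune()`, `slm.evaluate()`, or `TrainingRun`, so the `task_quality` field, the exact-match scorer, and the `passes_gate` helper are hypothetical stand-ins, and the training step itself is elided.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical stand-ins: the real TrainingRun and slm.evaluate() are not
# specified in this issue, so these shapes are assumptions.
Example = Tuple[str, str]  # (prompt, expected output)


@dataclass
class TrainingRun:
    eval_loss: float
    task_quality: Optional[float] = None  # assumed new field next to eval_loss


def evaluate(predict: Callable[[str], str], eval_set: Sequence[Example]) -> float:
    """Toy scorer (exact-match rate) standing in for slm.evaluate()."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    hits = sum(predict(prompt).strip() == expected.strip()
               for prompt, expected in eval_set)
    return hits / len(eval_set)


def finetune(predict: Callable[[str], str],
             eval_loss: float,
             eval_set: Optional[Sequence[Example]] = None) -> TrainingRun:
    """Training is elided; only the eval-on-holdout hook is shown."""
    run = TrainingRun(eval_loss=eval_loss)
    if eval_set is not None:
        # Attach the task-quality score alongside eval_loss.
        run.task_quality = evaluate(predict, eval_set)
    return run


def passes_gate(run: TrainingRun, frontier_score: float,
                tolerance: float = 0.0) -> bool:
    """Hypothetical gate: the shadow is at least as good as the frontier."""
    if run.task_quality is None:
        return False  # no eval set was passed, so the gate cannot open
    return run.task_quality >= frontier_score - tolerance
```

In this sketch a run without an eval set keeps `task_quality` as `None` and never passes the gate, which keeps the "switch" decision tied to an actual measurement rather than to `eval_loss` alone.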

None of these blocks the merge of #1; this issue just tracks turning the primitive into something the loop consumes.
