feat: MLflow metrics visualization, enhanced wait UI, and eval job links by mollyheamazon · Pull Request #5662 · aws/sagemaker-python-sdk

mollyheamazon · 2026-03-20T23:47:38Z

What's new

New public APIs (sagemaker.train)

get_studio_url — get a SageMaker Studio URL for any training job:

from sagemaker.train import get_studio_url

url = get_studio_url(training_job)                                          # TrainingJob object
url = get_studio_url("my-job-name")                                         # job name string
url = get_studio_url("arn:aws:sagemaker:us-west-2:123:training-job/my-job") # ARN string
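
Because the function accepts three input forms, some normalization step presumably runs before the URL is built. The following is a minimal sketch of that idea, assuming the standard SageMaker training-job ARN layout. `parse_training_job_arn`, `resolve_job_name`, and the `training_job_name` attribute are hypothetical names chosen for illustration; this is not the SDK's `_parse_job_arn`.

```python
from typing import NamedTuple


class ParsedJobArn(NamedTuple):
    region: str
    account_id: str
    job_name: str


def parse_training_job_arn(arn: str) -> ParsedJobArn:
    """Split arn:<partition>:sagemaker:<region>:<account>:training-job/<name>."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"Not a SageMaker ARN: {arn!r}")
    resource_type, _, job_name = parts[5].partition("/")
    if resource_type != "training-job" or not job_name:
        raise ValueError(f"Not a training-job ARN: {arn!r}")
    return ParsedJobArn(region=parts[3], account_id=parts[4], job_name=job_name)


def resolve_job_name(job) -> str:
    """Accept an ARN string, a plain job name, or a job object."""
    if isinstance(job, str):
        if job.startswith("arn:"):
            return parse_training_job_arn(job).job_name
        return job
    # Assumption: the job object exposes its name as `training_job_name`.
    return job.training_job_name
```

Parsing the region out of the ARN, rather than reading it from the session, is one way to avoid building a link for the wrong region when the two differ.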

get_mlflow_url — get a presigned MLflow experiment URL (valid 5 min):

from sagemaker.train import get_mlflow_url

url = get_mlflow_url(training_job)
url = get_mlflow_url("my-job-name")

plot_training_metrics — plot MLflow metrics from a completed training job in Jupyter (requires
sagemaker-train[notebook]):

from sagemaker.train import plot_training_metrics

plot_training_metrics(training_job)                          # all metrics
plot_training_metrics(training_job, metrics=["loss", "accuracy"])  # specific metrics

get_available_metrics — list available MLflow metrics for a job:

from sagemaker.train import get_available_metrics

metrics = get_available_metrics(training_job)
# ['loss', 'accuracy', 'eval_loss', ...]

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Enhancements

Training job wait() UI overhaul

Adds TrainingJob ARN row and clickable Console / Studio / CloudWatch links
MLflow experiment link with auto-refresh every 4 min (before 5-min presigned URL expiry)
Smarter render throttling — only re-renders on status change or every 2s, reducing flicker
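
The throttling and link-refresh behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `RenderThrottle` and `RefreshingUrl` are hypothetical names, and the injectable `clock` parameter exists only to make the timing logic easy to exercise.

```python
import time
from typing import Callable, Optional


class RenderThrottle:
    """Allow a render when the status changes or the minimum interval has elapsed."""

    def __init__(self, min_interval: float = 2.0,
                 clock: Callable[[], float] = time.monotonic):
        self._min_interval = min_interval
        self._clock = clock
        self._last_status: Optional[str] = None
        self._last_render: Optional[float] = None

    def should_render(self, status: str) -> bool:
        now = self._clock()
        changed = status != self._last_status
        due = (self._last_render is None
               or now - self._last_render >= self._min_interval)
        if changed or due:
            self._last_status = status
            self._last_render = now
            return True
        return False


class RefreshingUrl:
    """Cache a short-lived URL and fetch a new one before the old one expires."""

    def __init__(self, fetch: Callable[[], str], refresh_after: float = 240.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._refresh_after = refresh_after  # 4 min, ahead of the 5-min expiry
        self._clock = clock
        self._url: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._url is None or now - self._fetched_at >= self._refresh_after:
            self._url = self._fetch()
            self._fetched_at = now
        return self._url
```

In a polling loop, `should_render(status)` would gate each redraw, and `RefreshingUrl.get()` would wrap whatever call produces the presigned MLflow URL.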

Evaluation pipeline wait() UI

Pipeline execution Studio link in header
Per-step job ARN table with Console, Studio, and CloudWatch links for each pipeline step
Steps now rendered in chronological order (earliest first)
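
A minimal sketch of the chronological ordering, assuming each step is a dict shaped like the entries boto3 returns from `list_pipeline_execution_steps` (`StepName`, `StartTime`). The helper name `sort_steps_chronologically` is hypothetical, and placing steps that have not started at the end is an assumption rather than something the PR states.

```python
from datetime import datetime, timezone


def sort_steps_chronologically(steps):
    """Return steps ordered by StartTime, earliest first.

    Steps without a StartTime sort last. Start times are assumed to be
    timezone-aware datetimes, as boto3 returns them.
    """
    latest = datetime.max.replace(tzinfo=timezone.utc)
    return sorted(steps, key=lambda step: step.get("StartTime") or latest)
```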

Loss metric detection broadened

Previously matched only exact total_loss; now matches any metric containing "loss" via
LOSS_METRIC_KEYWORDS, improving coverage across model families
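
As an illustration of the broadened match, here is a minimal sketch. Only the name `LOSS_METRIC_KEYWORDS` comes from the PR; its contents, the case-insensitive comparison, and the helper functions are assumptions.

```python
# Assumed contents; the PR names the constant but does not list its values.
LOSS_METRIC_KEYWORDS = ("loss",)


def is_loss_metric(name: str) -> bool:
    """Return True if the metric name contains any loss keyword (case-insensitive)."""
    lowered = name.lower()
    return any(keyword in lowered for keyword in LOSS_METRIC_KEYWORDS)


def select_loss_metrics(names):
    """Keep only the metric names that look like loss metrics, preserving order."""
    return [name for name in names if is_loss_metric(name)]
```

With an exact `total_loss` match, names such as `eval_loss` or `train/Loss` would be skipped; a substring match picks them up.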

Optional notebook dependencies

ipywidgets, rich, and matplotlib are now in the optional notebook extra. Install with (quoted so that shells such as zsh do not expand the brackets):

  pip install "sagemaker-train[notebook]"

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Bug fixes

resource_config null-safety: TrainingJob.wait() no longer crashes when resource_config is Unassigned
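
A minimal sketch of the kind of guard this fix implies. The `Unassigned` class below is a local stand-in for the SDK's sentinel, and `get_instance_count` and the `instance_count` attribute are hypothetical names used for illustration only.

```python
from types import SimpleNamespace


class Unassigned:
    """Stand-in for the SDK sentinel marking a field the API did not return."""


def get_instance_count(training_job, default=None):
    """Read resource_config.instance_count, tolerating missing or Unassigned values."""
    resource_config = getattr(training_job, "resource_config", None)
    if resource_config is None or isinstance(resource_config, Unassigned):
        return default
    count = getattr(resource_config, "instance_count", None)
    if count is None or isinstance(count, Unassigned):
        return default
    return count


# Usage with stand-in objects rather than real TrainingJob instances:
job_without_config = SimpleNamespace(resource_config=Unassigned())
job_with_config = SimpleNamespace(resource_config=SimpleNamespace(instance_count=2))
print(get_instance_count(job_without_config, default="n/a"))  # n/a
print(get_instance_count(job_with_config))                    # 2
```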
MlflowRunName removed from pipeline templates: The pipeline definition uses the MLflowConfiguration schema (pipeline-level), which only supports MlflowResourceArn and MlflowExperimentName. MlflowRunName belongs to MlflowConfig (training job-level) and is not a valid field in the pipeline definition — passing it was causing API validation errors.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Testing

15 new unit tests added in tests/unit/train/common_utils/test_metrics_visualizer.py covering _parse_job_arn, get_console_job_url, get_cloudwatch_logs_url, get_studio_url (object / ARN / job-name inputs), and get_available_metrics

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.


mollyheamazon added 10 commits March 6, 2026 14:35

Intermediary checkpoint

0adef07

Evaluation job update

7b5f8b1

Fix studio domain mismatch for url, update text color, add link of evaluation job

ec76bff

Merge branch 'aws:master' into feat/mlflow-mc

c5bd46f

Add underscore to fine-tune and eval job links

0297d0a

Update link to console, conditionally display studio link, update link color to blue

b34d0d8

Always show console link, conditional show studio link

1adce4f

Minor update to execution link names

928204b

Fix region issue for studio url

cd1658a

Revert notebook change to original

c6ba964

mollyheamazon temporarily deployed to auto-approve March 20, 2026 23:47 — with GitHub Actions Inactive

Address PR readiness

41ccc9c

mollyheamazon temporarily deployed to auto-approve March 21, 2026 00:07 — with GitHub Actions Inactive

mollyheamazon changed the title ~~Feat/mlflow mc~~ feat: MLflow metrics visualization, enhanced wait UI, and eval job links Mar 21, 2026

mollyheamazon marked this pull request as ready for review March 21, 2026 00:19

mollyheamazon wants to merge 11 commits into aws:master from mollyheamazon:feat/mlflow-mc