
feat: add evaluation-level expectedOutput to EvaluationItem#1387

Merged
Chibionos merged 4 commits into main from feat/eval-level-expected-output
Feb 27, 2026

Conversation


@Chibionos Chibionos commented Feb 27, 2026

What

Adds an optional expectedOutput field at the evaluation item level in the v1.0 schema. Output-based evaluators (ExactMatch, JsonSimilarity, LLMJudgeOutput) can now read the expected output from the evaluation itself, instead of requiring it to be duplicated in every evaluator's evaluationCriterias entry.

Why

Today, when three evaluators need the same expected output, you write it three times:

"evaluationCriterias": {
  "exact-match":       { "expectedOutput": { "result": 4 } },
  "json-similarity":   { "expectedOutput": { "result": 4 } },
  "llm-judge-output":  { "expectedOutput": { "result": 4 } }
}

With this change you write it once:

"expectedOutput": { "result": 4 },
"evaluationCriterias": {
  "exact-match": null,
  "json-similarity": null,
  "llm-judge-output": null
}

Per-evaluator override still works — if a criteria entry has its own expectedOutput, it wins.

How it works

Model — One new optional field on EvaluationItem:

expected_output: dict[str, Any] | str | None = Field(default=None, alias="expectedOutput")

Runtime — 15 lines of merge logic in _execute_eval() before criteria is passed to evaluators:

  1. If the evaluator's criteria type extends OutputEvaluationCriteria and the evaluation has expectedOutput:
    • Null criteria → inject {"expectedOutput": eval_item.expected_output}
    • Criteria without expectedOutput → merge it in
    • Criteria with expectedOutput → keep as-is (per-evaluator wins)
  2. Non-output evaluators (Contains, ToolCall*, Trajectory) are completely untouched.

Backward compatibility

  • The new field defaults to None — existing eval-set JSONs parse and run without any changes.
  • Legacy evaluation sets are unaffected (different model, different migration path).
  • No changes needed in downstream repos (uipath-agents, uipath-langchain, uipath-runtime).

Tests

25 new tests across 7 test classes covering:

  • Model parsing (dict, string, null, missing, serialization roundtrip)
  • Runtime merge logic (null criteria, override, injection, non-output evaluators, edge cases)
  • Evaluator integration (ExactMatch, JsonSimilarity, LLMJudgeOutput)
  • Legacy migration compatibility
  • End-to-end flow simulation (mixed evaluator types, override precedence)

All 1574 tests pass (25 new + 1549 existing, zero regressions).

Jira

AE-1066

Spec

Confluence: Evaluation-Level ExpectedOutput Schema Enhancement

🤖 Generated with Claude Code

@github-actions github-actions bot added the test:uipath-langchain (triggers tests in the uipath-langchain-python repository) and test:uipath-llamaindex (triggers tests in the uipath-llamaindex-python repository) labels on Feb 27, 2026
Chibi Vikram and others added 2 commits February 27, 2026 00:03
Add an optional `expectedOutput` field at the evaluation level so
output-based evaluators can share a common expected output instead
of duplicating it in every evaluator's criteria entry.

Resolution order:
1. Per-evaluator criteria expectedOutput (highest priority)
2. Evaluation-level expectedOutput (fallback)
3. Evaluator config default / error (existing behavior)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Chibionos Chibionos force-pushed the feat/eval-level-expected-output branch from 6b5e80e to f73fda2 on February 27, 2026 at 08:04
Chibi Vikram and others added 2 commits February 27, 2026 01:12
Add evaluation set JSON files exercising the new expectedOutput field on
EvaluationItem and a matching testcase with run.sh + assert.py that
validates scores from deterministic and LLM judge evaluators.
Bump version to 2.11.0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
uipath-langchain==0.7.11 pins uipath>=2.10.0,<2.11.0, so bumping
to 2.11.0 breaks the cross-compatibility testcase. The version bump
should be coordinated with a uipath-langchain release.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Chibionos Chibionos merged commit 40de0d3 into main Feb 27, 2026
104 of 117 checks passed
@Chibionos Chibionos deleted the feat/eval-level-expected-output branch February 27, 2026 09:49