.NET: Foundry Evals integration for .NET #4914
Draft
alliscode wants to merge 4 commits into microsoft:main from
Conversation
Add evaluation framework with local and Foundry-hosted evaluator support:
- EvalItem/EvalCheck/EvalChecks core types with IConversationSplitter
- IAgentEvaluator interface and MeaiEvaluatorAdapter for the MEAI bridge
- FunctionEvaluator and LocalEvaluator for custom evaluation functions
- FoundryEvals provider for Azure AI Foundry hosted evaluations
- EvaluateAsync extension methods with expected-values support
- WorkflowEvaluationExtensions for multi-agent workflow evaluation
- Unit tests and evaluation samples

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from e74c569 to b3684c0
- Make Microsoft.Extensions.AI.Evaluation package references conditional on net8.0+ in the AI, AzureAI, and Workflows csproj files
- Exclude Evaluation/**/*.cs from compilation on legacy TFMs (net472, netstandard2.0), since MEAI.Evaluation does not support them
- Fix missing numRepetitions XML doc params in AgentEvaluationExtensions
- Fix expectedOutput parameter name bug in the BuildItemsFromResponses call

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
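The conditional references and compile exclusions described in this commit can be sketched roughly as below. This is a minimal illustration only; the exact csproj layout, property names, and TFM list in the PR may differ.

```xml
<!-- Sketch: reference MEAI.Evaluation only on net8.0-compatible TFMs.
     IsTargetFrameworkCompatible is a standard MSBuild property function. -->
<ItemGroup Condition="$([MSBuild]::IsTargetFrameworkCompatible('$(TargetFramework)', 'net8.0'))">
  <PackageReference Include="Microsoft.Extensions.AI.Evaluation" />
</ItemGroup>

<!-- Sketch: exclude evaluation sources on legacy TFMs where the package
     is unavailable, so the rest of the project still builds there. -->
<ItemGroup Condition="'$(TargetFramework)' == 'net472' Or '$(TargetFramework)' == 'netstandard2.0'">
  <Compile Remove="Evaluation/**/*.cs" />
</ItemGroup>
```

Gating both the package reference and the source files on the same TFM condition keeps legacy targets compiling without stubbing out the evaluation APIs.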
Force-pushed from b3684c0 to 704e804
Code fixes:
- Deduplicate ContentHarmEvaluator in BuildEvaluators (all safety names share one instance)
- Throw ArgumentException on unknown evaluator names instead of silently ignoring them
- BuildEvalItem no longer mutates the caller's messages list
- AllPassed checks both SubResults and _items when SubResults is populated
- Null guard for agent in sample finally blocks
- Fix README type reference (Evaluators -> FoundryEvals)

Test coverage:
- BuildItemsFromResponses validation (mismatched queries/responses/expectedOutput/expectedToolCalls)
- BuildEvaluators: quality names, safety deduplication, unknown name throws, default selection
- AllPassed: empty items, SubResults with overall failure
- BuildEvalItem: property correctness, input list not mutated
- ExtractAgentData: empty events, matched pairs, unmatched invocations, completions without invocations, multiple agents, duplicate executor IDs, multiple rounds, null data, splitter propagation

Infrastructure:
- Made BuildItemsFromResponses, ExtractAgentData, and BuildEvaluators internal for testability
- Added InternalsVisibleTo for AI.UnitTests in the AzureAI project
- Added a conditional AzureAI project reference in AI.UnitTests (net8.0+ only)
- Added a conditional compile exclusion for WorkflowEvaluationTests.cs on net472

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
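The InternalsVisibleTo wiring mentioned under Infrastructure typically takes the standard attribute form shown below. The test assembly name here is an assumption based on the commit message ("AI.UnitTests"); the PR may use a fully qualified assembly name or the equivalent MSBuild `<InternalsVisibleTo>` item instead.

```csharp
// In the AzureAI project (e.g., an AssemblyInfo.cs or any source file):
// grants the unit-test assembly access to internal members such as
// BuildItemsFromResponses, ExtractAgentData, and BuildEvaluators.
using System.Runtime.CompilerServices;

[assembly: InternalsVisibleTo("AI.UnitTests")] // assembly name assumed
```

Making the helpers internal rather than public keeps the evaluation surface area small while still allowing direct unit testing of the builder logic.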
Use item.Split() to separate query messages from the response, matching what FoundryEvals does. Previously the full conversation (including assistant turns) was passed as 'messages' alongside chatResponse, feeding duplicate assistant context to the evaluator and corrupting quality scores. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
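The shape of this fix can be illustrated with a minimal sketch. The `Split` helper below is a hypothetical stand-in for what `item.Split()` is described as doing; only the general messages/response contract (prior turns as context, final assistant turn as the response under evaluation) is taken from the commit message.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.Extensions.AI;

// Hypothetical helper mirroring the described behavior of item.Split():
// everything before the final assistant turn becomes the evaluator's
// conversation context; the final assistant turn is the response.
static (IList<ChatMessage> Query, ChatMessage Response) Split(IList<ChatMessage> conversation)
{
    // Assumes the last message is the assistant response under evaluation.
    ChatMessage response = conversation[^1];
    List<ChatMessage> query = conversation.Take(conversation.Count - 1).ToList();
    return (query, response);
}

// Usage sketch: pass only the query turns as 'messages' and the assistant
// turn as the response, so assistant context is not fed in twice.
// var (query, response) = Split(item.Messages);
// await evaluator.EvaluateAsync(query, new ChatResponse(response), ...);
```

Without the split, the evaluator sees the assistant's answer both inside `messages` and again as `chatResponse`, which is the duplication the commit removes.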
This pull request introduces a new sample, FoundryAgents_Evaluations_Step03_AllPatterns, which demonstrates all evaluation patterns available in the Agent Framework for .NET. It also updates the existing FoundryAgents_Evaluations_Step02_SelfReflection sample to use the new Azure AI Foundry project endpoints and simplifies the evaluator setup. The changes primarily focus on showcasing evaluation capabilities, including function evaluators, built-in checks, MEAI quality evaluators, Foundry (cloud-based) evaluators, mixed evaluators, pre-existing response evaluation, and conversation split strategies.

Addition of comprehensive evaluation sample:
- FoundryAgents_Evaluations_Step03_AllPatterns, with a detailed Program.cs demonstrating all major evaluation patterns (function evaluators, built-in checks, MEAI, Foundry, mixed, pre-existing response evaluation, and conversation split strategies), including a custom conversation splitter.
- A README.md for the new sample, summarizing its purpose, prerequisites, key types, and usage instructions.

Updates to existing self-reflection evaluation sample:
- Program.cs now uses Azure AI Foundry project endpoints and deployment names, simplifying the evaluator setup to derive everything from the project endpoint.
- Removed the dependency on Azure.AI.OpenAI in the .csproj file, relying instead on Azure.AI.Projects and related packages.
- Updated Program.cs to use Azure.AI.Projects.OpenAI instead of Azure.AI.OpenAI.

Contribution Checklist