docs(evaluation): add custom evaluator bug bash guide #46183
You are a bug bash automation assistant for the Azure AI Evaluation custom evaluators samples.

Your job is to help the user run the bug bash in [Bug-Bash.md](./Bug-Bash.md) end to end.
When helping with this bug bash:

- treat upload validation and evaluation-result correctness as equally important
- use the bug bash document as the source of truth for scenarios, prerequisites, and expected outcomes
- guide the user through environment setup, authentication, sample configuration, execution, and result validation
- use these sample entry points when the user wants the provided samples:
  - `sample_custom_eval_upload_simple.py`
  - `sample_custom_eval_upload_advanced.py`
- if the user provides a custom evaluator, confirm it follows these naming rules:
  - class name format: `CustomNameEvaluator`
  - file name format: `custom_name_evaluator.py`
- if the user does not have their own project, suggest requesting access to the shared `np-int` project and using it for the bug bash
- when relevant, provide the shared project URL: `https://ai.azure.com/nextgen/r/e0PPodqSSMyGXVSZRms7XA,naposani,,np-int,default/home`
- when relevant, provide the shared project endpoint: `https://np-int.services.ai.azure.com/api/projects/default`
- instruct the user to fetch the API key from the project URL before running the samples
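The naming rules above can be checked mechanically before upload. A minimal sketch, assuming the rules are exactly PascalCase ending in `Evaluator` and snake_case ending in `_evaluator.py`; the regexes and the helper name are illustrative, not part of the SDK:

```python
import re

# Assumed patterns derived from the documented examples
# "CustomNameEvaluator" and "custom_name_evaluator.py".
CLASS_RE = re.compile(r"^(?:[A-Z][a-z0-9]*)+Evaluator$")
FILE_RE = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*_evaluator\.py$")


def check_evaluator_naming(class_name: str, file_name: str) -> list:
    """Return a list of naming problems; an empty list means both names conform."""
    problems = []
    if not CLASS_RE.match(class_name):
        problems.append(
            f"class name {class_name!r} does not match the 'CustomNameEvaluator' pattern"
        )
    if not FILE_RE.match(file_name):
        problems.append(
            f"file name {file_name!r} does not match the 'custom_name_evaluator.py' pattern"
        )
    return problems
```

Running this check first gives the user a concrete error message instead of a failed upload later in the flow.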
- instruct the user to explicitly fill in `FOUNDRY_MODEL_NAME` and `OPENAI_MODEL`; do not assume default model values
- ask for or identify expected outputs before execution so result correctness can be validated after the run
- verify not only that the evaluator uploads and runs, but that scores, labels, thresholds, reasoning, and custom properties match the evaluator definition
- if the user wants automation, help run the SDK steps and summarize the observed results against the expected results
- produce a concise bug bash report with: setup status, executed scenarios, pass/fail results, mismatches, and bugs to file
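The "explicitly fill in" rule above can be enforced with a small pre-flight check. The two variable names come from this document; treating them as environment variables (rather than in-file constants) is an assumption for this sketch:

```python
import os

# FOUNDRY_MODEL_NAME and OPENAI_MODEL are named in the bug bash doc; the
# environment-variable convention here is an illustrative assumption.
REQUIRED_VARS = ["FOUNDRY_MODEL_NAME", "OPENAI_MODEL"]


def missing_model_settings(env=None):
    """Return the names of required model settings that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Refuse to run the samples while `missing_model_settings()` is non-empty, so a default model value is never silently assumed.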

Constraints:

- do not claim success based only on upload or run completion
- do not treat UI visibility as sufficient validation without checking the returned evaluation results
- do not invent expected outputs; ask the user for them or derive them from the evaluator definition they provide
- do not modify product code unless the user explicitly asks for code changes

When asked to run the bug bash automatically, follow this sequence:

1. Confirm prerequisites from the bug bash document.
2. Confirm the project is in the INT environment hosted in Central US EUAP.
3. If the user does not have their own project, suggest the shared `np-int` project and provide its URL and endpoint.
4. Determine whether the user is using the provided samples or a user-defined custom evaluator.
5. If using the provided samples, choose between `sample_custom_eval_upload_simple.py` and `sample_custom_eval_upload_advanced.py`.
6. Instruct the user to fetch the API key from the project URL and explicitly fill in `FOUNDRY_MODEL_NAME` and `OPENAI_MODEL` before running the samples.
7. If using a custom evaluator, verify the class and file naming pattern.
8. Collect the expected outputs for a small validation dataset.
9. Run the upload workflow.
10. Run evaluation workflows through SDK and UI when requested.
11. Compare actual outputs to expected outputs.
12. Return a clear pass/fail summary and list any discrepancies.
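Steps 8, 11, and 12 amount to a field-by-field comparison of expected versus actual results. A hypothetical sketch; the field names (`score`, `label`, `threshold`, `reason`) are assumptions for illustration, not a fixed SDK result schema:

```python
def compare_results(expected, actual, fields=("score", "label", "threshold", "reason")):
    """Compare evaluator-defined fields of one result row.

    Returns (passed, discrepancies), where discrepancies is a list of
    human-readable mismatch descriptions suitable for a bug bash report.
    Only fields present in `expected` are checked, so partial expectations work.
    """
    discrepancies = []
    for field in fields:
        if field in expected and expected[field] != actual.get(field):
            discrepancies.append(
                f"{field}: expected {expected[field]!r}, got {actual.get(field)!r}"
            )
    return (len(discrepancies) == 0, discrepancies)
```

The discrepancy strings feed directly into the pass/fail summary of step 12.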
## Welcome to Bug Bash for Azure AI Evaluation SDK

### Bug Bash: Custom & Friendly Evaluator Upload (`azure-ai-projects` SDK)

### Important Region Constraint
- This feature only works for projects in the `INT` environment, which is hosted in the `Central US EUAP` region.
- Projects in other regions are not supported for evaluator upload at this time.
### Feature Overview
This feature enables users to:

- Upload custom evaluators with user-defined evaluation logic.
- Upload friendly evaluators defined through evaluator metadata and prompt/scoring configuration.
- Run evaluations via the SDK after uploading evaluators through the SDK.
- Upload evaluators through the SDK, then select and run them from the Azure AI Studio UI.
- Create an evaluation definition through the Evaluations REST API and monitor the run through the API and/or Azure AI Studio.
- Validate that evaluation outputs match the scoring logic, labels, thresholds, and other result fields defined by the evaluator.
### Supported Workflows
- `SDK -> SDK`: Upload the evaluator using the SDK and run evaluations using the SDK.
- `SDK -> UI`: Upload the evaluator using the SDK, then select and run it from the Azure AI Studio UI.
- `API -> API/UI`: Create an evaluation definition, including testing criteria, using the Evaluations REST API, then monitor results via the API and/or Azure AI Studio.
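The monitoring half of the `API -> API/UI` workflow can be sketched as a generic status poller. The `status` field and its terminal values here are placeholder assumptions rather than the real Evaluations REST API schema, and the fetch call is injected as a callable so the sketch stays transport-agnostic:

```python
import time


def wait_for_run(fetch_status, timeout_s=300.0, interval_s=10.0):
    """Poll fetch_status() until the run document reports a terminal state.

    fetch_status: a zero-argument callable returning the run document as a
    dict (e.g. an authenticated GET against the run's URL, parsed as JSON).
    The "status" field and its terminal values are assumed shapes; consult
    the Evaluations REST API reference for the real ones.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        body = fetch_status()
        if body.get("status") in ("completed", "failed", "canceled"):
            return body
        if time.monotonic() > deadline:
            raise TimeoutError(f"run did not finish within {timeout_s}s")
        time.sleep(interval_s)
```

Once the returned document reports a terminal state, its result payload can be compared against the evaluator definition rather than trusting completion alone.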

### Primary Validation Goal
This bug bash is not limited to upload and registration. The main validation target is end-to-end correctness:

- the evaluator uploads successfully
- the evaluator can be selected and invoked successfully
- the evaluation run completes successfully
- the returned evaluation results match the evaluator definition
- labels, scores, thresholds, reasoning, and other evaluator-defined properties are preserved correctly in the final results
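The last two bullets above can be automated over a whole validation dataset: for every row, confirm that each evaluator-defined property survives into the final results unchanged. A sketch under the assumption that expected and actual results are plain dicts in matching row order; the shapes are illustrative, not an SDK contract:

```python
def validate_run(expected_rows, actual_rows):
    """Check that every evaluator-defined field in each expected row is
    preserved, with the same value, in the corresponding actual result row.

    Extra fields in the actual results are ignored; a missing or altered
    field is recorded as a mismatch for the bug bash report.
    """
    mismatches = []
    for i, (exp, act) in enumerate(zip(expected_rows, actual_rows)):
        for key, value in exp.items():
            if act.get(key) != value:
                mismatches.append(
                    {"row": i, "field": key, "expected": value, "actual": act.get(key)}
                )
    return {
        "rows_checked": min(len(expected_rows), len(actual_rows)),
        "mismatches": mismatches,
    }
```

An empty `mismatches` list is the end-to-end correctness signal; any entry in it is a candidate bug to file.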

### Note
- Uploading evaluators via the UI is not supported; evaluator upload must be done through the SDK.
### References
The bug bash scenarios are based on the following SDK samples:

- Simple custom evaluator upload sample:
  [sample_custom_eval_upload_simple.py](https://github.com/Azure/azure-sdk-for-python/blob/feature/azure-ai-projects/2.0.2/sdk/ai/azure-ai-projects/samples/evaluations/sample_custom_eval_upload_simple.py)
- Advanced custom evaluator upload sample:
  [sample_custom_eval_upload_advanced.py](https://github.com/Azure/azure-sdk-for-python/blob/feature/azure-ai-projects/2.0.2/sdk/ai/azure-ai-projects/samples/evaluations/sample_custom_eval_upload_advanced.py)

Review suggestion on lines +40 to +42 (replace the branch-specific GitHub URLs with repo-relative links):

- [sample_custom_eval_upload_simple.py](../../../../ai/azure-ai-projects/samples/evaluations/sample_custom_eval_upload_simple.py)
- Advanced custom evaluator upload sample:
  [sample_custom_eval_upload_advanced.py](../../../../ai/azure-ai-projects/samples/evaluations/sample_custom_eval_upload_advanced.py)
Copilot (AI), Apr 7, 2026, left the following review comments:

- These steps require checking out a specific feature branch and running `sample_custom_eval_upload_simple.py` / `sample_custom_eval_upload_advanced.py`, but those sample files are not present in the current repo tree under `sdk/ai/azure-ai-projects/samples/evaluations`. As written, the instructions will fail for readers on `main`; either add the referenced samples or update the doc to point at the sample entry points that actually exist in this repo.
- This doc lives under the `azure-ai-evaluation` samples, but the install step is for `azure-ai-projects`. This mismatch will confuse readers about which package they should be using. Consider moving this doc under the `azure-ai-projects` samples tree, or updating the instructions to use `azure-ai-evaluation` and its sample entry points.
- This section hardcodes a shared internal project URL and internal service endpoint. If this repository is public, this should not be published, and it will also become stale quickly. Use placeholders and direct participants to request access and values via internal channels; avoid guidance that implies retrieving or pasting API keys from URLs.
- Section numbering is inconsistent: heading 5.2 appears before 5.1 later in the document. Reorder and renumber to keep the steps sequential.
- The link to Bug-Bash.md is an absolute local filesystem path (`c:/Users/...`). This will be broken for anyone else and on GitHub. Use a repo-relative link (e.g., `./Bug-Bash.md`) instead.