From b2f9db15c09aabb2d03a9036c125494a74b148b0 Mon Sep 17 00:00:00 2001
From: Ahmad Nader
Date: Tue, 7 Apr 2026 18:22:20 +0200
Subject: [PATCH] docs(evaluation): add custom evaluator bug bash guide

Add bug-bash instructions and an AGENTS guide for the custom evaluator
evaluation samples.

Authored-by: GitHub Copilot Coding Agent
Model: GPT-5.4
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .../samples/custom_evaluators/AGENTS.md   |  46 +++
 .../samples/custom_evaluators/Bug-Bash.md | 292 ++++++++++++++++++
 2 files changed, 338 insertions(+)
 create mode 100644 sdk/evaluation/azure-ai-evaluation/samples/custom_evaluators/AGENTS.md
 create mode 100644 sdk/evaluation/azure-ai-evaluation/samples/custom_evaluators/Bug-Bash.md

diff --git a/sdk/evaluation/azure-ai-evaluation/samples/custom_evaluators/AGENTS.md b/sdk/evaluation/azure-ai-evaluation/samples/custom_evaluators/AGENTS.md
new file mode 100644
index 000000000000..ba169ca52a03
--- /dev/null
+++ b/sdk/evaluation/azure-ai-evaluation/samples/custom_evaluators/AGENTS.md
@@ -0,0 +1,46 @@
+You are a bug bash automation assistant for the Azure AI Evaluation custom evaluators samples.
+
+Your job is to help the user run the bug bash in [Bug-Bash.md](./Bug-Bash.md) end to end.
+ +When helping with this bug bash: + +- treat upload validation and evaluation-result correctness as equally important +- use the bug bash document as the source of truth for scenarios, prerequisites, and expected outcomes +- guide the user through environment setup, authentication, sample configuration, execution, and result validation +- use these sample entry points when the user wants the provided samples: + - `sample_custom_eval_upload_simple.py` + - `sample_custom_eval_upload_advanced.py` +- if the user provides a custom evaluator, confirm it follows these naming rules: + - class name format: `CustomNameEvaluator` + - file name format: `custom_name_evaluator.py` +- if the user does not have their own project, suggest requesting access to the shared `np-int` project and using it for the bug bash +- when relevant, provide the shared project URL: `https://ai.azure.com/nextgen/r/e0PPodqSSMyGXVSZRms7XA,naposani,,np-int,default/home` +- when relevant, provide the shared project endpoint: `https://np-int.services.ai.azure.com/api/projects/default` +- instruct the user to fetch the API key from the project URL before running the samples +- instruct the user to explicitly fill in `FOUNDRY_MODEL_NAME` and `OPENAI_MODEL`; do not assume default model values +- ask for or identify expected outputs before execution so result correctness can be validated after the run +- verify not only that the evaluator uploads and runs, but that scores, labels, thresholds, reasoning, and custom properties match the evaluator definition +- if the user wants automation, help run the SDK steps and summarize the observed results against the expected results +- produce a concise bug bash report with: setup status, executed scenarios, pass/fail results, mismatches, and bugs to file + +Constraints: + +- do not claim success based only on upload or run completion +- do not treat UI visibility as sufficient validation without checking the returned evaluation results +- do not invent expected 
outputs; ask the user for them or derive them from the evaluator definition they provide +- do not modify product code unless the user explicitly asks for code changes + +When asked to run the bug bash automatically, follow this sequence: + +1. Confirm prerequisites from the bug bash document. +2. Confirm the project is in the INT environment hosted in Central US EUAP. +3. If the user does not have their own project, suggest the shared `np-int` project and provide its URL and endpoint. +4. Determine whether the user is using the provided samples or a user-defined custom evaluator. +5. If using the provided samples, choose between `sample_custom_eval_upload_simple.py` and `sample_custom_eval_upload_advanced.py`. +6. Instruct the user to fetch the API key from the project URL and explicitly fill in `FOUNDRY_MODEL_NAME` and `OPENAI_MODEL` before running the samples. +7. If using a custom evaluator, verify the class and file naming pattern. +8. Collect the expected outputs for a small validation dataset. +9. Run the upload workflow. +10. Run evaluation workflows through SDK and UI when requested. +11. Compare actual outputs to expected outputs. +12. Return a clear pass/fail summary and list any discrepancies. \ No newline at end of file diff --git a/sdk/evaluation/azure-ai-evaluation/samples/custom_evaluators/Bug-Bash.md b/sdk/evaluation/azure-ai-evaluation/samples/custom_evaluators/Bug-Bash.md new file mode 100644 index 000000000000..4aa6a797e4a2 --- /dev/null +++ b/sdk/evaluation/azure-ai-evaluation/samples/custom_evaluators/Bug-Bash.md @@ -0,0 +1,292 @@ +## Welcome to Bug Bash for Azure AI Evaluation SDK + +### Bug Bash: Custom & Friendly Evaluator Upload (`azure-ai-projects` SDK) + +### Important Region Constraint +- This feature only works for projects in the `INT` environment, which is hosted in the `Central US EUAP` region. +- Projects in other regions are not supported for evaluator upload at this time. 
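Since the region gate is easy to trip over, a quick endpoint sanity check can catch obvious misconfiguration before anything is uploaded. This is an illustrative sketch, not part of the samples; it only compares the hostname against the shared `np-int` endpoint used later in this guide and cannot verify the project's actual region:

```python
from urllib.parse import urlparse

# Shared INT project endpoint referenced later in this guide.
EXPECTED_INT_HOST = "np-int.services.ai.azure.com"

def check_endpoint(endpoint: str) -> bool:
    """Return True if the endpoint points at the shared INT project host.

    A convenience check for the bug bash only; a passing check does not
    prove the project itself is hosted in Central US EUAP.
    """
    host = urlparse(endpoint).hostname or ""
    return host == EXPECTED_INT_HOST

print(check_endpoint("https://np-int.services.ai.azure.com/api/projects/default"))  # True
print(check_endpoint("https://myproj.services.ai.azure.com/api/projects/default"))  # False
```

If you are using your own project rather than the shared one, confirm the region in the portal instead; the hostname check above does not apply.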
+
+### Feature Overview
+This feature enables users to:
+
+- Upload custom evaluators with user-defined evaluation logic.
+- Upload friendly evaluators defined through evaluator metadata and prompt/scoring configuration.
+- Run evaluations via the SDK after uploading evaluators through the SDK.
+- Upload evaluators through the SDK, then select and run them from the Azure AI Studio UI.
+- Create an evaluation definition through the Evaluations REST API and monitor the run through the API and/or Azure AI Studio.
+- Validate that evaluation outputs match the scoring logic, labels, thresholds, and other result fields defined by the evaluator.
+
+### Supported Workflows
+- `SDK -> SDK`: Upload the evaluator using the SDK and run evaluations using the SDK.
+- `SDK -> UI`: Upload the evaluator using the SDK, then select and run it from the Azure AI Studio UI.
+- `API -> API/UI`: Create an evaluation definition, including testing criteria, using the Evaluations REST API, then monitor results via the API and/or Azure AI Studio.
+
+### Primary Validation Goal
+This bug bash is not limited to upload and registration. The main validation target is end-to-end correctness:
+
+- the evaluator uploads successfully
+- the evaluator can be selected and invoked successfully
+- the evaluation run completes successfully
+- the returned evaluation results match the evaluator definition
+- labels, scores, thresholds, reasoning, and other evaluator-defined properties are preserved correctly in the final results
+
+### Note
+- Uploading evaluators via the UI is not supported; evaluator upload must be done through the SDK.
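Because result correctness is the primary validation goal, it helps to compare expected and actual result payloads mechanically rather than by eye. Below is a minimal sketch, assuming results are available as plain dictionaries; the field names are illustrative and are not the SDK's result schema:

```python
def diff_result(expected: dict, actual: dict) -> list:
    """Compare an expected evaluator result against the actual run output.

    Returns a list of human-readable mismatch descriptions; an empty list
    means every expected field was present with the expected value.
    """
    problems = []
    for field, want in expected.items():
        if field not in actual:
            problems.append(f"missing field: {field}")
        elif actual[field] != want:
            problems.append(f"{field}: expected {want!r}, got {actual[field]!r}")
    return problems

# Illustrative expected vs. actual payloads for one dataset row.
expected = {"score": 4, "label": "pass", "threshold": 3}
actual = {"score": 4, "label": "fail", "threshold": 3}
print(diff_result(expected, actual))  # ["label: expected 'pass', got 'fail'"]
```

A helper like this makes "run succeeded but values are wrong" failures visible, which upload status and run status alone never show.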
+
+### References
+The bug bash scenarios are based on the following SDK samples:
+
+- Simple custom evaluator upload sample:
+  [sample_custom_eval_upload_simple.py](https://github.com/Azure/azure-sdk-for-python/blob/feature/azure-ai-projects/2.0.2/sdk/ai/azure-ai-projects/samples/evaluations/sample_custom_eval_upload_simple.py)
+- Advanced custom evaluator upload sample:
+  [sample_custom_eval_upload_advanced.py](https://github.com/Azure/azure-sdk-for-python/blob/feature/azure-ai-projects/2.0.2/sdk/ai/azure-ai-projects/samples/evaluations/sample_custom_eval_upload_advanced.py)
+
+## Instructions
+
+### 1. Set Up a Virtual Environment
+
+#### Create the environment
+```bash
+python -m venv .bugbashenv
+```
+
+#### Activate (Linux/macOS)
+```bash
+source .bugbashenv/bin/activate
+```
+
+#### Activate (Windows)
+```bash
+.bugbashenv\Scripts\activate
+```
+
+### 2. Check Out the Branch and Locate the Samples
+```bash
+git clone https://github.com/Azure/azure-sdk-for-python.git
+cd azure-sdk-for-python
+git checkout feature/azure-ai-projects/2.0.2
+cd sdk/ai/azure-ai-projects/samples/evaluations
+```
+
+### 3. Install Dependencies
+
+Authenticate with Azure:
+
+```bash
+az login
+```
+
+Install the SDK package:
+
+```bash
+pip install azure-ai-projects
+```
+
+### 4. Confirm Project Prerequisites
+Make sure all of the following are true:
+
+- You have an Azure subscription with access to Azure AI Projects.
+- Your Azure AI Project is in the `INT` environment hosted in the `Central US EUAP` region.
+- The project is visible in `https://ai.azure.com`.
+- You are using Python 3.9 or later.
+- Your authentication method is ready, such as Azure CLI login or a service principal.
+
+### 4.1 Shared Project Option
+If you do not have your own Azure AI Project available for the bug bash, you can use the shared `np-int` project that the team has been using.
+
+- ask the team for access to the `np-int` project if you do not already have it
+- use `np-int` as your project for upload, evaluation runs, and result validation if you do not have your own project
+- `np-int` is in the `INT` environment hosted in `Central US EUAP`
+- project URL: `https://ai.azure.com/nextgen/r/e0PPodqSSMyGXVSZRms7XA,naposani,,np-int,default/home`
+- project endpoint: `https://np-int.services.ai.azure.com/api/projects/default`
+- fetch the API key from the project URL before running the samples
+
+### 5. Azure AI Project Configuration
+Configure the sample with your project connection details before running it.
+
+At minimum, verify:
+
+- project endpoint or project connection settings
+- API key fetched from the project URL
+- model names are explicitly filled in for the samples you run
+- evaluator name
+- evaluator version
+- evaluator definition or evaluator logic
+- test inputs with known expected outputs
+
+If you are using the shared project, configure the samples against `np-int` once access has been granted: verify that you can open the project URL, use `https://np-int.services.ai.azure.com/api/projects/default` as the endpoint, and explicitly fill in the model variables required by the sample you are running.
+
+### 5.1 Required Variables for the Provided Samples
+For the provided samples, configure these values:
+
+- `FOUNDRY_PROJECT_ENDPOINT=https://np-int.services.ai.azure.com/api/projects/default`
+- `FOUNDRY_MODEL_NAME=`
+- `OPENAI_MODEL=`
+- fetch the API key from the shared project URL and use it for the sample that requires `OPENAI_API_KEY`
+
+Do not rely on defaults for model configuration. Fill in `FOUNDRY_MODEL_NAME` and `OPENAI_MODEL` explicitly before running the samples.
+
+### 5.2 Using Your Own Custom Evaluator
+Bug bash participants can use their own custom evaluators in addition to the provided samples.
+
+Use the following naming format:
+
+- evaluator class name: `CustomNameEvaluator`
+- evaluator file name: `custom_name_evaluator.py`
+
+Validate that:
+
+- the class name follows the `CustomNameEvaluator` pattern
+- the Python file name follows the `custom_name_evaluator.py` pattern
+- the evaluator definition is consistent with the expected output fields you want to validate during the run
+
+### 6. Prepare Validation Inputs
+Before running the bug bash, prepare a small set of prompts or dataset rows where the expected evaluator output is known ahead of time.
+
+Examples:
+
+- an input that should clearly pass
+- an input that should clearly fail
+- an input that should produce a specific label
+- an input that should exercise threshold boundaries
+- an input that should produce evaluator-specific metadata or properties
+
+## Bug Bash Scenarios
+
+### Scenario 1: Upload Custom Evaluator via SDK
+
+#### Goal
+Validate that a custom evaluator uploads and registers successfully, and confirm that evaluation runs return the expected outputs defined by that evaluator.
+
+This scenario applies both to the provided sample evaluator and to a user-defined custom evaluator that follows the required class and file naming format.
+
+#### Steps
+Configure the following in the sample:
+
+- project connection details
+- evaluator name and version
+- custom evaluator logic
+- validation inputs with known expected results
+
+Then run it:
+
+```bash
+python sample_custom_eval_upload_simple.py
+```
+
+#### Expected Results
+- Evaluator upload completes without error.
+- Evaluator appears in SDK list APIs.
+- Evaluator appears in the Azure AI Studio UI under Evaluators.
+- Evaluation runs using the uploaded evaluator complete successfully.
+- Output fields returned by the run match the evaluator definition.
+- Any evaluator-defined score, label, threshold, reasoning, or custom properties are returned with the expected values.
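To make the expected result fields above concrete, here is a minimal self-contained sketch of a custom evaluator that follows the required `CustomNameEvaluator` class-naming pattern. The class, its scoring logic, and the returned field names are illustrative inventions for the bug bash, not code from the samples; per the file-naming rule, a file holding it would be named `answer_length_evaluator.py`:

```python
class AnswerLengthEvaluator:
    """Toy custom evaluator: scores a response by its word count.

    Follows the CustomNameEvaluator class-name pattern; the returned fields
    (score, label, threshold, reason) mirror what this scenario validates.
    """

    def __init__(self, threshold: int = 5):
        self.threshold = threshold

    def __call__(self, *, response: str) -> dict:
        score = len(response.split())
        passed = score >= self.threshold
        return {
            "score": score,
            "label": "pass" if passed else "fail",
            "threshold": self.threshold,
            "reason": f"response contains {score} word(s)",
        }

evaluator = AnswerLengthEvaluator(threshold=3)
print(evaluator(response="Paris is the capital of France."))
# {'score': 6, 'label': 'pass', 'threshold': 3, 'reason': 'response contains 6 word(s)'}
```

Because the scoring is deterministic, every validation input has a known expected output, which is exactly what the result-correctness checks in this scenario need.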
+
+#### What to Test
+- invalid evaluator schema
+- duplicate evaluator names
+- versioning behavior
+- deterministic inputs that should return known results
+- mismatches between evaluator logic and actual returned outputs
+- missing or incorrectly mapped result fields
+
+### Scenario 2: Upload Advanced Custom Evaluator via SDK
+
+#### Goal
+Validate that a more advanced custom evaluator definition uploads successfully, and confirm that evaluation runs using it return the expected outputs from the configured definition.
+
+#### Steps
+Configure the following in the sample:
+
+- advanced evaluator metadata or configuration
+- evaluator logic and output definition
+- validation inputs with known expected results
+
+Then run it:
+
+```bash
+python sample_custom_eval_upload_advanced.py
+```
+
+#### Expected Results
+- Advanced custom evaluator uploads successfully.
+- Evaluator is selectable in the Azure AI Studio UI.
+- Evaluation runs complete successfully.
+- Returned labels, scores, thresholds, and reasoning align with the evaluator configuration.
+- Any additional evaluator-defined output fields are present and correct.
+
+#### What to Test
+- missing required fields
+- invalid evaluator configuration
+- evaluator discoverability in the UI
+- incorrect scoring or labeling behavior
+- incorrect threshold application
+- result payloads that do not match the evaluator definition
+
+### Scenario 3: Run Evaluation via UI with SDK-Uploaded Evaluator
+
+#### Goal
+Ensure evaluators uploaded via the SDK can be used in the Azure AI Studio UI and that the final evaluation results are correct.
+
+#### Steps
+- Navigate to `https://ai.azure.com`.
+- Open the project in the `INT` environment hosted in `Central US EUAP`.
+- Go to Evaluations.
+- Create a new evaluation run.
+- Select the SDK-uploaded evaluator.
+- Choose a dataset or inputs.
+- Run the evaluation.
+- Compare the resulting outputs against the expected values from the evaluator definition.
+
+#### Expected Results
+- Evaluation run starts successfully.
+- Results are generated and visible in the UI. +- Metrics match expectations from evaluator logic. +- Result fields are not missing, renamed incorrectly, or assigned incorrect values. + +### Scenario 4: Validate Result Correctness End to End + +#### Goal +Confirm that evaluator execution returns the proper results defined in the evaluator definition, not just that the run succeeds. + +#### Validation Checklist +- score values are correct for the given test inputs +- labels are correct for the given test inputs +- thresholds are applied correctly +- pass/fail outcomes are correct where applicable +- reasoning or explanation fields are returned when defined +- custom properties or evaluator-specific fields are returned when defined +- results are consistent between SDK and UI views of the same evaluation run + +#### Failure Cases to Capture +- run succeeds but returned values are wrong +- expected fields are missing from results +- field names differ from the evaluator definition +- values are present but assigned to the wrong result fields +- UI and SDK show different outputs for the same run + +## Known Limitations +- Evaluator upload is SDK-only. +- Feature is `INT` region only. +- Uploading evaluators directly through the UI is not supported. + +## Feedback to Capture +During bug bash, please report: + +- SDK usability issues +- error message clarity +- UI discoverability of evaluators +- documentation gaps +- region-related confusion or failures +- support issues for user-defined evaluators authored and uploaded by customers +- result correctness issues where the evaluation completed but the outputs did not match the evaluator definition +- field mapping issues for labels, scores, thresholds, explanations, or custom properties + +## Reporting Bugs +When filing bugs, include: + +- project region +- evaluator used +- SDK version +- full error messages or stack traces +- repro steps \ No newline at end of file