@@ -0,0 +1,46 @@
You are a bug bash automation assistant for the Azure AI Evaluation custom evaluators samples.

Your job is to help the user run the bug bash in [Bug-Bash.md](c:/Users/ahmadnader/WSL/Ubuntu2404/azure-sdk-for-python/sdk/evaluation/azure-ai-evaluation/samples/custom_evaluators/Bug-Bash.md) end to end.
**Copilot AI (Apr 7, 2026):** The link to Bug-Bash.md is an absolute local filesystem path (`c:/Users/...`). This will be broken for anyone else and on GitHub. Use a repo-relative link (e.g., `./Bug-Bash.md`) instead.

Suggested change:

```diff
-Your job is to help the user run the bug bash in [Bug-Bash.md](c:/Users/ahmadnader/WSL/Ubuntu2404/azure-sdk-for-python/sdk/evaluation/azure-ai-evaluation/samples/custom_evaluators/Bug-Bash.md) end to end.
+Your job is to help the user run the bug bash in [Bug-Bash.md](./Bug-Bash.md) end to end.
```

When helping with this bug bash:

- treat upload validation and evaluation-result correctness as equally important
- use the bug bash document as the source of truth for scenarios, prerequisites, and expected outcomes
- guide the user through environment setup, authentication, sample configuration, execution, and result validation
- use these sample entry points when the user wants the provided samples:
  - `sample_custom_eval_upload_simple.py`
  - `sample_custom_eval_upload_advanced.py`
- if the user provides a custom evaluator, confirm it follows these naming rules:
  - class name format: `CustomNameEvaluator`
  - file name format: `custom_name_evaluator.py`
- if the user does not have their own project, suggest requesting access to the shared `np-int` project and using it for the bug bash
- when relevant, provide the shared project URL: `https://ai.azure.com/nextgen/r/e0PPodqSSMyGXVSZRms7XA,naposani,,np-int,default/home`
- when relevant, provide the shared project endpoint: `https://np-int.services.ai.azure.com/api/projects/default`
- instruct the user to fetch the API key from the project URL before running the samples
**Copilot AI (Apr 7, 2026), on lines +17 to +19:** This file embeds a shared project URL/endpoint and instructs users to fetch an API key. In a public repo this can leak internal environment details and encourages handling secrets in docs. Replace with placeholders (or a note to obtain these values via internal channels) and avoid hardcoding internal hostnames/IDs.
- instruct the user to explicitly fill in `FOUNDRY_MODEL_NAME` and `OPENAI_MODEL`; do not assume default model values
- ask for or identify expected outputs before execution so result correctness can be validated after the run
- verify not only that the evaluator uploads and runs, but that scores, labels, thresholds, reasoning, and custom properties match the evaluator definition
- if the user wants automation, help run the SDK steps and summarize the observed results against the expected results
- produce a concise bug bash report with: setup status, executed scenarios, pass/fail results, mismatches, and bugs to file
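The report fields above could be collected in a small structure like the following sketch (the `BugBashReport` name and its fields are illustrative, not part of any SDK):

```python
from dataclasses import dataclass, field


@dataclass
class BugBashReport:
    """Illustrative container for the report fields listed above."""
    setup_status: str = "not started"
    executed_scenarios: list = field(default_factory=list)
    results: dict = field(default_factory=dict)  # scenario name -> "pass" / "fail"
    mismatches: list = field(default_factory=list)
    bugs_to_file: list = field(default_factory=list)

    def summary(self) -> str:
        passed = sum(1 for v in self.results.values() if v == "pass")
        return f"{passed}/{len(self.results)} scenarios passed; {len(self.bugs_to_file)} bugs to file"


report = BugBashReport(setup_status="complete")
report.executed_scenarios.append("Scenario 1")
report.results["Scenario 1"] = "pass"
print(report.summary())  # → 1/1 scenarios passed; 0 bugs to file
```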

Constraints:

- do not claim success based only on upload or run completion
- do not treat UI visibility as sufficient validation without checking the returned evaluation results
- do not invent expected outputs; ask the user for them or derive them from the evaluator definition they provide
- do not modify product code unless the user explicitly asks for code changes

When asked to run the bug bash automatically, follow this sequence:

1. Confirm prerequisites from the bug bash document.
2. Confirm the project is in the INT environment hosted in Central US EUAP.
3. If the user does not have their own project, suggest the shared `np-int` project and provide its URL and endpoint.
4. Determine whether the user is using the provided samples or a user-defined custom evaluator.
5. If using the provided samples, choose between `sample_custom_eval_upload_simple.py` and `sample_custom_eval_upload_advanced.py`.
6. Instruct the user to fetch the API key from the project URL and explicitly fill in `FOUNDRY_MODEL_NAME` and `OPENAI_MODEL` before running the samples.
7. If using a custom evaluator, verify the class and file naming pattern.
8. Collect the expected outputs for a small validation dataset.
9. Run the upload workflow.
10. Run evaluation workflows through SDK and UI when requested.
11. Compare actual outputs to expected outputs.
12. Return a clear pass/fail summary and list any discrepancies.
@@ -0,0 +1,292 @@
## Welcome to Bug Bash for Azure AI Evaluation SDK

### Bug Bash: Custom & Friendly Evaluator Upload (`azure-ai-projects` SDK)

### Important Region Constraint
- This feature only works for projects in the `INT` environment, which is hosted in the `Central US EUAP` region.
- Projects in other regions are not supported for evaluator upload at this time.

### Feature Overview
This feature enables users to:

- Upload custom evaluators with user-defined evaluation logic.
- Upload friendly evaluators defined through evaluator metadata and prompt/scoring configuration.
- Run evaluations via SDK after uploading evaluators through the SDK.
- Upload evaluators through the SDK and then select and run them from Azure AI Studio UI.
- Create an evaluation definition through the Evaluations REST API and monitor the run through API and/or Azure AI Studio.
- Validate that evaluation outputs match the scoring logic, labels, thresholds, and other result fields defined by the evaluator.

### Supported Workflows
- `SDK -> SDK`: Upload evaluator using SDK and run evaluations using SDK.
- `SDK -> UI`: Upload evaluator using SDK, then select and run the evaluator from Azure AI Studio UI.
- `API -> API/UI`: Create an evaluation definition, including testing criteria, using the Evaluations REST API, then monitor results via API and/or Azure AI Studio.

### Primary Validation Goal
This bug bash is not limited to upload and registration. The main validation target is end-to-end correctness:

- the evaluator uploads successfully
- the evaluator can be selected and invoked successfully
- the evaluation run completes successfully
- the returned evaluation results match the evaluator definition
- labels, scores, thresholds, reasoning, and other evaluator-defined properties are preserved correctly in the final results

### Note
- Uploading evaluators via the UI is not supported; evaluator upload must be done through the SDK.

### References
The bug bash scenarios are based on the following SDK samples:

- Simple custom evaluator upload sample:
[sample_custom_eval_upload_simple.py](https://github.com/Azure/azure-sdk-for-python/blob/feature/azure-ai-projects/2.0.2/sdk/ai/azure-ai-projects/samples/evaluations/sample_custom_eval_upload_simple.py)
- Advanced custom evaluator upload sample:
[sample_custom_eval_upload_advanced.py](https://github.com/Azure/azure-sdk-for-python/blob/feature/azure-ai-projects/2.0.2/sdk/ai/azure-ai-projects/samples/evaluations/sample_custom_eval_upload_advanced.py)
**Copilot AI (Apr 7, 2026), on lines +40 to +42:** The sample links are pinned to a feature branch (`feature/azure-ai-projects/2.0.2`). This is brittle and will break once the branch is deleted/renamed. Prefer relative links to files in this repo/branch, or pin to a tag/commit SHA that will remain valid.

Suggested change:

```diff
-  [sample_custom_eval_upload_simple.py](https://github.com/Azure/azure-sdk-for-python/blob/feature/azure-ai-projects/2.0.2/sdk/ai/azure-ai-projects/samples/evaluations/sample_custom_eval_upload_simple.py)
-- Advanced custom evaluator upload sample:
-  [sample_custom_eval_upload_advanced.py](https://github.com/Azure/azure-sdk-for-python/blob/feature/azure-ai-projects/2.0.2/sdk/ai/azure-ai-projects/samples/evaluations/sample_custom_eval_upload_advanced.py)
+  [sample_custom_eval_upload_simple.py](../../../../ai/azure-ai-projects/samples/evaluations/sample_custom_eval_upload_simple.py)
+- Advanced custom evaluator upload sample:
+  [sample_custom_eval_upload_advanced.py](../../../../ai/azure-ai-projects/samples/evaluations/sample_custom_eval_upload_advanced.py)
```

## Instructions

### 1. Setup Virtualenv

#### Create the virtual environment
```bash
python -m venv .bugbashenv
```

#### Activate it (Linux/macOS)
```bash
source .bugbashenv/bin/activate
```

#### Activate it (Windows)
```bash
.bugbashenv\Scripts\activate
```

### 2. Checkout the Branch and Locate the Samples
```bash
git clone https://github.com/Azure/azure-sdk-for-python.git
cd azure-sdk-for-python
git checkout feature/azure-ai-projects/2.0.2
cd sdk/ai/azure-ai-projects/samples/evaluations
```
**Copilot AI (Apr 7, 2026), on lines +65 to +69:** These steps require checking out a specific feature branch and running `sample_custom_eval_upload_simple.py` / `advanced.py`, but those sample files are not present in the current repo tree under `sdk/ai/azure-ai-projects/samples/evaluations`. As written, the instructions will fail for readers on `main`; either add the referenced samples or update the doc to point at the sample entry points that actually exist in this repo.

### 3. Install Dependencies

Authenticate with Azure:

```bash
az login
```

Install the SDK package:

```bash
pip install azure-ai-projects
```
**Copilot AI (Apr 7, 2026), on lines +79 to +83:** This doc lives under azure-ai-evaluation samples, but the install step is for azure-ai-projects. This mismatch will confuse readers about which package they should be using. Consider moving this doc under the azure-ai-projects samples tree, or updating the instructions to use azure-ai-evaluation and its sample entry points.

### 4. Confirm Project Prerequisites
Make sure all of the following are true:

- You have an Azure subscription with access to Azure AI Projects.
- Your Azure AI Project is in the `INT` environment hosted in the `Central US EUAP` region.
- The project is visible in `https://ai.azure.com`.
- You are using Python 3.9 or later.
- Your authentication method is ready, such as Azure CLI login or service principal.

### 4.1 Shared Project Option
If you do not have your own Azure AI Project available for the bug bash, you can use the shared `np-int` project that the team has been using.

- ask the team for access to the `np-int` project if you do not already have it
- use `np-int` as your project for upload, evaluation runs, and result validation if you do not have your own project
- `np-int` is in the `INT` environment hosted in `Central US EUAP`
- project URL: `https://ai.azure.com/nextgen/r/e0PPodqSSMyGXVSZRms7XA,naposani,,np-int,default/home`
- project endpoint: `https://np-int.services.ai.azure.com/api/projects/default`
- fetch the API key from the project URL before running the samples

**Copilot AI (Apr 7, 2026), on lines +99 to +103:** This section hardcodes a shared internal project URL and internal service endpoint. If this repository is public, this should not be published and it will also become stale quickly. Use placeholders and direct participants to request access/values via internal channels; avoid guidance that implies retrieving or pasting API keys from URLs.
### 5. Azure AI Project Configuration
Configure the sample with your project connection details before running it.

At minimum, verify:

- project endpoint or project connection settings
- API key fetched from the project URL
- model names are explicitly filled in by the user for the samples they run
- evaluator name
- evaluator version
- evaluator definition or evaluator logic
- test inputs with known expected outputs

If you are using the shared project, configure the samples against `np-int` once access has been granted: verify that you can open the project URL, use `https://np-int.services.ai.azure.com/api/projects/default` as the endpoint, and explicitly fill in the model variables required by the sample you are running.

### 5.2 Required Variables For The Provided Samples
**Copilot AI (Apr 7, 2026):** Section numbering is inconsistent: heading 5.2 appears before 5.1 later in the document. Please reorder/renumber to keep the steps sequential.
For the provided samples, configure these values:

- `FOUNDRY_PROJECT_ENDPOINT=https://np-int.services.ai.azure.com/api/projects/default`
- `FOUNDRY_MODEL_NAME=<fill this in>`
- `OPENAI_MODEL=<fill this in>`
- fetch the API key from the shared project URL and use it for the sample that requires `OPENAI_API_KEY`

Do not rely on defaults for model configuration. Fill in `FOUNDRY_MODEL_NAME` and `OPENAI_MODEL` explicitly before running the samples.
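A fail-fast check along these lines can catch unset or placeholder model variables before a sample runs. This is an illustrative helper; the variable names come from this document, but the function itself is not part of the SDK:

```python
import os

REQUIRED_VARS = ["FOUNDRY_PROJECT_ENDPOINT", "FOUNDRY_MODEL_NAME", "OPENAI_MODEL"]


def missing_vars(env=None):
    """Return the required variables that are unset or still `<fill this in>` placeholders."""
    env = os.environ if env is None else env
    return [
        name for name in REQUIRED_VARS
        if not env.get(name) or env.get(name, "").strip().startswith("<")
    ]


# Example: a config where OPENAI_MODEL was never filled in.
print(missing_vars({
    "FOUNDRY_PROJECT_ENDPOINT": "https://example/api/projects/default",
    "FOUNDRY_MODEL_NAME": "my-model",
    "OPENAI_MODEL": "<fill this in>",
}))  # → ['OPENAI_MODEL']
```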

### 5.1 Using Your Own Custom Evaluator
Bug bash participants can use their own defined custom evaluators in addition to the provided samples.

Use the following naming format:

- evaluator class name: `CustomNameEvaluator`
- evaluator file name: `custom_name_evaluator.py`

Validate that:

- the class name follows the `CustomNameEvaluator` pattern
- the Python file follows the `custom_name_evaluator.py` pattern
- the evaluator definition is consistent with the expected output fields you want to validate during the run
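The two naming rules can be checked mechanically. The sketch below is illustrative (the `FriendlinessEvaluator` example name is made up), assuming the file name is simply the snake_case form of the class name:

```python
import re


def check_evaluator_naming(class_name: str, file_name: str):
    """Check the CustomNameEvaluator / custom_name_evaluator.py naming rules.

    Returns a list of problems; an empty list means both rules pass.
    """
    problems = []
    if not re.fullmatch(r"[A-Z][A-Za-z0-9]*Evaluator", class_name):
        problems.append(f"class name {class_name!r} does not match the CustomNameEvaluator pattern")
    # Expected file name: snake_case of the class name, plus .py
    expected_file = re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower() + ".py"
    if file_name != expected_file:
        problems.append(f"file name {file_name!r} should be {expected_file!r}")
    return problems


print(check_evaluator_naming("FriendlinessEvaluator", "friendliness_evaluator.py"))  # → []
print(check_evaluator_naming("FriendlinessEvaluator", "FriendlinessEvaluator.py"))
```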

### 6. Prepare Validation Inputs
Before running the bug bash, prepare a small set of prompts or dataset rows where the expected evaluator output is known ahead of time.

Examples:

- an input that should clearly pass
- an input that should clearly fail
- an input that should produce a specific label
- an input that should exercise threshold boundaries
- an input that should produce evaluator-specific metadata or properties
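Such rows can be captured as a small JSONL file so the same inputs are reused across SDK and UI runs. The field names below are illustrative, not a required schema:

```python
import json

# Illustrative validation rows: inputs paired with the outputs the evaluator
# is expected to produce for them. Field names are examples, not an SDK schema.
validation_rows = [
    {"query": "The service responded correctly.", "expected_label": "pass", "expected_score": 5},
    {"query": "The service returned garbage.", "expected_label": "fail", "expected_score": 1},
    {"query": "Borderline answer near the threshold.", "expected_label": "pass", "expected_score": 3},
]

with open("validation_inputs.jsonl", "w") as f:
    for row in validation_rows:
        f.write(json.dumps(row) + "\n")

print(f"wrote {len(validation_rows)} rows")  # → wrote 3 rows
```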

## Bug Bash Scenarios

### Scenario 1: Upload Custom Evaluator via SDK

#### Goal
Validate uploading a custom evaluator, ensuring it is registered successfully, and confirming that evaluation runs return the expected outputs defined by that evaluator.

This scenario applies both to the provided sample evaluator and to a user-defined custom evaluator that follows the required class and file naming format.

#### Steps
Run the sample:

```bash
python sample_custom_eval_upload_simple.py
```

Before running, configure:

- project connection details
- evaluator name and version
- custom evaluator logic
- validation inputs with known expected results

#### Expected Results
- Evaluator upload completes without error.
- Evaluator appears in SDK list APIs.
- Evaluator appears in Azure AI Studio UI under Evaluators.
- Evaluation runs using the uploaded evaluator complete successfully.
- Output fields returned by the run match the evaluator definition.
- Any evaluator-defined score, label, threshold, reasoning, or custom properties are returned with the expected values.

#### What to Test
- invalid evaluator schema
- duplicate evaluator names
- versioning behavior
- deterministic inputs that should return known results
- mismatches between evaluator logic and actual returned outputs
- missing or incorrectly mapped result fields

### Scenario 2: Upload Advanced Custom Evaluator via SDK

#### Goal
Validate uploading a more advanced custom evaluator definition and confirming that evaluation runs using it return the expected outputs from the configured definition.

#### Steps
Run the sample:

```bash
python sample_custom_eval_upload_advanced.py
```

Before running, configure:

- advanced evaluator metadata or configuration
- evaluator logic and output definition
- validation inputs with known expected results

#### Expected Results
- Advanced custom evaluator uploads successfully.
- Evaluator is selectable in Azure AI Studio UI.
- Evaluation runs complete successfully.
- Returned labels, scores, thresholds, and reasoning align with the evaluator configuration.
- Any additional evaluator-defined output fields are present and correct.

#### What to Test
- missing required fields
- invalid evaluator configuration
- evaluator discoverability in UI
- incorrect scoring or labeling behavior
- incorrect threshold application
- result payloads that do not match the evaluator definition

### Scenario 3: Run Evaluation via UI with SDK-Uploaded Evaluator

#### Goal
Ensure evaluators uploaded via SDK can be used in Azure AI Studio UI and that the final evaluation results are correct.

#### Steps
- Navigate to `https://ai.azure.com`.
- Open the project in the `INT` environment hosted in `Central US EUAP`.
- Go to Evaluations.
- Create a new evaluation run.
- Select the SDK-uploaded evaluator.
- Choose a dataset or inputs.
- Run the evaluation.
- Compare the resulting outputs against the expected values from the evaluator definition.

#### Expected Results
- Evaluation run starts successfully.
- Results are generated and visible in the UI.
- Metrics match expectations from evaluator logic.
- Result fields are not missing, renamed incorrectly, or assigned incorrect values.

### Scenario 4: Validate Result Correctness End to End

#### Goal
Confirm that evaluator execution returns the proper results defined in the evaluator definition, not just that the run succeeds.

#### Validation Checklist
- score values are correct for the given test inputs
- labels are correct for the given test inputs
- thresholds are applied correctly
- pass/fail outcomes are correct where applicable
- reasoning or explanation fields are returned when defined
- custom properties or evaluator-specific fields are returned when defined
- results are consistent between SDK and UI views of the same evaluation run
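Part of this checklist can be automated with a field-by-field comparison. The helper below is a sketch with made-up field names, not the actual evaluation result schema; map the keys to your evaluator's definition:

```python
def compare_result(expected: dict, actual: dict):
    """Compare an expected result row against an actual evaluation result.

    Returns a list of human-readable discrepancies; an empty list means the row passes.
    """
    discrepancies = []
    for field_name, expected_value in expected.items():
        if field_name not in actual:
            discrepancies.append(f"missing field: {field_name}")
        elif actual[field_name] != expected_value:
            discrepancies.append(
                f"{field_name}: expected {expected_value!r}, got {actual[field_name]!r}"
            )
    return discrepancies


# Example: the run "succeeded" but the label does not match the expectation.
expected = {"score": 5, "label": "pass", "threshold": 3}
actual = {"score": 5, "label": "fail", "threshold": 3, "reason": "..."}
print(compare_result(expected, actual))  # → ["label: expected 'pass', got 'fail'"]
```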

#### Failure Cases to Capture
- run succeeds but returned values are wrong
- expected fields are missing from results
- field names differ from the evaluator definition
- values are present but assigned to the wrong result fields
- UI and SDK show different outputs for the same run

## Known Limitations
- Evaluator upload is SDK-only.
- Feature is `INT` region only.
- Uploading evaluators directly through the UI is not supported.

## Feedback to Capture
During bug bash, please report:

- SDK usability issues
- error message clarity
- UI discoverability of evaluators
- documentation gaps
- region-related confusion or failures
- support issues for user-defined evaluators authored and uploaded by customers
- result correctness issues where the evaluation completed but the outputs did not match the evaluator definition
- field mapping issues for labels, scores, thresholds, explanations, or custom properties

## Reporting Bugs
When filing bugs, include:

- project region
- evaluator used
- SDK version
- full error messages or stack traces
- repro steps
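The Python and SDK versions can be collected with a short snippet; this is a sketch using only the standard library, and the project region, evaluator used, and repro steps still need to be filled in by hand:

```python
import platform
from importlib import metadata


def environment_summary():
    """Collect the version details requested above for a bug report."""
    try:
        sdk_version = metadata.version("azure-ai-projects")
    except metadata.PackageNotFoundError:
        sdk_version = "not installed"
    return {
        "python": platform.python_version(),
        "azure-ai-projects": sdk_version,
    }


print(environment_summary())
```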