From b216ad2055ac76a1fa5e7c1b371084642d6575b2 Mon Sep 17 00:00:00 2001 From: Changjian Wang Date: Wed, 22 Apr 2026 11:24:21 +0800 Subject: [PATCH 01/19] Add scripts for running samples and setting up environment for Azure AI Content Understanding Java SDK - Created `run_sample.sh` to compile and execute Java samples with Maven. - Added `setup_samples.sh` to check prerequisites, install SDK from Maven Central or build locally, and create a sample `.env` file. - Introduced `SKILL.md` for interactive setup of environment variables required for running samples. - Updated `README.md` to include information about new GitHub Copilot skills for environment setup and sample execution. --- .../skills/cu-sdk-common-knowledge/SKILL.md | 55 +++ .../skills/cu-sdk-java-sample-run/SKILL.md | 427 ++++++++++++++++++ .../scripts/run_sample.sh | 215 +++++++++ .../scripts/setup_samples.sh | 231 ++++++++++ .../skills/cu-sdk-java-setup-env/SKILL.md | 212 +++++++++ .../azure-ai-contentunderstanding/README.md | 37 ++ 6 files changed, 1177 insertions(+) create mode 100644 sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-common-knowledge/SKILL.md create mode 100644 sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/SKILL.md create mode 100644 sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh create mode 100644 sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/scripts/setup_samples.sh create mode 100644 sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-setup-env/SKILL.md diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-common-knowledge/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-common-knowledge/SKILL.md new file mode 100644 index 000000000000..e3524b401d2d --- /dev/null +++ 
b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-common-knowledge/SKILL.md @@ -0,0 +1,55 @@ +--- +name: cu-sdk-common-knowledge +description: Domain knowledge for Azure AI Content Understanding. Use this skill to answer questions about Content Understanding concepts, analyzers, field schemas, API operations, and Java SDK usage. Always consult official documentation before answering. +--- + +# Azure AI Content Understanding Domain Knowledge + +This skill provides domain knowledge for Azure AI Content Understanding, a multimodal AI service that extracts semantic content from documents, video, audio, and image files. + +> **[COPILOT GUIDANCE]:** Always consult the official documentation first before answering user questions. Use `fetch_webpage` to read the relevant doc page when the reference material below is insufficient or may be outdated. +> +> When a user's question is broad or ambiguous, ask them to clarify: +> - "Which modality are you working with — documents, images, audio, or video?" +> - "Are you using a prebuilt analyzer, or building a custom one?" +> - "Are you asking about the Java SDK specifically, or the service in general?" + +## Official Documentation + +The authoritative source for Content Understanding is: **https://learn.microsoft.com/azure/ai-services/content-understanding/** + +Always read the relevant page (via `fetch_webpage`) before answering if the reference material below does not cover the topic. 
+ +### Key Documentation Pages + +| Topic | URL | +|-------|-----| +| **Overview** | https://learn.microsoft.com/azure/ai-services/content-understanding/overview | +| **What's new** | https://learn.microsoft.com/azure/ai-services/content-understanding/whats-new | +| **Content Understanding Studio** | https://learn.microsoft.com/azure/ai-services/content-understanding/quickstart/content-understanding-studio?tabs=portal%2Ccu-studio | +| **Service limits** | https://learn.microsoft.com/azure/ai-services/content-understanding/service-limits | +| **Region & language support** | https://learn.microsoft.com/azure/ai-services/content-understanding/language-region-support | +| **Prebuilt analyzers** | https://learn.microsoft.com/azure/ai-services/content-understanding/concepts/prebuilt-analyzers | +| **Create custom analyzer** | https://learn.microsoft.com/azure/ai-services/content-understanding/tutorial/create-custom-analyzer?tabs=portal%2Cdocument&pivots=programming-language-java | +| **Document markdown** | https://learn.microsoft.com/azure/ai-services/content-understanding/document/markdown | +| **Document elements** | https://learn.microsoft.com/azure/ai-services/content-understanding/document/elements | +| **Video overview** | https://learn.microsoft.com/azure/ai-services/content-understanding/video/overview | +| **Video elements** | https://learn.microsoft.com/azure/ai-services/content-understanding/video/elements | +| **Audio overview** | https://learn.microsoft.com/azure/ai-services/content-understanding/audio/overview | +| **Image overview** | https://learn.microsoft.com/azure/ai-services/content-understanding/image/overview | +| **REST API reference** | https://learn.microsoft.com/rest/api/contentunderstanding/operation-groups | + +### Java SDK Resources + +| Resource | URL | +|----------|-----| +| **Maven Central** | https://central.sonatype.com/artifact/com.azure/azure-ai-contentunderstanding | +| **Java SDK README** | 
https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/contentunderstanding/azure-ai-contentunderstanding/README.md | +| **Java SDK Samples** | https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples | + +> **Search tip:** If the above pages don't cover the user's question, search the doc tree at `https://learn.microsoft.com/azure/ai-services/content-understanding/`. + +## Related Skills + +- `cu-sdk-java-setup-env` — Set up environment variables for Java SDK samples +- `cu-sdk-java-sample-run` — Run specific Java SDK samples interactively diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/SKILL.md new file mode 100644 index 000000000000..f86fa3cd66a3 --- /dev/null +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/SKILL.md @@ -0,0 +1,427 @@ +--- +name: cu-sdk-java-sample-run +description: Run a specific sample for the Azure AI Content Understanding Java SDK. Use when users want to run a particular sample like Sample02_AnalyzeUrl or Sample03_AnalyzeInvoice. +--- + +# Run a Specific Sample + +Run a specific sample from the Azure AI Content Understanding Java SDK. + +> **[COPILOT INTERACTION MODEL]:** This skill is designed to be interactive. At each step marked with **[ASK USER]**, pause execution and prompt the user for input or confirmation before proceeding. Do NOT silently skip these prompts. Use the `ask_questions` tool when available. 
+ +## Prerequisites + +- Java >= 8 (JDK) +- Maven +- SDK package available (public Maven Central or local build) +- Environment variables configured (via shell `export`) +- For prebuilt analyzers: model deployments configured (run `Sample00_UpdateDefaults` first) + +> **[ASK USER] Prerequisites check:** +> Before proceeding, verify the user's environment: +> 1. "Do you have **Java** and **Maven** installed?" -- If no, direct them to install JDK 8+ and Maven. +> 2. "Have you **built the SDK** or is it available on Maven Central?" -- If no, direct them to Step 2 below. +> 3. "Have you configured your **environment variables** (endpoint and credentials)?" -- If no, direct them to Step 3. +> 4. "Have you run `Sample00_UpdateDefaults` to configure model defaults?" -- If no and they want to use prebuilt analyzers, guide them to run it first. + +## Package Directory + +``` +sdk/contentunderstanding/azure-ai-contentunderstanding +``` + +## Available Samples + +All sync samples have async versions with an `Async` suffix. Samples are located in: + +``` +src/samples/java/com/azure/ai/contentunderstanding/samples/ +``` + +### Getting Started (Run These First) + +#### `Sample00_UpdateDefaults` -- Required First! +**One-time setup** - Configures model deployment mappings (GPT-4.1, GPT-4.1-mini, text-embedding-3-large) for your Microsoft Foundry resource. Must run before using prebuilt analyzers. + +#### `Sample02_AnalyzeUrl` -- Start Here! +Analyzes content from a URL using `prebuilt-documentSearch`. Works with documents, images, audio, and video. +- Key concepts: URL input, markdown extraction, multi-modal content + +#### `Sample01_AnalyzeBinary` +Analyzes local PDF/image files using `prebuilt-documentSearch`. +- Key concepts: Binary input, local file reading, page properties + +### Document Analysis + +#### `Sample03_AnalyzeInvoice` +Extracts structured fields from invoices using `prebuilt-invoice`. 
+- Key concepts: Field extraction (customer name, totals, dates, line items), confidence scores, array fields + +#### `Sample10_AnalyzeConfigs` +Extracts advanced features: charts, hyperlinks, formulas, annotations. +- Key concepts: Chart.js output, LaTeX formulas, PDF annotations, enhanced analysis options + +#### `Sample11_AnalyzeReturnRawJson` +Gets raw JSON response for custom processing. +- Key concepts: Raw response access, saving to file, debugging + +### Custom Analyzers + +#### `Sample04_CreateAnalyzer` +Creates custom analyzer with field schema for domain-specific extraction. +- Key concepts: Field types (string, number, date, object, array), extraction methods (extract, generate, classify) + +#### `Sample05_CreateClassifier` +Creates classifier to categorize documents (Loan_Application, Invoice, Bank_Statement). +- Key concepts: Content categories, segmentation, document routing + +#### `Sample16_CreateAnalyzerWithLabels` +Builds analyzers with training labels (labeled data from Azure Blob Storage). +- Key concepts: Labeled data, knowledge sources, Blob Storage SAS URIs + +### Analyzer Management + +#### `Sample06_GetAnalyzer` +Retrieves analyzer details and configuration. + +#### `Sample08_UpdateAnalyzer` +Updates analyzer description and tags. + +#### `Sample09_DeleteAnalyzer` +Deletes a custom analyzer. + +#### `Sample14_CopyAnalyzer` +Copies analyzer within the same resource. + +#### `Sample15_GrantCopyAuth` +Cross-resource copying between different Azure resources/regions. +- Requires additional env vars: `CONTENTUNDERSTANDING_TARGET_ENDPOINT`, `CONTENTUNDERSTANDING_TARGET_RESOURCE_ID` + +### Result Management + +#### `Sample12_GetResultFile` +Retrieves keyframe images from video analysis. +- Key concepts: Operation IDs, extracting generated files + +#### `Sample13_DeleteResult` +Deletes analysis results for data cleanup. 
+- Key concepts: Result retention (24-hour auto-deletion), compliance + +## Workflow + +### Step 1: Navigate to Package Directory + +```bash +cd sdk/contentunderstanding/azure-ai-contentunderstanding +``` + +### Step 2: Build the SDK Package + +The SDK package must be available for Maven to resolve. It will be published to **Maven Central** — if it's already available there, Maven will download it automatically and you can **skip this step**. + +If the package is **not yet published** (or you want to test local changes), build and install it to your local Maven repository. **Run from the azure-sdk-for-java repo root:** + +```bash +cd ~/repos/azure-sdk-for-java # or wherever you cloned the repo +mvn install -DskipTests -pl sdk/contentunderstanding/azure-ai-contentunderstanding -am +``` + +> **Important:** You must build from the repo root with `-pl` and `-am` flags. Building from within the package directory will fail because in-repo dependencies cannot be resolved without the `-am` (also-make) flag. + +> **[ASK USER] Build check:** +> Ask: "Is the package already published on Maven Central, or do you need to build locally?" +> - If published: Skip to Step 3. +> - If not published / unsure: Run `mvn install -DskipTests` above and confirm it shows `BUILD SUCCESS`. +> +> If the build fails, common fixes: +> - Missing JDK: ensure `java -version` shows JDK 8+ +> - Missing Maven: ensure `mvn -version` works +> - Parent POM not found: run `mvn install -DskipTests -f ../../parents/azure-client-sdk-parent/pom.xml` first + +
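If the build reports `BUILD SUCCESS` and you want to double-check what actually landed in the local Maven repository, a quick sketch like the following can help. The `sdk_jar_installed` helper name is illustrative, and it assumes the default `~/.m2/repository` location (adjust if you use a custom local repo):

```bash
#!/usr/bin/env bash
# sdk_jar_installed [REPO_DIR] -- succeed if any SDK jar exists under REPO_DIR.
# REPO_DIR defaults to the standard local Maven repository (~/.m2/repository).
sdk_jar_installed() {
  local repo_dir="${1:-$HOME/.m2/repository}"
  local artifact_dir="$repo_dir/com/azure/azure-ai-contentunderstanding"
  # Any versioned jar counts; compgen -G avoids a literal-glob false positive.
  compgen -G "$artifact_dir/*/azure-ai-contentunderstanding-*.jar" >/dev/null
}

if sdk_jar_installed; then
  echo "SDK found in local Maven repository"
else
  echo "SDK jar not found -- re-run the mvn install step above"
fi
```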
+
+**Alternative: use the setup script (optional)**
+
+The `setup_samples.sh` script automates this — it checks Maven Central first and falls back to a local build:
+
+```bash
+.github/skills/cu-sdk-java-sample-run/scripts/setup_samples.sh
+```
+
+Use `--local` to force local build:
+
+```bash
+.github/skills/cu-sdk-java-sample-run/scripts/setup_samples.sh --local
+```
+
+ +### Step 3: Configure Environment Variables + +> **[ASK USER] Configuration check:** +> Ask the user: "Do you already have your environment variables configured (`.env` file or exported in shell)?" +> - If yes: Skip to Step 4. +> - If no: Direct them to the `cu-sdk-java-setup-env` skill for interactive setup, or guide them through the steps below. + +Java samples read credentials from **OS environment variables** via `System.getenv()`. Unlike Python (`dotenv`) or JavaScript (`dotenv/config`), Java does not have a built-in `.env` loader — the variables must be present in the shell environment when the JVM starts. + +The recommended approach is to create a **`.env` file** and source it before running samples. + +> **Tip:** Use the `cu-sdk-java-setup-env` skill for an interactive walkthrough that creates your `.env` file step by step. + +**Create a `.env` file** in the package root (`sdk/contentunderstanding/azure-ai-contentunderstanding/.env`): + +``` +# Azure AI Content Understanding - Environment Variables + +# Required: Your Microsoft Foundry resource endpoint +CONTENTUNDERSTANDING_ENDPOINT=https://your-foundry.services.ai.azure.com/ + +# Optional: API key (leave empty to use DefaultAzureCredential via az login) +CONTENTUNDERSTANDING_KEY= + +# Model deployment names (used by Sample00_UpdateDefaults) +GPT_4_1_DEPLOYMENT=gpt-4.1 +GPT_4_1_MINI_DEPLOYMENT=gpt-4.1-mini +TEXT_EMBEDDING_3_LARGE_DEPLOYMENT=text-embedding-3-large +``` + +**Then load it into your shell:** + +```bash +set -a && source .env && set +a +``` + +> **Note:** You must re-run `set -a && source .env && set +a` each time you open a new terminal or edit `.env`. + +
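Because a missing variable only surfaces later as a `null` endpoint inside the JVM, a quick preflight after sourcing `.env` can save a failed run. A minimal sketch — the `check_cu_env` helper name is illustrative; the variable names match the `.env` template above:

```bash
#!/usr/bin/env bash
# Preflight: confirm the variables the samples read via System.getenv()
# are exported in the current shell before invoking Maven.
check_cu_env() {
  if [ -z "${CONTENTUNDERSTANDING_ENDPOINT:-}" ]; then
    echo "missing: CONTENTUNDERSTANDING_ENDPOINT"
    return 1
  fi
  if [ -z "${CONTENTUNDERSTANDING_KEY:-}" ]; then
    echo "CONTENTUNDERSTANDING_KEY is empty -- DefaultAzureCredential will be used (run 'az login' first)"
  fi
  echo "environment looks ready"
}

if ! check_cu_env; then
  echo "Source your .env first: set -a && source .env && set +a"
fi
```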
+
+**Alternative: export variables directly (without .env file)**
+
+**Linux / macOS:**
+
+```bash
+export CONTENTUNDERSTANDING_ENDPOINT="https://your-foundry.services.ai.azure.com/"
+export CONTENTUNDERSTANDING_KEY=""  # Leave empty to use DefaultAzureCredential
+
+export GPT_4_1_DEPLOYMENT="gpt-4.1"
+export GPT_4_1_MINI_DEPLOYMENT="gpt-4.1-mini"
+export TEXT_EMBEDDING_3_LARGE_DEPLOYMENT="text-embedding-3-large"
+```
+
+**Windows (PowerShell):**
+
+```powershell
+$env:CONTENTUNDERSTANDING_ENDPOINT = "https://your-foundry.services.ai.azure.com/"
+$env:CONTENTUNDERSTANDING_KEY = ""  # Leave empty to use DefaultAzureCredential
+
+$env:GPT_4_1_DEPLOYMENT = "gpt-4.1"
+$env:GPT_4_1_MINI_DEPLOYMENT = "gpt-4.1-mini"
+$env:TEXT_EMBEDDING_3_LARGE_DEPLOYMENT = "text-embedding-3-large"
+```
+
+ +> **[ASK USER] Provide endpoint:** +> Ask the user: "Please provide your **Microsoft Foundry endpoint URL**." +> - It should look like: `https://.services.ai.azure.com/` +> - If the user does not know where to find it: direct them to Azure Portal → Their Foundry resource → Keys and Endpoint. + +> **[ASK USER] Authentication method:** +> Ask the user: "How would you like to **authenticate** with Azure?" +> - **Option A: DefaultAzureCredential (recommended)** — Uses `az login` or managed identity. No API key needed. Make sure you have run `az login`. +> - **Option B: API Key** — Provide your `CONTENTUNDERSTANDING_KEY` from the Azure Portal → Keys and Endpoint → Key1 or Key2. + +> **[ASK USER] Confirm env vars:** +> After the user sets their variables, ask: "Does this configuration look correct?" Wait for confirmation before proceeding. + +#### Settings by sample + +| Setting | Required By | Description | +| ----------------------------------- | ---------------------- | ------------------------------------------------------------------------------------------------------------ | +| `CONTENTUNDERSTANDING_ENDPOINT` | **All samples** | Your Microsoft Foundry resource endpoint URL | +| `CONTENTUNDERSTANDING_KEY` | All samples (optional) | API key for key-based auth. 
If empty, `DefaultAzureCredential` is used (recommended — run `az login` first) |
+| `GPT_4_1_DEPLOYMENT` | Sample00_UpdateDefaults | Deployment name for gpt-4.1 model (default: `gpt-4.1`) |
+| `GPT_4_1_MINI_DEPLOYMENT` | Sample00_UpdateDefaults | Deployment name for gpt-4.1-mini model (default: `gpt-4.1-mini`) |
+| `TEXT_EMBEDDING_3_LARGE_DEPLOYMENT` | Sample00_UpdateDefaults | Deployment name for text-embedding-3-large model (default: `text-embedding-3-large`) |
+| `CONTENTUNDERSTANDING_TARGET_ENDPOINT` | Sample15_GrantCopyAuth | Target Foundry resource endpoint for cross-resource copy |
+| `CONTENTUNDERSTANDING_TARGET_RESOURCE_ID` | Sample15_GrantCopyAuth | Target ARM resource ID for cross-resource copy |
+
+#### Samples that need a local file
+
+The `Sample01_AnalyzeBinary` and `Sample10_AnalyzeConfigs` samples load a local file from `src/samples/resources/`. The default file paths are built into the samples. To use your own file, update the `filePath` variable in the sample code.
+
+> **[ASK USER] Local file (if applicable):**
+> If the user chose a sample that requires a local file (Sample01_AnalyzeBinary, Sample10_AnalyzeConfigs), ask:
+> "This sample requires a local document file. Would you like to:"
+> - **Use the default test file** — The sample has a built-in file path under `src/samples/resources/`.
+> - **Provide your own file** — You'll need to update the `filePath` variable in the sample code.
+
+#### Setting up Sample15_GrantCopyAuth cross-resource environment
+
+The `Sample15_GrantCopyAuth` sample requires **two separate Microsoft Foundry resources** (source and target).
+ +Add the following environment variables: + +```bash +export CONTENTUNDERSTANDING_TARGET_ENDPOINT="https://your-target-foundry.services.ai.azure.com/" +export CONTENTUNDERSTANDING_TARGET_RESOURCE_ID="/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.CognitiveServices/accounts/{targetAccountName}" +``` + +> **[ASK USER] Cross-resource setup (Sample15_GrantCopyAuth only):** +> If the user chose Sample15_GrantCopyAuth, ask: +> 1. "Do you have **two separate Microsoft Foundry resources** (source and target) set up?" — If no, guide them to create a second resource. +> 2. "Please provide the **target** resource endpoint URL and ARM Resource ID." +> 3. Confirm: "Both resources must have the **Cognitive Services User** role assigned if using `DefaultAzureCredential`. Is this configured?" + +### Step 4: Choose and Run the Sample + +> **[ASK USER] Which sample?:** +> Ask the user: "Which sample would you like to run?" with options: +> - `Sample00_UpdateDefaults` — Configure model defaults (one-time setup, required first) +> - `Sample02_AnalyzeUrl` — Analyze content from a URL (recommended for first-time users) +> - `Sample01_AnalyzeBinary` — Analyze a local PDF/image file +> - `Sample03_AnalyzeInvoice` — Extract structured fields from an invoice +> - `Sample04_CreateAnalyzer` — Create a custom analyzer +> - Other — Let me see the full list + +> **[ASK USER] Sync or async?:** +> Ask: "Would you like to run the **sync** or **async** version of this sample?" 
+> - Sync (default) — e.g., `Sample02_AnalyzeUrl` +> - Async — e.g., `Sample02_AnalyzeUrlAsync` + +Run the sample with Maven directly: + +```bash +# Make sure .env is loaded first (if not already done in Step 3) +set -a && source .env && set +a + +# From the package directory: sdk/contentunderstanding/azure-ai-contentunderstanding +mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample02_AnalyzeUrl" -Dexec.classpathScope=test +``` + +**More examples:** + +```bash +# Run async sample +mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample02_AnalyzeUrlAsync" -Dexec.classpathScope=test + +# Run update defaults (one-time setup) +mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample00_UpdateDefaults" -Dexec.classpathScope=test + +# Run invoice extraction +mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample03_AnalyzeInvoice" -Dexec.classpathScope=test +``` + +> **Note:** The `-Dexec.classpathScope=test` flag is **required**. Samples live in `src/samples/`, which is compiled as a test source root — not part of the main classpath. This is an Azure SDK for Java convention: samples are not shipped in the published JAR, and they depend on test-scoped dependencies (e.g., `azure-identity`). Without this flag, Maven cannot find the sample classes and will fail with `ClassNotFoundException`. + +> **Note:** Maven inherits the current shell's environment variables. `System.getenv()` in the sample code reads these values at runtime, so your `.env` must be sourced in the same terminal session before running `mvn`. + +
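If you run samples often, the invocation above can be wrapped in a small shell function. A sketch — both function names are illustrative, and it assumes you are in the package directory with your environment already loaded:

```bash
#!/usr/bin/env bash
# cu_sample_class maps a short sample name to its fully qualified class name.
cu_sample_class() {
  local name="${1%.java}"   # tolerate a trailing .java extension
  echo "com.azure.ai.contentunderstanding.samples.${name}"
}

# run_cu_sample runs a sample by short name, mirroring the mvn command above.
run_cu_sample() {
  local sample="${1:?usage: run_cu_sample SampleName}"
  mvn exec:java \
    -Dexec.mainClass="$(cu_sample_class "$sample")" \
    -Dexec.classpathScope=test
}

# Example mapping (run_cu_sample itself is not executed here):
cu_sample_class Sample02_AnalyzeUrl
# -> com.azure.ai.contentunderstanding.samples.Sample02_AnalyzeUrl
```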
+
+**Alternative: use the helper script (optional)**
+
+The `run_sample.sh` script is a convenience wrapper around `mvn exec:java`. It resolves the class name, validates the sample exists, and optionally loads `.env` files.
+
+```bash
+# Run a sample
+.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh Sample02_AnalyzeUrl
+
+# Run with .env file (auto-loads environment variables into the shell)
+.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh Sample02_AnalyzeUrl --env .env
+
+# List all available samples
+.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh --list
+```
+
+ +> **[ASK USER] Sample result:** +> After running the sample, ask: "Did the sample run successfully?" +> - If yes: "Would you like to run another sample, or are you all set?" +> - If no: Help troubleshoot using the Troubleshooting section below. Common issues include missing environment variables, SDK not built, or model defaults not configured. + +> **[ASK USER] Run another?:** +> If the user wants to run another sample, loop back to the "Which sample?" prompt above. + +## Quick Reference + +### Most Common Samples for New Users + +1. **First-time setup** (run once per Foundry resource): + ```bash + mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample00_UpdateDefaults" -Dexec.classpathScope=test + ``` + +2. **Analyze a document from URL:** + ```bash + mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample02_AnalyzeUrl" -Dexec.classpathScope=test + ``` + +3. **Analyze a local PDF file:** + ```bash + mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample01_AnalyzeBinary" -Dexec.classpathScope=test + ``` + +4. **Extract invoice fields:** + ```bash + mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample03_AnalyzeInvoice" -Dexec.classpathScope=test + ``` + +## Scripts (Optional) + +Helper scripts are provided in `scripts/` as a convenience. They are **not required** — you can always use `mvn exec:java` directly. + +### `setup_samples.sh` -- Automated Environment Setup + +Checks Maven Central for the published package, falls back to local build, and creates a `.env` template. 
+
+```bash
+# Default: try Maven Central, fall back to local build
+.github/skills/cu-sdk-java-sample-run/scripts/setup_samples.sh
+
+# Force local build (e.g., testing local changes)
+.github/skills/cu-sdk-java-sample-run/scripts/setup_samples.sh --local
+
+# Local mode: skip build if already built
+.github/skills/cu-sdk-java-sample-run/scripts/setup_samples.sh --local --skip-build
+```
+
+### `run_sample.sh` -- Run a Sample with Conveniences
+
+Wraps `mvn exec:java` with sample name resolution, validation, and optional `.env` loading.
+
+```bash
+# Run a sample (resolves class name automatically)
+.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh Sample02_AnalyzeUrl
+
+# Load env vars from .env file before running
+.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh Sample02_AnalyzeUrl --env .env
+
+# List available samples
+.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh --list
+
+# Dry run (show what would be executed)
+.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh Sample02_AnalyzeUrl --dry-run
+```
+
+## Troubleshooting
+
+| Error | Solution |
+|-------|----------|
+| `BUILD FAILURE` during compile | Ensure JDK 8+ and Maven are installed; rebuild from the repo root with `mvn install -DskipTests -pl sdk/contentunderstanding/azure-ai-contentunderstanding -am` (see Step 2) |
+| `ClassNotFoundException` or `NoClassDefFoundError` | Add `-Dexec.classpathScope=test` to the `mvn exec:java` command. Samples are compiled as test sources (Azure SDK convention) and are not on the main classpath.
If still failing, rebuild with: `mvn compile test-compile` | +| `CONTENTUNDERSTANDING_ENDPOINT` is null | Set the environment variable: `export CONTENTUNDERSTANDING_ENDPOINT="https://..."` | +| `Access denied` or authorization errors | Ensure **Cognitive Services User** role is assigned; check API key or run `az login` | +| `Model deployment not found` | Run `Sample00_UpdateDefaults` first to configure model mappings | +| `FileNotFoundException` for binary samples | Run samples from the package root directory (`sdk/contentunderstanding/azure-ai-contentunderstanding`) | +| `Parent POM not resolved` | Run `mvn install -DskipTests -f ../../parents/azure-client-sdk-parent/pom.xml` first | +| `Permission denied` when running scripts | Make scripts executable: `chmod +x .github/skills/cu-sdk-java-sample-run/scripts/*.sh` | + +## Related Skills + +- `cu-sdk-java-setup-env` — Interactive .env file setup + +## Additional Resources + +- [SDK README](../../../README.md) — Full SDK documentation +- [Product Documentation](https://learn.microsoft.com/azure/ai-services/content-understanding/) +- [Azure SDK for Java Contributing Guide](https://github.com/Azure/azure-sdk-for-java/blob/main/CONTRIBUTING.md) diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh new file mode 100644 index 000000000000..7a2cec318099 --- /dev/null +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh @@ -0,0 +1,215 @@ +#!/usr/bin/env bash +set -euo pipefail + +# run_sample.sh +# Run a specific Java sample for the Azure AI Content Understanding SDK. +# Compiles the samples module (if needed) and runs the specified sample class +# using mvn exec:java. 
+#
+# Usage:
+#   run_sample.sh <sample_name> [--env <env_file>] [--dry-run]
+# Examples:
+#   run_sample.sh Sample02_AnalyzeUrl
+#   run_sample.sh Sample02_AnalyzeUrlAsync
+#   run_sample.sh Sample02_AnalyzeUrl --env .env
+#   run_sample.sh --list
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+# Package root is 4 levels up from scripts: .github/skills/cu-sdk-java-sample-run/scripts -> package root
+PACKAGE_ROOT="$(cd "$SCRIPT_DIR/../../../.." && pwd)"
+SAMPLES_DIR="$PACKAGE_ROOT/src/samples/java/com/azure/ai/contentunderstanding/samples"
+PACKAGE="com.azure.ai.contentunderstanding.samples"
+
+# Colors for output
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+BLUE='\033[0;34m'
+NC='\033[0m' # No Color
+
+# Defaults
+DRY_RUN=0
+ENV_FILE=""
+SAMPLE_NAME=""
+
+print_info() { echo -e "${BLUE}$1${NC}"; }
+print_success() { echo -e "${GREEN}$1${NC}"; }
+print_warning() { echo -e "${YELLOW}$1${NC}"; }
+print_error() { echo -e "${RED}$1${NC}"; }
+
+print_help() {
+  cat <<EOF
+Usage: $(basename "$0") <sample_name> [OPTIONS]
+
+Run a specific Java sample for the Azure AI Content Understanding SDK.
+
+Arguments:
+  <sample_name>   Sample class name (e.g., Sample02_AnalyzeUrl).
+                  The .java extension is optional.
+
+Options:
+  --env <env_file>   Load environment variables from the given .env file before running.
+  --dry-run          Print what would be executed without running.
+  --list             List available samples and exit.
+  --help, -h         Show this help message.
+ +Examples: + $(basename "$0") Sample02_AnalyzeUrl + $(basename "$0") Sample02_AnalyzeUrlAsync + $(basename "$0") Sample02_AnalyzeUrl --env .env + $(basename "$0") --list +EOF +} + +list_samples() { + echo "" + print_info "=== Available Sync Samples ===" + for f in "$SAMPLES_DIR"/Sample*.java; do + [ -f "$f" ] || continue + local name + name="$(basename "$f" .java)" + # Skip async samples in this section + [[ "$name" == *Async ]] && continue + echo " $name" + done + echo "" + print_info "=== Available Async Samples ===" + for f in "$SAMPLES_DIR"/Sample*Async.java; do + [ -f "$f" ] || continue + local name + name="$(basename "$f" .java)" + echo " $name" + done + echo "" +} + +# Load environment variables from a .env file +load_env_file() { + local envfile="$1" + if [[ ! -f "$envfile" ]]; then + print_error "Error: .env file not found: $envfile" + exit 1 + fi + print_info "Loading environment variables from: $envfile" + set -o allexport + # Read .env, skip comments and blank lines + while IFS= read -r line || [[ -n "$line" ]]; do + # Skip empty lines and comments + [[ -z "$line" || "$line" =~ ^[[:space:]]*# ]] && continue + # Remove surrounding quotes from values + eval "export $line" 2>/dev/null || true + done < "$envfile" + set +o allexport + print_success "✓ Environment variables loaded" +} + +# Parse arguments +while [[ $# -gt 0 ]]; do + case "$1" in + --help|-h) + print_help + exit 0 + ;; + --list|-l) + list_samples + exit 0 + ;; + --dry-run) + DRY_RUN=1 + shift + ;; + --env) + if [[ -z "${2:-}" ]]; then + print_error "Error: --env requires a file path argument" + exit 1 + fi + ENV_FILE="$2" + shift 2 + ;; + -*) + print_error "Unknown option: $1" + print_help + exit 1 + ;; + *) + if [[ -z "$SAMPLE_NAME" ]]; then + SAMPLE_NAME="$1" + else + print_error "Error: Multiple samples specified. Only one sample is supported." 
+ exit 1 + fi + shift + ;; + esac +done + +if [[ -z "$SAMPLE_NAME" ]]; then + print_error "Error: No sample name provided" + echo "" + print_help + exit 1 +fi + +# Normalize: strip .java extension if provided +SAMPLE_NAME="${SAMPLE_NAME%.java}" + +# Verify the sample file exists +SAMPLE_FILE="$SAMPLES_DIR/${SAMPLE_NAME}.java" +if [[ ! -f "$SAMPLE_FILE" ]]; then + print_error "Error: Sample not found: $SAMPLE_FILE" + echo "" + echo "Did you mean one of these?" + ls "$SAMPLES_DIR"/Sample*.java 2>/dev/null | xargs -n1 basename | sed 's/\.java$//' | grep -i "${SAMPLE_NAME}" | head -5 || true + echo "" + echo "Run '$(basename "$0") --list' to see all available samples" + exit 1 +fi + +FULL_CLASS="${PACKAGE}.${SAMPLE_NAME}" + +echo "" +print_info "=== Run Java Sample ===" +echo "Package root: $PACKAGE_ROOT" +echo "Sample class: $FULL_CLASS" +echo "Sample file: $SAMPLE_FILE" +echo "" + +# Navigate to package root +cd "$PACKAGE_ROOT" + +# Load .env file if specified +if [[ -n "$ENV_FILE" ]]; then + # Resolve relative path from original cwd + if [[ "$ENV_FILE" != /* ]]; then + ENV_FILE="$PACKAGE_ROOT/$ENV_FILE" + fi + load_env_file "$ENV_FILE" + echo "" +fi + +# Check for required environment variable +if [[ -z "${CONTENTUNDERSTANDING_ENDPOINT:-}" ]]; then + print_warning "⚠ CONTENTUNDERSTANDING_ENDPOINT is not set. Most samples will fail without it." 
+ echo " Set it with: export CONTENTUNDERSTANDING_ENDPOINT=\"https://your-foundry.services.ai.azure.com/\"" + echo " Or use: $(basename "$0") $SAMPLE_NAME --env .env" + echo "" +fi + +# Build command +MVN_CMD="mvn exec:java -Dexec.mainClass=\"${FULL_CLASS}\" -Dexec.classpathScope=test" + +if [[ $DRY_RUN -eq 1 ]]; then + echo "DRY RUN: would execute:" + echo " cd $PACKAGE_ROOT" + [[ -n "${ENV_FILE:-}" ]] && echo " (env loaded from $ENV_FILE)" + echo " $MVN_CMD" + exit 0 +fi + +# Run the sample +print_info "Running: $SAMPLE_NAME" +echo "" +mvn exec:java -Dexec.mainClass="${FULL_CLASS}" -Dexec.classpathScope=test + +echo "" +print_success "✓ Sample completed: $SAMPLE_NAME" diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/scripts/setup_samples.sh b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/scripts/setup_samples.sh new file mode 100644 index 000000000000..3356204c2087 --- /dev/null +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/scripts/setup_samples.sh @@ -0,0 +1,231 @@ +#!/usr/bin/env bash +set -euo pipefail + +# setup_samples.sh +# Sets up the environment for running Azure AI Content Understanding Java SDK samples. +# This includes: +# 1. Check Java and Maven are installed +# 2. Try resolving the SDK package from Maven Central (if published) +# 3. If not available, fall back to building locally with mvn install +# 4. Create a sample .env file if none exists +# +# Usage: +# setup_samples.sh [--local] [--skip-build] +# Options: +# --local Force local build (skip Maven Central check) +# --skip-build Skip building even in local mode (assumes already built) + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +PACKAGE_ROOT="$(cd "$SCRIPT_DIR/../../../.." 
&& pwd)"
+
+GROUP_ID="com.azure"
+ARTIFACT_ID="azure-ai-contentunderstanding"
+# Extract version from pom.xml
+VERSION="$(grep -m1 '<version>' "$PACKAGE_ROOT/pom.xml" | sed 's/.*<version>\(.*\)<\/version>.*/\1/' | sed 's/\s*//g')"
+
+FORCE_LOCAL=0
+SKIP_BUILD=0
+
+# Colors for output
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+BLUE='\033[0;34m'
+NC='\033[0m'
+
+print_info()    { echo -e "${BLUE}$1${NC}"; }
+print_success() { echo -e "${GREEN}$1${NC}"; }
+print_warning() { echo -e "${YELLOW}$1${NC}"; }
+print_error()   { echo -e "${RED}$1${NC}"; }
+
+print_help() {
+    cat <<EOF
+Usage: $(basename "$0") [--local] [--skip-build]
+
+Options:
+  --local        Force local build (skip Maven Central check)
+  --skip-build   Skip building even in local mode (assumes already built)
+  -h, --help     Show this help message
+EOF
+}
+
+# Parse arguments
+while [[ $# -gt 0 ]]; do
+    case "$1" in
+        --local) FORCE_LOCAL=1; shift ;;
+        --skip-build) SKIP_BUILD=1; shift ;;
+        -h|--help) print_help; exit 0 ;;
+        *) print_error "Error: Unknown option: $1"; print_help; exit 1 ;;
+    esac
+done
+
+# =========================================
+# Step 0: Check prerequisites
+# =========================================
+echo "Step 0: Checking prerequisites..."
+if ! command -v java &>/dev/null; then
+    print_error "Error: Java is not installed or not on PATH."
+    echo "  Install JDK 8 or later: https://learn.microsoft.com/java/openjdk/download"
+    exit 1
+fi
+JAVA_VER="$(java -version 2>&1 | head -1)"
+echo "  Java: $JAVA_VER"
+
+if ! command -v mvn &>/dev/null; then
+    print_error "Error: Maven is not installed or not on PATH."
+    echo "  Install Maven: https://maven.apache.org/install.html"
+    exit 1
+fi
+MVN_VER="$(mvn -version 2>&1 | head -1)"
+echo "  Maven: $MVN_VER"
+
+print_success "✓ Prerequisites OK"
+echo ""
+
+# =========================================
+# Step 1: Install the SDK package
+# =========================================
+# Default: check if published on Maven Central. If available, no build needed
+# (Maven will download it automatically when running samples).
+# If not published, fall back to local build.
+
+check_maven_central() {
+    echo "Step 1: Checking Maven Central for ${GROUP_ID}:${ARTIFACT_ID}:${VERSION}..."
+    local group_path="${GROUP_ID//\.//}"
+    local url="https://repo1.maven.org/maven2/${group_path}/${ARTIFACT_ID}/${VERSION}/${ARTIFACT_ID}-${VERSION}.pom"
+
+    if curl -sf --head "$url" &>/dev/null; then
+        print_success "✓ Package is available on Maven Central"
+        echo "  Maven will download it automatically when running samples."
+ return 0 + else + echo " Package not yet published on Maven Central" + return 1 + fi +} + +build_local() { + echo "Step 1: Building package locally..." + cd "$PACKAGE_ROOT" + + if [[ $SKIP_BUILD -eq 1 ]]; then + echo " Skipping build (--skip-build)" + # Verify the artifact exists in local Maven repo + local local_jar="$HOME/.m2/repository/${GROUP_ID//\.//}/${ARTIFACT_ID}/${VERSION}/${ARTIFACT_ID}-${VERSION}.jar" + if [[ -f "$local_jar" ]]; then + print_success "✓ Package found in local Maven repository" + else + print_warning "⚠ Package not found in local Maven repository: $local_jar" + echo " Run without --skip-build to build it." + fi + return 0 + fi + + echo " Building with: mvn install -DskipTests" + if mvn install -DskipTests; then + print_success "✓ Package built and installed to local Maven repository" + else + print_error "Error: Build failed." + echo " Common fixes:" + echo " - Ensure JDK 8+ is installed: java -version" + echo " - Build parent POM first: mvn install -DskipTests -f ../../parents/azure-client-sdk-parent/pom.xml" + exit 1 + fi +} + +if [[ $FORCE_LOCAL -eq 1 ]]; then + echo "(--local flag: skipping Maven Central check, building locally)" + echo "" + build_local +else + if ! check_maven_central; then + echo " Falling back to local build..." + echo "" + build_local + fi +fi +echo "" + +# ========================================= +# Step 2: Create .env file if needed +# ========================================= +echo "Step 2: Checking .env file..." +ENV_FILE="$PACKAGE_ROOT/.env" + +if [[ -f "$ENV_FILE" ]]; then + print_success "✓ .env file already exists at $ENV_FILE" +else + print_info "Creating sample .env file..." + cat > "$ENV_FILE" <<'ENVEOF' +# Azure AI Content Understanding - Environment Variables +# Fill in your values below. See SKILL.md for details. 
+ +# Required: Your Microsoft Foundry resource endpoint +CONTENTUNDERSTANDING_ENDPOINT=https://your-foundry.services.ai.azure.com/ + +# Optional: API key (leave empty to use DefaultAzureCredential via az login) +CONTENTUNDERSTANDING_KEY= + +# Model deployment names (used by Sample00_UpdateDefaults) +GPT_4_1_DEPLOYMENT=gpt-4.1 +GPT_4_1_MINI_DEPLOYMENT=gpt-4.1-mini +TEXT_EMBEDDING_3_LARGE_DEPLOYMENT=text-embedding-3-large + +# Cross-resource copy (only needed for Sample15_GrantCopyAuth) +# CONTENTUNDERSTANDING_TARGET_ENDPOINT=https://your-target-foundry.services.ai.azure.com/ +# CONTENTUNDERSTANDING_TARGET_RESOURCE_ID=/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.CognitiveServices/accounts/{targetAccountName} +ENVEOF + print_success "✓ Created .env file at $ENV_FILE" + print_warning "⚠ Please edit $ENV_FILE and fill in your actual values before running samples" +fi +echo "" + +echo "=========================================" +echo "✓ Setup complete!" +echo "=========================================" +echo "" +echo "Next steps:" +echo " 1. Edit .env with your endpoint and credentials (if not done already):" +echo " $ENV_FILE" +echo "" +echo " 2. 
Run a sample:" +echo " cd $PACKAGE_ROOT" +echo " .github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh Sample02_AnalyzeUrl --env .env" +echo "" +echo " Or export variables manually and use Maven directly:" +echo " export CONTENTUNDERSTANDING_ENDPOINT=\"https://...\"" +echo " mvn exec:java -Dexec.mainClass=\"com.azure.ai.contentunderstanding.samples.Sample02_AnalyzeUrl\"" +echo "" diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-setup-env/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-setup-env/SKILL.md new file mode 100644 index 000000000000..d44bfe749cc9 --- /dev/null +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-setup-env/SKILL.md @@ -0,0 +1,212 @@ +--- +name: cu-sdk-java-setup-env +description: Help users create and configure their .env file for the Azure AI Content Understanding Java SDK. Use when users need to set up credentials, endpoint, and model deployment configuration. +--- + +# Set Up Environment Variables + +Interactively help users create and populate a `.env` file with the required configuration for running Azure AI Content Understanding Java SDK samples. + +> **[COPILOT INTERACTION MODEL]:** This skill is designed to be interactive. At each step marked with **[ASK USER]**, pause execution and prompt the user for input or confirmation before proceeding. Do NOT silently skip these prompts. Use the `ask_questions` tool when available. + +## Package Directory + +``` +sdk/contentunderstanding/azure-ai-contentunderstanding +``` + +## How Java Samples Use Environment Variables + +Java samples read configuration via `System.getenv()`. The variables must be `export`ed in the shell before running `mvn exec:java`. 
The `.env` file created by this skill can be sourced into your shell with:
+
+```bash
+set -a && source .env && set +a
+```
+
+Or used with the optional helper script:
+
+```bash
+.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh <SampleName> --env .env
+```
+
+## Workflow
+
+### Step 1: Navigate to Package Directory
+
+```bash
+cd sdk/contentunderstanding/azure-ai-contentunderstanding
+```
+
+### Step 2: Check for Existing .env
+
+> **[ASK USER] Existing .env check:**
+> Check if `.env` already exists in the package directory.
+> - If it exists: Ask "You already have a `.env` file. Would you like to **update** it or **start fresh**?"
+>   - Update: Read the current file and ask which values to change.
+>   - Start fresh: Overwrite with new values.
+> - If it doesn't exist: Proceed to Step 3.
+
+### Step 3: Gather Required Configuration
+
+#### 3a. Endpoint
+
+> **[ASK USER] Endpoint:**
+> Ask: "Please provide your **Microsoft Foundry endpoint URL**."
+> - It should look like: `https://<your-resource>.services.ai.azure.com/`
+> - If the user doesn't know: Direct them to **Azure Portal → their Foundry resource → Keys and Endpoint**.
+> - Validate that it starts with `https://` and ends with `.services.ai.azure.com/`.
+
+#### 3b. Authentication Method
+
+> **[ASK USER] Authentication:**
+> Ask: "How would you like to **authenticate** with Azure?"
+> - **DefaultAzureCredential (recommended)** — No API key needed. Uses `az login`, managed identity, or another credential in the Azure credential chain. Make sure you have run `az login`.
+> - **API Key** — Provide your key from Azure Portal → Foundry resource → Keys and Endpoint → Key1 or Key2.
+>
+> If DefaultAzureCredential: Leave `CONTENTUNDERSTANDING_KEY` empty.
+> If API Key: Ask the user to provide the key value.
+
+#### 3c. Model Deployment Names
+
+> **[ASK USER] Model deployments:**
+> Ask: "What are your **model deployment names**? Press Enter to accept each default."
+> - GPT-4.1 deployment name (default: `gpt-4.1`) → `GPT_4_1_DEPLOYMENT` +> - GPT-4.1-mini deployment name (default: `gpt-4.1-mini`) → `GPT_4_1_MINI_DEPLOYMENT` +> - text-embedding-3-large deployment name (default: `text-embedding-3-large`) → `TEXT_EMBEDDING_3_LARGE_DEPLOYMENT` +> +> If the user's deployment names match the model names (which is common), they can accept all defaults. + +#### 3d. Cross-Resource Copy (Optional) + +> **[ASK USER] Cross-resource copy:** +> Ask: "Do you plan to use **cross-resource analyzer copying** (`Sample15_GrantCopyAuth`)?" +> - If no: Skip this section. +> - If yes: Gather the following additional values: +> 1. Source resource ID (ARM resource ID) +> 2. Source region (e.g., `eastus`) +> 3. Target endpoint URL +> 4. Target API key (or empty for DefaultAzureCredential) +> 5. Target resource ID (ARM resource ID) +> 6. Target region (e.g., `swedencentral`) + +### Step 4: Write the .env File + +Write the `.env` file to the package root directory (`sdk/contentunderstanding/azure-ai-contentunderstanding/.env`). 
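If the user chose "start fresh" in Step 2, it can be worth preserving the previous file before overwriting it. A minimal, illustrative sketch (the `.bak` suffix is an assumption of this sketch, not something the skill mandates):

```shell
# Back up an existing .env before overwriting it (illustrative helper).
backup_env() {
    local env_file="$1"
    if [[ -f "$env_file" ]]; then
        cp "$env_file" "${env_file}.bak"
        echo "Previous configuration saved to ${env_file}.bak"
    fi
}
```

If the file does not exist yet, the helper is a no-op and the new `.env` can be written directly.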
+
+**Template (basic):**
+
+```
+# Azure AI Content Understanding - Environment Variables
+# Generated by cu-sdk-java-setup-env skill
+
+# Required: Your Microsoft Foundry resource endpoint
+CONTENTUNDERSTANDING_ENDPOINT=https://<your-resource>.services.ai.azure.com/
+
+# Optional: API key (leave empty to use DefaultAzureCredential via az login)
+CONTENTUNDERSTANDING_KEY=
+
+# Model deployment names (used by Sample00_UpdateDefaults)
+GPT_4_1_DEPLOYMENT=gpt-4.1
+GPT_4_1_MINI_DEPLOYMENT=gpt-4.1-mini
+TEXT_EMBEDDING_3_LARGE_DEPLOYMENT=text-embedding-3-large
+```
+
+**Template (with cross-resource copy):**
+
+```
+# Azure AI Content Understanding - Environment Variables
+# Generated by cu-sdk-java-setup-env skill
+
+# Required: Your Microsoft Foundry resource endpoint
+CONTENTUNDERSTANDING_ENDPOINT=https://<your-resource>.services.ai.azure.com/
+
+# Optional: API key (leave empty to use DefaultAzureCredential via az login)
+CONTENTUNDERSTANDING_KEY=
+
+# Model deployment names (used by Sample00_UpdateDefaults)
+GPT_4_1_DEPLOYMENT=gpt-4.1
+GPT_4_1_MINI_DEPLOYMENT=gpt-4.1-mini
+TEXT_EMBEDDING_3_LARGE_DEPLOYMENT=text-embedding-3-large
+
+# Cross-resource copy settings (only for Sample15_GrantCopyAuth)
+CONTENTUNDERSTANDING_SOURCE_RESOURCE_ID=/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.CognitiveServices/accounts/{sourceAccountName}
+CONTENTUNDERSTANDING_SOURCE_REGION=eastus
+CONTENTUNDERSTANDING_TARGET_ENDPOINT=https://<your-target-resource>.services.ai.azure.com/
+CONTENTUNDERSTANDING_TARGET_KEY=
+CONTENTUNDERSTANDING_TARGET_RESOURCE_ID=/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.CognitiveServices/accounts/{targetAccountName}
+CONTENTUNDERSTANDING_TARGET_REGION=swedencentral
+```
+
+### Step 5: Confirm and Verify
+
+> **[ASK USER] Confirm .env:**
+> Display the written `.env` file contents (masking any API key values to show only the first 4 characters + `***`).
+> Ask: "Does this configuration look correct?"
+> - If yes: Proceed to the next steps. +> - If no: Ask which value needs to be changed and update the file. + +### Step 6: Load into Shell + +Show the user how to load the `.env` into their current shell: + +```bash +set -a && source .env && set +a +``` + +> **[ASK USER] Verify loaded:** +> Ask the user to verify the variables are set: +> ```bash +> echo $CONTENTUNDERSTANDING_ENDPOINT +> ``` +> Ask: "Does the endpoint value look correct?" + +### Next Steps + +After the `.env` is configured, direct the user to: + +1. **Run `Sample00_UpdateDefaults`** (if not done already) to configure model deployment mappings: + ```bash + set -a && source .env && set +a + mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample00_UpdateDefaults" + ``` + +2. **Run a sample** — see the `cu-sdk-java-sample-run` skill: + ```bash + mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample02_AnalyzeUrl" + ``` + +## Environment Variable Reference + +| Variable | Required By | Description | +|----------|------------|-------------| +| `CONTENTUNDERSTANDING_ENDPOINT` | **All samples** | Microsoft Foundry resource endpoint URL | +| `CONTENTUNDERSTANDING_KEY` | All (optional) | API key. 
If empty, `DefaultAzureCredential` is used (run `az login` first) | +| `GPT_4_1_DEPLOYMENT` | Sample00_UpdateDefaults | GPT-4.1 deployment name (default: `gpt-4.1`) | +| `GPT_4_1_MINI_DEPLOYMENT` | Sample00_UpdateDefaults | GPT-4.1-mini deployment name (default: `gpt-4.1-mini`) | +| `TEXT_EMBEDDING_3_LARGE_DEPLOYMENT` | Sample00_UpdateDefaults | text-embedding-3-large deployment name (default: `text-embedding-3-large`) | +| `CONTENTUNDERSTANDING_SOURCE_RESOURCE_ID` | Sample15_GrantCopyAuth | Source ARM resource ID | +| `CONTENTUNDERSTANDING_SOURCE_REGION` | Sample15_GrantCopyAuth | Source region (e.g., `eastus`) | +| `CONTENTUNDERSTANDING_TARGET_ENDPOINT` | Sample15_GrantCopyAuth | Target Foundry endpoint for cross-resource copy | +| `CONTENTUNDERSTANDING_TARGET_KEY` | Sample15_GrantCopyAuth (optional) | Target API key (empty = DefaultAzureCredential) | +| `CONTENTUNDERSTANDING_TARGET_RESOURCE_ID` | Sample15_GrantCopyAuth | Target ARM resource ID | +| `CONTENTUNDERSTANDING_TARGET_REGION` | Sample15_GrantCopyAuth | Target region (e.g., `swedencentral`) | + +## Troubleshooting + +| Problem | Solution | +|---------|----------| +| `CONTENTUNDERSTANDING_ENDPOINT` is null at runtime | Make sure you ran `set -a && source .env && set +a` in the same terminal before `mvn exec:java` | +| `Access denied` / 401 errors | Check API key is correct, or run `az login` if using DefaultAzureCredential | +| `az login` not working | Install Azure CLI: https://learn.microsoft.com/cli/azure/install-azure-cli | +| Changes to `.env` not taking effect | Re-run `set -a && source .env && set +a` — changes are not auto-reloaded | +| `.env` committed to git | Add `.env` to `.gitignore` — never commit credentials | + +## Security Notes + +- **Never commit `.env` files** to version control. Ensure `.gitignore` includes `.env`. +- Prefer **DefaultAzureCredential** over API keys when possible. +- If using API keys, rotate them regularly via the Azure Portal. 
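The key-masking rule from Step 5 (show only the first 4 characters of a key, then `***`) can be written as a tiny shell helper. This is an illustrative sketch, not part of the shipped scripts:

```shell
# Mask a secret for display, per Step 5 (illustrative helper).
mask_key() {
    local key="$1"
    if [[ -z "$key" ]]; then
        echo "(empty - DefaultAzureCredential will be used)"
    else
        echo "${key:0:4}***"
    fi
}

mask_key "abcd1234efgh"   # → abcd***
```

Displaying only a masked value keeps the confirmation step safe to paste into chat logs or terminals that may be shared.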
+ +## Related Skills + +- `cu-sdk-java-sample-run` — Run SDK samples (uses the env vars configured here) diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/README.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/README.md index ad33c7b52811..440dfd2c2d4b 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/README.md +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/README.md @@ -422,6 +422,39 @@ For more information, see [Azure SDK for Java logging][logging]. * Explore the [samples directory][samples_directory] for complete code examples * Read the [Azure AI Content Understanding documentation][product_docs] for detailed service information +## GitHub Copilot Skills + +This package includes [GitHub Copilot][github_copilot] skills under `.github/skills/` that provide interactive, AI-assisted workflows for common tasks. In VS Code, Copilot can use these skills to help with environment setup, running samples, and understanding the service. 
+ +### Available Skills + +| Skill | Description | How to Use | +|-------|-------------|------------| +| [**cu-sdk-java-setup-env**][cu_sdk_java_setup_env_skill] | Interactive environment setup — creates and configures your `.env` file with endpoint, credentials, and model deployment settings | In VS Code Copilot Chat, ask: *"Set up my Java environment for Content Understanding"* or reference the skill directly | +| [**cu-sdk-java-sample-run**][cu_sdk_java_sample_run_skill] | Guided sample runner — helps you build the SDK, configure credentials, and run specific samples with Maven | Ask: *"Run Sample02_AnalyzeUrl"* or *"Run the invoice analysis sample"* | +| [**cu-sdk-common-knowledge**][cu_sdk_common_knowledge_skill] | Domain knowledge reference — answers questions about Content Understanding concepts, analyzers, field schemas, API operations, and Java SDK usage | Ask: *"What prebuilt analyzers are available?"* or *"How do I create a custom analyzer?"* | + +### Using Skills in VS Code + +1. In VS Code, open the package folder `sdk/contentunderstanding/azure-ai-contentunderstanding` (File → Open Folder). This is required for VS Code to discover the skills in `.github/skills/`. +2. Ensure [GitHub Copilot][github_copilot] is installed and activated +3. Open Copilot Chat from the Chat view or Command Palette +4. Ask a question related to Content Understanding; Copilot can use the relevant skill when appropriate + +**Example prompts:** +- *"Set up my Content Understanding environment"* → likely uses `cu-sdk-java-setup-env` +- *"Run Sample03_AnalyzeInvoice"* → likely uses `cu-sdk-java-sample-run` +- *"Explain how custom analyzers work"* → likely uses `cu-sdk-common-knowledge` + +### Troubleshooting Skill Selection + +If Copilot does not use the expected skill, try the following: + +1. Be explicit about intent and context in one prompt (for example: *"Use cu-sdk-java-sample-run to run Sample01_AnalyzeBinary"*). +2. 
Include your goal and current state (for example: *"My .env is configured; help me run Sample02_AnalyzeUrl"*). +3. Ask for a step-by-step interactive flow when needed (for example: *"Guide me step by step to set up my environment"*). +4. For build or runtime errors, mention the exact error text so Copilot can apply the right troubleshooting path. + ## Contributing For details on contributing to this repository, see the [contributing guide][contributing]. @@ -458,3 +491,7 @@ This project has adopted the [Microsoft Open Source Code of Conduct][code_of_con [code_of_conduct]: https://opensource.microsoft.com/codeofconduct/ [code_of_conduct_faq]: https://opensource.microsoft.com/codeofconduct/faq/ [opencode_email]: mailto:opencode@microsoft.com +[github_copilot]: https://github.com/features/copilot +[cu_sdk_java_setup_env_skill]: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-setup-env +[cu_sdk_java_sample_run_skill]: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run +[cu_sdk_common_knowledge_skill]: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-common-knowledge From a7f9a1c68b3a2bd77a46e84752011cdbfbe4829a Mon Sep 17 00:00:00 2001 From: Changjian Wang Date: Wed, 22 Apr 2026 14:54:42 +0800 Subject: [PATCH 02/19] Enhance SKILL.md files to include related skills for better user guidance --- .../.github/skills/cu-sdk-java-sample-run/SKILL.md | 3 ++- .../.github/skills/cu-sdk-java-setup-env/SKILL.md | 1 + 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/SKILL.md index f86fa3cd66a3..04a431752cc6 100644 
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/SKILL.md +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/SKILL.md @@ -418,7 +418,8 @@ Wraps `mvn exec:java` with sample name resolution, validation, and optional `.en ## Related Skills -- `cu-sdk-java-setup-env` — Interactive .env file setup +- `cu-sdk-java-setup-env` — Interactive .env file setup (configure endpoint, auth, and model deployments before running samples) +- `cu-sdk-common-knowledge` — Domain knowledge for Content Understanding concepts ## Additional Resources diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-setup-env/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-setup-env/SKILL.md index d44bfe749cc9..fab367c0660a 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-setup-env/SKILL.md +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-setup-env/SKILL.md @@ -210,3 +210,4 @@ After the `.env` is configured, direct the user to: ## Related Skills - `cu-sdk-java-sample-run` — Run SDK samples (uses the env vars configured here) +- `cu-sdk-common-knowledge` — Domain knowledge for Content Understanding concepts From 7c7478814ca5f9db60fdd0590df570c76d47e35e Mon Sep 17 00:00:00 2001 From: Changjian Wang Date: Wed, 22 Apr 2026 16:07:41 +0800 Subject: [PATCH 03/19] Update README.md to enhance Table of Contents with new sections for better navigation and user guidance Co-authored-by: Copilot --- .../azure-ai-contentunderstanding/README.md | 40 ++++++++++++++++--- 1 file changed, 35 insertions(+), 5 deletions(-) diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/README.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/README.md index 440dfd2c2d4b..b1259fb21500 100644 --- 
a/sdk/contentunderstanding/azure-ai-contentunderstanding/README.md +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/README.md @@ -13,6 +13,32 @@ Use the client library for Azure AI Content Understanding to: [Source code][source_code] | [Package (Maven)][package_maven] | [API reference documentation][api_reference_docs] | [Product documentation][product_docs] +## Table of Contents + +- [Getting started](#getting-started) + - [Prerequisites](#prerequisites) + - [Configuring Microsoft Foundry resource](#configuring-microsoft-foundry-resource) + - [Adding the package to your product](#adding-the-package-to-your-product) + - [Authenticate the client](#authenticate-the-client) +- [Key concepts](#key-concepts) + - [Prebuilt analyzers](#prebuilt-analyzers) + - [Content types](#content-types) + - [Asynchronous operations](#asynchronous-operations) + - [Main classes](#main-classes) + - [Thread safety](#thread-safety) + - [Additional concepts](#additional-concepts) +- [Examples](#examples) + - [Running samples](#running-samples) +- [Troubleshooting](#troubleshooting) + - [Common issues](#common-issues) + - [Enable logging](#enable-logging) +- [GitHub Copilot Skills](#github-copilot-skills) + - [Available Skills](#available-skills) + - [Using Skills in VS Code](#using-skills-in-vs-code) + - [Troubleshooting Skill Selection](#troubleshooting-skill-selection) +- [Next steps](#next-steps) +- [Contributing](#contributing) + ## Getting started ### Prerequisites @@ -417,11 +443,6 @@ ContentUnderstandingClient client = new ContentUnderstandingClientBuilder() For more information, see [Azure SDK for Java logging][logging]. 
-## Next steps - -* Explore the [samples directory][samples_directory] for complete code examples -* Read the [Azure AI Content Understanding documentation][product_docs] for detailed service information - ## GitHub Copilot Skills This package includes [GitHub Copilot][github_copilot] skills under `.github/skills/` that provide interactive, AI-assisted workflows for common tasks. In VS Code, Copilot can use these skills to help with environment setup, running samples, and understanding the service. @@ -455,6 +476,13 @@ If Copilot does not use the expected skill, try the following: 3. Ask for a step-by-step interactive flow when needed (for example: *"Guide me step by step to set up my environment"*). 4. For build or runtime errors, mention the exact error text so Copilot can apply the right troubleshooting path. +## Next steps + +* [Sample 00: Configure model deployment defaults][sample00] - Required one-time setup to configure model deployments for prebuilt and custom analyzers +* [Sample 01: Analyze a document from binary data][sample01] - Analyze PDF files from disk using `prebuilt-documentSearch` +* Explore the [samples directory][samples_directory] for complete code examples +* Read the [Azure AI Content Understanding documentation][product_docs] for detailed service information + ## Contributing For details on contributing to this repository, see the [contributing guide][contributing]. 
@@ -481,6 +509,8 @@ This project has adopted the [Microsoft Open Source Code of Conduct][code_of_con [deploy_models_docs]: https://learn.microsoft.com/azure/ai-studio/how-to/deploy-models-openai [prebuilt_analyzers_docs]: https://learn.microsoft.com/azure/ai-services/content-understanding/concepts/prebuilt-analyzers [samples_directory]: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples +[sample00]: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample00_UpdateDefaults.java +[sample01]: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample01_AnalyzeBinary.java [sample00_update_defaults]: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample00_UpdateDefaults.java [logging]: https://github.com/Azure/azure-sdk-for-java/wiki/Logging-in-Azure-SDK [azure_core_http_client]: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/core/azure-core/README.md#configuring-service-clients From 872dd16991ee0c22463e87eab25282de0996d0a4 Mon Sep 17 00:00:00 2001 From: aluneth Date: Wed, 22 Apr 2026 19:33:27 +0800 Subject: [PATCH 04/19] Add setup and run scripts for Azure AI Content Understanding SDK samples - Introduced `run_sample.sh` to execute specific Java samples with Maven. - Added `setup_samples.sh` to prepare the environment, check prerequisites, and create a `.env` file. - Created `SKILL.md` to guide users in setting up their environment variables interactively. - Updated `README.md` to reflect new skills for environment setup and sample execution. 
--- .../skills/cu-sdk-common-knowledge/SKILL.md | 4 +- .../SKILL.md | 59 ++++++++++++------- .../scripts/run_sample.sh | 2 +- .../scripts/setup_samples.sh | 2 +- .../SKILL.md | 12 ++-- .../azure-ai-contentunderstanding/README.md | 14 ++--- 6 files changed, 56 insertions(+), 37 deletions(-) rename sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/{cu-sdk-java-sample-run => cu-sdk-sample-run}/SKILL.md (86%) rename sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/{cu-sdk-java-sample-run => cu-sdk-sample-run}/scripts/run_sample.sh (99%) rename sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/{cu-sdk-java-sample-run => cu-sdk-sample-run}/scripts/setup_samples.sh (98%) rename sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/{cu-sdk-java-setup-env => cu-sdk-setup}/SKILL.md (96%) diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-common-knowledge/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-common-knowledge/SKILL.md index e3524b401d2d..acf85279ac93 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-common-knowledge/SKILL.md +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-common-knowledge/SKILL.md @@ -51,5 +51,5 @@ Always read the relevant page (via `fetch_webpage`) before answering if the refe ## Related Skills -- `cu-sdk-java-setup-env` — Set up environment variables for Java SDK samples -- `cu-sdk-java-sample-run` — Run specific Java SDK samples interactively +- `cu-sdk-setup` — Set up environment variables for Java SDK samples +- `cu-sdk-sample-run` — Run specific Java SDK samples interactively diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md similarity index 86% rename 
from sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/SKILL.md rename to sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md index 04a431752cc6..e530deaa02a7 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/SKILL.md +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md @@ -1,5 +1,5 @@ --- -name: cu-sdk-java-sample-run +name: cu-sdk-sample-run description: Run a specific sample for the Azure AI Content Understanding Java SDK. Use when users want to run a particular sample like Sample02_AnalyzeUrl or Sample03_AnalyzeInvoice. --- @@ -144,13 +144,13 @@ mvn install -DskipTests -pl sdk/contentunderstanding/azure-ai-contentunderstandi The `setup_samples.sh` script automates this — it checks Maven Central first and falls back to a local build: ```bash -.github/skills/cu-sdk-java-sample-run/scripts/setup_samples.sh +.github/skills/cu-sdk-sample-run/scripts/setup_samples.sh ``` Use `--local` to force local build: ```bash -.github/skills/cu-sdk-java-sample-run/scripts/setup_samples.sh --local +.github/skills/cu-sdk-sample-run/scripts/setup_samples.sh --local ``` @@ -160,13 +160,13 @@ Use `--local` to force local build: > **[ASK USER] Configuration check:** > Ask the user: "Do you already have your environment variables configured (`.env` file or exported in shell)?" > - If yes: Skip to Step 4. -> - If no: Direct them to the `cu-sdk-java-setup-env` skill for interactive setup, or guide them through the steps below. +> - If no: Direct them to the `cu-sdk-setup` skill for interactive setup, or guide them through the steps below. Java samples read credentials from **OS environment variables** via `System.getenv()`. 
Unlike Python (`dotenv`) or JavaScript (`dotenv/config`), Java does not have a built-in `.env` loader — the variables must be present in the shell environment when the JVM starts. The recommended approach is to create a **`.env` file** and source it before running samples. -> **Tip:** Use the `cu-sdk-java-setup-env` skill for an interactive walkthrough that creates your `.env` file step by step. +> **Tip:** Use the `cu-sdk-setup` skill for an interactive walkthrough that creates your `.env` file step by step. **Create a `.env` file** in the package root (`sdk/contentunderstanding/azure-ai-contentunderstanding/.env`): @@ -323,21 +323,40 @@ The `run_sample.sh` script is a convenience wrapper around `mvn exec:java`. It r ```bash # Run a sample -.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh Sample02_AnalyzeUrl +.github/skills/cu-sdk-sample-run/scripts/run_sample.sh Sample02_AnalyzeUrl # Run with .env file (auto-loads environment variables into the shell) -.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh Sample02_AnalyzeUrl --env .env +.github/skills/cu-sdk-sample-run/scripts/run_sample.sh Sample02_AnalyzeUrl --env .env # List all available samples -.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh --list +.github/skills/cu-sdk-sample-run/scripts/run_sample.sh --list ``` +### After the Sample Runs — Review Results and Explain the Sample + +After the sample completes, the skill **must** do the following for the user (do not skip): + +1. **Show the terminal command to re-run this sample directly**, so the user can iterate without the skill. For example: + ```bash + set -a && source .env && set +a + mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample02_AnalyzeUrl" -Dexec.classpathScope=test + ``` + Substitute `Sample02_AnalyzeUrl` with the sample the user just ran. + +2. **Briefly explain the key code concepts** demonstrated in the sample. 
Tailor the explanation to the specific sample; common concepts include: + - **Client creation** — how `ContentUnderstandingClient` is constructed via the builder (endpoint + `DefaultAzureCredentialBuilder` or `AzureKeyCredential`) + - **Analyzer selection** — which prebuilt (`prebuilt-documentSearch`, `prebuilt-invoice`, etc.) or custom analyzer is used and why + - **Input type** — URL vs. `BinaryData` vs. local file + - **Result processing** — how the returned `AnalyzeResult` is traversed (pages, fields, contents) + - **Content type casting** — e.g., casting `AnalyzedContent` to `AnalyzedDocumentContent` / `AnalyzedImageContent` / `AnalyzedAudioContent` / `AnalyzedVideoContent` when needed + - **Long-running operation polling** — if the sample uses `SyncPoller` / `beginAnalyze` + > **[ASK USER] Sample result:** -> After running the sample, ask: "Did the sample run successfully?" -> - If yes: "Would you like to run another sample, or are you all set?" -> - If no: Help troubleshoot using the Troubleshooting section below. Common issues include missing environment variables, SDK not built, or model defaults not configured. +> Ask: "Did the sample run successfully?" +> - If yes: present the re-run command and the key-code explanation (above), then ask: "Would you like to run another sample, or are you all set?" +> - If no: help troubleshoot using the Troubleshooting section below. Common issues include missing environment variables, SDK not built, or model defaults not configured. > **[ASK USER] Run another?:** > If the user wants to run another sample, loop back to the "Which sample?" prompt above. 
@@ -376,13 +395,13 @@ Checks Maven Central for the published package, falls back to local build, and c ```bash # Default: try Maven Central, fall back to local build -.github/skills/cu-sdk-java-sample-run/scripts/setup_samples.sh +.github/skills/cu-sdk-sample-run/scripts/setup_samples.sh # Force local build (e.g., testing local changes) -.github/skills/cu-sdk-java-sample-run/scripts/setup_samples.sh --local +.github/skills/cu-sdk-sample-run/scripts/setup_samples.sh --local # Local mode: skip build if already built -.github/skills/cu-sdk-java-sample-run/scripts/setup_samples.sh --local --skip-build +.github/skills/cu-sdk-sample-run/scripts/setup_samples.sh --local --skip-build ``` ### `run_sample.sh` -- Run a Sample with Conveniences @@ -391,16 +410,16 @@ Wraps `mvn exec:java` with sample name resolution, validation, and optional `.en ```bash # Run a sample (resolves class name automatically) -.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh Sample02_AnalyzeUrl +.github/skills/cu-sdk-sample-run/scripts/run_sample.sh Sample02_AnalyzeUrl # Load env vars from .env file before running -.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh Sample02_AnalyzeUrl --env .env +.github/skills/cu-sdk-sample-run/scripts/run_sample.sh Sample02_AnalyzeUrl --env .env # List available samples -.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh --list +.github/skills/cu-sdk-sample-run/scripts/run_sample.sh --list # Dry run (show what would be executed) -.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh Sample02_AnalyzeUrl --dry-run +.github/skills/cu-sdk-sample-run/scripts/run_sample.sh Sample02_AnalyzeUrl --dry-run ``` ## Troubleshooting @@ -414,11 +433,11 @@ Wraps `mvn exec:java` with sample name resolution, validation, and optional `.en | `Model deployment not found` | Run `Sample00_UpdateDefaults` first to configure model mappings | | `FileNotFoundException` for binary samples | Run samples from the package root directory 
(`sdk/contentunderstanding/azure-ai-contentunderstanding`) | | `Parent POM not resolved` | Run `mvn install -DskipTests -f ../../parents/azure-client-sdk-parent/pom.xml` first | -| `Permission denied` when running scripts | Make scripts executable: `chmod +x .github/skills/cu-sdk-java-sample-run/scripts/*.sh` | +| `Permission denied` when running scripts | Make scripts executable: `chmod +x .github/skills/cu-sdk-sample-run/scripts/*.sh` | ## Related Skills -- `cu-sdk-java-setup-env` — Interactive .env file setup (configure endpoint, auth, and model deployments before running samples) +- `cu-sdk-setup` — Interactive .env file setup (configure endpoint, auth, and model deployments before running samples) - `cu-sdk-common-knowledge` — Domain knowledge for Content Understanding concepts ## Additional Resources diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh similarity index 99% rename from sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh rename to sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh index 7a2cec318099..0c0438b898b2 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh @@ -15,7 +15,7 @@ set -euo pipefail # run_sample.sh --list SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" -# Package root is 4 levels up from scripts: .github/skills/cu-sdk-java-sample-run/scripts -> package root +# Package root is 4 levels up from scripts: .github/skills/cu-sdk-sample-run/scripts -> package root PACKAGE_ROOT="$(cd "$SCRIPT_DIR/../../../.." 
&& pwd)" SAMPLES_DIR="$PACKAGE_ROOT/src/samples/java/com/azure/ai/contentunderstanding/samples" PACKAGE="com.azure.ai.contentunderstanding.samples" diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/scripts/setup_samples.sh b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/setup_samples.sh similarity index 98% rename from sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/scripts/setup_samples.sh rename to sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/setup_samples.sh index 3356204c2087..299059734c07 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run/scripts/setup_samples.sh +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/setup_samples.sh @@ -223,7 +223,7 @@ echo " $ENV_FILE" echo "" echo " 2. 
Run a sample:" echo " cd $PACKAGE_ROOT" -echo " .github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh Sample02_AnalyzeUrl --env .env" +echo " .github/skills/cu-sdk-sample-run/scripts/run_sample.sh Sample02_AnalyzeUrl --env .env" echo "" echo " Or export variables manually and use Maven directly:" echo " export CONTENTUNDERSTANDING_ENDPOINT=\"https://...\"" diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-setup-env/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md similarity index 96% rename from sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-setup-env/SKILL.md rename to sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md index fab367c0660a..748c4da0947f 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-setup-env/SKILL.md +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md @@ -1,5 +1,5 @@ --- -name: cu-sdk-java-setup-env +name: cu-sdk-setup description: Help users create and configure their .env file for the Azure AI Content Understanding Java SDK. Use when users need to set up credentials, endpoint, and model deployment configuration. 
--- @@ -26,7 +26,7 @@ set -a && source .env && set +a Or used with the optional helper script: ```bash -.github/skills/cu-sdk-java-sample-run/scripts/run_sample.sh --env .env +.github/skills/cu-sdk-sample-run/scripts/run_sample.sh --env .env ``` ## Workflow @@ -97,7 +97,7 @@ Write the `.env` file to the package root directory (`sdk/contentunderstanding/a ``` # Azure AI Content Understanding - Environment Variables -# Generated by cu-sdk-java-setup-env skill +# Generated by cu-sdk-setup skill # Required: Your Microsoft Foundry resource endpoint CONTENTUNDERSTANDING_ENDPOINT=https://.services.ai.azure.com/ @@ -115,7 +115,7 @@ TEXT_EMBEDDING_3_LARGE_DEPLOYMENT=text-embedding-3-large ``` # Azure AI Content Understanding - Environment Variables -# Generated by cu-sdk-java-setup-env skill +# Generated by cu-sdk-setup skill # Required: Your Microsoft Foundry resource endpoint CONTENTUNDERSTANDING_ENDPOINT=https://.services.ai.azure.com/ @@ -170,7 +170,7 @@ After the `.env` is configured, direct the user to: mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample00_UpdateDefaults" ``` -2. **Run a sample** — see the `cu-sdk-java-sample-run` skill: +2. 
**Run a sample** — see the `cu-sdk-sample-run` skill: ```bash mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample02_AnalyzeUrl" ``` @@ -209,5 +209,5 @@ After the `.env` is configured, direct the user to: ## Related Skills -- `cu-sdk-java-sample-run` — Run SDK samples (uses the env vars configured here) +- `cu-sdk-sample-run` — Run SDK samples (uses the env vars configured here) - `cu-sdk-common-knowledge` — Domain knowledge for Content Understanding concepts diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/README.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/README.md index b1259fb21500..053b85a8af8c 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/README.md +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/README.md @@ -451,8 +451,8 @@ This package includes [GitHub Copilot][github_copilot] skills under `.github/ski | Skill | Description | How to Use | |-------|-------------|------------| -| [**cu-sdk-java-setup-env**][cu_sdk_java_setup_env_skill] | Interactive environment setup — creates and configures your `.env` file with endpoint, credentials, and model deployment settings | In VS Code Copilot Chat, ask: *"Set up my Java environment for Content Understanding"* or reference the skill directly | -| [**cu-sdk-java-sample-run**][cu_sdk_java_sample_run_skill] | Guided sample runner — helps you build the SDK, configure credentials, and run specific samples with Maven | Ask: *"Run Sample02_AnalyzeUrl"* or *"Run the invoice analysis sample"* | +| [**cu-sdk-setup**][cu_sdk_setup_skill] | Interactive environment setup — creates and configures your `.env` file with endpoint, credentials, and model deployment settings | In VS Code Copilot Chat, ask: *"Set up my Java environment for Content Understanding"* or reference the skill directly | +| [**cu-sdk-sample-run**][cu_sdk_sample_run_skill] | Guided sample runner — helps you build the SDK, configure credentials, and run specific 
samples with Maven | Ask: *"Run Sample02_AnalyzeUrl"* or *"Run the invoice analysis sample"* | | [**cu-sdk-common-knowledge**][cu_sdk_common_knowledge_skill] | Domain knowledge reference — answers questions about Content Understanding concepts, analyzers, field schemas, API operations, and Java SDK usage | Ask: *"What prebuilt analyzers are available?"* or *"How do I create a custom analyzer?"* | ### Using Skills in VS Code @@ -463,15 +463,15 @@ This package includes [GitHub Copilot][github_copilot] skills under `.github/ski 4. Ask a question related to Content Understanding; Copilot can use the relevant skill when appropriate **Example prompts:** -- *"Set up my Content Understanding environment"* → likely uses `cu-sdk-java-setup-env` -- *"Run Sample03_AnalyzeInvoice"* → likely uses `cu-sdk-java-sample-run` +- *"Set up my Content Understanding environment"* → likely uses `cu-sdk-setup` +- *"Run Sample03_AnalyzeInvoice"* → likely uses `cu-sdk-sample-run` - *"Explain how custom analyzers work"* → likely uses `cu-sdk-common-knowledge` ### Troubleshooting Skill Selection If Copilot does not use the expected skill, try the following: -1. Be explicit about intent and context in one prompt (for example: *"Use cu-sdk-java-sample-run to run Sample01_AnalyzeBinary"*). +1. Be explicit about intent and context in one prompt (for example: *"Use cu-sdk-sample-run to run Sample01_AnalyzeBinary"*). 2. Include your goal and current state (for example: *"My .env is configured; help me run Sample02_AnalyzeUrl"*). 3. Ask for a step-by-step interactive flow when needed (for example: *"Guide me step by step to set up my environment"*). 4. For build or runtime errors, mention the exact error text so Copilot can apply the right troubleshooting path. 
@@ -522,6 +522,6 @@ This project has adopted the [Microsoft Open Source Code of Conduct][code_of_con [code_of_conduct_faq]: https://opensource.microsoft.com/codeofconduct/faq/ [opencode_email]: mailto:opencode@microsoft.com [github_copilot]: https://github.com/features/copilot -[cu_sdk_java_setup_env_skill]: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-setup-env -[cu_sdk_java_sample_run_skill]: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-java-sample-run +[cu_sdk_setup_skill]: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup +[cu_sdk_sample_run_skill]: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run [cu_sdk_common_knowledge_skill]: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-common-knowledge From 8146dd88977b89e1072402dbdb7a1109b82b6620 Mon Sep 17 00:00:00 2001 From: aluneth Date: Thu, 23 Apr 2026 22:26:33 +0800 Subject: [PATCH 05/19] Update SKILL.md to enhance user guidance for Java SDK setup and configuration --- .../.github/skills/cu-sdk-setup/SKILL.md | 265 ++++++++++++++---- 1 file changed, 213 insertions(+), 52 deletions(-) diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md index 748c4da0947f..b8acfcc73549 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md @@ -1,14 +1,33 @@ --- name: cu-sdk-setup -description: Help users create 
and configure their .env file for the Azure AI Content Understanding Java SDK. Use when users need to set up credentials, endpoint, and model deployment configuration. +description: Guide SDK users through setting up their Java environment for Azure AI Content Understanding. Use this skill when users need help installing the SDK, configuring Azure resources, deploying required models, setting environment variables, or running samples. --- -# Set Up Environment Variables +# SDK User Environment Setup for Azure AI Content Understanding (Java) -Interactively help users create and populate a `.env` file with the required configuration for running Azure AI Content Understanding Java SDK samples. +Set up your Java environment to use the Azure AI Content Understanding SDK and run samples. + +> **Note:** This skill is for SDK users who want to run samples and use the SDK. For SDK development (regenerating code, running tests, pushing recordings), see the sibling `sdkinternal-java-*` skills. > **[COPILOT INTERACTION MODEL]:** This skill is designed to be interactive. At each step marked with **[ASK USER]**, pause execution and prompt the user for input or confirmation before proceeding. Do NOT silently skip these prompts. Use the `ask_questions` tool when available. +## Prerequisites + +Before starting, ensure you have: + +- **JDK 8 or later** installed (JDK 11+ recommended; JDK 17/21 LTS also supported) +- **Apache Maven 3.6+** installed +- An **Azure subscription** ([create one for free](https://azure.microsoft.com/free/)) +- A **Microsoft Foundry resource** in a [supported region](https://learn.microsoft.com/azure/ai-services/content-understanding/language-region-support) +- **Azure CLI** installed (recommended for `DefaultAzureCredential` auth via `az login`) + +> **[ASK USER] Prerequisites Check:** +> Before proceeding, ask the user to confirm their prerequisites: +> 1. "Do you have **JDK 8+** installed? 
(`java -version`)" — If no, guide them to install a JDK first (e.g., Microsoft Build of OpenJDK, Temurin). +> 2. "Do you have **Maven 3.6+** installed? (`mvn -version`)" — If no, install Maven. +> 3. "Do you already have a **Microsoft Foundry resource** set up in Azure?" — If no, jump to **Step 5** (Azure Resource Setup) first, then return here. +> 4. "Have you already deployed the required **AI models** (GPT-4.1, GPT-4.1-mini, text-embedding-3-large) in Microsoft Foundry?" — If no, include Step 5.3 and Step 6 in the workflow. + ## Package Directory ``` @@ -17,7 +36,7 @@ sdk/contentunderstanding/azure-ai-contentunderstanding ## How Java Samples Use Environment Variables -Java samples read configuration via `System.getenv()`. The variables must be `export`ed in the shell before running `mvn exec:java`. The `.env` file created by this skill can be sourced into your shell with: +Java samples read configuration via `System.getenv()`. The variables must be exported in the shell before running `mvn exec:java`. The `.env` file created by this skill can be sourced into your shell with: ```bash set -a && source .env && set +a @@ -37,48 +56,102 @@ Or used with the optional helper script: cd sdk/contentunderstanding/azure-ai-contentunderstanding ``` -### Step 2: Check for Existing .env +### Step 2: Verify Toolchain + +> **[ASK USER] Platform:** +> Ask the user: "Which **platform** are you on?" with options: +> - Linux/macOS +> - Windows PowerShell +> - Windows Command Prompt +> +> Use their answer to show the correct commands throughout the rest of the setup. + +Verify your toolchain: + +```bash +java -version # Should print 1.8.x or higher (11+, 17, 21 also fine) +mvn -version # Should print 3.6.x or higher +``` + +> **[ASK USER] Confirm toolchain:** +> Ask: "Did both commands print valid versions? If either is missing, install it before continuing." 
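The toolchain check above can also be scripted. Below is a minimal sketch of one way to compare the version reported by `mvn -version` against the 3.6 minimum using `sort -V`; the `version_ge` helper and the hard-coded `mvn_output` string are illustrative assumptions, not part of this skill's scripts:

```bash
#!/usr/bin/env bash
# Compare two dotted version strings; succeeds (exit 0) if $1 >= $2.
# `sort -V` orders version strings, so the smaller one sorts first.
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Illustrative: in practice this line would be `mvn -version` output.
mvn_output="Apache Maven 3.9.6 (bc0240f3c744dd6b6ec2920b3cd08dcc295161ae)"
mvn_version="$(printf '%s\n' "$mvn_output" | awk '/Apache Maven/ {print $3}')"

if version_ge "$mvn_version" "3.6.0"; then
    echo "Maven $mvn_version OK"
else
    echo "Maven $mvn_version is too old (need 3.6+)"
fi
```

The same pattern works for the JDK check, though note that `java -version` prints to stderr rather than stdout.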
+ +### Step 3: Install SDK Dependencies + +> **[ASK USER] Installation mode:** +> Ask the user: "How would you like to install the SDK?" +> - **Option A: Use the published artifact (recommended)** — Maven will download `com.azure:azure-ai-contentunderstanding` from Maven Central. Best for running samples as-is. +> - **Option B: Local build (for development)** — Installs the current source tree into your local Maven repo so changes are picked up immediately. + +**Option A: Download dependencies only:** +```bash +mvn dependency:resolve +``` + +**Option B: Local install from source:** +```bash +mvn install -DskipTests +``` + +This compiles the SDK and all sample sources under `src/samples/java`. + +> **[ASK USER] Installation check:** +> After running the command, ask: "Did the build complete with `BUILD SUCCESS`?" If the user reports errors (e.g., dependency resolution failures, JDK version mismatches), help troubleshoot before continuing. + +### Step 4: Configure Environment Variables + +#### 4.1 Check for Existing .env > **[ASK USER] Existing .env check:** > Check if `.env` already exists in the package directory. > - If it exists: Ask "You already have a `.env` file. Would you like to **update** it or **start fresh**?" > - Update: Read the current file and ask which values to change. -> - Start fresh: Overwrite with new values. -> - If it doesn't exist: Proceed to Step 3. +> - Start fresh: Overwrite with new values (confirm destructive action first). +> - If it doesn't exist: Proceed to 4.2. -### Step 3: Gather Required Configuration +**Linux/macOS:** +```bash +if [ -f ".env" ]; then + echo "NOTE: .env file already exists" +else + echo "No .env file found — will create one" +fi +``` -#### 3a. 
Endpoint +**Windows PowerShell:** +```powershell +if (Test-Path ".env") { + Write-Host "NOTE: .env file already exists" +} else { + Write-Host "No .env file found — will create one" +} +``` + +#### 4.2 Gather Required Configuration > **[ASK USER] Endpoint:** > Ask: "Please provide your **Microsoft Foundry endpoint URL**." > - It should look like: `https://.services.ai.azure.com/` -> - If the user doesn't know: Direct them to **Azure Portal → their Foundry resource → Keys and Endpoint**. -> - Validate it starts with `https://` and ends with `.services.ai.azure.com/`. - -#### 3b. Authentication Method +> - Validate: it should NOT include `api-version` or other query parameters. +> - If the user doesn't know where to find it: direct them to Azure Portal → Their Foundry resource → Keys and Endpoint. -> **[ASK USER] Authentication:** +> **[ASK USER] Authentication method:** > Ask: "How would you like to **authenticate** with Azure?" -> - **DefaultAzureCredential (recommended)** — No API key needed. Uses `az login`, managed identity, or other Azure credential chain. Make sure you've run `az login`. -> - **API Key** — Provide your key from Azure Portal → Foundry resource → Keys and Endpoint → Key1 or Key2. +> - **Option A: DefaultAzureCredential (recommended)** — Uses `az login`, managed identity, or other Azure credential chain. No API key needed. +> - **Option B: API Key** — You'll need your `CONTENTUNDERSTANDING_KEY` from the Azure Portal. > -> If DefaultAzureCredential: Leave `CONTENTUNDERSTANDING_KEY` empty. -> If API Key: Ask the user to provide the key value. +> If Option A: Remind the user to run `az login` before invoking samples. Leave `CONTENTUNDERSTANDING_KEY` empty. +> If Option B: Ask for the key value (retrievable at Azure Portal → Foundry resource → Keys and Endpoint → Key1 or Key2). -#### 3c. Model Deployment Names - -> **[ASK USER] Model deployments:** +> **[ASK USER] Model deployment names:** > Ask: "What are your **model deployment names**? 
Press Enter to accept each default." > - GPT-4.1 deployment name (default: `gpt-4.1`) → `GPT_4_1_DEPLOYMENT` > - GPT-4.1-mini deployment name (default: `gpt-4.1-mini`) → `GPT_4_1_MINI_DEPLOYMENT` > - text-embedding-3-large deployment name (default: `text-embedding-3-large`) → `TEXT_EMBEDDING_3_LARGE_DEPLOYMENT` > -> If the user's deployment names match the model names (which is common), they can accept all defaults. - -#### 3d. Cross-Resource Copy (Optional) +> These are required by `Sample00_UpdateDefaults` (one-time mapping setup). -> **[ASK USER] Cross-resource copy:** +> **[ASK USER] Cross-resource copy (optional):** > Ask: "Do you plan to use **cross-resource analyzer copying** (`Sample15_GrantCopyAuth`)?" > - If no: Skip this section. > - If yes: Gather the following additional values: @@ -89,13 +162,29 @@ cd sdk/contentunderstanding/azure-ai-contentunderstanding > 5. Target resource ID (ARM resource ID) > 6. Target region (e.g., `swedencentral`) -### Step 4: Write the .env File +#### 4.3 Validate Configuration + +> **[ASK USER] Validate configuration:** +> After the user has provided all values, summarize the configuration and ask them to confirm: +> ``` +> Here's your configuration: +> CONTENTUNDERSTANDING_ENDPOINT = +> Authentication: DefaultAzureCredential / API Key (masked) +> GPT_4_1_DEPLOYMENT = +> GPT_4_1_MINI_DEPLOYMENT = +> TEXT_EMBEDDING_3_LARGE_DEPLOYMENT = +> +> Does this look correct? (Yes / No — let me fix something) +> ``` +> Only write to `.env` after the user confirms. + +#### 4.4 Write the .env File Write the `.env` file to the package root directory (`sdk/contentunderstanding/azure-ai-contentunderstanding/.env`). 
**Template (basic):** -``` +```bash # Azure AI Content Understanding - Environment Variables # Generated by cu-sdk-setup skill @@ -113,7 +202,7 @@ TEXT_EMBEDDING_3_LARGE_DEPLOYMENT=text-embedding-3-large **Template (with cross-resource copy):** -``` +```bash # Azure AI Content Understanding - Environment Variables # Generated by cu-sdk-setup skill @@ -137,17 +226,50 @@ CONTENTUNDERSTANDING_TARGET_RESOURCE_ID=/subscriptions/{subscriptionId}/resource CONTENTUNDERSTANDING_TARGET_REGION=swedencentral ``` -### Step 5: Confirm and Verify +### Step 5: Azure Resource Setup (if not done) + +> **[NOTE]:** Only guide the user through this step if they indicated during the prerequisites check that they do NOT yet have a Microsoft Foundry resource. Otherwise, skip to Step 6. + +#### 5.1 Create Microsoft Foundry Resource + +1. Go to [Azure Portal](https://portal.azure.com/) +2. Create a **Microsoft Foundry resource** in a [supported region](https://learn.microsoft.com/azure/ai-services/content-understanding/language-region-support) +3. Navigate to **Resource Management** → **Keys and Endpoint** +4. Copy the **Endpoint** URL and optionally a **Key** + +> **[ASK USER] Resource created:** +> After guiding the user to create the resource, ask: "Have you created the Microsoft Foundry resource? Please share the **endpoint URL** so we can continue with configuration." -> **[ASK USER] Confirm .env:** -> Display the written `.env` file contents (masking any API key values to show only the first 4 characters + `***`). -> Ask: "Does this configuration look correct?" -> - If yes: Proceed to the next steps. -> - If no: Ask which value needs to be changed and update the file. +#### 5.2 Grant Cognitive Services User Role -### Step 6: Load into Shell +This role is required even if you own the resource: -Show the user how to load the `.env` into their current shell: +1. In your Foundry resource, go to **Access Control (IAM)** +2. Click **Add** → **Add role assignment** +3. 
Select **Cognitive Services User** role +4. Assign it to yourself + +> **[ASK USER] Role assigned:** +> Ask: "Have you assigned the **Cognitive Services User** role to yourself? This is required even if you own the resource." + +#### 5.3 Deploy Required Models + +| Analyzer Type | Required Models | +|--------------|-----------------| +| `prebuilt-documentSearch`, `prebuilt-imageSearch`, `prebuilt-audioSearch`, `prebuilt-videoSearch` | gpt-4.1-mini, text-embedding-3-large | +| Other prebuilt analyzers (invoice, receipt, etc.) | gpt-4.1, text-embedding-3-large | + +**To deploy a model:** +1. In Microsoft Foundry → **Deployments** → **Deploy model** → **Deploy base model** +2. Search and deploy: `gpt-4.1`, `gpt-4.1-mini`, `text-embedding-3-large` +3. Note deployment names (recommendation: use the model name as the deployment name) + +> **[ASK USER] Models deployed:** +> Ask: "Have you deployed the required models? Please provide the **deployment names** you used for each (GPT-4.1, GPT-4.1-mini, text-embedding-3-large)." Use these names to populate the `.env` file. + +### Step 6: Load .env and Configure Model Defaults (One-Time Setup) + +#### 6.1 Load .env into the current shell ```bash set -a && source .env && set +a @@ -160,20 +282,49 @@ set -a && source .env && set +a > ``` > Ask: "Does the endpoint value look correct?" -### Next Steps +#### 6.2 Run Sample00_UpdateDefaults + +> **[ASK USER] Run model defaults?:** +> Ask: "Would you like to run `Sample00_UpdateDefaults` now to configure model defaults? This is a **one-time setup** per Microsoft Foundry resource. (Yes / Skip for now)" +> - If yes, ensure deployment name env vars are set, then run the sample. +> - If no, let them know they'll need to run it before using prebuilt analyzers. -After the `.env` is configured, direct the user to: +```bash +mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample00_UpdateDefaults" +``` -1. 
**Run `Sample00_UpdateDefaults`** (if not done already) to configure model deployment mappings: - ```bash - set -a && source .env && set +a - mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample00_UpdateDefaults" - ``` +This is a **one-time setup per Microsoft Foundry resource**. -2. **Run a sample** — see the `cu-sdk-sample-run` skill: - ```bash - mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample02_AnalyzeUrl" - ``` +### Step 7: Run Samples + +> **[ASK USER] Which samples?:** +> Ask: "Which sample would you like to run first?" with options: +> - `Sample02_AnalyzeUrl` — Analyze content from a URL (recommended start) +> - `Sample01_AnalyzeBinary` — Analyze a local file +> - `Sample03_AnalyzeInvoice` — Extract invoice fields +> - Other — Let me see the full list +> - Skip — I'll run samples on my own later +> +> If the user picks "Other", list available samples from `src/samples/java/com/azure/ai/contentunderstanding/samples/`. + +**Sync sample:** +```bash +set -a && source .env && set +a +mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample02_AnalyzeUrl" +``` + +**Async sample (same package, `*Async` suffix):** +```bash +mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample02_AnalyzeUrlAsync" +``` + +For a more fluent experience, use the sample-run helper skill: +```bash +.github/skills/cu-sdk-sample-run/scripts/run_sample.sh Sample02_AnalyzeUrl --env .env +``` + +> **[ASK USER] Sample result:** +> After running a sample, ask: "Did the sample run successfully? Would you like to run another sample or are you all set?" 
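The `set -a && source .env && set +a` idiom used in the steps above matters because a plain `source` defines the variables only in the current shell without exporting them, so the JVM started by `mvn exec:java` would not see them; `set -a` (allexport) marks every subsequent assignment for export. A self-contained sketch of the difference (the temporary `.env` file and endpoint value are made up for illustration):

```bash
#!/usr/bin/env bash
workdir="$(mktemp -d)"
cat > "$workdir/.env" <<'EOF'
CONTENTUNDERSTANDING_ENDPOINT=https://example.services.ai.azure.com/
EOF

# Plain `source`: the variable is set in this shell but NOT exported,
# so a child process (like the JVM) does not see it.
source "$workdir/.env"
before="$(sh -c 'echo "${CONTENTUNDERSTANDING_ENDPOINT:-<unset>}"')"

# `set -a` marks subsequent assignments for export; re-sourcing the
# same file now makes the variable visible to child processes.
set -a && source "$workdir/.env" && set +a
after="$(sh -c 'echo "${CONTENTUNDERSTANDING_ENDPOINT:-<unset>}"')"

echo "child before set -a: $before"
echo "child after  set -a: $after"
rm -rf "$workdir"
```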
## Environment Variable Reference @@ -195,19 +346,29 @@ After the `.env` is configured, direct the user to: | Problem | Solution | |---------|----------| -| `CONTENTUNDERSTANDING_ENDPOINT` is null at runtime | Make sure you ran `set -a && source .env && set +a` in the same terminal before `mvn exec:java` | -| `Access denied` / 401 errors | Check API key is correct, or run `az login` if using DefaultAzureCredential | -| `az login` not working | Install Azure CLI: https://learn.microsoft.com/cli/azure/install-azure-cli | -| Changes to `.env` not taking effect | Re-run `set -a && source .env && set +a` — changes are not auto-reloaded | -| `.env` committed to git | Add `.env` to `.gitignore` — never commit credentials | +| `java: command not found` | Install a JDK 8+ (Microsoft Build of OpenJDK or Temurin) and ensure `JAVA_HOME` is set. | +| `mvn: command not found` | Install Maven 3.6+ and add it to `PATH`. | +| `CONTENTUNDERSTANDING_ENDPOINT` is null at runtime | Make sure you ran `set -a && source .env && set +a` in the same terminal before `mvn exec:java`. | +| `Access denied` / 401 errors | Check API key is correct, or run `az login` if using DefaultAzureCredential. Verify the `Cognitive Services User` role is assigned. | +| `Model deployment not found` | Deploy required models in Microsoft Foundry and run `Sample00_UpdateDefaults`. | +| Changes to `.env` not taking effect | Re-run `set -a && source .env && set +a` — changes are not auto-reloaded. | +| `.env` committed to git | Add `.env` to `.gitignore` — never commit credentials. | ## Security Notes - **Never commit `.env` files** to version control. Ensure `.gitignore` includes `.env`. - Prefer **DefaultAzureCredential** over API keys when possible. - If using API keys, rotate them regularly via the Azure Portal. +- When displaying `.env` contents back to the user for confirmation, **mask API keys** (e.g., show only the first 4 characters + `***`). 
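The key-masking note above can be sketched as a small helper; `mask_secret` is a hypothetical name for illustration, not part of this skill's scripts:

```bash
#!/usr/bin/env bash
# Mask a secret before echoing it back to the user: keep only the
# first 4 characters, or reveal nothing if the value is very short.
mask_secret() {
    local value="$1"
    if [ "${#value}" -le 4 ]; then
        printf '***\n'
    else
        printf '%s***\n' "${value:0:4}"
    fi
}

# Illustrative key value only — never echo a real key unmasked.
CONTENTUNDERSTANDING_KEY="abcd1234efgh5678"
echo "CONTENTUNDERSTANDING_KEY = $(mask_secret "$CONTENTUNDERSTANDING_KEY")"
```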
## Related Skills - `cu-sdk-sample-run` — Run SDK samples (uses the env vars configured here) - `cu-sdk-common-knowledge` — Domain knowledge for Content Understanding concepts + +## Additional Resources + +- [SDK README](../../../README.md) - Full documentation +- [Samples README](../../../src/samples/README.md) - Sample descriptions +- [Product Documentation](https://learn.microsoft.com/azure/ai-services/content-understanding/) +- [Prebuilt Analyzers](https://learn.microsoft.com/azure/ai-services/content-understanding/concepts/prebuilt-analyzers) From c44e5da102a4771f809f89736da465432a5cbb80 Mon Sep 17 00:00:00 2001 From: aluneth Date: Thu, 23 Apr 2026 23:12:41 +0800 Subject: [PATCH 06/19] Update SKILL.md to clarify Maven installation commands and add troubleshooting notes for Java SDK setup Co-authored-by: Copilot --- .../.github/skills/cu-sdk-setup/SKILL.md | 23 +++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md index b8acfcc73549..9357d4ee608e 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md @@ -90,11 +90,13 @@ mvn dependency:resolve **Option B: Local install from source:** ```bash -mvn install -DskipTests +mvn install -DskipTests -Djacoco.skip=true ``` This compiles the SDK and all sample sources under `src/samples/java`. +> **Note:** `-Djacoco.skip=true` is required because the default build enforces a minimum test coverage ratio. When `-DskipTests` is set, no coverage data is produced and the jacoco `check` goal would fail the build. Skipping jacoco is safe for environment setup / running samples. 
+ > **[ASK USER] Installation check:** > After running the command, ask: "Did the build complete with `BUILD SUCCESS`?" If the user reports errors (e.g., dependency resolution failures, JDK version mismatches), help troubleshoot before continuing. @@ -290,9 +292,14 @@ set -a && source .env && set +a > - If no, let them know they'll need to run it before using prebuilt analyzers. ```bash -mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample00_UpdateDefaults" +mvn exec:java \ + -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample00_UpdateDefaults" \ + -Dexec.classpathScope=test \ + -Djacoco.skip=true -q ``` +> **Note:** `-Dexec.classpathScope=test` is required because sample sources live under `src/samples/java` and are compiled into the test classpath. Without it you will get `ClassNotFoundException: com.azure.ai.contentunderstanding.samples.Sample00_UpdateDefaults`. + This is a **one-time setup per Microsoft Foundry resource**. ### Step 7: Run Samples @@ -310,12 +317,18 @@ This is a **one-time setup per Microsoft Foundry resource**. **Sync sample:** ```bash set -a && source .env && set +a -mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample02_AnalyzeUrl" +mvn exec:java \ + -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample02_AnalyzeUrl" \ + -Dexec.classpathScope=test \ + -Djacoco.skip=true -q ``` **Async sample (same package, `*Async` suffix):** ```bash -mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample02_AnalyzeUrlAsync" +mvn exec:java \ + -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample02_AnalyzeUrlAsync" \ + -Dexec.classpathScope=test \ + -Djacoco.skip=true -q ``` For a more fluent experience, use the sample-run helper skill: @@ -349,6 +362,8 @@ For a more fluent experience, use the sample-run helper skill: | `java: command not found` | Install a JDK 8+ (Microsoft Build of OpenJDK or Temurin) and ensure `JAVA_HOME` is set. 
| | `mvn: command not found` | Install Maven 3.6+ and add it to `PATH`. | | `CONTENTUNDERSTANDING_ENDPOINT` is null at runtime | Make sure you ran `set -a && source .env && set +a` in the same terminal before `mvn exec:java`. | +| `ClassNotFoundException: ...samples.SampleXX_...` | Add `-Dexec.classpathScope=test` to the `mvn exec:java` command. Samples live under `src/samples/java` (test classpath). | +| `jacoco-maven-plugin:...:check` fails after `mvn install -DskipTests` | Add `-Djacoco.skip=true`. Skipping tests produces no coverage data, which fails the coverage check. | | `Access denied` / 401 errors | Check API key is correct, or run `az login` if using DefaultAzureCredential. Verify the `Cognitive Services User` role is assigned. | | `Model deployment not found` | Deploy required models in Microsoft Foundry and run `Sample00_UpdateDefaults`. | | Changes to `.env` not taking effect | Re-run `set -a && source .env && set +a` — changes are not auto-reloaded. | From e0332ec5b261fff7e4756cb7a85e264525908247 Mon Sep 17 00:00:00 2001 From: Changjian Wang Date: Fri, 24 Apr 2026 10:18:14 +0800 Subject: [PATCH 07/19] Add setup script for Azure AI Content Understanding Java SDK environment Co-authored-by: Copilot --- .../.github/skills/cu-sdk-setup/SKILL.md | 110 +++++++---- .../cu-sdk-setup/scripts/setup_user_env.sh | 173 ++++++++++++++++++ 2 files changed, 252 insertions(+), 31 deletions(-) create mode 100644 sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/scripts/setup_user_env.sh diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md index 9357d4ee608e..002d500872bb 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md @@ -7,8 +7,6 @@ 
description: Guide SDK users through setting up their Java environment for Azure Set up your Java environment to use the Azure AI Content Understanding SDK and run samples. -> **Note:** This skill is for SDK users who want to run samples and use the SDK. For SDK development (regenerating code, running tests, pushing recordings), see the sibling `sdkinternal-java-*` skills. - > **[COPILOT INTERACTION MODEL]:** This skill is designed to be interactive. At each step marked with **[ASK USER]**, pause execution and prompt the user for input or confirmation before proceeding. Do NOT silently skip these prompts. Use the `ask_questions` tool when available. ## Prerequisites @@ -80,8 +78,8 @@ mvn -version # Should print 3.6.x or higher > **[ASK USER] Installation mode:** > Ask the user: "How would you like to install the SDK?" -> - **Option A: Use the published artifact (recommended)** — Maven will download `com.azure:azure-ai-contentunderstanding` from Maven Central. Best for running samples as-is. -> - **Option B: Local build (for development)** — Installs the current source tree into your local Maven repo so changes are picked up immediately. +> - **Option A: Use the published artifact (recommended)** — Maven will download `com.azure:azure-ai-contentunderstanding` from Maven Central. Best for running samples and developing Content Understanding-based solutions using the SDK. +> - **Option B: Local build (for Content Understanding SDK contribution)** — Use this only when you are contributing to the Content Understanding SDK. Installs the current source tree into your local Maven repo so changes are reflected immediately without reinstalling. **Option A: Download dependencies only:** ```bash @@ -139,11 +137,11 @@ if (Test-Path ".env") { > **[ASK USER] Authentication method:** > Ask: "How would you like to **authenticate** with Azure?" -> - **Option A: DefaultAzureCredential (recommended)** — Uses `az login`, managed identity, or other Azure credential chain. 
No API key needed. -> - **Option B: API Key** — You'll need your `CONTENTUNDERSTANDING_KEY` from the Azure Portal. +> - **Option A: API Key** — You'll need your `CONTENTUNDERSTANDING_KEY` from the Azure Portal. +> - **Option B: DefaultAzureCredential (recommended)** — Uses `az login`, managed identity, or other Azure credential chain. No API key needed. > -> If Option A: Remind the user to run `az login` before invoking samples. Leave `CONTENTUNDERSTANDING_KEY` empty. -> If Option B: Ask for the key value (retrievable at Azure Portal → Foundry resource → Keys and Endpoint → Key1 or Key2). +> If Option A: Ask for the key value (retrievable at Azure Portal → Foundry resource → Keys and Endpoint → Key1 or Key2). +> If Option B: Remind the user to run `az login` before invoking samples. Leave `CONTENTUNDERSTANDING_KEY` empty. > **[ASK USER] Model deployment names:** > Ask: "What are your **model deployment names**? Press Enter to accept each default." @@ -157,11 +155,11 @@ if (Test-Path ".env") { > Ask: "Do you plan to use **cross-resource analyzer copying** (`Sample15_GrantCopyAuth`)?" > - If no: Skip this section. > - If yes: Gather the following additional values: -> 1. Source resource ID (ARM resource ID) +> 1. Source Azure Resource Manager (ARM) resource ID — the full `/subscriptions/.../resourceGroups/.../providers/Microsoft.CognitiveServices/accounts/` path (find it at Azure Portal → your Foundry resource → **Overview** → **JSON View** → `id`) > 2. Source region (e.g., `eastus`) > 3. Target endpoint URL > 4. Target API key (or empty for DefaultAzureCredential) -> 5. Target resource ID (ARM resource ID) +> 5. Target ARM resource ID (same format as above, for the target Foundry resource) > 6. Target region (e.g., `swedencentral`) #### 4.3 Validate Configuration @@ -339,21 +337,78 @@ For a more fluent experience, use the sample-run helper skill: > **[ASK USER] Sample result:** > After running a sample, ask: "Did the sample run successfully? 
Would you like to run another sample or are you all set?" +## Automated Setup Script (Linux/macOS) + +Run the interactive setup script that handles Steps 2–4 automatically: + +```bash +# From the package directory +cd sdk/contentunderstanding/azure-ai-contentunderstanding +.github/skills/cu-sdk-setup/scripts/setup_user_env.sh +``` + +The script will: +1. Verify `java` and `mvn` are available +2. Install SDK dependencies (prompt for `mvn dependency:resolve` vs. `mvn install -DskipTests -Djacoco.skip=true`) +3. Create `.env` (without overwriting existing) and interactively prompt for: + - `CONTENTUNDERSTANDING_ENDPOINT` + - Authentication method (DefaultAzureCredential or API key) + - Model deployment names (with sensible defaults) + - Optional cross-resource copy vars (Sample15) +4. Print next-step commands for loading `.env`, running `Sample00_UpdateDefaults`, and running samples. + +> **Note:** The script does **not** load `.env` into your shell for you — you must still run `set -a && source .env && set +a` before invoking `mvn exec:java`, because Java samples read values via `System.getenv()`. + +### Manual Quick Setup + +If you prefer to run steps manually: + +```bash +cd sdk/contentunderstanding/azure-ai-contentunderstanding + +# Verify toolchain +java -version +mvn -version + +# Resolve dependencies (Option A) — or run the Option B `mvn install` command from Step 3. +mvn dependency:resolve + +# Create .env if absent (no env.sample exists — use the template from Step 4.4) +if [ ! 
-f ".env" ]; then
+  cat > .env <<'EOF'
+CONTENTUNDERSTANDING_ENDPOINT=https://<your-resource-name>.services.ai.azure.com/
+CONTENTUNDERSTANDING_KEY=
+GPT_4_1_DEPLOYMENT=gpt-4.1
+GPT_4_1_MINI_DEPLOYMENT=gpt-4.1-mini
+TEXT_EMBEDDING_3_LARGE_DEPLOYMENT=text-embedding-3-large
+EOF
+  echo "Created .env — please edit and configure required variables"
+else
+  echo "WARNING: .env already exists — skipping creation"
+fi
+
+# Load .env into the current shell before running samples
+set -a && source .env && set +a
+```
+
 ## Environment Variable Reference
 
-| Variable | Required By | Description |
-|----------|------------|-------------|
-| `CONTENTUNDERSTANDING_ENDPOINT` | **All samples** | Microsoft Foundry resource endpoint URL |
-| `CONTENTUNDERSTANDING_KEY` | All (optional) | API key. If empty, `DefaultAzureCredential` is used (run `az login` first) |
-| `GPT_4_1_DEPLOYMENT` | Sample00_UpdateDefaults | GPT-4.1 deployment name (default: `gpt-4.1`) |
-| `GPT_4_1_MINI_DEPLOYMENT` | Sample00_UpdateDefaults | GPT-4.1-mini deployment name (default: `gpt-4.1-mini`) |
-| `TEXT_EMBEDDING_3_LARGE_DEPLOYMENT` | Sample00_UpdateDefaults | text-embedding-3-large deployment name (default: `text-embedding-3-large`) |
-| `CONTENTUNDERSTANDING_SOURCE_RESOURCE_ID` | Sample15_GrantCopyAuth | Source ARM resource ID |
-| `CONTENTUNDERSTANDING_SOURCE_REGION` | Sample15_GrantCopyAuth | Source region (e.g., `eastus`) |
-| `CONTENTUNDERSTANDING_TARGET_ENDPOINT` | Sample15_GrantCopyAuth | Target Foundry endpoint for cross-resource copy |
-| `CONTENTUNDERSTANDING_TARGET_KEY` | Sample15_GrantCopyAuth (optional) | Target API key (empty = DefaultAzureCredential) |
-| `CONTENTUNDERSTANDING_TARGET_RESOURCE_ID` | Sample15_GrantCopyAuth | Target ARM resource ID |
-| `CONTENTUNDERSTANDING_TARGET_REGION` | Sample15_GrantCopyAuth | Target region (e.g., `swedencentral`) |
+Required for all samples:
+
+- `CONTENTUNDERSTANDING_ENDPOINT` — Microsoft Foundry resource endpoint URL.
+- `CONTENTUNDERSTANDING_KEY` — API key.
Leave empty to use `DefaultAzureCredential` (run `az login` first). + +Required for `Sample00_UpdateDefaults` (one-time model mapping): + +- `GPT_4_1_DEPLOYMENT` (default: `gpt-4.1`) +- `GPT_4_1_MINI_DEPLOYMENT` (default: `gpt-4.1-mini`) +- `TEXT_EMBEDDING_3_LARGE_DEPLOYMENT` (default: `text-embedding-3-large`) + +Required for `Sample15_GrantCopyAuth` (cross-resource analyzer copy) only: + +- `CONTENTUNDERSTANDING_SOURCE_RESOURCE_ID`, `CONTENTUNDERSTANDING_SOURCE_REGION` +- `CONTENTUNDERSTANDING_TARGET_ENDPOINT`, `CONTENTUNDERSTANDING_TARGET_KEY` (optional) +- `CONTENTUNDERSTANDING_TARGET_RESOURCE_ID`, `CONTENTUNDERSTANDING_TARGET_REGION` ## Troubleshooting @@ -369,16 +424,9 @@ For a more fluent experience, use the sample-run helper skill: | Changes to `.env` not taking effect | Re-run `set -a && source .env && set +a` — changes are not auto-reloaded. | | `.env` committed to git | Add `.env` to `.gitignore` — never commit credentials. | -## Security Notes - -- **Never commit `.env` files** to version control. Ensure `.gitignore` includes `.env`. -- Prefer **DefaultAzureCredential** over API keys when possible. -- If using API keys, rotate them regularly via the Azure Portal. -- When displaying `.env` contents back to the user for confirmation, **mask API keys** (e.g., show only the first 4 characters + `***`). 
- ## Related Skills -- `cu-sdk-sample-run` — Run SDK samples (uses the env vars configured here) +- `cu-sdk-sample-run` — Run individual samples (including `Sample00_UpdateDefaults` for model deployment setup) - `cu-sdk-common-knowledge` — Domain knowledge for Content Understanding concepts ## Additional Resources diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/scripts/setup_user_env.sh b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/scripts/setup_user_env.sh new file mode 100644 index 000000000000..ae13ec0edd36 --- /dev/null +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/scripts/setup_user_env.sh @@ -0,0 +1,173 @@ +#!/bin/bash +# Setup script for Azure AI Content Understanding Java SDK users +# This script sets up the environment for running samples (JDK + Maven based). + +set -e + +# Determine script directory and package root +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +PACKAGE_ROOT="$(cd "$SCRIPT_DIR/../../../.." && pwd)" + +echo "=== Azure AI Content Understanding (Java) - User Environment Setup ===" +echo "Package root: $PACKAGE_ROOT" +echo "" + +cd "$PACKAGE_ROOT" + +# Step 1: Verify toolchain +echo "Step 1: Verifying toolchain..." +if ! command -v java >/dev/null 2>&1; then + echo " ✗ 'java' not found in PATH. Install JDK 8+ (e.g., Microsoft Build of OpenJDK or Temurin)." + exit 1 +fi +if ! command -v mvn >/dev/null 2>&1; then + echo " ✗ 'mvn' not found in PATH. Install Apache Maven 3.6+." + exit 1 +fi +echo " ✓ Java: $(java -version 2>&1 | head -n 1)" +echo " ✓ Maven: $(mvn -version 2>&1 | head -n 1)" +echo "" + +# Step 2: Install SDK dependencies +echo "Step 2: Installing SDK dependencies..." 
+read -p "Installation mode — (A) Download deps only (recommended) | (B) Local build from source [A/b]: " install_mode
+install_mode="${install_mode:-A}"
+if [[ "$install_mode" =~ ^[Bb]$ ]]; then
+  echo "  Running: mvn install -DskipTests -Djacoco.skip=true"
+  mvn install -DskipTests -Djacoco.skip=true -q
+else
+  echo "  Running: mvn dependency:resolve"
+  mvn dependency:resolve -q
+fi
+echo "  ✓ Dependencies ready"
+echo ""
+
+# Step 3: Configure .env file
+echo "Step 3: Configuring .env file..."
+ENV_FILE="$PACKAGE_ROOT/.env"
+if [ -f "$ENV_FILE" ]; then
+  echo "  ⚠ .env file already exists — NOT overwriting."
+  echo "  To start fresh, delete it manually: rm \"$ENV_FILE\""
+  read -p "  Continue with existing .env? (Y/n): " keep_env
+  if [[ "$keep_env" =~ ^[Nn]$ ]]; then
+    echo "  Aborting. Remove .env and re-run this script."
+    exit 1
+  fi
+  CREATE_ENV=false
+else
+  CREATE_ENV=true
+fi
+
+if [ "$CREATE_ENV" = true ]; then
+  read -p "Would you like to configure variables interactively now? (Y/n): " configure_now
+  configure_now="${configure_now:-Y}"
+  if [[ "$configure_now" =~ ^[Yy]$ ]]; then
+    # CONTENTUNDERSTANDING_ENDPOINT
+    read -p "  CONTENTUNDERSTANDING_ENDPOINT (e.g., https://<your-resource-name>.services.ai.azure.com/): " endpoint
+
+    # Auth method
+    echo "  Authentication:"
+    echo "    (A) DefaultAzureCredential via 'az login' (recommended)"
+    echo "    (B) API Key"
+    read -p "  Choose [A/b]: " auth_mode
+    auth_mode="${auth_mode:-A}"
+    api_key=""
+    if [[ "$auth_mode" =~ ^[Bb]$ ]]; then
+      read -p "  CONTENTUNDERSTANDING_KEY: " api_key
+    else
+      echo "  ℹ Using DefaultAzureCredential — remember to run 'az login' before invoking samples."
+    fi
+
+    # Deployment names
+    read -p "  GPT_4_1_DEPLOYMENT (default: gpt-4.1): " gpt41
+    gpt41="${gpt41:-gpt-4.1}"
+    read -p "  GPT_4_1_MINI_DEPLOYMENT (default: gpt-4.1-mini): " gpt41mini
+    gpt41mini="${gpt41mini:-gpt-4.1-mini}"
+    read -p "  TEXT_EMBEDDING_3_LARGE_DEPLOYMENT (default: text-embedding-3-large): " embedding
+    embedding="${embedding:-text-embedding-3-large}"
+
+    # Cross-resource copy
+    read -p "  Configure cross-resource copy vars for Sample15? (y/N): " want_copy
+    src_rid=""; src_region=""; tgt_ep=""; tgt_key=""; tgt_rid=""; tgt_region=""
+    if [[ "$want_copy" =~ ^[Yy]$ ]]; then
+      read -p "    Source resource ID: " src_rid
+      read -p "    Source region (e.g., eastus): " src_region
+      read -p "    Target endpoint: " tgt_ep
+      read -p "    Target API key (blank = DefaultAzureCredential): " tgt_key
+      read -p "    Target resource ID: " tgt_rid
+      read -p "    Target region (e.g., swedencentral): " tgt_region
+    fi
+
+    cat > "$ENV_FILE" <<EOF
+CONTENTUNDERSTANDING_ENDPOINT=${endpoint}
+CONTENTUNDERSTANDING_KEY=${api_key}
+GPT_4_1_DEPLOYMENT=${gpt41}
+GPT_4_1_MINI_DEPLOYMENT=${gpt41mini}
+TEXT_EMBEDDING_3_LARGE_DEPLOYMENT=${embedding}
+EOF
+    if [[ "$want_copy" =~ ^[Yy]$ ]]; then
+      cat >> "$ENV_FILE" <<EOF
+CONTENTUNDERSTANDING_SOURCE_RESOURCE_ID=${src_rid}
+CONTENTUNDERSTANDING_SOURCE_REGION=${src_region}
+CONTENTUNDERSTANDING_TARGET_ENDPOINT=${tgt_ep}
+CONTENTUNDERSTANDING_TARGET_KEY=${tgt_key}
+CONTENTUNDERSTANDING_TARGET_RESOURCE_ID=${tgt_rid}
+CONTENTUNDERSTANDING_TARGET_REGION=${tgt_region}
+EOF
+    fi
+    echo "  ✓ Wrote configured values to $ENV_FILE"
+  else
+    cat > "$ENV_FILE" <<'EOF'
+# Azure AI Content Understanding - Environment Variables
+# Fill in your values below.
+
+# Required: Your Microsoft Foundry resource endpoint
+CONTENTUNDERSTANDING_ENDPOINT=https://<your-resource-name>.services.ai.azure.com/
+
+# Optional: API key (leave empty to use DefaultAzureCredential via az login)
+CONTENTUNDERSTANDING_KEY=
+
+# Model deployment names (used by Sample00_UpdateDefaults)
+GPT_4_1_DEPLOYMENT=gpt-4.1
+GPT_4_1_MINI_DEPLOYMENT=gpt-4.1-mini
+TEXT_EMBEDDING_3_LARGE_DEPLOYMENT=text-embedding-3-large
+EOF
+    echo "  ✓ Wrote template to $ENV_FILE — please edit it before running samples."
+  fi
+fi
+echo ""
+
+# Summary
+echo "=== Setup Complete ==="
+echo ""
+echo "Next steps:"
+echo ""
+echo "  1. Load .env into your current shell (Java reads System.getenv, so this is REQUIRED):"
+echo "     cd $PACKAGE_ROOT"
+echo "     set -a && source .env && set +a"
+echo ""
+echo "  2. 
(One-time per Foundry resource) Configure model defaults:" +echo " mvn exec:java \\" +echo " -Dexec.mainClass=\"com.azure.ai.contentunderstanding.samples.Sample00_UpdateDefaults\" \\" +echo " -Dexec.classpathScope=test -Djacoco.skip=true -q" +echo "" +echo " 3. Run a sample:" +echo " mvn exec:java \\" +echo " -Dexec.mainClass=\"com.azure.ai.contentunderstanding.samples.Sample02_AnalyzeUrl\" \\" +echo " -Dexec.classpathScope=test -Djacoco.skip=true -q" +echo "" +echo " Or use the sample-run helper:" +echo " .github/skills/cu-sdk-sample-run/scripts/run_sample.sh Sample02_AnalyzeUrl --env .env" +echo "" From 2aa38c241a494d892e4f1dff424f23b7a9d16e1e Mon Sep 17 00:00:00 2001 From: aluneth Date: Sat, 25 Apr 2026 18:16:50 +0800 Subject: [PATCH 08/19] Enhance setup_user_env.sh script for Azure AI Content Understanding SDK - Added a helper function to offer installation of JDK and Maven via package managers. - Improved prerequisite checks for JDK and Maven, including automatic installation prompts. - Refactored dependency installation steps to avoid redundant prompts and added a marker for dependency resolution. - Enhanced .env file configuration with improved user prompts and automatic detection of existing model defaults. - Introduced a PowerShell loader script for easier environment variable loading on Windows. - Updated comments and code structure for better readability and maintainability. 
Co-authored-by: Copilot --- .../.github/skills/cu-sdk-setup/SKILL.md | 162 ++++-- .../cu-sdk-setup/scripts/setup_user_env.ps1 | 509 ++++++++++++++++++ .../cu-sdk-setup/scripts/setup_user_env.sh | 414 ++++++++++++-- 3 files changed, 995 insertions(+), 90 deletions(-) create mode 100644 sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/scripts/setup_user_env.ps1 diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md index 002d500872bb..52be6f6748e4 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md @@ -19,12 +19,42 @@ Before starting, ensure you have: - A **Microsoft Foundry resource** in a [supported region](https://learn.microsoft.com/azure/ai-services/content-understanding/language-region-support) - **Azure CLI** installed (recommended for `DefaultAzureCredential` auth via `az login`) +> **[COPILOT] Probe JDK/Maven runtime first (before asking):** +> Do not take the user's word for it — run these checks, then report. This prevents silent failures later during `mvn` operations. +> +> ```bash +> java -version 2>&1 | head -1 +> mvn -version 2>&1 | head -1 +> ``` +> +> **Decision table:** +> +> | Finding | Action | +> |---|---| +> | JDK 8+ and Maven 3.6+ both present | ✓ Good to go. Proceed to the `[ASK USER]` block below. | +> | `java` missing | Report the finding, then go to the **[ASK USER] JDK/Maven install choice** block below. | +> | JDK version < 8 | Report the finding, then go to the **[ASK USER] JDK/Maven install choice** block below. | +> | `mvn` missing | Report the finding, then go to the **[ASK USER] JDK/Maven install choice** block below. 
| +> | Maven version < 3.6 | Report the finding, then go to the **[ASK USER] JDK/Maven install choice** block below. | +> +> **[ASK USER] JDK/Maven install choice (only when probe fails):** +> Ask the user: "JDK or Maven is missing / too old. How would you like to proceed?" +> - **Option A: Install it for me** — Agent runs the platform-appropriate install command (see below), verifies, and continues. +> - **Option B: I'll install it myself** — Agent prints the install command for the user's platform and stops. User runs it, re-opens the terminal, and tells the agent to resume. +> +> **Default install commands (Option A):** +> - **macOS** → `brew install openjdk@21 maven` (requires Homebrew; if not installed, fall back to Option B) +> - **Debian / Ubuntu / WSL** → `sudo apt update && sudo apt install -y openjdk-21-jdk maven` +> - **Windows** → `winget install Microsoft.OpenJDK.21` and `winget install Apache.Maven` +> +> **Before running Option A, confirm with the user one more time** by restating the exact command that will execute, then proceed. After install, re-run the probe to verify JDK 8+ and Maven 3.6+ before continuing. +> +> Report the detected versions back to the user in one sentence before the `[ASK USER]` block below. + > **[ASK USER] Prerequisites Check:** -> Before proceeding, ask the user to confirm their prerequisites: -> 1. "Do you have **JDK 8+** installed? (`java -version`)" — If no, guide them to install a JDK first (e.g., Microsoft Build of OpenJDK, Temurin). -> 2. "Do you have **Maven 3.6+** installed? (`mvn -version`)" — If no, install Maven. -> 3. "Do you already have a **Microsoft Foundry resource** set up in Azure?" — If no, jump to **Step 5** (Azure Resource Setup) first, then return here. -> 4. "Have you already deployed the required **AI models** (GPT-4.1, GPT-4.1-mini, text-embedding-3-large) in Microsoft Foundry?" — If no, include Step 5.3 and Step 6 in the workflow. +> After the probe above, confirm the remaining items: +> 1. 
"Do you already have a **Microsoft Foundry resource** set up in Azure?" — If no, jump to **Step 5** (Azure Resource Setup) first, then return here. +> 2. "Have you already deployed the required **AI models** (GPT-4.1, GPT-4.1-mini, text-embedding-3-large) in Microsoft Foundry?" — If no, include Step 5.3 and Step 6 in the workflow. ## Package Directory @@ -34,13 +64,23 @@ sdk/contentunderstanding/azure-ai-contentunderstanding ## How Java Samples Use Environment Variables -Java samples read configuration via `System.getenv()`. The variables must be exported in the shell before running `mvn exec:java`. The `.env` file created by this skill can be sourced into your shell with: +Java samples read configuration via `System.getenv()`. The variables must be exported in the shell before running `mvn exec:java`. + +**Linux / macOS (bash / zsh):** ```bash set -a && source .env && set +a ``` -Or used with the optional helper script: +**Windows PowerShell (or PowerShell on macOS / Linux):** + +```powershell +. ./load-env.ps1 +``` + +The `load-env.ps1` helper is generated next to `.env` by the setup scripts. It strips matching surrounding single/double quotes (which the setup scripts add to make values bash-safe) before exporting, so values reach the JVM unquoted. + +Alternatively, use the optional sample-run helper: ```bash .github/skills/cu-sdk-sample-run/scripts/run_sample.sh --env .env @@ -54,7 +94,7 @@ Or used with the optional helper script: cd sdk/contentunderstanding/azure-ai-contentunderstanding ``` -### Step 2: Verify Toolchain +### Step 2: Pick Platform > **[ASK USER] Platform:** > Ask the user: "Which **platform** are you on?" with options: @@ -64,15 +104,13 @@ cd sdk/contentunderstanding/azure-ai-contentunderstanding > > Use their answer to show the correct commands throughout the rest of the setup. 
-Verify your toolchain: - -```bash -java -version # Should print 1.8.x or higher (11+, 17, 21 also fine) -mvn -version # Should print 3.6.x or higher -``` - -> **[ASK USER] Confirm toolchain:** -> Ask: "Did both commands print valid versions? If either is missing, install it before continuing." +> **[COPILOT] Toolchain already verified.** +> The JDK / Maven probe in the **Prerequisites** section above is the source of truth — do not re-ask the user to confirm `java -version` / `mvn -version` here. Reference command (only print if the user explicitly wants to recheck): +> +> ```bash +> java -version # JDK 8+ (11/17/21 LTS recommended) +> mvn -version # Maven 3.6+ +> ``` ### Step 3: Install SDK Dependencies @@ -98,6 +136,9 @@ This compiles the SDK and all sample sources under `src/samples/java`. > **[ASK USER] Installation check:** > After running the command, ask: "Did the build complete with `BUILD SUCCESS`?" If the user reports errors (e.g., dependency resolution failures, JDK version mismatches), help troubleshoot before continuing. +> **[COPILOT] Repeated-run behavior:** +> On repeated runs, if Maven reports that all dependencies are already downloaded (i.e., `mvn dependency:resolve` completes instantly with no downloads), the setup scripts may skip the dependency resolution step. Only rerun when dependencies are missing, the POM has changed, or the user is experiencing classpath issues. + ### Step 4: Configure Environment Variables #### 4.1 Check for Existing .env @@ -143,13 +184,56 @@ if (Test-Path ".env") { > If Option A: Ask for the key value (retrievable at Azure Portal → Foundry resource → Keys and Endpoint → Key1 or Key2). > If Option B: Remind the user to run `az login` before invoking samples. Leave `CONTENTUNDERSTANDING_KEY` empty. -> **[ASK USER] Model deployment names:** -> Ask: "What are your **model deployment names**? Press Enter to accept each default." 
-> - GPT-4.1 deployment name (default: `gpt-4.1`) → `GPT_4_1_DEPLOYMENT`
-> - GPT-4.1-mini deployment name (default: `gpt-4.1-mini`) → `GPT_4_1_MINI_DEPLOYMENT`
-> - text-embedding-3-large deployment name (default: `text-embedding-3-large`) → `TEXT_EMBEDDING_3_LARGE_DEPLOYMENT`
+> **[COPILOT] Probe existing model defaults on the Foundry resource:**
+> Before asking the user for deployment names, probe what the resource already has configured. Use `curl` with the endpoint and credentials gathered above.
+>
+> ```bash
+> probe_endpoint="${CONTENTUNDERSTANDING_ENDPOINT%/}"
+> http_code=""
+> body=""
+> if [ -n "$CONTENTUNDERSTANDING_KEY" ]; then
+>   probe_response=$(curl -s -w "\n%{http_code}" \
+>     -H "Ocp-Apim-Subscription-Key: $CONTENTUNDERSTANDING_KEY" \
+>     "$probe_endpoint/contentunderstanding/defaults?api-version=2025-11-01")
+> else
+>   token=$(az account get-access-token --resource https://cognitiveservices.azure.com --query accessToken -o tsv 2>/dev/null)
+>   if [ -z "$token" ]; then
+>     # Short-circuit: skip the curl call and go straight to the AUTH_ERROR branch below.
+>     http_code="401"
+>     body=""
+>   else
+>     probe_response=$(curl -s -w "\n%{http_code}" \
+>       -H "Authorization: Bearer $token" \
+>       "$probe_endpoint/contentunderstanding/defaults?api-version=2025-11-01")
+>   fi
+> fi
+> if [ -z "$http_code" ]; then
+>   http_code=$(echo "$probe_response" | tail -1)
+>   body=$(echo "$probe_response" | sed '$d')
+> fi
+> ```
+>
+> Branch on the HTTP status and response body:
+>
+> | HTTP code | Meaning | Action |
+> |-----------|---------|--------|
+> | `200` + all 3 models present in `modelDeployments` | **ALL_SET** | Show the detected values and ask *"Detected existing defaults: gpt-4.1=`<deployment>`, gpt-4.1-mini=`<deployment>`, text-embedding-3-large=`<deployment>`. Use these? (Y/n)"*. On Y, prefill the 3 env vars and **skip Step 6** (defaults already configured). On n, fall through to the per-model prompts below. |
+> | `200` + some models present | **PARTIAL** | Prefill the ones that are set.
For missing models, ask per-item with the default shown below. After Step 4 completes, run Step 6 to fill the gaps. | +> | `200` + no models | **NONE** | Fall through to the per-model prompts below. Step 6 will configure them. | +> | `401` / `403` | **AUTH_ERROR** | Print a one-line warning: *"Probe unavailable (auth failed). If you're using DefaultAzureCredential, run `az login` and ensure the Cognitive Services User role is assigned. Continuing with manual entry."* Fall through to per-model prompts. | +> | other | Unexpected error | Print *"Probe failed. Continuing with manual entry."* Fall through. | > -> These are required by `Sample00_UpdateDefaults` (one-time mapping setup). +> Only proceed to the per-model prompts below when the probe outcome requires it. +> +> The `setup_user_env.sh` / `setup_user_env.ps1` scripts implement this probe with hardened error handling (connect/read timeouts, transport-failure fallbacks). The pseudocode above is a conceptual sketch — treat the scripts as the source of truth. + +> **[ASK USER] Model deployment names (only when probe did not yield all values):** +> For each model not already prefilled from the probe, ask with a sensible default: +> - "What is your **GPT-4.1** deployment name?" (default: `gpt-4.1`) → `GPT_4_1_DEPLOYMENT` +> - "What is your **GPT-4.1-mini** deployment name?" (default: `gpt-4.1-mini`) → `GPT_4_1_MINI_DEPLOYMENT` +> - "What is your **text-embedding-3-large** deployment name?" (default: `text-embedding-3-large`) → `TEXT_EMBEDDING_3_LARGE_DEPLOYMENT` +> +> If the user prefers to configure these later, let them know they can run `Sample00_UpdateDefaults` (Step 6) anytime before using prebuilt analyzers. > **[ASK USER] Cross-resource copy (optional):** > Ask: "Do you plan to use **cross-resource analyzer copying** (`Sample15_GrantCopyAuth`)?" 
@@ -284,6 +368,9 @@ set -a && source .env && set +a
 
 #### 6.2 Run Sample00_UpdateDefaults
 
+> **[COPILOT] Skip condition:**
+> If the Step 4.2 probe returned **ALL_SET** and the user accepted the detected values, defaults are already configured on the Foundry resource — skip this step and tell the user *"Your Foundry resource already has model defaults configured; skipping Step 6.2."* Otherwise continue below.
+
 > **[ASK USER] Run model defaults?:**
 > Ask: "Would you like to run `Sample00_UpdateDefaults` now to configure model defaults? This is a **one-time setup** per Microsoft Foundry resource. (Yes / Skip for now)"
 > - If yes, ensure deployment name env vars are set, then run the sample.
@@ -304,13 +391,15 @@ This is a **one-time setup per Microsoft Foundry resource**.
 
 > **[ASK USER] Which samples?:**
 > Ask: "Which sample would you like to run first?" with options:
-> - `Sample02_AnalyzeUrl` — Analyze content from a URL (recommended start)
-> - `Sample01_AnalyzeBinary` — Analyze a local file
+> - `Sample01_AnalyzeBinary` — Analyze a local PDF (quickest; completes in under a minute)
+> - `Sample02_AnalyzeUrl` — Full demo: document + video + audio + image from URLs (runs several analyses; takes a few minutes, please be patient)
 > - `Sample03_AnalyzeInvoice` — Extract invoice fields
 > - Other — Let me see the full list
 > - Skip — I'll run samples on my own later
 >
 > If the user picks "Other", list available samples from `src/samples/java/com/azure/ai/contentunderstanding/samples/`.
+>
+> **[COPILOT] Timing note (do not parrot verbatim to user):** `Sample02_AnalyzeUrl` runs multiple sequential LROs (document + video + audio + image, with multiple content-range variants). Video/audio chapter generation is slow on the service side, so total runtime can be on the order of 15+ minutes today. Do not interpret quiet periods (no stdout for several minutes during a video/audio LRO) as a hang. Only consider killing if there is **no new stdout for 5+ minutes** AND no active HTTP traffic. When talking to the user, prefer phrasing like "takes a few minutes" or "please be patient" rather than citing exact large minute counts.
 
 **Sync sample:**
 ```bash
@@ -348,16 +437,23 @@ cd sdk/contentunderstanding/azure-ai-contentunderstanding
 ```
 
 The script will:
-1. Verify `java` and `mvn` are available
-2. Install SDK dependencies (prompt for `mvn dependency:resolve` vs. `mvn install -DskipTests -Djacoco.skip=true`)
+1. Check `java` and `mvn` prerequisites (with offer to install if missing)
+2. Install SDK dependencies (skip if already resolved; prompt for `mvn dependency:resolve` vs. `mvn install -DskipTests -Djacoco.skip=true`)
 3. Create `.env` (without overwriting existing) and interactively prompt for:
    - `CONTENTUNDERSTANDING_ENDPOINT`
    - Authentication method (DefaultAzureCredential or API key)
-   - Model deployment names (with sensible defaults)
+   - Probe existing model defaults on the Foundry resource (skip manual entry if all set)
+   - Model deployment names (with sensible defaults, pre-filled from probe when available)
    - Optional cross-resource copy vars (Sample15)
 4. Print next-step commands for loading `.env`, running `Sample00_UpdateDefaults`, and running samples.
 
-> **Note:** The script does **not** load `.env` into your shell for you — you must still run `set -a && source .env && set +a` before invoking `mvn exec:java`, because Java samples read values via `System.getenv()`.
+**Windows PowerShell:**
+```powershell
+cd sdk\contentunderstanding\azure-ai-contentunderstanding
+.github\skills\cu-sdk-setup\scripts\setup_user_env.ps1
+```
+
+> **Note:** The script does **not** load `.env` into your shell for you — you must still load it before invoking `mvn exec:java`, because Java samples read values via `System.getenv()`. Use `set -a && source .env && set +a` in bash, or `. ./load-env.ps1` in PowerShell.
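The note above is easy to verify: `source .env` alone only sets unexported shell variables, which a child process (the JVM) cannot see; `set -a` turns on auto-export while sourcing. A quick demonstration with a throwaway variable name (`CU_DEMO_VAR` is hypothetical):

```shell
#!/usr/bin/env bash
# Why `set -a` matters: child processes only inherit *exported* variables.
printf "CU_DEMO_VAR='hello'\n" > /tmp/demo.env

source /tmp/demo.env                        # shell variable only, not exported
sh -c 'echo "without set -a: [$CU_DEMO_VAR]"'   # child sees nothing

set -a && source /tmp/demo.env && set +a    # auto-export while sourcing
sh -c 'echo "with set -a:    [$CU_DEMO_VAR]"'   # child sees the value
```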
### Manual Quick Setup
@@ -416,13 +512,17 @@ Required for `Sample15_GrantCopyAuth` (cross-resource analyzer copy) only:
 |---------|----------|
 | `java: command not found` | Install a JDK 8+ (Microsoft Build of OpenJDK or Temurin) and ensure `JAVA_HOME` is set. |
 | `mvn: command not found` | Install Maven 3.6+ and add it to `PATH`. |
-| `CONTENTUNDERSTANDING_ENDPOINT` is null at runtime | Make sure you ran `set -a && source .env && set +a` in the same terminal before `mvn exec:java`. |
+| `CONTENTUNDERSTANDING_ENDPOINT` is null at runtime | Load `.env` in the same terminal before `mvn exec:java`: `set -a && source .env && set +a` (bash) or `. ./load-env.ps1` (PowerShell). |
 | `ClassNotFoundException: ...samples.SampleXX_...` | Add `-Dexec.classpathScope=test` to the `mvn exec:java` command. Samples live under `src/samples/java` (test classpath). |
 | `jacoco-maven-plugin:...:check` fails after `mvn install -DskipTests` | Add `-Djacoco.skip=true`. Skipping tests produces no coverage data, which fails the coverage check. |
 | `Access denied` / 401 errors | Check API key is correct, or run `az login` if using DefaultAzureCredential. Verify the `Cognitive Services User` role is assigned. |
 | `Model deployment not found` | Deploy required models in Microsoft Foundry and run `Sample00_UpdateDefaults`. |
-| Changes to `.env` not taking effect | Re-run `set -a && source .env && set +a` — changes are not auto-reloaded. |
+| Changes to `.env` not taking effect | Re-run the loader (`set -a && source .env && set +a` in bash, or `. ./load-env.ps1` in PowerShell) — changes are not auto-reloaded. |
 | `.env` committed to git | Add `.env` to `.gitignore` — never commit credentials. |
+| Probe returns 401/403 even after `az login` | Assign the `Cognitive Services User` role on the Foundry resource to your account in Azure Portal → Access Control (IAM). |
+| `load-env.ps1` reports `'.env' not found` | Run the loader from the package root (`sdk/contentunderstanding/azure-ai-contentunderstanding`), not a subdirectory. |
+| Re-running the setup script doesn't change my `.env` | The script never overwrites an existing `.env`. Delete it first (`rm .env`) and re-run, or edit it manually. |
+| `ClassNotFoundException` after a clean tree change despite the script reporting "deps already resolved" | Stale marker. Delete `target/.cu-setup-deps-ok` (or run `mvn clean`) and re-run the setup script. |
 
 ## Related Skills
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/scripts/setup_user_env.ps1 b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/scripts/setup_user_env.ps1
new file mode 100644
index 000000000000..10d5bf09e55d
--- /dev/null
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/scripts/setup_user_env.ps1
@@ -0,0 +1,509 @@
+# Setup script for Azure AI Content Understanding Java SDK users (PowerShell)
+# Mirrors scripts/setup_user_env.sh for Windows / cross-platform PowerShell.
+
+[CmdletBinding()]
+param()
+
+$ErrorActionPreference = 'Stop'
+
+# Determine script directory and package root
+$ScriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
+$PackageRoot = (Resolve-Path (Join-Path $ScriptDir '..\..\..\..')).Path
+
+Write-Host "=== Azure AI Content Understanding (Java) - User Environment Setup ==="
+Write-Host "Package root: $PackageRoot"
+Write-Host ""
+
+Set-Location $PackageRoot
+
+# --- helper: offer to install JDK/Maven via the platform's package manager ---
+function Invoke-OfferInstallTool {
+    param([string]$Tool)  # 'jdk' | 'maven'
+    $isWin = $IsWindows -or $PSVersionTable.PSEdition -eq 'Desktop'
+    $cmds = @()
+    if ($isWin) {
+        $winget = Get-Command winget -ErrorAction SilentlyContinue
+        if (-not $winget) {
+            Write-Host "  (winget not found — install JDK/Maven manually.)"
+            return $false
+        }
+        switch ($Tool) {
+            'jdk'   { $cmds = @('winget install -e --id Microsoft.OpenJDK.21 --accept-source-agreements --accept-package-agreements') }
+            'maven' { $cmds = @('winget install -e --id Apache.Maven --accept-source-agreements --accept-package-agreements') }
+        }
+    } elseif ($IsMacOS) {
+        $brew = Get-Command brew -ErrorAction SilentlyContinue
+        if (-not $brew) {
+            Write-Host "  (Homebrew not found — install it first: https://brew.sh/)"
+            return $false
+        }
+        switch ($Tool) {
+            'jdk'   { $cmds = @('brew install openjdk@21') }
+            'maven' { $cmds = @('brew install maven') }
+        }
+    } elseif ($IsLinux) {
+        $apt = Get-Command apt-get -ErrorAction SilentlyContinue
+        if (-not $apt) {
+            Write-Host "  (No apt-get detected — install JDK/Maven with your distro's package manager.)"
+            return $false
+        }
+        switch ($Tool) {
+            'jdk'   { $cmds = @('sudo apt-get update && sudo apt-get install -y openjdk-21-jdk') }
+            'maven' { $cmds = @('sudo apt-get update && sudo apt-get install -y maven') }
+        }
+    } else {
+        Write-Host "  (Unsupported platform for auto-install.)"
+        return $false
+    }
+
+    Write-Host ""
+    Write-Host "  This script can run the following command(s) for you:"
+    foreach ($c in $cmds) { Write-Host "    $c" }
+    $reply = Read-Host "  Run them now? (y/N)"
+    if ($reply -notmatch '^[Yy]$') {
+        Write-Host "  Please run them yourself, then re-run this script."
+        return $false
+    }
+
+    foreach ($c in $cmds) {
+        try {
+            if ($isWin) {
+                Invoke-Expression $c
+            } else {
+                bash -lc $c
+            }
+            if ($LASTEXITCODE -ne 0) {
+                Write-Host "  [FAIL] Command failed (exit $LASTEXITCODE): $c"
+                return $false
+            }
+        } catch {
+            Write-Host "  [FAIL] Command failed: $c"
+            Write-Host "         $_"
+            return $false
+        }
+    }
+    Write-Host "  [OK] Installation complete. Re-probing..."
+    return $true
+}
+
+# Step 0: Prerequisites check (JDK 8+ and Maven 3.6+)
+Write-Host "Step 0: Checking prerequisites..."
+$attempt = 1
+while ($true) {
+    $failReason = $null
+    $needTool = $null
+    $javaVerLine = $null
+    $mvnVerLine = $null
+
+    $javaBin = Get-Command java -ErrorAction SilentlyContinue
+    if (-not $javaBin) {
+        Write-Host "  [FAIL] 'java' not found on PATH."
+        $failReason = 'missing'
+        $needTool = 'jdk'
+    } else {
+        $javaVerLine = (& java -version 2>&1 | Select-Object -First 1).ToString()
+        if ($javaVerLine -match 'version "?(\d[\d.]+)') {
+            $javaVer = $Matches[1]
+            $javaMajor = [int]($javaVer -split '\.')[0]
+            if ($javaMajor -eq 1) { $javaMajor = [int]($javaVer -split '\.')[1] }
+            if ($javaMajor -lt 8) {
+                Write-Host "  [FAIL] Found Java '$javaVerLine', need JDK 8+."
+                $failReason = 'too_old'
+                $needTool = 'jdk'
+            }
+        } else {
+            Write-Host "  [FAIL] Cannot parse Java version from '$javaVerLine'."
+            $failReason = 'missing'
+            $needTool = 'jdk'
+        }
+    }
+
+    if (-not $failReason) {
+        $mvnBin = Get-Command mvn -ErrorAction SilentlyContinue
+        if (-not $mvnBin) {
+            Write-Host "  [FAIL] 'mvn' not found on PATH."
+            $failReason = 'missing'
+            $needTool = 'maven'
+        } else {
+            $mvnVerLine = (& mvn -version 2>&1 | Select-Object -First 1).ToString()
+            if ($mvnVerLine -match 'Maven (\d[\d.]+)') {
+                $mvnVer = $Matches[1]
+                $parts = $mvnVer -split '\.'
+                $mvnMajor = [int]$parts[0]; $mvnMinor = [int]$parts[1]
+                if ($mvnMajor -lt 3 -or ($mvnMajor -eq 3 -and $mvnMinor -lt 6)) {
+                    Write-Host "  [FAIL] Found Maven '$mvnVer', need 3.6+."
+                    $failReason = 'too_old'
+                    $needTool = 'maven'
+                }
+            } else {
+                Write-Host "  [FAIL] Cannot parse Maven version from '$mvnVerLine'."
+                $failReason = 'missing'
+                $needTool = 'maven'
+            }
+        }
+    }
+
+    if (-not $failReason) {
+        Write-Host "  [OK] Java: $javaVerLine"
+        Write-Host "  [OK] Maven: $mvnVerLine"
+        break
+    }
+
+    if ($attempt -ge 2) {
+        Write-Host "  [FAIL] Prerequisites still not satisfied after install attempt. Aborting."
+        exit 1
+    }
+    if (-not (Invoke-OfferInstallTool -Tool $needTool)) {
+        exit 1
+    }
+    $attempt++
+}
+Write-Host ""
+
+# Marker written after a successful dependency resolution / install. Mirrors
+# the .sh script. Removed by `mvn clean`; pom.xml mtime invalidates it.
+$DepsMarker = Join-Path 'target' '.cu-setup-deps-ok'
+
+function Test-DepsMarkerValid {
+    if (-not (Test-Path $DepsMarker)) { return $false }
+    if (-not (Test-Path 'pom.xml')) { return $true }
+    $markerTime = (Get-Item $DepsMarker).LastWriteTimeUtc
+    $pomTime = (Get-Item 'pom.xml').LastWriteTimeUtc
+    return $markerTime -ge $pomTime
+}
+
+# Step 1: Install SDK dependencies
+Write-Host "Step 1: Installing SDK dependencies..."
+if (Test-DepsMarkerValid) {
+    Write-Host "  [OK] Dependencies already resolved (marker $DepsMarker present and up-to-date); skipping"
+    Write-Host "       To force re-resolution: Remove-Item $DepsMarker (or run 'mvn clean')"
+} else {
+    $modeChoice = Read-Host "  Installation mode — (A) Download deps only (recommended) | (B) Local build from source [A/b]"
+    if ($modeChoice -match '^[Bb]$') {
+        Write-Host "  Running: mvn install -DskipTests -Djacoco.skip=true"
+        & mvn install -DskipTests -Djacoco.skip=true -q
+        if ($LASTEXITCODE -ne 0) { Write-Host "  [ERROR] mvn install failed." -ForegroundColor Red; exit 1 }
+    } else {
+        Write-Host "  Running: mvn dependency:resolve"
+        & mvn dependency:resolve -q
+        if ($LASTEXITCODE -ne 0) { Write-Host "  [ERROR] mvn dependency:resolve failed." -ForegroundColor Red; exit 1 }
+    }
+    if (-not (Test-Path 'target')) { New-Item -ItemType Directory -Path 'target' | Out-Null }
+    New-Item -ItemType File -Path $DepsMarker -Force | Out-Null
+    Write-Host "  [OK] Dependencies ready"
+}
+Write-Host ""
+
+# Step 2: Configure .env file
+Write-Host "Step 2: Configuring .env file..."
+$envFile = Join-Path $PackageRoot '.env'
+$createEnv = $true
+if (Test-Path $envFile) {
+    Write-Host "  [WARN] .env file already exists - NOT overwriting"
+    Write-Host "         If you want to start fresh, delete .env manually: Remove-Item $envFile"
+    $keepEnv = Read-Host "  Continue with existing .env? (Y/n)"
+    if ($keepEnv -match '^[Nn]$') {
+        Write-Host "  Aborting. Remove .env and re-run this script."
+        exit 1
+    }
+    $createEnv = $false
+}
+
+# Escape a value for safe inclusion in a .env file consumed by
+# `set -a && source .env && set +a` in bash. Wraps in single quotes
+# and escapes internal single quotes as '\''.
+#
+# Contract (must stay in sync with setup_user_env.sh / load-env.ps1):
+#   - Every value written by this script is wrapped in single quotes.
+#   - Internal single quotes are encoded as the 4-char sequence: '\''
+#   - bash `source .env` strips the wrapping quotes natively.
+#   - load-env.ps1 strips the wrapping quotes and reverses the '\'' escape.
+function Format-EnvValue {
+    param([string]$Value)
+    if ($null -eq $Value) { $Value = '' }
+    $escaped = $Value -replace "'", "'\''"
+    return "'$escaped'"
+}
+
+# Write a UTF-8 (no BOM) text file; cross-platform safe for downstream `source .env`.
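The quoting contract described in the comment above (shared between this script and `setup_user_env.sh`) can be exercised end-to-end in bash. A sketch using the same escaping rule, with a made-up value containing a single quote:

```shell
#!/usr/bin/env bash
# Round-trip a value through the .env contract: wrap in single quotes,
# encode each internal ' as the 4-char sequence '\'' , then source it back.
escape_env_val() {
  local v="$1"
  local escaped=${v//\'/\'\\\'\'}   # replace each ' with '\''
  printf "'%s'" "$escaped"
}

original="it's a key"
printf 'MY_VAR=%s\n' "$(escape_env_val "$original")" > /tmp/roundtrip.env

set -a && source /tmp/roundtrip.env && set +a
[ "$MY_VAR" = "$original" ] && echo "round-trip OK"
```

Because bash strips the wrapping quotes natively on `source`, the PowerShell loader only has to reverse the same two transformations to stay in sync.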
+function Write-Utf8NoBom {
+    param([string]$Path, [string]$Content)
+    $utf8NoBom = New-Object System.Text.UTF8Encoding($false)
+    [System.IO.File]::WriteAllText($Path, $Content, $utf8NoBom)
+}
+
+$skipUpdateDefaults = $false
+
+if ($createEnv) {
+    $configureNow = Read-Host "Would you like to configure variables interactively now? (Y/n)"
+    if ($configureNow -notmatch '^[Nn]$') {
+        Write-Host ""
+
+        # CONTENTUNDERSTANDING_ENDPOINT
+        $endpoint = Read-Host "  CONTENTUNDERSTANDING_ENDPOINT (e.g., https://<resource>.services.ai.azure.com/)"
+
+        # Auth method
+        Write-Host "  Authentication:"
+        Write-Host "    (A) DefaultAzureCredential via 'az login' (recommended)"
+        Write-Host "    (B) API Key"
+        $authMode = Read-Host "  Choose [A/b]"
+        $apiKey = ''
+        if ($authMode -match '^[Bb]$') {
+            $apiKey = Read-Host "  CONTENTUNDERSTANDING_KEY"
+        } else {
+            Write-Host "  [INFO] Using DefaultAzureCredential - make sure to run 'az login'"
+        }
+
+        # Probe existing model defaults on the Foundry resource before prompting
+        $gpt41 = ''; $gpt41mini = ''; $embedding = ''
+        if ($endpoint) {
+            Write-Host ""
+            Write-Host "  Probing existing model defaults on the Foundry resource..."
+            $probeEndpoint = $endpoint.TrimEnd('/')
+            $headers = @{}
+            $probeOk = $false
+            $skipProbe = $false
+            try {
+                if ($apiKey) {
+                    $headers['Ocp-Apim-Subscription-Key'] = $apiKey
+                } else {
+                    $azCmd = Get-Command az -ErrorAction SilentlyContinue
+                    if (-not $azCmd) {
+                        Write-Host "  [WARN] Azure CLI ('az') not found; cannot acquire token. Continuing with manual entry."
+                        $skipProbe = $true
+                    } else {
+                        $tokenJson = az account get-access-token --resource https://cognitiveservices.azure.com 2>$null | ConvertFrom-Json
+                        if ($tokenJson -and $tokenJson.accessToken) {
+                            $headers['Authorization'] = "Bearer $($tokenJson.accessToken)"
+                        } else {
+                            Write-Host "  [WARN] Probe unavailable (no token from 'az account get-access-token')."
+                            Write-Host "         Run 'az login' and ensure Cognitive Services User role. Continuing with manual entry."
+                            $skipProbe = $true
+                        }
+                    }
+                }
+                if (-not $skipProbe) {
+                    # -TimeoutSec guards against the script hanging when the
+                    # endpoint is unreachable (DNS, TLS, network outage).
+                    $resp = Invoke-RestMethod `
+                        -Uri "$probeEndpoint/contentunderstanding/defaults?api-version=2025-11-01" `
+                        -Headers $headers `
+                        -TimeoutSec 15 `
+                        -ErrorAction Stop
+                    $probeOk = $true
+                }
+            } catch {
+                # $_.Exception.Response is null for transport-layer failures
+                # (DNS, TLS, timeout). Guard before dereferencing.
+                $statusCode = $null
+                if ($_.Exception.Response) {
+                    try { $statusCode = [int]$_.Exception.Response.StatusCode } catch { $statusCode = $null }
+                }
+                if ($statusCode -eq 401 -or $statusCode -eq 403) {
+                    Write-Host "  [WARN] Probe unavailable (authentication failed)."
+                    Write-Host "         If you're using DefaultAzureCredential, run 'az login' and ensure"
+                    Write-Host "         the Cognitive Services User role is assigned. Continuing with manual entry."
+                } else {
+                    Write-Host "  [WARN] Probe failed: $($_.Exception.Message). Continuing with manual entry."
+                }
+            }
+
+            if ($probeOk -and $resp.modelDeployments) {
+                $md = $resp.modelDeployments
+                $gpt41 = if ($md.'gpt-4.1') { $md.'gpt-4.1' } else { '' }
+                $gpt41mini = if ($md.'gpt-4.1-mini') { $md.'gpt-4.1-mini' } else { '' }
+                $embedding = if ($md.'text-embedding-3-large') { $md.'text-embedding-3-large' } else { '' }
+
+                if ($gpt41 -and $gpt41mini -and $embedding) {
+                    Write-Host "  [OK] Detected existing defaults:"
+                    Write-Host "       gpt-4.1                = $gpt41"
+                    Write-Host "       gpt-4.1-mini           = $gpt41mini"
+                    Write-Host "       text-embedding-3-large = $embedding"
+                    $useDetected = Read-Host "  Use these detected values? (Y/n)"
+                    if ($useDetected -notmatch '^[Nn]$') {
+                        $skipUpdateDefaults = $true
+                    } else {
+                        $gpt41 = ''; $gpt41mini = ''; $embedding = ''
+                    }
+                } elseif ($gpt41 -or $gpt41mini -or $embedding) {
+                    Write-Host "  [INFO] Partial defaults detected; missing entries will be prompted below."
+                } else {
+                    Write-Host "  [INFO] No existing defaults detected; continuing with manual entry."
+                }
+            }
+        }
+
+        Write-Host ""
+        Write-Host "  Model deployment configuration (for Sample00_UpdateDefaults):"
+
+        if (-not $gpt41) {
+            $gpt41 = Read-Host "  GPT_4_1_DEPLOYMENT (default: gpt-4.1)"
+            if (-not $gpt41) { $gpt41 = 'gpt-4.1' }
+        } else {
+            Write-Host "  [OK] Using detected GPT_4_1_DEPLOYMENT=$gpt41"
+        }
+
+        if (-not $gpt41mini) {
+            $gpt41mini = Read-Host "  GPT_4_1_MINI_DEPLOYMENT (default: gpt-4.1-mini)"
+            if (-not $gpt41mini) { $gpt41mini = 'gpt-4.1-mini' }
+        } else {
+            Write-Host "  [OK] Using detected GPT_4_1_MINI_DEPLOYMENT=$gpt41mini"
+        }
+
+        if (-not $embedding) {
+            $embedding = Read-Host "  TEXT_EMBEDDING_3_LARGE_DEPLOYMENT (default: text-embedding-3-large)"
+            if (-not $embedding) { $embedding = 'text-embedding-3-large' }
+        } else {
+            Write-Host "  [OK] Using detected TEXT_EMBEDDING_3_LARGE_DEPLOYMENT=$embedding"
+        }
+
+        # Cross-resource copy
+        $wantCopy = Read-Host "  Configure cross-resource copy vars for Sample15? (y/N)"
+        $srcRid = ''; $srcRegion = ''; $tgtEp = ''; $tgtKey = ''; $tgtRid = ''; $tgtRegion = ''
+        if ($wantCopy -match '^[Yy]$') {
+            $srcRid = Read-Host "  Source resource ID"
+            $srcRegion = Read-Host "  Source region (e.g., eastus)"
+            $tgtEp = Read-Host "  Target endpoint"
+            $tgtKey = Read-Host "  Target API key (blank = DefaultAzureCredential)"
+            $tgtRid = Read-Host "  Target resource ID"
+            $tgtRegion = Read-Host "  Target region (e.g., swedencentral)"
+        }
+
+        # Build .env content with safely-quoted values
+        $endpointQ = Format-EnvValue $endpoint
+        $apiKeyQ = Format-EnvValue $apiKey
+        $gpt41Q = Format-EnvValue $gpt41
+        $gpt41miniQ = Format-EnvValue $gpt41mini
+        $embeddingQ = Format-EnvValue $embedding
+
+        $envContent = @"
+# Azure AI Content Understanding - Environment Variables
+# Generated by cu-sdk-setup/scripts/setup_user_env.ps1
+
+# Required: Your Microsoft Foundry resource endpoint
+CONTENTUNDERSTANDING_ENDPOINT=$endpointQ
+
+# Optional: API key (leave empty to use DefaultAzureCredential via az login)
+CONTENTUNDERSTANDING_KEY=$apiKeyQ
+
+# Model deployment names (used by Sample00_UpdateDefaults)
+GPT_4_1_DEPLOYMENT=$gpt41Q
+GPT_4_1_MINI_DEPLOYMENT=$gpt41miniQ
+TEXT_EMBEDDING_3_LARGE_DEPLOYMENT=$embeddingQ
+"@
+
+        if ($wantCopy -match '^[Yy]$') {
+            $srcRidQ = Format-EnvValue $srcRid
+            $srcRegionQ = Format-EnvValue $srcRegion
+            $tgtEpQ = Format-EnvValue $tgtEp
+            $tgtKeyQ = Format-EnvValue $tgtKey
+            $tgtRidQ = Format-EnvValue $tgtRid
+            $tgtRegionQ = Format-EnvValue $tgtRegion
+            $envContent += @"
+
+# Cross-resource copy settings (only for Sample15_GrantCopyAuth)
+CONTENTUNDERSTANDING_SOURCE_RESOURCE_ID=$srcRidQ
+CONTENTUNDERSTANDING_SOURCE_REGION=$srcRegionQ
+CONTENTUNDERSTANDING_TARGET_ENDPOINT=$tgtEpQ
+CONTENTUNDERSTANDING_TARGET_KEY=$tgtKeyQ
+CONTENTUNDERSTANDING_TARGET_RESOURCE_ID=$tgtRidQ
+CONTENTUNDERSTANDING_TARGET_REGION=$tgtRegionQ
+"@
+        }
+
+        Write-Utf8NoBom -Path $envFile -Content $envContent
+        Write-Host "  [OK] Wrote $envFile"
+    } else {
+        $templateContent = @'
+# Azure AI Content Understanding - Environment Variables
+# Fill in your values below.
+
+# Required: Your Microsoft Foundry resource endpoint
+CONTENTUNDERSTANDING_ENDPOINT=https://<resource>.services.ai.azure.com/
+
+# Optional: API key (leave empty to use DefaultAzureCredential via az login)
+CONTENTUNDERSTANDING_KEY=
+
+# Model deployment names (used by Sample00_UpdateDefaults)
+GPT_4_1_DEPLOYMENT=gpt-4.1
+GPT_4_1_MINI_DEPLOYMENT=gpt-4.1-mini
+TEXT_EMBEDDING_3_LARGE_DEPLOYMENT=text-embedding-3-large
+'@
+        Write-Utf8NoBom -Path $envFile -Content $templateContent
+        Write-Host "  [OK] Wrote template to $envFile - please edit it before running samples."
+    }
+}
+Write-Host ""
+
+# Generate a tiny loader helper next to .env so users (and Copilot) don't have
+# to remember a fragile one-liner. Strips matching surrounding single/double
+# quotes and un-escapes '\'' for single-quoted values.
+#
+# Skip overwrite if a load-env.ps1 already exists AND it is not the one we
+# generated previously (identified by the fingerprint marker on the first
+# non-shebang line). This protects user customisations from being clobbered.
+$loaderPath = Join-Path $PackageRoot 'load-env.ps1'
+$loaderFingerprint = '# cu-sdk-setup-load-env-v1'
+$shouldWriteLoader = $true
+if (Test-Path $loaderPath) {
+    $firstLines = Get-Content -LiteralPath $loaderPath -TotalCount 5 -ErrorAction SilentlyContinue
+    if (-not ($firstLines -contains $loaderFingerprint)) {
+        Write-Host "  [WARN] $loaderPath already exists and looks user-modified - not overwriting."
+        $shouldWriteLoader = $false
+    }
+}
+$loaderBody = @'
+# cu-sdk-setup-load-env-v1
+# Load .env into the current PowerShell session. Generated by cu-sdk-setup.
+# Usage: . ./load-env.ps1
+param([string]$EnvFile = '.env')
+if (-not (Test-Path $EnvFile)) {
+    Write-Error "$EnvFile not found in $(Get-Location)"
+    return
+}
+Get-Content -LiteralPath $EnvFile | ForEach-Object {
+    $line = $_
+    if ($line -match '^\s*#') { return }
+    if ($line -notmatch '^\s*([^=\s]+)\s*=(.*)$') { return }
+    $name = $Matches[1]
+    $val = $Matches[2]
+    if ($val -match "^'(.*)'$") {
+        $val = $Matches[1] -replace "'\\''", "'"
+    } elseif ($val -match '^"(.*)"$') {
+        $val = $Matches[1]
+    }
+    [System.Environment]::SetEnvironmentVariable($name, $val, 'Process')
+}
+'@
+if ($shouldWriteLoader) {
+    Write-Utf8NoBom -Path $loaderPath -Content $loaderBody
+}
+
+# Summary
+Write-Host "=== Setup Complete ==="
+Write-Host ""
+Write-Host "Next steps:"
+Write-Host ""
+Write-Host "  1. Load .env into your current shell (Java reads System.getenv, so this is REQUIRED):"
+Write-Host "     cd $PackageRoot"
+if ($IsWindows -or $PSVersionTable.PSEdition -eq 'Desktop') {
+    Write-Host "     . ./load-env.ps1                  # PowerShell (uses generated $loaderPath)"
+} else {
+    Write-Host "     set -a && source .env && set +a   # (in bash)"
+    Write-Host "     . ./load-env.ps1                  # (in PowerShell)"
+}
+Write-Host ""
+if ($skipUpdateDefaults) {
+    Write-Host "  2. Model defaults already configured on your Foundry resource; skip Sample00_UpdateDefaults."
+} else {
+    Write-Host "  2. (One-time per Foundry resource) Configure model defaults:"
+    Write-Host '     mvn exec:java \'
+    Write-Host '       -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample00_UpdateDefaults" \'
+    Write-Host '       -Dexec.classpathScope=test -Djacoco.skip=true -q'
}
+Write-Host ""
+Write-Host "  3. Run a sample:"
+Write-Host '     mvn exec:java \'
+Write-Host '       -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample02_AnalyzeUrl" \'
+Write-Host '       -Dexec.classpathScope=test -Djacoco.skip=true -q'
+Write-Host ""
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/scripts/setup_user_env.sh b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/scripts/setup_user_env.sh
index ae13ec0edd36..8f03be6042d4 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/scripts/setup_user_env.sh
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/scripts/setup_user_env.sh
@@ -1,6 +1,7 @@
 #!/bin/bash
 # Setup script for Azure AI Content Understanding Java SDK users
 # This script sets up the environment for running samples (JDK + Maven based).
+# cspell:ignore esac
 
 set -e
@@ -14,41 +15,178 @@ echo ""
 
 cd "$PACKAGE_ROOT"
 
-# Step 1: Verify toolchain
-echo "Step 1: Verifying toolchain..."
-if ! command -v java >/dev/null 2>&1; then
-    echo "  ✗ 'java' not found in PATH. Install JDK 8+ (e.g., Microsoft Build of OpenJDK or Temurin)."
-    exit 1
-fi
-if ! command -v mvn >/dev/null 2>&1; then
-    echo "  ✗ 'mvn' not found in PATH. Install Apache Maven 3.6+."
-    exit 1
-fi
-echo "  ✓ Java: $(java -version 2>&1 | head -n 1)"
-echo "  ✓ Maven: $(mvn -version 2>&1 | head -n 1)"
+# --- helper: offer to install JDK/Maven via the platform's package manager ---
+# Usage: offer_install_tool <tool>
+#   tool: "jdk" | "maven"
+# Returns 0 if install ran successfully (caller should re-probe), non-zero if
+# the user declined, the platform isn't supported, or the install failed.
+offer_install_tool() {
+    local tool="$1"
+    local os_name
+    os_name="$(uname -s)"
+    local cmd=""
+
+    case "$os_name" in
+        Darwin)
+            if ! command -v brew >/dev/null 2>&1; then
+                echo "  (Homebrew not found — install it first: https://brew.sh/)"
+                return 1
+            fi
+            case "$tool" in
+                jdk)   cmd="brew install openjdk@21" ;;
+                maven) cmd="brew install maven" ;;
+            esac
+            ;;
+        Linux)
+            if ! command -v apt-get >/dev/null 2>&1; then
+                echo "  (No apt-get detected — install JDK/Maven with your distro's package manager.)"
+                return 1
+            fi
+            case "$tool" in
+                jdk)   cmd="sudo apt-get update && sudo apt-get install -y openjdk-21-jdk" ;;
+                maven) cmd="sudo apt-get update && sudo apt-get install -y maven" ;;
+            esac
+            ;;
+        *)
+            echo "  (Unsupported platform for auto-install: $os_name)"
+            return 1
+            ;;
+    esac
+
+    echo ""
+    echo "  This script can run the following command for you:"
+    echo "    $cmd"
+    local reply=""
+    read -r -p "  Run it now? (y/N): " reply || reply="n"
+    if [[ ! "$reply" =~ ^[Yy]$ ]]; then
+        echo "  Please run it yourself, then re-run this script."
+        return 1
+    fi
+    if ! eval "$cmd"; then
+        echo "  ✗ Installation command failed."
+        return 1
+    fi
+    echo "  ✓ Installation complete. Re-probing..."
+    hash -r 2>/dev/null || true
+    return 0
+}
+
+# Step 0: Prerequisites check (JDK 8+ and Maven 3.6+)
+echo "Step 0: Checking prerequisites..."
+attempt=1
+while :; do
+    fail_reason=""
+    need_tool=""
+
+    if ! command -v java >/dev/null 2>&1; then
+        echo "  ✗ 'java' not found on PATH."
+        fail_reason="missing"
+        need_tool="jdk"
+    else
+        java_ver_line="$(java -version 2>&1 | head -1)"
+        java_ver="$(echo "$java_ver_line" | sed -n 's/.*version[[:space:]]*"\{0,1\}\([0-9][0-9.]*\).*/\1/p')"
+        java_major="${java_ver%%.*}"
+        # Handle 1.x style versions (JDK 8 reports as 1.8)
+        if [ "$java_major" = "1" ]; then
+            java_major="$(echo "$java_ver" | cut -d. -f2)"
+        fi
+        # Strict numeric check — fail loudly when parsing fails instead of
+        # silently treating it as "version OK".
+        if ! printf '%s' "$java_major" | grep -qE '^[0-9]+$'; then
+            echo "  ✗ Could not parse Java major version from '$java_ver_line'."
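The Java-version handling above, including the legacy `1.x` numbering where JDK 8 reports itself as `1.8`, can be checked in isolation. A sketch reusing the same `sed` extraction against two typical banner lines:

```shell
#!/usr/bin/env bash
# Extract the Java major version from a `java -version` banner line,
# normalizing legacy "1.x" numbering (JDK 8 reports as 1.8).
java_major_from_banner() {
  local line="$1"
  local ver major
  ver="$(echo "$line" | sed -n 's/.*version[[:space:]]*"\{0,1\}\([0-9][0-9.]*\).*/\1/p')"
  major="${ver%%.*}"
  if [ "$major" = "1" ]; then
    major="$(echo "$ver" | cut -d. -f2)"
  fi
  echo "$major"
}

java_major_from_banner 'openjdk version "21.0.3" 2024-04-16'   # → 21
java_major_from_banner 'java version "1.8.0_392"'              # → 8
```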
+            fail_reason="missing"
+            need_tool="jdk"
+        elif [ "$java_major" -lt 8 ]; then
+            echo "  ✗ Found Java '$java_ver_line', need JDK 8+."
+            fail_reason="too_old"
+            need_tool="jdk"
+        fi
+    fi
+
+    if [ -z "$fail_reason" ]; then
+        if ! command -v mvn >/dev/null 2>&1; then
+            echo "  ✗ 'mvn' not found on PATH."
+            fail_reason="missing"
+            need_tool="maven"
+        else
+            mvn_ver_line="$(mvn -version 2>&1 | head -1)"
+            mvn_ver="$(echo "$mvn_ver_line" | sed -n 's/.*Maven \([0-9][0-9.]*\).*/\1/p')"
+            mvn_major="$(echo "$mvn_ver" | cut -d. -f1)"
+            mvn_minor="$(echo "$mvn_ver" | cut -d. -f2)"
+            if ! printf '%s' "$mvn_major" | grep -qE '^[0-9]+$' || \
+               ! printf '%s' "${mvn_minor:-0}" | grep -qE '^[0-9]+$'; then
+                echo "  ✗ Could not parse Maven version from '$mvn_ver_line'."
+                fail_reason="missing"
+                need_tool="maven"
+            elif [ "$mvn_major" -lt 3 ] || { [ "$mvn_major" -eq 3 ] && [ "${mvn_minor:-0}" -lt 6 ]; }; then
+                echo "  ✗ Found Maven '$mvn_ver', need 3.6+."
+                fail_reason="too_old"
+                need_tool="maven"
+            fi
+        fi
+    fi
+
+    if [ -z "$fail_reason" ]; then
+        echo "  ✓ Java: $java_ver_line"
+        echo "  ✓ Maven: $mvn_ver_line"
+        break
+    fi
+
+    if [ "$attempt" -ge 2 ]; then
+        echo "  ✗ Prerequisites still not satisfied after install attempt. Aborting."
+        exit 1
+    fi
+    if ! offer_install_tool "$need_tool"; then
+        exit 1
+    fi
+    attempt=$((attempt + 1))
+done
 echo ""
 
-# Step 2: Install SDK dependencies
-echo "Step 2: Installing SDK dependencies..."
-read -p "Installation mode — (A) Download deps only (recommended) | (B) Local build from source [A/b]: " install_mode
-install_mode="${install_mode:-A}"
-if [[ "$install_mode" =~ ^[Bb]$ ]]; then
-    echo "  Running: mvn install -DskipTests -Djacoco.skip=true"
-    mvn install -DskipTests -Djacoco.skip=true -q
+# Marker written after a successful dependency resolution / install. Used to
+# avoid reprompting on repeat runs. Removed by `mvn clean`, so a clean tree
+# correctly triggers re-resolution. POM mtime is also tracked so a changed
+# pom.xml invalidates the marker.
+DEPS_MARKER="target/.cu-setup-deps-ok"
+
+deps_marker_valid() {
+    [ -f "$DEPS_MARKER" ] || return 1
+    [ -f "pom.xml" ] || return 0
+    # Marker must be at least as new as pom.xml
+    if [ "pom.xml" -nt "$DEPS_MARKER" ]; then
+        return 1
+    fi
+    return 0
+}
+
+# Step 1: Install SDK dependencies
+echo "Step 1: Installing SDK dependencies..."
+if deps_marker_valid; then
+    echo "  ✓ Dependencies already resolved (marker $DEPS_MARKER present and up-to-date); skipping"
+    echo "    To force re-resolution: rm $DEPS_MARKER (or run 'mvn clean')"
 else
-    echo "  Running: mvn dependency:resolve"
-    mvn dependency:resolve -q
+    read -r -p "  Installation mode — (A) Download deps only (recommended) | (B) Local build from source [A/b]: " install_mode || install_mode="A"
+    install_mode="${install_mode:-A}"
+    if [[ "$install_mode" =~ ^[Bb]$ ]]; then
+        echo "  Running: mvn install -DskipTests -Djacoco.skip=true"
+        mvn install -DskipTests -Djacoco.skip=true -q
+    else
+        echo "  Running: mvn dependency:resolve"
+        mvn dependency:resolve -q
+    fi
+    mkdir -p target
+    : > "$DEPS_MARKER"
+    echo "  ✓ Dependencies ready"
 fi
-echo "  ✓ Dependencies ready"
 echo ""
 
-# Step 3: Configure .env file
-echo "Step 3: Configuring .env file..."
+# Step 2: Configure .env file
+echo "Step 2: Configuring .env file..."
 ENV_FILE="$PACKAGE_ROOT/.env"
 if [ -f "$ENV_FILE" ]; then
     echo "  ⚠ .env file already exists — NOT overwriting."
     echo "    To start fresh, delete it manually: rm \"$ENV_FILE\""
-    read -p "  Continue with existing .env? (Y/n): " keep_env
+    read -r -p "  Continue with existing .env? (Y/n): " keep_env || keep_env=""
     if [[ "$keep_env" =~ ^[Nn]$ ]]; then
         echo "  Aborting. Remove .env and re-run this script."
         exit 1
@@ -58,44 +196,159 @@ else
     CREATE_ENV=true
 fi
 
+# Escape a value for safe inclusion in a .env file consumed by
+# `set -a && source .env && set +a` in bash. Wraps in single quotes
+# and escapes internal single quotes as '\''.
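The marker-freshness rule above rests on bash's `-nt` (newer-than) file test, which compares modification times. A minimal sketch with throwaway files (the `sleep` calls sidestep one-second mtime granularity on some filesystems):

```shell
#!/usr/bin/env bash
# The deps marker stays valid only while pom.xml is NOT newer than it.
workdir=$(mktemp -d)
touch "$workdir/pom.xml"
sleep 1
touch "$workdir/marker"        # marker newer than pom.xml: still valid

if [ "$workdir/pom.xml" -nt "$workdir/marker" ]; then
  echo "pom.xml changed - re-resolve dependencies"
else
  echo "marker up-to-date - skip resolution"
fi

sleep 1
touch "$workdir/pom.xml"       # simulate editing pom.xml
if [ "$workdir/pom.xml" -nt "$workdir/marker" ]; then
  echo "pom.xml changed - re-resolve dependencies"
fi
```

Since `mvn clean` deletes `target/` wholesale, the marker also disappears on a clean tree, which is exactly the invalidation the comment describes.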
+#
+# Contract (must stay in sync with load-env.ps1):
+#   - Every value written by this script is wrapped in single quotes.
+#   - Internal single quotes are encoded as the 4-char sequence: '\''
+#   - bash `source .env` strips the wrapping quotes natively.
+#   - PowerShell load-env.ps1 strips the wrapping quotes and reverses the
+#     '\'' escape on read.
+escape_env_val() {
+    local v="$1"
+    # bash native parameter expansion: replace each ' with '\''
+    local escaped=${v//\'/\'\\\'\'}
+    printf "'%s'" "$escaped"
+}
+
+skip_update_defaults=0
+
 if [ "$CREATE_ENV" = true ]; then
-    read -p "Would you like to configure variables interactively now? (Y/n): " configure_now
+    read -r -p "Would you like to configure variables interactively now? (Y/n): " configure_now || configure_now="Y"
     configure_now="${configure_now:-Y}"
     if [[ "$configure_now" =~ ^[Yy]$ ]]; then
+        echo ""
+
         # CONTENTUNDERSTANDING_ENDPOINT
-        read -p "  CONTENTUNDERSTANDING_ENDPOINT (e.g., https://<resource>.services.ai.azure.com/): " endpoint
+        read -r -p "  CONTENTUNDERSTANDING_ENDPOINT (e.g., https://<resource>.services.ai.azure.com/): " endpoint || endpoint=""
 
         # Auth method
         echo "  Authentication:"
         echo "    (A) DefaultAzureCredential via 'az login' (recommended)"
         echo "    (B) API Key"
-        read -p "  Choose [A/b]: " auth_mode
+        read -r -p "  Choose [A/b]: " auth_mode || auth_mode="A"
         auth_mode="${auth_mode:-A}"
         api_key=""
         if [[ "$auth_mode" =~ ^[Bb]$ ]]; then
-            read -p "  CONTENTUNDERSTANDING_KEY: " api_key
+            read -r -p "  CONTENTUNDERSTANDING_KEY: " api_key || api_key=""
         else
             echo "  ℹ Using DefaultAzureCredential — remember to run 'az login' before invoking samples."
fi - # Deployment names - read -p " GPT_4_1_DEPLOYMENT (default: gpt-4.1): " gpt41 - gpt41="${gpt41:-gpt-4.1}" - read -p " GPT_4_1_MINI_DEPLOYMENT (default: gpt-4.1-mini): " gpt41mini - gpt41mini="${gpt41mini:-gpt-4.1-mini}" - read -p " TEXT_EMBEDDING_3_LARGE_DEPLOYMENT (default: text-embedding-3-large): " embedding - embedding="${embedding:-text-embedding-3-large}" + # Probe existing model defaults on the Foundry resource before prompting. + # Uses curl to call the defaults API directly. + gpt41="" + gpt41mini="" + embedding="" + if [ -n "$endpoint" ]; then + echo "" + echo " Probing existing model defaults on the Foundry resource..." + probe_endpoint="${endpoint%/}" + # --connect-timeout / --max-time guard against the script hanging + # when the user provided a wrong/unreachable endpoint. + curl_opts=(--silent --show-error --connect-timeout 5 --max-time 15 -w "\n%{http_code}") + set +e + if [ -n "$api_key" ]; then + probe_response=$(curl "${curl_opts[@]}" \ + -H "Ocp-Apim-Subscription-Key: $api_key" \ + "$probe_endpoint/contentunderstanding/defaults?api-version=2025-11-01" 2>/dev/null) + else + token=$(az account get-access-token --resource https://cognitiveservices.azure.com --query accessToken -o tsv 2>/dev/null) + if [ -z "$token" ]; then + probe_response=$'\n403' + else + probe_response=$(curl "${curl_opts[@]}" \ + -H "Authorization: Bearer $token" \ + "$probe_endpoint/contentunderstanding/defaults?api-version=2025-11-01" 2>/dev/null) + fi + fi + curl_rc=$? + set -e + + if [ "$curl_rc" -ne 0 ]; then + # curl failed at the network/transport layer (DNS, TLS, timeout, ...) + http_code="000" + body="" + else + http_code=$(printf '%s' "$probe_response" | tail -n1) + # Strip the trailing http_code line. Use awk to drop the last + # newline-delimited record, which is robust regardless of body size. 
+ body=$(printf '%s' "$probe_response" | awk 'NR>1{print prev} {prev=$0}') + fi + + if [ "$http_code" = "200" ]; then + # Parse modelDeployments from JSON using grep/sed (no jq dependency) + gpt41=$(echo "$body" | grep -o '"gpt-4\.1"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*: *"//;s/"//' | head -1) + gpt41mini=$(echo "$body" | grep -o '"gpt-4\.1-mini"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*: *"//;s/"//' | head -1) + embedding=$(echo "$body" | grep -o '"text-embedding-3-large"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*: *"//;s/"//' | head -1) + + if [ -n "$gpt41" ] && [ -n "$gpt41mini" ] && [ -n "$embedding" ]; then + echo " ✓ Detected existing defaults:" + echo " gpt-4.1 = $gpt41" + echo " gpt-4.1-mini = $gpt41mini" + echo " text-embedding-3-large = $embedding" + read -r -p " Use these detected values? (Y/n): " use_detected || use_detected="y" + if [[ ! "$use_detected" =~ ^[Nn]$ ]]; then + skip_update_defaults=1 + else + gpt41=""; gpt41mini=""; embedding="" + fi + elif [ -n "$gpt41" ] || [ -n "$gpt41mini" ] || [ -n "$embedding" ]; then + echo " ℹ Partial defaults detected; missing entries will be prompted below." + else + echo " ℹ No existing defaults detected; continuing with manual entry." + fi + elif [ "$http_code" = "401" ] || [ "$http_code" = "403" ]; then + echo " ⚠ Probe unavailable (authentication failed)." + echo " If you're using DefaultAzureCredential, run 'az login' and ensure" + echo " the Cognitive Services User role is assigned. Continuing with manual entry." + elif [ "$http_code" = "000" ]; then + echo " ⚠ Probe failed (network error / timeout / unreachable endpoint);" + echo " continuing with manual entry. Double-check CONTENTUNDERSTANDING_ENDPOINT." + else + echo " ⚠ Probe failed (HTTP $http_code); continuing with manual entry." 
+ fi + fi + + echo "" + echo " Model deployment configuration (for Sample00_UpdateDefaults):" + + # GPT_4_1_DEPLOYMENT + if [ -z "$gpt41" ]; then + read -r -p " GPT_4_1_DEPLOYMENT (default: gpt-4.1): " gpt41 || gpt41="" + gpt41="${gpt41:-gpt-4.1}" + else + echo " ✓ Using detected GPT_4_1_DEPLOYMENT=$gpt41" + fi + + # GPT_4_1_MINI_DEPLOYMENT + if [ -z "$gpt41mini" ]; then + read -r -p " GPT_4_1_MINI_DEPLOYMENT (default: gpt-4.1-mini): " gpt41mini || gpt41mini="" + gpt41mini="${gpt41mini:-gpt-4.1-mini}" + else + echo " ✓ Using detected GPT_4_1_MINI_DEPLOYMENT=$gpt41mini" + fi + + # TEXT_EMBEDDING_3_LARGE_DEPLOYMENT + if [ -z "$embedding" ]; then + read -r -p " TEXT_EMBEDDING_3_LARGE_DEPLOYMENT (default: text-embedding-3-large): " embedding || embedding="" + embedding="${embedding:-text-embedding-3-large}" + else + echo " ✓ Using detected TEXT_EMBEDDING_3_LARGE_DEPLOYMENT=$embedding" + fi # Cross-resource copy - read -p " Configure cross-resource copy vars for Sample15? (y/N): " want_copy + read -r -p " Configure cross-resource copy vars for Sample15? 
(y/N): " want_copy || want_copy=""
   src_rid=""; src_region=""; tgt_ep=""; tgt_key=""; tgt_rid=""; tgt_region=""
   if [[ "$want_copy" =~ ^[Yy]$ ]]; then
-    read -p " Source resource ID: " src_rid
-    read -p " Source region (e.g., eastus): " src_region
-    read -p " Target endpoint: " tgt_ep
-    read -p " Target API key (blank = DefaultAzureCredential): " tgt_key
-    read -p " Target resource ID: " tgt_rid
-    read -p " Target region (e.g., swedencentral): " tgt_region
+    read -r -p " Source resource ID: " src_rid || src_rid=""
+    read -r -p " Source region (e.g., eastus): " src_region || src_region=""
+    read -r -p " Target endpoint: " tgt_ep || tgt_ep=""
+    read -r -p " Target API key (blank = DefaultAzureCredential): " tgt_key || tgt_key=""
+    read -r -p " Target resource ID: " tgt_rid || tgt_rid=""
+    read -r -p " Target region (e.g., swedencentral): " tgt_region || tgt_region=""
   fi
 
   cat > "$ENV_FILE" <<EOF
# Azure AI Content Understanding - environment variables
CONTENTUNDERSTANDING_ENDPOINT=$(escape_env_val "$endpoint")
CONTENTUNDERSTANDING_KEY=$(escape_env_val "$api_key")
GPT_4_1_DEPLOYMENT=$(escape_env_val "$gpt41")
GPT_4_1_MINI_DEPLOYMENT=$(escape_env_val "$gpt41mini")
TEXT_EMBEDDING_3_LARGE_DEPLOYMENT=$(escape_env_val "$embedding")
EOF
   if [[ "$want_copy" =~ ^[Yy]$ ]]; then
     cat >> "$ENV_FILE" <<EOF
# Cross-resource copy (Sample15_GrantCopyAuth)
CONTENTUNDERSTANDING_SOURCE_RESOURCE_ID=$(escape_env_val "$src_rid")
CONTENTUNDERSTANDING_SOURCE_REGION=$(escape_env_val "$src_region")
CONTENTUNDERSTANDING_TARGET_ENDPOINT=$(escape_env_val "$tgt_ep")
CONTENTUNDERSTANDING_TARGET_KEY=$(escape_env_val "$tgt_key")
CONTENTUNDERSTANDING_TARGET_RESOURCE_ID=$(escape_env_val "$tgt_rid")
CONTENTUNDERSTANDING_TARGET_REGION=$(escape_env_val "$tgt_region")
EOF
   fi
   echo " ✓ Wrote $ENV_FILE"
 fi
fi
echo ""
+LOADER_PATH="$PACKAGE_ROOT/load-env.ps1"
+if [ -f "$LOADER_PATH" ] && ! grep -q 'cu-sdk-setup-load-env-v1' "$LOADER_PATH" 2>/dev/null; then
+  echo " ⚠ $LOADER_PATH already exists and looks user-modified — not overwriting."
+else
+  cat > "$LOADER_PATH" <<'PSEOF'
+# cu-sdk-setup-load-env-v1
+# Load .env into the current PowerShell session. Generated by cu-sdk-setup.
+# Usage: . ./load-env.ps1
+param([string]$EnvFile = '.env')
+if (-not (Test-Path $EnvFile)) {
+  Write-Error "$EnvFile not found in $(Get-Location)"
+  return
+}
+Get-Content -LiteralPath $EnvFile | ForEach-Object {
+  $line = $_
+  if ($line -match '^\s*#') { return }
+  if ($line -notmatch '^\s*([^=\s]+)\s*=(.*)$') { return }
+  $name = $Matches[1]
+  $val = $Matches[2]
+  if ($val -match "^'(.*)'$") {
+    $val = $Matches[1] -replace "'\\''", "'"
+  } elseif ($val -match '^"(.*)"$') {
+    $val = $Matches[1]
+  }
+  [System.Environment]::SetEnvironmentVariable($name, $val, 'Process')
+}
+PSEOF
+fi
+
 # Summary
 echo "=== Setup Complete ==="
 echo ""
@@ -156,12 +447,17 @@ echo "Next steps:"
 echo ""
 echo " 1.
Load .env into your current shell (Java reads System.getenv, so this is REQUIRED):" echo " cd $PACKAGE_ROOT" -echo " set -a && source .env && set +a" +echo " set -a && source .env && set +a # bash / zsh" +echo " . ./load-env.ps1 # PowerShell" echo "" -echo " 2. (One-time per Foundry resource) Configure model defaults:" -echo " mvn exec:java \\" -echo " -Dexec.mainClass=\"com.azure.ai.contentunderstanding.samples.Sample00_UpdateDefaults\" \\" -echo " -Dexec.classpathScope=test -Djacoco.skip=true -q" +if [ "$skip_update_defaults" = "1" ]; then + echo " 2. Model defaults already configured on your Foundry resource; skip Sample00_UpdateDefaults." +else + echo " 2. (One-time per Foundry resource) Configure model defaults:" + echo " mvn exec:java \\" + echo " -Dexec.mainClass=\"com.azure.ai.contentunderstanding.samples.Sample00_UpdateDefaults\" \\" + echo " -Dexec.classpathScope=test -Djacoco.skip=true -q" +fi echo "" echo " 3. Run a sample:" echo " mvn exec:java \\" From fd5b9ced0c2a537aee37513b8922558947ad02e7 Mon Sep 17 00:00:00 2001 From: aluneth Date: Sat, 25 Apr 2026 18:40:36 +0800 Subject: [PATCH 09/19] fix: update environment variable handling and improve script comments Co-authored-by: Copilot --- .../.github/skills/cu-sdk-sample-run/SKILL.md | 2 +- .../.github/skills/cu-sdk-sample-run/scripts/run_sample.sh | 1 + .../.github/skills/cu-sdk-sample-run/scripts/setup_samples.sh | 1 + .../.github/skills/cu-sdk-setup/scripts/setup_user_env.sh | 2 +- 4 files changed, 4 insertions(+), 2 deletions(-) diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md index e530deaa02a7..27f19ffa5ac9 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md @@ -162,7 +162,7 @@ Use 
`--local` to force local build: > - If yes: Skip to Step 4. > - If no: Direct them to the `cu-sdk-setup` skill for interactive setup, or guide them through the steps below. -Java samples read credentials from **OS environment variables** via `System.getenv()`. Unlike Python (`dotenv`) or JavaScript (`dotenv/config`), Java does not have a built-in `.env` loader — the variables must be present in the shell environment when the JVM starts. +Java samples read credentials from **OS environment variables** via `System.getenv()`. Java does not load `.env` files automatically, so the variables must be present in the shell environment when the JVM starts. The recommended approach is to create a **`.env` file** and source it before running samples. diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh index 0c0438b898b2..7d1f21bb7a60 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh @@ -1,5 +1,6 @@ #!/usr/bin/env bash set -euo pipefail +# cspell:ignore envfile esac # run_sample.sh # Run a specific Java sample for the Azure AI Content Understanding SDK. 
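The sourcing requirement described in the SKILL.md hunk above can be verified in isolation. A minimal sketch, assuming a bash-compatible shell; the temporary directory and example endpoint value are illustrative only, not part of the SDK:

```shell
# Demonstrates why `set -a && source .env && set +a` is needed:
# exported variables become visible to child processes, which is
# exactly how a JVM started by `mvn exec:java` sees System.getenv().
workdir=$(mktemp -d)
cd "$workdir"

printf 'CONTENTUNDERSTANDING_ENDPOINT=https://example.services.ai.azure.com/\n' > .env

set -a          # auto-export every variable assigned while sourcing
. ./.env
set +a

# A child process (standing in for the JVM) inherits the exported variable.
sh -c 'printf "%s\n" "$CONTENTUNDERSTANDING_ENDPOINT"'
```

Without `set -a`, sourcing would set the variable in the current shell only, and the child JVM would see nothing.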
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/setup_samples.sh b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/setup_samples.sh index 299059734c07..91abb34da038 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/setup_samples.sh +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/setup_samples.sh @@ -1,5 +1,6 @@ #!/usr/bin/env bash set -euo pipefail +# cspell:ignore esac ENVEOF # setup_samples.sh # Sets up the environment for running Azure AI Content Understanding Java SDK samples. diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/scripts/setup_user_env.sh b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/scripts/setup_user_env.sh index 8f03be6042d4..696194cb1fef 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/scripts/setup_user_env.sh +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/scripts/setup_user_env.sh @@ -1,7 +1,7 @@ #!/bin/bash # Setup script for Azure AI Content Understanding Java SDK users # This script sets up the environment for running samples (JDK + Maven based). 
-# cspell:ignore esac +# cspell:ignore esac PSEOF set -e From b0275cfc486d74ab5511403ed66a9f845246fb70 Mon Sep 17 00:00:00 2001 From: aluneth Date: Sat, 25 Apr 2026 19:20:09 +0800 Subject: [PATCH 10/19] fix: update logging link in README to point to the new documentation --- .../azure-ai-contentunderstanding/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/README.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/README.md index 053b85a8af8c..e5c0a2837369 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/README.md +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/README.md @@ -512,7 +512,7 @@ This project has adopted the [Microsoft Open Source Code of Conduct][code_of_con [sample00]: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample00_UpdateDefaults.java [sample01]: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample01_AnalyzeBinary.java [sample00_update_defaults]: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample00_UpdateDefaults.java -[logging]: https://github.com/Azure/azure-sdk-for-java/wiki/Logging-in-Azure-SDK +[logging]: https://learn.microsoft.com/azure/developer/java/sdk/logging-overview [azure_core_http_client]: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/core/azure-core/README.md#configuring-service-clients [azure_core_response]: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/core/azure-core/README.md#accessing-http-response-details-using-responset [azure_core_lro]: 
https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/core/azure-core/README.md#long-running-operations-with-pollerfluxt From b1c4affb126325b1672fac65e3193442910453d2 Mon Sep 17 00:00:00 2001 From: aluneth Date: Sat, 25 Apr 2026 20:14:27 +0800 Subject: [PATCH 11/19] refactor: remove setup_samples.sh script and update SKILL.md for environment setup guidance --- .../.github/skills/cu-sdk-sample-run/SKILL.md | 32 +-- .../scripts/setup_samples.sh | 232 ------------------ 2 files changed, 1 insertion(+), 263 deletions(-) delete mode 100644 sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/setup_samples.sh diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md index 27f19ffa5ac9..a2c8a228d13a 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md @@ -138,23 +138,6 @@ mvn install -DskipTests -pl sdk/contentunderstanding/azure-ai-contentunderstandi > - Missing Maven: ensure `mvn -version` works > - Parent POM not found: run `mvn install -DskipTests -f ../../parents/azure-client-sdk-parent/pom.xml` first -
-Alternative: use the setup script (optional) - -The `setup_samples.sh` script automates this — it checks Maven Central first and falls back to a local build: - -```bash -.github/skills/cu-sdk-sample-run/scripts/setup_samples.sh -``` - -Use `--local` to force local build: - -```bash -.github/skills/cu-sdk-sample-run/scripts/setup_samples.sh --local -``` - -
- ### Step 3: Configure Environment Variables > **[ASK USER] Configuration check:** @@ -389,20 +372,7 @@ After the sample completes, the skill **must** do the following for the user (do Helper scripts are provided in `scripts/` as a convenience. They are **not required** — you can always use `mvn exec:java` directly. -### `setup_samples.sh` -- Automated Environment Setup - -Checks Maven Central for the published package, falls back to local build, and creates a `.env` template. - -```bash -# Default: try Maven Central, fall back to local build -.github/skills/cu-sdk-sample-run/scripts/setup_samples.sh - -# Force local build (e.g., testing local changes) -.github/skills/cu-sdk-sample-run/scripts/setup_samples.sh --local - -# Local mode: skip build if already built -.github/skills/cu-sdk-sample-run/scripts/setup_samples.sh --local --skip-build -``` +> **Note:** For first-time environment setup (installing JDK/Maven, building the SDK, creating `.env`), use the `cu-sdk-setup` skill. ### `run_sample.sh` -- Run a Sample with Conveniences diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/setup_samples.sh b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/setup_samples.sh deleted file mode 100644 index 91abb34da038..000000000000 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/setup_samples.sh +++ /dev/null @@ -1,232 +0,0 @@ -#!/usr/bin/env bash -set -euo pipefail -# cspell:ignore esac ENVEOF - -# setup_samples.sh -# Sets up the environment for running Azure AI Content Understanding Java SDK samples. -# This includes: -# 1. Check Java and Maven are installed -# 2. Try resolving the SDK package from Maven Central (if published) -# 3. If not available, fall back to building locally with mvn install -# 4. 
Create a sample .env file if none exists
-#
-# Usage:
-#   setup_samples.sh [--local] [--skip-build]
-# Options:
-#   --local       Force local build (skip Maven Central check)
-#   --skip-build  Skip building even in local mode (assumes already built)
-
-SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-PACKAGE_ROOT="$(cd "$SCRIPT_DIR/../../../.." && pwd)"
-
-GROUP_ID="com.azure"
-ARTIFACT_ID="azure-ai-contentunderstanding"
-# Extract version from pom.xml
-VERSION="$(grep -m1 '<version>' "$PACKAGE_ROOT/pom.xml" | sed 's/.*<version>\(.*\)<\/version>.*/\1/' | sed 's/\s*//g')"
-
-FORCE_LOCAL=0
-SKIP_BUILD=0
-
-# Colors for output
-RED='\033[0;31m'
-GREEN='\033[0;32m'
-YELLOW='\033[1;33m'
-BLUE='\033[0;34m'
-NC='\033[0m'
-
-print_info() { echo -e "${BLUE}$1${NC}"; }
-print_success() { echo -e "${GREEN}$1${NC}"; }
-print_warning() { echo -e "${YELLOW}$1${NC}"; }
-print_error() { echo -e "${RED}$1${NC}"; }
-
-print_help() {
-    cat <<EOF
-Usage: setup_samples.sh [--local] [--skip-build]
-
-Options:
-  --local       Force local build (skip Maven Central check)
-  --skip-build  Skip building even in local mode (assumes already built)
-  -h, --help    Show this help
-EOF
-}
-
-# Parse arguments
-while [[ $# -gt 0 ]]; do
-    case "$1" in
-        --local) FORCE_LOCAL=1 ;;
-        --skip-build) SKIP_BUILD=1 ;;
-        -h|--help) print_help; exit 0 ;;
-        *) print_error "Unknown option: $1"; print_help; exit 1 ;;
-    esac
-    shift
-done
-
-# Check prerequisites
-if ! command -v java &>/dev/null; then
-    print_error "Error: Java is not installed or not on PATH."
-    echo " Install JDK 8 or later: https://learn.microsoft.com/java/openjdk/download"
-    exit 1
-fi
-JAVA_VER="$(java -version 2>&1 | head -1)"
-echo " Java: $JAVA_VER"
-
-if ! command -v mvn &>/dev/null; then
-    print_error "Error: Maven is not installed or not on PATH."
-    echo " Install Maven: https://maven.apache.org/install.html"
-    exit 1
-fi
-MVN_VER="$(mvn -version 2>&1 | head -1)"
-echo " Maven: $MVN_VER"
-
-print_success "✓ Prerequisites OK"
-echo ""
-
-# =========================================
-# Step 1: Install the SDK package
-# =========================================
-# Default: check if published on Maven Central. If available, no build needed
-# (Maven will download it automatically when running samples).
-# If not published, fall back to local build.
-
-check_maven_central() {
-    echo "Step 1: Checking Maven Central for ${GROUP_ID}:${ARTIFACT_ID}:${VERSION}..."
- local group_path="${GROUP_ID//\.//}" - local url="https://repo1.maven.org/maven2/${group_path}/${ARTIFACT_ID}/${VERSION}/${ARTIFACT_ID}-${VERSION}.pom" - - if curl -sf --head "$url" &>/dev/null; then - print_success "✓ Package is available on Maven Central" - echo " Maven will download it automatically when running samples." - return 0 - else - echo " Package not yet published on Maven Central" - return 1 - fi -} - -build_local() { - echo "Step 1: Building package locally..." - cd "$PACKAGE_ROOT" - - if [[ $SKIP_BUILD -eq 1 ]]; then - echo " Skipping build (--skip-build)" - # Verify the artifact exists in local Maven repo - local local_jar="$HOME/.m2/repository/${GROUP_ID//\.//}/${ARTIFACT_ID}/${VERSION}/${ARTIFACT_ID}-${VERSION}.jar" - if [[ -f "$local_jar" ]]; then - print_success "✓ Package found in local Maven repository" - else - print_warning "⚠ Package not found in local Maven repository: $local_jar" - echo " Run without --skip-build to build it." - fi - return 0 - fi - - echo " Building with: mvn install -DskipTests" - if mvn install -DskipTests; then - print_success "✓ Package built and installed to local Maven repository" - else - print_error "Error: Build failed." - echo " Common fixes:" - echo " - Ensure JDK 8+ is installed: java -version" - echo " - Build parent POM first: mvn install -DskipTests -f ../../parents/azure-client-sdk-parent/pom.xml" - exit 1 - fi -} - -if [[ $FORCE_LOCAL -eq 1 ]]; then - echo "(--local flag: skipping Maven Central check, building locally)" - echo "" - build_local -else - if ! check_maven_central; then - echo " Falling back to local build..." - echo "" - build_local - fi -fi -echo "" - -# ========================================= -# Step 2: Create .env file if needed -# ========================================= -echo "Step 2: Checking .env file..." -ENV_FILE="$PACKAGE_ROOT/.env" - -if [[ -f "$ENV_FILE" ]]; then - print_success "✓ .env file already exists at $ENV_FILE" -else - print_info "Creating sample .env file..." 
- cat > "$ENV_FILE" <<'ENVEOF' -# Azure AI Content Understanding - Environment Variables -# Fill in your values below. See SKILL.md for details. - -# Required: Your Microsoft Foundry resource endpoint -CONTENTUNDERSTANDING_ENDPOINT=https://your-foundry.services.ai.azure.com/ - -# Optional: API key (leave empty to use DefaultAzureCredential via az login) -CONTENTUNDERSTANDING_KEY= - -# Model deployment names (used by Sample00_UpdateDefaults) -GPT_4_1_DEPLOYMENT=gpt-4.1 -GPT_4_1_MINI_DEPLOYMENT=gpt-4.1-mini -TEXT_EMBEDDING_3_LARGE_DEPLOYMENT=text-embedding-3-large - -# Cross-resource copy (only needed for Sample15_GrantCopyAuth) -# CONTENTUNDERSTANDING_TARGET_ENDPOINT=https://your-target-foundry.services.ai.azure.com/ -# CONTENTUNDERSTANDING_TARGET_RESOURCE_ID=/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.CognitiveServices/accounts/{targetAccountName} -ENVEOF - print_success "✓ Created .env file at $ENV_FILE" - print_warning "⚠ Please edit $ENV_FILE and fill in your actual values before running samples" -fi -echo "" - -echo "=========================================" -echo "✓ Setup complete!" -echo "=========================================" -echo "" -echo "Next steps:" -echo " 1. Edit .env with your endpoint and credentials (if not done already):" -echo " $ENV_FILE" -echo "" -echo " 2. 
Run a sample:" -echo " cd $PACKAGE_ROOT" -echo " .github/skills/cu-sdk-sample-run/scripts/run_sample.sh Sample02_AnalyzeUrl --env .env" -echo "" -echo " Or export variables manually and use Maven directly:" -echo " export CONTENTUNDERSTANDING_ENDPOINT=\"https://...\"" -echo " mvn exec:java -Dexec.mainClass=\"com.azure.ai.contentunderstanding.samples.Sample02_AnalyzeUrl\"" -echo "" From 997e2bcd67c99a0c405ee643f8fc4bd9a36ecb42 Mon Sep 17 00:00:00 2001 From: aluneth Date: Sun, 26 Apr 2026 15:26:26 +0800 Subject: [PATCH 12/19] feat: add Sample07_ListAnalyzers to SKILL.md for analyzer enumeration --- .../.github/skills/cu-sdk-sample-run/SKILL.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md index a2c8a228d13a..9c3b22c0c40a 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md @@ -84,6 +84,10 @@ Builds analyzers with training labels (labeled data from Azure Blob Storage). #### `Sample06_GetAnalyzer` Retrieves analyzer details and configuration. +#### `Sample07_ListAnalyzers` +Lists all analyzers in the Content Understanding resource. +- Key concepts: Paginated listing, analyzer enumeration + #### `Sample08_UpdateAnalyzer` Updates analyzer description and tags. From efce265ea96809d97ca170c3386a18f47863924d Mon Sep 17 00:00:00 2001 From: Changjian Wang Date: Fri, 8 May 2026 10:26:02 +0800 Subject: [PATCH 13/19] Address PR review feedback for cu-sdk-sample-run skill - run_sample.sh: parse .env without eval; only accept NAME=VALUE assignments and warn on malformed lines, removing the command-injection risk. 
- run_sample.sh: run 'mvn -DskipTests test-compile exec:java' so sample classes are compiled on a clean checkout before being executed. - SKILL.md: soften the 'must build from repo root' wording; building from the package directory works for most users since the parent POM resolves via relativePath and dependencies come from published artifacts. --- .../.github/skills/cu-sdk-sample-run/SKILL.md | 4 +- .../cu-sdk-sample-run/scripts/run_sample.sh | 41 +++++++++++++++---- 2 files changed, 34 insertions(+), 11 deletions(-) diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md index 9c3b22c0c40a..5263f57f5b1d 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md @@ -123,14 +123,14 @@ cd sdk/contentunderstanding/azure-ai-contentunderstanding The SDK package must be available for Maven to resolve. It will be published to **Maven Central** — if it's already available there, Maven will download it automatically and you can **skip this step**. -If the package is **not yet published** (or you want to test local changes), build and install it to your local Maven repository. **Run from the azure-sdk-for-java repo root:** +If the package is **not yet published** (or you want to test local changes), build and install it to your local Maven repository. The recommended command (run from the azure-sdk-for-java repo root) is: ```bash cd ~/repos/azure-sdk-for-java # or wherever you cloned the repo mvn install -DskipTests -pl sdk/contentunderstanding/azure-ai-contentunderstanding -am ``` -> **Important:** You must build from the repo root with `-pl` and `-am` flags. 
Building from within the package directory will fail because in-repo dependencies cannot be resolved without the `-am` (also-make) flag. +> **Tip:** Building from the repo root with `-pl ... -am` is preferred when you are contributing across modules or testing in-repo dependency changes (e.g., a local `azure-core` patch). For most users, `mvn install -DskipTests` from within `sdk/contentunderstanding/azure-ai-contentunderstanding` also works, since this module's parent POM is resolved via `relativePath` and its runtime dependencies (e.g., `azure-core`) come from published artifacts. > **[ASK USER] Build check:** > Ask: "Is the package already published on Maven Central, or do you need to build locally?" diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh index 7d1f21bb7a60..261434d27a3e 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh @@ -84,7 +84,14 @@ list_samples() { echo "" } -# Load environment variables from a .env file +# Load environment variables from a .env file. +# +# Only simple NAME=VALUE assignments are accepted (with an optional leading +# `export `). Names must be valid shell identifiers ([A-Za-z_][A-Za-z0-9_]*). +# A single matching pair of surrounding double or single quotes is stripped +# from the value. Anything else is skipped with a warning. We deliberately +# avoid `eval` so a malicious or malformed .env file cannot execute arbitrary +# commands or trigger command substitution. load_env_file() { local envfile="$1" if [[ ! 
-f "$envfile" ]]; then @@ -92,15 +99,29 @@ load_env_file() { exit 1 fi print_info "Loading environment variables from: $envfile" - set -o allexport - # Read .env, skip comments and blank lines + local line name value lineno=0 while IFS= read -r line || [[ -n "$line" ]]; do + lineno=$((lineno + 1)) # Skip empty lines and comments [[ -z "$line" || "$line" =~ ^[[:space:]]*# ]] && continue - # Remove surrounding quotes from values - eval "export $line" 2>/dev/null || true + # Strip optional leading `export ` (with surrounding whitespace) + line="${line#"${line%%[![:space:]]*}"}" # ltrim + line="${line#export }" + # Require NAME=VALUE with a valid identifier on the left + if [[ ! "$line" =~ ^([A-Za-z_][A-Za-z0-9_]*)=(.*)$ ]]; then + print_warning " Skipping line $lineno (not a NAME=VALUE assignment)" + continue + fi + name="${BASH_REMATCH[1]}" + value="${BASH_REMATCH[2]}" + # Strip a single matching pair of surrounding double or single quotes + if [[ "$value" =~ ^\"(.*)\"$ ]]; then + value="${BASH_REMATCH[1]}" + elif [[ "$value" =~ ^\'(.*)\'$ ]]; then + value="${BASH_REMATCH[1]}" + fi + export "$name=$value" done < "$envfile" - set +o allexport print_success "✓ Environment variables loaded" } @@ -196,8 +217,10 @@ if [[ -z "${CONTENTUNDERSTANDING_ENDPOINT:-}" ]]; then echo "" fi -# Build command -MVN_CMD="mvn exec:java -Dexec.mainClass=\"${FULL_CLASS}\" -Dexec.classpathScope=test" +# Build command. Sample classes live under src/samples/java and are compiled +# as test sources, so we must run test-compile before exec:java; otherwise on +# a clean checkout the sample class will not exist on the classpath. 
+MVN_CMD="mvn -DskipTests test-compile exec:java -Dexec.mainClass=\"${FULL_CLASS}\" -Dexec.classpathScope=test" if [[ $DRY_RUN -eq 1 ]]; then echo "DRY RUN: would execute:" @@ -210,7 +233,7 @@ fi # Run the sample print_info "Running: $SAMPLE_NAME" echo "" -mvn exec:java -Dexec.mainClass="${FULL_CLASS}" -Dexec.classpathScope=test +mvn -DskipTests test-compile exec:java -Dexec.mainClass="${FULL_CLASS}" -Dexec.classpathScope=test echo "" print_success "✓ Sample completed: $SAMPLE_NAME" From d83299ab7d2e04666726d98935e892e549944e19 Mon Sep 17 00:00:00 2001 From: Changjian Wang Date: Fri, 8 May 2026 11:01:22 +0800 Subject: [PATCH 14/19] Fix cspell error in run_sample.sh Replace 'ltrim' inline comment with 'strip leading whitespace' so cspell no longer flags it as an unknown word. --- .../.github/skills/cu-sdk-sample-run/scripts/run_sample.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh index 261434d27a3e..75cf05210e37 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh @@ -105,7 +105,7 @@ load_env_file() { # Skip empty lines and comments [[ -z "$line" || "$line" =~ ^[[:space:]]*# ]] && continue # Strip optional leading `export ` (with surrounding whitespace) - line="${line#"${line%%[![:space:]]*}"}" # ltrim + line="${line#"${line%%[![:space:]]*}"}" # strip leading whitespace line="${line#export }" # Require NAME=VALUE with a valid identifier on the left if [[ ! 
"$line" =~ ^([A-Za-z_][A-Za-z0-9_]*)=(.*)$ ]]; then From 3fa032154bfdbb37325e19adc00cf2d2d5c096ab Mon Sep 17 00:00:00 2001 From: Changjian Wang Date: Fri, 8 May 2026 15:34:27 +0800 Subject: [PATCH 15/19] [Java CU SDK] Align cu-sdk-sample-run skill structure with Python; add Sample16 coverage - Restructure sample-run SKILL: 4 -> 6 numbered steps (Navigate, Build, Configure Env, Choose Sample, Configure Sample-Specific Settings, Run) - Add Sample16_CreateAnalyzerWithLabels coverage: explicit Manual upload steps, training data file inventory (2 receipts x 3 files), 5-question ASK USER flow, no-training fallback, troubleshooting rows - Add Sample_Advanced_ToLlmInput entry under Advanced Helpers - Settings table: add Sample15 cross-resource env vars (SOURCE_RESOURCE_ID, SOURCE_REGION, TARGET_REGION, TARGET_KEY) so it matches what Sample15_GrantCopyAuth.java actually reads - Sample16 section: clarify SAS prefix trailing slash behavior; note Java parity with Python (Option A only, no auto-upload); Portal navigation hint - cu-sdk-setup: add ASK USER for Sample16 training data and add SAS URL/PREFIX template add-on --- .../.github/skills/cu-sdk-sample-run/SKILL.md | 170 ++++++++++++++---- .../.github/skills/cu-sdk-setup/SKILL.md | 20 +++ 2 files changed, 157 insertions(+), 33 deletions(-) diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md index 5263f57f5b1d..34ac9456f4a9 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md @@ -23,6 +23,7 @@ Run a specific sample from the Azure AI Content Understanding Java SDK. > 2. "Have you **built the SDK** or is it available on Maven Central?" -- If no, direct them to Step 2 below. > 3. 
"Have you configured your **environment variables** (endpoint and credentials)?" -- If no, direct them to Step 3. > 4. "Have you run `Sample00_UpdateDefaults` to configure model defaults?" -- If no and they want to use prebuilt analyzers, guide them to run it first. +> 5. *(Deferred — only if the user later picks `Sample16_CreateAnalyzerWithLabels`.)* "Do you plan to **train with labeled data**? If yes, you'll need an Azure Blob container with the receipt label files uploaded and a SAS URL." Walk them through Step 5's Sample16 subsection when relevant. ## Package Directory @@ -76,8 +77,10 @@ Creates classifier to categorize documents (Loan_Application, Invoice, Bank_Stat - Key concepts: Content categories, segmentation, document routing #### `Sample16_CreateAnalyzerWithLabels` -Builds analyzers with training labels (labeled data from Azure Blob Storage). -- Key concepts: Labeled data, knowledge sources, Blob Storage SAS URIs +Builds an analyzer using **labeled training data** loaded from Azure Blob Storage. The repo ships labeled receipt data at `src/samples/resources/receipt_labels/` (`*.jpg`, `*.jpg.labels.json`, optional `*.jpg.result.json`). +- Key concepts: `LabeledDataKnowledgeSource`, knowledge sources on `ContentAnalyzerConfig`, container SAS URLs, optional path prefix, falls back to creating analyzer **without** training data if SAS URL is unset +- Requires either: (a) a SAS URL for an Azure Blob container with labeled data uploaded, or (b) accepting that no training data is used +- For an easier labeling workflow, use [Azure AI Content Understanding Studio](https://contentunderstanding.ai.azure.com/) ### Analyzer Management @@ -99,7 +102,7 @@ Copies analyzer within the same resource. #### `Sample15_GrantCopyAuth` Cross-resource copying between different Azure resources/regions. 
-- Requires additional env vars: `CONTENTUNDERSTANDING_TARGET_ENDPOINT`, `CONTENTUNDERSTANDING_TARGET_RESOURCE_ID` +- Requires additional env vars: `CONTENTUNDERSTANDING_SOURCE_RESOURCE_ID`, `CONTENTUNDERSTANDING_SOURCE_REGION`, `CONTENTUNDERSTANDING_TARGET_ENDPOINT`, `CONTENTUNDERSTANDING_TARGET_RESOURCE_ID`, `CONTENTUNDERSTANDING_TARGET_REGION`, `CONTENTUNDERSTANDING_TARGET_KEY` (optional) ### Result Management @@ -111,6 +114,12 @@ Retrieves keyframe images from video analysis. Deletes analysis results for data cleanup. - Key concepts: Result retention (24-hour auto-deletion), compliance +### Advanced Helpers + +#### `Sample_Advanced_ToLlmInput` +Advanced usage of the `LlmInputHelper.toLlmInput` helper that converts an `AnalysisResult` into LLM-ready text. For introductory usage, see `Sample01_AnalyzeBinary`, `Sample03_AnalyzeInvoice`, and `Sample05_CreateClassifier`. +- Key concepts: `ToLlmInputOptions`, content ranges, multi-modal flattening, prompt-friendly formatting + ## Workflow ### Step 1: Navigate to Package Directory @@ -215,23 +224,72 @@ $env:TEXT_EMBEDDING_3_LARGE_DEPLOYMENT = "text-embedding-3-large" > **[ASK USER] Authentication method:** > Ask the user: "How would you like to **authenticate** with Azure?" > - **Option A: DefaultAzureCredential (recommended)** — Uses `az login` or managed identity. No API key needed. Make sure you have run `az login`. -> - **Option B: API Key** — Provide your `CONTENTUNDERSTANDING_KEY` from the Azure Portal → Keys and Endpoint → Key1 or Key2. +> - **Option B: API Key** — Provide your `CONTENTUNDERSTANDING_KEY` from the Azure Portal → Keys and Endpoint → Key1 or Key2. Update `.env` so `CONTENTUNDERSTANDING_KEY=` (replace the empty default). > **[ASK USER] Confirm env vars:** > After the user sets their variables, ask: "Does this configuration look correct?" Wait for confirmation before proceeding. 
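The defensive `.env` parsing that `run_sample.sh` performs (skip blanks and comments, strip leading whitespace and an optional `export `, accept only lines whose left-hand side is a valid shell identifier) can be restated as a standalone sketch. `load_env_line` is an illustrative name, not a function in the repo:

```shell
#!/usr/bin/env bash
# Illustrative restatement of the .env parsing run_sample.sh performs:
# skip blanks/comments, strip leading whitespace and an optional "export ",
# and accept only lines whose left-hand side is a valid shell identifier.
load_env_line() {
  local line="$1"
  if [[ -z "$line" || "$line" =~ ^[[:space:]]*# ]]; then
    return 1  # blank line or comment: nothing to load
  fi
  line="${line#"${line%%[![:space:]]*}"}"  # strip leading whitespace
  line="${line#export }"                   # tolerate "export NAME=VALUE"
  if [[ ! "$line" =~ ^([A-Za-z_][A-Za-z0-9_]*)=(.*)$ ]]; then
    return 1  # reject malformed names such as "1BAD=..."
  fi
  printf '%s=%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
}

# Prints CONTENTUNDERSTANDING_ENDPOINT=https://my-resource.services.ai.azure.com/
load_env_line 'export CONTENTUNDERSTANDING_ENDPOINT=https://my-resource.services.ai.azure.com/'
```

Because parsing is explicit rather than `source`-based, a malformed `.env` line is skipped instead of being executed as shell code.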
-#### Settings by sample +### Step 4: Choose the Sample + +> **[ASK USER] Which sample?:** +> Ask the user: "Which sample would you like to run?" with options: +> - `Sample00_UpdateDefaults` — Configure model defaults (one-time setup, required first) +> - `Sample02_AnalyzeUrl` — Analyze content from a URL (recommended for first-time users) +> - `Sample01_AnalyzeBinary` — Analyze a local PDF/image file +> - `Sample03_AnalyzeInvoice` — Extract structured fields from an invoice +> - `Sample04_CreateAnalyzer` — Create a custom analyzer +> - `Sample16_CreateAnalyzerWithLabels` — Create an analyzer with labeled training data +> - Other — Let me see the full list + +> **[ASK USER] Sync or async?:** +> Ask: "Would you like to run the **sync** or **async** version of this sample?" +> - Sync (default) — e.g., `Sample02_AnalyzeUrl` +> - Async — e.g., `Sample02_AnalyzeUrlAsync` -| Setting | Required By | Description | -| ----------------------------------- | ---------------------- | ------------------------------------------------------------------------------------------------------------ | -| `CONTENTUNDERSTANDING_ENDPOINT` | **All samples** | Your Microsoft Foundry resource endpoint URL | -| `CONTENTUNDERSTANDING_KEY` | All samples (optional) | API key for key-based auth. If empty, `DefaultAzureCredential` is used (recommended — run `az login` first) | -| `GPT_4_1_DEPLOYMENT` | Sample00_UpdateDefaults| Deployment name for gpt-4.1 model (default: `gpt-4.1`) | -| `GPT_4_1_MINI_DEPLOYMENT` | Sample00_UpdateDefaults| Deployment name for gpt-4.1-mini model (default: `gpt-4.1-mini`) | -| `TEXT_EMBEDDING_3_LARGE_DEPLOYMENT` | Sample00_UpdateDefaults| Deployment name for text-embedding-3-large model (default: `text-embedding-3-large`) | +### Step 5: Configure Sample-Specific Settings + +Most samples only need the base environment variables from Step 3. The following samples require **additional configuration** before running. 
+ +> **[ASK USER] Sample-specific config:** +> Based on the sample chosen in Step 4, walk the user through the matching subsection below: +> - **Prebuilt-analyzer samples** — `Sample02_AnalyzeUrl`, `Sample01_AnalyzeBinary`, `Sample03_AnalyzeInvoice`, `Sample10_AnalyzeConfigs`, `Sample11_AnalyzeReturnRawJson`, `Sample12_GetResultFile`, `Sample13_DeleteResult` → "Have you run `Sample00_UpdateDefaults`?" subsection +> - `Sample01_AnalyzeBinary`, `Sample10_AnalyzeConfigs` → also "Samples that need a local file" subsection +> - `Sample15_GrantCopyAuth` → "Sample15_GrantCopyAuth cross-resource environment" subsection +> - `Sample16_CreateAnalyzerWithLabels` → "Sample16_CreateAnalyzerWithLabels training data" subsection +> - `Sample00_UpdateDefaults` — sets up the model defaults itself; only the base env vars from Step 3 are needed +> - Custom-analyzer samples (`Sample04_CreateAnalyzer`, `Sample05_CreateClassifier`) and management samples (`Sample06`–`Sample09`, `Sample14`) — only the base env vars from Step 3 are needed +> +> If none apply, proceed directly to Step 6. + +#### Settings by sample -| `CONTENTUNDERSTANDING_TARGET_ENDPOINT` | Sample15_GrantCopyAuth | Target Foundry resource endpoint for cross-resource copy | -| `CONTENTUNDERSTANDING_TARGET_RESOURCE_ID` | Sample15_GrantCopyAuth | Target ARM resource ID for cross-resource copy | +| Setting | Required By | Description | +| -------------------------------------------- | --------------------------------- | ------------------------------------------------------------------------------------------------------------ | +| `CONTENTUNDERSTANDING_ENDPOINT` | **All samples** | Your Microsoft Foundry resource endpoint URL | +| `CONTENTUNDERSTANDING_KEY` | All samples (optional) | API key for key-based auth. 
If empty, `DefaultAzureCredential` is used (recommended — run `az login` first) | +| `GPT_4_1_DEPLOYMENT` | Sample00_UpdateDefaults | Deployment name for gpt-4.1 model (default: `gpt-4.1`) | +| `GPT_4_1_MINI_DEPLOYMENT` | Sample00_UpdateDefaults | Deployment name for gpt-4.1-mini model (default: `gpt-4.1-mini`) | +| `TEXT_EMBEDDING_3_LARGE_DEPLOYMENT` | Sample00_UpdateDefaults | Deployment name for text-embedding-3-large model (default: `text-embedding-3-large`) | +| `CONTENTUNDERSTANDING_SOURCE_RESOURCE_ID` | Sample15_GrantCopyAuth | Source ARM resource ID for cross-resource copy | +| `CONTENTUNDERSTANDING_SOURCE_REGION` | Sample15_GrantCopyAuth | Region of the source Foundry resource (e.g., `westus`) | +| `CONTENTUNDERSTANDING_TARGET_ENDPOINT` | Sample15_GrantCopyAuth | Target Foundry resource endpoint for cross-resource copy | +| `CONTENTUNDERSTANDING_TARGET_RESOURCE_ID` | Sample15_GrantCopyAuth | Target ARM resource ID for cross-resource copy | +| `CONTENTUNDERSTANDING_TARGET_REGION` | Sample15_GrantCopyAuth | Region of the target Foundry resource (e.g., `eastus`) | +| `CONTENTUNDERSTANDING_TARGET_KEY` | Sample15_GrantCopyAuth (optional) | API key for the target resource. If empty, `DefaultAzureCredential` is used | +| `CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL` | Sample16_CreateAnalyzerWithLabels | Optional SAS URL for the Azure Blob container with labeled training data. If unset, the analyzer is created **without** training data | +| `CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX` | Sample16_CreateAnalyzerWithLabels | Optional path prefix within the container (e.g., `receipt_labels/`). Omit if files are at the container root | + +#### Have you run `Sample00_UpdateDefaults`? + +Most samples that use prebuilt analyzers (e.g., `Sample02_AnalyzeUrl`, `Sample03_AnalyzeInvoice`, `Sample10_AnalyzeConfigs`, `Sample11_AnalyzeReturnRawJson`) require model deployments to be configured. 
`Sample00_UpdateDefaults` writes a one-time mapping from logical model names (gpt-4.1, gpt-4.1-mini, text-embedding-3-large) to your Foundry resource's actual deployment names. Without it, prebuilt analyzers fail with `Model deployment not found`. + +> **[ASK USER] Update defaults check:** +> Ask: "Have you previously run `Sample00_UpdateDefaults` for this Foundry resource?" +> - If yes: Continue to the next subsection (or Step 6 if none apply). +> - If no and the chosen sample uses prebuilt analyzers: +> 1. Run `Sample00_UpdateDefaults` now using the command in Step 6: `mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample00_UpdateDefaults" -Dexec.classpathScope=test` +> 2. Wait for it to print success. +> 3. Then come back to Step 4, re-select the **original** sample the user wanted, and continue from Step 5. #### Samples that need a local file @@ -247,34 +305,74 @@ The `Sample01_AnalyzeBinary` and `Sample10_AnalyzeConfigs` samples load a local The `Sample15_GrantCopyAuth` sample requires **two separate Microsoft Foundry resources** (source and target). 
-Add the following environment variables: +Add the following environment variables to your `.env` file: -```bash -export CONTENTUNDERSTANDING_TARGET_ENDPOINT="https://your-target-foundry.services.ai.azure.com/" -export CONTENTUNDERSTANDING_TARGET_RESOURCE_ID="/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.CognitiveServices/accounts/{targetAccountName}" ``` +CONTENTUNDERSTANDING_SOURCE_RESOURCE_ID=/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.CognitiveServices/accounts/{sourceAccountName} +CONTENTUNDERSTANDING_SOURCE_REGION=westus +CONTENTUNDERSTANDING_TARGET_ENDPOINT=https://your-target-foundry.services.ai.azure.com/ +CONTENTUNDERSTANDING_TARGET_RESOURCE_ID=/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.CognitiveServices/accounts/{targetAccountName} +CONTENTUNDERSTANDING_TARGET_REGION=eastus +# Optional — only if you want key-based auth for the target resource: +# CONTENTUNDERSTANDING_TARGET_KEY= +``` + +Then reload your shell: `set -a && source .env && set +a`. > **[ASK USER] Cross-resource setup (Sample15_GrantCopyAuth only):** > If the user chose Sample15_GrantCopyAuth, ask: > 1. "Do you have **two separate Microsoft Foundry resources** (source and target) set up?" — If no, guide them to create a second resource. -> 2. "Please provide the **target** resource endpoint URL and ARM Resource ID." -> 3. Confirm: "Both resources must have the **Cognitive Services User** role assigned if using `DefaultAzureCredential`. Is this configured?" +> 2. "Please provide the **source** ARM Resource ID and region, and the **target** endpoint URL, ARM Resource ID, and region." +> 3. "Will you authenticate the target resource with `DefaultAzureCredential` (recommended) or with `CONTENTUNDERSTANDING_TARGET_KEY`?" +> 4. Confirm: "Both resources must have the **Cognitive Services User** role assigned if using `DefaultAzureCredential`. Is this configured?" 
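A minimal pre-flight check over the Sample15 variables above can be sketched in bash. `check_sample15_env` is a hypothetical helper, not part of `run_sample.sh`, and it deliberately skips the optional `CONTENTUNDERSTANDING_TARGET_KEY`:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of run_sample.sh): verify every
# required Sample15_GrantCopyAuth variable is set. CONTENTUNDERSTANDING_TARGET_KEY
# is intentionally not checked because it is optional (DefaultAzureCredential
# is used when it is empty).
check_sample15_env() {
  local missing=0 v
  for v in CONTENTUNDERSTANDING_SOURCE_RESOURCE_ID \
           CONTENTUNDERSTANDING_SOURCE_REGION \
           CONTENTUNDERSTANDING_TARGET_ENDPOINT \
           CONTENTUNDERSTANDING_TARGET_RESOURCE_ID \
           CONTENTUNDERSTANDING_TARGET_REGION; do
    if [ -z "${!v:-}" ]; then
      echo "missing: $v"
      missing=1
    fi
  done
  return "$missing"
}

check_sample15_env || echo "Set the variables above in .env before running Sample15_GrantCopyAuth."
```

Running this before `mvn exec:java` turns a mid-sample authentication failure into an immediate, named list of missing variables.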
-### Step 4: Choose and Run the Sample +#### Setting up Sample16_CreateAnalyzerWithLabels training data -> **[ASK USER] Which sample?:** -> Ask the user: "Which sample would you like to run?" with options: -> - `Sample00_UpdateDefaults` — Configure model defaults (one-time setup, required first) -> - `Sample02_AnalyzeUrl` — Analyze content from a URL (recommended for first-time users) -> - `Sample01_AnalyzeBinary` — Analyze a local PDF/image file -> - `Sample03_AnalyzeInvoice` — Extract structured fields from an invoice -> - `Sample04_CreateAnalyzer` — Create a custom analyzer -> - Other — Let me see the full list +The `Sample16_CreateAnalyzerWithLabels` sample creates an analyzer with **labeled training data** loaded from Azure Blob Storage via a SAS URL. -> **[ASK USER] Sync or async?:** -> Ask: "Would you like to run the **sync** or **async** version of this sample?" -> - Sync (default) — e.g., `Sample02_AnalyzeUrl` -> - Async — e.g., `Sample02_AnalyzeUrlAsync` +> **Note (Java vs. Python parity):** The Java sample only supports providing a pre-uploaded SAS URL ("Option A"). Unlike the Python equivalent (`sample_create_analyzer_with_labels.py`), the Java sample does **not** auto-upload local files using `DefaultAzureCredential`. You must upload the labeled receipts manually before running. + +> **Note:** If `CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL` is **not set**, the sample still runs but creates an analyzer **without** training data. To exercise the labeled-data path, follow the steps below. + +The repo ships labeled receipt training data at `src/samples/resources/receipt_labels/`. 
Two labeled receipts are included; each receipt has three associated files: + +``` +17a84146-e910-460c-bf80-a625e6f64fea.jpg # original image +17a84146-e910-460c-bf80-a625e6f64fea.jpg.labels.json # labeled fields (required) +17a84146-e910-460c-bf80-a625e6f64fea.jpg.result.json # OCR result (optional) +29d60394-3da1-4714-abdc-ff0993009872.jpg +29d60394-3da1-4714-abdc-ff0993009872.jpg.labels.json +29d60394-3da1-4714-abdc-ff0993009872.jpg.result.json +``` + +Upload these into an Azure Blob container and provide a SAS URL. + +> **Manual upload steps:** +> 1. Create an Azure Blob Storage container (or use an existing one). +> 2. Upload **all** files from `src/samples/resources/receipt_labels/` (the `.jpg`, `.jpg.labels.json`, and optional `.jpg.result.json` files listed above) into the container. You may upload them at the container root or inside a subfolder (e.g., `receipt_labels/`). +> 3. In Azure Portal: open the storage account, then either navigate Storage account → Containers → your container → **Shared access tokens**, or use the Portal search bar to find "Shared access tokens" (the exact UI path varies by Portal version). Set an expiry, grant at least **List** and **Read** permissions, then generate the SAS URL. +> 4. Add the SAS URL to your `.env` file: +> ``` +> CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL=https://.blob.core.windows.net/?sv=...&se=... +> # Only if you uploaded into a subfolder: +> CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX=receipt_labels/ +> ``` +> *(Both `receipt_labels` and `receipt_labels/` work as prefix values — the SDK handles the trailing slash either way.)* +> 5. Reload your shell: `set -a && source .env && set +a`. + +> **[ASK USER] Sample16 training data (Sample16_CreateAnalyzerWithLabels only):** +> If the user chose `Sample16_CreateAnalyzerWithLabels`, ask: +> 1. "Do you want to **train with labeled data** (recommended) or **create the analyzer without training data**?" 
+> - If **without training data**: **Leave `CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL` empty or unset in `.env`** (this is the implicit switch). Skip the next questions and proceed to Step 6 — the sample will still run. +> - If **with training data**: Continue. +> 2. "Have you uploaded the contents of `src/samples/resources/receipt_labels/` to an Azure Blob container and generated a SAS URL?" — If no, walk them through the manual upload steps above. +> 3. "Did you upload the files at the **container root** or inside a **subfolder**?" +> - If root: leave `CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX` unset. +> - If subfolder: ask for the prefix path (e.g., `receipt_labels/`). +> 4. "Please provide the **SAS URL**." +> 5. Confirm: "The SAS token must have at least **List** and **Read** permissions and must **not be expired**." + +### Step 6: Run the Sample Run the sample with Maven directly: @@ -297,6 +395,9 @@ mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample # Run invoice extraction mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample03_AnalyzeInvoice" -Dexec.classpathScope=test + +# Run analyzer with labeled training data +mvn exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.Sample16_CreateAnalyzerWithLabels" -Dexec.classpathScope=test ``` > **Note:** The `-Dexec.classpathScope=test` flag is **required**. Samples live in `src/samples/`, which is compiled as a test source root — not part of the main classpath. This is an Azure SDK for Java convention: samples are not shipped in the published JAR, and they depend on test-scoped dependencies (e.g., `azure-identity`). Without this flag, Maven cannot find the sample classes and will fail with `ClassNotFoundException`. 
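The convention above can be captured in a throwaway helper that always emits the required flags, so neither `test-compile` nor `-Dexec.classpathScope=test` can be forgotten. `build_sample_cmd` is illustrative, not a script in the repo:

```shell
#!/usr/bin/env bash
# Illustrative helper (not a script in the repo): build the exec:java command
# for a sample class with the required flags baked in. test-compile is
# included because samples live in the test source root.
build_sample_cmd() {
  local sample="$1"
  printf 'mvn -DskipTests test-compile exec:java -Dexec.mainClass="com.azure.ai.contentunderstanding.samples.%s" -Dexec.classpathScope=test\n' "$sample"
}

build_sample_cmd "Sample02_AnalyzeUrl"
```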
@@ -408,6 +509,9 @@ Wraps `mvn exec:java` with sample name resolution, validation, and optional `.en | `FileNotFoundException` for binary samples | Run samples from the package root directory (`sdk/contentunderstanding/azure-ai-contentunderstanding`) | | `Parent POM not resolved` | Run `mvn install -DskipTests -f ../../parents/azure-client-sdk-parent/pom.xml` first | | `Permission denied` when running scripts | Make scripts executable: `chmod +x .github/skills/cu-sdk-sample-run/scripts/*.sh` | +| Sample16: `AuthenticationFailed` / `403` reading training data | The SAS URL is invalid, expired, or missing required permissions. Regenerate the SAS with at least **List** and **Read** and a fresh expiry, then re-source `.env` | +| Sample16: `BlobNotFound` or empty training set | The `CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX` does not match where you uploaded the files. Either upload files at the container root and unset the prefix, or set the prefix to the actual subfolder (e.g., `receipt_labels/`) | +| Sample16: created analyzer has no training data | `CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL` was empty when the sample ran. Set it in `.env`, re-run `set -a && source .env && set +a`, then re-run the sample | ## Related Skills diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md index 52be6f6748e4..2dcfcc91bc85 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-setup/SKILL.md @@ -246,6 +246,13 @@ if (Test-Path ".env") { > 5. Target ARM resource ID (same format as above, for the target Foundry resource) > 6. Target region (e.g., `swedencentral`) +> **[ASK USER] Labeled training data (optional):** +> Ask: "Do you plan to use **labeled training data** (`Sample16_CreateAnalyzerWithLabels`)?" 
+> - If no: Skip this section. The sample still runs but creates an analyzer **without** training data. +> - If yes: Gather the following additional values: +> 1. SAS URL for the Azure Blob container that holds your uploaded label files (full URL including the `?sv=...&se=...` query). The repo ships labeled receipts at `src/samples/resources/receipt_labels/` — you upload these into a container, then generate a SAS with at least **List** and **Read** permissions (Azure Portal → Storage account → Containers → your container → **Shared access tokens**). +> 2. (Optional) Path prefix within the container (e.g., `receipt_labels/`). Leave empty if files sit at the container root. + #### 4.3 Validate Configuration > **[ASK USER] Validate configuration:** @@ -310,6 +317,19 @@ CONTENTUNDERSTANDING_TARGET_RESOURCE_ID=/subscriptions/{subscriptionId}/resource CONTENTUNDERSTANDING_TARGET_REGION=swedencentral ``` +**Optional add-on for `Sample16_CreateAnalyzerWithLabels`:** + +Append the following lines to your `.env` if you want Sample16 to train with labeled data. If unset, the sample still runs but creates an analyzer **without** training data. + +```bash +# Labeled training data (only for Sample16_CreateAnalyzerWithLabels) +# Full container SAS URL (must include ?sv=...&se=...). Required for labeled training. +CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL=https://.blob.core.windows.net/?sv=...&se=... + +# Optional path prefix within the container. Omit if files are at the container root. +CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX=receipt_labels/ +``` + ### Step 5: Azure Resource Setup (if not done) > **[NOTE]:** Only guide the user through this step if they indicated during the prerequisites check that they do NOT yet have a Microsoft Foundry resource. Otherwise, skip to Step 6. 
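Since both `receipt_labels` and `receipt_labels/` are accepted as prefix values, the optional prefix gathered above can be normalized to one canonical form before it is written to `.env`. This is a hypothetical helper, not SDK code:

```shell
#!/usr/bin/env bash
# Hypothetical normalization helper (not SDK code): accept a prefix with or
# without a trailing slash and emit the canonical form. An empty prefix stays
# empty, meaning the label files sit at the container root.
normalize_prefix() {
  local p="${1:-}"
  p="${p%/}"                           # drop a trailing slash if present
  if [ -n "$p" ]; then p="${p}/"; fi   # non-empty prefixes end with one slash
  printf '%s' "$p"
}

normalize_prefix "receipt_labels"   # prints receipt_labels/
echo
```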
From c29be382c5e1cc6fa20938631b7e711513b97921 Mon Sep 17 00:00:00 2001 From: Changjian Wang Date: Sat, 9 May 2026 10:30:45 +0800 Subject: [PATCH 16/19] Enhance Sample16_CreateAnalyzerWithLabelsAsync and related tests for improved training data handling - Updated the sample to support two options for configuring training data: - Option A: Use a pre-generated SAS URL for Azure Blob Storage. - Option B: Automatically upload local label files and generate a User Delegation SAS URL using DefaultAzureCredential. - Renamed "Total" field to "TotalPrice" in the field schema for consistency. - Improved documentation to clarify the setup process for training data. - Enhanced test cases to reflect changes in the sample and ensure proper functionality with both training data options. Co-authored-by: Copilot --- .../.github/skills/cu-sdk-sample-run/SKILL.md | 66 +++-- .../cu-sdk-sample-run/scripts/run_sample.sh | 20 ++ .../CHANGELOG.md | 2 + .../azure-ai-contentunderstanding/assets.json | 2 +- .../azure-ai-contentunderstanding/pom.xml | 12 + .../Sample16_CreateAnalyzerWithLabels.java | 275 +++++++++++++++--- ...ample16_CreateAnalyzerWithLabelsAsync.java | 267 ++++++++++++++--- ...e16_CreateAnalyzerWithLabelsAsyncTest.java | 101 ++++--- ...Sample16_CreateAnalyzerWithLabelsTest.java | 101 ++++--- 9 files changed, 676 insertions(+), 170 deletions(-) diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md index 34ac9456f4a9..ecf80d5593e6 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md @@ -276,8 +276,11 @@ Most samples only need the base environment variables from Step 3. 
The following | `CONTENTUNDERSTANDING_TARGET_RESOURCE_ID` | Sample15_GrantCopyAuth | Target ARM resource ID for cross-resource copy | | `CONTENTUNDERSTANDING_TARGET_REGION` | Sample15_GrantCopyAuth | Region of the target Foundry resource (e.g., `eastus`) | | `CONTENTUNDERSTANDING_TARGET_KEY` | Sample15_GrantCopyAuth (optional) | API key for the target resource. If empty, `DefaultAzureCredential` is used | -| `CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL` | Sample16_CreateAnalyzerWithLabels | Optional SAS URL for the Azure Blob container with labeled training data. If unset, the analyzer is created **without** training data | -| `CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX` | Sample16_CreateAnalyzerWithLabels | Optional path prefix within the container (e.g., `receipt_labels/`). Omit if files are at the container root | +| `CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL` | Sample16 (Option A) | Pre-generated container-level SAS URL pointing at your labeled training data. If set, the sample uses it directly and skips Option B. | +| `CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX` | Sample16 (optional) | Optional prefix (e.g., `receipt_labels` or `receipt_labels/`) that scopes the labeled data within the container. Both forms work — the SDK normalises the trailing slash. | +| `CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT` | Sample16 (Option B) | Storage account name (e.g., `mystorageacct`). Used by Option B (auto-upload) to upload the bundled `src/samples/resources/receipt_labels/` files via `DefaultAzureCredential` and mint a User Delegation SAS URL. | +| `CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER` | Sample16 (Option B) | Container name (e.g., `cu-training-data`). Created on demand by Option B. | +| `CONTENTUNDERSTANDING_TRAINING_DATA_LOCAL_DIR` | Sample16 (Option B, optional) | Override the local folder of label files to upload. Defaults to `src/samples/resources/receipt_labels`. | #### Have you run `Sample00_UpdateDefaults`? 
@@ -328,11 +331,12 @@ Then reload your shell: `set -a && source .env && set +a`. #### Setting up Sample16_CreateAnalyzerWithLabels training data -The `Sample16_CreateAnalyzerWithLabels` sample creates an analyzer with **labeled training data** loaded from Azure Blob Storage via a SAS URL. +The `Sample16_CreateAnalyzerWithLabels` sample creates an analyzer backed by **labeled training data** loaded from Azure Blob Storage via a SAS URL. You can configure training data in two ways: -> **Note (Java vs. Python parity):** The Java sample only supports providing a pre-uploaded SAS URL ("Option A"). Unlike the Python equivalent (`sample_create_analyzer_with_labels.py`), the Java sample does **not** auto-upload local files using `DefaultAzureCredential`. You must upload the labeled receipts manually before running. +- **Option A — Manual upload**: you upload the labeled triplets (image + `.labels.json` + `.result.json`) yourself and provide a container SAS URL via `CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL`. +- **Option B — Auto-upload via `DefaultAzureCredential`**: the sample uses your `az login` identity to upload the bundled receipt files from `src/samples/resources/receipt_labels/` into your storage account and mint a short-lived User Delegation SAS — set `CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT` and `CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER`. The signed-in identity must have **Storage Blob Data Contributor** on the container. -> **Note:** If `CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL` is **not set**, the sample still runs but creates an analyzer **without** training data. To exercise the labeled-data path, follow the steps below. +> **Note:** If neither option is configured, the sample runs in **demo mode**: it still creates the analyzer (without labeled data) so you can see the API surface. To fully exercise the labeled-data path you must pick Option A or Option B. 
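The precedence just described (a SAS URL selects Option A; otherwise a storage account plus container selects Option B; otherwise demo mode) can be sketched as a bash check against the real env var names. `sample16_mode` itself is illustrative, not code from the sample:

```shell
#!/usr/bin/env bash
# Illustrative restatement (not code from the sample) of the training-data
# precedence: a SAS URL wins (Option A); otherwise a storage account plus
# container pair selects auto-upload (Option B); otherwise demo mode.
sample16_mode() {
  if [ -n "${CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL:-}" ]; then
    echo "option-a"
  elif [ -n "${CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT:-}" ] \
    && [ -n "${CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER:-}" ]; then
    echo "option-b"
  else
    echo "demo-mode"
  fi
}

sample16_mode   # prints option-a, option-b, or demo-mode depending on your .env
```

Note that setting only one of the two Option B variables still resolves to demo mode, which is why the run script's banner checks both.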
The repo ships labeled receipt training data at `src/samples/resources/receipt_labels/`. Two labeled receipts are included; each receipt has three associated files: @@ -345,9 +349,7 @@ The repo ships labeled receipt training data at `src/samples/resources/receipt_l 29d60394-3da1-4714-abdc-ff0993009872.jpg.result.json ``` -Upload these into an Azure Blob container and provide a SAS URL. - -> **Manual upload steps:** +> **Option A — manual upload steps:** > 1. Create an Azure Blob Storage container (or use an existing one). > 2. Upload **all** files from `src/samples/resources/receipt_labels/` (the `.jpg`, `.jpg.labels.json`, and optional `.jpg.result.json` files listed above) into the container. You may upload them at the container root or inside a subfolder (e.g., `receipt_labels/`). > 3. In Azure Portal: open the storage account, then either navigate Storage account → Containers → your container → **Shared access tokens**, or use the Portal search bar to find "Shared access tokens" (the exact UI path varies by Portal version). Set an expiry, grant at least **List** and **Read** permissions, then generate the SAS URL. @@ -355,22 +357,50 @@ Upload these into an Azure Blob container and provide a SAS URL. > ``` > CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL=https://.blob.core.windows.net/?sv=...&se=... > # Only if you uploaded into a subfolder: -> CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX=receipt_labels/ +> CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX=receipt_labels > ``` > *(Both `receipt_labels` and `receipt_labels/` work as prefix values — the SDK handles the trailing slash either way.)* > 5. Reload your shell: `set -a && source .env && set +a`. -> **[ASK USER] Sample16 training data (Sample16_CreateAnalyzerWithLabels only):** -> If the user chose `Sample16_CreateAnalyzerWithLabels`, ask: -> 1. "Do you want to **train with labeled data** (recommended) or **create the analyzer without training data**?" 
-> - If **without training data**: **Leave `CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL` empty or unset in `.env`** (this is the implicit switch). Skip the next questions and proceed to Step 6 — the sample will still run. -> - If **with training data**: Continue. -> 2. "Have you uploaded the contents of `src/samples/resources/receipt_labels/` to an Azure Blob container and generated a SAS URL?" — If no, walk them through the manual upload steps above. +> **Option B — auto-upload via `DefaultAzureCredential`:** +> 1. Ensure `az login` has been completed and your account has **Storage Blob Data Contributor** on the target container (or the parent storage account). +> 2. Add to your `.env`: +> ``` +> CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT=mystorageacct +> CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER=cu-training-data +> # Leave CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL empty/unset to trigger Option B. +> # Optional: override the local folder uploaded (defaults to src/samples/resources/receipt_labels). +> # CONTENTUNDERSTANDING_TRAINING_DATA_LOCAL_DIR=/path/to/your/labels +> # Optional: upload into a sub-folder (e.g., receipt_labels) instead of the container root. +> # CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX=receipt_labels +> ``` +> 3. Reload your shell: `set -a && source .env && set +a`. +> 4. The sample will create the container if it does not exist, upload the bundled files, mint a 1-hour User Delegation SAS, and use it to create the analyzer. + +> **[REQUIRED GATE] Sample16 training data (Sample16_CreateAnalyzerWithLabels only):** +> Sample16 silently falls back to "create analyzer without labeled data" when no training-data +> source is configured. That fall-back path completes end-to-end and prints `✓ Sample completed` +> even though the labeled-data API surface is **not** actually exercised. 
Before invoking +> `mvn exec:java` (or `run_sample.sh Sample16_CreateAnalyzerWithLabels --run`), the agent +> **must** ask the user the questions below and act on the answer: +> +> 1. "Do you want to **train with labeled data** (recommended), or **create the analyzer without training data** (demo mode)?" +> - If **demo mode**: confirm explicitly — "I will run Sample16 *without* training data. The output will say `Knowledge srcs: 0` and you will see a `DEMO MODE` banner. The labeled-data API path will **not** be exercised. OK to proceed?" Only continue after the user says yes; leave both Option A and Option B env vars empty/unset. +> - If **with training data**: continue with one of the next two questions. +> 2. "Will you use **Option A (pre-generated SAS URL)** or **Option B (auto-upload via `DefaultAzureCredential`)**?" +> - **Option A**: ask for the SAS URL and (optionally) prefix; walk through the manual-upload steps above if not yet done. +> - **Option B**: ask for the storage account name and container name; remind them about the **Storage Blob Data Contributor** role and `az login`. > 3. "Did you upload the files at the **container root** or inside a **subfolder**?" > - If root: leave `CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX` unset. > - If subfolder: ask for the prefix path (e.g., `receipt_labels/`). -> 4. "Please provide the **SAS URL**." -> 5. Confirm: "The SAS token must have at least **List** and **Read** permissions and must **not be expired**." +> 4. Confirm: "For Option A, the SAS token must have at least **List** and **Read** permissions and must **not be expired**. For Option B, the signed-in identity must have **Storage Blob Data Contributor** on the container." 
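A quick client-side shape check on the SAS URL from the gate above can catch obvious paste errors before the sample runs. This is a hypothetical helper, not part of `run_sample.sh`; it verifies only that the URL is https, targets blob storage, and carries the `sv=` and `se=` query parameters, and it cannot check permissions or whether the token has expired:

```shell
#!/usr/bin/env bash
# Hypothetical shape check (not part of run_sample.sh): the candidate SAS URL
# must be https, target *.blob.core.windows.net, and carry sv= (service
# version) and se= (expiry) query parameters. Permissions and actual expiry
# can only be verified against the service.
looks_like_sas_url() {
  local url="${1:-}"
  case "$url" in
    https://*.blob.core.windows.net/*) ;;
    *) return 1 ;;
  esac
  case "$url" in *"?sv="*|*"&sv="*) ;; *) return 1 ;; esac
  case "$url" in *"?se="*|*"&se="*) ;; *) return 1 ;; esac
  return 0
}

if looks_like_sas_url "${CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL:-}"; then
  echo "SAS URL shape looks OK"
else
  echo "SAS URL is missing, not https, or lacks sv=/se= parameters"
fi
```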
+> +> **Belt-and-suspenders**: `run_sample.sh` itself emits a loud `DEMO MODE` banner before the +> `mvn exec:java` step when none of the four training-data env vars +> (`CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL`, `..._STORAGE_ACCOUNT`, `..._CONTAINER`, +> `..._LOCAL_DIR`) are set, so the fall-back is unmissable in the captured run output. Treat +> that banner as a signal that validation is **incomplete** unless the user explicitly opted +> into demo mode in step 1. ### Step 6: Run the Sample @@ -511,7 +541,7 @@ Wraps `mvn exec:java` with sample name resolution, validation, and optional `.en | `Permission denied` when running scripts | Make scripts executable: `chmod +x .github/skills/cu-sdk-sample-run/scripts/*.sh` | | Sample16: `AuthenticationFailed` / `403` reading training data | The SAS URL is invalid, expired, or missing required permissions. Regenerate the SAS with at least **List** and **Read** and a fresh expiry, then re-source `.env` | | Sample16: `BlobNotFound` or empty training set | The `CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX` does not match where you uploaded the files. Either upload files at the container root and unset the prefix, or set the prefix to the actual subfolder (e.g., `receipt_labels/`) | -| Sample16: created analyzer has no training data | `CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL` was empty when the sample ran. Set it in `.env`, re-run `set -a && source .env && set +a`, then re-run the sample | +| Sample16: created analyzer has no training data | Neither Option A nor Option B was configured. Set `CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL` (Option A), or set `CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT` + `CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER` (Option B), then re-run `set -a && source .env && set +a` and re-run the sample. 
| ## Related Skills diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh index 75cf05210e37..2084f89e3d4b 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh @@ -217,6 +217,26 @@ if [[ -z "${CONTENTUNDERSTANDING_ENDPOINT:-}" ]]; then echo "" fi +# Sample16 demo-mode banner: warn if the user is about to run the labeled-data +# sample without configuring either Option A (SAS URL) or Option B (storage +# account + container) — the sample will still run but skip the labeled-data +# code path. +if [[ "$SAMPLE_NAME" == Sample16* ]]; then + if [[ -z "${CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL:-}" ]]; then + if [[ -z "${CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT:-}" \ + || -z "${CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER:-}" ]]; then + print_warning "⚠ DEMO MODE: no training data configured for $SAMPLE_NAME." + echo " The analyzer will be created without labeled data ('Knowledge srcs: 0')." + echo " To exercise the labeled-data API path, configure ONE of:" + echo " Option A: CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL=" + echo " Option B: CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT=" + echo " CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER=" + echo " then re-run: set -a && source .env && set +a" + echo "" + fi + fi +fi + # Build command. Sample classes live under src/samples/java and are compiled # as test sources, so we must run test-compile before exec:java; otherwise on # a clean checkout the sample class will not exist on the classpath. 
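The SKILL instructions and troubleshooting table above repeatedly prescribe `set -a && source .env && set +a`. The sketch below (POSIX shell; the temp `.env` path and the variable value are illustrative only) shows why the `set -a`/`set +a` pair matters: plain sourcing assigns shell variables in the current shell but does not export them, so a child process such as `mvn exec:java` never sees them, while `set -a` marks every assignment made while it is active for export.

```shell
#!/usr/bin/env sh
# Minimal sketch: plain `source .env` vs `set -a && source .env && set +a`.
# The .env content and value below are illustrative, not real credentials.
tmpdir=$(mktemp -d)
printf 'CONTENTUNDERSTANDING_ENDPOINT=https://example.cognitiveservices.azure.com/\n' \
  > "$tmpdir/.env"
unset CONTENTUNDERSTANDING_ENDPOINT

# Plain sourcing: the variable exists in this shell but is NOT exported,
# so a child process (like `mvn exec:java`) cannot see it.
. "$tmpdir/.env"
sh -c 'echo "plain source -> child sees: ${CONTENTUNDERSTANDING_ENDPOINT:-unset}"'

# With `set -a` (allexport), every assignment made while it is active is
# auto-exported, so everything defined in .env reaches child processes.
set -a
. "$tmpdir/.env"
set +a
sh -c 'echo "set -a source -> child sees: ${CONTENTUNDERSTANDING_ENDPOINT:-unset}"'

rm -rf "$tmpdir"
```

The first child process prints `unset`; the second prints the endpoint value, which is exactly the difference the skill's reload step depends on.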
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/CHANGELOG.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/CHANGELOG.md index dd6a421cec16..695fcfde3438 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/CHANGELOG.md +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/CHANGELOG.md @@ -10,6 +10,8 @@ ### Other Changes +- `Sample16_CreateAnalyzerWithLabels`: aligned with the .NET parity sample. The labeled-receipt field schema now uses `TotalPrice` (was `Total`), and the sample supports auto-uploading the bundled label files via `DefaultAzureCredential` (Option B — set `CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT` and `CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER`) in addition to the existing pre-generated SAS URL flow (Option A — `CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL`). When neither option is configured the sample now prints a clear `DEMO MODE` banner. + ## 1.1.0-beta.1 (2026-05-01) ### Features Added diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/assets.json b/sdk/contentunderstanding/azure-ai-contentunderstanding/assets.json index 9c12b8328768..a968ced10de0 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/assets.json +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/assets.json @@ -2,5 +2,5 @@ "AssetsRepo": "Azure/azure-sdk-assets", "AssetsRepoPrefixPath": "java", "TagPrefix": "java/contentunderstanding/azure-ai-contentunderstanding", - "Tag": "java/contentunderstanding/azure-ai-contentunderstanding_3775d156e8" + "Tag": "java/contentunderstanding/azure-ai-contentunderstanding_bbad9ba96f" } diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/pom.xml b/sdk/contentunderstanding/azure-ai-contentunderstanding/pom.xml index b3e2abe9e084..45d362d94d5b 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/pom.xml +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/pom.xml @@ -68,6 +68,18 @@ azure-identity
1.18.3
+    <dependency>
+      <groupId>com.azure</groupId>
+      <artifactId>azure-storage-blob</artifactId>
+      <version>12.33.3</version>
+    </dependency>
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabels.java b/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabels.java index 09e9a58dff21..6bc9888743b3 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabels.java +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabels.java @@ -6,19 +6,34 @@ import com.azure.ai.contentunderstanding.ContentUnderstandingClient; import com.azure.ai.contentunderstanding.ContentUnderstandingClientBuilder; +import com.azure.ai.contentunderstanding.models.AnalysisInput; +import com.azure.ai.contentunderstanding.models.AnalysisResult; import com.azure.ai.contentunderstanding.models.ContentAnalyzer; import com.azure.ai.contentunderstanding.models.ContentAnalyzerConfig; +import com.azure.ai.contentunderstanding.models.ContentField; import com.azure.ai.contentunderstanding.models.ContentFieldDefinition; import com.azure.ai.contentunderstanding.models.ContentFieldSchema; import com.azure.ai.contentunderstanding.models.ContentFieldType; +import com.azure.ai.contentunderstanding.models.DocumentContent; import com.azure.ai.contentunderstanding.models.GenerationMethod; import com.azure.ai.contentunderstanding.models.KnowledgeSource; import com.azure.ai.contentunderstanding.models.LabeledDataKnowledgeSource; import com.azure.core.credential.AzureKeyCredential; +import com.azure.core.credential.TokenCredential; import com.azure.core.util.polling.SyncPoller; import com.azure.identity.DefaultAzureCredentialBuilder; - +import com.azure.storage.blob.BlobContainerClient; +import
com.azure.storage.blob.BlobContainerClientBuilder; +import com.azure.storage.blob.BlobServiceClient; +import com.azure.storage.blob.BlobServiceClientBuilder; +import com.azure.storage.blob.models.UserDelegationKey; +import com.azure.storage.blob.sas.BlobContainerSasPermission; +import com.azure.storage.blob.sas.BlobServiceSasSignatureValues; + +import java.io.File; +import java.time.OffsetDateTime; import java.util.ArrayList; +import java.util.Arrays; import java.util.HashMap; import java.util.List; import java.util.Map; @@ -31,24 +46,24 @@ * For an easier labeling workflow, use Azure AI Content Understanding Studio at * https://contentunderstanding.ai.azure.com/ * - * Labeled receipt data is available in this repo at {@code src/samples/resources/receipt_labels} - * (images and corresponding .labels.json files). To use it for training: + *

<p>Labeled receipt data is bundled in this repo at
+ * {@code src/samples/resources/receipt_labels} (images and corresponding {@code .labels.json} /
+ * {@code .result.json} files).</p>
  *
- * <p>Manual instructions to upload labels into Azure Blob Storage:</p>
+ * <p>You can configure training data in two ways:</p>
  * <ol>
- *   <li>Create an Azure Blob Storage container (or use an existing one).</li>
- *   <li>Upload the contents of {@code src/samples/resources/receipt_labels} into the container.
- *       You may upload into the container root or into a subfolder (e.g., "receipt_labels/").</li>
- *   <li>Generate a SAS (Shared Access Signature) URL for the container with at least List and Read
- *       permissions. In Azure Portal: Storage account → Containers → your container → Shared access
- *       token; set expiry and permissions, then generate the SAS URL.</li>
- *   <li>Set {@code CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL} to the full SAS URL
- *       (e.g., https://&lt;account&gt;.blob.core.windows.net/&lt;container&gt;?sv=...&amp;se=...).</li>
- *   <li>If you uploaded into a subfolder, set {@code CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX} to
- *       that path (e.g., "receipt_labels/"). If files are at the container root, omit the prefix
- *       or leave it unset.</li>
+ *   <li>Option A — pre-generated SAS URL: upload the label files yourself and supply the
+ *       container SAS URL via {@code CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL}.</li>
+ *   <li>Option B — auto-upload via DefaultAzureCredential: set
+ *       {@code CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT} and
+ *       {@code CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER}; the sample uploads the bundled
+ *       label files and generates a User Delegation SAS URL automatically. The signed-in
+ *       identity must have Storage Blob Data Contributor on the container.</li>
  * </ol>
  *
+ * <p>If neither option is configured the sample runs in demo mode: it creates the
+ * analyzer without labeled data so you can still see the API surface and shape of the response.</p>
+ *
  * <p>Each labeled document in the training folder includes:</p>
  * <ul>
  *   <li>The original file (e.g., PDF or image).</li>
@@ -65,12 +80,16 @@
  *
  * <p>Optional environment variables (for labeled training data):</p>
  * <ul>
- *   <li>{@code CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL} – SAS URL for the Azure Blob container
- *       with labeled training data. If set, the analyzer is created with a labeled-data knowledge
- *       source; otherwise, created without training data.</li>
+ *   <li>{@code CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL} – Option A: SAS URL for the Azure Blob
+ *       container with labeled training data.</li>
  *   <li>{@code CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX} – Path prefix within the container
- *       (e.g., "receipt_labels/" or "CreateAnalyzerWithLabels/"). Omit or leave unset if files
- *       are at the container root.</li>
+ *       (e.g., "receipt_labels/"). Omit if files are at the container root.</li>
+ *   <li>{@code CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT} – Option B: storage account
+ *       name (without {@code .blob.core.windows.net}).</li>
+ *   <li>{@code CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER} – Option B: container name (created
+ *       if missing).</li>
+ *   <li>{@code CONTENTUNDERSTANDING_TRAINING_DATA_LOCAL_DIR} – Option B: local directory of label
+ *       files to upload (defaults to {@code src/samples/resources/receipt_labels}).</li>
  * </ul>
    */ public class Sample16_CreateAnalyzerWithLabels { @@ -79,9 +98,15 @@ public static void main(String[] args) { // BEGIN: com.azure.ai.contentunderstanding.sample16.buildClient String endpoint = System.getenv("CONTENTUNDERSTANDING_ENDPOINT"); String key = System.getenv("CONTENTUNDERSTANDING_KEY"); + + // Option A: pre-generated SAS URL with Read + List permissions String sasUrl = System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL"); String sasUrlPrefix = System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX"); + // Option B: auto-upload local label files and generate a User Delegation SAS URL + String storageAccount = System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT"); + String containerName = System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER"); + // Build the client with appropriate authentication ContentUnderstandingClientBuilder builder = new ContentUnderstandingClientBuilder().endpoint(endpoint); @@ -147,31 +172,54 @@ public static void main(String[] args) { itemsField.setItemDefinition(itemDefinition); fields.put("Items", itemsField); - // Total field - ContentFieldDefinition totalField = new ContentFieldDefinition(); - totalField.setType(ContentFieldType.STRING); - totalField.setMethod(GenerationMethod.EXTRACT); - totalField.setDescription("Total amount"); - fields.put("TotalPrice", totalField); + // TotalPrice field + ContentFieldDefinition totalPriceField = new ContentFieldDefinition(); + totalPriceField.setType(ContentFieldType.STRING); + totalPriceField.setMethod(GenerationMethod.EXTRACT); + totalPriceField.setDescription("Total amount"); + fields.put("TotalPrice", totalPriceField); ContentFieldSchema fieldSchema = new ContentFieldSchema(); fieldSchema.setName("receipt_schema"); fieldSchema.setDescription("Schema for receipt extraction with items"); fieldSchema.setFields(fields); - // Step 2: Create labeled data knowledge source (optional, based on environment variable) + // Step 2: Resolve the training-data SAS 
URL. + // Option A — pre-generated SAS URL was already read above. + // Option B — if Option A is not set but a storage account + container are configured, + // upload the bundled label files and generate a User Delegation SAS URL. + if ((sasUrl == null || sasUrl.trim().isEmpty()) + && storageAccount != null && !storageAccount.trim().isEmpty() + && containerName != null && !containerName.trim().isEmpty()) { + TokenCredential credential = new DefaultAzureCredentialBuilder().build(); + String localLabelDir = System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_LOCAL_DIR"); + if (localLabelDir == null || localLabelDir.trim().isEmpty()) { + localLabelDir = "src/samples/resources/receipt_labels"; + } + localLabelDir = resolveLocalLabelDir(localLabelDir); + uploadTrainingData(storageAccount, containerName, credential, localLabelDir, sasUrlPrefix); + sasUrl = generateUserDelegationSasUrl(storageAccount, containerName, credential); + } + + // Step 3: Create knowledge source from labeled data (if available) List knowledgeSources = new ArrayList<>(); if (sasUrl != null && !sasUrl.trim().isEmpty()) { LabeledDataKnowledgeSource knowledgeSource = new LabeledDataKnowledgeSource() - .setContainerUrl(sasUrl) - .setPrefix(sasUrlPrefix); + .setContainerUrl(sasUrl); + if (sasUrlPrefix != null && !sasUrlPrefix.trim().isEmpty()) { + knowledgeSource.setPrefix(sasUrlPrefix); + } knowledgeSources.add(knowledgeSource); - System.out.println("Using labeled training data from: " + sasUrl.substring(0, Math.min(50, sasUrl.length())) + "..."); + System.out.println("Using labeled training data from: " + + sasUrl.substring(0, Math.min(50, sasUrl.length())) + "..."); } else { - System.out.println("No CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL set, creating analyzer without labeled training data"); + System.out.println("DEMO MODE: no training data configured. 
The analyzer will be created without labeled data."); + System.out.println(" Set CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL (Option A), or both"); + System.out.println(" CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT and CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER (Option B),"); + System.out.println(" to fully exercise the labeled-data API path."); } - // Step 3: Create analyzer (with or without labeled data) + // Step 4: Create analyzer (with or without labeled data) Map models = new HashMap<>(); models.put("completion", "gpt-4.1"); models.put("embedding", "text-embedding-3-large"); @@ -189,7 +237,6 @@ public static void main(String[] args) { analyzer.setKnowledgeSources(knowledgeSources); } - // For demonstration without actual training data, create analyzer without knowledge sources SyncPoller createPoller = client.beginCreateAnalyzer(analyzerId, analyzer, true); ContentAnalyzer result = createPoller.getFinalResult(); @@ -198,6 +245,8 @@ public static void main(String[] args) { System.out.println(" Description: " + result.getDescription()); System.out.println(" Base analyzer: " + result.getBaseAnalyzerId()); System.out.println(" Fields: " + result.getFieldSchema().getFields().size()); + System.out.println(" Knowledge srcs: " + + (result.getKnowledgeSources() == null ? 
0 : result.getKnowledgeSources().size())); // END: com.azure.ai.contentunderstanding.createAnalyzerWithLabels // Verify analyzer creation @@ -210,27 +259,67 @@ public static void main(String[] args) { System.out.println(" MerchantName: String (Extract)"); System.out.println(" Items: Array of Objects (Generate)"); System.out.println(" - Quantity, Name, Price"); - System.out.println(" Total: String (Extract)"); + System.out.println(" TotalPrice: String (Extract)"); ContentFieldDefinition itemsFieldResult = resultFields.get("Items"); System.out.println("Items field verified:"); System.out.println(" Type: " + itemsFieldResult.getType()); System.out.println(" Item properties: " + itemsFieldResult.getItemDefinition().getProperties().size()); + // If training data was provided, test the analyzer with a sample document. + if (sasUrl != null && !sasUrl.trim().isEmpty()) { + System.out.println("\nTesting analyzer with sample document..."); + String testDocUrl + = "https://github.com/Azure-Samples/cognitive-services-REST-api-samples/raw/master/curl/form-recognizer/sample-invoice.pdf"; + + AnalysisInput input = new AnalysisInput(); + input.setUrl(testDocUrl); + + AnalysisResult analyzeResult + = client.beginAnalyze(analyzerId, Arrays.asList(input)).getFinalResult(); + + System.out.println("Analysis completed!"); + + if (analyzeResult.getContents() != null + && !analyzeResult.getContents().isEmpty() + && analyzeResult.getContents().get(0) instanceof DocumentContent) { + DocumentContent docContent = (DocumentContent) analyzeResult.getContents().get(0); + System.out.println("Extracted fields: " + docContent.getFields().size()); + + if (docContent.getFields().containsKey("MerchantName")) { + ContentField merchantField = docContent.getFields().get("MerchantName"); + if (merchantField != null && merchantField.getValue() instanceof String) { + System.out.println(" MerchantName: " + merchantField.getValue()); + } + } + if (docContent.getFields().containsKey("TotalPrice")) { + 
ContentField totalField = docContent.getFields().get("TotalPrice"); + if (totalField != null && totalField.getValue() instanceof String) { + System.out.println(" TotalPrice: " + totalField.getValue()); + } + } + } + } + // Display API pattern information System.out.println("\nCreateAnalyzerWithLabels API Pattern:"); System.out.println(" 1. Define field schema with nested structures (arrays, objects)"); System.out.println(" 2. Upload training data to Azure Blob Storage:"); - System.out.println(" - Documents: receipt1.pdf, receipt2.pdf, ..."); - System.out.println(" - Labels: receipt1.pdf.labels.json, receipt2.pdf.labels.json, ..."); - System.out.println(" - OCR: receipt1.pdf.result.json, receipt2.pdf.result.json, ..."); + System.out.println(" - Documents: receipt1.jpg, receipt2.jpg, ..."); + System.out.println(" - Labels: receipt1.jpg.labels.json, receipt2.jpg.labels.json, ..."); + System.out.println(" - OCR: receipt1.jpg.result.json, receipt2.jpg.result.json, ..."); System.out.println(" 3. Create LabeledDataKnowledgeSource with storage SAS URL"); System.out.println(" 4. Create analyzer with field schema and knowledge sources"); System.out.println(" 5. 
Use analyzer for document analysis"); System.out.println("\nCreateAnalyzerWithLabels pattern demonstration completed"); - System.out.println(" Note: This sample demonstrates the API pattern."); - System.out.println(" For actual training, provide CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL with labeled data."); + if (sasUrl == null || sasUrl.trim().isEmpty()) { + System.out.println(" Note: This sample demonstrates the API pattern."); + System.out.println( + " For actual training, provide CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL (Option A)"); + System.out.println( + " or CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT + ..._CONTAINER (Option B)."); + } } catch (Exception e) { System.err.println("Error: " + e.getMessage()); @@ -245,4 +334,112 @@ public static void main(String[] args) { } } } + + /** + * Resolves the configured local label directory so the sample works whether the JVM is + * launched from the package directory or from the repository root. Tries the path as-is, then + * the path resolved under {@code sdk/contentunderstanding/azure-ai-contentunderstanding/} (the + * package root). The first path that exists is returned; otherwise the original input is + * returned unchanged so the downstream error message points at the user-supplied location. + * + * @param input candidate path (relative or absolute) + * @return the resolved path + */ + static String resolveLocalLabelDir(String input) { + if (input == null) { + return null; + } + File asIs = new File(input); + if (asIs.isAbsolute() || asIs.isDirectory()) { + return input; + } + File underPackage = new File("sdk/contentunderstanding/azure-ai-contentunderstanding", input); + if (underPackage.isDirectory()) { + return underPackage.getPath(); + } + return input; + } + + /** + * Uploads local training data files (images, .labels.json, .result.json) to an Azure Blob + * container. Existing blobs with the same name are overwritten. 
+ * + * @param storageAccountName storage account name (no {@code .blob.core.windows.net} suffix) + * @param containerName container name (created if it does not exist) + * @param credential credential with write access to the container + * @param localDirectory local folder containing the label files + * @param prefix optional blob prefix (virtual folder) to prepend, e.g. {@code "receipt_labels/"} + */ + public static void uploadTrainingData( + String storageAccountName, + String containerName, + TokenCredential credential, + String localDirectory, + String prefix) { + BlobContainerClient containerClient = new BlobContainerClientBuilder() + .endpoint("https://" + storageAccountName + ".blob.core.windows.net") + .containerName(containerName) + .credential(credential) + .buildClient(); + + if (!containerClient.exists()) { + containerClient.create(); + } + + File dir = new File(localDirectory); + File[] files = dir.listFiles(File::isFile); + if (files == null || files.length == 0) { + throw new IllegalStateException( + "No training data files found under '" + dir.getAbsolutePath() + "'." + + " Set CONTENTUNDERSTANDING_TRAINING_DATA_LOCAL_DIR to a folder containing your label files."); + } + + String normalizedPrefix = (prefix == null || prefix.trim().isEmpty()) + ? null + : prefix.replaceAll("/+$", ""); + + for (File file : files) { + String blobName = normalizedPrefix == null + ? file.getName() + : normalizedPrefix + "/" + file.getName(); + System.out.println("Uploading " + file.getName() + " -> " + blobName); + containerClient.getBlobClient(blobName).uploadFromFile(file.getAbsolutePath(), true); + } + } + + /** + * Generates a User Delegation SAS URL (Read + List) for an Azure Blob container, using a + * {@link TokenCredential} so no storage account key is required. 
+ * + * @param storageAccountName storage account name + * @param containerName container name + * @param credential credential to obtain a user delegation key + * @return container-scoped SAS URL valid for 1 hour + */ + public static String generateUserDelegationSasUrl( + String storageAccountName, + String containerName, + TokenCredential credential) { + BlobServiceClient blobServiceClient = new BlobServiceClientBuilder() + .endpoint("https://" + storageAccountName + ".blob.core.windows.net") + .credential(credential) + .buildClient(); + + OffsetDateTime startsOn = OffsetDateTime.now(); + OffsetDateTime expiresOn = startsOn.plusHours(1); + + UserDelegationKey userDelegationKey = blobServiceClient.getUserDelegationKey(startsOn, expiresOn); + + BlobContainerSasPermission permissions = new BlobContainerSasPermission() + .setReadPermission(true) + .setListPermission(true); + + BlobServiceSasSignatureValues sasValues = new BlobServiceSasSignatureValues(expiresOn, permissions) + .setStartTime(startsOn); + + BlobContainerClient containerClient = blobServiceClient.getBlobContainerClient(containerName); + String sasToken = containerClient.generateUserDelegationSas(sasValues, userDelegationKey); + + return "https://" + storageAccountName + ".blob.core.windows.net/" + containerName + "?" 
+ sasToken; + } } diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabelsAsync.java b/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabelsAsync.java index a16699fd9bf4..b01c0d0c9da5 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabelsAsync.java +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabelsAsync.java @@ -6,20 +6,35 @@ import com.azure.ai.contentunderstanding.ContentUnderstandingAsyncClient; import com.azure.ai.contentunderstanding.ContentUnderstandingClientBuilder; +import com.azure.ai.contentunderstanding.models.AnalysisInput; +import com.azure.ai.contentunderstanding.models.AnalysisResult; import com.azure.ai.contentunderstanding.models.ContentAnalyzer; import com.azure.ai.contentunderstanding.models.ContentAnalyzerConfig; +import com.azure.ai.contentunderstanding.models.ContentField; import com.azure.ai.contentunderstanding.models.ContentFieldDefinition; import com.azure.ai.contentunderstanding.models.ContentFieldSchema; import com.azure.ai.contentunderstanding.models.ContentFieldType; +import com.azure.ai.contentunderstanding.models.DocumentContent; import com.azure.ai.contentunderstanding.models.GenerationMethod; import com.azure.ai.contentunderstanding.models.KnowledgeSource; import com.azure.ai.contentunderstanding.models.LabeledDataKnowledgeSource; import com.azure.core.credential.AzureKeyCredential; +import com.azure.core.credential.TokenCredential; import com.azure.core.util.polling.PollerFlux; import com.azure.identity.DefaultAzureCredentialBuilder; +import com.azure.storage.blob.BlobContainerClient; +import 
com.azure.storage.blob.BlobContainerClientBuilder; +import com.azure.storage.blob.BlobServiceClient; +import com.azure.storage.blob.BlobServiceClientBuilder; +import com.azure.storage.blob.models.UserDelegationKey; +import com.azure.storage.blob.sas.BlobContainerSasPermission; +import com.azure.storage.blob.sas.BlobServiceSasSignatureValues; import reactor.core.publisher.Mono; +import java.io.File; +import java.time.OffsetDateTime; import java.util.ArrayList; +import java.util.Arrays; import java.util.HashMap; import java.util.List; import java.util.Map; @@ -34,24 +49,24 @@ * For an easier labeling workflow, use Azure AI Content Understanding Studio at * https://contentunderstanding.ai.azure.com/ * - * Labeled receipt data is available in this repo at {@code src/samples/resources/receipt_labels} - * (images and corresponding .labels.json files). To use it for training: + *

<p>Labeled receipt data is bundled in this repo at
+ * {@code src/samples/resources/receipt_labels} (images and corresponding {@code .labels.json} /
+ * {@code .result.json} files).</p>
  *
- * <p>Manual instructions to upload labels into Azure Blob Storage:</p>
+ * <p>You can configure training data in two ways:</p>
  * <ol>
- *   <li>Create an Azure Blob Storage container (or use an existing one).</li>
- *   <li>Upload the contents of {@code src/samples/resources/receipt_labels} into the container.
- *       You may upload into the container root or into a subfolder (e.g., "receipt_labels/").</li>
- *   <li>Generate a SAS (Shared Access Signature) URL for the container with at least List and Read
- *       permissions. In Azure Portal: Storage account → Containers → your container → Shared access
- *       token; set expiry and permissions, then generate the SAS URL.</li>
- *   <li>Set {@code CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL} to the full SAS URL
- *       (e.g., https://&lt;account&gt;.blob.core.windows.net/&lt;container&gt;?sv=...&amp;se=...).</li>
- *   <li>If you uploaded into a subfolder, set {@code CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX} to
- *       that path (e.g., "receipt_labels/"). If files are at the container root, omit the prefix
- *       or leave it unset.</li>
+ *   <li>Option A — pre-generated SAS URL: upload the label files yourself and supply the
+ *       container SAS URL via {@code CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL}.</li>
+ *   <li>Option B — auto-upload via DefaultAzureCredential: set
+ *       {@code CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT} and
+ *       {@code CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER}; the sample uploads the bundled
+ *       label files and generates a User Delegation SAS URL automatically. The signed-in
+ *       identity must have Storage Blob Data Contributor on the container.</li>
  * </ol>
  *
+ * <p>If neither option is configured the sample runs in demo mode: it creates the
+ * analyzer without labeled data so you can still see the API surface and shape of the response.</p>
+ *
  * <p>Each labeled document in the training folder includes:</p>
  * <ul>
  *   <li>The original file (e.g., PDF or image).</li>
@@ -68,12 +83,16 @@
  *
  * <p>Optional environment variables (for labeled training data):</p>
  * <ul>
- *   <li>{@code CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL} – SAS URL for the Azure Blob container
- *       with labeled training data. If set, the analyzer is created with a labeled-data knowledge
- *       source; otherwise, created without training data.</li>
+ *   <li>{@code CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL} – Option A: SAS URL for the Azure Blob
+ *       container with labeled training data.</li>
  *   <li>{@code CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX} – Path prefix within the container
- *       (e.g., "receipt_labels/" or "CreateAnalyzerWithLabels/"). Omit or leave unset if files
- *       are at the container root.</li>
+ *       (e.g., "receipt_labels/"). Omit if files are at the container root.</li>
+ *   <li>{@code CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT} – Option B: storage account
+ *       name (without {@code .blob.core.windows.net}).</li>
+ *   <li>{@code CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER} – Option B: container name (created
+ *       if missing).</li>
+ *   <li>{@code CONTENTUNDERSTANDING_TRAINING_DATA_LOCAL_DIR} – Option B: local directory of label
+ *       files to upload (defaults to {@code src/samples/resources/receipt_labels}).</li>
  * </ul>
      */ public class Sample16_CreateAnalyzerWithLabelsAsync { @@ -82,9 +101,15 @@ public static void main(String[] args) throws InterruptedException { // BEGIN: com.azure.ai.contentunderstanding.sample16Async.buildClient String endpoint = System.getenv("CONTENTUNDERSTANDING_ENDPOINT"); String key = System.getenv("CONTENTUNDERSTANDING_KEY"); - String sasUrl = System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL"); + + // Option A: pre-generated SAS URL with Read + List permissions + String sasUrlEnv = System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL"); String sasUrlPrefix = System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX"); + // Option B: auto-upload local label files and generate a User Delegation SAS URL + String storageAccount = System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT"); + String containerName = System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER"); + // Build the async client with appropriate authentication ContentUnderstandingClientBuilder builder = new ContentUnderstandingClientBuilder().endpoint(endpoint); @@ -150,31 +175,56 @@ public static void main(String[] args) throws InterruptedException { itemsField.setItemDefinition(itemDefinition); fields.put("Items", itemsField); - // Total field - ContentFieldDefinition totalField = new ContentFieldDefinition(); - totalField.setType(ContentFieldType.STRING); - totalField.setMethod(GenerationMethod.EXTRACT); - totalField.setDescription("Total amount"); - fields.put("Total", totalField); + // TotalPrice field + ContentFieldDefinition totalPriceField = new ContentFieldDefinition(); + totalPriceField.setType(ContentFieldType.STRING); + totalPriceField.setMethod(GenerationMethod.EXTRACT); + totalPriceField.setDescription("Total amount"); + fields.put("TotalPrice", totalPriceField); ContentFieldSchema fieldSchema = new ContentFieldSchema(); fieldSchema.setName("receipt_schema"); fieldSchema.setDescription("Schema for receipt extraction with items"); 
fieldSchema.setFields(fields); - // Step 2: Create labeled data knowledge source (optional, based on environment variable) + // Step 2: Resolve the training-data SAS URL. + // Option A — pre-generated SAS URL was already read above. + // Option B — if Option A is not set but a storage account + container are configured, + // upload the bundled label files and generate a User Delegation SAS URL. + String sasUrl = sasUrlEnv; + if ((sasUrl == null || sasUrl.trim().isEmpty()) + && storageAccount != null && !storageAccount.trim().isEmpty() + && containerName != null && !containerName.trim().isEmpty()) { + TokenCredential credential = new DefaultAzureCredentialBuilder().build(); + String localLabelDir = System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_LOCAL_DIR"); + if (localLabelDir == null || localLabelDir.trim().isEmpty()) { + localLabelDir = "src/samples/resources/receipt_labels"; + } + localLabelDir = Sample16_CreateAnalyzerWithLabels.resolveLocalLabelDir(localLabelDir); + uploadTrainingData(storageAccount, containerName, credential, localLabelDir, sasUrlPrefix); + sasUrl = generateUserDelegationSasUrl(storageAccount, containerName, credential); + } + final String resolvedSasUrl = sasUrl; + + // Step 3: Create knowledge source from labeled data (if available) List knowledgeSources = new ArrayList<>(); - if (sasUrl != null && !sasUrl.trim().isEmpty()) { + if (resolvedSasUrl != null && !resolvedSasUrl.trim().isEmpty()) { LabeledDataKnowledgeSource knowledgeSource = new LabeledDataKnowledgeSource() - .setContainerUrl(sasUrl) - .setPrefix(sasUrlPrefix); + .setContainerUrl(resolvedSasUrl); + if (sasUrlPrefix != null && !sasUrlPrefix.trim().isEmpty()) { + knowledgeSource.setPrefix(sasUrlPrefix); + } knowledgeSources.add(knowledgeSource); - System.out.println("Using labeled training data from: " + sasUrl.substring(0, Math.min(50, sasUrl.length())) + "..."); + System.out.println("Using labeled training data from: " + + resolvedSasUrl.substring(0, Math.min(50, 
resolvedSasUrl.length())) + "..."); } else { - System.out.println("No CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL set, creating analyzer without labeled training data"); + System.out.println("DEMO MODE: no training data configured. The analyzer will be created without labeled data."); + System.out.println(" Set CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL (Option A), or both"); + System.out.println(" CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT and CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER (Option B),"); + System.out.println(" to fully exercise the labeled-data API path."); } - // Step 3: Create analyzer (with or without labeled data) + // Step 4: Create analyzer (with or without labeled data) Map models = new HashMap<>(); models.put("completion", "gpt-4.1"); models.put("embedding", "text-embedding-3-large"); @@ -213,6 +263,8 @@ public static void main(String[] args) throws InterruptedException { System.out.println(" Description: " + result.getDescription()); System.out.println(" Base analyzer: " + result.getBaseAnalyzerId()); System.out.println(" Fields: " + result.getFieldSchema().getFields().size()); + System.out.println(" Knowledge srcs: " + + (result.getKnowledgeSources() == null ? 
0 : result.getKnowledgeSources().size())); // END: com.azure.ai.contentunderstanding.createAnalyzerWithLabelsAsync // Verify analyzer creation @@ -225,27 +277,79 @@ public static void main(String[] args) throws InterruptedException { System.out.println(" MerchantName: String (Extract)"); System.out.println(" Items: Array of Objects (Generate)"); System.out.println(" - Quantity, Name, Price"); - System.out.println(" Total: String (Extract)"); + System.out.println(" TotalPrice: String (Extract)"); ContentFieldDefinition itemsFieldResult = resultFields.get("Items"); System.out.println("Items field verified:"); System.out.println(" Type: " + itemsFieldResult.getType()); System.out.println(" Item properties: " + itemsFieldResult.getItemDefinition().getProperties().size()); - + }) + .flatMap(result -> { + // If training data was provided, test the analyzer with a sample document. + if (resolvedSasUrl != null && !resolvedSasUrl.trim().isEmpty()) { + System.out.println("\nTesting analyzer with sample document..."); + String testDocUrl + = "https://github.com/Azure-Samples/cognitive-services-REST-api-samples/raw/master/curl/form-recognizer/sample-invoice.pdf"; + + AnalysisInput input = new AnalysisInput(); + input.setUrl(testDocUrl); + + return client.beginAnalyze(finalAnalyzerId, Arrays.asList(input)) + .last() + .flatMap(pollResponse -> { + if (pollResponse.getStatus().isComplete()) { + return pollResponse.getFinalResult(); + } else { + return Mono.error(new RuntimeException( + "Polling completed unsuccessfully with status: " + pollResponse.getStatus())); + } + }) + .doOnNext(analyzeResult -> { + System.out.println("Analysis completed!"); + if (analyzeResult.getContents() != null + && !analyzeResult.getContents().isEmpty() + && analyzeResult.getContents().get(0) instanceof DocumentContent) { + DocumentContent docContent = (DocumentContent) analyzeResult.getContents().get(0); + System.out.println("Extracted fields: " + docContent.getFields().size()); + + if 
(docContent.getFields().containsKey("MerchantName")) { + ContentField merchantField = docContent.getFields().get("MerchantName"); + if (merchantField != null && merchantField.getValue() instanceof String) { + System.out.println(" MerchantName: " + merchantField.getValue()); + } + } + if (docContent.getFields().containsKey("TotalPrice")) { + ContentField totalField = docContent.getFields().get("TotalPrice"); + if (totalField != null && totalField.getValue() instanceof String) { + System.out.println(" TotalPrice: " + totalField.getValue()); + } + } + } + }) + .thenReturn(result); + } + return Mono.just(result); + }) + .doOnNext(result -> { // Display API pattern information System.out.println("\nCreateAnalyzerWithLabels API Pattern:"); System.out.println(" 1. Define field schema with nested structures (arrays, objects)"); System.out.println(" 2. Upload training data to Azure Blob Storage:"); - System.out.println(" - Documents: receipt1.pdf, receipt2.pdf, ..."); - System.out.println(" - Labels: receipt1.pdf.labels.json, receipt2.pdf.labels.json, ..."); - System.out.println(" - OCR: receipt1.pdf.result.json, receipt2.pdf.result.json, ..."); + System.out.println(" - Documents: receipt1.jpg, receipt2.jpg, ..."); + System.out.println(" - Labels: receipt1.jpg.labels.json, receipt2.jpg.labels.json, ..."); + System.out.println(" - OCR: receipt1.jpg.result.json, receipt2.jpg.result.json, ..."); System.out.println(" 3. Create LabeledDataKnowledgeSource with storage SAS URL"); System.out.println(" 4. Create analyzer with field schema and knowledge sources"); System.out.println(" 5. 
Use analyzer for document analysis"); System.out.println("\nCreateAnalyzerWithLabels pattern demonstration completed"); - System.out.println(" Note: This sample demonstrates the API pattern."); - System.out.println(" For actual training, provide CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL with labeled data."); + if (resolvedSasUrl == null || resolvedSasUrl.trim().isEmpty()) { + System.out.println(" Note: This sample demonstrates the API pattern."); + System.out.println( + " For actual training, provide CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL (Option A)"); + System.out.println( + " or CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT + ..._CONTAINER (Option B)."); + } }) .doFinally(signalType -> { // Cleanup using reactive pattern @@ -275,4 +379,87 @@ public static void main(String[] args) throws InterruptedException { // Wait for async operations to complete latch.await(3, TimeUnit.MINUTES); } + + /** + * Uploads local training data files (images, .labels.json, .result.json) to an Azure Blob + * container. Existing blobs with the same name are overwritten. + * + * @param storageAccountName storage account name (no {@code .blob.core.windows.net} suffix) + * @param containerName container name (created if it does not exist) + * @param credential credential with write access to the container + * @param localDirectory local folder containing the label files + * @param prefix optional blob prefix (virtual folder) to prepend, e.g. 
{@code "receipt_labels/"} + */ + public static void uploadTrainingData( + String storageAccountName, + String containerName, + TokenCredential credential, + String localDirectory, + String prefix) { + BlobContainerClient containerClient = new BlobContainerClientBuilder() + .endpoint("https://" + storageAccountName + ".blob.core.windows.net") + .containerName(containerName) + .credential(credential) + .buildClient(); + + if (!containerClient.exists()) { + containerClient.create(); + } + + File dir = new File(localDirectory); + File[] files = dir.listFiles(File::isFile); + if (files == null || files.length == 0) { + throw new IllegalStateException( + "No training data files found under '" + dir.getAbsolutePath() + "'." + + " Set CONTENTUNDERSTANDING_TRAINING_DATA_LOCAL_DIR to a folder containing your label files."); + } + + String normalizedPrefix = (prefix == null || prefix.trim().isEmpty()) + ? null + : prefix.replaceAll("/+$", ""); + + for (File file : files) { + String blobName = normalizedPrefix == null + ? file.getName() + : normalizedPrefix + "/" + file.getName(); + System.out.println("Uploading " + file.getName() + " -> " + blobName); + containerClient.getBlobClient(blobName).uploadFromFile(file.getAbsolutePath(), true); + } + } + + /** + * Generates a User Delegation SAS URL (Read + List) for an Azure Blob container, using a + * {@link TokenCredential} so no storage account key is required. 
+ * + * @param storageAccountName storage account name + * @param containerName container name + * @param credential credential to obtain a user delegation key + * @return container-scoped SAS URL valid for 1 hour + */ + public static String generateUserDelegationSasUrl( + String storageAccountName, + String containerName, + TokenCredential credential) { + BlobServiceClient blobServiceClient = new BlobServiceClientBuilder() + .endpoint("https://" + storageAccountName + ".blob.core.windows.net") + .credential(credential) + .buildClient(); + + OffsetDateTime startsOn = OffsetDateTime.now(); + OffsetDateTime expiresOn = startsOn.plusHours(1); + + UserDelegationKey userDelegationKey = blobServiceClient.getUserDelegationKey(startsOn, expiresOn); + + BlobContainerSasPermission permissions = new BlobContainerSasPermission() + .setReadPermission(true) + .setListPermission(true); + + BlobServiceSasSignatureValues sasValues = new BlobServiceSasSignatureValues(expiresOn, permissions) + .setStartTime(startsOn); + + BlobContainerClient containerClient = blobServiceClient.getBlobContainerClient(containerName); + String sasToken = containerClient.generateUserDelegationSas(sasValues, userDelegationKey); + + return "https://" + storageAccountName + ".blob.core.windows.net/" + containerName + "?" 
+ sasToken; + } } diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample16_CreateAnalyzerWithLabelsAsyncTest.java b/sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample16_CreateAnalyzerWithLabelsAsyncTest.java index a8ff69adff65..86f1f794cad4 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample16_CreateAnalyzerWithLabelsAsyncTest.java +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample16_CreateAnalyzerWithLabelsAsyncTest.java @@ -16,7 +16,10 @@ import com.azure.ai.contentunderstanding.models.GenerationMethod; import com.azure.ai.contentunderstanding.models.KnowledgeSource; import com.azure.ai.contentunderstanding.models.LabeledDataKnowledgeSource; +import com.azure.ai.contentunderstanding.samples.Sample16_CreateAnalyzerWithLabels; +import com.azure.core.credential.TokenCredential; import com.azure.core.util.polling.PollerFlux; +import com.azure.identity.DefaultAzureCredentialBuilder; import reactor.core.publisher.Mono; import org.junit.jupiter.api.Test; @@ -39,24 +42,18 @@ * For an easier labeling workflow, use Azure AI Content Understanding Studio at * https://contentunderstanding.ai.azure.com/ * - * Labeled receipt data is available in this repo at {@code src/samples/resources/receipt_labels}. - * For LIVE mode with real training data: upload that folder to Azure Blob Storage, generate a - * container SAS URL with List/Read permissions, then set the environment variables below. Use - * {@code CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX} if you uploaded into a subfolder - * (e.g., "receipt_labels/"); omit or leave unset if files are at the container root. - * - *

- * <p>Required environment variables:</p>
+ * <p>Labeled receipt data is bundled at {@code src/samples/resources/receipt_labels}. To use it
+ * for training in LIVE / RECORD modes, choose one of:</p>
 * <ul>
- * <li>{@code CONTENTUNDERSTANDING_ENDPOINT} – Azure Content Understanding endpoint URL</li>
+ * <li>Option A: provide a pre-generated container SAS URL via
+ * {@code CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL}.</li>
+ * <li>Option B: set {@code CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT} and
+ * {@code CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER}; the test will upload the bundled
+ * label files via DefaultAzureCredential and generate a User Delegation SAS URL.</li>
 * </ul>
 *
- * <p>Optional environment variables (for labeled training data; used in LIVE mode):</p>
- * <ul>
- * <li>{@code CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL} – SAS URL for the Azure Blob container
- * with labeled training data.</li>
- * <li>{@code CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX} – Path prefix within the container
- * (e.g., "receipt_labels/"). Omit or leave unset if files are at the container root.</li>
- * </ul>
+ * <p>Use {@code CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX} if files live in a subfolder
+ * (e.g., "receipt_labels/"); omit if files are at the container root.</p>
      */ public class Sample16_CreateAnalyzerWithLabelsAsyncTest extends ContentUnderstandingClientTestBase { @@ -71,10 +68,33 @@ public class Sample16_CreateAnalyzerWithLabelsAsyncTest extends ContentUnderstan public void testCreateAnalyzerWithLabelsAsync() { String analyzerId = testResourceNamer.randomName("test_receipt_analyzer_", 50); - // In PLAYBACK mode, use a placeholder URL to ensure consistent test behavior - String trainingDataSasUrl = getTestMode() == TestMode.PLAYBACK - ? "https://placeholder.blob.core.windows.net/container?sv=placeholder" - : System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL"); + // Resolve the training-data SAS URL. + // PLAYBACK uses a placeholder so the recorded request body matches. + // RECORD / LIVE: try Option A (SAS URL env), then Option B (storage account + container env). + String trainingDataSasUrl; + if (getTestMode() == TestMode.PLAYBACK) { + trainingDataSasUrl = "https://placeholder.blob.core.windows.net/container?sv=placeholder"; + } else { + trainingDataSasUrl = System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL"); + String storageAccount = System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT"); + String container = System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER"); + if ((trainingDataSasUrl == null || trainingDataSasUrl.trim().isEmpty()) + && storageAccount != null + && !storageAccount.trim().isEmpty() + && container != null + && !container.trim().isEmpty()) { + TokenCredential credential = new DefaultAzureCredentialBuilder().build(); + String localLabelDir = System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_LOCAL_DIR"); + if (localLabelDir == null || localLabelDir.trim().isEmpty()) { + localLabelDir = "src/samples/resources/receipt_labels"; + } + String trainingDataPrefixForUpload = System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX"); + Sample16_CreateAnalyzerWithLabels.uploadTrainingData(storageAccount, container, credential, + localLabelDir, 
trainingDataPrefixForUpload); + trainingDataSasUrl = Sample16_CreateAnalyzerWithLabels.generateUserDelegationSasUrl(storageAccount, + container, credential); + } + } // Save prefix in test proxy variable during RECORD, load back during PLAYBACK so request bodies match. String trainingDataPrefix; if (getTestMode() == TestMode.PLAYBACK) { @@ -135,12 +155,12 @@ public void testCreateAnalyzerWithLabelsAsync() { itemsField.setItemDefinition(itemDefinition); fields.put("Items", itemsField); - // Total field - ContentFieldDefinition totalField = new ContentFieldDefinition(); - totalField.setType(ContentFieldType.STRING); - totalField.setMethod(GenerationMethod.EXTRACT); - totalField.setDescription("Total amount"); - fields.put("Total", totalField); + // TotalPrice field + ContentFieldDefinition totalPriceField = new ContentFieldDefinition(); + totalPriceField.setType(ContentFieldType.STRING); + totalPriceField.setMethod(GenerationMethod.EXTRACT); + totalPriceField.setDescription("Total amount"); + fields.put("TotalPrice", totalPriceField); ContentFieldSchema fieldSchema = new ContentFieldSchema(); fieldSchema.setName("receipt_schema"); @@ -159,7 +179,12 @@ public void testCreateAnalyzerWithLabelsAsync() { System.out.println("Using labeled training data from: " + trainingDataSasUrl.substring(0, Math.min(50, trainingDataSasUrl.length())) + "..."); } else { - System.out.println("No TRAINING_DATA_SAS_URL set, creating analyzer without labeled training data"); + System.out.println( + "DEMO MODE: no training data configured. 
The analyzer will be created without labeled data."); + System.out.println(" Set CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL (Option A), or both"); + System.out.println( + " CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT and CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER (Option B),"); + System.out.println(" to fully exercise the labeled-data API path."); } // Step 3: Create analyzer (with or without labeled data) @@ -195,6 +220,8 @@ public void testCreateAnalyzerWithLabelsAsync() { System.out.println(" Description: " + result.getDescription()); System.out.println(" Base analyzer: " + result.getBaseAnalyzerId()); System.out.println(" Fields: " + result.getFieldSchema().getFields().size()); + System.out.println(" Knowledge srcs: " + + (result.getKnowledgeSources() == null ? 0 : result.getKnowledgeSources().size())); // END: com.azure.ai.contentunderstanding.createAnalyzerWithLabelsAsync // BEGIN: Assertion_ContentUnderstandingCreateAnalyzerWithLabelsAsync @@ -212,7 +239,7 @@ public void testCreateAnalyzerWithLabelsAsync() { Map resultFields = result.getFieldSchema().getFields(); assertTrue(resultFields.containsKey("MerchantName"), "Should have MerchantName field"); assertTrue(resultFields.containsKey("Items"), "Should have Items field"); - assertTrue(resultFields.containsKey("Total"), "Should have Total field"); + assertTrue(resultFields.containsKey("TotalPrice"), "Should have TotalPrice field"); ContentFieldDefinition itemsFieldResult = resultFields.get("Items"); assertEquals(ContentFieldType.ARRAY, itemsFieldResult.getType()); @@ -223,7 +250,7 @@ public void testCreateAnalyzerWithLabelsAsync() { System.out.println(" MerchantName: String (Extract)"); System.out.println(" Items: Array of Objects (Generate)"); System.out.println(" - Quantity, Name, Price"); - System.out.println(" Total: String (Extract)"); + System.out.println(" TotalPrice: String (Extract)"); // END: Assertion_ContentUnderstandingCreateAnalyzerWithLabelsAsync // If training data was provided, test 
the analyzer with a sample document @@ -265,11 +292,11 @@ public void testCreateAnalyzerWithLabelsAsync() { System.out.println(" MerchantName: " + merchantName); } } - if (docContent.getFields().containsKey("Total")) { - ContentField totalFieldValue = docContent.getFields().get("Total"); + if (docContent.getFields().containsKey("TotalPrice")) { + ContentField totalFieldValue = docContent.getFields().get("TotalPrice"); if (totalFieldValue != null) { String total = (String) totalFieldValue.getValue(); - System.out.println(" Total: " + total); + System.out.println(" TotalPrice: " + total); } } } @@ -279,9 +306,9 @@ public void testCreateAnalyzerWithLabelsAsync() { System.out.println("\nCreateAnalyzerWithLabels API Pattern:"); System.out.println(" 1. Define field schema with nested structures (arrays, objects)"); System.out.println(" 2. Upload training data to Azure Blob Storage:"); - System.out.println(" - Documents: receipt1.pdf, receipt2.pdf, ..."); - System.out.println(" - Labels: receipt1.pdf.labels.json, receipt2.pdf.labels.json, ..."); - System.out.println(" - OCR: receipt1.pdf.result.json, receipt2.pdf.result.json, ..."); + System.out.println(" - Documents: receipt1.jpg, receipt2.jpg, ..."); + System.out.println(" - Labels: receipt1.jpg.labels.json, receipt2.jpg.labels.json, ..."); + System.out.println(" - OCR: receipt1.jpg.result.json, receipt2.jpg.result.json, ..."); System.out.println(" 3. Create LabeledDataKnowledgeSource with storage SAS URL"); System.out.println(" 4. Create analyzer with field schema and knowledge sources"); System.out.println(" 5. 
Use analyzer for document analysis"); @@ -289,8 +316,10 @@ public void testCreateAnalyzerWithLabelsAsync() { System.out.println("\nCreateAnalyzerWithLabels pattern demonstration completed"); if (trainingDataSasUrl == null || trainingDataSasUrl.trim().isEmpty()) { System.out.println(" Note: This sample demonstrates the API pattern."); - System.out.println( - " For actual training, provide CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL with labeled data."); + System.out + .println(" For actual training, provide CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL (Option A)"); + System.out + .println(" or CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT + ..._CONTAINER (Option B)."); } } finally { diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample16_CreateAnalyzerWithLabelsTest.java b/sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample16_CreateAnalyzerWithLabelsTest.java index 9a7813712e7c..ed51bc6d960f 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample16_CreateAnalyzerWithLabelsTest.java +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample16_CreateAnalyzerWithLabelsTest.java @@ -16,7 +16,10 @@ import com.azure.ai.contentunderstanding.models.GenerationMethod; import com.azure.ai.contentunderstanding.models.KnowledgeSource; import com.azure.ai.contentunderstanding.models.LabeledDataKnowledgeSource; +import com.azure.ai.contentunderstanding.samples.Sample16_CreateAnalyzerWithLabels; +import com.azure.core.credential.TokenCredential; import com.azure.core.util.polling.SyncPoller; +import com.azure.identity.DefaultAzureCredentialBuilder; import org.junit.jupiter.api.Test; import com.azure.core.test.TestMode; @@ -38,24 +41,18 @@ * For an easier labeling workflow, use 
Azure AI Content Understanding Studio at * https://contentunderstanding.ai.azure.com/ * - * Labeled receipt data is available in this repo at {@code src/samples/resources/receipt_labels}. - * For LIVE mode with real training data: upload that folder to Azure Blob Storage, generate a - * container SAS URL with List/Read permissions, then set the environment variables below. Use - * {@code CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX} if you uploaded into a subfolder - * (e.g., "receipt_labels/"); omit or leave unset if files are at the container root. - * - *

- * <p>Required environment variables:</p>
+ * <p>Labeled receipt data is bundled at {@code src/samples/resources/receipt_labels}. To use it
+ * for training in LIVE / RECORD modes, choose one of:</p>
 * <ul>
- * <li>{@code CONTENTUNDERSTANDING_ENDPOINT} – Azure Content Understanding endpoint URL</li>
+ * <li>Option A: provide a pre-generated container SAS URL via
+ * {@code CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL}.</li>
+ * <li>Option B: set {@code CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT} and
+ * {@code CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER}; the test will upload the bundled
+ * label files via DefaultAzureCredential and generate a User Delegation SAS URL.</li>
 * </ul>
 *
- * <p>Optional environment variables (for labeled training data; used in LIVE mode):</p>
- * <ul>
- * <li>{@code CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL} – SAS URL for the Azure Blob container
- * with labeled training data.</li>
- * <li>{@code CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX} – Path prefix within the container
- * (e.g., "receipt_labels/"). Omit or leave unset if files are at the container root.</li>
- * </ul>
+ * <p>Use {@code CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX} if files live in a subfolder
+ * (e.g., "receipt_labels/"); omit if files are at the container root.</p>
      */ public class Sample16_CreateAnalyzerWithLabelsTest extends ContentUnderstandingClientTestBase { @@ -70,10 +67,33 @@ public class Sample16_CreateAnalyzerWithLabelsTest extends ContentUnderstandingC public void testCreateAnalyzerWithLabels() { String analyzerId = testResourceNamer.randomName("test_receipt_analyzer_", 50); - // In PLAYBACK mode, use a placeholder URL to ensure consistent test behavior - String trainingDataSasUrl = getTestMode() == TestMode.PLAYBACK - ? "https://placeholder.blob.core.windows.net/container?sv=placeholder" - : System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL"); + // Resolve the training-data SAS URL. + // PLAYBACK uses a placeholder so the recorded request body matches. + // RECORD / LIVE: try Option A (SAS URL env), then Option B (storage account + container env). + String trainingDataSasUrl; + if (getTestMode() == TestMode.PLAYBACK) { + trainingDataSasUrl = "https://placeholder.blob.core.windows.net/container?sv=placeholder"; + } else { + trainingDataSasUrl = System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL"); + String storageAccount = System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT"); + String container = System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER"); + if ((trainingDataSasUrl == null || trainingDataSasUrl.trim().isEmpty()) + && storageAccount != null + && !storageAccount.trim().isEmpty() + && container != null + && !container.trim().isEmpty()) { + TokenCredential credential = new DefaultAzureCredentialBuilder().build(); + String localLabelDir = System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_LOCAL_DIR"); + if (localLabelDir == null || localLabelDir.trim().isEmpty()) { + localLabelDir = "src/samples/resources/receipt_labels"; + } + String trainingDataPrefixForUpload = System.getenv("CONTENTUNDERSTANDING_TRAINING_DATA_PREFIX"); + Sample16_CreateAnalyzerWithLabels.uploadTrainingData(storageAccount, container, credential, + localLabelDir, trainingDataPrefixForUpload); + 
trainingDataSasUrl = Sample16_CreateAnalyzerWithLabels.generateUserDelegationSasUrl(storageAccount, + container, credential); + } + } // Save prefix in test proxy variable during RECORD, load back during PLAYBACK so request bodies match. String trainingDataPrefix; if (getTestMode() == TestMode.PLAYBACK) { @@ -134,12 +154,12 @@ public void testCreateAnalyzerWithLabels() { itemsField.setItemDefinition(itemDefinition); fields.put("Items", itemsField); - // Total field - ContentFieldDefinition totalField = new ContentFieldDefinition(); - totalField.setType(ContentFieldType.STRING); - totalField.setMethod(GenerationMethod.EXTRACT); - totalField.setDescription("Total amount"); - fields.put("Total", totalField); + // TotalPrice field + ContentFieldDefinition totalPriceField = new ContentFieldDefinition(); + totalPriceField.setType(ContentFieldType.STRING); + totalPriceField.setMethod(GenerationMethod.EXTRACT); + totalPriceField.setDescription("Total amount"); + fields.put("TotalPrice", totalPriceField); ContentFieldSchema fieldSchema = new ContentFieldSchema(); fieldSchema.setName("receipt_schema"); @@ -158,7 +178,12 @@ public void testCreateAnalyzerWithLabels() { System.out.println("Using labeled training data from: " + trainingDataSasUrl.substring(0, Math.min(50, trainingDataSasUrl.length())) + "..."); } else { - System.out.println("No TRAINING_DATA_SAS_URL set, creating analyzer without labeled training data"); + System.out.println( + "DEMO MODE: no training data configured. 
             The analyzer will be created without labeled data.");
+            System.out.println("    Set CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL (Option A), or both");
+            System.out.println(
+                "    CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT and CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER (Option B),");
+            System.out.println("    to fully exercise the labeled-data API path.");
         }

         // Step 3: Create analyzer (with or without labeled data)
@@ -184,6 +209,8 @@ public void testCreateAnalyzerWithLabels() {
             System.out.println("  Description: " + result.getDescription());
             System.out.println("  Base analyzer: " + result.getBaseAnalyzerId());
             System.out.println("  Fields: " + result.getFieldSchema().getFields().size());
+            System.out.println("  Knowledge srcs: "
+                + (result.getKnowledgeSources() == null ? 0 : result.getKnowledgeSources().size()));

             // END: com.azure.ai.contentunderstanding.createAnalyzerWithLabels

             // BEGIN: Assertion_ContentUnderstandingCreateAnalyzerWithLabels
@@ -201,7 +228,7 @@ public void testCreateAnalyzerWithLabels() {
             Map<String, ContentFieldDefinition> resultFields = result.getFieldSchema().getFields();
             assertTrue(resultFields.containsKey("MerchantName"), "Should have MerchantName field");
             assertTrue(resultFields.containsKey("Items"), "Should have Items field");
-            assertTrue(resultFields.containsKey("Total"), "Should have Total field");
+            assertTrue(resultFields.containsKey("TotalPrice"), "Should have TotalPrice field");

             ContentFieldDefinition itemsFieldResult = resultFields.get("Items");
             assertEquals(ContentFieldType.ARRAY, itemsFieldResult.getType());
@@ -212,7 +239,7 @@ public void testCreateAnalyzerWithLabels() {
             System.out.println("  MerchantName: String (Extract)");
             System.out.println("  Items: Array of Objects (Generate)");
             System.out.println("    - Quantity, Name, Price");
-            System.out.println("  Total: String (Extract)");
+            System.out.println("  TotalPrice: String (Extract)");
             // END: Assertion_ContentUnderstandingCreateAnalyzerWithLabels

             // If training data was provided, test the analyzer with a sample document
@@ -244,11 +271,11 @@ public void testCreateAnalyzerWithLabels() {
                     System.out.println("  MerchantName: " + merchantName);
                 }
             }
-            if (docContent.getFields().containsKey("Total")) {
-                ContentField totalFieldValue = docContent.getFields().get("Total");
+            if (docContent.getFields().containsKey("TotalPrice")) {
+                ContentField totalFieldValue = docContent.getFields().get("TotalPrice");
                 if (totalFieldValue != null) {
                     String total = (String) totalFieldValue.getValue();
-                    System.out.println("  Total: " + total);
+                    System.out.println("  TotalPrice: " + total);
                 }
             }
         }
@@ -258,9 +285,9 @@ public void testCreateAnalyzerWithLabels() {
         System.out.println("\nCreateAnalyzerWithLabels API Pattern:");
         System.out.println("  1. Define field schema with nested structures (arrays, objects)");
         System.out.println("  2. Upload training data to Azure Blob Storage:");
-        System.out.println("     - Documents: receipt1.pdf, receipt2.pdf, ...");
-        System.out.println("     - Labels: receipt1.pdf.labels.json, receipt2.pdf.labels.json, ...");
-        System.out.println("     - OCR: receipt1.pdf.result.json, receipt2.pdf.result.json, ...");
+        System.out.println("     - Documents: receipt1.jpg, receipt2.jpg, ...");
+        System.out.println("     - Labels: receipt1.jpg.labels.json, receipt2.jpg.labels.json, ...");
+        System.out.println("     - OCR: receipt1.jpg.result.json, receipt2.jpg.result.json, ...");
         System.out.println("  3. Create LabeledDataKnowledgeSource with storage SAS URL");
         System.out.println("  4. Create analyzer with field schema and knowledge sources");
         System.out.println("  5. Use analyzer for document analysis");
@@ -268,8 +295,10 @@ public void testCreateAnalyzerWithLabels() {
         System.out.println("\nCreateAnalyzerWithLabels pattern demonstration completed");
         if (trainingDataSasUrl == null || trainingDataSasUrl.trim().isEmpty()) {
             System.out.println("  Note: This sample demonstrates the API pattern.");
-            System.out.println(
-                "  For actual training, provide CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL with labeled data.");
+            System.out
+                .println("  For actual training, provide CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL (Option A)");
+            System.out
+                .println("  or CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT + ..._CONTAINER (Option B).");
         }
     } finally {

From 691bde4dbeb84328d81067f6bef962cf7ea24889 Mon Sep 17 00:00:00 2001
From: Changjian Wang
Date: Sat, 9 May 2026 10:44:53 +0800
Subject: [PATCH 17/19] Bump azure-storage-blob to 12.33.4 to satisfy
 version_client.txt

After merging main, eng/versioning/version_client.txt expects
azure-storage-blob 12.33.4 but pom.xml still pinned 12.33.3, causing the
'Verify versions in POM files' pipeline step to fail.
---
 sdk/contentunderstanding/azure-ai-contentunderstanding/pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/pom.xml b/sdk/contentunderstanding/azure-ai-contentunderstanding/pom.xml
index 45d362d94d5b..920446ab1d72 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/pom.xml
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/pom.xml
@@ -78,7 +78,7 @@
     <dependency>
       <groupId>com.azure</groupId>
       <artifactId>azure-storage-blob</artifactId>
-      <version>12.33.3</version>
+      <version>12.33.4</version>
     </dependency>

From 32347f7ff15fe37e53189ee9346c5143fe392a45 Mon Sep 17 00:00:00 2001
From: Changjian Wang
Date: Sat, 9 May 2026 10:59:59 +0800
Subject: [PATCH 18/19] Fix cspell error: rename 'Knowledge srcs' to 'Knowledge
 sources'

cspell flagged 'srcs' as an unknown word in 4 files.
Replaced 'Knowledge srcs:' with 'Knowledge sources:' in the printed
label across the Sample16 sync/async samples (and matching tests for
consistency), the run_sample.sh DEMO MODE message, and the SKILL.md
prompt. The label now matches the underlying API name
(getKnowledgeSources).
---
 .../.github/skills/cu-sdk-sample-run/SKILL.md                | 2 +-
 .../.github/skills/cu-sdk-sample-run/scripts/run_sample.sh   | 2 +-
 .../samples/Sample16_CreateAnalyzerWithLabels.java           | 2 +-
 .../samples/Sample16_CreateAnalyzerWithLabelsAsync.java      | 2 +-
 .../samples/Sample16_CreateAnalyzerWithLabelsAsyncTest.java  | 2 +-
 .../tests/samples/Sample16_CreateAnalyzerWithLabelsTest.java | 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md
index ecf80d5593e6..9f79b8b641f7 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md
@@ -385,7 +385,7 @@ The repo ships labeled receipt training data at `src/samples/resources/receipt_l
 > **must** ask the user the questions below and act on the answer:
 >
 > 1. "Do you want to **train with labeled data** (recommended), or **create the analyzer without training data** (demo mode)?"
-> - If **demo mode**: confirm explicitly — "I will run Sample16 *without* training data. The output will say `Knowledge srcs: 0` and you will see a `DEMO MODE` banner. The labeled-data API path will **not** be exercised. OK to proceed?" Only continue after the user says yes; leave both Option A and Option B env vars empty/unset.
+> - If **demo mode**: confirm explicitly — "I will run Sample16 *without* training data. The output will say `Knowledge sources: 0` and you will see a `DEMO MODE` banner. The labeled-data API path will **not** be exercised. OK to proceed?" Only continue after the user says yes; leave both Option A and Option B env vars empty/unset.
 > - If **with training data**: continue with one of the next two questions.
 > 2. "Will you use **Option A (pre-generated SAS URL)** or **Option B (auto-upload via `DefaultAzureCredential`)**?"
 >    - **Option A**: ask for the SAS URL and (optionally) prefix; walk through the manual-upload steps above if not yet done.
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh
index 2084f89e3d4b..8e9ec85441bd 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh
@@ -226,7 +226,7 @@ if [[ "$SAMPLE_NAME" == Sample16* ]]; then
     if [[ -z "${CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT:-}" \
        || -z "${CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER:-}" ]]; then
         print_warning "⚠ DEMO MODE: no training data configured for $SAMPLE_NAME."
-        echo "  The analyzer will be created without labeled data ('Knowledge srcs: 0')."
+        echo "  The analyzer will be created without labeled data ('Knowledge sources: 0')."
         echo "  To exercise the labeled-data API path, configure ONE of:"
         echo "    Option A: CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL="
         echo "    Option B: CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT="
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabels.java b/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabels.java
index 6bc9888743b3..74ebb4e8abb9 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabels.java
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabels.java
@@ -245,7 +245,7 @@ public static void main(String[] args) {
         System.out.println("  Description: " + result.getDescription());
         System.out.println("  Base analyzer: " + result.getBaseAnalyzerId());
         System.out.println("  Fields: " + result.getFieldSchema().getFields().size());
-        System.out.println("  Knowledge srcs: "
+        System.out.println("  Knowledge sources: "
             + (result.getKnowledgeSources() == null ? 0 : result.getKnowledgeSources().size()));

         // END: com.azure.ai.contentunderstanding.createAnalyzerWithLabels
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabelsAsync.java b/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabelsAsync.java
index b01c0d0c9da5..4767b0ab27a3 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabelsAsync.java
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabelsAsync.java
@@ -263,7 +263,7 @@ public static void main(String[] args) throws InterruptedException {
         System.out.println("  Description: " + result.getDescription());
         System.out.println("  Base analyzer: " + result.getBaseAnalyzerId());
         System.out.println("  Fields: " + result.getFieldSchema().getFields().size());
-        System.out.println("  Knowledge srcs: "
+        System.out.println("  Knowledge sources: "
             + (result.getKnowledgeSources() == null ? 0 : result.getKnowledgeSources().size()));

         // END: com.azure.ai.contentunderstanding.createAnalyzerWithLabelsAsync
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample16_CreateAnalyzerWithLabelsAsyncTest.java b/sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample16_CreateAnalyzerWithLabelsAsyncTest.java
index 86f1f794cad4..60cf5a04ad56 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample16_CreateAnalyzerWithLabelsAsyncTest.java
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample16_CreateAnalyzerWithLabelsAsyncTest.java
@@ -220,7 +220,7 @@ public void testCreateAnalyzerWithLabelsAsync() {
             System.out.println("  Description: " + result.getDescription());
             System.out.println("  Base analyzer: " + result.getBaseAnalyzerId());
             System.out.println("  Fields: " + result.getFieldSchema().getFields().size());
-            System.out.println("  Knowledge srcs: "
+            System.out.println("  Knowledge sources: "
                 + (result.getKnowledgeSources() == null ? 0 : result.getKnowledgeSources().size()));

             // END: com.azure.ai.contentunderstanding.createAnalyzerWithLabelsAsync
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample16_CreateAnalyzerWithLabelsTest.java b/sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample16_CreateAnalyzerWithLabelsTest.java
index ed51bc6d960f..529798f275ac 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample16_CreateAnalyzerWithLabelsTest.java
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample16_CreateAnalyzerWithLabelsTest.java
@@ -209,7 +209,7 @@ public void testCreateAnalyzerWithLabels() {
             System.out.println("  Description: " + result.getDescription());
             System.out.println("  Base analyzer: " + result.getBaseAnalyzerId());
             System.out.println("  Fields: " + result.getFieldSchema().getFields().size());
-            System.out.println("  Knowledge srcs: "
+            System.out.println("  Knowledge sources: "
                 + (result.getKnowledgeSources() == null ? 0 : result.getKnowledgeSources().size()));

             // END: com.azure.ai.contentunderstanding.createAnalyzerWithLabels

From f04c36ad60a7328640391f597053e708bb1abedd Mon Sep 17 00:00:00 2001
From: Changjian Wang
Date: Sat, 9 May 2026 13:48:12 +0800
Subject: [PATCH 19/19] Address Copilot review: storage-blob test scope, SAS
 clock skew

- pom.xml: Move azure-storage-blob from compile to test scope. The
  dependency is only used by Sample16 (compiled as test sources) and
  tests, so it should not be a transitive dependency of library
  consumers.
- Sample16_CreateAnalyzerWithLabels.java / Async: Set SAS startsOn 5
  minutes in the past to tolerate clock skew between the local machine
  and the storage service.
  Without this buffer, GetUserDelegationKey/SAS generation can
  intermittently fail with AuthenticationFailed.
---
 .../azure-ai-contentunderstanding/pom.xml              | 10 +++++-----
 .../samples/Sample16_CreateAnalyzerWithLabels.java     |  7 +++++--
 .../Sample16_CreateAnalyzerWithLabelsAsync.java        |  7 +++++--
 3 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/pom.xml b/sdk/contentunderstanding/azure-ai-contentunderstanding/pom.xml
index 920446ab1d72..dc8bc6857ef3 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/pom.xml
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/pom.xml
@@ -69,16 +69,16 @@
       <version>1.18.3</version>
     </dependency>
     <dependency>
       <groupId>com.azure</groupId>
       <artifactId>azure-storage-blob</artifactId>
       <version>12.33.4</version>
+      <scope>test</scope>
     </dependency>
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabels.java b/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabels.java
index 74ebb4e8abb9..8f9dd61425be 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabels.java
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabels.java
@@ -425,8 +425,11 @@ public static String generateUserDelegationSasUrl(
             .credential(credential)
             .buildClient();

-        OffsetDateTime startsOn = OffsetDateTime.now();
-        OffsetDateTime expiresOn = startsOn.plusHours(1);
+        // Start the SAS 5 minutes in the past to tolerate clock skew between the local machine
+        // and the storage service. Without this buffer, SAS generation can intermittently fail
+        // with AuthenticationFailed ("SAS not valid yet").
+        OffsetDateTime startsOn = OffsetDateTime.now().minusMinutes(5);
+        OffsetDateTime expiresOn = OffsetDateTime.now().plusHours(1);

         UserDelegationKey userDelegationKey = blobServiceClient.getUserDelegationKey(startsOn, expiresOn);
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabelsAsync.java b/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabelsAsync.java
index 4767b0ab27a3..9d3c27ac3b45 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabelsAsync.java
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample16_CreateAnalyzerWithLabelsAsync.java
@@ -445,8 +445,11 @@ public static String generateUserDelegationSasUrl(
             .credential(credential)
             .buildClient();

-        OffsetDateTime startsOn = OffsetDateTime.now();
-        OffsetDateTime expiresOn = startsOn.plusHours(1);
+        // Start the SAS 5 minutes in the past to tolerate clock skew between the local machine
+        // and the storage service. Without this buffer, SAS generation can intermittently fail
+        // with AuthenticationFailed ("SAS not valid yet").
+        OffsetDateTime startsOn = OffsetDateTime.now().minusMinutes(5);
+        OffsetDateTime expiresOn = OffsetDateTime.now().plusHours(1);

         UserDelegationKey userDelegationKey = blobServiceClient.getUserDelegationKey(startsOn, expiresOn);
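The clock-skew rationale in patch 19 can be illustrated without the Azure SDK. The sketch below is a minimal, self-contained model of the same validity-window logic (class and method names are hypothetical, `java.time` only): it backdates `startsOn` by five minutes so that even a storage-service clock running a few minutes behind the local machine still considers the SAS valid.

```java
import java.time.Duration;
import java.time.OffsetDateTime;

/**
 * Minimal sketch of the SAS validity-window logic from patch 19.
 * Class and method names are illustrative, not part of the SDK.
 */
public class SasWindowSketch {
    /** Buffer matching the 5-minute backdate used in the patch. */
    static final Duration CLOCK_SKEW_BUFFER = Duration.ofMinutes(5);

    /** Returns {startsOn, expiresOn}; the start is backdated to tolerate clock skew. */
    static OffsetDateTime[] validityWindow(OffsetDateTime now, Duration lifetime) {
        OffsetDateTime startsOn = now.minus(CLOCK_SKEW_BUFFER);
        OffsetDateTime expiresOn = now.plus(lifetime);
        return new OffsetDateTime[] { startsOn, expiresOn };
    }

    public static void main(String[] args) {
        OffsetDateTime localNow = OffsetDateTime.now();
        OffsetDateTime[] window = validityWindow(localNow, Duration.ofHours(1));

        // Even if the storage service's clock runs a few minutes behind ours,
        // its notion of "now" is still >= startsOn, so the SAS is already valid.
        OffsetDateTime serviceNowWorstCase = localNow.minusMinutes(4);
        if (serviceNowWorstCase.isBefore(window[0]) || serviceNowWorstCase.isAfter(window[1])) {
            throw new AssertionError("SAS not valid yet / expired");
        }
        System.out.println("window OK: " + window[0] + " .. " + window[1]);
    }
}
```

With `OffsetDateTime.now()` as the start instead, a service clock only seconds behind would reject the token with "SAS not valid yet"; the backdated window makes that race impossible for skews up to the buffer size.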