Add setup script and skills for Azure AI Content Understanding #48893
Open: changjian-wang wants to merge 27 commits into main from changjian-wang/cu-sdk-skills
27 commits
b216ad2 Add scripts for running samples and setting up environment for Azure … (changjian-wang)
a7f9a1c Enhance SKILL.md files to include related skills for better user guid… (changjian-wang)
7c74788 Update README.md to enhance Table of Contents with new sections for b… (changjian-wang)
872dd16 Add setup and run scripts for Azure AI Content Understanding SDK samples (wangchangjian1130)
3906706 Merge branch 'main' into changjian-wang/cu-sdk-skills (changjian-wang)
8146dd8 Update SKILL.md to enhance user guidance for Java SDK setup and confi… (wangchangjian1130)
c44e5da Update SKILL.md to clarify Maven installation commands and add troubl… (wangchangjian1130)
e0332ec Add setup script for Azure AI Content Understanding Java SDK environment (changjian-wang)
2aa38c2 Enhance setup_user_env.sh script for Azure AI Content Understanding SDK (wangchangjian1130)
f96aa50 Merge branch 'main' into changjian-wang/cu-sdk-skills (changjian-wang)
fd5b9ce fix: update environment variable handling and improve script comments (wangchangjian1130)
b0275cf fix: update logging link in README to point to the new documentation (wangchangjian1130)
b1c4aff refactor: remove setup_samples.sh script and update SKILL.md for envi… (wangchangjian1130)
997e2bc feat: add Sample07_ListAnalyzers to SKILL.md for analyzer enumeration (wangchangjian1130)
efce265 Address PR review feedback for cu-sdk-sample-run skill (changjian-wang)
8aa71e7 Merge remote-tracking branch 'origin/main' into changjian-wang/cu-sdk… (changjian-wang)
d83299a Fix cspell error in run_sample.sh (changjian-wang)
3fa0321 [Java CU SDK] Align cu-sdk-sample-run skill structure with Python; ad… (changjian-wang)
ab3d09e Merge branch 'main' into changjian-wang/cu-sdk-skills (changjian-wang)
c29be38 Enhance Sample16_CreateAnalyzerWithLabelsAsync and related tests for … (changjian-wang)
a8687e3 Merge branch 'changjian-wang/cu-sdk-skills' of https://github.com/Azu… (changjian-wang)
db0c3ad Merge branch 'main' into changjian-wang/cu-sdk-skills (changjian-wang)
691bde4 Bump azure-storage-blob to 12.33.4 to satisfy version_client.txt (changjian-wang)
32347f7 Fix cspell error: rename 'Knowledge srcs' to 'Knowledge sources' (changjian-wang)
f04c36a Address Copilot review: storage-blob test scope, SAS clock skew (changjian-wang)
e95e2d1 Merge branch 'main' into changjian-wang/cu-sdk-skills (changjian-wang)
b6df03f Merge branch 'main' into changjian-wang/cu-sdk-skills (changjian-wang)
55 changes: 55 additions & 0 deletions
...g/azure-ai-contentunderstanding/.github/skills/cu-sdk-common-knowledge/SKILL.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,55 @@ | ||
| --- | ||
| name: cu-sdk-common-knowledge | ||
| description: Domain knowledge for Azure AI Content Understanding. Use this skill to answer questions about Content Understanding concepts, analyzers, field schemas, API operations, and Java SDK usage. Always consult official documentation before answering. | ||
| --- | ||
|
|
||
| # Azure AI Content Understanding Domain Knowledge | ||
|
|
||
| This skill provides domain knowledge for Azure AI Content Understanding, a multimodal AI service that extracts semantic content from documents, video, audio, and image files. | ||
|
|
||
| > **[COPILOT GUIDANCE]:** Always consult the official documentation first before answering user questions. Use `fetch_webpage` to read the relevant doc page when the reference material below is insufficient or may be outdated. | ||
| > | ||
| > When a user's question is broad or ambiguous, ask them to clarify: | ||
| > - "Which modality are you working with — documents, images, audio, or video?" | ||
| > - "Are you using a prebuilt analyzer, or building a custom one?" | ||
| > - "Are you asking about the Java SDK specifically, or the service in general?" | ||
|
|
||
| ## Official Documentation | ||
|
|
||
| The authoritative source for Content Understanding is: **https://learn.microsoft.com/azure/ai-services/content-understanding/** | ||
|
|
||
| Always read the relevant page (via `fetch_webpage`) before answering if the reference material below does not cover the topic. | ||
|
|
||
| ### Key Documentation Pages | ||
|
|
||
| | Topic | URL | | ||
| |-------|-----| | ||
| | **Overview** | https://learn.microsoft.com/azure/ai-services/content-understanding/overview | | ||
| | **What's new** | https://learn.microsoft.com/azure/ai-services/content-understanding/whats-new | | ||
| | **Content Understanding Studio** | https://learn.microsoft.com/azure/ai-services/content-understanding/quickstart/content-understanding-studio?tabs=portal%2Ccu-studio | | ||
| | **Service limits** | https://learn.microsoft.com/azure/ai-services/content-understanding/service-limits | | ||
| | **Region & language support** | https://learn.microsoft.com/azure/ai-services/content-understanding/language-region-support | | ||
| | **Prebuilt analyzers** | https://learn.microsoft.com/azure/ai-services/content-understanding/concepts/prebuilt-analyzers | | ||
| | **Create custom analyzer** | https://learn.microsoft.com/azure/ai-services/content-understanding/tutorial/create-custom-analyzer?tabs=portal%2Cdocument&pivots=programming-language-java | | ||
| | **Document markdown** | https://learn.microsoft.com/azure/ai-services/content-understanding/document/markdown | | ||
| | **Document elements** | https://learn.microsoft.com/azure/ai-services/content-understanding/document/elements | | ||
| | **Video overview** | https://learn.microsoft.com/azure/ai-services/content-understanding/video/overview | | ||
| | **Video elements** | https://learn.microsoft.com/azure/ai-services/content-understanding/video/elements | | ||
| | **Audio overview** | https://learn.microsoft.com/azure/ai-services/content-understanding/audio/overview | | ||
| | **Image overview** | https://learn.microsoft.com/azure/ai-services/content-understanding/image/overview | | ||
| | **REST API reference** | https://learn.microsoft.com/rest/api/contentunderstanding/operation-groups | | ||
|
|
||
| ### Java SDK Resources | ||
|
|
||
| | Resource | URL | | ||
| |----------|-----| | ||
| | **Maven Central** | https://central.sonatype.com/artifact/com.azure/azure-ai-contentunderstanding | | ||
| | **Java SDK README** | https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/contentunderstanding/azure-ai-contentunderstanding/README.md | | ||
| | **Java SDK Samples** | https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples | | ||
|
|
||
| > **Search tip:** If the above pages don't cover the user's question, search the doc tree at `https://learn.microsoft.com/azure/ai-services/content-understanding/`. | ||
|
|
||
| ## Related Skills | ||
|
|
||
| - `cu-sdk-setup` — Set up environment variables for Java SDK samples | ||
| - `cu-sdk-sample-run` — Run specific Java SDK samples interactively |
555 changes: 555 additions & 0 deletions
...tanding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/SKILL.md
Large diffs are not rendered by default.
259 changes: 259 additions & 0 deletions
...ding/azure-ai-contentunderstanding/.github/skills/cu-sdk-sample-run/scripts/run_sample.sh
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,259 @@ | ||
| #!/usr/bin/env bash | ||
| set -euo pipefail | ||
| # cspell:ignore envfile esac | ||
|
|
||
| # run_sample.sh | ||
| # Run a specific Java sample for the Azure AI Content Understanding SDK. | ||
| # Compiles the samples module (if needed) and runs the specified sample class | ||
| # using mvn exec:java. | ||
| # | ||
| # Usage: | ||
| # run_sample.sh <SampleClassName> [--env <env-file>] [--dry-run] | ||
| # Examples: | ||
| # run_sample.sh Sample02_AnalyzeUrl | ||
| # run_sample.sh Sample02_AnalyzeUrlAsync | ||
| # run_sample.sh Sample02_AnalyzeUrl --env .env | ||
| # run_sample.sh --list | ||
|
|
||
| SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" | ||
| # Package root is 4 levels up from scripts: .github/skills/cu-sdk-sample-run/scripts -> package root | ||
| PACKAGE_ROOT="$(cd "$SCRIPT_DIR/../../../.." && pwd)" | ||
| SAMPLES_DIR="$PACKAGE_ROOT/src/samples/java/com/azure/ai/contentunderstanding/samples" | ||
| PACKAGE="com.azure.ai.contentunderstanding.samples" | ||
|
|
||
| # Colors for output | ||
| RED='\033[0;31m' | ||
| GREEN='\033[0;32m' | ||
| YELLOW='\033[1;33m' | ||
| BLUE='\033[0;34m' | ||
| NC='\033[0m' # No Color | ||
|
|
||
| # Defaults | ||
| DRY_RUN=0 | ||
| ENV_FILE="" | ||
| SAMPLE_NAME="" | ||
|
|
||
| print_info() { echo -e "${BLUE}$1${NC}"; } | ||
| print_success() { echo -e "${GREEN}$1${NC}"; } | ||
| print_warning() { echo -e "${YELLOW}$1${NC}"; } | ||
| print_error() { echo -e "${RED}$1${NC}"; } | ||
|
|
||
| print_help() { | ||
| cat <<EOF | ||
| Usage: $(basename "$0") <SampleClassName> [OPTIONS] | ||
|
|
||
| Run a specific Java sample for the Azure AI Content Understanding SDK. | ||
|
|
||
| Arguments: | ||
| <SampleClassName> Sample class name (e.g., Sample02_AnalyzeUrl). | ||
| The .java extension is optional. | ||
|
|
||
| Options: | ||
| --env <file> Load environment variables from the given .env file before running. | ||
| --dry-run Print what would be executed without running. | ||
| --list, -l List available samples and exit. | ||
| --help, -h Show this help message. | ||
|
|
||
| Examples: | ||
| $(basename "$0") Sample02_AnalyzeUrl | ||
| $(basename "$0") Sample02_AnalyzeUrlAsync | ||
| $(basename "$0") Sample02_AnalyzeUrl --env .env | ||
| $(basename "$0") --list | ||
| EOF | ||
| } | ||
|
|
||
| list_samples() { | ||
| echo "" | ||
| print_info "=== Available Sync Samples ===" | ||
| for f in "$SAMPLES_DIR"/Sample*.java; do | ||
| [ -f "$f" ] || continue | ||
| local name | ||
| name="$(basename "$f" .java)" | ||
| # Skip async samples in this section | ||
| [[ "$name" == *Async ]] && continue | ||
| echo " $name" | ||
| done | ||
| echo "" | ||
| print_info "=== Available Async Samples ===" | ||
| for f in "$SAMPLES_DIR"/Sample*Async.java; do | ||
| [ -f "$f" ] || continue | ||
| local name | ||
| name="$(basename "$f" .java)" | ||
| echo " $name" | ||
| done | ||
| echo "" | ||
| } | ||
|
|
||
| # Load environment variables from a .env file. | ||
| # | ||
| # Only simple NAME=VALUE assignments are accepted (with an optional leading | ||
| # `export `). Names must be valid shell identifiers ([A-Za-z_][A-Za-z0-9_]*). | ||
| # A single matching pair of surrounding double or single quotes is stripped | ||
| # from the value. Anything else is skipped with a warning. We deliberately | ||
| # avoid `eval` so a malicious or malformed .env file cannot execute arbitrary | ||
| # commands or trigger command substitution. | ||
| load_env_file() { | ||
| local envfile="$1" | ||
| if [[ ! -f "$envfile" ]]; then | ||
| print_error "Error: .env file not found: $envfile" | ||
| exit 1 | ||
| fi | ||
| print_info "Loading environment variables from: $envfile" | ||
| local line name value lineno=0 | ||
| while IFS= read -r line || [[ -n "$line" ]]; do | ||
| lineno=$((lineno + 1)) | ||
| # Skip blank (including whitespace-only) lines and comments | ||
| [[ "$line" =~ ^[[:space:]]*$ || "$line" =~ ^[[:space:]]*# ]] && continue | ||
| # Strip optional leading `export ` (with surrounding whitespace) | ||
| line="${line#"${line%%[![:space:]]*}"}" # strip leading whitespace | ||
| line="${line#export }" | ||
| # Require NAME=VALUE with a valid identifier on the left | ||
| if [[ ! "$line" =~ ^([A-Za-z_][A-Za-z0-9_]*)=(.*)$ ]]; then | ||
| print_warning " Skipping line $lineno (not a NAME=VALUE assignment)" | ||
| continue | ||
| fi | ||
| name="${BASH_REMATCH[1]}" | ||
| value="${BASH_REMATCH[2]}" | ||
| # Strip a single matching pair of surrounding double or single quotes | ||
| if [[ "$value" =~ ^\"(.*)\"$ ]]; then | ||
| value="${BASH_REMATCH[1]}" | ||
| elif [[ "$value" =~ ^\'(.*)\'$ ]]; then | ||
| value="${BASH_REMATCH[1]}" | ||
| fi | ||
| export "$name=$value" | ||
| done < "$envfile" | ||
| print_success "✓ Environment variables loaded" | ||
| } | ||
|
|
||
| # Parse arguments | ||
| while [[ $# -gt 0 ]]; do | ||
| case "$1" in | ||
| --help|-h) | ||
| print_help | ||
| exit 0 | ||
| ;; | ||
| --list|-l) | ||
| list_samples | ||
| exit 0 | ||
| ;; | ||
| --dry-run) | ||
| DRY_RUN=1 | ||
| shift | ||
| ;; | ||
| --env) | ||
| if [[ -z "${2:-}" ]]; then | ||
| print_error "Error: --env requires a file path argument" | ||
| exit 1 | ||
| fi | ||
| ENV_FILE="$2" | ||
| shift 2 | ||
| ;; | ||
| -*) | ||
| print_error "Unknown option: $1" | ||
| print_help | ||
| exit 1 | ||
| ;; | ||
| *) | ||
| if [[ -z "$SAMPLE_NAME" ]]; then | ||
| SAMPLE_NAME="$1" | ||
| else | ||
| print_error "Error: Multiple samples specified. Only one sample is supported." | ||
| exit 1 | ||
| fi | ||
| shift | ||
| ;; | ||
| esac | ||
| done | ||
|
|
||
| if [[ -z "$SAMPLE_NAME" ]]; then | ||
| print_error "Error: No sample name provided" | ||
| echo "" | ||
| print_help | ||
| exit 1 | ||
| fi | ||
|
|
||
| # Normalize: strip .java extension if provided | ||
| SAMPLE_NAME="${SAMPLE_NAME%.java}" | ||
|
|
||
| # Verify the sample file exists | ||
| SAMPLE_FILE="$SAMPLES_DIR/${SAMPLE_NAME}.java" | ||
| if [[ ! -f "$SAMPLE_FILE" ]]; then | ||
| print_error "Error: Sample not found: $SAMPLE_FILE" | ||
| echo "" | ||
| echo "Did you mean one of these?" | ||
| ls "$SAMPLES_DIR"/Sample*.java 2>/dev/null | xargs -n1 basename | sed 's/\.java$//' | grep -i "${SAMPLE_NAME}" | head -5 || true | ||
| echo "" | ||
| echo "Run '$(basename "$0") --list' to see all available samples" | ||
| exit 1 | ||
| fi | ||
|
|
||
| FULL_CLASS="${PACKAGE}.${SAMPLE_NAME}" | ||
|
|
||
| echo "" | ||
| print_info "=== Run Java Sample ===" | ||
| echo "Package root: $PACKAGE_ROOT" | ||
| echo "Sample class: $FULL_CLASS" | ||
| echo "Sample file: $SAMPLE_FILE" | ||
| echo "" | ||
|
|
||
| # Navigate to package root | ||
| cd "$PACKAGE_ROOT" | ||
|
|
||
| # Load .env file if specified | ||
| if [[ -n "$ENV_FILE" ]]; then | ||
| # Resolve a relative path against the package root (the script has already cd'd there) | ||
| if [[ "$ENV_FILE" != /* ]]; then | ||
| ENV_FILE="$PACKAGE_ROOT/$ENV_FILE" | ||
| fi | ||
| load_env_file "$ENV_FILE" | ||
| echo "" | ||
| fi | ||
|
|
||
| # Check for required environment variable | ||
| if [[ -z "${CONTENTUNDERSTANDING_ENDPOINT:-}" ]]; then | ||
| print_warning "⚠ CONTENTUNDERSTANDING_ENDPOINT is not set. Most samples will fail without it." | ||
| echo " Set it with: export CONTENTUNDERSTANDING_ENDPOINT=\"https://your-foundry.services.ai.azure.com/\"" | ||
| echo " Or use: $(basename "$0") $SAMPLE_NAME --env .env" | ||
| echo "" | ||
| fi | ||
|
|
||
| # Sample16 demo-mode banner: warn if the user is about to run the labeled-data | ||
| # sample without configuring either Option A (SAS URL) or Option B (storage | ||
| # account + container) — the sample will still run but skip the labeled-data | ||
| # code path. | ||
| if [[ "$SAMPLE_NAME" == Sample16* ]]; then | ||
| if [[ -z "${CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL:-}" ]]; then | ||
| if [[ -z "${CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT:-}" \ | ||
| || -z "${CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER:-}" ]]; then | ||
| print_warning "⚠ DEMO MODE: no training data configured for $SAMPLE_NAME." | ||
| echo " The analyzer will be created without labeled data ('Knowledge sources: 0')." | ||
| echo " To exercise the labeled-data API path, configure ONE of:" | ||
| echo " Option A: CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL=<container SAS URL>" | ||
| echo " Option B: CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT=<account>" | ||
| echo " CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER=<container>" | ||
| echo " then reload your environment (set -a && source .env && set +a) and re-run, or pass --env .env" | ||
| echo "" | ||
| fi | ||
| fi | ||
| fi | ||
|
|
||
| # Build command. Sample classes live under src/samples/java and are compiled | ||
| # as test sources, so we must run test-compile before exec:java; otherwise on | ||
| # a clean checkout the sample class will not exist on the classpath. | ||
| MVN_CMD="mvn -DskipTests test-compile exec:java -Dexec.mainClass=\"${FULL_CLASS}\" -Dexec.classpathScope=test" | ||
|
|
||
| if [[ $DRY_RUN -eq 1 ]]; then | ||
| echo "DRY RUN: would execute:" | ||
| echo " cd $PACKAGE_ROOT" | ||
| [[ -n "${ENV_FILE:-}" ]] && echo " (env loaded from $ENV_FILE)" | ||
| echo " $MVN_CMD" | ||
| exit 0 | ||
| fi | ||
|
|
||
| # Run the sample | ||
| print_info "Running: $SAMPLE_NAME" | ||
| echo "" | ||
| mvn -DskipTests test-compile exec:java -Dexec.mainClass="${FULL_CLASS}" -Dexec.classpathScope=test | ||
|
|
||
| echo "" | ||
| print_success "✓ Sample completed: $SAMPLE_NAME" | ||
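The comment block above `load_env_file` in `run_sample.sh` describes the parsing rules in prose: only `NAME=VALUE` lines are accepted, an optional `export ` prefix is dropped, one matching pair of surrounding quotes is stripped, and nothing is ever passed to `eval`. The sketch below isolates those rules in a small helper so they can be tried on individual lines. The helper name `parse_env_line` and the sample inputs are hypothetical and are not part of the PR; the parsing expressions themselves are copied from the script.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in the PR): applies the same rules as load_env_file
# to a single line. Prints NAME=VALUE if the line is accepted, returns 1 otherwise.
parse_env_line() {
    local line="$1"
    # Strip leading whitespace, then an optional `export ` prefix
    line="${line#"${line%%[![:space:]]*}"}"
    line="${line#export }"
    # Accept only NAME=VALUE with a valid shell identifier on the left
    [[ "$line" =~ ^([A-Za-z_][A-Za-z0-9_]*)=(.*)$ ]] || return 1
    local name="${BASH_REMATCH[1]}" value="${BASH_REMATCH[2]}"
    # Strip a single matching pair of surrounding double or single quotes
    if [[ "$value" =~ ^\"(.*)\"$ ]]; then
        value="${BASH_REMATCH[1]}"
    elif [[ "$value" =~ ^\'(.*)\'$ ]]; then
        value="${BASH_REMATCH[1]}"
    fi
    # The value is printed verbatim; it is never evaluated by the shell
    printf '%s=%s\n' "$name" "$value"
}

# Quoted value with an export prefix: quotes and prefix are removed
parse_env_line 'export CONTENTUNDERSTANDING_ENDPOINT="https://example.invalid/"'
# Command substitution stays literal text because nothing is eval'd
parse_env_line 'GREETING=$(echo pwned)'
# Anything that is not an assignment is rejected
parse_env_line 'rm -rf /tmp/x' || echo "skipped"
```

Run as written, this should print the endpoint assignment without quotes, then `GREETING=$(echo pwned)` unchanged, then `skipped`. The second line is the point of avoiding `eval` and `source`: a value that looks like a command substitution is stored as plain text rather than executed.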