Content Understanding GA SDK for Java #47952
Open
changjian-wang wants to merge 188 commits into main from …
…', API Version: 2025-11-01, SDK Release Type: beta, and CommitSHA: 'd0cd556bd91d2dda700e983c0d253fa025b324c0' in SpecRepo: 'https://github.com/Azure/azure-rest-api-specs' Pipeline run: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=5634426 Refer to https://eng.ms/docs/products/azure-developer-experience/develop/sdk-release/sdk-release-prerequisites to prepare for SDK release.
…entunderstanding-5634426
…nts in Content Understanding SDK
… from .NET SDK

- Sample00_ConfigureDefaults: Demonstrates configuration management (get/update defaults)
- Sample01_AnalyzeBinary: Binary PDF analysis from local file
- Sample02_AnalyzeUrl: Analyze documents from URL
- Sample03_AnalyzeInvoice: Extract structured invoice fields with nested objects and arrays
- Sample04_CreateAnalyzer: Create and use custom analyzer with field schema (Extract/Generate/Classify methods)

Key features:
- All samples use DefaultAzureCredentialBuilder for authentication
- Environment variable based configuration (ENDPOINT)
- Comprehensive JUnit 5 tests with assertions
- GitHub public URLs for test data
- Proper field access patterns with type casting (ContentField, StringField, NumberField, ObjectField, ArrayField)
- All tests passing (6/6 = 100% success rate)

Technical implementation:
- Fixed API differences from C# SDK (ContentSpan, ContentField, 5-parameter beginAnalyze)
- Proper null checking and type casting for all field access
- Detailed validation assertions for all document properties
- Clean resource management with @AfterEach cleanup

Module-info.java formatting cleanup included.
- Sample05_CreateClassifier: Create classifier analyzer with multiple classification fields (document_type, industry, urgency)
- Sample06_GetAnalyzer: Get analyzer information including configuration and field schema

Key features:
- Sample05: Demonstrates classification-only analyzer with 3 classifiers
- Sample06: Shows how to retrieve and inspect analyzer properties including prebuilt analyzers
- Fixed API usage: getAnalyzerId(), getCreatedAt(), getLastModifiedAt() instead of getId(), getCreatedDateTime(), getUpdatedDateTime()
- Comprehensive field schema inspection with all 31 prebuilt-invoice fields
- All tests passing with real Azure service
- Sample07_ListAnalyzers: List and filter all available analyzers (prebuilt and custom)
  * testListAnalyzersAsync: Lists all 134 analyzers (87 prebuilt, 47 custom)
  * testListReadyAnalyzersAsync: Filters for ready analyzers only
- Sample08_UpdateAnalyzer: Update existing analyzer properties
  * Demonstrates updating description, configuration, and field schema
  * Uses @BeforeEach to create test analyzer and @AfterEach for cleanup
  * Shows how to add new fields while preserving existing ones

All tests passing with real Azure service
Fixed Sample08_UpdateAnalyzer to avoid 409 conflict error:
- Delete existing analyzer before recreating with updated configuration
- Added note about using updateAnalyzerWithResponse for atomic updates in production
- All 12 tests now passing (Sample00-08 with multiple test methods)

Test results: 12/12 passed (100% success rate)
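The delete-before-recreate pattern from the commit above can be illustrated with a minimal sketch. It uses a hypothetical in-memory `FakeAnalyzerStore` in place of the real service client, so every class and method name here is an assumption made for illustration and none of it is the SDK's API.

```java
import java.util.HashMap;
import java.util.Map;

public class RecreateAnalyzerSketch {

    /** Hypothetical in-memory stand-in for the service; not the SDK client. */
    static class FakeAnalyzerStore {
        private final Map<String, String> analyzers = new HashMap<>();

        void create(String id, String description) {
            if (analyzers.containsKey(id)) {
                // Mirrors the 409 Conflict described in the commit message.
                throw new IllegalStateException("409 Conflict: analyzer already exists: " + id);
            }
            analyzers.put(id, description);
        }

        void delete(String id) {
            analyzers.remove(id);
        }

        boolean exists(String id) {
            return analyzers.containsKey(id);
        }

        String describe(String id) {
            return analyzers.get(id);
        }
    }

    /** Deletes any existing analyzer with the same ID, then creates it again. */
    static void recreate(FakeAnalyzerStore store, String id, String description) {
        if (store.exists(id)) {
            store.delete(id);
        }
        store.create(id, description);
    }

    public static void main(String[] args) {
        FakeAnalyzerStore store = new FakeAnalyzerStore();
        store.create("my_analyzer", "initial");
        recreate(store, "my_analyzer", "updated");
        System.out.println(store.describe("my_analyzer")); // prints "updated"
    }
}
```

Delete followed by create is not atomic, which is presumably why the commit adds a note recommending updateAnalyzerWithResponse for production use.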
…eAnalyzerWithLabels
…iables and Improve Test Patterns

- Updated environment variable names from "ENDPOINT" and "CONTENTUNDERSTANDING_API_KEY" to "CONTENTUNDERSTANDING_ENDPOINT" and "AZURE_CONTENT_UNDERSTANDING_KEY" across multiple sample test files.
- Modified sample tests to load local files instead of using publicly accessible URLs for document analysis.
- Enhanced assertions and logging for better clarity and debugging.
- Improved API usage patterns in tests for creating, copying, and deleting analyzers, including async patterns.
- Added model mappings for analyzers in relevant samples to demonstrate configuration capabilities.
…e validation of source and copied analyzers
… Azure Credential Authentication

- Updated Sample03_AnalyzeInvoice, Sample04_CreateAnalyzer, Sample05_CreateClassifier, Sample06_GetAnalyzer, Sample07_ListAnalyzers, Sample08_UpdateAnalyzer, Sample09_DeleteAnalyzer, Sample10_AnalyzeConfigs, Sample11_AnalyzeReturnRawJson, Sample12_GetResultFile, Sample13_DeleteResult, Sample14_CopyAnalyzer, Sample15_GrantCopyAuth, and Sample16_CreateAnalyzerWithLabels to include logic for initializing the Content Understanding client with either an API key or the Default Azure Credential.
- Added assertions to verify client initialization in each sample.
- Improved code readability and maintainability by consolidating client creation logic.
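The "API key or Default Azure Credential" choice described above might look like the following sketch. It only models the decision and has no Azure SDK dependency: the `AuthMode` enum and `chooseAuthMode` helper are hypothetical, and the environment variable name is taken from an earlier commit message in this PR. In the real samples the selected branch would presumably hand an AzureKeyCredential or a DefaultAzureCredential to the client builder, but that wiring is an assumption and is not shown.

```java
import java.util.Collections;
import java.util.Map;

public class AuthModeSketch {

    /** Hypothetical marker for which credential a sample would construct. */
    enum AuthMode { API_KEY, DEFAULT_AZURE_CREDENTIAL }

    /**
     * Picks key-based auth when a non-blank key is configured,
     * otherwise falls back to the default credential chain.
     */
    static AuthMode chooseAuthMode(Map<String, String> env) {
        String key = env.get("AZURE_CONTENT_UNDERSTANDING_KEY");
        boolean hasKey = key != null && !key.trim().isEmpty();
        return hasKey ? AuthMode.API_KEY : AuthMode.DEFAULT_AZURE_CREDENTIAL;
    }

    public static void main(String[] args) {
        // Passing the environment as a map keeps the decision easy to test.
        System.out.println(chooseAuthMode(System.getenv()));
        System.out.println(chooseAuthMode(
            Collections.singletonMap("AZURE_CONTENT_UNDERSTANDING_KEY", "placeholder-key")));
    }
}
```

Consolidating the decision in one helper matches the commit's stated goal of keeping client creation logic in a single place.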
…Builder Initialization
- Sample12_GetResultFile: Demonstrates how to retrieve keyframe images from video analysis operations.
- Sample13_DeleteResult: Shows how to delete analysis results after they are no longer needed.
- Sample14_CopyAnalyzer: Illustrates how to copy an analyzer within the same resource.
- Sample15_GrantCopyAuth: Demonstrates granting copy authorization for cross-resource analyzer copying.
- Sample16_CreateAnalyzerWithLabels: Shows how to create an analyzer with labeled training data from Azure Blob Storage.
- Delete 13 @Disabled test files (replaced by Sample tests)
- Modify Sample00-Sample16 to extend ContentUnderstandingClientTestBase
- Add testResourceNamer for reproducible random IDs in PLAYBACK mode
- Remove problematic sanitizers (AZSDK2003, AZSDK2030, AZSDK3423, AZSDK3430, AZSDK3493)
- Configure maven-surefire-plugin to include Sample*.java
- Use AZURE_CONTENT_UNDERSTANDING_ENDPOINT env var (matches .NET naming)
Exclude src/samples/.../samples/Sample*.java standalone examples from test execution.
- Fixed URI mismatch issue where recorded URLs had double slashes (//contentunderstanding)
- Updated assets.json to point to new recordings tag (3de1635cfc)
- All 23 tests pass in PLAYBACK mode
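The commit does not say how the double slash arose or how it was fixed. One common cause is an endpoint with a trailing slash being concatenated with a path that has a leading slash. The sketch below shows that cause and a defensive join; the `join` helper and the example host are hypothetical and are not taken from the PR.

```java
public class EndpointJoinSketch {

    /** Joins an endpoint and a path with exactly one slash between them. */
    static String join(String endpoint, String path) {
        String base = endpoint.replaceAll("/+$", ""); // drop trailing slashes
        String rel = path.replaceAll("^/+", "");      // drop leading slashes
        return base + "/" + rel;
    }

    public static void main(String[] args) {
        String endpoint = "https://example.invalid/"; // hypothetical endpoint
        String path = "/contentunderstanding/analyzers";

        // Naive concatenation reproduces the "//contentunderstanding" shape.
        System.out.println(endpoint + path);
        // The normalized join yields a single slash.
        System.out.println(join(endpoint, path));
    }
}
```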
bojunehsu
reviewed
Feb 27, 2026
...ntentunderstanding/src/main/java/com/azure/ai/contentunderstanding/models/ContentSource.java
Outdated
bojunehsu
reviewed
Feb 27, 2026
...ontentunderstanding/src/main/java/com/azure/ai/contentunderstanding/models/ContentField.java
Outdated
bojunehsu
requested changes
Feb 27, 2026
- Changed the visibility of the getSource method in ContentField to package-private and updated its implementation to use the new getSources method.
- Updated ContentSource to rename parseSource to parseAll for clarity and adjusted related documentation.
- Modified sample and test files to reflect the new source handling, demonstrating the use of getSources for typed access to content sources.
- Enhanced sample outputs to provide detailed information about document sources, including page numbers and bounding boxes.
...-ai-contentunderstanding/customization/src/main/java/ContentUnderstandingCustomizations.java
yungshinlintw
requested changes
Feb 28, 2026
c800758 to 2de1786
yungshinlintw
approved these changes
Feb 28, 2026
bojunehsu
approved these changes
Feb 28, 2026
Member
weidongxu-microsoft
left a comment
Typically, lib would be first released as a preview/beta? <-- as long as Arch is fine with 1.0.0
...-ai-contentunderstanding/customization/src/main/java/ContentUnderstandingCustomizations.java
Outdated
...-ai-contentunderstanding/customization/src/main/java/ContentUnderstandingCustomizations.java
github-merge-queue bot pushed a commit to microsoft/typespec that referenced this pull request Feb 28, 2026
for Azure/azure-sdk-for-java#47952 (comment)
…t operation Id

- Removed the operationId field and its associated helper class, simplifying the model.
- Updated PollingUtils and polling strategies to eliminate the need for operationId extraction.
- Adjusted sample and test files to reflect changes in accessing operation ID using the getId() method instead of getOperationId().
- Enhanced documentation to clarify the new approach for retrieving operation IDs.
…tructor and updateDefaults methods

- Made the ContentUnderstandingDefaults constructor public to allow the creation of instances in updateDefaults methods.
- Added convenience methods for updateDefaults that accept typed objects instead of BinaryData, addressing limitations of the Java emitter.
- Updated documentation to clarify the behavior of the new methods and the rationale behind the changes.
...t/java/com/azure/ai/contentunderstanding/tests/samples/Sample00_UpdateDefaultsAsyncTest.java
Outdated
sdk/contentunderstanding/azure-ai-contentunderstanding/cspell.json
Outdated
Member
weidongxu-microsoft
left a comment
One may want to add a README.md in src/samples folder. E.g. https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/storage/azure-storage-blob/src/samples/README.md
This would help publish these samples to Azure Sample Browser.
This is likely not required though.
Replace 'import static org.junit.jupiter.api.Assertions.*' with explicit imports (assertEquals, assertNotNull, assertTrue, etc.) across all 37 test and sample test files per checkstyle requirements. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove Dtest, Dsurefire, dotenv, DAZURE, pytest - none are used in this project. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add src/samples/README.md following the Azure SDK for Java convention to enable publishing samples to the Azure Sample Browser. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR #47952 Review Guide: Content Understanding GA SDK for Java
Executive Summary
This PR introduces the brand new GA release of the azure-ai-contentunderstanding Java SDK. Since this is a new package, all 191 files are additions (+43,291 lines). The package includes:
- ContentUnderstandingCustomizations.java: JavaParser AST transformations applied post-generation. This is the source of truth for all hand-authored classes and code modifications.

Key customizations in ContentUnderstandingCustomizations.java:
- ContentSource, DocumentSource, AudioVisualSource, PointF, RectangleF, Rectangle, ContentRange
- getValue() on all ContentField subclasses
- Duration getters wrapping raw millisecond fields on time-based models

Changes This Week (since Monday Feb 24)
If you've already reviewed an earlier version of this PR, here's what changed this week across 18 commits touching 95+ files. The changes fall into 6 categories:
1. ContentSource hierarchy (new feature)
Added hand-authored ContentSource, DocumentSource, AudioVisualSource, PointF, RectangleF, Rectangle classes with parsing logic. Added getSources() to ContentField (consistent with .NET Sources property).
- customization/.../ContentUnderstandingCustomizations.java: added ContentSource, DocumentSource, AudioVisualSource, PointF, RectangleF, Rectangle classes + getSources() customization
- models/ContentSource.java, DocumentSource.java, AudioVisualSource.java: generated output of the above classes
- models/PointF.java, RectangleF.java, Rectangle.java: generated output of the above geometry types
- models/ContentField.java: added getSources() method
- tests/ContentSourceTest.java: new unit tests

2. Duration property customizations (new feature)
Hid raw *Ms() getters (package-private) on AudioVisualContent, AudioVisualContentSegment, TranscriptPhrase, TranscriptWord. Added Duration-returning getters (getStartTime(), getEndTime(), etc.). Removed getTimeMs() from AudioVisualSource.
- customization/.../ContentUnderstandingCustomizations.java: customizeDurationProperties() method
- models/AudioVisualContent.java: hidden getStartTimeMs/getEndTimeMs/getCameraShotTimesMs/getKeyFrameTimesMs, added Duration getters
- models/AudioVisualContentSegment.java, TranscriptPhrase.java, TranscriptWord.java: same pattern
- models/ContentRange.java: Duration-based factory methods (timeRange, timeRangeFrom)
- tests/DurationCustomizationTest.java, ContentRangeTest.java: new tests

3. Type renames from TypeSpec GA update (bulk rename)
AnalyzeInput → AnalysisInput, AnalyzeResult → AnalysisResult, MediaContent → AnalysisContent, plus Content-prefixed field types. This touched 70 files but is mostly mechanical find-and-replace across models, samples, and tests.
- models/Analysis*.java, Content*Field.java: renamed types
- Sample*.java and Sample*Test.java: updated type references
- tsp-location.yaml: updated TypeSpec commit

4. Property renaming & parameter fixes
- ContentAnalyzerConfig.set*Enabled(): property rename (touched 36 files: config model, all samples using analyzers, all corresponding tests)
- *Request1 bogus parameter name fix: cleaned up AnalyzeRequest1/GrantCopyAuthorizationRequest1 parameter names in client convenience methods

5. README update
- README.md: fixed AnalyzeResult → AnalysisResult (2 occurrences)

Files to focus on if you've already reviewed earlier
- ContentUnderstandingCustomizations.java
- ContentSource.java, DocumentSource.java, AudioVisualSource.java
- ContentField.java (getSources())
- ContentRange.java
- PointF.java, RectangleF.java, Rectangle.java
- ContentSourceTest.java, DurationCustomizationTest.java, ContentRangeTest.java

P0: Must Review (Public API, customizations, README)
- customization/.../ContentUnderstandingCustomizations.java: … models/ instead (see P2 section).
- README.md
- .../ContentUnderstandingClient.java
- .../ContentUnderstandingAsyncClient.java (Mono/Flux return types)
- .../ContentUnderstandingClientBuilder.java
- .../ContentUnderstandingServiceVersion.java (V2025_11_01)
- .../models/ContentField.java
- .../models/AnalysisResult.java
- .../models/ContentAnalyzerConfig.java
- CHANGELOG.md

Total P0: 10 files, ~8,336 lines

P1: Should Review (Key samples, key models, test infra, CI config)
- Sample02_AnalyzeUrl.java
- Sample03_AnalyzeInvoice.java
- Sample04_CreateAnalyzer.java
- Sample16_CreateAnalyzerWithLabels.java
- Sample14_CopyAnalyzer.java
- .../models/ContentAnalyzer.java
- .../models/AnalysisContent.java (DocumentContent/AudioVisualContent)
- .../models/DocumentContent.java
- .../models/AudioVisualContent.java
- .../models/ContentFieldDefinition.java
- .../models/ContentFieldSchema.java
- ci.yml
- test-resources.bicep
- test-resources-post.ps1
- pom.xml (package-level)

Total P1: 15 files, ~4,276 lines
P2: Skim / Low Priority

Hand-authored model classes (7 files, 909 lines)
These classes are written as string constants inside ContentUnderstandingCustomizations.java and emitted as files during code generation. Review the generated output files instead: they are real .java files with full syntax highlighting, which is much easier to read than the escaped string constants in the customization class.

Files in models/:
- ContentSource.java: parse(), toRawString(), abstract hierarchy
- DocumentSource.java
- AudioVisualSource.java
- ContentRange.java: page(), pages(), timeRange(Duration), combine()
- PointF.java
- RectangleF.java
- Rectangle.java

Remaining generated models (68 files, ~11,794 lines)
All other files in models/. Auto-generated from TypeSpec; spot-check a few for correctness, but a full review is low value.

Generated implementation (11 files, 7,619 lines)
ContentUnderstandingClientImpl.java (6,574 lines) is the main REST client. Polling strategies and helpers are also generated. Low review value.

Remaining samples (27 files, ~5,051 lines)
Sync + async pairs for: Sample00_UpdateDefaults, Sample01_AnalyzeBinary, Sample05_CreateClassifier, Sample06_GetAnalyzer, Sample08_UpdateAnalyzer, Sample09_DeleteAnalyzer, Sample10_AnalyzeConfigs, Sample11_AnalyzeReturnRawJson, Sample12_GetResultFile, Sample13_DeleteResult, Sample15_GrantCopyAuth, plus async variants of P1 samples.

Tests (38 files, 8,412 lines)
- ContentRangeTest (185), ContentSourceTest (279), DurationCustomizationTest (146)
- ContentUnderstandingClientTestBase.java (73): Shared test setup

Infrastructure (low-churn files)
- .github/CODEOWNERS
- eng/versioning/version_client.txt
- pom.xml (root)
- pom.xml (service)
- tests.yml
- tsp-location.yaml
- .gitignore
- cspell.json
- assets.json
- customization/pom.xml
- src/main/resources/ (3 files)

Sample resources (binary, not reviewable)
- mixed_financial_docs.pdf, sample_document_features.pdf, sample_invoice.pdf
- receipt_labels/: 2 receipt images + labels JSON + result JSON

Review Tips
- ContentUnderstandingCustomizations.java: this is the most critical file. It defines all hand-authored classes and AST transformations. Understanding it makes the rest of the PR much clearer.
- … tsp-client update.
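
The Duration customization described under "2. Duration property customizations" (package-private raw millisecond getters, public Duration-returning getters) can be pictured with a minimal sketch. The `TimedSegment` class below is a hypothetical stand-in and is not one of the generated models; only the getter naming pattern comes from the guide.

```java
import java.time.Duration;

public class DurationGetterSketch {

    /** Hypothetical time-based model; not a class from the SDK. */
    static final class TimedSegment {
        private final long startTimeMs;
        private final long endTimeMs;

        TimedSegment(long startTimeMs, long endTimeMs) {
            this.startTimeMs = startTimeMs;
            this.endTimeMs = endTimeMs;
        }

        // Raw getters stay package-private so they are hidden from public callers.
        long getStartTimeMs() {
            return startTimeMs;
        }

        long getEndTimeMs() {
            return endTimeMs;
        }

        // Public getters wrap the raw millisecond values in java.time.Duration.
        public Duration getStartTime() {
            return Duration.ofMillis(startTimeMs);
        }

        public Duration getEndTime() {
            return Duration.ofMillis(endTimeMs);
        }
    }

    public static void main(String[] args) {
        TimedSegment segment = new TimedSegment(1500, 4000);
        System.out.println(segment.getStartTime()); // prints PT1.5S
        System.out.println(segment.getEndTime().minus(segment.getStartTime())); // prints PT2.5S
    }
}
```

Returning Duration removes unit ambiguity at the call site, since callers no longer need to know that the underlying field is in milliseconds.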