Add Sample_Advanced_ContentSource for document grounding sources#48643
Open
changjian-wang wants to merge 9 commits into main
Conversation
Demonstrates ContentSource grounding from document analysis:
- Part 1: DocumentSource from analysis (page, polygon, boundingBox)
- Part 2: ContentSource.parseAll() round-trip, DocumentSource.parse() typed method, and D(page) page-only format

Also fixes DocumentSource to support the D(page) format (1 param) and variable polygon point counts (>=3 pairs).
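To make the format rules in the description concrete, here is a minimal, SDK-independent sketch. It assumes a segment looks like `D(page)` or `D(page,x1,y1,...,xn,yn)` with at least 3 coordinate pairs, and derives an axis-aligned bounding box from the polygon. The class `DocSourceSketch` and its methods are hypothetical stand-ins for illustration, not the SDK's `DocumentSource`, whose actual behavior may differ.

```java
import java.util.Arrays;

/** Hypothetical stand-in for illustration only; not the SDK's DocumentSource. */
final class DocSourceSketch {
    final int page;
    final double[] polygon; // x1,y1,x2,y2,...; empty for the page-only form

    private DocSourceSketch(int page, double[] polygon) {
        this.page = page;
        this.polygon = polygon;
    }

    /** Parses one segment of the assumed shape D(page) or D(page,x1,y1,...,xn,yn). */
    static DocSourceSketch parse(String segment) {
        String s = segment.trim();
        if (!s.startsWith("D(") || !s.endsWith(")")) {
            throw new IllegalArgumentException("Not a document source: " + segment);
        }
        String[] parts = s.substring(2, s.length() - 1).split(",");
        int page = Integer.parseInt(parts[0].trim());
        int coords = parts.length - 1;
        // Accept either no coordinates (page-only) or at least 3 (x, y) pairs.
        if (coords != 0 && (coords % 2 != 0 || coords < 6)) {
            throw new IllegalArgumentException("Expected 0 or >= 3 coordinate pairs: " + segment);
        }
        double[] polygon = new double[coords];
        for (int i = 0; i < coords; i++) {
            polygon[i] = Double.parseDouble(parts[i + 1].trim());
        }
        return new DocSourceSketch(page, polygon);
    }

    /** Returns {minX, minY, width, height}, or null for the page-only form. */
    double[] boundingBox() {
        if (polygon.length == 0) {
            return null;
        }
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < polygon.length; i += 2) {
            minX = Math.min(minX, polygon[i]);
            maxX = Math.max(maxX, polygon[i]);
            minY = Math.min(minY, polygon[i + 1]);
            maxY = Math.max(maxY, polygon[i + 1]);
        }
        return new double[] {minX, minY, maxX - minX, maxY - minY};
    }

    public static void main(String[] args) {
        DocSourceSketch quad = parse("D(1,1.0,2.0,4.0,2.0,4.0,3.0,1.0,3.0)");
        System.out.println("page " + quad.page + ", box " + Arrays.toString(quad.boundingBox()));
        System.out.println("page-only box: " + Arrays.toString(parse("D(2)").boundingBox()));
    }
}
```

Deriving the box from the minimum and maximum of each axis works for any point count, which is why the sketch does not special-case quadrilaterals.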
…rceAsync to enhance multi-segment parsing examples and update documentation comments
…ng box and source format
…andling; remove page-only format examples from samples and tests
…nates; update documentation for clarity on expected format and return values.
yungshinlintw
approved these changes
May 7, 2026
Pull request overview
This PR adds new “advanced” samples (sync + async) and corresponding tests for the Content Understanding Java SDK to demonstrate and validate field grounding sources via ContentSource/DocumentSource (including wire-format round-tripping and coordinate-derived bounding boxes).
Changes:
- Added new sync and async samples demonstrating how to read ContentField#getSources() and parse/round-trip source wire strings.
- Added new sync and async sample tests that run analysis against a public invoice PDF and validate DocumentSource grounding behavior.
- Updated the assets.json tag for the library's asset metadata.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample_Advanced_ContentSourceTest.java | New test validating DocumentSource grounding and ContentSource wire-format round-trip (sync). |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample_Advanced_ContentSourceAsyncTest.java | New test validating the same grounding + parsing behaviors using the async client. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample_Advanced_ContentSourceAsync.java | New async sample demonstrating ContentSource usage and parsing patterns. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample_Advanced_ContentSource.java | New sync sample demonstrating ContentSource usage and parsing patterns. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/assets.json | Updated assets tag reference. |
    SyncPoller<ContentAnalyzerAnalyzeOperationStatus, AnalysisResult> operation
        = contentUnderstandingClient.beginAnalyze("prebuilt-invoice", Arrays.asList(input));

    AnalysisResult result = operation.getFinalResult();
Comment on lines +111 to +122
    ContentField multiSourceField = documentContent.getFields()
        .values()
        .stream()
        .filter(f -> f.getSources() != null && f.getSources().size() > 1)
        .findFirst()
        .orElseThrow(() -> new AssertionError("No field with multiple sources found"));
    String multiWireFormat = ContentSource.toRawString(multiSourceField.getSources());
    System.out.println("Multi-segment wire format: " + multiWireFormat);

    List<DocumentSource> docSources = DocumentSource.parse(multiWireFormat);
    assertEquals(multiSourceField.getSources().size(), docSources.size(),
        "DocumentSource.parse() count should match original source count");
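The round-trip property this test checks (serialize several sources to one string, parse it back, and get the same number of segments) can be illustrated without the SDK. The sketch below assumes segments are joined with `;`, which is an assumption about the wire format rather than something this PR states. `WireRoundTripSketch`, `parseAll`, and `toRawString` are hypothetical helpers that only mirror the names used in the PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers for illustration; not the SDK's ContentSource API. */
final class WireRoundTripSketch {

    /** Splits a multi-segment string on ';' (assumed separator), skipping empty pieces. */
    static List<String> parseAll(String wire) {
        List<String> segments = new ArrayList<>();
        for (String piece : wire.split(";")) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                segments.add(trimmed);
            }
        }
        return segments;
    }

    /** Joins segments back into a single string using the same assumed separator. */
    static String toRawString(List<String> segments) {
        return String.join(";", segments);
    }

    public static void main(String[] args) {
        String wire = "D(1,0.5,1.0,3.5,1.0,3.5,1.4,0.5,1.4);D(2,0.5,0.8,2.0,0.8,2.0,1.1,0.5,1.1)";
        List<String> segments = parseAll(wire);
        System.out.println("segments: " + segments.size());
        System.out.println("round-trip equal: " + toRawString(segments).equals(wire));
    }
}
```

Comparing the re-serialized string with the original, in addition to the segment count, would catch a parser that keeps the count right but alters coordinates.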
            new RuntimeException("Polling completed unsuccessfully with status: " + pollResponse.getStatus()));
        }
    }).block();
Comment on lines +125 to +133
    // --- DocumentSource.parse() — typed method for multi-segment ---
    ContentField multiSourceField = documentContent.getFields()
        .values()
        .stream()
        .filter(f -> f.getSources() != null && f.getSources().size() > 1)
        .findFirst()
        .orElseThrow(() -> new AssertionError("No field with multiple sources found"));
    String multiWireFormat = ContentSource.toRawString(multiSourceField.getSources());
    System.out.println("Multi-segment wire format: " + multiWireFormat);
Comment on lines +67 to +89
        operation.last()
            .flatMap(pollResponse -> {
                if (pollResponse.getStatus().isComplete()) {
                    return pollResponse.getFinalResult();
                } else {
                    return Mono.error(new RuntimeException(
                        "Polling completed unsuccessfully with status: " + pollResponse.getStatus()));
                }
            })
            .doOnNext(result -> {
                DocumentContent documentContent = (DocumentContent) result.getContents().get(0);

                // Part 1: Document ContentSource from analysis
                documentContentSourceFromAnalysis(documentContent);

                // Part 2: DocumentSource.parse() and ContentSource.parseAll() round-trip
                contentSourceParseRoundTrip(documentContent);
            })
            .doFinally(signal -> latch.countDown())
            .subscribe();

        latch.await(5, TimeUnit.MINUTES);
    }
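In the snippet above, the boolean returned by `latch.await(...)` is discarded, and an exception raised inside the asynchronous callback is not rethrown on the waiting thread. The sketch below shows one possible way to surface both a timeout and a failure to the caller. It uses a plain thread from the standard library in place of Reactor, so it illustrates the latch pattern only and is not the PR's code. `LatchWithErrorCapture` and `runAndAwait` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper: waits for background work and surfaces its failure or a timeout. */
final class LatchWithErrorCapture {

    static void runAndAwait(Runnable work, long timeout, TimeUnit unit) {
        CountDownLatch latch = new CountDownLatch(1);
        AtomicReference<Throwable> failure = new AtomicReference<>();

        Thread worker = new Thread(() -> {
            try {
                work.run();
            } catch (Throwable t) {
                failure.set(t);      // remember the failure for the waiting thread
            } finally {
                latch.countDown();   // always release the waiter, success or failure
            }
        });
        worker.setDaemon(true);
        worker.start();

        try {
            // await returns false when the timeout elapses before countDown().
            if (!latch.await(timeout, unit)) {
                throw new AssertionError("Timed out waiting for async work");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new AssertionError("Interrupted while waiting", e);
        }

        Throwable t = failure.get();
        if (t != null) {
            throw new AssertionError("Async work failed", t);
        }
    }

    public static void main(String[] args) {
        runAndAwait(() -> System.out.println("work finished"), 5, TimeUnit.SECONDS);
    }
}
```

With Reactor, a comparable effect could likely be had by passing an error consumer to `subscribe` that stores the throwable, or by blocking on the `Mono` with a timeout as the sync-style test does.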
Comment on lines 2 to +5
      "AssetsRepo": "Azure/azure-sdk-assets",
      "AssetsRepoPrefixPath": "java",
      "TagPrefix": "java/contentunderstanding/azure-ai-contentunderstanding",
    - "Tag": "java/contentunderstanding/azure-ai-contentunderstanding_670ad2966f"
    + "Tag": "java/contentunderstanding/azure-ai-contentunderstanding_940a862f7e"
Description
Please add an informative description that covers the changes made by the pull request and link all relevant issues.
If an SDK is being regenerated based on a new swagger spec, a link to the pull request containing these swagger spec changes has been included above.
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines