Add Sample_Advanced_ContentSource for document grounding sources#48643

Open
changjian-wang wants to merge 9 commits into main from changjian-wang/sample-advanced-contentsource-grouding

Member

@changjian-wang commented Mar 31, 2026

Description

Please add an informative description that covers the changes made by the pull request and link all relevant issues.

If an SDK is being regenerated based on a new swagger spec, a link to the pull request containing these swagger spec changes has been included above.

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

Changjian Wang and others added 8 commits March 31, 2026 15:21
Demonstrates ContentSource grounding from document analysis:
- Part 1: DocumentSource from analysis (page, polygon, boundingBox)
- Part 2: ContentSource.parseAll() round-trip, DocumentSource.parse()
  typed method, and D(page) page-only format

Also fixes DocumentSource to support D(page) format (1 param) and
variable polygon point counts (>=3 pairs).
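
The polygon format mentioned in this commit message can be illustrated with a minimal, JDK-only sketch. It assumes a segment looks like `D(page,x1,y1,...,xn,yn)` with at least three coordinate pairs, and derives an axis-aligned bounding box from the points. `DocSourceSketch`, `parseSegment`, and `boundingBox` are hypothetical names for illustration only; this is not the SDK's `DocumentSource` implementation, and the page-only `D(page)` form is deliberately left out.

```java
/** Hypothetical stand-in for a parsed document source; NOT the SDK's DocumentSource. */
final class DocSourceSketch {
    final int page;
    final double[] coords; // x1, y1, x2, y2, ... (at least 3 pairs)

    DocSourceSketch(int page, double[] coords) {
        this.page = page;
        this.coords = coords;
    }

    /** Parses one segment of the ASSUMED form D(page,x1,y1,...,xn,yn) with n >= 3. */
    static DocSourceSketch parseSegment(String segment) {
        String s = segment.trim();
        if (!s.startsWith("D(") || !s.endsWith(")")) {
            throw new IllegalArgumentException("Not a document source: " + segment);
        }
        String[] parts = s.substring(2, s.length() - 1).split(",");
        int coordCount = parts.length - 1; // everything after the page number
        if (coordCount < 6 || coordCount % 2 != 0) {
            throw new IllegalArgumentException("Expected page plus >= 3 x,y pairs: " + segment);
        }
        double[] coords = new double[coordCount];
        for (int i = 0; i < coordCount; i++) {
            coords[i] = Double.parseDouble(parts[i + 1].trim());
        }
        return new DocSourceSketch(Integer.parseInt(parts[0].trim()), coords);
    }

    /** Returns the axis-aligned bounding box as {minX, minY, width, height}. */
    double[] boundingBox() {
        double minX = Double.POSITIVE_INFINITY;
        double minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY;
        double maxY = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < coords.length; i += 2) {
            minX = Math.min(minX, coords[i]);
            maxX = Math.max(maxX, coords[i]);
            minY = Math.min(minY, coords[i + 1]);
            maxY = Math.max(maxY, coords[i + 1]);
        }
        return new double[] {minX, minY, maxX - minX, maxY - minY};
    }

    public static void main(String[] args) {
        // Illustrative input only; the coordinates are made up.
        DocSourceSketch src = parseSegment("D(1,1.0,2.0,4.0,2.0,4.0,3.0,1.0,3.0)");
        double[] box = src.boundingBox();
        System.out.println("page=" + src.page + " x=" + box[0] + " y=" + box[1]
            + " width=" + box[2] + " height=" + box[3]);
    }
}
```

Accepting any even coordinate count of six or more, rather than exactly eight, is what lets the same code path handle triangles, quadrilaterals, and larger polygons.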
…rceAsync to enhance multi-segment parsing examples and update documentation comments
…andling; remove page-only format examples from samples and tests
…nates; update documentation for clarity on expected format and return values.
@yungshinlintw yungshinlintw marked this pull request as ready for review May 7, 2026 19:01
@yungshinlintw yungshinlintw requested a review from bojunehsu as a code owner May 7, 2026 19:01
Copilot AI review requested due to automatic review settings May 7, 2026 19:01
@yungshinlintw yungshinlintw requested review from a team and chienyuanchang as code owners May 7, 2026 19:01
Contributor

Copilot AI left a comment

Pull request overview

This PR adds new “advanced” samples (sync + async) and corresponding tests for the Content Understanding Java SDK to demonstrate and validate field grounding sources via ContentSource/DocumentSource (including wire-format round-tripping and coordinate-derived bounding boxes).

Changes:

  • Added new sync and async samples demonstrating how to read ContentField#getSources() and parse/round-trip source wire strings.
  • Added new sync and async sample tests that run analysis against a public invoice PDF and validate DocumentSource grounding behavior.
  • Updated assets.json tag for the library’s asset metadata.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample_Advanced_ContentSourceTest.java | New test validating DocumentSource grounding and ContentSource wire-format round-trip (sync). |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample_Advanced_ContentSourceAsyncTest.java | New test validating the same grounding + parsing behaviors using the async client. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample_Advanced_ContentSourceAsync.java | New async sample demonstrating ContentSource usage and parsing patterns. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample_Advanced_ContentSource.java | New sync sample demonstrating ContentSource usage and parsing patterns. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/assets.json | Updated assets tag reference. |

SyncPoller<ContentAnalyzerAnalyzeOperationStatus, AnalysisResult> operation
    = contentUnderstandingClient.beginAnalyze("prebuilt-invoice", Arrays.asList(input));

AnalysisResult result = operation.getFinalResult();
Comment on lines +111 to +122
ContentField multiSourceField = documentContent.getFields()
    .values()
    .stream()
    .filter(f -> f.getSources() != null && f.getSources().size() > 1)
    .findFirst()
    .orElseThrow(() -> new AssertionError("No field with multiple sources found"));
String multiWireFormat = ContentSource.toRawString(multiSourceField.getSources());
System.out.println("Multi-segment wire format: " + multiWireFormat);

List<DocumentSource> docSources = DocumentSource.parse(multiWireFormat);
assertEquals(multiSourceField.getSources().size(), docSources.size(),
    "DocumentSource.parse() count should match original source count");
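
The count check in the snippet above is a round-trip property: serializing several sources into one wire string and parsing that string back should yield the same number of segments. A minimal JDK-only sketch of that property follows. It assumes segments are joined with `;`, which is an assumption and not something this page confirms. `WireRoundTripSketch`, `splitSegments`, and `joinSegments` are hypothetical helpers, not SDK methods.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helpers illustrating a split/join round-trip; NOT part of the SDK. */
final class WireRoundTripSketch {
    // ASSUMPTION: segments in a multi-source wire string are separated by ';'.
    private static final String SEPARATOR = ";";

    /** Splits a wire string into trimmed, non-empty segments. */
    static List<String> splitSegments(String wire) {
        return Arrays.stream(wire.split(SEPARATOR))
            .map(String::trim)
            .filter(s -> !s.isEmpty())
            .collect(Collectors.toList());
    }

    /** Joins segments back into a single wire string. */
    static String joinSegments(List<String> segments) {
        return String.join(SEPARATOR, segments);
    }

    public static void main(String[] args) {
        // Illustrative input only; the coordinates are made up.
        String wire = "D(1,0.5,0.5,2.0,0.5,2.0,1.0,0.5,1.0);D(1,0.5,1.2,2.0,1.2,2.0,1.7,0.5,1.7)";
        List<String> segments = splitSegments(wire);
        String rebuilt = joinSegments(segments);
        System.out.println("segments: " + segments.size());
        System.out.println("round-trip equal: " + rebuilt.equals(wire));
    }
}
```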
            new RuntimeException("Polling completed unsuccessfully with status: " + pollResponse.getStatus()));
    }
}).block();

Comment on lines +125 to +133
// --- DocumentSource.parse() — typed method for multi-segment ---
ContentField multiSourceField = documentContent.getFields()
    .values()
    .stream()
    .filter(f -> f.getSources() != null && f.getSources().size() > 1)
    .findFirst()
    .orElseThrow(() -> new AssertionError("No field with multiple sources found"));
String multiWireFormat = ContentSource.toRawString(multiSourceField.getSources());
System.out.println("Multi-segment wire format: " + multiWireFormat);
Comment on lines +67 to +89
operation.last()
    .flatMap(pollResponse -> {
        if (pollResponse.getStatus().isComplete()) {
            return pollResponse.getFinalResult();
        } else {
            return Mono.error(new RuntimeException(
                "Polling completed unsuccessfully with status: " + pollResponse.getStatus()));
        }
    })
    .doOnNext(result -> {
        DocumentContent documentContent = (DocumentContent) result.getContents().get(0);

        // Part 1: Document ContentSource from analysis
        documentContentSourceFromAnalysis(documentContent);

        // Part 2: DocumentSource.parse() and ContentSource.parseAll() round-trip
        contentSourceParseRoundTrip(documentContent);
    })
    .doFinally(signal -> latch.countDown())
    .subscribe();

latch.await(5, TimeUnit.MINUTES);
}
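
In the snippet above, the boolean returned by `latch.await(...)` is discarded and `subscribe()` is called without an error consumer, so a timeout or a failure inside the pipeline would not stop the calling thread. The sketch below shows one way such a wait could surface both cases. It uses only the JDK, with `CompletableFuture` standing in for the Reactor pipeline. `LatchAwaitSketch` and `awaitResult` are hypothetical names, and this is not the change proposed or adopted in the PR.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper; CompletableFuture stands in for the async pipeline. */
final class LatchAwaitSketch {
    /** Waits for an async task and fails loudly on timeout, interruption, or task error. */
    static <T> T awaitResult(CompletableFuture<T> task, long timeout, TimeUnit unit) {
        CountDownLatch latch = new CountDownLatch(1);
        AtomicReference<T> value = new AtomicReference<>();
        AtomicReference<Throwable> failure = new AtomicReference<>();

        // Record either the result or the error, then release the waiting thread.
        task.whenComplete((result, error) -> {
            if (error != null) {
                failure.set(error);
            } else {
                value.set(result);
            }
            latch.countDown();
        });

        try {
            // await returns false if the timeout elapsed before countDown().
            if (!latch.await(timeout, unit)) {
                throw new IllegalStateException("Timed out waiting for the async operation");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while waiting", e);
        }
        if (failure.get() != null) {
            throw new IllegalStateException("Async operation failed", failure.get());
        }
        return value.get();
    }

    public static void main(String[] args) {
        String result = awaitResult(CompletableFuture.supplyAsync(() -> "done"), 5, TimeUnit.SECONDS);
        System.out.println(result);
    }
}
```

In Reactor itself, passing an error consumer to `subscribe` or using `block` with a timeout would be the closer analogues; the sketch only illustrates the control flow.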
Comment on lines 2 to +5
  "AssetsRepo": "Azure/azure-sdk-assets",
  "AssetsRepoPrefixPath": "java",
  "TagPrefix": "java/contentunderstanding/azure-ai-contentunderstanding",
- "Tag": "java/contentunderstanding/azure-ai-contentunderstanding_670ad2966f"
+ "Tag": "java/contentunderstanding/azure-ai-contentunderstanding_940a862f7e"

4 participants