[ContentUnderstanding] Add LlmInputHelper for converting analysis results to LLM-friendly text #49000
Merged
yungshinlintw merged 15 commits into main on May 2, 2026
Pull request overview
This PR adds a new public helper API in the Content Understanding Java SDK to convert AnalysisResult into LLM-ready text (YAML front matter + markdown), and updates samples/tests/docs to demonstrate and validate the feature.
Changes:
- Introduces LlmInputHelper and ToLlmInputOptions to render analysis results into a prompt-friendly text format.
- Adds unit tests and sample tests covering documents, content ranges, multi-segment audio/video, and classification scenarios.
- Updates samples, README, and changelog to document and demonstrate toLlmInput.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/main/java/com/azure/ai/contentunderstanding/models/LlmInputHelper.java | New helper that renders AnalysisResult into YAML front matter + markdown, including segmentation/classification logic. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/main/java/com/azure/ai/contentunderstanding/models/ToLlmInputOptions.java | Options bag controlling whether fields and/or markdown are included. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/LlmInputHelperTest.java | Unit tests validating YAML/markdown rendering across content types and edge cases. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample_Advanced_ToLlmInputTest.java | New sample test validating advanced sync scenarios (options, content ranges, segments, metadata). |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample_Advanced_ToLlmInputAsyncTest.java | New sample test validating advanced async scenarios (options, content ranges, segments, metadata). |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample01_AnalyzeBinaryTest.java | Extends existing sample test to exercise toLlmInput on document analysis results. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample01_AnalyzeBinaryAsyncTest.java | Extends existing async sample test to exercise toLlmInput. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample03_AnalyzeInvoiceTest.java | Extends existing invoice sample test to exercise toLlmInput for field extraction results. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample03_AnalyzeInvoiceAsyncTest.java | Extends existing async invoice sample test to exercise toLlmInput. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample_Advanced_ToLlmInput.java | New advanced sync sample demonstrating options, content ranges, multi-segment media, and metadata. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample_Advanced_ToLlmInputAsync.java | New advanced async sample demonstrating the same scenarios as the sync sample. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample01_AnalyzeBinary.java | Updates sample to show generating LLM-ready output via toLlmInput. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample01_AnalyzeBinaryAsync.java | Updates async sample to show generating LLM-ready output via toLlmInput. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample03_AnalyzeInvoice.java | Updates sample to show generating LLM-ready output via toLlmInput. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample03_AnalyzeInvoiceAsync.java | Updates async sample to show generating LLM-ready output via toLlmInput. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample05_CreateClassifier.java | Updates classifier sample to route Invoice segments and demonstrate toLlmInput on classification output. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample05_CreateClassifierAsync.java | Updates async classifier sample to analyze and print LLM-ready text for classification output. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/README.md | Adds README section documenting LlmInputHelper.toLlmInput() with example output and link to advanced sample. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/CHANGELOG.md | Adds feature entry for toLlmInput and updates the release header. |
yungshinlintw (Member) requested changes on May 1, 2026:
Please address the review comments.
yungshinlintw approved these changes on May 2, 2026
Description
Adds a toLlmInput helper that converts AnalysisResult into text that LLMs can consume directly: YAML front matter (content type, pages, extracted fields, optional metadata) followed by the markdown body. Useful for injecting into LLM prompts, storing in vector databases, or passing as agentic tool output.
Works with all content types (documents, images, audio, video), multi-segment results (e.g., video scenes), and classification hierarchies (auto-expands parent into per-segment blocks with category labels).
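To make the described output shape concrete, here is a minimal sketch of "YAML front matter followed by a markdown body". It is a hand-rolled stand-in, not the SDK's LlmInputHelper: the class name FrontMatterSketch, the render method, and the key names contentType and pages are assumptions for illustration, and the real helper's exact keys and escaping may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: NOT the SDK's LlmInputHelper. All names here are hypothetical.
final class FrontMatterSketch {

    // Renders "---", one "key: value" line per entry, "---", a blank line,
    // then the markdown body. Values are written verbatim; real YAML output
    // would need quoting and escaping for special characters.
    static String render(Map<String, String> frontMatter, String markdown) {
        StringBuilder sb = new StringBuilder("---\n");
        for (Map.Entry<String, String> entry : frontMatter.entrySet()) {
            sb.append(entry.getKey()).append(": ").append(entry.getValue()).append('\n');
        }
        sb.append("---\n\n");
        sb.append(markdown);
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the front matter is stable.
        Map<String, String> frontMatter = new LinkedHashMap<>();
        frontMatter.put("contentType", "document"); // assumed key name
        frontMatter.put("pages", "1-2");            // assumed key name
        System.out.println(render(frontMatter, "# Invoice\n\nTotal: 42.00"));
    }
}
```

Keeping the structured values in a delimited header and the body as plain markdown lets a prompt or a vector-store record carry both without a custom parser on the consuming side.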
API Surface
Three overloads follow Java conventions (no optional parameters). The simpler overloads delegate to the full one with null defaults.
Changes
- LlmInputHelper, ToLlmInputOptions
- Sample_Advanced_ToLlmInput (sync + async) for standalone demos; updated Sample01, Sample03, and Sample05 with toLlmInput integration
- Version: 1.1.0-beta.1
Cross-language alignment
This feature is aligned across all four Azure SDK languages:
| Language | API |
|---|---|
| Python | to_llm_input(result, *, include_fields=True, include_markdown=True, metadata=None) |
| C# | result.ToLlmInput(metadata?, options?) (extension method) |
| Java | toLlmInput(result), toLlmInput(result, metadata), toLlmInput(result, metadata, options) |
| JS | toLlmInput(result, options?) (single options bag) |

Each language follows its idiomatic patterns: Python uses keyword-only args, C# uses extension methods with optional params, Java uses type-disambiguated overloads, JS uses a single options bag.
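The Java shape described in this PR (three overloads, with the simpler ones delegating to the full one using null defaults) can be sketched as below. This is a sketch under stated assumptions, not the SDK source: OverloadSketch, its nested Options class, and the use of a plain String in place of AnalysisResult are all hypothetical, and treating null options as "include everything" is an assumption borrowed from the Python defaults.

```java
import java.util.Map;

// Hypothetical stand-ins; the real SDK types and signatures may differ.
final class OverloadSketch {

    // Simplified options bag. The PR's ToLlmInputOptions also controls fields;
    // only the markdown flag is modeled here.
    static final class Options {
        final boolean includeMarkdown;

        Options(boolean includeMarkdown) {
            this.includeMarkdown = includeMarkdown;
        }
    }

    // Simplest overload: no metadata, default options.
    static String toLlmInput(String markdown) {
        return toLlmInput(markdown, null, null);
    }

    // Middle overload: metadata only, default options.
    static String toLlmInput(String markdown, Map<String, String> metadata) {
        return toLlmInput(markdown, metadata, null);
    }

    // Full overload: the only place where defaults are resolved and output is built.
    static String toLlmInput(String markdown, Map<String, String> metadata, Options options) {
        // Assumption: null options means "include everything".
        Options effective = (options != null) ? options : new Options(true);
        StringBuilder sb = new StringBuilder();
        if (metadata != null) {
            for (Map.Entry<String, String> entry : metadata.entrySet()) {
                sb.append(entry.getKey()).append(": ").append(entry.getValue()).append('\n');
            }
        }
        if (effective.includeMarkdown) {
            sb.append(markdown);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLlmInput("# Report"));
    }
}
```

Resolving defaults in a single full overload keeps the behavior of all three entry points in sync, which is the usual reason for this delegation pattern in Java APIs that lack optional parameters.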
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines