
[ContentUnderstanding] Add LlmInputHelper for converting analysis results to LLM-friendly text#49000

Merged
yungshinlintw merged 15 commits into main from cu_sdk/llm_input_helper
May 2, 2026


@chienyuanchang
Member

Description

Adds a toLlmInput helper that converts AnalysisResult into text that LLMs can consume directly — YAML front matter (content type, pages, extracted fields, optional metadata) followed by the markdown body. Useful for injecting into LLM prompts, storing in vector databases, or passing as agentic tool output.
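The output layout described above (YAML front matter, then the markdown body) can be sketched as follows. This is only an illustration of the shape, not the SDK's implementation: the class name FrontMatterSketch, the render method, and the front matter keys (contentType, pages, source) are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the output layout; this is NOT the SDK's LlmInputHelper. */
final class FrontMatterSketch {

    /**
     * Writes each entry as a "key: value" line between two "---" delimiters,
     * then appends the markdown body after a blank line.
     * Values are written verbatim; a real implementation would need YAML quoting/escaping.
     */
    static String render(Map<String, Object> frontMatter, String markdown) {
        StringBuilder sb = new StringBuilder("---\n");
        for (Map.Entry<String, Object> entry : frontMatter.entrySet()) {
            sb.append(entry.getKey()).append(": ").append(entry.getValue()).append('\n');
        }
        sb.append("---\n\n");
        sb.append(markdown);
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the front matter is stable.
        Map<String, Object> frontMatter = new LinkedHashMap<>();
        frontMatter.put("contentType", "document"); // assumed key name
        frontMatter.put("pages", 2);                // assumed key name
        frontMatter.put("source", "invoice.pdf");   // example of caller-supplied metadata
        System.out.println(render(frontMatter, "# Invoice\n\nTotal due: 100.00"));
    }
}
```

Keeping the structured data in front matter and the content in markdown means a prompt, a vector store record, or a tool response can carry both in a single string.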

Works with all content types (documents, images, audio, video), multi-segment results (e.g., video scenes), and classification hierarchies (auto-expands parent into per-segment blocks with category labels).
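One way to picture the per-segment expansion is a loop that emits one front matter block per segment, each labeled with its category. The sketch below assumes a hypothetical Segment stand-in type and hypothetical key names (category, pages); how the SDK actually separates and labels blocks may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of per-segment output; this is NOT the SDK's implementation. */
final class SegmentBlocksSketch {

    /** Stand-in for one classified segment of a larger input (assumed shape). */
    static final class Segment {
        final String category;
        final String pages;
        final String markdown;

        Segment(String category, String pages, String markdown) {
            this.category = category;
            this.pages = pages;
            this.markdown = markdown;
        }
    }

    /** Renders each segment as its own front matter + markdown block, separated by a blank line. */
    static String render(List<Segment> segments) {
        List<String> blocks = new ArrayList<>();
        for (Segment s : segments) {
            blocks.add("---\n"
                + "category: " + s.category + "\n" // assumed key name
                + "pages: " + s.pages + "\n"       // assumed key name
                + "---\n\n"
                + s.markdown);
        }
        return String.join("\n\n", blocks);
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
            new Segment("Invoice", "1-2", "# Invoice\n\nTotal due: 100.00"),
            new Segment("Receipt", "3", "# Receipt\n\nPaid: 100.00"));
        System.out.println(render(segments));
    }
}
```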

API Surface

public final class LlmInputHelper {
    public static String toLlmInput(AnalysisResult result);
    public static String toLlmInput(AnalysisResult result, Map<String, Object> metadata);
    public static String toLlmInput(AnalysisResult result, Map<String, Object> metadata, ToLlmInputOptions options);
}

public final class ToLlmInputOptions {
    public boolean isIncludeFields(); // default: true
    public ToLlmInputOptions setIncludeFields(boolean includeFields);
    public boolean isIncludeMarkdown(); // default: true
    public ToLlmInputOptions setIncludeMarkdown(boolean includeMarkdown);
}

Java has no optional parameters, so the helper exposes three overloads. The one- and two-argument overloads delegate to the three-argument overload, passing null for the omitted metadata and options.
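The delegation pattern and the fluent options class can be sketched in a self-contained form. Result and Options here are hypothetical stand-ins for AnalysisResult and ToLlmInputOptions, and the rendering logic is invented for illustration; only the overload shapes, the null delegation, and the default-true flags come from the API surface above.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical sketch of the overload pattern; this is NOT the SDK's LlmInputHelper. */
final class OverloadSketch {

    /** Stand-in for AnalysisResult (assumed shape: extracted fields plus markdown). */
    static final class Result {
        final Map<String, String> fields;
        final String markdown;

        Result(Map<String, String> fields, String markdown) {
            this.fields = fields;
            this.markdown = markdown;
        }
    }

    /** Stand-in for ToLlmInputOptions: both flags default to true, setters return this. */
    static final class Options {
        private boolean includeFields = true;
        private boolean includeMarkdown = true;

        boolean isIncludeFields() { return includeFields; }
        Options setIncludeFields(boolean value) { includeFields = value; return this; }
        boolean isIncludeMarkdown() { return includeMarkdown; }
        Options setIncludeMarkdown(boolean value) { includeMarkdown = value; return this; }
    }

    static String toLlmInput(Result result) {
        return toLlmInput(result, null); // delegate with null metadata
    }

    static String toLlmInput(Result result, Map<String, Object> metadata) {
        return toLlmInput(result, metadata, null); // delegate with null options
    }

    static String toLlmInput(Result result, Map<String, Object> metadata, Options options) {
        // Null arguments fall back to "no metadata" and "default options".
        Map<String, Object> meta = metadata == null ? Collections.<String, Object>emptyMap() : metadata;
        Options opts = options == null ? new Options() : options;

        StringBuilder sb = new StringBuilder("---\n");
        meta.forEach((k, v) -> sb.append(k).append(": ").append(v).append('\n'));
        if (opts.isIncludeFields() && !result.fields.isEmpty()) {
            sb.append("fields:\n");
            result.fields.forEach((k, v) -> sb.append("  ").append(k).append(": ").append(v).append('\n'));
        }
        sb.append("---\n");
        if (opts.isIncludeMarkdown()) {
            sb.append('\n').append(result.markdown);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Result result = new Result(Collections.singletonMap("Total", "100.00"), "# Invoice");
        System.out.println(toLlmInput(result));
        System.out.println(toLlmInput(result, null, new Options().setIncludeMarkdown(false)));
    }
}
```

Putting the null handling in the three-argument overload keeps the fallback logic in one place, so the shorter overloads stay one-line forwards.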

Changes

  • New classes: LlmInputHelper, ToLlmInputOptions
  • Unit tests: 23 tests covering single/multi-page documents, audio/visual segments, classification, field extraction, metadata validation, page markers, warnings, and edge cases
  • Samples: Added Sample_Advanced_ToLlmInput (sync + async) for standalone demos; updated Sample01, Sample03, and Sample05 with toLlmInput integration
  • Version: Bumped to 1.1.0-beta.1

Cross-language alignment

This feature is aligned across all four Azure SDK languages:

| Language | API Shape |
| --- | --- |
| Python | to_llm_input(result, *, include_fields=True, include_markdown=True, metadata=None) |
| C# | result.ToLlmInput(metadata?, options?) (extension method) |
| Java | 3 overloads: toLlmInput(result), toLlmInput(result, metadata), toLlmInput(result, metadata, options) |
| JS | toLlmInput(result, options?) (single options bag) |

Each language follows its idiomatic patterns: Python uses keyword-only arguments, C# uses an extension method with optional parameters, Java uses overloads that differ in parameter count, and JS uses a single options bag.

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds a new public helper API in the Content Understanding Java SDK to convert AnalysisResult into LLM-ready text (YAML front matter + markdown), and updates samples/tests/docs to demonstrate and validate the feature.

Changes:

  • Introduces LlmInputHelper and ToLlmInputOptions to render analysis results into a prompt-friendly text format.
  • Adds unit tests and sample tests covering documents, content ranges, multi-segment audio/video, and classification scenarios.
  • Updates samples, README, and changelog to document and demonstrate toLlmInput.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/main/java/com/azure/ai/contentunderstanding/models/LlmInputHelper.java | New helper that renders AnalysisResult into YAML front matter + markdown, including segmentation/classification logic. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/main/java/com/azure/ai/contentunderstanding/models/ToLlmInputOptions.java | Options bag controlling whether fields and/or markdown are included. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/LlmInputHelperTest.java | Unit tests validating YAML/markdown rendering across content types and edge cases. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample_Advanced_ToLlmInputTest.java | New sample test validating advanced sync scenarios (options, content ranges, segments, metadata). |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample_Advanced_ToLlmInputAsyncTest.java | New sample test validating advanced async scenarios (options, content ranges, segments, metadata). |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample01_AnalyzeBinaryTest.java | Extends existing sample test to exercise toLlmInput on document analysis results. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample01_AnalyzeBinaryAsyncTest.java | Extends existing async sample test to exercise toLlmInput. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample03_AnalyzeInvoiceTest.java | Extends existing invoice sample test to exercise toLlmInput for field extraction results. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/test/java/com/azure/ai/contentunderstanding/tests/samples/Sample03_AnalyzeInvoiceAsyncTest.java | Extends existing async invoice sample test to exercise toLlmInput. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample_Advanced_ToLlmInput.java | New advanced sync sample demonstrating options, content ranges, multi-segment media, and metadata. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample_Advanced_ToLlmInputAsync.java | New advanced async sample demonstrating the same scenarios as the sync sample. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample01_AnalyzeBinary.java | Updates sample to show generating LLM-ready output via toLlmInput. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample01_AnalyzeBinaryAsync.java | Updates async sample to show generating LLM-ready output via toLlmInput. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample03_AnalyzeInvoice.java | Updates sample to show generating LLM-ready output via toLlmInput. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample03_AnalyzeInvoiceAsync.java | Updates async sample to show generating LLM-ready output via toLlmInput. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample05_CreateClassifier.java | Updates classifier sample to route Invoice segments and demonstrate toLlmInput on classification output. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/src/samples/java/com/azure/ai/contentunderstanding/samples/Sample05_CreateClassifierAsync.java | Updates async classifier sample to analyze and print LLM-ready text for classification output. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/README.md | Adds README section documenting LlmInputHelper.toLlmInput() with example output and link to advanced sample. |
| sdk/contentunderstanding/azure-ai-contentunderstanding/CHANGELOG.md | Adds feature entry for toLlmInput and updates the release header. |

Comment thread sdk/contentunderstanding/azure-ai-contentunderstanding/README.md Outdated
Comment thread sdk/contentunderstanding/azure-ai-contentunderstanding/README.md
Member

@yungshinlintw left a comment

Please address the review comments.

@yungshinlintw merged commit 170ca39 into main on May 2, 2026
17 checks passed
@yungshinlintw deleted the cu_sdk/llm_input_helper branch on May 2, 2026 00:40
3 participants