refactor: Migrate Content Understanding from preview to GA and consolidate AI Services account#575
Harsh-Microsoft wants to merge 9 commits into dev
…ervices account

Migrate Azure AI Content Understanding from 2024-12-01-preview to GA 2025-11-01 (ADO 41641), and consolidate the standalone Content Understanding Cognitive Services account into the existing unified Azure AI Services account (now hosting both Azure OpenAI and CU).

Infra
- Drop avmAiServices_cu module, contentUnderstandingPrivateEndpoint, and the contentUnderstandingLocation parameter from main.bicep and main_custom.bicep; mirror the changes in main.json.
- Restrict azureAiServiceLocation @allowed to the 11-region intersection where both CU GA and gpt-5.1 GlobalStandard are available.
- Add two Cognitive Services User role assignments (API and Workflow managed identities) on the unified account so CU calls don't 403.
- Re-route APP_CONTENT_UNDERSTANDING_ENDPOINT to the unified account.
- Drop AZURE_ENV_CU_LOCATION mapping from main.parameters.json and main.waf.parameters.json.
- Remove contentUnderstandingLocation override from .github/workflows/deploy.yml.

Application code
- Bump api-version to 2025-11-01 and switch to the GA REST surface: :analyzeBinary for stream payloads, knowledgeSources[] for training data, and /files/{id} for figure retrieval.
- Update Pydantic models for GA: add Warning, relax Page optionals (angle/spans/words/lines), and surface the new top-level DocumentContent.paragraphs field.
- Add unit tests for the new Warning model and relaxed Page optionals; bump existing apiVersion fixtures.

Docs
- CustomizingAzdParameters.md: drop AZURE_ENV_CU_LOCATION row, rewrite AZURE_ENV_AI_SERVICE_LOCATION row, and append a usageName note for the Standard deployment type.
- LocalDevelopmentSetup.md: replace stale aicu-{suffix} reference.
- TroubleShootingSteps.md: update the CU 403 row for the consolidated account name and DNS zones.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace /docs/re-use-*.md with relative paths so the lychee link checker resolves them. Pre-existing links flagged on this PR because the file was modified by the GA migration commit. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR migrates the Content Processor’s Azure AI Content Understanding integration from the preview API to the GA API version and updates the infrastructure to consolidate Content Understanding into the existing unified Azure AI Services account (hosting both Azure OpenAI and Content Understanding).
Changes:
- Update the Content Understanding REST client + Pydantic response models to the GA API surface (`2025-11-01`) and new endpoints/actions.
- Refactor infra parameters/templates to remove the standalone CU account + location override and route CU endpoint to the unified AI Services account, including required RBAC assignments.
- Update tests and docs to reflect GA model shape changes and the consolidated account/private networking guidance.
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/ContentProcessor/src/libs/azure_helper/content_understanding.py | Switch to GA API version and new `:analyzeBinary` / `knowledgeSources` / `/files/{id}` behaviors. |
| src/ContentProcessor/src/libs/azure_helper/model/content_understanding.py | Update response models for GA (structured warnings, relaxed Page fields, add top-level paragraphs). |
| src/ContentProcessor/tests/unit/azure_helper/test_content_understanding_model.py | Update unit tests for GA apiVersion and structured warnings / optional Page fields. |
| src/tests/ContentProcessor/azure_helper/test_content_understanding_model.py | Mirror GA model/unit test updates in the parallel test suite. |
| infra/main.bicep | Remove CU-specific account/location + private endpoint, restrict AI Services regions, add RBAC on unified account, reroute CU endpoint. |
| infra/main_custom.bicep | Same as main.bicep for custom deployment path. |
| infra/main.json | Regenerated ARM template reflecting consolidation/removals and updated parameters/role assignments. |
| infra/main.parameters.json | Remove contentUnderstandingLocation parameter mapping. |
| infra/main.waf.parameters.json | Remove contentUnderstandingLocation parameter mapping for WAF deployments. |
| .github/workflows/deploy.yml | Stop passing contentUnderstandingLocation override during deployments. |
| docs/CustomizingAzdParameters.md | Remove AZURE_ENV_CU_LOCATION, update AI Services location guidance and Standard/GlobalStandard note. |
| docs/LocalDevelopmentSetup.md | Update example CU endpoint host to aif-{suffix} (unified account). |
| docs/TroubleShootingSteps.md | Update CU 403 troubleshooting guidance for unified account and DNS/private endpoint expectations. |
Comments suppressed due to low confidence (1)
src/ContentProcessor/src/libs/azure_helper/content_understanding.py:327
`get_image_from_analyze_operation()` is now documented as retrieving a generic generated file via `/files/{id}`, but it still (a) uses the parameter name `image_id` and (b) asserts `Content-Type == "image/jpeg"`. That assertion can fail for non-JPEG file types (or if the service changes MIME types) and would raise unexpectedly in production. Either narrow the method back to images-only (and document that guarantee) or relax the check / return the content without asserting a specific MIME type.
    def get_image_from_analyze_operation(
        self, analyze_response: Response, image_id: str
    ):
        """Retrieves a generated file (e.g., a rendered page image) from a
        completed analyze operation by its file id / path.

        In Content Understanding GA the file-retrieval URL changed from
        ``{operationLocation}/images/{imageId}`` to
        ``{operationLocation}/files/{fileId}`` (where ``operationLocation`` now
        ends in ``/analyzerResults/{operationId}``).

        Args:
            analyze_response (Response): The response object from the analyze operation.
            image_id (str): The id (or path) of the file to retrieve.

        Returns:
            bytes: The file content as a byte string.
        """
        operation_location = analyze_response.headers.get("operation-location", "")
        if not operation_location:
            raise ValueError(
                "Operation location not found in the analyzer response header."
            )
        operation_location = operation_location.split("?api-version")[0]
        image_retrieval_url = (
            f"{operation_location}/files/{image_id}?api-version={self._api_version}"
        )
        try:
            response = requests.get(url=image_retrieval_url, headers=self._headers)
            response.raise_for_status()
            assert response.headers.get("Content-Type") == "image/jpeg"
            return response.content
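One way to act on this comment is to replace the `assert` with an explicit check that raises an ordinary exception. The sketch below is only an illustration, not the repository's code: `check_content_type` and its `expected_prefix` parameter are hypothetical names, and the helper assumes nothing more than a mapping of response headers.

```python
from typing import Mapping, Optional


def check_content_type(
    headers: Mapping[str, str], expected_prefix: Optional[str] = None
) -> str:
    """Return the media type from `headers`, optionally enforcing a prefix.

    Hypothetical helper: raises ValueError rather than AssertionError, so
    the check is not stripped under `python -O` and can be handled like
    any other error.
    """
    raw = headers.get("Content-Type", "")
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = raw.split(";", 1)[0].strip().lower()
    if expected_prefix is not None and not media_type.startswith(expected_prefix):
        raise ValueError(
            f"Unexpected Content-Type {raw!r}; expected prefix {expected_prefix!r}"
        )
    return media_type
```

A caller that expects an image could pass `expected_prefix="image/"`, while a caller that accepts any generated file could omit the prefix and simply record the returned media type.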
The existing Foundry must support both gpt-5.1 (GlobalStandard) and Content Understanding GA, otherwise deployment will fail with downstream model/analyzer errors. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Deployment may fail at provisioning time when gpt-5.1 is not supported in the region, or appear to succeed but break at runtime when Content Understanding GA is unavailable in the existing Foundry's region. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…oject.md Lychee CI rejects root-relative paths when no base dir is configured. Switch to relative ./DeploymentGuide.md paths matching the fix already applied to CustomizingAzdParameters.md. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 13 out of 14 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/ContentProcessor/src/libs/azure_helper/content_understanding.py:327
`get_image_from_analyze_operation()` now targets the GA `/files/{id}` endpoint and the docstring describes retrieving a generic generated file, but the code still asserts `Content-Type == image/jpeg`. If the service returns a different media type (or omits the header), this will raise `AssertionError` and bypass the `RequestException` handler. Consider removing the assert or validating/handling content types more robustly (and raising a normal exception on unexpected types).
        try:
            response = requests.get(url=image_retrieval_url, headers=self._headers)
            response.raise_for_status()
            assert response.headers.get("Content-Type") == "image/jpeg"
            return response.content
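The bypass described in this comment follows from Python's exception hierarchy: `AssertionError` is unrelated to the exception type the handler catches. The sketch below illustrates this with a hypothetical stand-in class, `FakeRequestException` (derived from `OSError`, as `requests.exceptions.RequestException` is), so that it runs without the `requests` package. The function names are illustrative only.

```python
class FakeRequestException(OSError):
    """Hypothetical stand-in for requests.exceptions.RequestException."""


def fetch(content_type: str) -> str:
    """Mimic the reviewed pattern: an assert inside a narrow try/except."""
    try:
        assert content_type == "image/jpeg"
        return "ok"
    except FakeRequestException:
        # Never reached for a failed assert: AssertionError is not a
        # subclass of FakeRequestException.
        return "handled"


def outcome(content_type: str) -> str:
    """Report what escapes from fetch() for a given Content-Type."""
    try:
        return fetch(content_type)
    except AssertionError:
        return "AssertionError escaped"
```

Separately, `assert` statements are removed entirely when Python runs with `-O`, in which case the check would silently disappear instead of failing.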
Align the listed regions with the @allowed list enforced by infra/main.bicep and infra/main_custom.bicep for the unified AI Services account (japaneast, southeastasia, uksouth added; northcentralus, switzerlandnorth, westus2 removed). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace Optional[...] = None and mutable [] defaults on Page.spans, Page.words, DocumentContent.paragraphs, and ResultData.warnings with Field(default_factory=list). Removes a possible TypeError when the confidence evaluator iterates page.words and aligns with the repo's Pydantic convention. Tests updated to assert empty-list defaults. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
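To illustrate the difference this commit describes, here is a simplified sketch. `PageBefore`, `PageAfter`, and `count_words` are hypothetical names, and `words` is reduced to a list of strings rather than the repository's actual word model.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class PageBefore(BaseModel):
    """Hypothetical 'before' shape: a missing field becomes None."""

    words: Optional[List[str]] = None


class PageAfter(BaseModel):
    """Hypothetical 'after' shape: a missing field becomes a fresh list."""

    words: List[str] = Field(default_factory=list)


def count_words(page) -> int:
    """Iterate page.words the way a downstream consumer might."""
    return sum(1 for _ in page.words)
```

With the `default_factory` form, `count_words(PageAfter())` iterates an empty list, whereas `count_words(PageBefore())` raises `TypeError` because `None` is not iterable.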
Pull request overview
Copilot reviewed 13 out of 14 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/ContentProcessor/src/libs/azure_helper/content_understanding.py:326
`get_image_from_analyze_operation()` now describes retrieving a generic generated file via the GA `/files/{id}` route, but it still asserts `Content-Type == 'image/jpeg'`. This will raise an `AssertionError` if the service returns a different image type (or any non-JPEG file), even when the download is successful. Consider removing the strict assert or relaxing it (e.g., validate an `image/` prefix only when the caller expects an image).
            response = requests.get(url=image_retrieval_url, headers=self._headers)
            response.raise_for_status()
            assert response.headers.get("Content-Type") == "image/jpeg"
Pre-existing Optional[List[...]] = [] defaults silently parsed an explicit JSON null to None, which would cause a TypeError when the confidence evaluator iterates page.lines (line 132). Switch to the same Field(default_factory=list) pattern used for the other Page collections so these fields fail loudly at parse time on a malformed response and remain safe to iterate at every call site. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
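The null-handling behaviour described in this commit can be sketched as follows. `LooseLines`, `StrictLines`, and `parses` are hypothetical names, and `lines` is simplified to a list of strings.

```python
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class LooseLines(BaseModel):
    """Hypothetical pre-fix shape: Optional lets an explicit null through."""

    lines: Optional[List[str]] = []


class StrictLines(BaseModel):
    """Hypothetical post-fix shape: an explicit null is rejected at parse time."""

    lines: List[str] = Field(default_factory=list)


def parses(model_cls, payload: dict) -> bool:
    """Return True if `payload` validates against `model_cls`."""
    try:
        model_cls(**payload)
        return True
    except ValidationError:
        return False
```

Under these assumptions, `{"lines": None}` validates against `LooseLines` and yields `lines=None`, while `StrictLines` rejects it with a `ValidationError`. A payload that omits `lines` still validates against `StrictLines` and yields an empty list.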
Summary
Migrate Azure AI Content Understanding from preview `2024-12-01-preview` to GA `2025-11-01`, and consolidate the standalone Content Understanding Cognitive Services account into the existing unified Azure AI Services account (which now hosts both Azure OpenAI and Content Understanding).

Why two changes in one PR
In preview, CU was only available in 3 regions, which forced us to provision it as a separate `Microsoft.CognitiveServices/accounts` (`aicu-*`) alongside the OpenAI account (`aif-*`). With GA, CU's region footprint expanded enough that a single unified account can host both services in 11 common regions, so the migration and the consolidation make sense to ship together.

Infra changes
- Drop `avmAiServices_cu` module, `contentUnderstandingPrivateEndpoint`, and the `contentUnderstandingLocation` parameter from `main.bicep` and `main_custom.bicep`; mirror in `main.json`.
- Restrict `azureAiServiceLocation` `@allowed` to the 11-region intersection where both CU GA and `gpt-5.1` `GlobalStandard` are available.
- Add two `Cognitive Services User` role assignments (API and Workflow managed identities) on the unified account so CU calls don't 403.
- Re-route `APP_CONTENT_UNDERSTANDING_ENDPOINT` to the unified account (CU lives at the same `cognitiveservices.azure.com` host, only the path differs).
- Drop `AZURE_ENV_CU_LOCATION` mapping from `main.parameters.json` and `main.waf.parameters.json`.
- Remove `contentUnderstandingLocation` override from `.github/workflows/deploy.yml`.

Application code changes
- Bump api-version to `2025-11-01` and switch to the GA REST surface in `src/ContentProcessor/src/libs/azure_helper/content_understanding.py`: `:analyzeBinary` for stream payloads, `knowledgeSources[]` for training data, and `/files/{id}` for figure retrieval.
- Update Pydantic models for GA: add `Warning`, relax `Page` optionals (angle/spans/words/lines), and surface the new top-level `DocumentContent.paragraphs` field.
- Add unit tests for the new `Warning` model and relaxed `Page` optionals; bump existing `apiVersion` fixtures.

Docs
- `CustomizingAzdParameters.md`: drop `AZURE_ENV_CU_LOCATION` row, rewrite `AZURE_ENV_AI_SERVICE_LOCATION` row, and append a `usageName` note for the `Standard` deployment type.
- `LocalDevelopmentSetup.md`: replace stale `aicu-{suffix}` reference with `aif-{suffix}`.
- `TroubleShootingSteps.md`: update the CU 403 row for the consolidated account name and DNS zones.

Validation
- `ruff check` passes on touched files.
- `pytest tests/unit/azure_helper/test_content_understanding_model.py`: 15/15 passed.
- `az bicep build` clean on both `main.bicep` and `main_custom.bicep`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>