feat: improve app insights telemetry#226
Open
Shreyas-Microsoft wants to merge 5 commits intodevfrom
Open
Conversation
Adds the small foundation needed for the rest of the App Insights
wiring on this branch:
- New `libs/logging/event_utils.py` with `track_event_if_configured`,
a tiny gated wrapper around `azure.monitor.events.extension.track_event`
that no-ops when `APPLICATIONINSIGHTS_CONNECTION_STRING` is unset.
Lazy-imports the SDK and swallows export errors so telemetry can
never break a request. Single warn-once latch keeps logs clean in
unit tests / local dev.
- New `libs/logging/span_filters.py` with two custom `SpanProcessor`
implementations:
* `DropASGIResponseBodySpanProcessor` strips `http.response.body`
child spans (one per streamed chunk) so streamed downloads / SSE
do not flood Application Insights.
* `DropCosmosDependencySpanProcessor` drops Cosmos DB dependency
spans (matched on `db.system==cosmosdb` and the
`documents.azure.com` peer host) so the Application Map is not
dominated by per-call Cosmos operations.
- Adds `azure-monitor-events-extension` and (explicitly)
`opentelemetry-instrumentation-fastapi` to `pyproject.toml`, then
refreshes `uv.lock` via `uv lock` (no hand edits).
No call sites change in this commit; the new helpers are wired into
`Application.initialize` and the routers in the next two commits.
Implements groundwork for AC #1, #2, #3, #4 of AB#37816.
Work item: https://dev.azure.com/CSACTOSOL/CSA%20Solutioning/_workitems/edit/37816
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ation.initialize Glues the helpers from the previous commit into the FastAPI startup path so AC #1, #2, #3, #4 of AB#37816 are satisfied at the code layer: `Application.initialize` - New `_configure_azure_monitor`: when `APPLICATIONINSIGHTS_CONNECTION_STRING` is set, calls `azure.monitor.opentelemetry.configure_azure_monitor` with `enable_live_metrics=True` and the two custom span processors (`DropASGIResponseBodySpanProcessor`, `DropCosmosDependencySpanProcessor`) added in the previous commit. Connection string is read from env, never logged. - New `_instrument_fastapi`: attaches `FastAPIInstrumentor.instrument_app(self.app, excluded_urls="health,startup")` AFTER all routers are registered so every business route is instrumented but the liveness / startup probes do not flood Application Insights with empty request rows. - Both helpers fail-soft: any exception during telemetry setup is logged and swallowed, so a misconfigured App Insights instance can never block backend startup. - Both helpers are no-ops when the connection string is unset, so local dev / unit tests behave exactly as they did before. `Application_Base` - Adds a `_NOISY_LOGGER_PACKAGES` hard-suppression list (`azure.core.pipeline.policies.http_logging_policy`, `azure.cosmos`, `opentelemetry.sdk`, `azure.monitor.opentelemetry.exporter.export._base`) clamped to WARNING regardless of the operator-supplied `AZURE_LOGGING_PACKAGES`. Without this clamp, the App Insights logs view is dominated by per- request HTTP policy logs and per-call Cosmos diagnostics. - The clamp is one-way: never lowers a level the operator has explicitly raised below WARNING. `.env.example` - Documents `APPLICATIONINSIGHTS_CONNECTION_STRING` (commented) so local dev mirrors the Bicep-injected production env. Validation - `uv run pytest src/tests --no-cov -c /dev/null` -> 44 passed (with `PYTHONPATH=src/app`, matching the existing repo convention). - Manual `Application()` smoke with the env var unset -> single INFO line, 26 routes registered, no exceptions. - Manual `Application()` smoke with a fake conn string -> Azure Monitor configured, FastAPIInstrumentor attached, exporter fails-soft on DNS resolution, app continues. Implements AC #1, #2, #3, #4 of AB#37816. Work item: https://dev.azure.com/CSACTOSOL/CSA%20Solutioning/_workitems/edit/37816 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds Application Insights custom-event emission and per-request span annotation to every business handler in: - routers/router_process.py - routers/router_files.py - routers/router_debug.py For each handler: - On success, emits a `<RouteName>Success` event via `track_event_if_configured` with the relevant domain ids (process_id, file_id, filename, counts, kill_state, ...). - On exception, emits a `<RouteName>Error` event carrying `error`, `error_type`, and (when available) `status_code`. Then calls `current_span.record_exception(e)` and `set_status(StatusCode.ERROR)` so the failure is also visible in the App Insights end-to-end transaction view. - Stamps domain ids onto the active span via `set_attribute` so custom queries on `requests | extend customDimensions` can pivot by `process_id` / `file_id` directly without re-parsing message strings. Two small private helpers `_annotate_span` and `_record_exception_on_span` are duplicated into both `router_process.py` and `router_files.py` (rather than placed in `libs/logging/`) to keep the OpenTelemetry import surface visible at the call site and to avoid a new public helper that the rest of the codebase might accidentally depend on. `routers/http_probes.py` is left untouched — its routes match the `excluded_urls="health,startup"` configured on `FastAPIInstrumentor` in the previous commit and so are intentionally not telemetered. The pre-existing `ILoggerService.log_info(f"...")` / `log_error(...)` calls are preserved as-is. The new code emitted by this commit uses `%s` lazy formatting in its log strings. Validation - `uv run pytest src/tests --no-cov -c /dev/null` -> 44 passed (with `PYTHONPATH=src/app`). - `Application()` smoke-init -> 26 routes registered, all routers import cleanly. Implements AC #3 of AB#37816 (telemetry visibility) and contributes to AC #4 (continuous log collection). Work item: https://dev.azure.com/CSACTOSOL/CSA%20Solutioning/_workitems/edit/37816 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ts gating
Adds the first dedicated test package for the new
`libs/logging/event_utils.py` helper at
`src/tests/logging/test_event_utils.py` (mirroring the existing
`src/tests/<area>/test_<thing>.py` layout — no new tooling, no new
config).
Coverage:
- `test_no_op_when_connection_string_unset`: with the env var absent,
the helper does NOT call `track_event` and the SDK module is never
imported (verified via a `sys.modules` fake).
- `test_warning_fires_only_once_per_process`: subsequent unconfigured
calls are silent (one-shot warning latch).
- `test_unconfigured_warning_message_is_stable`: pins the exact
warning template that the rest of the system asserts against.
- `test_forwards_to_track_event_when_configured`: with the env var
set, the helper forwards the (name, properties) pair through to
`azure.monitor.events.extension.track_event`.
- `test_none_properties_normalised_to_empty_dict`: `properties=None`
surfaces as `{}` to the SDK so `customDimensions` is always an
object.
- `test_whitespace_only_connection_string_is_treated_as_unset`:
belt-and-braces gating against accidentally-blank env values.
- `test_sdk_exception_is_swallowed`: a `RuntimeError` thrown by the
SDK is caught and logged; telemetry must never break a request.
The tests use a `sys.modules` fake for `azure.monitor.events.extension`
so they never import the real Azure Monitor SDK at runtime — keeps
the suite hermetic, fast (<1s), and CI-friendly.
Validation
- `uv run pytest src/tests --no-cov -c /dev/null` -> 51 passed
(44 pre-existing + 7 new), with `PYTHONPATH=src/app`.
Implements verification scope of AC #3 of AB#37816.
Work item: https://dev.azure.com/CSACTOSOL/CSA%20Solutioning/_workitems/edit/37816
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ts AVM module The AVM `insights/component@0.6.0` module already wires Application Insights to Log Analytics through the `workspaceResourceId` parameter (workspace-based App Insights). Adding a separate `diagnosticSettings` entry pointing at the SAME workspace causes the platform-level `AppAvailabilityResults`, `AppRequests`, etc. tables to be ingested twice — once via the workspace-based App Insights pipeline and again via the diagnostic-settings forwarder. Removes the redundant line from both: - `infra/main.bicep` (line 305) - `infra/main_custom.bicep` (line 283) The custom-event / OpenTelemetry path the rest of this branch wires up sends data straight to App Insights via `APPLICATIONINSIGHTS_CONNECTION_STRING`, which is already injected into the backend-api and processor container apps by the same Bicep files (no change required to those env-var blocks). Validation - `bicep build infra/main.bicep --outfile <tmp>` -> OK, no diagnostics. - `bicep build infra/main_custom.bicep --outfile <tmp>` -> OK, only a pre-existing BCP334 warning at line 694 (unrelated to this edit). Implements AC #4 of AB#37816 (continuous, non-duplicated log collection). Reference: microsoft/Conversation-Knowledge-Mining-Solution-Accelerator#811 Work item: https://dev.azure.com/CSACTOSOL/CSA%20Solutioning/_workitems/edit/37816 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Purpose
This pull request introduces robust Application Insights and OpenTelemetry integration for the backend API, providing production-grade observability while minimizing noise and ingestion costs. It adds environment-driven configuration for telemetry, custom span processors to filter noisy spans (such as per-chunk ASGI responses and Cosmos DB dependencies), and a lightweight helper for emitting structured business events. Logging verbosity for known noisy packages is also clamped to keep logs actionable. The infrastructure Bicep templates are updated to avoid duplicate log ingestion.
Observability and Telemetry Integration:
APPLICATIONINSIGHTS_CONNECTION_STRINGenvironment variable. This includes wiring up Azure Monitor and FastAPI instrumentation, with exclusion of health/startup probe routes from telemetry to avoid flooding. [1] [2] [3] [4] [5]libs/logging/event_utils.pywith thetrack_event_if_configuredhelper for emitting custom App Insights events, which is safe to call even when telemetry is not configured. [1] [2] [3]Noise Suppression and Filtering:
SpanProcessorimplementations (DropASGIResponseBodySpanProcessor,DropCosmosDependencySpanProcessor) inlibs/logging/span_filters.pyto filter out high-volume, low-value spans before export, reducing cost and improving signal-to-noise. [1] [2] [3]Infrastructure Improvements:
infra/main.bicep,infra/main_custom.bicep) to document and prevent duplicate log ingestion by removing redundantdiagnosticSettingswhen Application Insights is already wired to Log Analytics viaworkspaceResourceId. [1] [2]These changes collectively ensure that the backend API emits actionable telemetry and business events to Application Insights with minimal configuration, while keeping the signal clean and ingestion costs predictable.
Does this introduce a breaking change?
Golden Path Validation
Deployment Validation
What to Check
Verify that the following are valid
Other Information