diff --git a/modules/ROOT/pages/chapter09/index.adoc b/modules/ROOT/pages/chapter09/index.adoc index 912de480..f0a40ab0 100644 --- a/modules/ROOT/pages/chapter09/index.adoc +++ b/modules/ROOT/pages/chapter09/index.adoc @@ -1,22 +1,26 @@ = MicroProfile Telemetry -Microservices-based applications have better scalability, flexibility, and resilience, but they suffer from additional challenges regarding availability and performance monitoring. This makes observability critical to ensure these distributed systems operate reliably. +Microservices-based applications offer scalability and resilience, but they introduce challenges around observability. Tracking requests across multiple services and diagnosing failures quickly requires structured telemetry data. -MicroProfile Telemetry specification provides a set of vendor-neutral APIs for instrumenting, collecting, and exporting telemetry data such as traces, metrics, and logs. It is built on the foundation of https://opentelemetry.io/[OpenTelemetry] from the https://www.cncf.io/[Cloud Native Computing Foundation (CNCF)] project, an open-source observability framework. +MicroProfile Telemetry 2.1 provides a vendor-neutral API for collecting and exporting the three pillars of observability: *traces*, *metrics*, and *logs*. It is built on https://opentelemetry.io/[OpenTelemetry], the https://www.cncf.io/[CNCF] observability framework, so telemetry data from MicroProfile applications integrates seamlessly with industry-standard backends. In this chapter, we will explore the fundamentals of MicroProfile Telemetry, covering topics such as tracing concepts, instrumenting Telemetry, setting up tracing providers, context propagation and correlation, analyzing traces, security considerations for tracing, and more. By the end of this chapter, you will learn how to effectively leverage distributed tracing for debugging, performance monitoring, and system optimization. +We will instrument our payment microservice with MicroProfile Telemetry, exporting all three signal types to an LGTM observability stack (Loki, Grafana, Tempo, Prometheus), and explore how to analyze that data to understand, debug, and optimize microservices. + == Topics to be covered * Introduction to MicroProfile Telemetry -* Tracing Concepts -** Spans +* Tracing Concepts +** Spans ** Traces ** Context Propagation ** Correlation -* Instrumenting OpenTelemetry +* Instrumenting Telemetry +** Traces, Metrics, and Logs +** Automatic and Manual Instrumentation * Tools for Trace Analysis -* Exporting the Traces +* Setting Up the Observability Stack * Types of Telemetry * Agent Instrumentation * Analyzing Traces @@ -28,98 +32,140 @@ MicroProfile Telemetry addresses the operational challenges inherent in modern m Some of the key challenges in microservices-based applications include: -* *Complexity due to Distributed Architecture*: Microservices are often deployed across multiple nodes, containers, or cloud environments, making it challenging to track requests as they move through the system. This lack of visibility increases debugging complexity, making it harder to identify bottlenecks and analyze system behavior. -* *Polyglot Architecture*: Microservices are developed using multiple programming languages (e.g., Java, Python, and Go) and frameworks, resulting in inconsistent telemetry data and a lack of standardization in observability. This fragmentation makes correlating logs, traces, and metrics across services difficult. -* *Latency*: Communication between Microservices involves latency, and all of this adds up as requests traverse several services. This makes it difficult to identify the root causes of issues. -Ensuring High Availability: Failures in one microservice can affect the entire system, impacting multiple dependent microservices. This can lead to downtime or degraded performance, resulting in lost revenue and diminished user trust. +* *Distributed Architecture*: Microservices are often deployed across multiple nodes, containers, or cloud environments, making it difficult to track requests as they move through the system. +* *Polyglot Architecture*: Microservices developed in multiple languages and frameworks produce inconsistent telemetry, making it hard to correlate logs, traces, and metrics across services. +* *Latency*: Communication between microservices introduces latency that compounds as requests traverse several services, making root-cause analysis difficult. +* *Cascading Failures*: A failure in one microservice can propagate through dependent services, causing downtime or degraded performance. + +MicroProfile Telemetry 2.1 provides a standardized set of CDI-injectable APIs covering all three OpenTelemetry signal types: + +[cols="1,3", options="header"] +|=== +|Signal |Purpose -To address these challenges, MicroProfile Telemetry specification provides a standardized set of APIs for capturing telemetry data, including trace information and context propagation, to improve observability in distributed systems. By enabling seamless tracing, developers can analyze system behavior, troubleshoot service interactions, and ensure application reliability. +|*Traces* +|Track request flow across services, using spans, attributes, events, and errors. -MicroProfile Telemetry is vendor-neutral. It allows developers to switch between different OpenTelemetry implementations without modifying their application code. This flexibility ensures that MicroProfile applications can easily integrate with various observability platforms, making it easier to adopt, scale, and maintain Telemetry in modern cloud-native environments. +|*Metrics* +|Measure system behavior over time with counters, gauges, and histograms + +|*Logs* +|Record structured application log events correlated with traces +|=== + +Because MicroProfile Telemetry is vendor-neutral, you can switch between observability backends (Jaeger, Grafana Tempo, Zipkin, commercial APM tools) without changing application code. == Tracing Concepts -Tracing is critical for observability. It allows developers to inspect the flow of requests as they traverse through distributed systems. Tracing provides visibility into the interactions and dependencies within a system by breaking down a request into multiple spans, and connecting them into traces with context propagated across services. +Tracing is critical for observability. It allows developers to inspect the flow of requests as they traverse distributed systems. Tracing provides visibility into service interactions by breaking down a request into *spans* and connecting them into *traces* with context propagated across service boundaries. === Spans -A *span* is the basic unit of work in tracing. It represents a single operation or task a service performs, such as an HTTP request, a database query, or a computation. Each span contains metadata, including: +A *span* is the basic unit of work in tracing. It represents a single operation a service performs, such as an HTTP request, a database query, or a computation. Each span contains: -* *Operation Name*: Describes the activity (e.g., HTTP GET /products). +* *Operation Name*: Describes the activity (e.g., `POST /payment/authorize`). * *Start Time and Duration*: Captures when the operation started and how long it took. -* *Attributes*: Key-value pairs providing context (e.g., user IDs, resource names, HTTP status codes). -* *Parent Span ID*: Indicates the parent span, forming a relationship within a trace. - -Spans may also include additional data like logs and events, which help provide a detailed view of the operation's lifecycle. Spans are connected to form a trace, which helps identify bottlenecks and performance issues. +* *Attributes*: Key-value pairs providing context (e.g., `payment.amount`, `http.status_code`). +* *Events*: Timestamped annotations attached to the span (e.g., "Payment processed successfully"). +* *Status*: Whether the operation succeeded or failed. +* *Parent Span ID*: Links this span to its parent, forming a hierarchy. === Traces -A *trace* is a collection of related spans representing the end-to-end execution of a request or transaction. It provides a holistic view of how a single request flows through the system, including service interactions. Traces often form a tree structure, where the root span represents the entry point (e.g., a user request), and child spans represent subsequent operations. - -For example: -``` -API Gateway (Root Span) + -│ -├── Order Service (Child Span) + -│ │ -│ ├── Database Query (Another Child Span) + -│ │ ├── Fetch Order Details + -│ │ ├── Process Order Data + -│ │ └── Return Data to Order Service + -│ │ -│ └── Return Response to API Gateway + -│ -└── API Gateway Sends Final Response to User -``` +A *trace* is a collection of related spans representing the end-to-end execution of a request. It provides a holistic view of how a request flows through the system. Traces form a tree, where the root span is the entry point and child spans represent subsequent operations. + +---- +POST /payment/authorize (Root Span) +│ +├── payment.process (Child Span) +│ ├── Starting payment processing [event] +│ ├── Payment processed successfully [event] +│ └── payment.status = SUCCESS [attribute] +│ +└── HTTP Response 200 +---- === Context Propagation -*Context propagation* refers to the mechanism of carrying trace-related metadata, such as *trace IDs* and *span IDs*, across service and thread boundaries. This ensures that all spans created during a request can be linked together to form a complete trace. +*Context propagation* carries trace-related metadata — trace IDs and span IDs — across service and thread boundaries. This ensures that all spans created during a request can be linked into a single complete trace, regardless of how many services the request traverses. === Correlation -Context propagation is vital for connecting distributed spans and understanding their relationship ensuring trace metadata remains correlated as it travels with requests across service boundaries. -*Correlation* is the process of associating related spans and traces across multiple services and threads to form a cohesive view of a transaction. Correlation enables developers to: +*Correlation* associates spans and traces across services to form a cohesive view of a transaction. It enables developers to: * Identify the source of bottlenecks or errors in distributed systems. -* Understand the dependencies and interactions between services. +* Understand dependencies and interactions between services. -When viewing logs, the +traceId+ and +spanId+ allow you to link specific log entries to the corresponding spans in your tracing system. +When viewing logs, the `traceId` and `spanId` link specific log lines to the corresponding spans in your tracing system: -* *Trace ID*: A unique identifier shared across all spans in a single trace. -* *Span ID*: A unique identifier for a single span. It is linked to a parent span, forming a hierarchy. - -Together, these concepts form the foundation of distributed tracing, enabling developers to monitor, analyze, and optimize the performance of their microservices effectively. +* *Trace ID*: A unique identifier shared by all spans in a single trace. +* *Span ID*: A unique identifier for a single span, linked to a parent span ID. == Instrumenting Telemetry -MicroProfile Telemetry simplifies instrumentation by integrating OpenTelemetry for distributed tracing. The following steps outline how to instrument telemetry in a MicroProfile E-Commerce application. +MicroProfile Telemetry 2.1 integrates OpenTelemetry directly into the CDI container. The following steps demonstrate how to instrument a MicroProfile payment service. -=== *Step 1: Add the MicroProfile Telemetry Dependency* +=== Step 1: Add the MicroProfile Dependency -To enable tracing and exporting of telemetry data, include the MicroProfile Telemetry API dependency in your `pom.xml` file. +The `microProfile-7.1` platform includes MicroProfile Telemetry 2.1. Add it to your `pom.xml`: [source, xml] ---- - + + + org.eclipse.microprofile + microprofile + 7.1 + pom + provided + + + - org.eclipse.microprofile.telemetry - microprofile-telemetry-api - 1.1 - provided + io.opentelemetry + opentelemetry-api + 1.48.0 + provided ---- -=== *Step 2: Create a Tracer* +Enable the `mpTelemetry` feature in `server.xml`: + +[source, xml] +---- + + microProfile-7.1 + jakartaEE-10.0 + mpTelemetry + mpFaultTolerance + + +---- + +=== Step 2: Enable the OpenTelemetry SDK + +By default, MicroProfile Telemetry tracing is disabled. Set the following in `src/main/resources/META-INF/microprofile-config.properties` to activate it: + +[source, properties] +---- +# Enable OpenTelemetry SDK +otel.sdk.disabled=false + +# Service name appears in all traces, metrics, and logs +otel.service.name=payment-service +---- -MicroProfile automatically traces requests, but you can manually instrument your code using OpenTelementry APIs. +=== Step 3: Inject Tracer and Meter -A *Tracer* is a core component of OpenTelemetry, responsible for *creating spans* and *managing trace data* within the application. To use it, inject a +Tracer+ instance into your MicroProfile service: +MicroProfile Telemetry 2.1 exposes `Tracer` and `Meter` as CDI beans. Inject them directly into your service — do not use `GlobalOpenTelemetry.get*()`: [source, java] ---- +import io.opentelemetry.api.metrics.LongCounter; +import io.opentelemetry.api.metrics.Meter; import io.opentelemetry.api.trace.Tracer; -import io.opentelemetry.api.trace.Span; + +import jakarta.annotation.PostConstruct; import jakarta.enterprise.context.ApplicationScoped; import jakarta.inject.Inject; @@ -127,305 +173,400 @@ import jakarta.inject.Inject; public class PaymentService { @Inject - Tracer tracer; + Tracer tracer; // <1> - public void processPayment(String orderId, double amount) { - // Create a custom span for tracing the payment process - Span span = tracer.spanBuilder("payment.process").startSpan(); - - try { - span.setAttribute("order.id", orderId); - span.setAttribute("payment.amount", amount); - span.setAttribute("payment.status", "IN_PROGRESS"); - - // Business logic for processing the payment - executePayment(orderId, amount); - - span.setAttribute("payment.status", "SUCCESS"); - } catch (Exception e) { - span.setAttribute("payment.status", "FAILED"); - span.recordException(e); - } finally { - span.end(); - } - } + @Inject + Meter meter; // <2> - private void executePayment(String orderId, double amount) { - System.out.println("Processing payment for Order ID: " + orderId + ", Amount: " + amount); + private LongCounter paymentAttemptsCounter; + + @PostConstruct + public void init() { + paymentAttemptsCounter = meter // <3> + .counterBuilder("payment.attempts") + .setDescription("Number of payment attempts by result") + .setUnit("1") + .build(); } } ---- +<1> Used per-request to create spans. +<2> Used at startup to create metric instruments. +<3> Build instruments in `@PostConstruct` so they are registered once. Instruments are reused for every recording. -The implementation injects a `Tracer`, which enables manual span creation and precise trace management within the application. By creating a custom span (+payment.process+), it captures detailed telemetry data related to the payment process. Additionally, custom attributes such as `order.id`, `payment.amount`, and `payment.status` are attached to the span, providing valuable metadata for trace analysis. The implementation also includes exception handling, ensuring that any failures encountered during payment processing are properly recorded in the trace. Finally, the span is explicitly ended, marking the completion of tracing for this method. - -This setup ensures that each payment transaction is fully traceable, allowing developers to monitor execution flow, debug issues, and optimize application performance effectively. - -=== *Step 3: Create a Span* +=== Step 4: Create Spans and Record Metrics -Use the Tracer to create a span that represents a specific operation or activity in your application: +The following shows the complete `processPayment` method instrumented with manual tracing and custom metrics: [source, java] ---- -Span span = tracer.spanBuilder("my-span").startSpan(); +import io.opentelemetry.api.common.AttributeKey; +import io.opentelemetry.api.common.Attributes; +import io.opentelemetry.api.trace.Span; +import io.opentelemetry.api.trace.StatusCode; +import io.opentelemetry.context.Scope; + +public CompletionStage processPayment(PaymentDetails paymentDetails) + throws PaymentProcessingException { + + Span span = tracer.spanBuilder("payment.process") // <1> + .setAttribute("payment.amount", paymentDetails.getAmount().toString()) + .setAttribute("payment.method", "credit_card") + .setAttribute("payment.service", "payment-service") + .startSpan(); + + try (Scope scope = span.makeCurrent()) { // <2> + span.setAttribute("payment.status", "IN_PROGRESS"); + span.addEvent("Starting payment processing"); // <3> + + // ... business logic ... + + paymentAttemptsCounter.add(1, + Attributes.of(AttributeKey.stringKey("result"), "success")); // <4> + span.setAttribute("payment.status", "SUCCESS"); + span.setStatus(StatusCode.OK); // <5> + span.addEvent("Payment processed successfully"); + return CompletableFuture.completedFuture("{\"status\":\"success\"}"); + } catch (Exception e) { + paymentAttemptsCounter.add(1, + Attributes.of(AttributeKey.stringKey("result"), "failed")); + span.setStatus(StatusCode.ERROR, "Payment processing failed"); + span.recordException(e); // <6> + throw e; + } finally { + span.end(); // <7> + } +} ---- +<1> `spanBuilder` names the span and pre-sets static attributes before the span starts. +<2> `makeCurrent()` activates the span on the current thread so child spans or log events link to it automatically. Use try-with-resources to ensure it is always closed. +<3> Events are timestamped annotations — useful for recording discrete steps within a span. +<4> Increment a custom metric counter with a `result` attribute so Prometheus can break it down by outcome. +<5> `setStatus` sets the OpenTelemetry span status — separate from HTTP status codes. +<6> `recordException` captures the exception stack trace inside the span for trace-based debugging. +<7> Always end the span in `finally` to guarantee trace completeness even on exceptions. + +== Types of Telemetry + +MicroProfile Telemetry supports three approaches to instrumentation. + +=== Automatic Instrumentation + +Automatic instrumentation enables distributed tracing for Jakarta RESTful Web Services and MicroProfile REST Clients *without code changes*. When the OpenTelemetry SDK is enabled, incoming HTTP requests and outgoing REST client calls are automatically traced following OpenTelemetry semantic conventions. + +For example, a POST request to `/payment/authorize` automatically creates a root span named `POST /payment/authorize` with standard HTTP attributes (`http.method`, `http.route`, `http.status_code`) — no additional code required. + +=== Manual Instrumentation -The method `spanBuilder("my-span")` creates a new named span, which represents a specific operation within the application's execution flow. This helps in tracing and monitoring the operation as part of a distributed system. Calling `startSpan()` marks the beginning of the span lifecycle, ensuring that the span is actively recorded until it is explicitly ended. This allows telemetry data to be captured for performance analysis, debugging, and observability. +Manual instrumentation gives developers fine-grained control over trace data. -=== *Step 4: Add Attributes to the Span* +==== Using the @WithSpan Annotation -Attributes enhance trace context by attaching key-value pairs to a span, providing additional metadata that helps filter and analyze traces in observability tools. This helps in contextualizing the trace data: +The `@WithSpan` annotation creates a new span for a method automatically, linked to the current trace context: [source, java] ---- -span.setAttribute("http.method", "GET"); -span.setAttribute("http.url", "/products/12345"); -span.setAttribute("user.id", "98765"); +import io.opentelemetry.instrumentation.annotations.WithSpan; +import jakarta.enterprise.context.ApplicationScoped; + +@ApplicationScoped +public class PaymentService { + + @WithSpan + public void processPayment(String orderId) { + // A span named "PaymentService.processPayment" is created automatically + } +} ---- -The above statements allow the tracing system to capture essential details about an HTTP request. +Every time `processPayment` is called, a new span is created and linked to the active trace. No lifecycle management is needed. -=== *Step 5: End the Span* +==== Using SpanBuilder for Custom Spans -When the operation completes, end the span to capture the telemetry data: +For full control over span names, attributes, and lifecycle, use the `SpanBuilder` API directly: [source, java] ---- -Span span = tracer.spanBuilder("payment.process").startSpan(); +import io.opentelemetry.api.trace.Tracer; +import io.opentelemetry.api.trace.Span; +import jakarta.inject.Inject; +import jakarta.ws.rs.GET; +import jakarta.ws.rs.Path; + +@Path("/trace") +public class TraceResource { + + @Inject + Tracer tracer; -try { - // Business logic execution -} catch (Exception e) { - span.recordException(e); - span.setAttribute("error", true); -} finally { - span.end(); + @GET + @Path("/custom") + public String customTrace() { + Span span = tracer.spanBuilder("custom-span").startSpan(); + try (var scope = span.makeCurrent()) { + span.setAttribute("custom.key", "customValue"); + return "Trace recorded"; + } finally { + span.end(); + } + } } ---- -== Tools for Trace Analysis +=== All Three Signal Types -The following tools are commonly used for trace collection, visualization, and analysis in MicroProfile applications: +MicroProfile Telemetry 2.1 exports traces, metrics, *and* logs over OTLP. Configure all three exporters in `microprofile-config.properties`: -=== OpenTelemetry Collector +[source, properties] +---- +# MicroProfile Telemetry Configuration +otel.service.name=payment-service +otel.sdk.disabled=false -The https://opentelemetry.io/docs/collector/[OpenTelemetry Collector] is an open-source telemetry processing system that acts as an intermediary between instrumented applications and observability backends such as Jaeger, Zipkin, and Prometheus. It is designed to receive, process, and export tracing data, making it a powerful tool for managing distributed traces in MicroProfile applications. +# Export all three signal types via OTLP +otel.traces.exporter=otlp +otel.metrics.exporter=otlp +otel.logs.exporter=otlp -It is vendor-agnostic, which allows for seamless integration with multiple tracing backends without requiring any changes to application instrumentation. It supports multiple data formats, enabling the ingestion of traces through several protocols, ensuring compatibility across different telemetry sources. Additionally, it offers processing pipelines that let developers filter, batch, and transform trace data before exporting it, optimizing observability workflows. +# OTLP gRPC endpoint (OTel Collector) +otel.exporter.otlp.endpoint=http://localhost:4317 -Designed for scalability, the OpenTelemetry Collector can be deployed as a standalone instance or distributed across multiple nodes, making it suitable for both small-scale applications and large enterprise-grade distributed systems. +# Sampling — always sample, respecting parent decision +otel.traces.sampler=parentbased_always_on +---- -=== Jaeger +== Tools for Trace Analysis -https://www.jaegertracing.io/[Jaeger] is an open-source distributed tracing system developed by Uber, widely used for monitoring microservices and visualizing request flows in cloud-native applications. It provides a powerful visualization interface that enables developers to inspect traces, analyze dependencies between services, and examine execution timelines, making it an essential tool for debugging performance bottlenecks. +The following tools are commonly used for trace collection, visualization, and analysis. -One of Jaeger’s key capabilities is service dependency analysis, which helps identify how microservices interact, providing insights into latency, failures, and request propagation. It also supports adaptive sampling strategies, allowing developers to control the volume of traces collected to optimize performance without overwhelming storage and processing resources. Additionally, Jaeger offers built-in storage options, allowing trace data to be stored in Elasticsearch, Cassandra, or Kafka, making it scalable and flexible for various deployment environments. +=== Grafana Tempo -=== Zipkin +https://grafana.com/oss/tempo/[Grafana Tempo] is a distributed tracing backend that uses only object storage — no indexing required. This makes it highly scalable and cost-efficient for high-volume trace data. -https://zipkin.io/[Zipkin] is a distributed tracing system designed to help developers visualize and diagnose latency issues in microservices-based applications. It provides a lightweight and fast tracing solution, making it ideal for quick deployment with minimal resource usage. Its simplicity and efficiency make it a popular choice for teams looking to implement tracing without significant infrastructure overhead. +Tempo integrates tightly with Grafana dashboards, enabling developers to correlate logs, metrics, and traces within a single observability platform. It ingests traces from OpenTelemetry, Jaeger, and Zipkin sources, and is the tracing backend used in this chapter. -One of Zipkin’s core strengths is its tag-based searching, which allows developers to filter traces based on metadata such as service name, request ID, or other custom attributes, enabling quick identification of relevant traces. It also offers dependency graph visualization, helping to uncover bottlenecks and inefficiencies in microservices interactions. To accommodate different storage needs, Zipkin supports multiple storage backends, including Elasticsearch, MySQL, and Cassandra, providing flexibility for various deployment scenarios. +=== OpenTelemetry Collector -=== Grafana Tempo +The https://opentelemetry.io/docs/collector/[OpenTelemetry Collector] acts as a vendor-agnostic telemetry gateway. It receives traces, metrics, and logs from instrumented applications and fans them out to multiple backends. It also supports processing pipelines for filtering, batching, and transforming data before export. -https://grafana.com/oss/tempo/[Grafana Tempo] is a distributed tracing backend. Unlike Jaeger and Zipkin, Tempo does not require indexing as it only requires object storage, making it highly scalable and cost-efficient for handling large volumes of trace data. This unique approach allows Tempo to store traces efficiently without increasing storage and query overhead, making it an ideal choice for high-performance microservices environments. -One of Tempo’s key advantages is its tight integration with Grafana dashboards, enabling developers to correlate logs, metrics, and traces within a unified observability platform. Additionally, Tempo offers multi-backend support, meaning it can ingest and process trace data from OpenTelemetry, Jaeger, and Zipkin sources, ensuring compatibility with existing tracing setups. Its scalability makes it well-suited for large-scale microservices architectures, where efficiently managing distributed tracing data is crucial. +=== Jaeger -== Exporting the Traces +https://www.jaegertracing.io/[Jaeger] is an open-source distributed tracing system with a powerful visualization UI. It supports adaptive sampling, service dependency analysis, and multiple storage backends (Elasticsearch, Cassandra). Jaeger is well-suited for standalone deployments where Grafana integration is not required. -To export the traces we need to configure the exporter type and endpoint in the `src/main/resources/META-INF/microprofile-config.properties`. -For using OTLP (OpenTelemetry Protocol) export, you need to add the following configuration in: +=== Zipkin -[source] ----- -# Enable OpenTelemetry -otel.traces.exporter=otlp +https://zipkin.io/[Zipkin] is a lightweight tracing system that is quick to deploy with minimal resource usage. It offers tag-based searching and dependency graph visualization. Zipkin is a good choice for teams wanting distributed tracing with minimal infrastructure overhead. -# Set the OTLP exporter endpoint -otel.exporter.otlp.endpoint=http://localhost:4317 +== Setting Up the Observability Stack -# Define the service name -otel.service.name=payment-service +This chapter uses the *LGTM stack* — Loki (logs), Grafana (dashboards), Tempo (traces), and Prometheus (metrics) — fronted by an OpenTelemetry Collector. -# Sampling rate: (1.0 = always, 0.5 = 50%, 0.0 = never) -otel.traces.sampler=parentbased_always_on +[source] +---- +Payment Service (Open Liberty) + │ MicroProfile Telemetry 2.1 + │ OTLP gRPC → otel.exporter.otlp.endpoint + ▼ +OpenTelemetry Collector + ├──► Tempo (traces) + ├──► Loki (logs) + └──► Prometheus (metrics, via scrape endpoint) + │ + ▼ + Grafana (unified dashboard) ---- -This sends traces directly to a observability tool, enabling real-time distributed tracing and performance monitoring. To ensure proper tracing, your observability tool (for e.g. Jaeger) must be running to receive trace data. - -Using OTLP is advantageous because it is the native standard for OpenTelemetry, ensuring seamless integration with a wide range of observability tools. One of its key benefits is that it allows developers to use multiple observability platforms without changing instrumentation, providing a unified and vendor-neutral tracing solution. +=== Starting the Stack -=== Verify the Traces +A `docker-compose.yml` in `code/chapter09/` defines all five services. Start from that directory: -Once tracing is enabled and the appropriate exporter is configured, the next step is to verify that traces are being captured and sent to the observability backend. This ensures that the MicroProfile Telemetry setup is functioning correctly and that distributed tracing data is available for monitoring and debugging. +[source, bash] +---- +cd code/chapter09 +docker compose up -d +docker compose ps +---- -==== Run Jaeger +All five services — `grafana`, `prometheus`, `loki`, `tempo`, and `otel-collector` — should show status `running`. -The simplest way to run Jaeger is with Docker using the command as below: +Verify each backend is healthy: [source, bash] ---- -docker run -d --name jaeger \ - -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 \ - -p 5775:5775/udp \ - -p 6831:6831/udp \ - -p 6832:6832/udp \ - -p 5778:5778 \ - -p 16686:16686 \ - -p 14268:14268 \ - -p 14250:14250 \ - -p 9411:9411 \ - jaegertracing/all-in-one:latest +curl -s http://localhost:13200/ready # Tempo → "ready" +curl -s http://localhost:13100/ready # Loki → "ready" +curl -s http://localhost:19090/-/healthy # Prometheus → "Prometheus Server is Healthy." ---- -The above command runs the *all-in-one* Jaeger container, which includes the agent, collector, query service, and UI. +The ports used by each service: -The Jaeger UI can be accessed at: `https://:16686`. +[cols="2,2,1,1,3", options="header"] +|=== +|Service |Image |Host Port |Container Port |Purpose -Ensure all the services of our MicroProfile E-commerce applications are running. +|Grafana +|grafana/grafana:latest +|13000 +|3000 +|Unified dashboard -Search using parameters like operation name, time range, or service for the traces associated with different microservices and confirm that the telemetry data is visible. -View a detailed breakdown of each span within the trace, including timing and attributes. +|Prometheus +|prom/prometheus:latest +|19090 +|9090 +|Metrics storage -== Types of Telemetry +|Loki +|grafana/loki:latest +|13100 +|3100 +|Log storage -MicroProfile Telemetry supports multiple approaches to instrumentation and tracing, ensuring flexibility for developers based on their observability needs. The three primary types of telemetry in MicroProfile Telemetry are: +|Tempo +|grafana/tempo:latest +|13200, 14317 +|3200, 4317 +|Trace storage -=== Automatic Instrumentation +|OTel Collector +|otel/opentelemetry-collector-contrib:latest +|24317 +|4317 +|Telemetry gateway +|=== -Automatic Instrumentation enables distributed tracing without requiring any modifications to the application code. This is particularly beneficial for Jakarta RESTful Web Services and MicroProfile REST Clients, as it enables seamless integration into distributed tracing systems following the semantic conventions of OpenTelemetry. This ensures compatibility across different tracing tools. +=== Configuring Grafana Data Sources -For example, in the ProductService, which exposes a RESTful endpoint, automatic instrumentation ensures that incoming and outgoing HTTP requests are traced with minimal configuration, without requiring any additional code changes. +Open Grafana at `http://localhost:13000` (login: `admin` / `admin`). -By default, MicroProfile Telemetry tracing is disabled. To activate it, set the following property in `microprofile-config.properties`: +Go to *Connections → Data sources → Add data source* and add: -[source] ----- -otel.sdk.disabled=false ----- -This ensures that OpenTelemetry's tracing capabilities are enabled for the application. +[cols="1,1,2", options="header"] +|=== +|Type |Name |URL -=== Manual Instrumentation -Manual Instrumentation provides developers with fine-grained control over how telemetry data is collected and structured within a MicroProfile application. By explicitly defining spans, attributes, and trace propagation, developers can gain greater insight into application behavior beyond what automatic instrumentation provides. +|Prometheus +|Prometheus +|`http://prometheus:9090` -==== Using the @WithSpan Annotation -The `@WithSpan` annotation provides a simple way to create custom spans within a trace. By annotating a method with `@WithSpan`, a new span is automatically generated whenever the method is invoked. This span is linked to the current trace context, allowing developers to track key operations without manually managing span lifecycle. +|Loki +|Loki +|`http://loki:3100` -[source, java] ----- -import io.opentelemetry.instrumentation.annotations.WithSpan; -import jakarta.enterprise.context.ApplicationScoped; +|Tempo +|Tempo +|`http://tempo:3200` +|=== -@ApplicationScoped -public class PaymentService { +Click *Save & test*. For each of them, it should show a success message. - @WithSpan - public void processPayment(String orderId) { - // Business logic here - } -} ----- +NOTE: Use Docker service names (`prometheus`, `loki`, `tempo`) as hostnames. Grafana and the backends share the same Docker network, so service-name DNS resolution works. -Every time processPayment is called, a new span is created. The span is automatically linked to the current trace context. No need for explicit span creation or lifecycle management. You can use `@WithSpan` for tracing key business operations, such as order processing, payment handling, or API requests. +=== Building and Running the Payment Service -==== Using `SpanBuilder` for Custom Spans +Open a second terminal and start the service: -For greater flexibility, developers can manually create spans using the OpenTelemetry API. The `SpanBuilder` class provides the ability to define custom span names, making trace analysis more meaningful and structured. Additionally, developers can attach custom attributes to spans, enriching trace data with relevant metadata for deeper insights. This method also offers explicit control over the span lifecycle, allowing spans to be started and ended manually, ensuring they accurately represent specific business operations or execution flows within the application. +[source, bash] +---- +cd code/chapter09/payment +mvn clean package +mvn liberty:run +---- -[source, java] +Wait for: +---- +[AUDIT] CWWKF0011I: The server mpServer is ready to run a smarter planet. ---- -import io.opentelemetry.api.trace.Tracer; -import io.opentelemetry.api.trace.Span; -import jakarta.inject.Inject; -import jakarta.ws.rs.GET; -import jakarta.ws.rs.Path; -@Path("/trace") -public class TraceResource { +The service is available at `http://localhost:9080/payment`. - @Inject - Tracer tracer; +=== Generating Telemetry Traffic - @GET - @Path("/custom") - public String customTrace() { - Span span = tracer.spanBuilder("custom-span").startSpan(); - span.setAttribute("custom.key", "customValue"); - span.end(); - return "Trace recorded"; - } -} ----- +Run the following commands to exercise all telemetry signal paths: -The method `tracer.spanBuilder("custom-span").startSpan()` creates a span with a specific name allowing developers to define meaningful trace segments for better observability. Using `span.setAttribute("custom.key", "customValue")`, custom metadata can be attached to the span, enriching trace data with relevant contextual information. Finally, calling `span.end()` explicitly marks the completion of the span, ensuring accurate tracking of execution duration. The `SpanBuilder` approach is particularly useful when developers require fine-grained control over when spans start and end, as well as the ability to include detailed metadata for enhanced trace analysis. +[source, bash] +---- +# Process payment (creates payment.process span + increments payment.attempts counter) +curl -s -X POST "http://localhost:9080/payment/payments" \ + -H "Content-Type: application/json" \ + -d '{"cardNumber":"4111111111111111","cardHolderName":"Test User","expiryDate":"12/25","securityCode":"123","amount":99.99}' + +# Quick authorize (retry + fallback path) +curl -s -X POST "http://localhost:9080/payment/authorize?amount=75.50" + +# Verification flow (validation → fraud check → funds check) +curl -s -X POST "http://localhost:9080/payment/verify" \ + -H "Content-Type: application/json" \ + -d '{"cardNumber":"4111111111111111","cardHolderName":"Test User","expiryDate":"12/25","securityCode":"123","amount":150.00}' + +# Trigger fraud failure (card number ending in 0000) +curl -s -X POST "http://localhost:9080/payment/verify" \ + -H "Content-Type: application/json" \ + -d '{"cardNumber":"4111111110000","cardHolderName":"Test User","expiryDate":"12/25","securityCode":"123","amount":50.00}' + +# Gateway health check (circuit breaker path) +curl -s "http://localhost:9080/payment/health/gateway" +---- -=== Manual Tracing in `PaymentService` +Run each command several times to produce enough data for Grafana to display. -To manually instrument the processPayment method in the PaymentService, we use OpenTelemetry’s API to create a custom span, add attributes, and control the span lifecycle. +=== Verifying Telemetry in Grafana -[source, java] ----- -import io.opentelemetry.api.trace.Span; -import io.opentelemetry.api.trace.Tracer; -import jakarta.enterprise.context.ApplicationScoped; -import jakarta.inject.Inject; +*Traces in Tempo:* -@ApplicationScoped -public class PaymentService { +. Open Grafana → *Explore* → select *Tempo* +. Switch to the *Search* tab +. Set *Service Name* = `payment-service`, then click *Run query* +. Click any trace to expand the span tree — look for the `payment.process` child span with attributes `payment.amount`, `payment.method`, and `payment.status` - @Inject - Tracer tracer; +*Logs in Loki:* - public void processPayment(String orderId, double amount, String paymentMethod) { - // Create a custom span for tracing the payment process - Span span = tracer.spanBuilder("payment.process").startSpan(); - - try { - // Add attributes to enrich the trace - span.setAttribute("order.id", orderId); - span.setAttribute("payment.amount", amount); - span.setAttribute("payment.method", paymentMethod); - span.setAttribute("payment.status", "IN_PROGRESS"); - - // Business logic for processing the payment - System.out.println(“Processing Payment…); - - // Update span attribute on successful completion - span.setAttribute("payment.status", "SUCCESS"); - } catch (Exception e) { - // Capture error in tracing - span.setAttribute("payment.status", "FAILED"); - span.recordException(e); - } finally { - // End the span to complete the tracing - span.end(); - } - } -} ----- +. Open Grafana → *Explore* → select *Loki* +. Run the query: `{service_name="payment-service"}` +. Look for log lines containing `traceId` — click the trace link to jump to the correlated trace in Tempo -The `payment.process` span is manually created using `tracer.spanBuilder()`, allowing explicit control over the tracing of the payment process. To enhance trace visibility, custom attributes such as the order ID, payment amount, and payment method are attached to the span, providing valuable context for analysis. Additionally, the payment status is recorded as `IN_PROGRESS` when processing starts and updated to `SUCCESS` or `FAILED` based on the outcome. +*Metrics in Prometheus:* -In the event of an error, the span captures and records the exception, ensuring failure details are logged for debugging. The span lifecycle is carefully managed, starting before the business logic executes and ending only after the process is completed in the `finally` block. This structured approach guarantees accurate performance monitoring and trace completeness, improving visibility into how payments are processed in a distributed system. +. Open Grafana → *Explore* → select *Prometheus* +. Query the custom payment counter: ++ +[source] +---- +payment_attempts_total{result="success"} +payment_attempts_total{result="failed"} +payment_attempts_total{result="fallback"} +---- +. Query HTTP server metrics emitted automatically by MicroProfile Telemetry: ++ +[source] +---- +http_server_request_duration_seconds_count +---- +. Query fault tolerance metrics from MicroProfile Fault Tolerance: ++ +[source] +---- +ft_retry_calls_total +ft_circuitbreaker_state_total +---- -== Agent Instrumentation +== Agent Instrumentation -Agent Instrumentation enables telemetry data collection without modifying application code by attaching a Java agent at runtime. This approach is particularly useful for legacy applications or scenarios where modifying source code is not feasible. The OpenTelemetry Java Agent dynamically instruments applications, automatically detecting and tracing interactions within commonly used frameworks such as Jakarta RESTful Web Services, database connections, and messaging systems. +Agent Instrumentation enables telemetry data collection without modifying application code by attaching a Java agent at runtime. This approach is particularly useful for legacy applications or scenarios where modifying source code is not feasible. -One of the key advantages of agent-based instrumentation is that it requires no changes to the application's source code and eliminates the need for recompilation or redeployment. Instead, it can be activated by attaching the agent at application startup. +The OpenTelemetry Java Agent dynamically instruments applications, automatically detecting and tracing interactions within commonly used frameworks such as Jakarta RESTful Web Services, database connections, and messaging systems. -Refer to the https://opentelemetry.io/docs/zero-code/java/agent/getting-started/[OpenTelemetry Java Agent Getting Started page] for step-by-step instructions on enabling it for your application. -Once enabled, the agent automatically instruments the application, seamlessly integrating with distributed tracing systems without requiring developer intervention. This makes it an efficient and non-intrusive way to implement observability in MicroProfile applications. +One of the key advantages of agent-based instrumentation is that it requires no changes to the application's source code, eliminating the need for recompilation or redeployment. Instead, activate it by attaching the agent at application startup. -Once enabled, the agent automatically instruments the application, seamlessly integrating with distributed tracing systems without requiring developer intervention. This makes it an efficient and non-intrusive way to implement observability in MicroProfile applications. +Refer to the https://opentelemetry.io/docs/zero-code/java/agent/getting-started/[OpenTelemetry Java Agent Getting Started page] for step-by-step instructions. == Analyzing Traces -Once trace data is collected and exported to a backend system, analyzing these traces becomes a crucial step in understanding the behavior of your distributed microservices architecture. By examining traces, you can gain insights into system performance, identify bottlenecks, and detect failures or anomalies. +Once trace data is exported to a backend, analyzing it provides insight into system performance, bottlenecks, and failures. === Visualizing Traces -Tracing backends like *Jaeger*, *Zipkin*, or *Graphana Tempo* provide visual interfaces to explore and analyze traces. These tools display traces as timelines or dependency graphs, making it easier to: +Tracing backends like Grafana Tempo provide visual interfaces to explore traces as timelines or dependency graphs, making it easier to: * Understand the sequence of operations. * Identify the services and components involved in a request. @@ -436,81 +577,47 @@ Tracing backends like *Jaeger*, *Zipkin*, or *Graphana Tempo* provide visual int Traces highlight spans with long durations or repeated retries, which often point to bottlenecks or inefficiencies. Pay close attention to: * *Critical Path*: The longest path in a trace that determines the total response time. -* *Service Dependencies*: Examine how upstream and downstream services interact to find slow components. -* *Retries and Failures*: Repeated spans or high failure rates indicate problematic dependencies or transient errors. +* *Service Dependencies*: Upstream and downstream interactions that reveal slow components. +* *Retries and Failures*: Repeated spans or high failure rates indicating problematic dependencies. === Diagnosing Failures -Traces provide valuable information for diagnosing failures, including: - -* *Error Codes*: Look for spans with error attributes, such as `http.status_code=500`. -* *Exception Details*: Many tracing systems capture stack traces or error messages in spans. -* *Service Impact*: Identify which upstream and downstream services are affected by the failure. - -=== Understanding Service Dependencies +Traces provide valuable information for diagnosing failures: -Dependency graphs generated from traces show the interactions between services. These graphs help: - -* Visualize which services depend on each other. -* Detects circular dependencies or excessive coupling. -* Plan optimizations by focusing on critical services. +* *Error Status*: Spans with `StatusCode.ERROR` indicate failed operations. +* *Exception Details*: `span.recordException(e)` captures stack traces inside spans for trace-based debugging. +* *Service Impact*: Identify which upstream and downstream services are affected by a failure. === Correlating Traces with Logs and Metrics -Traces, when combined with logs and metrics, provide a comprehensive picture of the system: - -* *Logs*: Use trace IDs and span IDs in logs to correlate application logs with specific spans. -* *Metrics*: Correlate trace performance data with system metrics like CPU usage, memory consumption, or request rates. -Example: If a span indicates high latency, check corresponding logs and metrics to identify the underlying cause, such as a resource constraint or network delay. - -=== Best Practices for Analyzing Traces - -. *Establish Baselines*: Use traces to establish performance baselines for services. -. *Monitor Critical Paths*: Focus on traces that traverse critical services or user-facing operations. -. *Use Sampling Strategically*: Balance trace volume and storage costs by sampling traces intelligently. -. *Automate Alerts*: Set up alerts for abnormal patterns in traces, such as increased latency or failure rates. -. *Collaborate Across Teams*: Share trace insights with development, operations, and QA teams to improve system reliability. - -By analyzing traces effectively, you can identify opportunities to optimize your microservices, ensure smoother operations, and enhance the overall user experience. Tracing tools provide a powerful way to visualize and understand the intricate dynamics of distributed systems. + -When analyzing traces, developers should look for the following: +Traces, logs, and metrics together provide a complete picture of the system: -* *Long spans:* Spans that take a long time to complete may indicate a performance issue. -* *Missing spans:* Missing spans can make it difficult to understand the flow of a request. -* *Errors:* Errors can indicate problems with a service or a request. -* *High latency:* High latency can indicate a problem with the network or a service. +* *Logs*: Use `traceId` and `spanId` injected into log records to correlate log lines with specific spans. Loki makes this easy via trace links. +* *Metrics*: Correlate trace latency data with custom counters (e.g., `payment_attempts_total`) and system metrics (e.g., CPU usage, request rates). -By analyzing traces, developers can identify and troubleshoot problems with their microservices applications. This can help developers improve the performance and reliability of their applications. +For example, if a span indicates high latency, check corresponding Loki logs and Prometheus metrics to identify the underlying cause — a resource constraint, a network delay, or a retry storm. -Here are some tips for analyzing traces: - -* *Use a trace viewer:* A trace viewer is a tool that can help you visualize and analyze traces. -* *Look for patterns:* Look for patterns in the traces that may indicate a problem. -* *Correlate traces with metrics:* Correlate traces with metrics to get a better understanding of the performance of your application. -* *Use sampling:* Use sampling to reduce the number of traces that are collected. This can improve the performance of your tracing system. +=== Best Practices for Analyzing Traces -By following these tips, developers can effectively analyze traces to improve the performance and reliability of their microservices applications. +. *Establish Baselines*: Use traces to establish performance baselines for services under normal load. +. *Monitor Critical Paths*: Focus on traces that traverse user-facing operations. +. *Use Sampling Strategically*: Balance trace volume and storage costs by sampling intelligently. +. *Automate Alerts*: Set up alerts for abnormal patterns such as increased latency or failure rates. +. *Correlate Signals*: Always investigate traces alongside logs and metrics — each signal reveals a different facet of the same problem. == Security Considerations for Tracing -When implementing tracing in your applications, it is crucial to be mindful of security implications. Tracing involves collecting and storing data about application behavior, which can potentially expose sensitive information if not handled properly. - -* *Data Sensitivity:* Be cautious about the data included in traces. Avoid logging sensitive information such as passwords, API keys, or personally identifiable information (PII). -* *Access Control:* Implement strict access controls to limit who can view and manage trace data. -* *Encryption:* Consider encrypting trace data at rest and in transit to protect it from unauthorized access. -* *Storage:* Carefully manage the storage of trace data. Avoid storing traces indefinitely and implement data retention policies. -* *Third-Party Services:* If using third-party tracing services, ensure they have robust security measures in place to protect your data. +When implementing tracing in your applications, be mindful of the security implications. Tracing data can expose sensitive information if not handled carefully. === Avoid Capturing Sensitive Data -Traces often include attributes and metadata that can contain sensitive information. Avoid storing or transmitting sensitive details, such as: +Traces can contain attributes and metadata with sensitive information. Never store or transmit: -* Personally Identifiable Information (PII) (e.g., names, addresses, social security numbers). -* Payment information (e.g., credit card numbers). -* Authentication credentials (e.g., passwords, API keys, tokens). +* Personally Identifiable Information (PII) — names, addresses, social security numbers. +* Payment information — full card numbers. +* Authentication credentials — passwords, API keys, tokens. -*Best Practice:* - -Sanitize attributes before adding them to spans: +*Best Practice:* Sanitize attributes before adding them to spans: [source, java] ---- @@ -520,85 +627,68 @@ span.setAttribute("credit.card.last4", "****1234"); === Encrypt Trace Data -To prevent unauthorized access during transmission, ensure that telemetry data is encrypted. Use secure protocols such as HTTPS or TLS for exporting trace data to a backend. - - *Example:* - -* Configure the tracing provider to use encrypted connections: +Ensure telemetry data is encrypted in transit. Use HTTPS/TLS for the OTLP exporter endpoint: [source, properties] ---- -otel.exporter.jaeger.endpoint=https://secure-jaeger-collector.example.com otel.exporter.otlp.endpoint=https://secure-collector.example.com ---- === Limit Trace Retention -Trace data can grow rapidly in distributed systems. Retaining it indefinitely increases the risk of exposing sensitive information. Implement retention policies to: +Trace data grows rapidly. Implement retention policies to: -* Retain traces only for the necessary duration for debugging or performance analysis. +* Retain traces only for the duration needed for debugging or performance analysis. * Periodically purge older traces from storage. === Access Control and Auditing -Restrict access to trace data to authorized personnel only. Ensure that your tracing backend implements robust authentication and authorization mechanisms. - -*Best Practice:* +Restrict access to trace data to authorized personnel. Use role-based access control (RBAC) to define permissions for viewing and managing traces, and audit access regularly. -* Use role-based access control (RBAC) to define permissions for viewing and managing traces. -* Audit access to trace data regularly to identify potential misuse or breaches. +=== Sampling to Minimize Exposure -=== Sampling Strategies to Minimize Exposure +Sampling reduces trace volume and limits exposure of sensitive data. Common strategies: -Sampling reduces the volume of traces collected and limits the exposure of sensitive data by capturing only a subset of requests. Common strategies include: +* *Random Sampling*: Captures a fixed percentage of traces. +* *Rate-Limiting Sampling*: Limits the number of traces per second. +* *Parent-Based Sampling*: Respects the sampling decision of the parent span (used in this chapter). -* Random Sampling: Captures a fixed percentage of traces. -* Rate-Limiting Sampling: Limits the number of traces per second. -* Key-Based Sampling: Samples traces based on specific attributes (e.g., user ID). - -*Example:* - -Random sampling to limiting the amount of trace data collected: +*Example:* Sample 10% of traces by trace ID: [source, properties] ---- otel.traces.sampler=traceidratio -otel.traces.sampler.traceidratio=0.1 +otel.traces.sampler.arg=0.1 ---- === Compliance with Regulations -Ensure that your tracing practices comply with data protection and privacy regulations such as GDPR, CCPA, or HIPAA. Key considerations include: +Ensure tracing practices comply with data protection regulations such as GDPR, CCPA, or HIPAA: -* Anonymizing sensitive data before tracing. -* Informing users about telemetry collection in your privacy policy. -* Providing mechanisms to opt out of tracing where required. +* Anonymize sensitive data before including it in spans. +* Inform users about telemetry collection in your privacy policy. +* Provide mechanisms to opt out of tracing where required. === Isolate Tracing Infrastructure -The tracing infrastructure, such as Jaeger or OpenTelemetry Collector, should be isolated from the public internet and accessible only within secure networks. - -*Best Practice:* +Deploy tracing backends (OpenTelemetry Collector, Grafana Tempo) in private subnets or behind firewalls, accessible only within secure networks. Use VPNs or dedicated connections for remote access to tracing dashboards. -* Deploy tracing backends in private subnets or behind firewalls. -* Use VPNs or dedicated connections for remote access to tracing dashboards. +=== Monitor for Trace Anomalies -=== Monitor and Alert on Trace Anomalies +Tracing can help detect potential security incidents. Monitor traces for: -Tracing can help detect potential security incidents. Monitor traces for unusual patterns, such as: - -* Unexpected spikes in requests. +* Unexpected spikes in request volume. * Requests from unknown or unauthorized sources. -* Abnormal response times indicating possible exploits. -Set up alerts for these anomalies to investigate and mitigate potential issues. + -By following these security considerations, you can leverage the benefits of distributed tracing without compromising the security of your system or the privacy of your users. Careful handling of trace data, coupled with robust encryption, access controls, and compliance practices, ensures that tracing remains a valuable yet secure component of your observability strategy. +* Abnormal response times that may indicate exploitation. + +Set up alerts for these anomalies to investigate and mitigate issues quickly. == Conclusion -MicroProfile Telemetry provides a robust foundation for observability in Java-based microservices, enabling developers to implement distributed tracing seamlessly. By leveraging this specification, you can gain deep insights into the flow of requests, identify bottlenecks, and enhance the reliability and performance of your applications. The integration of standardized tracing concepts like spans, traces, and context propagation ensures that developers can maintain a cohesive understanding of their system's behavior across service boundaries. +MicroProfile Telemetry 2.1 provides a robust foundation for observability in Java-based microservices. Building on OpenTelemetry, it provides developers with a vendor-neutral, CDI-native API for instrumenting traces, metrics, and logs. It exports all three signal types to any standard observability backend. -Through instrumentation, context propagation, and effective trace analysis, MicroProfile Telemetry simplifies the complexities of monitoring and debugging distributed systems. It empowers teams to proactively address issues, optimize performance, and improve the user experience. Moreover, by adhering to security best practices, developers can ensure that telemetry data is protected, compliant with regulations, and free of sensitive information. +In this chapter, we instrumented a payment service with manual span creation using `Tracer`, custom metrics using `Meter` and `LongCounter`, and exported all three OTLP signal types to the LGTM observability stack. We saw how MicroProfile Telemetry 2.1 enables end-to-end visibility into payment processing from HTTP request entry to span events, counter increments, and correlated log lines in Grafana. -In this chapter, we explored the critical security considerations surrounding tracing within the MicroProfile Telemetry framework. We emphasized the importance of safeguarding sensitive data by avoiding the inclusion of Personally Identifiable Information (PII) in trace spans. Additionally, we discussed the potential security risks associated with tracing in production environments and the significance of carefully managing sampling rates and data retention policies. By adhering to these security best practices, developers can harness the power of tracing for observability while ensuring the confidentiality and integrity of their applications. +By following the security best practices covered here, sanitizing span attributes, encrypting OTLP endpoints, limiting retention, and applying appropriate sampling teams, you can leverage distributed tracing as a powerful observability tool without compromising application security or user privacy. -As microservices architectures continue to evolve, the ability to observe and trace system interactions will remain a critical factor in maintaining resilient and efficient applications. MicroProfile Telemetry stands as a valuable tool in achieving these goals, providing developers with the observability they need to deliver reliable, high-performance microservices in modern cloud-native environments. +As microservices architectures continue to evolve, the ability to observe and trace system interactions remains critical to maintaining resilient, high-performance applications. MicroProfile Telemetry 2.1 delivers the standardized APIs needed to meet that challenge in modern cloud-native environments.