chore(traces): Add RFC for typed internal trace data model#25329
chore(traces): Add RFC for typed internal trace data model#25329
Conversation
8490c1e to
492e304
Compare
There was a problem hiding this comment.
check-spelling found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cdae2cc44a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| `_dd.error_tracking_*`, `_dd.otel.gateway`). OpenTelemetry semantic conventions do not emit | ||
| `_dd.*`-prefixed keys, so OTLP-sourced and transform-generated resource attributes cannot | ||
| collide with the reserved names. |
There was a problem hiding this comment.
Avoid reserving user-visible
_dd.* attributes implicitly
For OTLP input or VRL transforms that set resource attributes under _dd.* (for example _dd.payload), this assumption is false: OTLP user attributes and transform-generated attributes are not limited to semantic-convention keys. Since the proposal stores Datadog control scopes in the same Resource.attributes map and Datadog egress treats _dd.payload/_dd.tracer specially, a legitimate user attribute can be misclassified or dropped unless these keys are escaped/reserved explicitly or kept outside the user attribute map.
Useful? React with 👍 / 👎.
| - Re-partitions each `Span.attributes` into `meta`/`metrics`/`meta_struct` by `Value` variant | ||
| per "Datadog attribute partitions" above. |
There was a problem hiding this comment.
Synthesize
_dd.p.tid on Datadog egress
For OTLP-sourced 128-bit trace IDs, there is no existing _dd.p.tid entry in Span.attributes, so repartitioning only the attribute map while emitting Datadog's 64-bit Span.traceID drops the upper half of the ID. Datadog represents that high half as meta["_dd.p.tid"], so Datadog egress needs to derive/update the tag from Span.trace_id.high_u64() rather than relying on a pre-existing attribute.
Useful? React with 👍 / 👎.
492e304 to
22953b6
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 22953b6585
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| | `Span.error` and `Span.meta["error.message"]` | `Span.status` [^dd-err] | | ||
| | `Span.meta` | `Span.attributes` (`Value::String`) | | ||
| | `Span.metrics` | `Span.attributes` (`Value::Float`) | | ||
| | `Span.meta_struct` | `Span.attributes` (`Value::Bytes`) | | ||
|
|
There was a problem hiding this comment.
Map Datadog span events and links
The Datadog span proto cited by the RFC also carries spanEvents and spanLinks fields, but this mapping stops at meta_struct and never maps those wire fields into the internal Span.events/Span.links. Any Datadog relay payload containing span events or links would drop them on Datadog -> Vector -> Datadog, so the mapping needs to preserve them or explicitly exclude those fields from the round-trip guarantee.
Useful? React with 👍 / 👎.
| - Gathers `TracerPayload`s into one `AgentPayload`. Agent-payload-level fields | ||
| (`agentVersion`, `targetTPS`, `errorTPS`, `tags`) come from Vector configuration plus | ||
| `Resource.attributes."_dd.payload"`. |
There was a problem hiding this comment.
Carry agent-level Datadog host/env
For Datadog payloads where top-level AgentPayload.hostName/env are set independently of TracerPayload.hostname/env, this egress rule has no stored values to emit: ingress only maps tracer-level host/env into Resource, and the rebuilt AgentPayload fields listed here omit the agent-level host/env entirely. Vector currently partitions and sends those top-level fields, so relayed traffic can be attributed to the wrong host or env unless these agent fields are preserved and used when splitting egress payloads.
Useful? React with 👍 / 👎.
aaffd17 to
1a2edc5
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1a2edc5d4f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
1a2edc5 to
154f9c1
Compare
950d6c7 to
fe697cc
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fe697cc61b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
fe697cc to
08848d0
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 08848d06b0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| Datadog attributes are always scalars on the wire, so the nested-value capability is unused on | ||
| the Datadog egress path; the Datadog sink stringifies any non-scalar value via JSON. |
There was a problem hiding this comment.
Preserve Datadog span-event attribute values
For Datadog payloads with spanEvents[*].attributes containing typed values such as booleans, integers, doubles, or arrays, this rule is not lossless: upstream Datadog SpanEvent.attributes uses AttributeAnyValue, not the span meta/metrics scalar partitions, so stringifying non-scalars on Datadog egress would not recreate the original event attributes and would violate the promised Datadog -> Vector -> Datadog round-trip. Please specify a lossless mapping for SpanEvent.attributes instead of treating all Datadog attributes as scalar span attributes.
Useful? React with 👍 / 👎.
| The two keys live under the `_dd.*` namespace alongside other Datadog-internal keys | ||
| (`_dd.apm_mode`, `_dd.tags.container`, `_dd.tags.process`, `_dd.p.dm`, `_dd.p.tid`, | ||
| `_dd.error_tracking_*`, `_dd.otel.gateway`). OpenTelemetry semantic conventions do not emit | ||
| `_dd.*`-prefixed keys, so OTLP-sourced and transform-generated resource attributes cannot | ||
| collide with the reserved names. |
There was a problem hiding this comment.
Preserve user
_dd.* resource attributes
For OTLP inputs that legitimately carry a resource attribute named _dd.payload or _dd.tracer (or a transform writes one), this assumption is false: OpenTelemetry allows custom attribute keys, and the RFC later treats these keys as Datadog envelopes and rewrites them to datadog.payload / datadog.tracer.tags on OTLP egress. That breaks the stated OTLP -> Vector -> OTLP zero-loss guarantee for an otherwise unmodified trace, so the design needs either out-of-band storage for Datadog envelopes or an explicit escape/collision rule for user _dd.* keys.
Useful? React with 👍 / 👎.
| - [ ] Migrate the `opentelemetry` source to produce `Typed` natively. By this point every | ||
| trace-aware downstream component is `Typed`-capable. | ||
| - [ ] Migrate the `datadog_agent` source to produce `Typed` natively. |
There was a problem hiding this comment.
Migrate legacy trace consumers before source flips
Before flipping sources to emit Typed, this checklist has only migrated the Datadog trace sink, sample, and trace_to_log, but existing trace consumers still read the legacy LogEvent view; for example, src/sinks/kafka/request_builder.rs uses trace.get(...) for key_field/headers_key and trace.as_ref().get_timestamp() for timestamps. Because the RFC says untyped accessors return errors on Typed, Kafka trace configurations would stop extracting keys, headers, or timestamps once either source flips, so the plan should include a repo-wide migration of all TraceEvent legacy-access consumers before these source steps.
Useful? React with 👍 / 👎.
08848d0 to
c6d5ea1
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c6d5ea19c6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| A newtype over `ObjectMap` (re-exported from `vrl::value`). `Value` carries `Bytes`, `Float`, | ||
| `Integer`, `Boolean`, `Timestamp`, `Array`, and `Object` variants (with `Regex` and `Null` | ||
| unused for trace attributes); UTF-8 strings and opaque byte payloads alike are stored as |
There was a problem hiding this comment.
Preserve empty OTLP AnyValue attributes
For OTLP payloads that contain an attribute with an explicitly empty AnyValue (the proto permits the oneof to be unset), this representation has no lossless carrier: the RFC marks Value::Null as unused for trace attributes and the egress rules only reconstruct concrete AnyValue variants. Such an attribute would have to be dropped or emitted as a different value, breaking the promised OTLP relay; please either reserve Value::Null for empty AnyValue or scope that case out.
Useful? React with 👍 / 👎.
| [^otlp-env]: The OTLP source promotes whichever of the two keys is present; if both are present, | ||
| `deployment.environment.name` wins and `deployment.environment` is dropped. On OTLP egress, | ||
| `Resource.environment` is emitted only as `deployment.environment.name`. See "Per-type design | ||
| choices" under Rationale for the spec history and cross-format motivation. |
There was a problem hiding this comment.
Preserve deprecated environment attributes
For OTLP resources that use the deprecated deployment.environment attribute (or carry both it and deployment.environment.name with different values), this rule rewrites/drops a user-visible resource attribute: ingress promotes the old key into Resource.environment, then egress emits only the new key. That breaks OTLP -> Vector -> OTLP for pre-stabilization producers even though the RFC explicitly accepts that key; the original attribute key/value needs to be retained in Resource.attributes or the exception should be scoped out.
Useful? React with 👍 / 👎.
Summary
Sets up a comprehensive RFC documenting a proposed strongly-typed trace data model for Vector, including semantically-lossless conversions to and from OLTP and Datadog APM formats.
This is a different path than #20170 which explicitly takes into account the Datadog-Vector-Datadog path rather than just adopting one format or another.
Rendered
Vector configuration
N/A
How did you test this PR?
N/A
Change Type
Is this a breaking change?
Does this PR include user facing changes?
no-changeloglabel to this PR.References
opentelemetrysource and sink #1444datadog_agentincorrectly castsu64values from the Agent asi64in order to fit into theValueenum. #14687Notes
@vectordotdev/vectorto reach out to us regarding this PR.pre-pushhook, please see this template.make fmtmake check-clippy(if there are failures it's possible some of them can be fixed withmake clippy-fix)make testgit merge origin masterandgit push.Cargo.lock), pleaserun
make build-licensesto regenerate the license inventory and commit the changes (if any). More details on the dd-rust-license-tool.