feat(metrics): Add per-component CPU usage metric (poll-hook) #25317
gwenaskell wants to merge 19 commits into master
Conversation
…dary
For the concurrent transform path, replace the inline ThreadTime brackets inside the spawned async block with a CpuTimedFuture adapter that samples thread CPU time around every Future::poll and accumulates the delta into the component_cpu_usage_ns_total counter. Within a single poll, tokio cannot migrate the task or run another task on this thread, so each before/after sample pair is a clean per-thread CPU measurement; multi-poll futures accumulate correctly, which keeps the wrapper applicable if the body ever grows .await points and leaves room for future task-transform coverage. The inline path is unchanged: its body is sync and runs in the transform's own task, so direct measurement is the simplest correct option. The RFC is updated to describe the wrapper approach in Rationale, Plan Of Attack, and Future Improvements.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
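As a rough illustration of the adapter described above, here is a minimal sketch that compiles on its own. It is not the PR's code: `std::time::Instant` (wall clock) and an `Arc<AtomicU64>` stand in for the per-thread CPU clock and the metrics `Counter`, and it requires `Unpin` rather than using pin-projection.

```rust
use std::{
    future::Future,
    pin::Pin,
    sync::{
        atomic::{AtomicU64, Ordering},
        Arc,
    },
    task::{Context, Poll},
    time::Instant,
};

/// Wraps a future and charges the time spent inside each `poll` call to a
/// shared counter. `Instant` stands in for the per-thread CPU clock
/// (CLOCK_THREAD_CPUTIME_ID on Linux); `Arc<AtomicU64>` stands in for the
/// metrics `Counter` handle.
struct CpuTimedFuture<F> {
    inner: F,
    cpu_ns: Arc<AtomicU64>,
}

impl<F: Future + Unpin> Future for CpuTimedFuture<F> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // `get_mut` works because F: Unpin; the real adapter projects instead.
        let this = self.get_mut();
        // Bracket only the inner poll: time parked in Pending is excluded, and
        // within one poll the task cannot migrate to another worker thread.
        let before = Instant::now();
        let result = Pin::new(&mut this.inner).poll(cx);
        let elapsed = before.elapsed().as_nanos() as u64;
        this.cpu_ns.fetch_add(elapsed, Ordering::Relaxed);
        result
    }
}
```

Swapping the wall clock for a per-thread CPU clock (sketched further below) turns the accumulated value into CPU time rather than elapsed time.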
> The value is cumulative CPU nanoseconds consumed by the component. Operators
> use it to compute CPU core utilization:
>
> (PromQL example elided from this excerpt)

> with each poll independently sampling the thread it ran on. This isolates
> the timing concern from the transform body and keeps it robust if the body
> ever grows `.await` points.
> - **Low overhead.** Two `clock_gettime` calls per poll (~80ns total on Linux)
>   […] far cheaper. Per-event latency can be derived from the counter and
>   `events_sent_total` if needed (`cpu_ns / events = avg cpu ns per event`).

> ### `getrusage(RUSAGE_THREAD)` instead of `clock_gettime`
>
> On Linux, `getrusage(RUSAGE_THREAD)` also provides per-thread CPU time (as
> `ru_utime` + `ru_stime`).
>
> **Not preferred because:** `clock_gettime(CLOCK_THREAD_CPUTIME_ID)` has […]
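For context, here is a minimal Linux-only sketch of the `clock_gettime(CLOCK_THREAD_CPUTIME_ID)` read the excerpt compares against `getrusage`, using the `libc` crate. It is illustrative and not necessarily how the PR's `ThreadTime` is implemented.

```rust
use std::time::Duration;

/// CPU time consumed by the calling thread (Linux). macOS and Windows need
/// their own platform-specific readers.
#[cfg(target_os = "linux")]
fn thread_cpu_time() -> Duration {
    // SAFETY: a zeroed timespec is a valid out-parameter for clock_gettime.
    let mut ts: libc::timespec = unsafe { std::mem::zeroed() };
    let rc = unsafe { libc::clock_gettime(libc::CLOCK_THREAD_CPUTIME_ID, &mut ts) };
    assert_eq!(rc, 0, "the thread CPU clock is always readable on Linux");
    Duration::new(ts.tv_sec as u64, ts.tv_nsec as u32)
}
```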
…trait
Replace the explicit CpuTimedFuture::new constructor with a CpuTimedExt
trait so the wrapper composes naturally with .in_current_span() and
similar future-extension methods:

    async move { ... }
        .cpu_timed(cpu_ns.clone())
        .in_current_span()

Mirrors the style of tracing::Instrument::in_current_span. No behavior
change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a `cpu_ns: Counter` field to `TransformContext`, defaulting to `Counter::noop()`. The topology builder resolves the counter once, inside the transform `error_span!` so it is tagged with the right component_id / component_kind / component_type, and stores it on the context. This is the single Counter handle every transform path consumes — sync, task, and any helper tokio tasks — so label resolution and recorder lookup are paid once at construction time rather than on every poll.

For task transforms (`build_task_transform`), wrap the outer task future with `.cpu_timed(counter)` before `.boxed()`. CPU time is accumulated across every poll of the task; multi-poll futures accumulate correctly, and time the task spends parked in `Pending` is naturally excluded.

For transforms that spawn long-running helper tokio tasks at construction time, plumb the counter through and `.cpu_timed(...)` those spawns too:

- `aws_ec2_metadata`: the periodic IMDS-refresh worker.
- `throttle`'s `RateLimiterRunner`: the periodic `retain_recent` flush loop. The counter is plumbed through `RateLimiterRunner::start` as a parameter.

Without this, those helpers' CPU would be silently excluded.

The bracket scope for task transforms is slightly wider than for sync transforms — it includes input-channel polls, the Utilization / OutputUtilization wrappers, and the fanout-send loop — but channel / fanout overhead is small relative to transform work, so the metric remains comparable across kinds.

RFC and changelog updated to reflect the broader coverage.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
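As a rough illustration of the resolve-once pattern described in this commit, here is a sketch that passes the component tags explicitly. It is not the PR's code: the PR derives these tags from the surrounding tracing span, and the sketch assumes a `metrics` crate version whose `counter!` macro returns a reusable `Counter` handle.

```rust
use metrics::{counter, Counter};

/// Resolve the per-component CPU counter once at build time; every poll then
/// just calls `increment` on the returned handle, so label and recorder
/// lookup are not paid on the hot path.
fn resolve_cpu_counter(component_id: &str, component_kind: &str, component_type: &str) -> Counter {
    counter!(
        "component_cpu_usage_ns_total",
        "component_id" => component_id.to_owned(),
        "component_kind" => component_kind.to_owned(),
        "component_type" => component_type.to_owned()
    )
}
```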
Restructure the "Scope of the measurement" rationale bullet to make the upstream-isolation property explicit. Vector components only communicate via BufferReceiver / BufferSender channels (never via stream combinators chained across component boundaries), so polling a task transform's input dequeues items but never runs the upstream's code. Upstream CPU was charged to its own cpu_ns when it ran in its own task. Spell out what is and is not included in cpu_ns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Threading cpu_ns as a separate argument pushed build_task_transform above clippy's too_many_arguments threshold. Mirror build_sync_transform by taking the whole TransformNode and destructuring at the top. The later `let mut outputs = HashMap::new()` shadows the destructured Vec — fine since the Vec is only used earlier when building the schema_definition_map. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7a66b66dd7
> tags: _component_tags
> }
> component_cpu_usage_ns_total: {
>   description: "The CPU time consumed by a component in nanoseconds. Available for sync and function transforms."
Update CPU metric docs to include task transforms
The new implementation records component_cpu_usage_ns_total for task transforms as well (see build_task_transform now wrapping the task future with .cpu_timed(...)), but this description still says the metric is only available for sync/function transforms. This creates incorrect user-facing documentation and can cause operators to miss or misinterpret task-transform CPU data.
bruceg left a comment
This looks basically sound so long as the performance impact is minimal
> merged_schema_definition: merged_definition.clone(),
> schema: self.config.schema,
> extra_context: self.extra_context.clone(),
> cpu_ns,
nit: could just instantiate the `counter!` here like all the other fields.
> self.quota,
> self.clock.clone(),
> self.flush_keys_interval,
> self.cpu_ns.clone(),
At some point it might make more sense to just pass self.
> /// Extension trait that wraps a future in [`CpuTimedFuture`] via a chained
> /// call:
> ///
> /// ```ignore
> /// async move { /* work */ }.cpu_timed(counter)
> /// ```
> ///
> /// Mirrors the style of [`tracing::Instrument::in_current_span`].
> pub(crate) trait CpuTimedExt: Future + Sized {
>     fn cpu_timed(self, counter: Counter) -> CpuTimedFuture<Self> {
>         CpuTimedFuture {
>             inner: self,
>             counter,
>         }
>     }
> }
>
> impl<F: Future> CpuTimedExt for F {}
If I'm reading right, all calls to cpu_timed follow a tokio::spawn. What about providing a wrapper for that whole tokio::spawn(future).cpu_timed(counter) sequence instead, like fn spawn_timed(…)? It will also make it more visible when a task is spawned without adding the timer accounting.
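A sketch of what the suggested helper could look like; the name `spawn_timed` comes from the review comment, not existing code, and the body assumes the PR's `CpuTimedExt` trait is in scope.

```rust
use std::future::Future;

use metrics::Counter;
use tokio::task::JoinHandle;

/// Hypothetical helper combining `tokio::spawn` with the CPU-timing wrapper,
/// so any spawn that skips timer accounting stands out in review.
/// Assumes the PR's `CpuTimedExt` trait (providing `.cpu_timed`) is imported.
pub(crate) fn spawn_timed<F>(future: F, counter: Counter) -> JoinHandle<F::Output>
where
    F: Future + Send + 'static,
    F::Output: Send + 'static,
{
    tokio::spawn(future.cpu_timed(counter))
}
```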
> let this = self.project();
> let t0 = ThreadTime::now();
It's trivial, but consider flipping these operations to let this go straight from project() to usage.
> This metric is always emitted for transforms; there is no configuration knob.
>
> ## Rationale
This RFC is missing the implementation plan, which should be the primary focus of the rationale here. Basically move most of the plan into an implementation section and just reference points in the plan. A bunch of this rationale is also explaining how it is implemented too.
One note: this PR is both an implementation and an RFC, which is unusual. We usually implement the RFC after it has been approved.
> The channel-poll / fanout-send bookkeeping our wrapper does include is
> small relative to the transform's own work, so the metric remains a
> meaningful comparator across transform kinds.
This seems to refer to past implementation.
> is negligible relative to the work `transform_all` performs.
> - **No accumulation errors.** The counter stores `u64` nanoseconds; each
>   increment is exact integer arithmetic. The single `u64 → f64` cast at scrape
>   time has bounded, non-accumulated error.
nit: the "error" is specifically precision loss.
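To make that concrete, a small standalone check (plain Rust, unrelated to the PR's code): an `f64` mantissa holds 53 bits, so the scrape-time cast stays exact until the accumulated counter exceeds 2^53 ns, roughly 104 days of CPU time.

```rust
fn main() {
    // f64 carries a 53-bit mantissa, so integers up to 2^53 round-trip exactly.
    let exact_limit: u64 = 1 << 53; // 9_007_199_254_740_992 ns, about 104 days of CPU time
    assert_eq!(exact_limit as f64 as u64, exact_limit);

    // One nanosecond past that limit rounds back down in the cast. The u64
    // counter itself never drifts; only the scrape-time f64 view loses precision.
    let just_past = exact_limit + 1;
    assert_eq!(just_past as f64 as u64, exact_limit);
}
```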
> - **Platform-specific code.** The precise implementation uses `cfg`-gated FFI
>   for Linux, macOS, and Windows. Other platforms fall back to wall-clock time,
>   giving three maintained code paths plus one fallback.
Should we instead refuse to emit the metric if we can't actually get CPU time? It's a misleading measure otherwise.
Yes, and it might be easy to do with this approach, I'll look into it
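One possible shape for that, sketched under the assumption that the thread-CPU read is made fallible (the function name is hypothetical and the fallback behavior is exactly the open question above):

```rust
use std::time::Duration;

/// Returns thread CPU time where the platform supports it, None otherwise.
/// Callers that get None skip the counter update entirely, so the metric is
/// simply absent instead of silently switching to wall-clock time.
fn try_thread_cpu_time() -> Option<Duration> {
    #[cfg(target_os = "linux")]
    {
        // SAFETY: a zeroed timespec is a valid out-parameter for clock_gettime.
        let mut ts: libc::timespec = unsafe { std::mem::zeroed() };
        let rc = unsafe { libc::clock_gettime(libc::CLOCK_THREAD_CPUTIME_ID, &mut ts) };
        (rc == 0).then(|| Duration::new(ts.tv_sec as u64, ts.tv_nsec as u32))
    }
    #[cfg(not(target_os = "linux"))]
    {
        None // macOS / Windows readers would slot in here with their own cfg blocks.
    }
}
```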
> 1. **User/system split:** Should we report user and system CPU time separately
>    (as `mode="user"` / `mode="system"` tags) like `host_cpu_seconds_total`
>    does? The Linux API supports this. It adds cardinality but helps distinguish
>    transforms that trigger syscalls (e.g., enrichment table lookups) from pure
>    computation.
FWIW for function/sync transforms, the system CPU time should be 0 or effectively 0.
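For reference, a Linux-only sketch of reading the user/system split via `getrusage(RUSAGE_THREAD)` with the `libc` crate; illustrative only, since the PR samples the combined figure via `clock_gettime`.

```rust
use std::time::Duration;

/// User and system CPU time consumed by the calling thread (Linux only).
#[cfg(target_os = "linux")]
fn thread_cpu_split() -> (Duration, Duration) {
    // SAFETY: a zeroed rusage is a valid out-parameter for getrusage.
    let mut usage: libc::rusage = unsafe { std::mem::zeroed() };
    let rc = unsafe { libc::getrusage(libc::RUSAGE_THREAD, &mut usage) };
    assert_eq!(rc, 0);
    let to_duration =
        |tv: libc::timeval| Duration::new(tv.tv_sec as u64, (tv.tv_usec as u32) * 1_000);
    (to_duration(usage.ru_utime), to_duration(usage.ru_stime))
}
```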
Summary

Alternative implementation of #25185. Same metric (`component_cpu_usage_ns_total`), same Tier 1/Tier 2 platform support; the difference is how CPU time is sampled for the concurrent transform path: it is now hooked onto the spawned task's `Future::poll` boundary via a thin `CpuTimedFuture` adapter, rather than measured inline inside the async block.

Within a single poll, tokio's cooperative scheduler guarantees the task cannot migrate to another worker thread and no other task can run on the current thread, so each `(before_poll, after_poll)` pair is a clean per-thread CPU measurement. Multi-poll futures accumulate correctly — which keeps the wrapper applicable if the spawned body ever grows `.await` points and makes the future extension to task transforms a one-line wrap. `run_inline` is unchanged: its body is sync and already runs in the transform's own task, so direct `ThreadTime` brackets remain the simplest correct option there.

See the RFC for more details.
Vector configuration
How did you test this PR?
Change Type
Is this a breaking change?
Does this PR include user facing changes?