
feat(metrics): Add per-component CPU usage metric (poll-hook) #25317

Open
gwenaskell wants to merge 19 commits into master from yoenn.burban/OPA-5012-add-per-component-cpu-metric-poll-hook

Conversation

@gwenaskell
Contributor

Summary

Alternative implementation of #25185. Same metric (component_cpu_usage_ns_total), same Tier 1/Tier 2 platform support; the difference is how CPU time is sampled for the concurrent transform path: it is now hooked onto the spawned task's Future::poll boundary via a thin CpuTimedFuture adapter, rather than measured inline inside the async block.

Within a single poll, tokio's cooperative scheduler guarantees the task cannot migrate to another worker thread and no other task can run on the current thread, so each (before_poll, after_poll) pair is a clean per-thread CPU measurement. Multi-poll futures accumulate correctly — which keeps the wrapper applicable if the spawned body ever grows .await points and makes the future extension to task transforms a one-line wrap.
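As a sketch of that poll hook (not the PR's code): the real adapter samples per-thread CPU time via `ThreadTime` and feeds a metrics `Counter`; this std-only illustration substitutes wall-clock `Instant` and an `AtomicU64`, and bounds on `Unpin` to avoid pin projection.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll};
use std::time::Instant;

/// Simplified stand-in for the PR's `CpuTimedFuture`. The real adapter
/// brackets each poll with per-thread CPU time (`ThreadTime::now()`);
/// `Instant` is used here only to keep the sketch dependency-free.
struct CpuTimedFuture<F> {
    inner: F,
    counter: Arc<AtomicU64>, // stand-in for the metrics `Counter`
}

impl<F: Future + Unpin> Future for CpuTimedFuture<F> {
    type Output = F::Output;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Within this single poll the task stays on the current thread,
        // so (t0, elapsed) is a clean per-poll measurement.
        let t0 = Instant::now();
        let result = Pin::new(&mut self.inner).poll(cx);
        self.counter
            .fetch_add(t0.elapsed().as_nanos() as u64, Ordering::Relaxed);
        result
    }
}
```

Because the measurement happens inside `poll`, a future that is polled many times accumulates each slice, and time spent parked as `Pending` contributes nothing.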

run_inline is unchanged: its body is sync and already runs in the transform's own task, so direct ThreadTime brackets remain the simplest correct option there.

See the RFC for more details.

Vector configuration

How did you test this PR?

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.

gwenaskell and others added 14 commits April 14, 2026 14:17
…dary

For the concurrent transform path, replace the inline ThreadTime brackets
inside the spawned async block with a CpuTimedFuture adapter that samples
thread CPU time around every Future::poll and accumulates the delta into
the component_cpu_usage_ns_total counter. Within a single poll tokio cannot
migrate the task or run another task on this thread, so each pair is a
clean per-thread CPU measurement; multi-poll futures accumulate correctly,
which keeps the wrapper applicable if the body ever grows .await points
or to future task-transform coverage.

The inline path is unchanged: its body is sync and runs in the transform's
own task, so direct measurement is the simplest correct option.

The RFC is updated to describe the wrapper approach in Rationale, Plan Of
Attack, and Future Improvements.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gwenaskell gwenaskell requested review from a team as code owners April 28, 2026 15:17
@github-actions github-actions Bot added the work in progress, domain: topology, domain: external docs, and domain: rfc labels and removed the work in progress label Apr 28, 2026
The value is cumulative CPU nanoseconds consumed by the component. Operators
use it to compute CPU core utilization:

```promql
rate(component_cpu_usage_ns_total[5m]) / 1e9
```
with each poll independently sampling the thread it ran on. This isolates
the timing concern from the transform body and keeps it robust if the body
ever grows `.await` points.
- **Low overhead.** Two `clock_gettime` calls per poll (~80ns total on Linux)
far cheaper. Per-event latency can be derived from the counter and
`events_sent_total` if needed (`cpu_ns / events = avg cpu ns per event`).

### `getrusage(RUSAGE_THREAD)` instead of `clock_gettime`
On Linux, `getrusage(RUSAGE_THREAD)` also provides per-thread CPU time (as
`ru_utime` + `ru_stime`).

**Not preferred because:** `clock_gettime(CLOCK_THREAD_CPUTIME_ID)` has
…trait

Replace the explicit CpuTimedFuture::new constructor with a CpuTimedExt
trait so the wrapper composes naturally with .in_current_span() and
similar future-extension methods:

    async move { ... }
        .cpu_timed(cpu_ns.clone())
        .in_current_span()

Mirrors the style of tracing::Instrument::in_current_span. No behavior
change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a `cpu_ns: Counter` field to `TransformContext`, defaulting to
`Counter::noop()`. The topology builder resolves the counter once,
inside the transform `error_span!` so it is tagged with the right
component_id / component_kind / component_type, and stores it on
the context. This is the single Counter handle every transform
path consumes — sync, task, and any helper tokio tasks — so label
resolution and recorder lookup are paid once at construction time
rather than on every poll.

For task transforms (`build_task_transform`), wrap the outer task
future with `.cpu_timed(counter)` before `.boxed()`. CPU time is
accumulated across every poll of the task; multi-poll futures
accumulate correctly, and time the task spends parked in `Pending`
is naturally excluded.

For transforms that spawn long-running helper tokio tasks at
construction time, plumb the counter through and `.cpu_timed(...)`
those spawns too:

- `aws_ec2_metadata`: the periodic IMDS-refresh worker.
- `throttle`'s `RateLimiterRunner`: the periodic
  `retain_recent` flush loop. The counter is plumbed through
  `RateLimiterRunner::start` as a parameter.

Without this, those helpers' CPU would be silently excluded.

The bracket scope for task transforms is slightly wider than for
sync transforms — it includes input-channel polls, the
Utilization / OutputUtilization wrappers, and the fanout-send
loop — but channel / fanout overhead is small relative to
transform work, so the metric remains comparable across kinds.
RFC and changelog updated to reflect the broader coverage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the domain: transforms label Apr 28, 2026
gwenaskell and others added 3 commits April 28, 2026 18:23
Restructure the "Scope of the measurement" rationale bullet to make
the upstream-isolation property explicit. Vector components only
communicate via BufferReceiver / BufferSender channels (never via
stream combinators chained across component boundaries), so polling
a task transform's input dequeues items but never runs the upstream's
code. Upstream CPU was charged to its own cpu_ns when it ran in its
own task. Spell out what is and is not included in cpu_ns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Threading cpu_ns as a separate argument pushed build_task_transform
above clippy's too_many_arguments threshold. Mirror build_sync_transform
by taking the whole TransformNode and destructuring at the top. The
later `let mut outputs = HashMap::new()` shadows the destructured
Vec — fine since the Vec is only used earlier when building the
schema_definition_map.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
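
The pattern described in that commit, pared down to hypothetical fields (the real `TransformNode` carries more state):

```rust
// Hypothetical sketch: pass the whole node and destructure at the top
// instead of threading each field as a separate parameter, which keeps
// the signature under clippy's too_many_arguments threshold.
struct TransformNode {
    key: String,
    typetag: String,
    inputs: Vec<String>,
}

fn build_task_transform(node: TransformNode) -> String {
    let TransformNode { key, typetag, inputs } = node;
    format!("{key} ({typetag}, {} inputs)", inputs.len())
}
```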
@gwenaskell
Contributor Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7a66b66dd7


tags: _component_tags
}
component_cpu_usage_ns_total: {
description: "The CPU time consumed by a component in nanoseconds. Available for sync and function transforms."


P3: Update CPU metric docs to include task transforms

The new implementation records component_cpu_usage_ns_total for task transforms as well (see build_task_transform now wrapping the task future with .cpu_timed(...)), but this description still says the metric is only available for sync/function transforms. This creates incorrect user-facing documentation and can cause operators to miss or misinterpret task-transform CPU data.


Member

@bruceg bruceg left a comment


This looks basically sound so long as the performance impact is minimal

Comment thread src/topology/builder.rs
merged_schema_definition: merged_definition.clone(),
schema: self.config.schema,
extra_context: self.extra_context.clone(),
cpu_ns,
Member


nit: could just instantiate the counter! here like all the other fields.

Comment on lines +79 to +82
self.quota,
self.clock.clone(),
self.flush_keys_interval,
self.cpu_ns.clone(),
Member


At some point it might make more sense to just pass self.

Comment thread src/cpu_time.rs
Comment on lines +206 to +223
/// Extension trait that wraps a future in [`CpuTimedFuture`] via a chained
/// call:
///
/// ```ignore
/// async move { /* work */ }.cpu_timed(counter)
/// ```
///
/// Mirrors the style of [`tracing::Instrument::in_current_span`].
pub(crate) trait CpuTimedExt: Future + Sized {
    fn cpu_timed(self, counter: Counter) -> CpuTimedFuture<Self> {
        CpuTimedFuture {
            inner: self,
            counter,
        }
    }
}

impl<F: Future> CpuTimedExt for F {}
Member


If I'm reading right, all calls to cpu_timed follow a tokio::spawn. What about providing a wrapper for that whole tokio::spawn(future).cpu_timed(counter) sequence instead, like fn spawn_timed(…)? It will also make it more visible when a task is spawned without adding the timer accounting.

Comment thread src/cpu_time.rs
Comment on lines +198 to +199
let this = self.project();
let t0 = ThreadTime::now();
Member


It's trivial, but consider flipping these operations to let this go straight from project() to usage.


This metric is always emitted for transforms; there is no configuration knob.

## Rationale
Member


This RFC is missing the implementation plan, which should be the primary focus of the rationale here. Basically move most of the plan into an implementation section and just reference points in the plan. A bunch of this rationale is also explaining how it is implemented too.

Member


One note; this PR is both an implementation and an RFC, which is unusual. We usually implement the RFC after it has been approved.

Comment on lines +147 to +149
The channel-poll / fanout-send bookkeeping our wrapper does include is
small relative to the transform's own work, so the metric remains a
meaningful comparator across transform kinds.
Member


This seems to refer to past implementation.

is negligible relative to the work `transform_all` performs.
- **No accumulation errors.** The counter stores `u64` nanoseconds; each
increment is exact integer arithmetic. The single `u64 → f64` cast at scrape
time has bounded, non-accumulated error.
Member


nit: the "error" is specifically precision loss.
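
A std-only illustration of that precision loss: `f64` carries a 53-bit significand, so the `u64 → f64` cast at scrape time is exact until the counter exceeds 2^53 ns (roughly 104 days of CPU time), and even then the error does not accumulate across increments:

```rust
/// True when `n` survives a round-trip through f64 unchanged, i.e. the
/// scrape-time cast loses no precision for this counter value.
fn cast_is_exact(n: u64) -> bool {
    n as f64 as u64 == n
}
```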

Comment on lines +158 to +160
- **Platform-specific code.** The precise implementation uses `cfg`-gated FFI
for Linux, macOS, and Windows. Other platforms fall back to wall-clock time,
giving three maintained code paths plus one fallback.
Member


Should we instead refuse to emit the metric if we can't actually get CPU time? It's a misleading measure otherwise.

Contributor Author


Yes, and it might be easy to do with this approach, I'll look into it

Comment on lines +192 to +196
1. **User/system split:** Should we report user and system CPU time separately
(as `mode="user"` / `mode="system"` tags) like `host_cpu_seconds_total`
does? The Linux API supports this. It adds cardinality but helps distinguish
transforms that trigger syscalls (e.g., enrichment table lookups) from pure
computation.
Member


FWIW for function/sync transforms, the system CPU time should be 0 or effectively 0.


Labels

domain: external docs, domain: rfc, domain: topology, domain: transforms, work in progress
