Add OpenTelemetry metrics for queue task processing: enqueue attempts, processing duration, completion, failure, retry scheduling, and in-flight work.
This is separate from #735. That issue adds a way to read backlog depth from queue backends. This issue tracks the work Fedify observes while it enqueues and processes its own inbox, outbox, and fanout tasks.
Current state
Fedify can run inbox, outbox, and fanout work through MessageQueue implementations. It can also use separate queues per role or a shared queue for all roles.
Today, operators have no aggregate metrics to answer questions such as:
How long do queued inbox tasks take to process?
Are outbox tasks failing more often than fanout tasks?
Is a worker busy, idle, or repeatedly failing the same kind of task?
How many tasks are being enqueued by Fedify, independent of backend queue depth?
Queue depth alone does not answer these questions. A small queue with slow processing is different from a large queue that drains quickly.
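The last point can be made concrete with a back-of-the-envelope estimate. It combines backlog depth (what #735 would report) with mean processing duration (what the proposed duration histogram would report). `estimatedDrainMs` is a hypothetical helper, and the estimate assumes no new arrivals, a steady mean duration, and a fixed number of tasks processed in parallel.

```typescript
// Rough time to drain a backlog, assuming no new arrivals, a steady mean
// processing duration, and `concurrency` tasks processed in parallel.
function estimatedDrainMs(
  depth: number,
  meanDurationMs: number,
  concurrency = 1,
): number {
  return (depth * meanDurationMs) / concurrency;
}

// A small queue with slow tasks vs. a large queue with fast tasks,
// each drained by a single worker:
const slowSmall = estimatedDrainMs(10, 5_000); // 50000 ms (50 s)
const fastLarge = estimatedDrainMs(1_000, 5); // 5000 ms (5 s)
```

Ranked by depth alone, the 1,000-task queue looks like the problem; once duration is known, the 10-task queue takes ten times longer to drain.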
Proposed solution
Once #619 adds metrics support, add queue task metrics at Fedify's enqueue and worker-processing boundaries.
Proposed instruments:
fedify.queue.task.enqueued: counter, incremented when Fedify enqueues a task.
fedify.queue.task.started: counter, incremented when a worker starts processing a queued task.
fedify.queue.task.completed: counter, incremented when processing finishes without throwing.
fedify.queue.task.failed: counter, incremented when processing throws.
fedify.queue.task.duration: histogram, recording processing duration in milliseconds.
fedify.queue.task.in_flight: observable gauge, recording the number of tasks this process is currently processing.
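A minimal sketch of how these instruments could be created and driven at the enqueue and processing boundaries. `createQueueTaskMetrics`, `recordEnqueued`, and `track` are hypothetical helper names. The `*Like` interfaces are local stand-ins intended to match the subset of the `@opentelemetry/api` `Meter` interface used here, so the sketch runs without dependencies. It uses the `fedify.queue.role` and `fedify.queue.task.result` attributes proposed below, and it does not model the `aborted` result or retry scheduling.

```typescript
// Local stand-ins for the parts of the @opentelemetry/api Meter interface
// this sketch uses; a real Meter is intended to fit these shapes.
type MetricAttributes = Record<string, string | number | boolean>;
type InstrumentOptions = { description?: string; unit?: string };
interface CounterLike {
  add(value: number, attributes?: MetricAttributes): void;
}
interface HistogramLike {
  record(value: number, attributes?: MetricAttributes): void;
}
interface ObservableResultLike {
  observe(value: number, attributes?: MetricAttributes): void;
}
interface ObservableGaugeLike {
  addCallback(callback: (result: ObservableResultLike) => void): void;
}
interface MeterLike {
  createCounter(name: string, options?: InstrumentOptions): CounterLike;
  createHistogram(name: string, options?: InstrumentOptions): HistogramLike;
  createObservableGauge(name: string, options?: InstrumentOptions): ObservableGaugeLike;
}

type QueueRole = "inbox" | "outbox" | "fanout";
type TaskAttributes = MetricAttributes & { "fedify.queue.role": QueueRole };

function createQueueTaskMetrics(meter: MeterLike) {
  const enqueued = meter.createCounter("fedify.queue.task.enqueued");
  const started = meter.createCounter("fedify.queue.task.started");
  const completed = meter.createCounter("fedify.queue.task.completed");
  const failed = meter.createCounter("fedify.queue.task.failed");
  const duration = meter.createHistogram("fedify.queue.task.duration", { unit: "ms" });
  const inFlightGauge = meter.createObservableGauge("fedify.queue.task.in_flight");

  // In-flight counts for this process only, keyed by role; the gauge
  // callback reports the current values whenever metrics are collected.
  const inFlight = new Map<QueueRole, number>();
  inFlightGauge.addCallback((result) => {
    inFlight.forEach((count, role) => {
      result.observe(count, { "fedify.queue.role": role });
    });
  });

  return {
    // Call once per task Fedify hands to a queue.
    recordEnqueued(attributes: TaskAttributes): void {
      enqueued.add(1, attributes);
    },

    // Wraps the processing of one queued task: counts start, completion or
    // failure, records duration, and keeps the in-flight count accurate.
    async track<T>(attributes: TaskAttributes, task: () => Promise<T>): Promise<T> {
      const role = attributes["fedify.queue.role"];
      started.add(1, attributes);
      inFlight.set(role, (inFlight.get(role) ?? 0) + 1);
      const startedAt = Date.now();
      try {
        const value = await task();
        completed.add(1, attributes);
        duration.record(Date.now() - startedAt, {
          ...attributes,
          "fedify.queue.task.result": "completed",
        });
        return value;
      } catch (error) {
        failed.add(1, attributes);
        duration.record(Date.now() - startedAt, {
          ...attributes,
          "fedify.queue.task.result": "failed",
        });
        throw error; // failures still propagate to the queue's retry logic
      } finally {
        inFlight.set(role, (inFlight.get(role) ?? 0) - 1);
      }
    },
  };
}
```

Keeping the in-flight counts in a map that the gauge callback reads fits the observable-instrument model: the SDK pulls the current value at collection time, and summing across processes is left to the metrics backend.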
Proposed attributes:
fedify.queue.role: inbox, outbox, or fanout.
fedify.queue.backend: best-effort backend name where available.
fedify.queue.native_retrial: whether the queue backend declares nativeRetrial.
activitypub.activity.type, when the queued message carries an activity type.
fedify.queue.task.result: completed, failed, or aborted.
Do not include activity IDs, actor IDs, object IDs, inbox URLs, or queue message IDs as metric attributes.
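One way to enforce this rule is an allow-list: build attributes only from known low-cardinality fields, so IDs and URLs carried by a queued message can never leak into metric attributes. `QueuedTaskInfo` and `taskMetricAttributes` are hypothetical; Fedify's actual queued message types may expose these fields differently.

```typescript
type QueueRole = "inbox" | "outbox" | "fanout";

// Hypothetical view of a queued task; field names are illustrative only.
interface QueuedTaskInfo {
  role: QueueRole;
  backend?: string; // best-effort backend name
  nativeRetrial?: boolean; // the backend's nativeRetrial flag
  activityType?: string; // e.g. "Create", when the message carries one
  // High-cardinality fields that must never become metric attributes:
  activityId?: string;
  actorId?: string;
  inboxUrl?: string;
  messageId?: string;
}

type TaskMetricAttributes = Record<string, string | boolean>;

// Copies only allow-listed, low-cardinality fields into metric attributes.
function taskMetricAttributes(task: QueuedTaskInfo): TaskMetricAttributes {
  const attributes: TaskMetricAttributes = { "fedify.queue.role": task.role };
  if (task.backend !== undefined) {
    attributes["fedify.queue.backend"] = task.backend;
  }
  if (task.nativeRetrial !== undefined) {
    attributes["fedify.queue.native_retrial"] = task.nativeRetrial;
  }
  if (task.activityType !== undefined) {
    attributes["activitypub.activity.type"] = task.activityType;
  }
  return attributes;
}
```

Because each attribute is named explicitly, adding a new field to the message type cannot widen the attribute set by accident; a deny-list that strips known ID fields would not have that property.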
Scope
Instrument Fedify's own enqueue calls for inbox, outbox, and fanout tasks.
Instrument processQueuedTask() and queue worker paths for task start, completion, failure, and duration.
Track in-flight task counts per process. Cross-process totals should be left to the metrics backend to aggregate.
…MessageQueue; backend backlog depth is covered by #735 (Optional queue depth reporting for OpenTelemetry metrics).
Update docs/manual/opentelemetry.md with the new instruments and their relationship to queue depth.
Acceptance criteria
Open questions
Should ParallelMessageQueue expose worker concurrency through metrics, or should this stay at the Fedify task layer?
…AbortSignal?