Summary
Add OpenTelemetry metrics for WebFinger lookup and actor discovery paths, including lookup counts, durations, and outcomes.
Current state
Fedify already instruments webfinger.lookup, webfinger.handle, and activitypub.get_actor_handle as spans. These spans are useful when debugging a single lookup, but because traces are typically sampled, they do not provide sampling-independent metrics for discovery reliability or latency.
Discovery failures are operationally noisy. A server can appear broken to users when handle resolution is slow, is blocked by a remote firewall, or repeatedly returns malformed resource descriptors. Operators need aggregate metrics for those paths.
Proposed solution
Once #619 adds metrics support, add counters and histograms for WebFinger and actor discovery.
Proposed instruments:
webfinger.lookup: counter, incremented for outgoing WebFinger lookup attempts.
webfinger.lookup.duration: histogram, recording outgoing lookup duration in milliseconds.
webfinger.handle: counter, incremented for incoming WebFinger requests handled by Fedify.
webfinger.handle.duration: histogram, recording request handling duration in milliseconds.
activitypub.actor.discovery: counter, incremented for actor handle discovery attempts.
activitypub.actor.discovery.duration: histogram, recording actor discovery duration in milliseconds.
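As a rough illustration of the instrument list above, the six instruments could be created from a single meter. This is only a sketch under stated assumptions: `MeterLike`, `CounterLike`, `HistogramLike`, and `createDiscoveryInstruments` are hypothetical names, not Fedify APIs. The interfaces are small structural stand-ins for the OpenTelemetry metrics API so the example needs no imports; in practice a meter obtained from `metrics.getMeter()` in `@opentelemetry/api` would presumably be passed in, depending on what #619 ends up exposing.

```typescript
// Hypothetical structural stand-ins for the OpenTelemetry metrics API.
type Attributes = Record<string, string | number | boolean>;
interface InstrumentOptions {
  description?: string;
  unit?: string;
}
interface CounterLike {
  add(value: number, attributes?: Attributes): void;
}
interface HistogramLike {
  record(value: number, attributes?: Attributes): void;
}
interface MeterLike {
  createCounter(name: string, options?: InstrumentOptions): CounterLike;
  createHistogram(name: string, options?: InstrumentOptions): HistogramLike;
}

// One counter and one duration histogram per discovery path.
interface PathInstruments {
  count: CounterLike;
  duration: HistogramLike;
}
interface DiscoveryInstruments {
  webfingerLookup: PathInstruments;
  webfingerHandle: PathInstruments;
  actorDiscovery: PathInstruments;
}

// Hypothetical helper: creates the proposed instruments on the given meter.
function createDiscoveryInstruments(meter: MeterLike): DiscoveryInstruments {
  const pair = (name: string, description: string): PathInstruments => ({
    count: meter.createCounter(name, { description }),
    // Durations are recorded in milliseconds, as proposed above.
    duration: meter.createHistogram(`${name}.duration`, {
      description: `Duration of: ${description}`,
      unit: "ms",
    }),
  });
  return {
    webfingerLookup: pair("webfinger.lookup", "Outgoing WebFinger lookup attempts"),
    webfingerHandle: pair("webfinger.handle", "Incoming WebFinger requests handled"),
    actorDiscovery: pair("activitypub.actor.discovery", "Actor handle discovery attempts"),
  };
}
```

Creating the instruments once and reusing them keeps the names and units in one place, which should also make the documentation update easier to keep in sync.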
Proposed attributes:
webfinger.resource.scheme: acct, https, or another URI scheme.
activitypub.discovery.result: resolved, not_found, invalid, network_error, not_acceptable, or error.
activitypub.remote.host: hostname only for outgoing lookup targets, when available.
http.response.status_code, when a remote HTTP response exists.
Do not include full handles, resource URIs, actor IDs, or lookup URLs as metric attributes.
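The attribute rules above could be sketched as a small helper that keeps only low-cardinality values. `discoveryAttributes` and its parameters are hypothetical names chosen for illustration, not part of Fedify; the helper assumes the caller already has the queried resource string and, for outgoing lookups, the lookup URL.

```typescript
type MetricAttributes = Record<string, string | number>;

// Hypothetical helper: builds metric attributes without leaking the full
// handle, resource URI, or lookup URL.
function discoveryAttributes(
  resource: string,
  lookupUrl?: URL,
  statusCode?: number,
): MetricAttributes {
  const attributes: MetricAttributes = {};
  // Keep only the URI scheme (e.g. "acct" or "https"), never the full resource.
  const colon = resource.indexOf(":");
  if (colon > 0) {
    attributes["webfinger.resource.scheme"] = resource.slice(0, colon).toLowerCase();
  }
  // Host only: no path, query string, port, or user info.
  if (lookupUrl != null && lookupUrl.hostname !== "") {
    attributes["activitypub.remote.host"] = lookupUrl.hostname;
  }
  // Present only when a remote HTTP response actually exists.
  if (statusCode != null) {
    attributes["http.response.status_code"] = statusCode;
  }
  return attributes;
}

// Usage: the handle "alice" never reaches the attribute set.
const example = discoveryAttributes(
  "acct:alice@example.com",
  new URL("https://example.com/.well-known/webfinger?resource=acct%3Aalice%40example.com"),
  200,
);
console.log(example);
```

Note that even a host-only attribute can grow large on a busy server, which is one reason the cardinality guidance in the documentation matters.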
Scope
Instrument outgoing WebFinger lookup APIs.
Instrument incoming WebFinger handler paths served by Fedify.
Instrument actor-handle lookup paths that resolve handles into actor URLs or actor objects.
Keep NodeInfo metrics out of scope unless they share implementation paths with WebFinger handling.
Update docs/manual/opentelemetry.md with metric names, units, and cardinality guidance.
Acceptance criteria
WebFinger lookup count and duration metrics are emitted for success and failure paths.
Incoming WebFinger handling metrics are emitted without exposing the queried resource string as a metric attribute.
Actor discovery metrics classify resolved, not-found, invalid, network-error, and thrown-error paths where Fedify can distinguish them.
Metrics use host-only remote attributes and avoid full handles or URLs.
Tests cover at least one successful WebFinger lookup and one failed lookup.
Documentation describes which discovery paths are covered.
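One way the success and failure paths might both be covered is a wrapper that records the counter and the histogram in a `finally` block. Everything here is an assumption made for illustration: `HttpStatusError`, `classifyFailure`, and `measured` are hypothetical names, and the mapping from errors to result values (404 or 410 to not_found, 406 to not_acceptable, `SyntaxError` to invalid, `TypeError` to network_error) is only one plausible classification, not Fedify's actual behavior. `Date.now()` keeps the sketch dependency-free; a monotonic clock would likely be preferable in real code.

```typescript
type DiscoveryResult =
  | "resolved" | "not_found" | "invalid"
  | "network_error" | "not_acceptable" | "error";
type Attributes = Record<string, string | number>;
interface CounterLike { add(value: number, attributes?: Attributes): void }
interface HistogramLike { record(value: number, attributes?: Attributes): void }

// Hypothetical error type carrying a remote HTTP status code.
class HttpStatusError extends Error {
  status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

// Assumed mapping from thrown errors to the proposed result values.
function classifyFailure(error: unknown): DiscoveryResult {
  if (error instanceof HttpStatusError) {
    if (error.status === 404 || error.status === 410) return "not_found";
    if (error.status === 406) return "not_acceptable";
    return "error";
  }
  if (error instanceof SyntaxError) return "invalid"; // e.g. malformed JSON
  if (error instanceof TypeError) return "network_error"; // e.g. fetch() failure
  return "error";
}

// Runs the operation, then records count and duration on every path.
async function measured<T>(
  count: CounterLike,
  duration: HistogramLike,
  baseAttributes: Attributes,
  operation: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  let result: DiscoveryResult = "resolved";
  try {
    return await operation();
  } catch (error) {
    result = classifyFailure(error);
    throw error; // Metrics must not change the caller-visible behavior.
  } finally {
    const attributes = { ...baseAttributes, "activitypub.discovery.result": result };
    count.add(1, attributes);
    duration.record(Date.now() - start, attributes);
  }
}
```

Recording in `finally` means a test for the failed-lookup criterion only needs to make the wrapped operation throw and then inspect what the counter received.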
Open questions