Summary
Build a benchmarking tool for Fedify applications that can generate ActivityPub-specific load against a local, staging, or otherwise controlled target server and report latency, throughput, error rates, queue drain behavior, and signature verification cost.
Milestone 6 includes performance benchmarking tools that help developers understand how their applications perform under load and identify bottlenecks before they become production problems. Fedify currently has tracing docs and planned metrics work, but no purpose-built way to exercise inbox, discovery, signature verification, outbox/fanout, and queue paths in a repeatable benchmark.
The first version should probably live in @fedify/cli as fedify bench, unless package boundaries suggest a separate package such as @fedify/bench. It should produce terminal output by default and optionally write machine-readable JSON for CI or later dashboard ingestion.
Problem
Generic HTTP load tools such as autocannon, wrk, and k6 can measure request latency, but they do not understand ActivityPub. They do not sign inbox requests, construct realistic ActivityStreams payloads, vary activity types, model fanout size, track queue drain time, or test failure/retry cases in a way that maps to Fedify's operational model.
This leaves developers guessing about questions they need to answer before running a federated app in production:
How many signed inbox activities can this deployment process per second?
What is the p95 latency for signature verification and inbox handling?
How long does the outbox queue take to drain after a fanout burst?
Does performance change when the queue backend is SQLite, PostgreSQL, Redis, or in-process?
Which bottleneck appears first under load: HTTP handling, signature verification, database access, queue processing, or outbound delivery?
Did a change in Fedify or the application make federation throughput worse?
The tool should be safe by default. It should be aimed at local development, staging, and CI. Running it against a live production server should require explicit confirmation or a clearly named unsafe option, because some scenarios create real federation side effects.
Proposed solution
Add a benchmarking tool that can run a small set of ActivityPub-specific scenarios against a target Fedify application.
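One possible user interface, as a sketch only: the subcommand and flag names below are illustrative assumptions, not a settled design.

```sh
# Signed inbox load against a local app
fedify bench inbox --target http://localhost:3000 \
  --activity Create --concurrency 50 --duration 60s

# WebFinger/discovery lookups over a known handle set
fedify bench webfinger --target http://localhost:3000 --handles handles.txt

# A more complex run driven by a scenario file
fedify bench run benchmarks/fanout.yml --format json --output report.json
```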
Scenario files such as benchmarks/fanout.yml could describe more complex runs, for example a fanout burst.
The first version should focus on scenarios that exercise Fedify behavior rather than arbitrary HTTP benchmarking:
Signed inbox delivery with configurable activity type, payload size, duration, concurrency, and request rate.
WebFinger and actor discovery lookups with configurable handle sets and expected result mix.
Object and actor fetch requests for local Fedify endpoints.
Outbox or fanout burst scenarios where the tool can trigger a known local action and then measure queue drain time, if the target app exposes a test hook or benchmark fixture endpoint.
Failure scenarios for invalid signatures, missing actors, remote 404/410 responses, slow remote inboxes, and network errors where they can be simulated without contacting real fediverse peers.
The default output should be useful in a terminal:
```
Fedify inbox benchmark
Target: http://localhost:3000
Duration: 60s
Concurrency: 50
Requests: 18,240
Success rate: 99.4%
Throughput: 304 req/s
Latency p50: 24 ms
Latency p95: 91 ms
Latency p99: 184 ms
Signature verification p95: 12 ms
Queue drain p95: 1.8 s
Errors:
  401 signature_failed: 72
  500 handler_error: 31
```
The tool should also support JSON output for CI regression checks:
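A sketch of what that JSON report might contain, mirroring the terminal summary above; the field names are illustrative, not a fixed schema:

```json
{
  "scenario": "inbox",
  "target": "http://localhost:3000",
  "durationSeconds": 60,
  "concurrency": 50,
  "requests": 18240,
  "successRate": 0.994,
  "throughputRps": 304,
  "latencyMs": { "p50": 24, "p95": 91, "p99": 184 },
  "signatureVerificationMs": { "p95": 12 },
  "queueDrainMs": { "p95": 1800 },
  "errors": { "401 signature_failed": 72, "500 handler_error": 31 }
}
```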
If the OpenTelemetry metrics work tracked in #316 and #619 is available, the benchmark report should be able to include selected Fedify metrics from the run. The tool should not require a metrics backend for basic operation, but it should document how to correlate benchmark output with Prometheus, OpenTelemetry Collector, or the monitoring guide from #743.
Document the tool in a new manual page, probably docs/manual/benchmarking.md, and link it from docs/manual/deploy.md and docs/manual/opentelemetry.md where performance and monitoring are discussed. Update docs/.vitepress/config.mts so the page appears in the manual navigation.
Scope
Add a CLI or scriptable benchmark entry point for Fedify-specific workloads.
Prefer a controlled target app, local server, staging server, or CI harness over running directly on a production host through ssh.
Support terminal summary output and JSON output.
Include at least inbox and WebFinger/discovery scenarios in the first version.
Include queue drain or fanout benchmarking if a safe test hook pattern can be defined without requiring every app to expose private endpoints.
Provide documentation with concrete usage examples, safety guidance, and CI usage.
Do not benchmark unrelated framework routing performance except as it affects Fedify request paths.
Do not contact arbitrary live fediverse peers by default.
Acceptance criteria
A developer can run a documented command against a local Fedify app and get latency, throughput, success rate, and error summaries.
At least one signed inbox benchmark scenario is implemented.
At least one discovery or fetch benchmark scenario is implemented.
The tool can write machine-readable JSON output suitable for CI comparison.
The documentation includes examples for local development, staging, and CI.
The documentation clearly warns against running write-heavy or federation-side-effect scenarios against production servers.
Benchmark requests avoid unbounded real-world federation side effects by default.
The benchmark output makes clear which numbers come from the benchmark client and which numbers come from Fedify/OpenTelemetry metrics, if metrics are used.
The implementation has tests for argument parsing, result aggregation, and at least one benchmark scenario using a local test server.
The manual page is listed in the VitePress sidebar.
Open questions
Should this live in @fedify/cli as fedify bench, or should it be a separate package such as @fedify/bench?
Should scenarios be configured only through command-line flags in the first version, or should YAML/JSON scenario files be supported from the start?
Should CI regression thresholds be part of the first version, for example --max-p95 100ms or --min-throughput 200/s?
How should fanout and queue-drain benchmarks trigger application behavior without forcing every Fedify app to expose benchmark-only routes?
Should the tool integrate directly with Prometheus/OpenTelemetry backends, or should it only produce JSON and rely on existing observability tools for metric correlation?
Should we include a minimal benchmark fixture app under test/bench/ or reuse the existing smoke-test harness?