
Performance benchmarking tools for Fedify federation workloads #744

@dahlia

Description


Summary

Build a benchmarking tool for Fedify applications that can generate ActivityPub-specific load against a local, staging, or otherwise controlled target server and report latency, throughput, error rates, queue drain behavior, and signature verification cost.

Milestone 6 includes performance benchmarking tools that help developers understand how their applications perform under load and identify bottlenecks before they become production problems. Fedify currently has tracing docs and planned metrics work, but no purpose-built way to exercise inbox, discovery, signature verification, outbox/fanout, and queue paths in a repeatable benchmark.

The first version should probably live in @fedify/cli as fedify bench, unless package boundaries suggest a separate package such as @fedify/bench. It should produce terminal output by default and optionally write machine-readable JSON for CI or later dashboard ingestion.

Problem

Generic HTTP load tools such as autocannon, wrk, and k6 can measure request latency, but they do not understand ActivityPub. They do not sign inbox requests, construct realistic ActivityStreams payloads, vary activity types, model fanout size, track queue drain time, or test failure/retry cases in a way that maps to Fedify's operational model.

This leaves developers guessing about questions they need to answer before running a federated app in production:

  • How many signed inbox activities can this deployment process per second?
  • What is the p95 latency for signature verification and inbox handling?
  • How long does the outbox queue take to drain after a fanout burst?
  • Does performance change when the queue backend is SQLite, PostgreSQL, Redis, or in-process?
  • Which bottleneck appears first under load: HTTP handling, signature verification, database access, queue processing, or outbound delivery?
  • Did a change in Fedify or the application make federation throughput worse?

The tool should be safe by default. It should be aimed at local development, staging, and CI. Running it against a live production server should require explicit confirmation or a clearly named unsafe option, because some scenarios create real federation side effects.
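A minimal sketch of what such a safety guard could look like. The `--allow-production` flag name and the `isLocalTarget` heuristic are illustrative assumptions, not a committed design:

```typescript
// Hypothetical safety guard: refuse targets that do not look local
// unless the caller opts in explicitly. The flag name and the
// hostname heuristic are illustrative only.
function isLocalTarget(target: string): boolean {
  const host = new URL(target).hostname;
  return host === "localhost" || host === "127.0.0.1" || host.endsWith(".local");
}

function guardTarget(target: string, allowProduction: boolean): void {
  if (!isLocalTarget(target) && !allowProduction) {
    throw new Error(
      `${target} does not look like a local target; ` +
        `pass --allow-production (or similar) to run anyway.`,
    );
  }
}

console.log(isLocalTarget("http://localhost:3000")); // true
```

A real implementation would likely also need an allowlist for staging hosts, but the shape of the check stays the same.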

Proposed solution

Add a benchmarking tool that can run a small set of ActivityPub-specific scenarios against a target Fedify application.

One possible user interface:

fedify bench inbox --target http://localhost:3000 --duration 60s --concurrency 50
fedify bench webfinger --target http://localhost:3000 --rate 200/s --duration 30s
fedify bench object --target http://localhost:3000 --objects 1000 --concurrency 25
fedify bench fanout --target http://localhost:3000 --actors 10 --followers 500 --queue-drain-timeout 2m
fedify bench scenario --file benchmarks/fanout.yml --target https://staging.example.com --output report.json

In the last example, the scenario parameters come from benchmarks/fanout.yml instead of individual command-line flags.
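A hypothetical sketch of what such a scenario file could contain; every key below is illustrative, not a committed format:

```yaml
# Hypothetical benchmarks/fanout.yml — all keys are illustrative.
scenario: fanout
target: https://staging.example.com   # overridable by --target
actors: 10
followers: 500
queue_drain_timeout: 2m
payload:
  type: Create
  object:
    type: Note
    content: "bench payload"
thresholds:                           # optional CI gates
  max_p95_ms: 100
  min_throughput_rps: 200
```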

The first version should focus on scenarios that exercise Fedify behavior rather than arbitrary HTTP benchmarking:

  • Signed inbox delivery with configurable activity type, payload size, duration, concurrency, and request rate.
  • WebFinger and actor discovery lookups with configurable handle sets and expected result mix.
  • Object and actor fetch requests for local Fedify endpoints.
  • Outbox or fanout burst scenarios where the tool can trigger a known local action and then measure queue drain time, if the target app exposes a test hook or benchmark fixture endpoint.
  • Failure scenarios for invalid signatures, missing actors, remote 404/410 responses, slow remote inboxes, and network errors where they can be simulated without contacting real fediverse peers.
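For the signed-inbox scenario, the benchmark client has to produce HTTP Signatures the way ActivityPub servers expect. A minimal sketch using Node's crypto module, assuming a synthetic benchmark actor (the `bench.invalid` actor URL and key id are made up for illustration):

```typescript
import {
  createHash,
  createSign,
  createVerify,
  generateKeyPairSync,
} from "node:crypto";

// Generate a throwaway key pair for the synthetic benchmark actor.
const { publicKey, privateKey } = generateKeyPairSync("rsa", {
  modulusLength: 2048,
});

// A small Create activity as the request body.
const body = JSON.stringify({
  "@context": "https://www.w3.org/ns/activitystreams",
  type: "Create",
  actor: "https://bench.invalid/actors/load-1", // synthetic actor
  object: { type: "Note", content: "bench payload" },
});

// Build the signing string over the usual header set:
// (request-target), host, date, and a SHA-256 digest of the body.
const digest = "SHA-256=" + createHash("sha256").update(body).digest("base64");
const date = new Date().toUTCString();
const signingString = [
  "(request-target): post /inbox",
  "host: localhost:3000",
  `date: ${date}`,
  `digest: ${digest}`,
].join("\n");

const signature = createSign("RSA-SHA256")
  .update(signingString)
  .sign(privateKey, "base64");

// The Signature header the client would attach to the POST.
const signatureHeader =
  `keyId="https://bench.invalid/actors/load-1#main-key",` +
  `algorithm="rsa-sha256",` +
  `headers="(request-target) host date digest",` +
  `signature="${signature}"`;

// A receiving server rebuilds the signing string and verifies:
const ok = createVerify("RSA-SHA256")
  .update(signingString)
  .verify(publicKey, signature, "base64");
console.log(ok); // true
```

In practice the tool would serve the synthetic actor's public key at the `keyId` URL (or inject it through a test hook), since the target server must be able to fetch it to verify the signature.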

The default output should be useful in a terminal:

Fedify inbox benchmark

Target: http://localhost:3000
Duration: 60s
Concurrency: 50
Requests: 18,240
Success rate: 99.4%
Throughput: 304 req/s
Latency p50: 24 ms
Latency p95: 91 ms
Latency p99: 184 ms
Signature verification p95: 12 ms
Queue drain p95: 1.8 s

Errors:
  401 signature_failed: 72
  500 handler_error: 31
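The numbers in the summary above can be derived from per-request samples collected by the benchmark client. A sketch of the aggregation, using nearest-rank percentiles; the `BenchSample` shape is illustrative, not a Fedify API:

```typescript
// Hypothetical per-request sample recorded by the benchmark client.
interface BenchSample {
  latencyMs: number;
  status: number;
}

// Nearest-rank percentile over a copy, so callers keep their order.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

function summarize(samples: BenchSample[], durationSec: number) {
  const latencies = samples.map((s) => s.latencyMs);
  const successes = samples.filter((s) => s.status < 400).length;
  return {
    requests: samples.length,
    successRate: successes / samples.length,
    throughput: samples.length / durationSec,
    p50: percentile(latencies, 50),
    p95: percentile(latencies, 95),
    p99: percentile(latencies, 99),
  };
}

// 100 synthetic samples with latencies 1..100 ms, one 500 error.
const samples: BenchSample[] = Array.from({ length: 100 }, (_, i) => ({
  latencyMs: i + 1,
  status: i < 99 ? 200 : 500,
}));
const summary = summarize(samples, 10);
console.log(summary.p95); // 95
```

Production-grade clients usually prefer streaming histograms (e.g. HDR histograms) over sorting raw samples, but the reported quantities are the same.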

The tool should also support JSON output for CI regression checks:

fedify bench inbox --target http://localhost:3000 --output benchmark-result.json
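One possible shape for the JSON report, mirroring the terminal summary above. The field names are illustrative assumptions, not a committed Fedify format:

```typescript
// Hypothetical shape for benchmark-result.json; every field name
// here is illustrative, not a committed format.
interface BenchReport {
  scenario: "inbox" | "webfinger" | "object" | "fanout";
  target: string;
  durationSec: number;
  requests: number;
  successRate: number;
  throughputRps: number;
  latencyMs: { p50: number; p95: number; p99: number };
  errors: Record<string, number>; // e.g. "401 signature_failed" -> count
}

const report: BenchReport = {
  scenario: "inbox",
  target: "http://localhost:3000",
  durationSec: 60,
  requests: 18240,
  successRate: 0.994,
  throughputRps: 304,
  latencyMs: { p50: 24, p95: 91, p99: 184 },
  errors: { "401 signature_failed": 72, "500 handler_error": 31 },
};

// Round-trip through JSON the way a CI consumer would read it back.
const parsed: BenchReport = JSON.parse(JSON.stringify(report));
console.log(parsed.latencyMs.p95); // 91
```

Keeping the schema flat and versionable makes it easy to diff two runs in CI without bespoke tooling.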

If the OpenTelemetry metrics work tracked in #316 and #619 is available, the benchmark report should be able to include selected Fedify metrics from the run. The tool should not require a metrics backend for basic operation, but it should document how to correlate benchmark output with Prometheus, an OpenTelemetry Collector, or the monitoring guide from #743.

Document the tool in a new manual page, probably docs/manual/benchmarking.md, and link it from docs/manual/deploy.md and docs/manual/opentelemetry.md where performance and monitoring are discussed. Update docs/.vitepress/config.mts so the page appears in the manual navigation.

Scope

  • Add a CLI or scriptable benchmark entry point for Fedify-specific workloads.
  • Prefer a controlled target, such as a local app, staging server, or CI harness, over running directly against a production host over SSH.
  • Support terminal summary output and JSON output.
  • Include at least inbox and WebFinger/discovery scenarios in the first version.
  • Include queue drain or fanout benchmarking if a safe test hook pattern can be defined without requiring every app to expose private endpoints.
  • Provide documentation with concrete usage examples, safety guidance, and CI usage.
  • Keep long-running production monitoring, dashboard panels, and alert rules in the production monitoring dashboards and alerting guide (#743).
  • Do not benchmark unrelated framework routing performance except as it affects Fedify request paths.
  • Do not contact arbitrary live fediverse peers by default.

Acceptance criteria

  • A developer can run a documented command against a local Fedify app and get latency, throughput, success rate, and error summaries.
  • At least one signed inbox benchmark scenario is implemented.
  • At least one discovery or fetch benchmark scenario is implemented.
  • The tool can write machine-readable JSON output suitable for CI comparison.
  • The documentation includes examples for local development, staging, and CI.
  • The documentation clearly warns against running write-heavy or federation-side-effect scenarios against production servers.
  • Benchmark requests avoid unbounded real-world federation side effects by default.
  • The benchmark output makes clear which numbers come from the benchmark client and which numbers come from Fedify/OpenTelemetry metrics, if metrics are used.
  • The implementation has tests for argument parsing, result aggregation, and at least one benchmark scenario using a local test server.
  • The manual page is listed in the VitePress sidebar.

Open questions

  • Should this live in @fedify/cli as fedify bench, or should it be a separate package such as @fedify/bench?
  • Should scenarios be configured only through command-line flags in the first version, or should YAML/JSON scenario files be supported from the start?
  • Should CI regression thresholds be part of the first version, for example --max-p95 100ms or --min-throughput 200/s?
  • How should fanout and queue-drain benchmarks trigger application behavior without forcing every Fedify app to expose benchmark-only routes?
  • Should the tool integrate directly with Prometheus/OpenTelemetry backends, or should it only produce JSON and rely on existing observability tools for metric correlation?
  • Should we include a minimal benchmark fixture app under test/bench/ or reuse the existing smoke-test harness?
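If CI regression thresholds do make it into the first version, the check itself is small. A sketch of gating on the hypothetical `--max-p95` and `--min-throughput` flags from the third question (threshold names and report fields are illustrative):

```typescript
// Hypothetical CI gate: read a benchmark report and collect
// threshold violations. Names are illustrative, not a Fedify API.
interface Thresholds {
  maxP95Ms?: number;
  minThroughputRps?: number;
}

function checkThresholds(
  report: { throughputRps: number; latencyMs: { p95: number } },
  t: Thresholds,
): string[] {
  const failures: string[] = [];
  if (t.maxP95Ms !== undefined && report.latencyMs.p95 > t.maxP95Ms) {
    failures.push(`p95 ${report.latencyMs.p95}ms exceeds ${t.maxP95Ms}ms`);
  }
  if (
    t.minThroughputRps !== undefined &&
    report.throughputRps < t.minThroughputRps
  ) {
    failures.push(
      `throughput ${report.throughputRps} req/s below ${t.minThroughputRps}`,
    );
  }
  return failures;
}

const failures = checkThresholds(
  { throughputRps: 304, latencyMs: { p95: 91 } },
  { maxP95Ms: 100, minThroughputRps: 200 },
);
console.log(failures.length); // 0
```

A CI job would exit non-zero when the returned list is non-empty, so a throughput regression fails the build instead of silently shipping.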
