feat: pre-register prometheus metrics at startup so /metrics is never empty #1119

@TaylorMutch

Problem Statement

The /metrics endpoint on port 9090 returns an empty body (200 OK, content-length: 0) until at least one request flows through the API server. This causes real operational problems:

  • Prometheus scrapes return empty results after pod restarts, producing gaps or missing series in dashboards
  • Alerting rules that fire on missing metrics produce false positives during startup
  • Dashboards show broken/zero-width series until the first real API request arrives, which can be minutes or hours after a fresh deploy

The root cause is that metrics-exporter-prometheus only adds a metric to the registry the first time a counter! / histogram! macro is called. Before that point, handle.render() returns an empty string.
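
The lazy-registration behavior described above can be illustrated with a toy model. This is a minimal sketch using only the standard library: `ToyRegistry` and its methods are hypothetical names invented for illustration and are not the metrics-exporter-prometheus implementation. It shows only why rendering is empty until a series is first touched, and how creating the series at zero during startup changes that.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a lazily populated metrics registry.
/// Hypothetical type for illustration; not the real exporter.
#[derive(Default)]
struct ToyRegistry {
    counters: BTreeMap<String, u64>,
}

impl ToyRegistry {
    /// Creates the series on first use, then adds `delta` to it.
    fn increment(&mut self, name: &str, delta: u64) {
        *self.counters.entry(name.to_string()).or_insert(0) += delta;
    }

    /// Pre-registration: create the series at zero without counting anything.
    fn register(&mut self, name: &str) {
        self.increment(name, 0);
    }

    /// Renders only the series that already exist, in Prometheus text format.
    fn render(&self) -> String {
        let mut out = String::new();
        for (name, value) in &self.counters {
            out.push_str(&format!("# TYPE {0} counter\n{0} {1}\n", name, value));
        }
        out
    }
}
```

In this model, `render()` on a fresh registry returns an empty string, which mirrors the empty /metrics body. Calling `register()` for each known name at startup makes the first render non-empty without recording a spurious request.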

Proposed Design

At server startup, immediately after PrometheusBuilder::new().install_recorder() in crates/openshell-server/src/lib.rs, call describe_counter! and describe_histogram! for every metric name defined in crates/openshell-server/src/multiplex.rs. The intent is for metrics-exporter-prometheus to emit those metrics with zero values from the first scrape, before any traffic arrives.

// in lib.rs, after install_recorder()
use metrics::{describe_counter, describe_histogram, Unit};

describe_counter!(
    "openshell_server_grpc_requests_total",
    "Total number of gRPC requests handled"
);
describe_histogram!(
    "openshell_server_grpc_request_duration_seconds",
    Unit::Seconds,
    "gRPC request duration in seconds"
);
describe_counter!(
    "openshell_server_http_requests_total",
    "Total number of HTTP requests handled"
);
describe_histogram!(
    "openshell_server_http_request_duration_seconds",
    Unit::Seconds,
    "HTTP request duration in seconds"
);

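The describe_* macros only attach HELP text and a unit to a metric name, so repeated calls are harmless and cannot double-register a series. Because a description alone may not create a series in the exporter's registry, the change should be verified by scraping /metrics on a fresh pod. If the body is still empty, the same startup block can also register each counter and histogram at its zero value.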

Alternatives Considered

  • Synthetic startup request: Send a fake internal request through the multiplexer to trigger the macros. Rejected — pollutes metrics with a spurious data point and is fragile.
  • Custom recorder wrapper: Wrap PrometheusBuilder to pre-seed the registry. More complex than necessary given describe_* already does this.
  • Accept empty metrics until first request: Current behavior. Unacceptable for production observability.

Agent Investigation

Investigated during a local skaffold dev session on add-skaffold-tooling/tmutch.

Findings:

  • Metrics server starts on 0.0.0.0:9090 and responds 200 OK with content-length: 0 from first boot (confirmed via kubectl port-forward + curl -sv)
  • counter! / histogram! calls exist only in MultiplexedService::call() (crates/openshell-server/src/multiplex.rs lines 366–367 for gRPC, 392–393 for HTTP)
  • The health server (port 8081) and metrics server (port 9090) are separate Axum instances — requests to those ports never enter MultiplexedService, so liveness/readiness probes do not generate metrics
  • Port 8080 requires mTLS in the default config; plain HTTP health checks cannot trigger metric recording even accidentally
  • PrometheusBuilder::new().install_recorder() is called in crates/openshell-server/src/lib.rs around line 251 — the natural place to add describe_* calls
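
The manual curl check from the first finding could be turned into a regression test that inspects the scraped body. The following is a minimal sketch under stated assumptions: `missing_metrics` is a hypothetical helper, not existing code in the repository, and it assumes the body uses the Prometheus text exposition format, where histogram and summary series append `_sum`, `_count`, or `_bucket` to the base name. Comment lines (`# HELP`, `# TYPE`) are ignored on purpose, so a metric that has only a description still counts as missing.

```rust
/// Returns the expected metric names that have no sample line in `body`.
/// Hypothetical helper for a startup regression test.
fn missing_metrics(body: &str, expected: &[&str]) -> Vec<String> {
    // Suffixes that histogram and summary series append to the base name.
    const SUFFIXES: [&str; 4] = ["", "_sum", "_count", "_bucket"];

    // Collect the metric name of every sample line, skipping comments.
    let seen: Vec<&str> = body
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| line.split(|c: char| c == '{' || c.is_whitespace()).next())
        .collect();

    // Keep only the expected names that match no sample line.
    expected
        .iter()
        .filter(|name| {
            !seen.iter().any(|s| {
                SUFFIXES
                    .iter()
                    .any(|suffix| *s == format!("{}{}", name, suffix))
            })
        })
        .map(|name| name.to_string())
        .collect()
}
```

A test could fetch /metrics immediately after startup, pass the body and the four metric names from the proposed design to this helper, and assert that the returned list is empty.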
