Problem Statement
The /metrics endpoint on port 9090 returns an empty body (200 OK, content-length: 0) until at least one request flows through the API server. This causes real operational problems:
- Prometheus scrapes return empty results after pod restarts, producing gaps or missing series in dashboards
- Alerting rules that fire on missing metrics produce false positives during startup
- Dashboards show broken/zero-width series until the first real API request arrives, which can be minutes or hours after a fresh deploy
The root cause is that metrics-exporter-prometheus only adds a metric to the registry the first time a counter! / histogram! macro is called. Before that point, handle.render() returns an empty string.
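The lazy behavior described above can be illustrated with a toy model. This is a simplified sketch, not the exporter's implementation: `LazyRegistry` and its methods are hypothetical names, and only counters are modeled.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a lazily populated metrics registry.
/// Hypothetical type for illustration only, not the exporter's code.
#[derive(Default)]
struct LazyRegistry {
    counters: BTreeMap<String, u64>,
}

impl LazyRegistry {
    /// Records an increment; the series is created on first use.
    fn increment(&mut self, name: &str, by: u64) {
        *self.counters.entry(name.to_string()).or_insert(0) += by;
    }

    /// Renders one `name value` line per series that exists so far.
    fn render(&self) -> String {
        self.counters
            .iter()
            .map(|(name, value)| format!("{name} {value}\n"))
            .collect()
    }
}

fn main() {
    let mut registry = LazyRegistry::default();

    // No request has been handled yet, so there is nothing to render.
    assert!(registry.render().is_empty());

    // The first request creates the series, and only then does it show up.
    registry.increment("openshell_server_grpc_requests_total", 1);
    assert_eq!(
        registry.render(),
        "openshell_server_grpc_requests_total 1\n"
    );
}
```

In this model a scrape that arrives before the first `increment` sees an empty body, which matches the symptom reported for port 9090.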
Proposed Design
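At server startup, immediately after PrometheusBuilder::new().install_recorder() in crates/openshell-server/src/lib.rs, call describe_counter! and describe_histogram! for every metric name defined in crates/openshell-server/src/multiplex.rs. The goal is for metrics-exporter-prometheus to emit those metrics with zero values from the first scrape, before any traffic arrives. Because describe_* attaches a description and unit to a metric name, this should be confirmed with a scrape against a freshly started pod; if the series still do not appear, a likely fallback is to also register each counter and histogram handle at startup without recording a value.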
// in lib.rs, after install_recorder()
use metrics::{describe_counter, describe_histogram, Unit};
describe_counter!(
"openshell_server_grpc_requests_total",
"Total number of gRPC requests handled"
);
describe_histogram!(
"openshell_server_grpc_request_duration_seconds",
Unit::Seconds,
"gRPC request duration in seconds"
);
describe_counter!(
"openshell_server_http_requests_total",
"Total number of HTTP requests handled"
);
describe_histogram!(
"openshell_server_http_request_duration_seconds",
Unit::Seconds,
"HTTP request duration in seconds"
);
The describe_* calls run once at startup and only attach a description and unit to a metric name, so they add no per-request cost and carry no risk of double-registration.
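Whether the change has the intended effect can be checked by scraping a freshly started pod and looking for the four metric names. The sketch below is one way to do that check on a scrape body that has already been fetched (for example with curl). The helper names `sample_name`, `belongs_to`, and `missing_metrics` are hypothetical, and the parsing assumes the usual Prometheus text format where a sample line starts with the metric name followed by `{` or a space.

```rust
/// Extracts the metric name from one sample line, i.e. everything
/// before the first `{` or space.
fn sample_name(line: &str) -> &str {
    let end = line
        .find(|c: char| c == '{' || c == ' ')
        .unwrap_or(line.len());
    &line[..end]
}

/// True if `sample` is the series `name` itself or one of the
/// `_bucket` / `_sum` / `_count` series derived from it.
fn belongs_to(sample: &str, name: &str) -> bool {
    sample == name
        || ["_bucket", "_sum", "_count"]
            .iter()
            .any(|suffix| sample.strip_suffix(*suffix) == Some(name))
}

/// Returns the expected metric names that have no sample line in `body`.
fn missing_metrics<'a>(body: &str, expected: &[&'a str]) -> Vec<&'a str> {
    expected
        .iter()
        .copied()
        .filter(|&name| {
            !body
                .lines()
                // Comment lines such as "# HELP" and "# TYPE" are not samples.
                .filter(|line| !line.starts_with('#'))
                .any(|line| belongs_to(sample_name(line), name))
        })
        .collect()
}

fn main() {
    let expected = [
        "openshell_server_grpc_requests_total",
        "openshell_server_grpc_request_duration_seconds",
        "openshell_server_http_requests_total",
        "openshell_server_http_request_duration_seconds",
    ];

    // Current behavior: an empty body, so every metric is missing.
    assert_eq!(missing_metrics("", &expected).len(), 4);

    // A body that only carries the two counters still lacks the histograms.
    let body = "openshell_server_grpc_requests_total 0\n\
                openshell_server_http_requests_total 0\n";
    assert_eq!(
        missing_metrics(body, &expected),
        vec![
            "openshell_server_grpc_request_duration_seconds",
            "openshell_server_http_request_duration_seconds",
        ]
    );
}
```

An empty result from `missing_metrics` on the first scrape after a restart would indicate that all four series are present before any traffic arrives.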
Alternatives Considered
- Synthetic startup request: Send a fake internal request through the multiplexer to trigger the macros. Rejected — pollutes metrics with a spurious data point and is fragile.
- Custom recorder wrapper: Wrap PrometheusBuilder to pre-seed the registry. More complex than necessary if describe_* already achieves this.
- Accept empty metrics until first request: Current behavior. Unacceptable for production observability.
Agent Investigation
Investigated during a local skaffold dev session on add-skaffold-tooling/tmutch.
Findings:
- The metrics server starts on 0.0.0.0:9090 and responds 200 OK with content-length: 0 from first boot (confirmed via kubectl port-forward + curl -sv)
- counter! / histogram! calls exist only in MultiplexedService::call() (crates/openshell-server/src/multiplex.rs lines 366–367 for gRPC, 392–393 for HTTP)
- The health server (port 8081) and metrics server (port 9090) are separate Axum instances, so requests to those ports never enter MultiplexedService and liveness/readiness probes do not generate metrics
- Port 8080 requires mTLS in the default config; plain HTTP health checks cannot trigger metric recording even accidentally
- PrometheusBuilder::new().install_recorder() is called in crates/openshell-server/src/lib.rs around line 251, which is the natural place to add describe_* calls