feat: pre-register prometheus metrics at startup so /metrics is never empty #1119

## Problem Statement

The `/metrics` endpoint on port 9090 returns an empty body (`200 OK`, `content-length: 0`) until at least one request flows through the API server. This causes real operational problems:

- Prometheus scrapes return empty results after pod restarts, producing gaps or missing series in dashboards
- Alerting rules that fire on missing metrics produce false positives during startup
- Dashboards show broken/zero-width series until the first real API request arrives, which can be minutes or hours after a fresh deploy

The root cause is that `metrics-exporter-prometheus` only adds a metric to the registry the first time a `counter!` / `histogram!` macro is called. Before that point, `handle.render()` returns an empty string.

## Proposed Design

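At server startup, immediately after `PrometheusBuilder::new().install_recorder()` in `crates/openshell-server/src/lib.rs`, call `describe_counter!` and `describe_histogram!` for every metric name defined in `crates/openshell-server/src/multiplex.rs`. The intent is for `metrics-exporter-prometheus` to emit those metrics with zero values from the first scrape, before any traffic arrives. This needs verifying against the pinned exporter version: as far as we can tell, `describe_*` only attaches help text and a unit to a metric name, and the exporter renders a series only once a handle for it has been registered. If so, each `describe_*` call should be paired with a zero-valued registration (for example `counter!("openshell_server_grpc_requests_total").absolute(0)` on recent `metrics` releases), using the same label set that `multiplex.rs` records so the pre-registered series match the live ones.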

```rust
// in lib.rs, after install_recorder()
use metrics::{describe_counter, describe_histogram, Unit};

describe_counter!(
    "openshell_server_grpc_requests_total",
    "Total number of gRPC requests handled"
);
describe_histogram!(
    "openshell_server_grpc_request_duration_seconds",
    Unit::Seconds,
    "gRPC request duration in seconds"
);
describe_counter!(
    "openshell_server_http_requests_total",
    "Total number of HTTP requests handled"
);
describe_histogram!(
    "openshell_server_http_request_duration_seconds",
    Unit::Seconds,
    "HTTP request duration in seconds"
);
```

Repeated `describe_*` calls for the same metric name are harmless and cheap, so there is no risk of double-registration.
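The difference between describing a metric and registering a series can be illustrated with a small std-only model. This is a sketch under the assumption that the exporter keeps help text and registered series in separate maps and renders only the latter; `ToyRegistry` and its methods are hypothetical names, not part of `metrics` or `metrics-exporter-prometheus`.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a Prometheus recorder. This is NOT the real
/// `metrics-exporter-prometheus` implementation; it only models the
/// assumption that descriptions and registered series are stored separately.
#[derive(Default)]
pub struct ToyRegistry {
    descriptions: BTreeMap<String, String>,
    counters: BTreeMap<String, u64>,
}

impl ToyRegistry {
    /// Models `describe_counter!`: stores help text, creates no series.
    /// The first description for a name wins, so repeated calls are harmless.
    pub fn describe_counter(&mut self, name: &str, help: &str) {
        self.descriptions
            .entry(name.to_string())
            .or_insert_with(|| help.to_string());
    }

    /// Models obtaining a counter handle: creates the series at zero if it
    /// is missing and leaves an existing value untouched.
    pub fn register_counter(&mut self, name: &str) {
        self.counters.entry(name.to_string()).or_insert(0);
    }

    /// Models a request being handled.
    pub fn increment(&mut self, name: &str, by: u64) {
        *self.counters.entry(name.to_string()).or_insert(0) += by;
    }

    /// Renders only registered series, attaching help text when present.
    pub fn render(&self) -> String {
        let mut out = String::new();
        for (name, value) in &self.counters {
            if let Some(help) = self.descriptions.get(name) {
                out.push_str(&format!("# HELP {name} {help}\n"));
            }
            out.push_str(&format!("# TYPE {name} counter\n{name} {value}\n"));
        }
        out
    }
}
```

In this model, `render()` stays empty after `describe_counter` alone and shows a zero-valued sample once `register_counter` has run; calling `register_counter` again does not reset an existing value. Whether the real exporter behaves the same way should be confirmed with a `curl` against `/metrics` on a freshly started pod.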

## Alternatives Considered

- **Synthetic startup request:** Send a fake internal request through the multiplexer to trigger the macros. Rejected — pollutes metrics with a spurious data point and is fragile.
- **Custom recorder wrapper:** Wrap `PrometheusBuilder` to pre-seed the registry. More complex than necessary if registering the metrics at startup through the `metrics` macros already achieves this.
- **Accept empty metrics until first request:** Current behavior. Unacceptable for production observability.

## Agent Investigation

Investigated during a local skaffold dev session on `add-skaffold-tooling/tmutch`.

**Findings:**

- Metrics server starts on `0.0.0.0:9090` and responds `200 OK` with `content-length: 0` from first boot (confirmed via `kubectl port-forward` + `curl -sv`)
- `counter!` / `histogram!` calls exist only in `MultiplexedService::call()` (`crates/openshell-server/src/multiplex.rs` lines 366–367 for gRPC, 392–393 for HTTP)
- The health server (port 8081) and metrics server (port 9090) are separate Axum instances — requests to those ports never enter `MultiplexedService`, so liveness/readiness probes do not generate metrics
- Port 8080 requires mTLS in the default config; plain HTTP health checks cannot trigger metric recording even accidentally
- `PrometheusBuilder::new().install_recorder()` is called in `crates/openshell-server/src/lib.rs` around line 251 — the natural place to add `describe_*` calls
