Observability — metrics and traces

AutoControl exposes a Prometheus-compatible /metrics endpoint and an OpenTelemetry-style tracer for every action. Out of the box you get:

  • Per-action call counts and latency histograms.
  • Per-agent-step counters partitioned by tool name and outcome.
  • Span trees around the executor, the agent loop, and any user code wrapped with the :func:`traced` decorator.

The metric primitives are stdlib-only, so you do not need prometheus_client installed. If you do install it later — or opentelemetry-api for traces — AutoControl picks them up automatically.

Start the bundled HTTP exporter and point a scraper at it:

from je_auto_control import default_metrics_exporter

exporter = default_metrics_exporter()  # binds 127.0.0.1:9090
exporter.start()

Then in another process:

$ curl http://127.0.0.1:9090/metrics
# HELP autocontrol_action_calls_total Number of AC_* actions executed
# TYPE autocontrol_action_calls_total counter
autocontrol_action_calls_total{action="AC_screenshot",outcome="ok"} 42
...
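To check the endpoint from a script rather than with curl, the text format is simple enough to read with the standard library. The following is a minimal sketch, not part of AutoControl: `parse_metrics` is a hypothetical helper, and it handles only plain `name{labels} value` lines (no timestamps, no escaped quotes inside label values).

```python
import re

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
_LABEL = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>[^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments carry no samples
        match = _SAMPLE.match(line)
        if match is None:
            continue  # skip anything this simplified parser cannot read
        labels = {
            m["key"]: m["val"]
            for m in _LABEL.finditer(match["labels"] or "")
        }
        samples.append((match["name"], labels, float(match["value"])))
    return samples


# The sample output shown above, used here instead of a live request.
SAMPLE_TEXT = """\
# HELP autocontrol_action_calls_total Number of AC_* actions executed
# TYPE autocontrol_action_calls_total counter
autocontrol_action_calls_total{action="AC_screenshot",outcome="ok"} 42
"""

samples = parse_metrics(SAMPLE_TEXT)
```

Against a running exporter, the same function can be fed the body returned by `urllib.request.urlopen("http://127.0.0.1:9090/metrics")`, decoded as UTF-8.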

Add that URL as a target in a Prometheus scrape config, and the series become available to any Grafana dashboard that uses Prometheus as a data source.

The executor and agent loop emit the following metrics automatically, so you do not need to instrument anything yourself:

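===================================  =========  ==========================
Metric                               Type       Labels
===================================  =========  ==========================
autocontrol_action_calls_total       Counter    action, outcome (ok/error)
autocontrol_action_duration_seconds  Histogram  action
autocontrol_agent_runs_total         Counter    (none)
autocontrol_agent_steps_total        Counter    tool, outcome
autocontrol_agent_outcomes_total     Counter    outcome (succeeded/failed)
===================================  =========  ==========================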

The histogram uses Prometheus' default bucket layout (5 ms → 10 s), which covers everything from a synchronous keystroke to a slow OCR pass.
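To make the bucket layout concrete, here is a small sketch of how cumulative `le` buckets count observations. The boundaries below are an assumption: they are the defaults used by the Prometheus Go client, which span 5 ms to 10 s. AutoControl's exact intermediate boundaries may differ, and `cumulative_counts` is a hypothetical helper, not an AutoControl function.

```python
import bisect
import math

# Assumed upper bounds in seconds; the +Inf bucket is appended below.
BUCKETS = (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0)


def cumulative_counts(observations, buckets=BUCKETS):
    """Return {le: count}, where each bucket counts observations <= le."""
    bounds = list(buckets) + [math.inf]
    counts = [0] * len(bounds)
    for value in observations:
        # Index of the first bucket whose upper bound is >= value.
        first = bisect.bisect_left(bounds, value)
        # Buckets are cumulative, so every larger bucket also counts it.
        for i in range(first, len(bounds)):
            counts[i] += 1
    return dict(zip(bounds, counts))


# Made-up durations for illustration: 3 ms, 80 ms, and 4 s.
counts = cumulative_counts([0.003, 0.08, 4.0])
```

With these inputs the 3 ms observation lands in every bucket, the 80 ms one first appears at `le=0.1`, and the 4 s one first appears at `le=5.0`. Anything slower than 10 s would be counted only in the `+Inf` bucket.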

The same primitives are exposed through the package facade:

from je_auto_control import (
    MetricCounter, MetricGauge, MetricHistogram, default_metric_registry,
)

registry = default_metric_registry()
widgets_built = registry.register(MetricCounter(
    "myapp_widgets_built_total",
    "Count of widgets generated by my pipeline.",
    label_names=("kind",),
))

widgets_built.inc(labels={"kind": "blue"})

Names follow Prometheus rules: snake_case, no dashes, must start with a letter or underscore. The registry rejects collisions so a typo can't silently fork a series.
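The naming and collision rules can be sketched in a few lines. This is an illustration under stated assumptions, not the registry's actual code: `NameRegistry` is a hypothetical stand-in that tracks names only, and the pattern encodes the rule as described here (Prometheus itself also permits colons, which it reserves for recording rules).

```python
import re

# Letters, digits, and underscores; must not start with a digit.
METRIC_NAME = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


class NameRegistry:
    """Hypothetical stand-in that validates and tracks metric names."""

    def __init__(self):
        self._names = set()

    def register(self, name):
        if METRIC_NAME.fullmatch(name) is None:
            raise ValueError(f"invalid metric name: {name!r}")
        if name in self._names:
            # Rejecting duplicates surfaces an accidental re-registration
            # immediately instead of producing two competing series.
            raise ValueError(f"metric already registered: {name!r}")
        self._names.add(name)
        return name


names = NameRegistry()
names.register("myapp_widgets_built_total")
```

Registering `"myapp-widgets-built"` or registering `"myapp_widgets_built_total"` a second time would raise `ValueError` in this sketch.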

The :func:`traced` decorator wraps any callable in a span. The default tracer is a no-op until opentelemetry-api is installed — when it is, your spans flow through whatever exporter you have configured (OTLP, Jaeger, etc.) without changing your call sites:

from je_auto_control import traced

@traced("my_pipeline.process_one")
def process_one(item):
    ...
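The "no-op until installed" behaviour is a common optional-dependency pattern. The sketch below shows one way such a decorator could be written; it is not AutoControl's implementation, and `traced_sketch` and the tracer name `"example"` are hypothetical. The only OpenTelemetry calls used are `trace.get_tracer` and `Tracer.start_as_current_span`.

```python
import functools

try:
    # Present only when opentelemetry-api is installed.
    from opentelemetry import trace
    _tracer = trace.get_tracer("example")
except ImportError:
    _tracer = None  # fall back to calling the function directly


def traced_sketch(name):
    """Wrap a callable in a span named `name` when tracing is available."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if _tracer is None:
                return func(*args, **kwargs)
            with _tracer.start_as_current_span(name):
                return func(*args, **kwargs)
        return wrapper
    return decorator


@traced_sketch("my_pipeline.process_one")
def process_one(item):
    return item * 2
```

Because the fallback lives inside the decorator, call sites look the same whether or not the tracing dependency is present.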

For manual span control:

from je_auto_control import default_tracer

tracer = default_tracer()
with tracer.start_as_current_span("crop_and_ocr") as span:
    span.set_attribute("region", "header")
    ...

Recommended setup for a multi-host AutoControl daemon fleet:

  1. Each host calls default_metrics_exporter().start() on boot.
  2. Prometheus scrapes host:9090/metrics every 15 s.
  3. opentelemetry-api + opentelemetry-sdk + an OTLP exporter are installed for the tracing backend (Datadog / Honeycomb / Jaeger).
  4. Grafana dashboard alerts on:
    • rate(autocontrol_action_calls_total{outcome="error"}[5m]) > 0.1
    • histogram_quantile(0.99, rate(autocontrol_action_duration_seconds_bucket[5m])) > 2.0
    • up{job="autocontrol"} == 0
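To get a feel for the first threshold, the arithmetic behind a per-second rate can be worked through by hand. This is a simplified sketch: PromQL's `rate()` also handles counter resets and extrapolates to the window edges, which the hypothetical `per_second_rate` helper below ignores, and the counter values are made up.

```python
def per_second_rate(first_value, last_value, window_seconds):
    """Average per-second increase of a counter across a window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return (last_value - first_value) / window_seconds


WINDOW = 5 * 60  # the [5m] range used in the alert expressions

# Made-up samples: the error counter went from 100 to 145 in five minutes.
error_rate = per_second_rate(100, 145, WINDOW)
alert_fires = error_rate > 0.1
```

Here 45 errors over 300 seconds is 0.15 errors per second, so the alert would fire. Put differently, a threshold of 0.1 corresponds to more than 30 failed actions in a five-minute window for a single action series.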

The exporter binds to 127.0.0.1 by default. To expose it to a scraper on another host, pass host="0.0.0.0" and put it behind a firewall or auth proxy — there is no authentication on /metrics.

Metrics include action names but no payload data, so leaking /metrics to an external scraper is low-risk on its own. Trace spans may carry the first 120 chars of agent goals and tool arguments — review your :func:`traced` call sites before sending traces to a third-party SaaS.
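One way to act on that review is to scrub attribute values before they reach a span. The helper below is a hypothetical sketch, not an AutoControl API: the key list is an assumption you would adapt, and the 120-character limit simply mirrors the figure mentioned above.

```python
# Hypothetical list of attribute keys that should never leave the host.
SENSITIVE_KEYS = frozenset({"password", "token", "api_key"})


def scrub_attributes(attributes, limit=120):
    """Return a copy with sensitive keys redacted and long values cut."""
    scrubbed = {}
    for key, value in attributes.items():
        if key.lower() in SENSITIVE_KEYS:
            scrubbed[key] = "[redacted]"
        else:
            # Convert to text so the length limit applies uniformly.
            scrubbed[key] = str(value)[:limit]
    return scrubbed


clean = scrub_attributes({
    "region": "header",
    "token": "s3cr3t",
    "goal": "x" * 500,
})
```

The resulting dictionary can then be passed to `span.set_attribute` one key at a time inside a manual span.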