53 commits
c1ab17f  Initial plan (Copilot, Jan 27, 2026)
e767a4c  Implement cache analytics and observability framework (Copilot, Jan 27, 2026)
b1eaa4a  Add metrics documentation and fix linting issues (Copilot, Jan 27, 2026)
bbd24f2  Add comprehensive implementation documentation (Copilot, Jan 27, 2026)
2eb9f00  Merge branch 'master' into copilot/add-cache-analytics-framework (Borda, Jan 27, 2026)
769da0d  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Jan 27, 2026)
797e95f  Add `assert` to ensure `start_time` is not `None` before latency reco… (Borda, Jan 27, 2026)
6beb71c  Update README.rst (Borda, Jan 27, 2026)
3058526  Update examples/metrics_example.py (Borda, Jan 27, 2026)
070a585  Update src/cachier/metrics.py (Borda, Jan 27, 2026)
dd53b16  Address PR review feedback - complete implementation (Copilot, Jan 27, 2026)
c73c838  Address remaining PR review feedback (Copilot, Jan 27, 2026)
f2948b4  Merge branch 'master' into copilot/add-cache-analytics-framework (Borda, Jan 27, 2026)
6f82691  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Jan 27, 2026)
4f69bce  Merge branch 'master' into copilot/add-cache-analytics-framework (Borda, Jan 29, 2026)
8b4da10  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Jan 29, 2026)
bf77008  Merge branch 'master' into copilot/add-cache-analytics-framework (Borda, Jan 30, 2026)
c6aef7e  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Jan 30, 2026)
ea89041  Apply suggestions from code review (Borda, Jan 30, 2026)
fad7009  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Jan 30, 2026)
1edaf30  Address PR review feedback - code quality improvements (Copilot, Jan 30, 2026)
93be090  Refactor metrics example to use single formatted print statement (Copilot, Jan 30, 2026)
5802211  Consolidate prometheus metric headers and fix imports (Copilot, Jan 30, 2026)
b1a1878  Merge branch 'master' into copilot/add-cache-analytics-framework (Borda, Mar 6, 2026)
dad326d  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Mar 6, 2026)
c40189a  Align S3 backend with metrics framework (Copilot, Mar 6, 2026)
db49238  Merge branch 'master' into copilot/add-cache-analytics-framework (Borda, Mar 16, 2026)
244c42a  Refactor Prometheus exporter to use `_get_func_metrics` helper for cl… (Borda, Mar 16, 2026)
586e3fb  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Mar 16, 2026)
2d67baa  Update linters' configurations and clean up docstring conventions (Borda, Mar 16, 2026)
42b38c7  Merge branch 'copilot/add-cache-analytics-framework' of https://githu… (Borda, Mar 16, 2026)
ab141a3  Update linters' configurations and clean up docstring conventions (Borda, Mar 16, 2026)
aec53fe  Fix metrics framework: async instrumentation, Prometheus consistency,… (Borda, Mar 16, 2026)
007212b  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Mar 16, 2026)
46519e0  Achieve 100% coverage on metrics and exporters modules (Borda, Mar 16, 2026)
7a45179  Refactor metrics examples: modularize examples into functions and add… (Borda, Mar 16, 2026)
002b105  Refactor Prometheus exporter and cache metrics framework (Borda, Mar 16, 2026)
3d16227  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Mar 16, 2026)
5735408  Refactor: compact Prometheus client imports and docstrings in metrics… (Borda, Mar 16, 2026)
de75877  Merge branch 'copilot/add-cache-analytics-framework' of https://githu… (Borda, Mar 16, 2026)
0a367cd  Refactor: prefix `set_entry` and `aset_entry` with `_` across all cor… (Borda, Mar 16, 2026)
bdb9059  Refactor: replace `set_entry` with `_set_entry` in async methods acro… (Borda, Mar 16, 2026)
265b844  Refactor: rename `MetricsContext` variable to `_mctx` for consistent … (Borda, Mar 16, 2026)
6a2c6c0  Refactor: update monkeypatching to reflect `_set_entry` and `_aset_en… (Borda, Mar 16, 2026)
3fb8990  Apply suggestions from code review (Borda, Mar 16, 2026)
a154d7f  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Mar 16, 2026)
76e64b0  Refactor: simplify cutoff calculation in metrics using ternary operator (Borda, Mar 16, 2026)
3158564  Refactor: rename `set_entry` to `_set_entry`, refine size-limit logic… (Borda, Mar 16, 2026)
bc49e19  Add tests for metrics: validate `entry_count` and `total_size_bytes` … (Borda, Mar 16, 2026)
a30fb90  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Mar 16, 2026)
8211f2d  Add tests for `_BaseCore`: metric hooks default values and timeout be… (Borda, Mar 16, 2026)
3070f4f  Add tests for metrics: refactor sampling rate tests and add Prometheu… (Borda, Mar 16, 2026)
75d22b3  Remove outdated tests for overwrite/skip cache and Prometheus exporte… (Borda, Mar 16, 2026)
97 changes: 97 additions & 0 deletions README.rst
@@ -63,6 +63,7 @@ Current features

* Thread-safety.
* **Per-call max age:** Specify a maximum age for cached values per call.
* **Cache analytics and observability:** Track cache performance metrics including hit rates, latencies, and more.

Cachier is **NOT**:

@@ -325,6 +326,102 @@ Cache `None` Values
By default, ``cachier`` does not cache ``None`` values. You can override this behaviour by passing ``allow_none=True`` to the function call.


Cache Analytics and Observability
==================================

Cachier provides built-in metrics collection to monitor cache performance in production environments. This feature is particularly useful for understanding cache effectiveness, identifying optimization opportunities, and debugging performance issues.

Enabling Metrics
----------------

Enable metrics by setting ``enable_metrics=True`` when decorating a function:

.. code-block:: python

from cachier import cachier

@cachier(backend='memory', enable_metrics=True)
def expensive_operation(x):
return x ** 2

# Access metrics
stats = expensive_operation.metrics.get_stats()
print(f"Hit rate: {stats.hit_rate}%")
print(f"Avg latency: {stats.avg_latency_ms}ms")

Tracked Metrics
---------------

The metrics system tracks:

* **Cache hits and misses**: Number of cache hits/misses and hit rate percentage
* **Operation latencies**: Average time for cache operations
* **Stale cache hits**: Number of times stale cache entries were accessed
* **Recalculations**: Count of cache recalculations triggered
* **Wait timeouts**: Timeouts during concurrent calculation waits
* **Size limit rejections**: Entries rejected due to ``entry_size_limit``
* **Cache size (memory backend only)**: Number of entries and total size in bytes for the in-memory cache core
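
The fields above correspond to what a ``get_stats()`` snapshot exposes. As a rough mental model only (this dataclass is an illustrative sketch, not Cachier's actual stats class), the snapshot and its derived ``hit_rate`` behave roughly like:

```python
from dataclasses import dataclass


@dataclass
class StatsSketch:
    """Illustrative stand-in for a metrics snapshot (hypothetical class)."""

    hits: int = 0
    misses: int = 0
    stale_hits: int = 0
    recalculations: int = 0
    wait_timeouts: int = 0
    size_limit_rejections: int = 0
    entry_count: int = 0  # memory backend only
    total_size_bytes: int = 0  # memory backend only

    @property
    def total_calls(self) -> int:
        return self.hits + self.misses

    @property
    def hit_rate(self) -> float:
        # Percentage of recorded lookups served from cache; 0.0 when no calls.
        return 100.0 * self.hits / self.total_calls if self.total_calls else 0.0


snap = StatsSketch(hits=3, misses=1)
print(snap.hit_rate)  # 75.0
```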

Sampling Rate
-------------

For high-traffic functions, you can reduce overhead by sampling a fraction of operations:

.. code-block:: python

@cachier(enable_metrics=True, metrics_sampling_rate=0.1) # Sample 10% of calls
def high_traffic_function(x):
return x * 2
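
Conceptually, sampling gates each metrics write behind a random draw, so recorded counters estimate the true counts at roughly ``recorded / rate``. A stdlib-only sketch of the idea (not Cachier's internal implementation):

```python
import random


def record_if_sampled(counters: dict, event: str, rate: float) -> None:
    """Record `event` for roughly a `rate` fraction of calls."""
    if random.random() < rate:
        counters[event] = counters.get(event, 0) + 1


random.seed(0)  # make the demo repeatable
counters: dict = {}
for _ in range(10_000):
    record_if_sampled(counters, "call", 0.1)

# Scale the sampled counter back up to estimate the true call volume.
estimated_calls = counters["call"] / 0.1
print(counters["call"], int(estimated_calls))
```

The trade-off is statistical: sampled counters wander around the true fraction, so rates stay representative while absolute counts become estimates.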

Exporting to Prometheus
------------------------

Export metrics to Prometheus for monitoring and alerting:

.. code-block:: python

from cachier import cachier
from cachier.exporters import PrometheusExporter

@cachier(backend='redis', enable_metrics=True)
def my_operation(x):
return x ** 2

# Set up Prometheus exporter
# use_prometheus_client controls whether metrics are exposed via the prometheus_client
# registry (True) or via Cachier's own HTTP handler (False). In both modes, metrics for
# registered functions are collected live at scrape time.
exporter = PrometheusExporter(port=9090, use_prometheus_client=True)
exporter.register_function(my_operation)
exporter.start()

# Metrics available at http://localhost:9090/metrics

The exporter emits standard Prometheus text format in both modes, so either endpoint can be scraped by a stock Prometheus server. With ``use_prometheus_client=True``, Cachier registers a custom collector with ``prometheus_client`` that pulls live statistics from registered functions at scrape time, so scraped values always reflect the cache's current state. With ``use_prometheus_client=False``, Cachier serves the same metrics from its own HTTP handler, with no ``prometheus_client`` dependency required.
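
Whichever mode serves the endpoint, the payload follows the Prometheus text exposition format, so a scrape can be sanity-checked with plain stdlib code. A minimal parser sketch (the metric names in the sample payload are hypothetical placeholders, not names Cachier guarantees):

```python
def parse_prometheus_text(payload: str) -> dict:
    """Map `name{labels}` keys to float values, skipping comment lines."""
    samples = {}
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # HELP/TYPE lines are comments
            continue
        key, _, value = line.rpartition(" ")
        samples[key] = float(value)
    return samples


# Shaped like a /metrics scrape; the metric names here are illustrative only.
payload = """\
# HELP cachier_hits_total Cache hits per function
# TYPE cachier_hits_total counter
cachier_hits_total{function="my_operation"} 3
cachier_hit_rate{function="my_operation"} 75.0
"""
samples = parse_prometheus_text(payload)
print(samples['cachier_hit_rate{function="my_operation"}'])  # 75.0
```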

Programmatic Access
-------------------

Access metrics programmatically for custom monitoring:

.. code-block:: python

stats = my_function.metrics.get_stats()

if stats.hit_rate < 70.0:
        print(f"Warning: Cache hit rate is {stats.hit_rate}%")
        print("Consider increasing cache size or adjusting stale_after")

Reset Metrics
-------------

Clear collected metrics:

.. code-block:: python

my_function.metrics.reset()
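
One common use for ``reset()`` is interval-based reporting: read a snapshot, reset the counters, and treat the next snapshot as a fresh window, so each reported hit rate covers only its own interval. A sketch of that per-window arithmetic over plain counters (stand-ins for real snapshots):

```python
def window_hit_rate(hits: int, misses: int) -> float:
    """Hit rate (%) for one reporting window; 0.0 for an empty window."""
    total = hits + misses
    return 100.0 * hits / total if total else 0.0


# Window 1 accumulates, then metrics would be reset before window 2 begins,
# so the second rate reflects only the second interval.
print(window_hit_rate(90, 10))  # 90.0
print(window_hit_rate(2, 8))  # 20.0
```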


Cachier Cores
=============

231 changes: 231 additions & 0 deletions examples/metrics_example.py
@@ -0,0 +1,231 @@
"""Demonstration of cachier's metrics and observability features."""

import time
from datetime import timedelta

from cachier import cachier


def demo_basic_metrics_tracking():
"""Demonstrate basic metrics tracking."""
print("=" * 60)
print("Example 1: Basic Metrics Tracking")
print("=" * 60)

@cachier(backend="memory", enable_metrics=True)
def expensive_operation(x):
"""Simulate an expensive computation."""
time.sleep(0.1) # Simulate work
return x**2

expensive_operation.clear_cache()

# First call - cache miss
print("\nFirst call (cache miss):")
result1 = expensive_operation(5)
print(f" Result: {result1}")

stats = expensive_operation.metrics.get_stats()
print(f" Hits: {stats.hits}, Misses: {stats.misses}")
print(f" Hit rate: {stats.hit_rate:.1f}%")
print(f" Avg latency: {stats.avg_latency_ms:.2f}ms")

# Second call - cache hit
print("\nSecond call (cache hit):")
result2 = expensive_operation(5)
print(f" Result: {result2}")

stats = expensive_operation.metrics.get_stats()
print(f" Hits: {stats.hits}, Misses: {stats.misses}")
print(f" Hit rate: {stats.hit_rate:.1f}%")
print(f" Avg latency: {stats.avg_latency_ms:.2f}ms")

# Third call with different argument - cache miss
print("\nThird call with different argument (cache miss):")
result3 = expensive_operation(10)
print(f" Result: {result3}")

stats = expensive_operation.metrics.get_stats()
print(f" Hits: {stats.hits}, Misses: {stats.misses}")
print(f" Hit rate: {stats.hit_rate:.1f}%")
print(f" Avg latency: {stats.avg_latency_ms:.2f}ms")
print(f" Total calls: {stats.total_calls}")


def demo_stale_cache_tracking():
"""Demonstrate stale cache tracking."""
print("\n" + "=" * 60)
print("Example 2: Stale Cache Tracking")
print("=" * 60)

@cachier(
backend="memory",
enable_metrics=True,
stale_after=timedelta(seconds=1),
next_time=False,
)
def time_sensitive_operation(x):
"""Operation with stale_after configured."""
return x * 2

time_sensitive_operation.clear_cache()

# Initial call
print("\nInitial call:")
result = time_sensitive_operation(5)
print(f" Result: {result}")

# Call while fresh
print("\nCall while fresh (within 1 second):")
result = time_sensitive_operation(5)
print(f" Result: {result}")

# Wait for cache to become stale
print("\nWaiting for cache to become stale...")
time.sleep(1.5)

# Call after stale
print("Call after cache is stale:")
result = time_sensitive_operation(5)
print(f" Result: {result}")

stats = time_sensitive_operation.metrics.get_stats()
print("\nMetrics after stale access:")
print(f" Hits: {stats.hits}")
print(f" Stale hits: {stats.stale_hits}")
print(f" Recalculations: {stats.recalculations}")


def demo_metrics_sampling():
"""Demonstrate metrics sampling to reduce overhead."""
print("\n" + "=" * 60)
print("Example 3: Metrics Sampling (50% sampling rate)")
print("=" * 60)

@cachier(
backend="memory",
enable_metrics=True,
metrics_sampling_rate=0.5, # Only sample 50% of calls
)
def sampled_operation(x):
"""Operation with reduced metrics sampling."""
return x + 1

sampled_operation.clear_cache()

# Make many calls
print("\nMaking 100 calls with 10 unique arguments...")
for i in range(100):
sampled_operation(i % 10)

stats = sampled_operation.metrics.get_stats()
print("\nMetrics (with 50% sampling):")
print(f" Total calls recorded: {stats.total_calls}")
print(f" Hits: {stats.hits}")
print(f" Misses: {stats.misses}")
print(f" Hit rate: {stats.hit_rate:.1f}%")
print(" Note: Total calls < 100 due to sampling; hit rate is approximately representative of overall behavior.")


def demo_comprehensive_metrics():
"""Demonstrate a comprehensive metrics snapshot."""
print("\n" + "=" * 60)
print("Example 4: Comprehensive Metrics Snapshot")
print("=" * 60)

@cachier(backend="memory", enable_metrics=True, entry_size_limit="1KB")
def comprehensive_operation(x):
"""Operation to demonstrate all metrics."""
if x > 1000:
# Return large data to trigger size limit rejection
return "x" * 2000
return x * 2

comprehensive_operation.clear_cache()

# Generate various metric events
comprehensive_operation(5) # Miss + recalculation
comprehensive_operation(5) # Hit
comprehensive_operation(10) # Miss + recalculation
comprehensive_operation(2000) # Size limit rejection

stats = comprehensive_operation.metrics.get_stats()
print(
f"\nComplete metrics snapshot:\n"
f" Hits: {stats.hits}\n"
f" Misses: {stats.misses}\n"
f" Hit rate: {stats.hit_rate:.1f}%\n"
f" Total calls: {stats.total_calls}\n"
f" Avg latency: {stats.avg_latency_ms:.2f}ms\n"
f" Stale hits: {stats.stale_hits}\n"
f" Recalculations: {stats.recalculations}\n"
f" Wait timeouts: {stats.wait_timeouts}\n"
f" Size limit rejections: {stats.size_limit_rejections}\n"
f" Entry count: {stats.entry_count}\n"
f" Total size (bytes): {stats.total_size_bytes}"
)


def demo_programmatic_monitoring():
"""Demonstrate programmatic cache health monitoring."""
print("\n" + "=" * 60)
print("Example 5: Programmatic Monitoring")
print("=" * 60)

@cachier(backend="memory", enable_metrics=True)
def monitored_operation(x):
"""Operation being monitored."""
return x**3

monitored_operation.clear_cache()

def check_cache_health(func, threshold=80.0):
"""Check if cache hit rate meets threshold."""
stats = func.metrics.get_stats()
if stats.total_calls == 0:
return True, "No calls yet"

if stats.hit_rate >= threshold:
return True, f"Hit rate {stats.hit_rate:.1f}% meets threshold"
else:
return (
False,
f"Hit rate {stats.hit_rate:.1f}% below threshold {threshold}%",
)

# Simulate some usage
print("\nSimulating cache usage...")
for i in range(20):
monitored_operation(i % 5)

# Check health
is_healthy, message = check_cache_health(monitored_operation, threshold=70.0)
print("\nCache health check:")
    print(f"  Status: {'HEALTHY' if is_healthy else 'UNHEALTHY'}")
print(f" {message}")

stats = monitored_operation.metrics.get_stats()
print(f" Details: {stats.hits} hits, {stats.misses} misses")


def main():
"""Run all metrics demonstration examples."""
demo_basic_metrics_tracking()
demo_stale_cache_tracking()
demo_metrics_sampling()
demo_comprehensive_metrics()
demo_programmatic_monitoring()

print("\n" + "=" * 60)
print("Examples complete!")
print("=" * 60)
print("\nKey takeaways:")
print(" - Metrics are opt-in via enable_metrics=True")
print(" - Access metrics via function.metrics.get_stats()")
print(" - Sampling reduces overhead for high-traffic functions")
print(" - Metrics are thread-safe and backend-agnostic")
print(" - Use for production monitoring and optimization")


if __name__ == "__main__":
main()