Skip to content

Move delta record/collect coordination from instrument to series level#8313

Open
jack-berg wants to merge 5 commits intoopen-telemetry:mainfrom
jack-berg:delta-aggregator-handle-coordination-1
Open

Move delta record/collect coordination from instrument to series level#8313
jack-berg wants to merge 5 commits intoopen-telemetry:mainfrom
jack-berg:delta-aggregator-handle-coordination-1

Conversation

@jack-berg
Copy link
Copy Markdown
Member

Alternative to #8077. Where #8077 improves performance by striping the AtomicInteger used to coordinate between record and collect threads, this comes at the cost of single threaded performance because selecting the striped instance of AtomicInteger is additional work that is not necessary.

With this PR, I also increase the number of coordinator AtomicInteger instances to reduce contention, but do so by moving the AtomicInteger from the instrument level (i.e. DeltaSynchronoousMetricStorage) to timeseries level (i.e. AggregatorHandle).

  • Each timeseries get its own AtomicInteger
  • At record time, the record thread increments / decrements the timeseries' AtomicInteger instance
  • At collect time, the collect thread marks all timeseries AtomicInteger instances for collect, then waits for all active records to complete, then collects

This means that there's no additional work for the single threaded case, but callers recording against multiple series concurrently get to distribute the contention across as many AtomicInteger instances as there are series. You see this manifest in the benchmark diff in that single threaded cases don't change, and there are significant improvements in the threads=4, temporality=delta, cardinality=100 cases.

One downside of this approach is that multiple threads concurrently recording to a single timeseries still are bottlenecked by a single AtomicInteger. You see this manifest in the benchmark diff in that threads=4, temporality=delta, cardinality=1 cases don't improve.

Note, the benchmarks below incorporate the changes from #8308, which leads to much less variance/noise than previously.

Threads Temporality Cardinality Instrument 0_baseline (ops/s) 2_aggregator_handle_tracking_minamlist (ops/s) Δ ops/s Δ %
1 DELTA 1 COUNTER_SUM 139,309,882 126,772,929 -12,536,953 -9.0%
1 DELTA 1 UP_DOWN_COUNTER_SUM 138,953,746 133,244,856 -5,708,890 -4.1%
1 DELTA 1 GAUGE_LAST_VALUE 37,205,058 41,952,132 +4,747,074 +12.8%
1 DELTA 1 HISTOGRAM_EXPLICIT 76,174,648 72,036,559 -4,138,089 -5.4%
1 DELTA 1 HISTOGRAM_BASE2_EXPONENTIAL 43,956,180 41,842,933 -2,113,247 -4.8%
1 DELTA 100 COUNTER_SUM 88,390,625 101,415,374 +13,024,749 +14.7%
1 DELTA 100 UP_DOWN_COUNTER_SUM 100,682,536 104,228,731 +3,546,195 +3.5%
1 DELTA 100 GAUGE_LAST_VALUE 31,129,986 30,153,053 -976,933 -3.1%
1 DELTA 100 HISTOGRAM_EXPLICIT 79,728,720 71,241,352 -8,487,368 -10.6%
1 DELTA 100 HISTOGRAM_BASE2_EXPONENTIAL 41,119,222 42,712,082 +1,592,860 +3.9%
1 CUMULATIVE 1 COUNTER_SUM 156,207,611 156,051,490 -156,121 -0.1%
1 CUMULATIVE 1 UP_DOWN_COUNTER_SUM 164,723,673 164,464,216 -259,457 -0.2%
1 CUMULATIVE 1 GAUGE_LAST_VALUE 63,483,186 67,329,671 +3,846,485 +6.1%
1 CUMULATIVE 1 HISTOGRAM_EXPLICIT 102,708,149 102,593,029 -115,120 -0.1%
1 CUMULATIVE 1 HISTOGRAM_BASE2_EXPONENTIAL 46,786,722 47,702,621 +915,899 +2.0%
1 CUMULATIVE 100 COUNTER_SUM 93,234,874 92,660,085 -574,789 -0.6%
1 CUMULATIVE 100 UP_DOWN_COUNTER_SUM 102,773,896 93,401,286 -9,372,610 -9.1%
1 CUMULATIVE 100 GAUGE_LAST_VALUE 90,948,347 87,231,953 -3,716,394 -4.1%
1 CUMULATIVE 100 HISTOGRAM_EXPLICIT 78,204,809 77,987,516 -217,293 -0.3%
1 CUMULATIVE 100 HISTOGRAM_BASE2_EXPONENTIAL 44,678,570 44,894,584 +216,014 +0.5%
4 DELTA 1 COUNTER_SUM 20,959,707 16,469,448 -4,490,259 -21.4%
4 DELTA 1 UP_DOWN_COUNTER_SUM 20,991,865 16,210,248 -4,781,617 -22.8%
4 DELTA 1 GAUGE_LAST_VALUE 19,358,172 18,777,835 -580,337 -3.0%
4 DELTA 1 HISTOGRAM_EXPLICIT 11,868,012 11,091,512 -776,500 -6.5%
4 DELTA 1 HISTOGRAM_BASE2_EXPONENTIAL 10,236,804 10,606,743 +369,939 +3.6%
4 DELTA 100 COUNTER_SUM 24,494,401 73,211,762 +48,717,361 +198.9%
4 DELTA 100 UP_DOWN_COUNTER_SUM 24,520,580 67,246,731 +42,726,151 +174.2%
4 DELTA 100 GAUGE_LAST_VALUE 22,992,790 47,030,541 +24,037,751 +104.5%
4 DELTA 100 HISTOGRAM_EXPLICIT 23,253,146 56,965,510 +33,712,364 +145.0%
4 DELTA 100 HISTOGRAM_BASE2_EXPONENTIAL 21,707,199 57,301,938 +35,594,739 +164.0%
4 CUMULATIVE 1 COUNTER_SUM 79,758,391 75,378,143 -4,380,248 -5.5%
4 CUMULATIVE 1 UP_DOWN_COUNTER_SUM 73,199,378 73,274,059 +74,681 +0.1%
4 CUMULATIVE 1 GAUGE_LAST_VALUE 30,208,943 30,232,185 +23,242 +0.1%
4 CUMULATIVE 1 HISTOGRAM_EXPLICIT 18,515,768 18,585,553 +69,785 +0.4%
4 CUMULATIVE 1 HISTOGRAM_BASE2_EXPONENTIAL 18,665,724 16,009,460 -2,656,264 -14.2%
4 CUMULATIVE 100 COUNTER_SUM 108,295,650 99,551,354 -8,744,296 -8.1%
4 CUMULATIVE 100 UP_DOWN_COUNTER_SUM 107,571,455 101,067,120 -6,504,335 -6.0%
4 CUMULATIVE 100 GAUGE_LAST_VALUE 95,965,166 97,450,271 +1,485,105 +1.5%
4 CUMULATIVE 100 HISTOGRAM_EXPLICIT 78,957,306 76,630,707 -2,326,599 -2.9%
4 CUMULATIVE 100 HISTOGRAM_BASE2_EXPONENTIAL 73,614,813 73,852,367 +237,554 +0.3%

@jack-berg jack-berg requested a review from a team as a code owner April 21, 2026 20:37
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Level;

class CumulativeSynchronousMetricStorage<T extends PointData>
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this, the code paths for cumulative vs. delta are different enough that including them in the same file just makes everything harder to read / understand.

Separately, just look how simple the cumulative code paths are. With delta we jump through extra hoops required for record/collect coordination, and despite that still can't match the cumulative performance, which is just dirt simple.

Aggregation aggregation,
MemoryMode memoryMode,
InstrumentValueType instrumentValueType) {
for (int repetition = 0; repetition < 50; repetition++) {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: revert or make configurable before merging

Increasing the reps of this test really helps build confidence that experiments are correct.

A useful tool when iterating experiments, but increases the test runtime.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 98.19277% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.30%. Comparing base (6336abd) to head (ba71652).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
.../internal/state/DeltaSynchronousMetricStorage.java 98.37% 0 Missing and 2 partials ⚠️
...rnal/state/CumulativeSynchronousMetricStorage.java 97.61% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #8313      +/-   ##
============================================
+ Coverage     90.25%   90.30%   +0.04%     
- Complexity     7687     7724      +37     
============================================
  Files           850      852       +2     
  Lines         23198    23244      +46     
  Branches       2354     2359       +5     
============================================
+ Hits          20938    20990      +52     
+ Misses         1532     1530       -2     
+ Partials        728      724       -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant