Move delta record/collect coordination from instrument to series level#8313
Move delta record/collect coordination from instrument to series level#8313jack-berg wants to merge 5 commits intoopen-telemetry:mainfrom
Conversation
| import java.util.concurrent.ConcurrentHashMap; | ||
| import java.util.logging.Level; | ||
|
|
||
| class CumulativeSynchronousMetricStorage<T extends PointData> |
There was a problem hiding this comment.
With this, the code paths for cumulative vs. delta are different enough that including them in the same file just makes everything harder to read / understand.
Separately, just look how simple the cumulative code paths are. With delta we jump through extra hoops required for record/collect coordination, and despite that still can't match the cumulative performance, which is just dirt simple.
| Aggregation aggregation, | ||
| MemoryMode memoryMode, | ||
| InstrumentValueType instrumentValueType) { | ||
| for (int repetition = 0; repetition < 50; repetition++) { |
There was a problem hiding this comment.
TODO: revert or make configurable before merging
Increasing the reps of this test really helps build confidence that experiments are correct.
A useful tool when iterating experiments, but increases the test runtime.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #8313 +/- ##
============================================
+ Coverage 90.25% 90.30% +0.04%
- Complexity 7687 7724 +37
============================================
Files 850 852 +2
Lines 23198 23244 +46
Branches 2354 2359 +5
============================================
+ Hits 20938 20990 +52
+ Misses 1532 1530 -2
+ Partials 728 724 -4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Alternative to #8077. Where #8077 improves performance by striping the
AtomicIntegerused to coordinate between record and collect threads, this comes at the cost of single threaded performance because selecting the striped instance ofAtomicIntegeris additional work that is not necessary.With this PR, I also increase the number of coordinator
AtomicIntegerinstances to reduce contention, but do so by moving theAtomicIntegerfrom the instrument level (i.e. DeltaSynchronoousMetricStorage) to timeseries level (i.e. AggregatorHandle).This means that there's no additional work for the single threaded case, but callers recording against multiple series concurrently get to distribute the contention across as many AtomicInteger instances as there are series. You see this manifest in the benchmark diff in that single threaded cases don't change, and there are significant improvements in the threads=4, temporality=delta, cardinality=100 cases.
One downside of this approach is that multiple threads concurrently recording to a single timeseries still are bottlenecked by a single AtomicInteger. You see this manifest in the benchmark diff in that threads=4, temporality=delta, cardinality=1 cases don't improve.
Note, the benchmarks below incorporate the changes from #8308, which leads to much less variance/noise than previously.