Analysis performed by Claude; the data shows code size in bytes.

Update path (

| Method | EventPolicy | NoEventPolicy | Δ |
|---|---|---|---|
| AddOrUpdate | 897 | 802 | −95 |

Maintenance path (DoMaintenance → Evict → EventInliner.OnRemovedEvent)

| Method | EventPolicy | NoEventPolicy | Δ |
|---|---|---|---|
| AddOrUpdate (drives writes) | 977 | 889 | −88 |
| OnWrite (drains buffered removes) | 1,503 | 1,391 | −112 |
| EvictFromMain (eviction loop) | 1,653 | 1,545 | −108 |
| Evict (single-victim teardown) | 358 | 260 | −98 |
| EventPolicy.OnItemRemoved (standalone) | 112 | not emitted | −112 |
| Maintenance | 2,467 | 2,495 | +28 (layout) |
| DoMaintenance | 1,285 | 1,302 | +17 (layout) |
| AfterWrite, EvictFromWindow, OptimizePartitioning, TryScheduleDrain, ScheduleAfterWrite, AdmitCandidate | identical | identical | 0 |

Assembly evidence
In EventPolicy Evict (and OnWrite):
mov rcx, offset MT_BitFaster.Caching.ItemRemovedEventArgs<Int32, Int32>
call CORINFO_HELP_NEWSFAST ; allocate args
...
call qword ptr [7FFC...] ; EventPolicy.OnItemRemoved(Int32, Int32, ItemRemovedReason)
call qword ptr [r14+18] ; invoke delegate

In NoEventPolicy Evict (and OnWrite): zero matches for ItemRemovedEventArgs, OnItemRemoved, or eventPolicy field loads. The epilogue goes straight from evictedCount++ to ret.
Verdict
EventInliner.IsEnabled = typeof(E) == typeof(EventPolicy<K,V>) is folded as a JIT-time constant per generic instantiation, eliminating the event branch and everything inside it: oldValue capture, delegate field loads, null check, args allocation, invocation.
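
The folding described in the verdict can be illustrated with a minimal sketch. Every name below (`IRemovedPolicy`, `NotifyPolicy`, `SilentPolicy`, `RemovedInliner`, `TinyCore`) is hypothetical and chosen for illustration; this is not the BitFaster.Caching source. The sketch assumes value-type `K` and `V`, as in the `Int32, Int32` instantiation shown in the disassembly.

```csharp
using System;

// Hypothetical policy contract; not the library's actual interface.
public interface IRemovedPolicy<K, V>
{
    void OnItemRemoved(K key, V value);
}

// Struct policy that forwards removals to a subscriber, if one is set.
public struct NotifyPolicy<K, V> : IRemovedPolicy<K, V>
{
    public Action<K, V> Removed;

    public void OnItemRemoved(K key, V value) => Removed?.Invoke(key, value);
}

// Struct policy with no event support.
public struct SilentPolicy<K, V> : IRemovedPolicy<K, V>
{
    public void OnItemRemoved(K key, V value) { }
}

public static class RemovedInliner<K, V, P> where P : struct, IRemovedPolicy<K, V>
{
    // P is a struct, so each instantiation gets its own native code and the
    // JIT can evaluate this type comparison while compiling the caller.
    public static bool IsEnabled => typeof(P) == typeof(NotifyPolicy<K, V>);
}

public sealed class TinyCore<K, V, P> where P : struct, IRemovedPolicy<K, V>
{
    private P policy;
    public int EvictedCount;

    public TinyCore(P policy) => this.policy = policy;

    public void Evict(K key, V value)
    {
        if (RemovedInliner<K, V, P>.IsEnabled)
        {
            // Expected to be removed as dead code in the SilentPolicy
            // instantiation once IsEnabled folds to false.
            policy.OnItemRemoved(key, value);
        }

        EvictedCount++;
    }
}
```

Constructing `TinyCore<int, int, SilentPolicy<int, int>>` and `TinyCore<int, int, NotifyPolicy<int, int>>` gives two instantiations whose `Evict` bodies can be disassembled and compared in the same way as the tables above.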
What causes the layout difference to increase code size?

The JIT compiles the methods of ConcurrentLfuCore<...EventPolicy> and ConcurrentLfuCore<...NoEventPolicy> separately (struct generics → distinct codegen, no canonicalization). Even when the source is identical and no instructions are added or removed, the emitted byte count can drift by tens of bytes from second-order effects:

- Branch encoding. x86 has jmp short (2 bytes, −128 to +127 byte reach) vs jmp near (5 bytes). When the surrounding code shrinks because events were elided, some forward jumps that were near can switch to short, or vice versa. A single such flip is ±3 bytes.
- Register allocation differences. Registers r8–r15 require a 1-byte REX prefix; rax–rdi don't. Different live-range pressure between the two instantiations can shift one variable from rdi to r12, which silently grows every instruction touching it that did not already carry a REX prefix (for example, 32-bit operations).
- Basic-block reordering. The JIT orders blocks by edge weight / heuristics. Different inlining decisions in callees can change perceived hotness and reorder blocks, which changes which edges are fall-through vs. branch.
- Alignment padding. The JIT inserts NOPs ahead of loop heads (often 16-byte alignment). When earlier code shrinks, the loop head's natural address shifts, so the padding changes.
- Profile counter placement. If the disassembly comes from an instrumented tier, the JIT inserts profile counters (for example CORINFO_HELP_COUNTPROFILE32 calls), and their placement isn't bitwise identical across instantiations.
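
As a worked illustration of the branch-encoding effect, the sketch below computes the encoded size of a jump from its displacement. `BranchSize` is a hypothetical helper written for this note, not part of any library. The sizes are the standard x86-64 encodings: `EB cb` / `E9 cd` for jmp and `7x cb` / `0F 8x cd` for conditional jumps.

```csharp
public static class BranchSize
{
    // The displacement is measured from the end of the branch instruction
    // and must fit in a signed byte for the short form.
    public static bool FitsRel8(int displacement) =>
        displacement >= sbyte.MinValue && displacement <= sbyte.MaxValue;

    // Unconditional jump: short form is 2 bytes, near form is 5 bytes.
    public static int Jmp(int displacement) => FitsRel8(displacement) ? 2 : 5;

    // Conditional jump: short form is 2 bytes, near form is 6 bytes.
    public static int Jcc(int displacement) => FitsRel8(displacement) ? 2 : 6;
}
```

For example, a forward jmp over 130 bytes needs the 5-byte form; if eliding event code shrinks the skipped region to 120 bytes, the same jump fits the 2-byte form. The reverse happens when block reordering or padding pushes a target just out of short range.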
End-to-end latency: main vs this branch (NoEventPolicy)
[Benchmark charts: Before (main) and After (8d4ee12)]

It is not clear why the threadpool result became so fast as a one-off; it does not reproduce.

This PR implements events for `ConcurrentLfu` as a switchable policy. When disabled, all event code is fully elided at runtime by the JIT compiler.

Events are considered perf critical in `ConcurrentLfu` because the ItemRemoved logic is invoked as part of the maintenance cycle; this introduces overhead even when no events are registered. The Maintenance method latency determines cache throughput at the limit, so any overhead here is undesirable. Later, these events could be captured in a list and processed asynchronously via the scheduler.

In this implementation, calling `TryRemove` defers event processing to the maintenance cycle (thus `TryRemove` and policy-based eviction behave the same), whereas `TryUpdate` executes the event handler immediately when `TryUpdate` is called.

Based on this earlier PR: #727