# Knowledge Base Index

This index collects optimization techniques, experiment results, and lessons learned from AAE sessions, contributed by the community. Agents should scan it for entries relevant to their current problem, then read the linked files for details.
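
The lookup step can be automated. Below is a minimal sketch that assumes the index is stored as a Markdown table with the five columns shown below. `parse_index`, `find_entries`, and `INDEX_SAMPLE` are hypothetical names. The sample rows are copied from this index rather than read from the actual index file, whose name is not given here.

```python
"""Sketch: find knowledge-base entries that mention a keyword.

Assumes a Markdown table with the columns
| # | Title | Problem Domain | Key Technique | File |
The rows below are a hypothetical sample copied from the index; in practice
the index file would be read from disk instead.
"""

INDEX_SAMPLE = """\
| # | Title | Problem Domain | Key Technique | File |
|---|---|---|---|---|
| 003 | Arena allocation for JSON parsing | Parsing, memory allocation | Arena allocator to eliminate per-node heap allocation | 003-arena-allocation-json-parsing.md |
| 013 | SIMD vectorization for batch operations | Array math, distance computation | Explicit AVX2/SSE intrinsics for 4-16x element parallelism | 013-simd-vectorization-batch-ops.md |
| 019 | NumPy vectorization over Python loops | Array math, distance computation | Vectorized NumPy ops to bypass interpreter overhead | 019-numpy-vectorization-over-loops.md |
"""


def parse_index(markdown: str) -> list[dict[str, str]]:
    """Turn Markdown table rows into dicts keyed by the header cells."""
    rows = [line.strip() for line in markdown.splitlines() if line.strip().startswith("|")]
    header = [cell.strip() for cell in rows[0].strip("|").split("|")]
    entries = []
    for line in rows[2:]:  # skip the header row and the |---| separator row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        entries.append(dict(zip(header, cells)))
    return entries


def find_entries(entries: list[dict[str, str]], keyword: str) -> list[str]:
    """Return file names of entries whose title, domain, or technique mention keyword."""
    kw = keyword.lower()
    return [
        entry["File"]
        for entry in entries
        if any(kw in entry[col].lower() for col in ("Title", "Problem Domain", "Key Technique"))
    ]


if __name__ == "__main__":
    entries = parse_index(INDEX_SAMPLE)
    print(find_entries(entries, "distance computation"))
```

The match is a case-insensitive substring search. This is deliberately crude but works on the short, keyword-style cells this table uses.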

| # | Title | Problem Domain | Key Technique | File |
|---|---|---|---|---|
| 001 | Cache-friendly blocked recursive sorting | Sorting, large arrays | Blocked partitioning with cache-line-sized buffers | 001-blocked-recursive-sorting.md |
| 002 | Gradient accumulation as batch size proxy | Transformer training, limited VRAM | Gradient accumulation to simulate large batches | 002-batch-size-tuning-transformer.md |
| 003 | Arena allocation for JSON parsing | Parsing, memory allocation | Arena allocator to eliminate per-node heap allocation | 003-arena-allocation-json-parsing.md |
| 004 | SoA vs AoS for cache efficiency | Data layout, particle simulation | Structure of Arrays to maximize cache line utilization | 004-soa-vs-aos-cache-efficiency.md |
| 005 | Branchless programming | Array processing, partitioning | Arithmetic substitution for conditional branches | 005-branchless-programming.md |
| 006 | Small buffer optimization | String/container allocation | Inline buffer to avoid heap allocation for small objects | 006-small-buffer-optimization.md |
| 007 | Move semantics avoiding deep copies | Container transfers, pipelines | RVO/NRVO and std::move for O(1) ownership transfer | 007-move-semantics-avoiding-copies.md |
| 008 | constexpr compile-time computation | Lookup tables, constants | Compile-time evaluation to eliminate runtime initialization | 008-constexpr-compile-time-computation.md |
| 009 | False sharing avoidance | Multithreaded counters, parallel scaling | Cache line padding to prevent cross-core invalidation | 009-false-sharing-avoidance.md |
| 010 | PGO + LTO compiler optimizations | Whole-program optimization | Profile-guided and link-time optimization for 20%+ gains | 010-pgo-lto-compiler-optimizations.md |
| 011 | Memory-mapped I/O for large files | File processing, large datasets | mmap for zero-copy lazy file access | 011-mmap-large-file-processing.md |
| 012 | Open-addressing hash maps | Lookup-heavy workloads | Flat hash maps (absl/robin_hood) vs std::unordered_map | 012-open-addressing-hash-maps.md |
| 013 | SIMD vectorization for batch operations | Array math, distance computation | Explicit AVX2/SSE intrinsics for 4-16x element parallelism | 013-simd-vectorization-batch-ops.md |
| 014 | Loop tiling for matrix operations | Matrix multiply, cache thrashing | Blocking iteration into cache-resident tiles | 014-loop-tiling-matrix-operations.md |
| 015 | Compiler intrinsics for bit operations | Popcount, clz/ctz, Hamming distance | Hardware bit instructions via builtins/C++20 | 015-builtin-intrinsics-bit-operations.md |
| 016 | Reserve and preallocate containers | STL containers, bulk insertion | reserve() to eliminate reallocations and copies | 016-reserve-preallocate-containers.md |
| 017 | Hot-cold data splitting | Routing tables, large structs | Separate hot fields into dense array for cache density | 017-hot-cold-data-splitting.md |
| 018 | std::string_view avoiding copies | Text parsing, function parameters | Non-owning string references to eliminate allocations | 018-string-view-avoiding-copies.md |
| 019 | NumPy vectorization over Python loops | Array math, distance computation | Vectorized NumPy ops to bypass interpreter overhead | 019-numpy-vectorization-over-loops.md |
| 020 | `__slots__` for memory reduction | Object-heavy programs, graphs | Eliminate per-instance `__dict__` for 69% memory savings | 020-slots-memory-reduction.md |
| 021 | Generator expressions for memory efficiency | Data pipelines, streaming | Lazy evaluation with O(1) memory per pipeline stage | 021-generator-expressions-memory.md |
| 022 | Numba JIT for numerical computation | Monte Carlo, simulations | LLVM JIT compilation for C-speed Python loops | 022-numba-jit-numerical-computation.md |
| 023 | multiprocessing for CPU-bound work | Image processing, batch computation | OS processes to bypass GIL for true parallelism | 023-multiprocessing-cpu-bound.md |
| 024 | dict/set O(1) lookup vs list search | Membership testing, filtering | Hash-based containers for constant-time lookups | 024-dict-set-lookup-vs-list.md |
| 025 | Memory-mapped files in Python | Large file processing, log search | mmap/np.memmap for lazy file access without full load | 025-mmap-large-datasets-python.md |
| 026 | Local variable caching | Hot loops, attribute access | Cache globals/attributes as locals for faster bytecode | 026-local-variable-caching.md |
| 027 | itertools for lazy pipelines | Data processing, combinatorics | C-speed lazy iterators for memory-efficient pipelines | 027-itertools-lazy-pipelines.md |
| 028 | struct module for binary data | Network protocols, file formats | Pack/unpack binary records without object overhead | 028-struct-binary-data-packing.md |
| 029 | Preallocation patterns | List/array construction | Preallocate at final size to avoid O(N²) resizing | 029-preallocation-patterns.md |
| 030 | String join vs + concatenation | String building, log formatting | join() for O(N) string assembly vs O(N²) += | 030-string-join-vs-concatenation.md |
| 031 | deque vs list for queue operations | BFS, FIFO queues | collections.deque for O(1) popleft vs list O(N) | 031-deque-vs-list-queue-ops.md |
| 032 | array module for typed numerical data | Compact storage, binary I/O | array.array for 72% memory reduction vs list | 032-array-module-typed-data.md |
| 033 | Cython for C-speed hot loops | Custom metrics, graph traversal | Typed Cython with memoryviews for 95x speedup | 033-cython-c-speed-hot-loops.md |
| 034 | Optimizing label propagation in graph clustering | Multilevel graph clustering, LP refinement | Dense vectors, counting-sort contraction, sweep specialization, allocation elimination | 034-graph-clustering-lp-refinement.md |
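
The sketch below shows how three of the Python entries translate into code: 024 (set membership), 030 (`str.join`), and 031 (`deque` as a FIFO queue). The function names `filter_known`, `format_log`, and `bfs_order` are hypothetical. The sketch only illustrates the idioms. Measurements and caveats live in the linked entry files.

```python
"""Illustrative sketch of index entries 024, 030, and 031 (hypothetical helpers)."""
from collections import deque


def filter_known(items, allowed):
    # 024: build the set once; each membership test is then O(1) on average
    # instead of an O(N) scan of a list.
    allowed_set = set(allowed)
    return [x for x in items if x in allowed_set]


def format_log(lines):
    # 030: join() assembles the result in one pass; repeated += on str may
    # copy the growing string on every iteration.
    return "\n".join(f"[log] {line}" for line in lines)


def bfs_order(graph, start):
    # 031: deque.popleft() is O(1); list.pop(0) shifts every remaining element.
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order
```

Each helper keeps the same interface it would have with the slower idiom. Swapping list for set, `+=` for `join()`, or `list.pop(0)` for `deque.popleft()` changes only the cost, not the result.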