Knowledge Base Index

Optimization techniques, experiment results, and lessons learned from AAE sessions contributed by the community. Agents should read this index to find entries relevant to their current problem, then read the linked files for details.

#	Title	Problem Domain	Key Technique	File
001	Cache-friendly blocked recursive sorting	Sorting, large arrays	Blocked partitioning with cache-line-sized buffers	001-blocked-recursive-sorting.md
002	Gradient accumulation as batch size proxy	Transformer training, limited VRAM	Gradient accumulation to simulate large batches	002-batch-size-tuning-transformer.md
003	Arena allocation for JSON parsing	Parsing, memory allocation	Arena allocator to eliminate per-node heap allocation	003-arena-allocation-json-parsing.md
004	SoA vs AoS for cache efficiency	Data layout, particle simulation	Structure of Arrays to maximize cache line utilization	004-soa-vs-aos-cache-efficiency.md
005	Branchless programming	Array processing, partitioning	Arithmetic substitution for conditional branches	005-branchless-programming.md
006	Small buffer optimization	String/container allocation	Inline buffer to avoid heap allocation for small objects	006-small-buffer-optimization.md
007	Move semantics avoiding deep copies	Container transfers, pipelines	RVO/NRVO and std::move for O(1) ownership transfer	007-move-semantics-avoiding-copies.md
008	constexpr compile-time computation	Lookup tables, constants	Compile-time evaluation to eliminate runtime initialization	008-constexpr-compile-time-computation.md
009	False sharing avoidance	Multithreaded counters, parallel scaling	Cache line padding to prevent cross-core invalidation	009-false-sharing-avoidance.md
010	PGO + LTO compiler optimizations	Whole-program optimization	Profile-guided and link-time optimization for 20%+ gains	010-pgo-lto-compiler-optimizations.md
011	Memory-mapped I/O for large files	File processing, large datasets	mmap for zero-copy lazy file access	011-mmap-large-file-processing.md
012	Open-addressing hash maps	Lookup-heavy workloads	Flat hash maps (absl/robin_hood) vs std::unordered_map	012-open-addressing-hash-maps.md
013	SIMD vectorization for batch operations	Array math, distance computation	Explicit AVX2/SSE intrinsics for 4-16x element parallelism	013-simd-vectorization-batch-ops.md
014	Loop tiling for matrix operations	Matrix multiply, cache thrashing	Blocking iteration into cache-resident tiles	014-loop-tiling-matrix-operations.md
015	Compiler intrinsics for bit operations	Popcount, clz/ctz, Hamming distance	Hardware bit instructions via builtins/C++20	015-builtin-intrinsics-bit-operations.md
016	Reserve and preallocate containers	STL containers, bulk insertion	reserve() to eliminate reallocations and copies	016-reserve-preallocate-containers.md
017	Hot-cold data splitting	Routing tables, large structs	Separate hot fields into dense array for cache density	017-hot-cold-data-splitting.md
018	std::string_view avoiding copies	Text parsing, function parameters	Non-owning string references to eliminate allocations	018-string-view-avoiding-copies.md
019	NumPy vectorization over Python loops	Array math, distance computation	Vectorized NumPy ops to bypass interpreter overhead	019-numpy-vectorization-over-loops.md
020	slots for memory reduction	Object-heavy programs, graphs	Eliminate per-instance dict for 69% memory savings	020-slots-memory-reduction.md
021	Generator expressions for memory efficiency	Data pipelines, streaming	Lazy evaluation with O(1) memory per pipeline stage	021-generator-expressions-memory.md
022	Numba JIT for numerical computation	Monte Carlo, simulations	LLVM JIT compilation for C-speed Python loops	022-numba-jit-numerical-computation.md
023	multiprocessing for CPU-bound work	Image processing, batch computation	OS processes to bypass GIL for true parallelism	023-multiprocessing-cpu-bound.md
024	dict/set O(1) lookup vs list search	Membership testing, filtering	Hash-based containers for constant-time lookups	024-dict-set-lookup-vs-list.md
025	Memory-mapped files in Python	Large file processing, log search	mmap/np.memmap for lazy file access without full load	025-mmap-large-datasets-python.md
026	Local variable caching	Hot loops, attribute access	Cache globals/attributes as locals for faster bytecode	026-local-variable-caching.md
027	itertools for lazy pipelines	Data processing, combinatorics	C-speed lazy iterators for memory-efficient pipelines	027-itertools-lazy-pipelines.md
028	struct module for binary data	Network protocols, file formats	Pack/unpack binary records without object overhead	028-struct-binary-data-packing.md
029	Preallocation patterns	List/array construction	Preallocate at final size to avoid O(N²) resizing	029-preallocation-patterns.md
030	String join vs + concatenation	String building, log formatting	join() for O(N) string assembly vs O(N²) +=	030-string-join-vs-concatenation.md
031	deque vs list for queue operations	BFS, FIFO queues	collections.deque for O(1) popleft vs list O(N)	031-deque-vs-list-queue-ops.md
032	array module for typed numerical data	Compact storage, binary I/O	array.array for 72% memory reduction vs list	032-array-module-typed-data.md
033	Cython for C-speed hot loops	Custom metrics, graph traversal	Typed Cython with memoryviews for 95x speedup	033-cython-c-speed-hot-loops.md
034	Optimizing label propagation in graph clustering	Multilevel graph clustering, LP refinement	Dense vectors, counting-sort contraction, sweep specialization, allocation elimination	034-graph-clustering-lp-refinement.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Knowledge Base Index

FilesExpand file tree

INDEX.md

Latest commit

History

INDEX.md

File metadata and controls

Knowledge Base Index