Python code that builds large lists or arrays incrementally using .append() in a loop. The metric was construction time and peak memory. Python lists over-allocate by ~12.5% on growth and reallocate+copy on overflow. For NumPy arrays, calling np.append in a loop is O(N²) because it copies the entire array each time.
Preallocating the output container at its final size and filling it by index. This eliminates all intermediate reallocations and copies.
For building a 10M-element list, preallocation was 1.4x faster. For building a 10M-element NumPy array, preallocation was 150x faster (avoiding O(N²) copies from repeated np.append).
| Pattern | List 10M (ms) | ndarray 10M (ms) |
|---|---|---|
.append() loop |
680 | 75,000 (np.append) |
| Preallocated + index fill | 480 | 500 (np.empty + fill) |
| List comprehension | 420 | — |
import numpy as np
# === Lists ===
# OK but slow: append with dynamic growth
result = []
for i in range(10_000_000):
result.append(compute(i))
# BETTER: list comprehension (internally preallocates)
result = [compute(i) for i in range(10_000_000)]
# FAST: preallocate with known size
result = [None] * 10_000_000
for i in range(10_000_000):
result[i] = compute(i)
# === NumPy Arrays ===
# TERRIBLE: O(N²) — copies entire array on each append
arr = np.array([])
for i in range(10_000_000):
arr = np.append(arr, compute(i)) # copies all of arr every time!
# GOOD: preallocate and fill
arr = np.empty(10_000_000, dtype=np.float64)
for i in range(10_000_000):
arr[i] = compute(i)
# BEST: vectorize (when possible)
arr = compute_vectorized(np.arange(10_000_000))
# When final size is unknown: collect in list, convert once
parts = [compute(i) for i in range(n)]
arr = np.array(parts) # single allocation + copynp.zerosvsnp.empty:np.zerosinitializes memory to zero, adding overhead. Usenp.emptywhen every element will be written before being read. But beware:np.emptycontains garbage values — reading before writing is a bug.- List multiplication for mutable objects:
[[] for _ in range(N)]is correct;[[]] * Ncreates N references to the same list. Only use multiplication for immutable values.
Python 3.8+, NumPy 1.20+. List comprehensions are the idiomatic Python way; prefer them over manual preallocation for readability unless the loop body is complex.