Python programs serializing/deserializing large volumes of structured binary data (network packets, file formats, sensor readings) using Python objects. The metric was throughput (records per second). Converting between Python objects and binary formats using ad-hoc byte manipulation or JSON/pickle is slow and memory-heavy.
Using Python's struct module to pack/unpack fixed-format binary data directly. struct.pack/unpack translate between Python values and C-style binary representations in a single call, with minimal overhead.
For parsing 10M binary records (each: uint32 timestamp, float64 value, uint8 flags = 13 bytes), struct.unpack was 8.5x faster than equivalent manual byte slicing and 45x faster than JSON deserialization of the same data.
Key patterns:
struct.Structpre-compiled format: Compile the format string once, reuse theStructobject for repeated pack/unpack — avoids re-parsing the format string.struct.iter_unpack(Python 3.4+): Iterates over a buffer, unpacking repeated structures without Python loop overhead.
| Method | Records/sec (M) | Memory/record |
|---|---|---|
| JSON parse | 0.22 | ~400 bytes |
| Manual byte slicing | 1.18 | ~200 bytes |
| struct.unpack (per-record) | 4.80 | 13 bytes |
| struct.Struct.iter_unpack | 10.05 | 13 bytes |
import struct
# Define format once (uint32, float64, uint8)
record_fmt = struct.Struct('<I d B') # 13 bytes per record, little-endian
# Pack: Python values → bytes
data = record_fmt.pack(1679500000, 98.6, 0x01)
# Unpack single record
timestamp, value, flags = record_fmt.unpack(data)
# Bulk unpack: iterate over a large buffer with zero Python loop overhead
buffer = read_binary_file('sensor_data.bin') # 130MB = 10M records
for timestamp, value, flags in record_fmt.iter_unpack(buffer):
process(timestamp, value, flags)
# Even faster: unpack entire buffer into NumPy array
import numpy as np
dt = np.dtype([('timestamp', '<u4'), ('value', '<f8'), ('flags', '<u1')])
records = np.frombuffer(buffer, dtype=dt) # zero-copy view, instantPython 3.4+ (for iter_unpack). struct is a stdlib C extension. For maximum performance on large homogeneous data, np.frombuffer with a custom dtype is even faster than struct because it creates a zero-copy view.