Add compression to replicator #6013
Can you please use our PR template?
Nice work @lacklacklack!
A few suggestions / ideas as bullet points:
- We're trying to do a bit of both the request and response sides; let's keep it simpler at first and focus on just the request side (_bulk_docs and _revs_diff). Since we already have server-side handling for gzip, let's handle gzip only on the request side. Then we can cleanly test the feature in CI without extra proxies or having to add server-side response compression here too.
- Let's skip `deflate` and use gzip only. But add the possibility to add `zstd` in the future (not in this PR though), so keep the compression algorithm configurable. The reasons to skip deflate are simplicity (our server handles gzip) and that deflate is a bit of a mess according to https://en.wikipedia.org/wiki/HTTP_compression:
  > Another problem found while deploying HTTP compression on large scale is due to the deflate encoding definition: while HTTP 1.1 defines the deflate encoding as data compressed with deflate (RFC 1951) inside a zlib formatted stream (RFC 1950), Microsoft server and client products historically implemented it as a "raw" deflated stream making its deployment unreliable. For this reason, some software, including the Apache HTTP Server, only implements gzip encoding.
- As it stands, the most important thing to compress (the _bulk_docs body) won't actually be compressed. The body there isn't an iolist or binary but `{BodyFun, [prefix | Docs]}`. That makes me think a better place for this is not in httpc but in api_wrap.
- Do not set `AcceptEncodings = config:get("replicator", "accept_encodings", "gzip, deflate, zstd")` unless we can always handle these responses and decompress them. If the server then sends us zstd data and we're on OTP 27, we won't be able to handle it and the request will fail. For this PR let's just skip setting that altogether.
- Do not enable gzip compression by default. Since that is not a negotiated setting, if the replicator were talking to an older CouchDB or another server that doesn't implement gzip decompression, we'd break a customer's setup as soon as they upgrade.
- Don't forget to fill out the template like Jan suggested.
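To illustrate the `{BodyFun, Acc}` point, here is a rough sketch of how a streaming body could be wrapped so that it yields gzip data chunk by chunk. The module name `gzip_body_sketch` and the functions `gzip_body/1` and `drain/1` are made up for illustration and are not part of the PR. It assumes the body fun contract is `BodyFun(Acc) -> {ok, Data, NewAcc} | eof`; only the standard `zlib` module is used.

```erlang
-module(gzip_body_sketch).
-export([gzip_body/1, drain/1]).

%% Hypothetical helper: wrap a {BodyFun, Acc} streaming body so that it
%% yields gzip-compressed chunks instead of the raw ones.
%% Assumed contract: BodyFun(Acc) returns {ok, Data, NewAcc} or eof.
gzip_body({BodyFun, Acc}) when is_function(BodyFun, 1) ->
    {fun next/1, {init, BodyFun, Acc}}.

%% The zlib stream is opened lazily on the first call, so it belongs to
%% whichever process actually drives the body fun.
next({init, BodyFun, Acc}) ->
    Z = zlib:open(),
    %% WindowBits 16 + 15 selects the gzip container rather than zlib framing.
    ok = zlib:deflateInit(Z, default, deflated, 16 + 15, 8, default),
    next({Z, BodyFun, Acc});
next(done) ->
    eof;
next({Z, BodyFun, Acc}) ->
    case BodyFun(Acc) of
        {ok, Data, Acc1} ->
            Out = zlib:deflate(Z, Data),
            case iolist_size(Out) of
                %% zlib may buffer small inputs; skip empty chunks.
                0 -> next({Z, BodyFun, Acc1});
                _ -> {ok, Out, {Z, BodyFun, Acc1}}
            end;
        eof ->
            %% Flush the remaining data and the gzip trailer, then clean up.
            Last = zlib:deflate(Z, [], finish),
            ok = zlib:deflateEnd(Z),
            ok = zlib:close(Z),
            {ok, Last, done}
    end.

%% Demo helper: run a streaming body to completion and collect its output.
drain({Fun, Acc}) ->
    case Fun(Acc) of
        {ok, Data, Acc1} -> [Data | drain({Fun, Acc1})];
        eof -> []
    end.
```

One thing to keep in mind with this approach: the compressed length isn't known up front, so any precomputed Content-Length for the uncompressed body would no longer be valid.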
Adds configurable HTTP compression (gzip/deflate) to reduce bandwidth during replication.
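A minimal sketch of what the configurable part might look like if it is narrowed to an opt-in, gzip-only request encoding. The module `request_encoding_sketch`, the functions `parse/1` and `encode/3`, and the idea of an unset value meaning "no compression" are assumptions for illustration, not the PR's actual code. A future `zstd` clause could be added next to `gzip` without changing callers.

```erlang
-module(request_encoding_sketch).
-export([parse/1, encode/3]).

%% Map a raw config value to an encoding atom.
%% Assumption: an unset value means compression stays off (opt-in).
parse(undefined) -> none;
parse("none") -> none;
parse("gzip") -> gzip;
parse(Other) -> erlang:error({unsupported_request_encoding, Other}).

%% Apply the chosen encoding to a request with an iodata body.
%% Returns {Headers, Body}; with `none` the request is left unchanged.
encode(none, Headers, Body) ->
    {Headers, Body};
encode(gzip, Headers, Body) ->
    {[{"Content-Encoding", "gzip"} | Headers], zlib:gzip(Body)}.
```

For example, `encode(parse("gzip"), [], <<"[1,2,3]">>)` returns the headers with `Content-Encoding: gzip` prepended and a body that `zlib:gunzip/1` restores to the original binary.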