feat: Add replicator DNS override support for outbound requests. #5983

Open
willholley wants to merge 7 commits into main from wh/connect_to

Member

@willholley willholley commented Apr 27, 2026

Overview

This adds a feature to the CouchDB replicator to override the DNS target for specific host patterns (including wildcards) when making outbound requests. The use case is routing requests via a transparent SNI proxy, e.g. for network egress monitoring. Common host-override approaches such as modifying /etc/hosts are not sufficient because they do not support wildcard routing; this feature avoids having to run a local DNS server such as CoreDNS to direct traffic through the proxy.

There is a new configuration option to specify the overrides:

[replicator]
dns_overrides = host:target, host2:target

The replicator resolves the configured host patterns to the alternative connection targets while
preserving the request URL host (applies to regular requests and session-auth requests).
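
As a rough illustration of the pattern resolution described above, here is a minimal sketch. It is not the code in this PR: the module name `dns_override_sketch`, the first-match-wins ordering, the case-insensitive comparison, and the rule that `*.` only matches subdomains are all assumptions made for the example.

```erlang
-module(dns_override_sketch).
-export([resolve_host/2, match_pattern/2]).

%% Return the override target for Host, or Host itself when nothing matches.
%% Overrides is a list of {Pattern, Target} pairs; the first match wins
%% (an assumption for this sketch).
-spec resolve_host(string(), [{string(), string()}]) -> string().
resolve_host(Host, []) ->
    Host;
resolve_host(Host, [{Pattern, Target} | Rest]) ->
    case match_pattern(Pattern, Host) of
        true -> Target;
        false -> resolve_host(Host, Rest)
    end.

%% "*.example.com" matches any host ending in ".example.com" but not
%% "example.com" itself; any other pattern must equal the host.
%% Comparison is case-insensitive.
-spec match_pattern(string(), string()) -> boolean().
match_pattern("*." ++ Suffix, Host) ->
    lists:suffix("." ++ string:lowercase(Suffix), string:lowercase(Host));
match_pattern(Pattern, Host) ->
    string:lowercase(Pattern) =:= string:lowercase(Host).
```

Only the connection target changes in this sketch; the caller would keep using the original host in the request URL, as the description says.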

Note this depends on the connect_to option in ibrowse, which is a custom feature in the CouchDB ibrowse fork.

Testing recommendations

Testing this with TLS is a bit involved as it relies on setting up an SNI proxy. I did it using nginx in Docker, with the attached configuration, to proxy to a cloudant.com database. The proxy was running on a non-standard port (e.g. 8443) so that any replications connecting directly to cloudant.com would fail.

I then set dns_overrides = *.cloudant.com:127.0.0.1 in default.ini and configured a replication from myaccount.cloudant.com:8443/mydb. The test succeeds if the proxy logs the connection and the replication completes.

The feature also works without TLS: you can use it to direct an arbitrary hostname to your local CouchDB, for instance. If you use dns_overrides = *.cloudant.com:127.0.0.1 and CouchDB is running on 127.0.0.1:15984, set up a replication with source or target http://foo.cloudant.com:15984/db1 and it will be redirected to 127.0.0.1:15984/db1.

nginx.conf.zip

Related Issues or Pull Requests

Checklist

  • This is my own work, I did not use AI, LLM's or similar technology
  • Code is written and works correctly
  • Changes are covered by tests
  • Any new configurable parameters are documented in rel/overlay/etc/default.ini
  • Documentation changes were made in the src/docs folder
  • Documentation changes were backported (separated PR) to affected branches

Comment thread src/couch_replicator/src/couch_replicator_dns.erl Outdated
Comment thread src/couch_replicator/src/couch_replicator_dns.erl Outdated
@willholley willholley force-pushed the wh/connect_to branch 3 times, most recently from 6d51f89 to 13796f6 Compare April 28, 2026 20:37
Comment thread src/couch_replicator/test/eunit/couch_replicator_dns_tests.erl Outdated
Contributor

@nickva nickva left a comment

This is very nice! I didn't get to play with it locally; I just did a quick look-over and left some comments first.

Comment thread src/couch_replicator/src/couch_replicator_dns.erl Outdated
Comment thread src/couch_replicator/src/couch_replicator_auth_session.erl Outdated
Body = get_value(body, Params, []),

% Apply DNS override using connect_to ibrowse option
#url{host = Host, protocol = Protocol} = ibrowse_lib:parse_url(Url),
Contributor

We'd be parsing the URL on every request. I wonder if it would work to cache the already-parsed host in the httpdb record? Or maybe we could cache the override ssl / connect_to settings in #httpdb{}...

Member Author

I did wonder about the performance overhead but it seems negligible in the grand scheme of things? If you think it beneficial, I can explore it (may need guidance though!).

Member Author

I looked into this and it seems like the complexity wasn't really worthwhile. The persistent term config cache gives a decent speedup.

Comment thread src/couch_replicator/src/couch_replicator_dns.erl Outdated
end.

-spec get_overrides() -> [dns_override()].
get_overrides() ->
Contributor

It would be nice not to have to reparse everything on each resolve. We could use a persistent_term perhaps, but it would add more code here...

Member Author

I pushed a commit which adds a cache using a persistent_term.

Comment thread src/couch_replicator/src/couch_replicator_dns.erl Outdated
Comment thread src/couch_replicator/src/couch_replicator_dns.erl
Comment thread src/couch_replicator/src/couch_replicator_dns.erl
Comment thread src/couch_replicator/src/couch_replicator_httpc.erl Outdated
resolve_host/1,
parse_config/1,
match_pattern/2,
get_overrides/0
Contributor

This is exported mostly for testing? I don't think it's used otherwise; if so, we could skip exporting it.

Opts =
case {Proto, OriginalHost} of
{https, OrigHost} when is_list(OrigHost) ->
case inet:is_ip_address(OrigHost) of
Contributor

is_ip_address/1 takes an already parsed tuple only

We could use inet:parse_address/1

> inet:parse_address(string:trim("[::0]", both, "[]")).
{ok,{0,0,0,0,0,0,0,0}}

> inet:parse_address("[::0]").
{error,einval}

> inet:parse_address("::0").
{ok,{0,0,0,0,0,0,0,0}}

> inet:parse_address("127").
{ok,{0,0,0,127}}

> inet:parse_address("1.2.3.4").
{ok,{1,2,3,4}}

> inet:parse_address("a.c.d.com").
{error,einval}

Also, ibrowse_lib:parse_url/1 returns a host type:

> ibrowse_lib:parse_url("http://127.0.0.1").
#url{abspath = "http://127.0.0.1",host = "127.0.0.1",
     port = 80,username = undefined,password = undefined,
     path = "/",protocol = http,host_type = ipv4_address}

> ibrowse_lib:parse_url("http://[::0]").
#url{abspath = "http://[::0]",host = "::0",port = 80,
     username = undefined,password = undefined,path = "/",
     protocol = http,host_type = ipv6_address}

> ibrowse_lib:parse_url("http://foo.bar.baz.com").
#url{abspath = "http://foo.bar.baz.com",
     host = "foo.bar.baz.com",port = 80,username = undefined,
     password = undefined,path = "/",protocol = http,
     host_type = hostname}
>

case binary:split(Entry, <<":">>) of
[Pattern0, Target0] ->
Pattern = string:trim(Pattern0),
Target = string:trim(Target0),
Contributor

If the IPv6 address is passed in with brackets, we should see if ibrowse knows how to connect to a bracketed address. It may have to be stripped of brackets and/or parsed into an IPv6 address tuple.

{<<"[", _/binary>>, _} ->
invalid_entry_reason(Entry, "IPv6 addresses cannot be used as patterns");
_ ->
{true, {Pattern, Target}}
Contributor

Would *example:127.0.0.1 or *:127.0.0.1 work?

Url = full_url(HttpDb, Params),
Body = get_value(body, Params, []),

% Apply DNS override using connect_to ibrowse option
Contributor

Some of the "apply dns override" logic seems similar to what we do in auth_session? I wonder if a helper function here would work, called from auth_session, or some common utility library. If they diverge enough, it may not work though.

willholley added 5 commits May 4, 2026 20:00
This adds a feature to the CouchDB replicator
to override the DNS target for specific host
patterns (including wildcards) when making
outbound requests. The use case is when requests need
to be routed via a transparent SNI proxy e.g.
for network egress monitoring and specifying
overrides in /etc/hosts or similar isn't sufficient
/ possible (e.g. due to lack of wildcard support).

This adds a new configuration option to
specify the overrides:

```
[replicator]
dns_overrides = host:target, host2:target
```

The replicator resolves the configured host patterns
to the alternative connection targets while
preserving the request URL host (applies to
regular requests and session-auth requests).
 - Use inet:is_ip_address to detect whether the
   original target is an IP address. If it is, do
   not add the SNI header, since this is only
   valid for hostnames.
 - Remove unicode support.
 - Add support for IPv6 targets. This is a little
   awkward because IPv6 addresses use the same
   `:` delimiter as our config, so require them
   to be bracketed.
 - Clarify documentation / default.ini examples
   around wildcard support.
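
To illustrate why bracketing makes IPv6 targets parseable, here is a minimal sketch of entry parsing. It is not the PR's `parse_config/1`: the module name, the `check/2` helper, the error atoms, and the choice to return IPv6 targets as address tuples are assumptions for the example.

```erlang
-module(dns_entry_sketch).
-export([parse_entry/1]).

%% Parse one "pattern:target" entry, e.g. <<"*.example.com:[::1]">>.
%% Hypothetical helper; the error atoms are illustrative only.
-spec parse_entry(binary()) ->
    {ok, {binary(), inet:ip_address() | binary()}} | {error, atom()}.
parse_entry(Entry) when is_binary(Entry) ->
    %% binary:split/2 splits on the first ":" only, so the colons inside
    %% a bracketed IPv6 target stay in the target part.
    case binary:split(Entry, <<":">>) of
        [Pattern0, Target0] ->
            check(string:trim(Pattern0), string:trim(Target0));
        _ ->
            {error, missing_delimiter}
    end.

check(<<>>, _) ->
    {error, empty_pattern};
check(_, <<>>) ->
    {error, empty_target};
check(<<"[", _/binary>>, _) ->
    %% IPv6 addresses are rejected as patterns.
    {error, ipv6_pattern};
check(Pattern, <<"[", _/binary>> = Target) ->
    %% Strip the brackets and parse the target into an address tuple.
    Inner = string:trim(Target, both, "[]"),
    case inet:parse_address(binary_to_list(Inner)) of
        {ok, Addr} when tuple_size(Addr) =:= 8 -> {ok, {Pattern, Addr}};
        _ -> {error, bad_ipv6_target}
    end;
check(Pattern, Target) ->
    {ok, {Pattern, Target}}.
```

Because the split happens at the first `:`, an unbracketed IPv6 address on the pattern side would be cut in the wrong place, which is why the sketch only accepts IPv6 on the target side and only in brackets.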
Use inet:parse_address/1 to detect valid
IP addresses.
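
A minimal sketch of the IP detection and conditional SNI described in these commits follows. The module and function names are hypothetical, and how the resulting options are handed to ibrowse is not shown; `server_name_indication` is the standard Erlang `ssl` client option.

```erlang
-module(sni_sketch).
-export([is_ip_literal/1, sni_opts/1]).

%% True when Host is an IPv4 or IPv6 literal, with or without brackets.
%% Note that inet:parse_address/1 also accepts short forms such as "127".
-spec is_ip_literal(string()) -> boolean().
is_ip_literal(Host) ->
    case inet:parse_address(string:trim(Host, both, "[]")) of
        {ok, _} -> true;
        {error, einval} -> false
    end.

%% SSL options to add when connecting to an override target:
%% send SNI only when the original host is a hostname.
-spec sni_opts(string()) -> [{server_name_indication, string()}].
sni_opts(OriginalHost) ->
    case is_ip_literal(OriginalHost) of
        true -> [];
        false -> [{server_name_indication, OriginalHost}]
    end.
```

If short forms like "127" should not count as IP addresses, `inet:parse_strict_address/1` could be swapped in; whether the PR does so is not stated here.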
Extracts a helper function
`couch_replicator_dns:apply_dns_override/2` which
applies the `connect_to` ibrowse option and
optional SNI header to replication requests
in both `couch_replicator_auth_session` and
`couch_replicator_httpc`.
willholley added 2 commits May 5, 2026 13:59
Cached the parsed dns_overrides configuration.
I did consider adding per-connection caching as
well, but this seemed to be of limited benefit
in microbenchmarks. Adding the persistent_term
cache yields approx a 3x performance improvement
in resolution time (amounts to 150ms over
10k resolutions on my machine).
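
For readers unfamiliar with the approach, a persistent_term cache of parsed configuration could look roughly like the sketch below. It is an illustration under assumptions, not the commit's code: the module name, the key, and the idea of storing the raw config string next to the parsed value to detect changes are all hypothetical.

```erlang
-module(override_cache_sketch).
-export([get_overrides/2]).

%% Key under which the cached value is stored (hypothetical).
-define(KEY, {?MODULE, dns_overrides}).

%% RawConfig is the unparsed config value; ParseFun turns it into the
%% parsed overrides. The parsed result is reused until RawConfig changes.
-spec get_overrides(term(), fun((term()) -> term())) -> term().
get_overrides(RawConfig, ParseFun) ->
    case persistent_term:get(?KEY, undefined) of
        {RawConfig, Parsed} ->
            %% Cache hit: the stored raw value equals the current one.
            Parsed;
        _ ->
            %% Cache miss or changed config: reparse and store.
            Parsed = ParseFun(RawConfig),
            persistent_term:put(?KEY, {RawConfig, Parsed}),
            Parsed
    end.
```

Reads from persistent_term are cheap, while writes are comparatively expensive, so this pattern suits configuration that changes rarely.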
@willholley willholley marked this pull request as ready for review May 5, 2026 16:27

3 participants