Tunnel Mode Overview

SandD supports a secure tunnel mode for production deployments. In tunnel mode, controller and daemon traffic runs over a WireGuard-encrypted mesh VPN coordinated by Headscale.

When to Use Tunnel Mode

Use tunnel mode when:

You deploy across multiple clouds (AWS + GCP + Azure)
The controller should not be publicly accessible
You need automatic NAT traversal
You want network-level isolation

Use direct mode when:

Single datacenter / trusted network
Development and testing
Quick prototyping

Direct Mode vs Tunnel Mode (VPN)

Visual Comparison

Direct Mode (No VPN):

┌──────────┐                        ┌──────────┐
│ Daemon   │──── WebSocket over ───→│Controller│
│          │     public internet    │Public IP │
└──────────┘                        └──────────┘

- Direct WebSocket connection
- No VPN
- Controller needs public IP
- Daemons connect over internet

Tunnel Mode (Mesh VPN):

┌──────────┐                        ┌──────────┐
│ Daemon   │════ VPN tunnel ════════│Controller│
│ Mesh IP  │  WireGuard encrypted   │ Mesh IP  │
└──────────┘                        └──────────┘
     ↓                                   ↓
  Join VPN                            Join VPN
     ↓                                   ↓
┌────────────────────────────────────────────┐
│      Headscale (VPN coordinator)           │
└────────────────────────────────────────────┘

- VPN mesh network
- Encrypted tunnels between nodes
- Private mesh IPs
- No public IPs needed

Feature Comparison

Feature	Direct Mode	Tunnel Mode (VPN)
Setup complexity	Simple (5 min)	Medium (15 min)
Controller IP	Must be public	Can be private
Daemon location	Anywhere (outbound)	Anywhere (mesh)
NAT traversal	Manual (firewall rules)	Automatic (hole punching)
Encryption	Need to add TLS	Built-in (WireGuard)
Port exposure	Public (attack surface)	Hidden (mesh only)
Multi-cloud	Need VPC peering	Works automatically
Use case	Single cloud/datacenter	Cross-cloud, laptop↔cloud

When to Use Each

Use Direct Mode when:

✅ Controller has stable public IP
✅ Single cloud or trusted network
✅ Development and testing
✅ Simple setup preferred

Use Tunnel Mode (VPN) when:

✅ Controller behind NAT (laptop, home, corporate)
✅ Multiple clouds (AWS + GCP + Azure)
✅ Don't want exposed ports
✅ Need encrypted communication
✅ Dynamic IPs or ephemeral instances
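
The two checklists above can be condensed into a small decision helper. This is only a sketch: the `Deployment` class, its field names, and the returned labels are hypothetical and are not part of the SandD API.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical description of a deployment; not a SandD type."""
    controller_has_public_ip: bool
    multi_cloud: bool = False
    must_hide_ports: bool = False


def choose_mode(d: Deployment) -> str:
    """Return the label "tunnel" or "direct" following the checklists above."""
    # Any of these conditions calls for the mesh VPN.
    if not d.controller_has_public_ip or d.multi_cloud or d.must_hide_ports:
        return "tunnel"
    # A reachable controller on a single network can stay in direct mode.
    return "direct"


print(choose_mode(Deployment(controller_has_public_ip=False)))  # tunnel
print(choose_mode(Deployment(controller_has_public_ip=True)))   # direct
```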

How Tunnel Mode Works

The Problem: NAT and Private Networks

Why you can't connect directly:

Laptop (Controller)          Cloud VM (Daemon)
Private: 192.168.1.100       Private: 10.0.1.20
Behind home router           Behind cloud firewall

❌ Can't reach each other's private IPs
❌ Need to expose public ports (security risk)
❌ Need VPN peering between networks (complex)

The Solution: Four Components Working Together

Secure mesh requires ALL four pieces:

┌────────────────────────────────────────┐
│ 1. Coordination (Headscale)            │
│    "Who can join? Where are they?"     │
│    → Authentication & peer discovery   │
└────────────────────────────────────────┘
                 +
┌────────────────────────────────────────┐
│ 2. NAT Traversal (Hole Punching)       │
│    "How do I reach you behind NAT?"    │
│    → Makes devices reachable           │
└────────────────────────────────────────┘
                 +
┌────────────────────────────────────────┐
│ 3. Encryption (WireGuard)              │
│    "How do I protect the data?"        │
│    → Confidentiality & integrity       │
└────────────────────────────────────────┘
                 +
┌────────────────────────────────────────┐
│ 4. Identity (Cryptographic Keys)       │
│    "How do I verify who you are?"      │
│    → Node authentication               │
└────────────────────────────────────────┘
                 =
         Secure Mesh Network

What each component does:

Component	Problem Solved	Without It
Headscale	Who's allowed? Where are peers?	Can't find each other
Hole Punching	How to reach through NAT?	Can't connect
WireGuard	How to protect data?	Traffic readable
Keys	How to verify identity?	Anyone can impersonate

Step-by-Step: How Connection Happens

1. Both sides connect OUT to Headscale

┌──────────────────────────────┐
│ Headscale (Public)           │
│ 203.0.113.100:8080           │
└──────────────────────────────┘
     ↑                    ↑
     │ Outbound ✓         │ Outbound ✓
     │ (firewalls allow)  │
┌────┴─────┐         ┌────┴─────┐
│ Laptop   │         │ Cloud VM │
│ NAT hole │         │ NAT hole │
│ created  │         │ created  │
└──────────┘         └──────────┘

2. Headscale learns each node's "hole"

Laptop connects → Headscale sees: 203.0.113.50:60001
VM connects → Headscale sees: 198.51.100.25:41234

Headscale tells each about the other:
→ Laptop: "VM is at 198.51.100.25:41234"
→ VM: "Laptop is at 203.0.113.50:60001"

3. Nodes punch holes simultaneously

Both sides send packets at the same time:
→ Laptop sends to VM's address
→ VM sends to Laptop's address

Each NAT sees an outbound packet and allows the reply
Result: Direct encrypted tunnel! ✓

4. WireGuard encrypts all traffic

Every packet is encrypted with:
- ChaCha20-Poly1305 (authenticated cipher)
- Curve25519 (key exchange)
- Authentication tags (tamper detection)

Even if intercepted, the traffic is unreadable without the keys
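
The hole-punching idea in steps 1 to 3 can be illustrated with a toy model. The `ToyNAT` class below is a simplification invented for this sketch; real NATs behave in more varied ways, and this is not how Tailscale or Headscale are implemented. It only shows why both sides must send first.

```python
class ToyNAT:
    """Toy NAT: inbound packets pass only if the inside host has
    already sent a packet to that exact remote address."""

    def __init__(self, public_addr):
        self.public_addr = public_addr
        self.allowed = set()  # remote addresses we have sent to

    def send(self, remote_addr):
        # An outbound packet opens a "hole" for replies from remote_addr.
        self.allowed.add(remote_addr)

    def accepts(self, source_addr):
        return source_addr in self.allowed


# Addresses taken from the step 2 example above.
laptop = ToyNAT(("203.0.113.50", 60001))
vm = ToyNAT(("198.51.100.25", 41234))

# Before punching: neither NAT lets the other side in.
before = (laptop.accepts(vm.public_addr), vm.accepts(laptop.public_addr))

# Step 3: both sides send to the address learned from the coordinator.
laptop.send(vm.public_addr)
vm.send(laptop.public_addr)

after = (laptop.accepts(vm.public_addr), vm.accepts(laptop.public_addr))
print(before, after)  # (False, False) (True, True)
```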

Architecture

Components

Headscale (Server)

Coordination server for mesh network
Runs separately (single instance for entire mesh)
Issues keys, manages peer discovery

Tailscale Client

VPN client that connects to Headscale
Runs in each container (installed via hack/docker/Dockerfile.tunnel)
Joins the mesh, creates tunnel interface

Your Application = Controller

When you call Server(), you ARE the controller
It starts a WebSocket server that daemons connect to
In tunnel mode, your app needs Tailscale to join the mesh

Direct Mode

Daemon → Internet → Controller (public IP:8765)

Tunnel Mode

┌─────────────────────────────────────────────────────────┐
│                    Headscale Server                     │
│                  (runs once, centrally)                 │
└─────────────────────────────────────────────────────────┘
           ↑                                   ↑
           │                                   │
┌──────────▼──────────────────────┐  ┌─────────▼─────────┐
│  Your Application (Controller)  │  │  Daemon           │
│                                 │  │  (worker)         │
│  Server() starts WebSocket srv  │  │                   │
│  (Tailscale client)             │  │ (Tailscale client)│
│  10.200.0.1                     │  │  10.200.0.2       │
└─────────────────────────────────┘  └───────────────────┘
                   Private Mesh Network

Key point: hack/docker/Dockerfile.tunnel installs the Tailscale client, not the Headscale server. Headscale runs separately.

Using Tunnel Mode

In Your Application

from sandd import Server, TunnelConfig

# Direct mode (default)
server = Server()

# Tunnel mode
config = TunnelConfig(
    authkey="your-headscale-preauth-key",
    server="http://headscale:8080"
)
server = Server(connect="tunnel", tunnel_config=config)

Docker Image

Use the tunnel-enabled image. Build it yourself like this:

docker build -f hack/docker/Dockerfile.tunnel -t my-app:tunnel .

Running

# Your app code contains TunnelConfig with auth key and server URL
docker run \
  --cap-add NET_ADMIN \
  --device /dev/net/tun \
  my-app:tunnel

Setup Steps

1. Build Tunnel Image

# From SandD repo
docker build -f hack/docker/Dockerfile.tunnel -t inftyai/sandd-server:latest-tunnel .

2. Run Headscale

docker run -d \
  --name headscale \
  -p 8080:8080 \
  -v headscale-data:/var/lib/headscale \
  headscale/headscale:latest serve

3. Generate Auth Keys

# Create user
docker exec headscale headscale users create sandd

# Generate a pre-auth key (save the output!)
docker exec headscale headscale preauthkeys create --user sandd --expiration 24h
# Output: key-abc123def456...

4. Write Your Controller App

# controller.py
from sandd import Server, TunnelConfig
import time

config = TunnelConfig(
    authkey="key-abc123def456",  # From step 3
    server="http://headscale:8080"
)

server = Server(connect="tunnel", tunnel_config=config)
print("Controller ready, waiting for daemons...")

while True:
    daemons = server.list_daemons()
    print(f"Connected: {len(daemons)}")
    time.sleep(5)
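
Instead of looping forever, a controller may want to block until a known number of daemons have joined. The helper below is a sketch that assumes only what the loop above shows: `server.list_daemons()` returns a collection that supports `len()`. The `wait_for_daemons` function and the `FakeServer` stand-in are hypothetical and not part of SandD.

```python
import time


def wait_for_daemons(server, expected, timeout=60.0, poll=1.0):
    """Poll server.list_daemons() until `expected` daemons are connected.

    Returns True on success, False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if len(server.list_daemons()) >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


class FakeServer:
    """Stand-in for sandd.Server, used only to exercise the helper."""

    def __init__(self):
        self.calls = 0

    def list_daemons(self):
        # Pretend one more daemon connects on every poll.
        self.calls += 1
        return ["worker-%d" % i for i in range(self.calls)]


print(wait_for_daemons(FakeServer(), expected=3, poll=0.01))  # True
```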

5. Run Your Controller

docker run \
  --cap-add NET_ADMIN \
  --device /dev/net/tun \
  -v $(pwd)/controller.py:/app/controller.py \
  inftyai/sandd-server:latest-tunnel \
  python /app/controller.py

Complete Example

See examples/tunnel-simple/ for a working docker-compose setup.

cd examples/tunnel-simple
docker-compose up

Communication Flow

Controller (Your App)

Container launches
Server(connect="tunnel", tunnel_config=config) called
Controller automatically starts Tailscale and joins mesh
Gets mesh IP (10.200.0.1)
WebSocket server starts on 10.200.0.1:8765

Daemon

Run with --tunnel flag
sandd automatically starts Tailscale and joins mesh
Gets mesh IP (10.200.0.2)
Connects to controller at ws://10.200.0.1:8765/ws
Ready to execute commands

One command:

sandd --server-url ws://10.200.0.1:8765/ws \
      --daemon-id worker-1 \
      --tunnel \
      --tunnel-authkey YOUR_KEY \
      --tunnel-server http://headscale:8080
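
When daemons are launched from a script, the same command can be assembled programmatically. The sketch below uses only the flags shown above; the `daemon_command` helper itself is hypothetical, and the default port 8765 is taken from the controller address in this section.

```python
import shlex


def daemon_command(mesh_ip, daemon_id, authkey, tunnel_server, port=8765):
    """Build the argv list for the daemon command shown above."""
    return [
        "sandd",
        "--server-url", f"ws://{mesh_ip}:{port}/ws",
        "--daemon-id", daemon_id,
        "--tunnel",
        "--tunnel-authkey", authkey,
        "--tunnel-server", tunnel_server,
    ]


argv = daemon_command("10.200.0.1", "worker-1", "YOUR_KEY", "http://headscale:8080")
# shlex.join quotes arguments safely for display or logging.
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv)` avoids shell quoting issues with the auth key.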

Security Model

What's Protected

✅ Data in Transit

All traffic encrypted with WireGuard
ChaCha20-Poly1305 authenticated encryption
Perfect forward secrecy

✅ Authentication

Pre-auth keys control mesh access
Public key cryptography (Curve25519)
Each node has unique identity

✅ Network Isolation

Ports not exposed to internet
Only mesh nodes can communicate
Automatic NAT traversal (no manual firewall rules)

Key Types and Security

1. Auth Key (Pre-Auth Key)

# Single-use (recommended)
headscale preauthkeys create --user sandd --expiration 1h

# Give each node its own key
# A single-use key stops working after its first use

If leaked: Attacker can join mesh ❌

Protection:

Use single-use keys
Short expiration (1-24h)
Rotate regularly
Never commit to git

2. WireGuard Private Key

Stored: /var/lib/tailscale/tailscaled.state

If leaked: Attacker can impersonate that node and compromise its future sessions ❌

Protection:

# File permissions
chmod 600 /var/lib/tailscale/tailscaled.state

# Docker: use named volumes
volumes:
  - tailscale-state:/var/lib/tailscale

3. Shared Secret

How it works: Computed from your private key + peer's public key

Security: Never transmitted, only exists in RAM ✓
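
The idea that both sides compute the same secret without ever sending it can be shown with a toy key exchange. Note the substitution: WireGuard uses Curve25519, while this sketch uses classic finite-field Diffie-Hellman with deliberately small, illustrative parameters so that it runs with the standard library alone. It must not be used for real security.

```python
import secrets

# Toy finite-field Diffie-Hellman parameters, for illustration only.
P = 2**127 - 1  # a Mersenne prime; far too small for real use
G = 3


def keypair():
    """Return (private, public) for the toy exchange."""
    private = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_secret(my_private, peer_public):
    """Combine my private key with the peer's public key."""
    return pow(peer_public, my_private, P)


a_priv, a_pub = keypair()
b_priv, b_pub = keypair()

# Only the public values cross the network; each side derives
# the same secret locally from its own private key.
print(shared_secret(a_priv, b_pub) == shared_secret(b_priv, a_pub))  # True
```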

Comparison with Other Approaches

Security Aspect	Plain ws://	wss:// (TLS)	Tailscale
Encryption	❌ None	✅ TLS 1.3	✅ WireGuard
Authentication	Manual	SSL certs	✅ Built-in
Port exposure	❌ Public	❌ Public	✅ Hidden
NAT traversal	Manual	Manual	✅ Automatic
Setup complexity	Simple	Medium (certs)	Medium (Headscale)
Zero-trust	❌	⚠️ CA-based	✅ Crypto keys

Attack Scenarios and Mitigations

Scenario 1: Auth Key Leaked

Impact: Attacker joins mesh, accesses services

Mitigation:
1. Revoke compromised key
   headscale preauthkeys expire --prefix tskey-abc

2. Remove unauthorized nodes
   headscale nodes list
   headscale nodes delete --identifier <id>

3. Generate new keys
4. Update all legitimate nodes

Scenario 2: Node Compromised (Root Access)

Impact: Attacker steals WireGuard key, decrypts traffic

Mitigation:
1. Remove node from mesh
   headscale nodes delete --identifier <id>

2. Delete state file on node
   rm -f /var/lib/tailscale/tailscaled.state

3. Investigate compromise
4. Rejoin with new keys

Scenario 3: Headscale Server Compromised

Impact:
- Can see who's connected (metadata)
- Cannot decrypt traffic (end-to-end encrypted)

Mitigation:
- Headscale doesn't store private keys
- Data never decrypted at coordinator
- Remaining risk: the attacker can kick nodes off the mesh, but can't read their data

Best Practices

Key Management:

# ✅ DO: Single-use, short-lived
headscale preauthkeys create --user sandd --expiration 1h

# ❌ DON'T: Reusable, long-lived
headscale preauthkeys create --user sandd --reusable --expiration 8760h

Secrets Storage:

# ✅ DO: Use secrets management
export KEY=$(vault read -field=key secret/sandd)

# ❌ DON'T: Hardcode in files
SANDD_TUNNEL_AUTH_KEY=tskey-abc123  # Never commit!
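
On the Python side, one way to keep the key out of source files is to read it from the environment at startup. This is a sketch under assumptions: `load_tunnel_settings` is a hypothetical helper, `SANDD_TUNNEL_SERVER` is a variable name invented for this example, and whether SandD reads `SANDD_TUNNEL_AUTH_KEY` on its own is not stated here.

```python
import os


def load_tunnel_settings(env=os.environ):
    """Read tunnel settings from the environment instead of source code.

    SANDD_TUNNEL_SERVER is a hypothetical variable name for this sketch.
    """
    authkey = env.get("SANDD_TUNNEL_AUTH_KEY")
    if not authkey:
        # Fail fast rather than starting with an empty key.
        raise RuntimeError("SANDD_TUNNEL_AUTH_KEY is not set")
    server = env.get("SANDD_TUNNEL_SERVER", "http://headscale:8080")
    return {"authkey": authkey, "server": server}


# Usage in a controller (requires the sandd package):
#   from sandd import Server, TunnelConfig
#   config = TunnelConfig(**load_tunnel_settings())
#   server = Server(connect="tunnel", tunnel_config=config)
```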

Monitoring:

# Check for unauthorized nodes
headscale nodes list --output json | \
  jq '.[] | select(.created > "2024-01-01")'
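
A stricter check compares the node list against an allowlist of expected names. The sketch below works on sample data only: the `name` field and the overall JSON shape are assumptions for illustration and should be checked against the actual output of `headscale nodes list --output json` for your Headscale version.

```python
import json


def unexpected_nodes(nodes_json, allowed_names):
    """Return the sorted names of nodes that are not on the allowlist."""
    nodes = json.loads(nodes_json)
    return sorted(n["name"] for n in nodes if n["name"] not in allowed_names)


# Sample data shaped like an assumed node listing; not real Headscale output.
sample = json.dumps([
    {"name": "controller"},
    {"name": "worker-1"},
    {"name": "unknown-host"},
])

print(unexpected_nodes(sample, {"controller", "worker-1"}))  # ['unknown-host']
```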

FAQ

Q: Is hole punching safe? A: Yes. Hole punching only finds the network path. All data is encrypted with WireGuard. Think of it like finding a road (hole punching) vs using an armored truck (encryption).

Q: Why not just use WebSocket with TLS? A: A WebSocket server, even with TLS, still needs a public IP and open ports. Tailscale works when the controller is behind NAT (laptop, private cloud) and provides automatic encryption.

Q: Can Headscale read my data? A: No. Headscale only coordinates connections. Data is encrypted end-to-end between nodes. Headscale never sees decrypted traffic.

Q: What if my auth key leaks? A: An attacker can join your mesh. Use single-use keys and revoke the key immediately if it leaks. See the Security Model section.

Q: Why not install Headscale in my container? A: Headscale is a coordination server - you only need one for the entire mesh. Like DNS: one server, many clients.

Q: What's in hack/docker/Dockerfile.tunnel? A: Python 3.11, SandD library, and Tailscale client (not Headscale server).

Q: Do I need NET_ADMIN? A: Yes. The VPN client requires --cap-add NET_ADMIN --device /dev/net/tun.

Troubleshooting

Check tunnel status

# Run from the host; the commands execute inside the container
docker exec <container> tailscale status
docker exec <container> tailscale ip

Permission denied

Ensure container has required capabilities:

--cap-add NET_ADMIN --device /dev/net/tun

Can't reach controller

Verify mesh IP:

docker exec controller tailscale ip -4
# Use this IP in CONTROLLER_URL
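
Once the mesh IP is known, a quick TCP probe from the daemon side can confirm that the controller port is reachable over the tunnel. This is a minimal sketch: `can_reach` is a hypothetical helper, port 8765 is the controller port used in this document, and a successful TCP connection does not prove that the WebSocket handshake will succeed.

```python
import socket


def can_reach(host, port=8765, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `can_reach("10.200.0.1")` run inside the daemon container should return True when both nodes have joined the mesh and the controller is listening.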
