diff --git a/static/img/k8s-arch.png b/static/img/k8s-arch.png new file mode 100644 index 000000000..a6b33ac8d Binary files /dev/null and b/static/img/k8s-arch.png differ diff --git a/versioned_docs/version-4.0.0/keploy-cloud/kubernetes.md b/versioned_docs/version-4.0.0/keploy-cloud/kubernetes.md index ab9d539c9..e5414384f 100644 --- a/versioned_docs/version-4.0.0/keploy-cloud/kubernetes.md +++ b/versioned_docs/version-4.0.0/keploy-cloud/kubernetes.md @@ -37,6 +37,10 @@ This guide walks you through creating a local **Kind** Kubernetes cluster and co > [!NOTE] > This documentation covers **local Kind cluster** setup. Documentation for **hosted cluster** setup is coming soon. +## How Keploy works on Kubernetes + +![How Keploy works on Kubernetes architecture overview](/img/k8s-arch.png) + --- ## Prerequisites diff --git a/versioned_docs/version-4.0.0/running-keploy/k8s-proxy-api.md b/versioned_docs/version-4.0.0/running-keploy/k8s-proxy-api.md index ff8fc518c..752182a72 100644 --- a/versioned_docs/version-4.0.0/running-keploy/k8s-proxy-api.md +++ b/versioned_docs/version-4.0.0/running-keploy/k8s-proxy-api.md @@ -48,27 +48,100 @@ The same `/record/start`, `/record/stop`, `/test/start`, `/deployments`, and rep --- -## Why the Kubernetes Proxy instead of `keploy enterprise` directly? +## Why the Kubernetes Proxy instead of keploy enterprise directly? -Running the Keploy enterprise CLI inside a Pod works, but it is a per-app, per-node model: each Deployment you want to record needs its own sidecar plumbing, image rebuild, or pod restart. The Kubernetes Proxy removes that friction: +Running the Keploy enterprise CLI inside a Pod works, but it is a per-app, per-node model: each Deployment you want to record needs its own sidecar plumbing, image rebuild, or pod restart. 
The Kubernetes Proxy is a single in-cluster control plane that turns _record-and-replay_ into a few API calls, and layers on top of that a set of capabilities you do not get when you run the agent on its own. The benefits below are the reason teams pick the proxy over wiring the CLI in by hand. -- **Zero-touch agent setup.** The proxy registers a `MutatingAdmissionWebhook` (`/mutate`) so the Keploy recording agent is injected into target Pods on the next rollout. No image rebuild, sidecar template change, or per-app config knob is required. -- **One API for every Deployment.** A single shared-token-authenticated endpoint starts or stops recording for any Deployment in the watched scope. `podsCount` controls how many pods are recorded and is capped by the Deployment replicas or HPA max replicas. -- **Cluster-wide or namespace-scoped.** Install once per cluster, or set `watchNamespace` to pin the proxy to a single team's namespace. Cross-namespace calls are rejected with `403`. -- **Stored session outputs.** Recording, replay, and schema-coverage outputs are persisted through the configured platform storage. Per-session and proxy logs are available through the log endpoints when log retention/support-bundle storage is enabled. -- **Auto-replay loop.** A recording session can kick off an auto-replay on a cadence (`autoReplayInterval`) against freshly recorded test sets, giving you self-validating live traffic without a separate pipeline. -- **Self-updating.** The proxy can roll itself (and the injected agent) forward via `POST /proxy/update`, so upgrades do not require kubectl or a GitOps round-trip—unless you _want_ GitOps to stay authoritative (the proxy detects and reports reverts). -- **Static deduplication at the edge.** Enable `static_dedup` in the recording config to drop schema-identical traffic _before_ it is ever written as a test case. See [Static Deduplication](/docs/keploy-cloud/static-deduplication/). +### 1. 
Auto-replay
+
+The proxy auto-replays captured traffic against a fresh Pod and reports back which captures behave as deterministic tests. This happens continuously while you record (every `autoReplayInterval` minutes, default 5) and once more, against any trailing captures, the moment you call `POST /record/stop`. The whole loop collapses the "record now, find out tomorrow which tests are flaky" wait into the recording session itself.
+
+This is also why replay belongs in the Kubernetes Proxy instead of a one-off CLI run: the proxy has the cluster, Deployment, release, and recording-session context needed to make replay release-aware. For a new-release flow, the proxy layer is designed to coordinate replaying the traffic captured for the new release together with the historical traffic the service has recorded before, so a release is checked against both the latest behavior and the behavior users already depended on. Once the smart test set ships, that historical input becomes the curated smart set: newly recorded traffic plus the latest duplicate-free version of the service's long-lived test coverage.
+
+Each test case is exercised once against a freshly rolled Pod and classified into one of three buckets:
+
+- **Pass:** kept as a real, stable test case.
+- **High-risk failure:** marked as failed (a real regression to investigate).
+- **Low-risk failure with extractable noise:** marked as noisy and kept in the test set but excluded from failure counts. These are typically captures whose only diff is a timestamp, request-ID, or generated UUID. If the proxy cannot extract the noisy fields, the capture is kept as a failed test case so it can be investigated.
+
+In the same pass, any mocks the test case did not actually use are pruned. So even if the original Pod made 200 dependency calls per request, the final test case only carries the mocks it depended on, and your test sets stay tiny.
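The three-bucket rule can be sketched as a tiny classifier. This is an illustration of the decision logic only, not Keploy's implementation; the specific field names treated as noise here are assumptions for the sketch:

```shell
# Classify a replay result from the list of response fields that differed.
# An empty diff passes; a diff made up only of noise-like fields is "noisy"
# (kept in the set, but excluded from failure counts); anything else fails.
classify() {
  changed="$1"                       # comma-separated fields that differed
  [ -z "$changed" ] && { echo "pass"; return; }
  for field in $(printf '%s' "$changed" | tr ',' ' '); do
    case "$field" in
      timestamp|request_id|uuid) ;;  # extractable noise: ignore this field
      *) echo "failed"; return ;;    # non-noise diff: a real regression
    esac
  done
  echo "noisy"                       # every diff was extractable noise
}

classify ""                          # pass
classify "timestamp,request_id"      # noisy
classify "timestamp,total_amount"    # failed
```

The same shape explains the fallback above: when no noisy fields can be extracted, the diff is non-empty and non-ignorable, so the capture lands in the failed bucket.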
+ +The cadence is configured via `autoReplayInterval` (see [Auto-replay configuration](#auto-replay-configuration) below). When auto-replay runs as part of a re-record job, the proxy first asks the API server which suites passed last time. Replay still runs against the recorded test sets, but suite linking is gated so only suites that passed upstream and pass the fresh replay are linked forward. + +### 2. Deduplication + +A naive recorder turns a load test of `GET /users/42` into 50,000 identical test cases. Keploy's deduplication keeps the canonical capture, counts the rest, and drops them, so a real test set comes out of even a noisy production traffic sample. + +Enable per-recording with `record_config.static_dedup`, and optionally narrow the dedup key per endpoint with `record_config.custom_dedup_fields`, which declares which JSON paths in the request body, plus method/path/status, define "the same test." The agent enforces this _at capture time_ before anything is written to storage, and per-pod dedup stats stream back into the recording status endpoint so you can watch duplicates being dropped live. See [Static Deduplication](/docs/keploy-cloud/static-deduplication/) for the full configuration reference. + +### 3. REST API _and_ MCP server + +Keploy exposes automation surfaces at two layers: + +- The **Kubernetes Proxy REST API** ([Endpoint reference](#endpoint-reference)) handles in-cluster operations such as starting/stopping recording, kicking off replay, fetching session status, and reading logs or reports. All routes outside `/healthz` and the admission webhook sit behind shared-token Bearer auth. +- The **Keploy API server MCP endpoint** exposes higher-level tools for AI coding tools, including Claude Code, Cursor, Windsurf, and VS Code. This is how an AI agent in your editor authors test suites, runs replays, and scaffolds CI pipelines without you copy-pasting curl commands. + +The MCP surface includes around a dozen tools. 
The headline ones:
+
+- `generate_and_wait`: build a suite from an OpenAPI spec.
+- `run_and_report`: run a suite and return failures + coverage.
+- `get_coverage_gaps`: list which endpoints lack test coverage.
+- `create_test_suite` / `update_test_suite`: programmatically author and validate suites. Writes are gated through Keploy's own branching model (parallel to git), so AI agents can iterate without polluting `main`.
+- `start_rerecord_session` / `start_integration_test_session`: kick off a sandbox session locally.
+- `scaffold_pipeline_workflow`: generate a CI workflow file (covered in benefit 6).
+
+A non-obvious detail: when an AI agent authors a test suite that mutates state (POST/PUT/PATCH), the MCP refuses to insert it unless every mutating step's body references at least one per-run dynamic variable, and the rejection error _names_ the dynamic variables already in scope. The result is suites that survive a second run by construction: they are authored and verified to be safe to retry.
+
+### 4. Schema generation and management
+
+**Schema generation and per-release storage.** Recording produces a free OpenAPI 3.0.3 schema as a side effect. The proxy infers it from the captured traffic, so the spec reflects the requests and responses Keploy actually observed during that recording. Schema records store namespace, deployment, app name, app release, cluster name, and tenant metadata; release-specific versions are selected by `appRelease`, which is typically your image tag or git SHA. That lets `orders-api@v1.4.2` and `orders-api@v1.4.3` be stored as separate release snapshots instead of overwriting each other. Coverage reporting layered on top tells you which endpoints have been exercised, and how deeply. Endpoints are documented under [Reports and schema coverage](#reports-and-schema-coverage).
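Release keying amounts to selecting one snapshot out of many by `appRelease`. A sketch with jq, using an assumed record shape (the field names here are illustrative, not the exact stored document):

```shell
# Two illustrative schema records for the same service, one per release.
# Field names are assumptions for this sketch, not Keploy's stored schema.
RECORDS='[
  {"appName":"orders-api","appRelease":"v1.4.2","namespace":"default"},
  {"appName":"orders-api","appRelease":"v1.4.3","namespace":"default"}
]'

# Selecting by appRelease yields only that release's snapshot, which is why
# a new release inserts a new document instead of overwriting the old one.
echo "$RECORDS" | jq -r '.[] | select(.appRelease == "v1.4.3") | .appRelease'
```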
+ +**Schema-conflict detection during auto-replay.** During auto-replay, the proxy fetches the latest stored schema, generates a schema from the recorded test cases, and merges the two. If replay produces failed test-case details, the report records a schema conflict and stores the new schema accordingly. Compatible same-release additions update the existing schema; new releases or replay-detected conflicts insert a new schema document. + +### 5. Smart test set (Upcoming) + +The direct user benefit is simple: this layer is meant to maintain a smart, replay-ready test set for the service instead of leaving you to manage scattered recording sessions by hand. The goal is to keep the latest useful, duplicate-free version of the behavior your service has recorded over time. When you add an API endpoint, the new behavior can be folded in; when you delete an endpoint, stale coverage can be removed; when you change a request or response shape, the latest captured version can replace the old one. Future release replays can then run the newly recorded traffic plus this smart test set instead of every duplicate capture ever seen, which keeps replay fast, efficient, and easier to trust as a CI gate. + +Current auto-replay already performs the in-session curation work: + +- **Cross-pod uniqueness within a session.** When a Deployment with `replicas=5` records into the same session, each pod's local `test-N` counter does not collide with any other pod's. The proxy keeps them distinct so you don't end up with five different captures all named `test-1`. +- **Noise vs. failure separation.** During auto-replay, captures with extractable timestamp/UUID-style diffs are tagged as noisy and kept in the test set (excluded from failure counts), while real regressions and low-risk captures without extractable noise are tagged as failures. The noise tag itself is useful information because it tells later replays which fields to ignore for that endpoint. 
+- **Fresh-capture curation.** Current auto-replay curates the test sets produced by the active recording session. Historical-testset consolidation support exists in the codebase, but it is not active in the current record/start path. + +Combined with capture-time static deduplication (benefit 2), this keeps the current replay set small, stable, and CI-gateable even when the underlying traffic is noisy. + +### 6. Local CI replay + +> **Status:** +> +> - **The replay step itself:** Shipped for any CI provider. It is invoked through the `keploy test sandbox` CLI, which runs unchanged on GitHub Actions, GitLab CI, CircleCI, Jenkins, Bitbucket Pipelines, Azure Pipelines, or a self-hosted runner. +> - **Auto-generated workflow YAML:** Shipped for **GitHub Actions** via the MCP `scaffold_pipeline_workflow` tool. Native scaffolds for other CI providers are upcoming. Until then, the GitHub Actions workflow can be hand-ported because the replay command and its flags are identical across providers, and only the surrounding CI syntax changes. + +Local CI replay runs your Keploy test suites against a fresh build of your service _inside the CI runner_, on every pull request. Docker Compose brings the service's dependencies up; the Keploy enterprise CLI starts the service itself under instrumentation, replays each recorded request, serves the recorded mocks for every outbound dependency call, and byte-compares the live response against the captured one. The aggregated pass/fail becomes the PR gate. + +The "local" qualifier distinguishes this path from the SaaS replay path (`run_and_report`), which targets a publicly reachable URL such as staging and rejects local-only URLs. Local CI replay targets `http://localhost:$APP_PORT` inside the runner, so the thing under test is the code on the pull request branch, not staging. + +The generated workflow performs the following steps: + +1. Checks out the repository and installs the Keploy enterprise CLI. +2. 
Brings the Docker Compose stack up with `docker compose up -d --wait`, then stops and removes the application service so the CLI can start it under instrumentation. +3. Runs `keploy test sandbox` with the appropriate flags, including `--create-branch "${{ github.head_ref }}"`. Keploy branches use find-or-create semantics: the first run on a pull request creates a Keploy branch named after the git branch, and subsequent retries reuse it. The workflow is therefore idempotent across force-pushes. +4. Uploads `keploy/reports` as a workflow artifact on every run, including failures. +5. Dumps Docker Compose logs on failure. +6. Tears the Compose stack down. + +The pre-flight check counts how many sandbox suites are linked to the app. When zero suites are linked, the scaffold response warns you before you add the workflow; the CLI run itself expects linked sandbox suites and will fail until suites are created. The generated YAML is annotated `# Auto-generated by keploy scaffold_pipeline_workflow: edit freely` and can be modified or extended without losing the ability to regenerate. --- ## Authentication -:::info In development -The authentication flow is currently in development. -::: +Authenticating to the proxy is a **two-step exchange**: + +1. Create a **Personal Access Token (PAT)** in the Keploy Console. +2. `POST /get-shared-token` with that PAT to receive the cluster's **shared token**. -Every protected proxy endpoint requires the cluster **shared token**. Send it as a Bearer token: +Every other protected route on the proxy is then gated on the shared token sent as a Bearer header: ```text Authorization: Bearer @@ -82,70 +155,69 @@ curl -sf https://$PROXY/healthz # {"status":"ok"} ``` -### How the token is provisioned +### Why a two-step exchange? -The shared token is generated **at Helm install time** and stored as a Kubernetes Secret named `-shared-token` in the proxy's namespace. 
The chart's pre-render step uses Helm's `randAlphaNum 48` to produce the value on the very first install and a `lookup` + `helm.sh/resource-policy: keep` annotation to preserve it across upgrades, so the token is **stable for the lifetime of the release**—Pod restarts and chart upgrades do not rotate it. +The PAT identifies a specific user; the shared token authorizes calls against a specific cluster. Splitting the two means callers (CI scripts, AI agents, internal tooling) only ever hold a short-scoped credential they can rotate per-user from the Console, while the proxy itself is the only thing that ever sees the cluster-wide shared token. CI/CD pipelines never need `kubectl` access to the proxy's namespace and never need an interactive user login — they store one PAT and exchange it on every run. -The k8s-proxy Deployment and the per-node DaemonSet both mount the Secret as the `KEPLOY_SHARED_TOKEN` env var via `secretKeyRef`. On startup the proxy reports the value to the Keploy API server in its first heartbeat (`POST /cluster/status`) so the Console can display it under the cluster's app entries. +### 1. Issue a PAT -For local/dev runs without a Secret, if `KEPLOY_SHARED_TOKEN` is unset the proxy falls back to generating a random 32-byte value via `crypto/rand` (hex-encoded). This fallback is fresh on every restart and is **not** the path used in any Helm-managed deployment. +In the Keploy Console, open **Settings → Personal Access Tokens** and click **Create token**. PATs are 47-character strings prefixed with `kep_`. -### Retrieve the token +- The PAT must belong to the same tenant (`cid`) as the cluster the proxy is registered to. The proxy will reject cross-tenant PATs with `403 Forbidden`. +- Treat the PAT like a password — it is the long-lived credential. Store it in your CI provider's secret store, not in the repo. +- A user can have multiple PATs. Revoke or rotate them from the same Console screen; revoked PATs stop working immediately. 
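Since a PAT has a fixed shape (47 characters, `kep_` prefix), a script can cheaply sanity-check one before attempting the exchange. This is a convenience sketch only; the proxy's validation, not the string shape, is authoritative:

```shell
# A PAT is a 47-character string beginning with "kep_".
is_pat() {
  case "$1" in
    kep_*) [ "${#1}" -eq 47 ] ;;  # right prefix, then check total length
    *) return 1 ;;                # wrong prefix
  esac
}

PAT="kep_$(printf 'x%.0s' $(seq 1 43))"   # placeholder PAT of the right shape
is_pat "$PAT" && echo "shape ok"
```

A check like this catches the common CI misconfiguration (an empty or truncated secret) with a clear local error instead of a `401` from the proxy.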
-Two equally valid paths. - -**(a) Read it directly from the Secret** if you have `kubectl` access to the proxy namespace: +### 2. Exchange the PAT for the shared token ```bash -kubectl -n keploy get secret -shared-token -o jsonpath='{.data.token}' | base64 -d -``` +PROXY="https://your-proxy-ingress" +PAT="kep_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" -**(b) Fetch it from the Keploy API server**, which mirrors what the proxy reported in its last heartbeat. Log in once to obtain a user JWT, then look up the proxy app for the Deployment you want to drive: +RESP=$(curl -sS -X POST "$PROXY/get-shared-token" \ + -H "Authorization: Bearer $PAT") -```bash -API_SERVER="https://api.keploy.io" -NS="default" -DEPLOY="orders-api" -CLUSTER="prod-use1" +K8S_PROXY_SHARED_TOKEN=$(echo "$RESP" | jq -r '.sharedToken') +INGRESS_URL=$(echo "$RESP" | jq -r '.ingressUrl') -# 1. Authenticate as a Keploy user (admin, user, or cicd role) -JWT=$(curl -s -X POST "$API_SERVER/login" \ - -H "Content-Type: application/json" \ - -d '{"email":"you@example.com","password":"..."}' | jq -r '.token') +AUTH="Authorization: Bearer $K8S_PROXY_SHARED_TOKEN" +``` -# 2. Look up the proxy app for this Deployment and read its sharedToken -K8S_PROXY_SHARED_TOKEN=$(curl -s -H "Authorization: Bearer $JWT" \ - "$API_SERVER/cluster/getApp?namespace=$NS&deployment=$DEPLOY&clusterName=$CLUSTER" \ - | jq -r '.sharedToken') +A successful exchange returns: -AUTH="Authorization: Bearer $K8S_PROXY_SHARED_TOKEN" +```json +{ + "ingressUrl": "https://your-proxy-ingress", + "sharedToken": "3e14be232bce3e3cf6f6d58f284b6eb88db3280c54d93a7951e5000c6bbe3e9a", + "deploymentType": "saas" +} ``` -`GET /cluster/getApps` returns the same `sharedToken` field for every proxy-managed app in your organization in a single response, which is convenient when you want to script across many Deployments at once. +- `sharedToken` — use this on every subsequent call as `Authorization: Bearer `. 
+- `ingressUrl` — echoes back the address the proxy was installed with, so a script can derive every other URL from one bootstrap call. +- `deploymentType` — `"saas"` for the hosted control plane, `"self-hosted"` for self-hosted installs. -> The proxy shared token is cluster-wide, not per-user. The API server still uses normal user JWT/cookie authentication on its own routes (including `/cluster/getApp`). The token is sticky across Pod restarts and chart upgrades, so callers can cache it for the lifetime of the Helm release. +The shared token is **stable for the lifetime of the Helm release** — Pod restarts and chart upgrades do not rotate it — so a caller can exchange the PAT once at the start of a CI job and cache the result for the rest of the run. ---- +### Exchange failure modes -## Response format +| Status | When | +| ------ | ------------------------------------------------------------------------------------- | +| `401` | Missing/empty `Authorization` header, or the PAT is invalid, revoked, or expired. | +| `403` | The PAT is valid but belongs to a different tenant than this proxy's cluster. | +| `502` | The proxy could not reach the API server to validate the PAT (transient — retry). | +| `503` | The proxy is still booting and has not authenticated to the API server yet (retry). | -Handlers return JSON with `application/json` on success. Validation failures usually return `{"error": "..."}` with a 4xx status; shared-token auth failures return `{"success": false, "message": "Unauthorized: ..."}`. A handful of endpoints stream newline-delimited JSON instead - they are called out explicitly below. +Under the hood, `POST /get-shared-token` calls `POST /cluster/pat/validate` on the API server (using the proxy's own cluster JWT) to verify the PAT, then returns the cached shared token only on success. The PAT is never echoed back, never stored on the proxy, and never logged in cleartext. 
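Because `502` and `503` are transient, a caller can wrap the exchange in a bounded retry loop. A minimal sketch (the retry count and backoff are arbitrary choices for illustration, not documented defaults):

```shell
# Exchange a PAT for the shared token, retrying only on transient statuses.
exchange_with_retry() {
  proxy="$1"; pat="$2"; attempt=1; max=5
  while [ "$attempt" -le "$max" ]; do
    # -w appends the HTTP status on its own line after the response body
    resp=$(curl -sS -w '\n%{http_code}' -X POST "$proxy/get-shared-token" \
      -H "Authorization: Bearer $pat")
    code=$(printf '%s' "$resp" | tail -n 1)
    body=$(printf '%s' "$resp" | sed '$d')
    case "$code" in
      200) printf '%s' "$body" | jq -r '.sharedToken'; return 0 ;;
      502|503) sleep "$attempt"; attempt=$((attempt + 1)) ;;   # transient: back off, retry
      *) echo "exchange failed with HTTP $code" >&2; return 1 ;;  # 401/403: do not retry
    esac
  done
  echo "still failing after $max attempts" >&2
  return 1
}
```

Note that `401` and `403` are deliberately not retried: a revoked or cross-tenant PAT will never start working, so failing fast gives a cleaner CI error.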
-```js -// Successful record start (200) -{ "record": "started", "id": "default-orders-api" } +> The shared token is cluster-wide, not per-user. The PAT-exchange path makes that distinction safe: every caller authenticates as themselves with a PAT, and the shared token never leaves the boundary of the bootstrap response. -// Validation error (400) -{ "error": "namespace and deployment are required" } +--- -// Auth error (401) -{ "success": false, "message": "Unauthorized: Missing authorization header" } +## Response format -// Namespace-scoped proxy rejecting a cross-namespace call (403) -{ "error": "this proxy is scoped to namespace \"payments\"" } -``` +Most routes return `application/json`. Successful responses are handler-specific (e.g. `{"record":"started","id":"default-orders-api"}`); validation errors are always `{"error": "..."}` with a 4xx status. Auth-failure shape is covered in [Authentication](#authentication). -### Error status codes +A few endpoints stream **newline-delimited JSON** (`application/x-ndjson`) instead — `/record/status` and `/test/status`. Read these line-by-line, not as a single JSON document. | HTTP | When it happens | | ---- | ----------------------------------------------------------------------------------------------- | @@ -153,7 +225,6 @@ Handlers return JSON with `application/json` on success. Validation failures usu | 401 | Missing or invalid `Authorization: Bearer` header | | 403 | Request touches a namespace outside `watchNamespace`, or image repo mismatch on `/proxy/update` | | 404 | Recording/replay session ID not found, or deployment/config does not exist | -| 405 | Wrong HTTP method for the route | | 500 | Kubernetes API error, storage backend unavailable, or unexpected server error | | 503 | Kubernetes client or self-discovery not initialised (proxy is still starting or missing RBAC) | @@ -161,15 +232,21 @@ Handlers return JSON with `application/json` on success. 
Validation failures usu

## Quick start: Trigger and watch a live recording

-The golden path: pick a Deployment, start a recording, stream its status, and stop it when you have the traffic you need.
+The golden path: authenticate, pick a Deployment, start a recording, stream its status, and stop it when you have the traffic you need.

-### 1. Set up variables
+### 1. Set up variables and authenticate

```bash
PROXY="https://k8s-proxy.example.com" # ingressUrl from Helm install
-AUTH="Authorization: Bearer $K8S_PROXY_SHARED_TOKEN"
+PAT="kep_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
NS="default"
DEPLOY="orders-api"
+
+# One-time PAT → shared-token exchange (see Authentication above)
+K8S_PROXY_SHARED_TOKEN=$(curl -sS -X POST "$PROXY/get-shared-token" \
+  -H "Authorization: Bearer $PAT" | jq -r '.sharedToken')
+
+AUTH="Authorization: Bearer $K8S_PROXY_SHARED_TOKEN"
```

### 2. Discover target Deployments

@@ -297,6 +374,14 @@ All paths are relative to the proxy base URL. Unless noted, every route requires

| `GET` | `/healthz` | No | Liveness probe. Returns `{"status":"ok"}`. |
| `POST` | `/mutate` | No | Kubernetes MutatingAdmissionWebhook endpoint. Do not call directly. |

+### Bootstrap
+
+| Method | Path | Auth | Description |
+| ------ | ------------------- | ------------------------------ | --------------------------------------------------------------------------------------------- |
+| `POST` | `/get-shared-token` | `Authorization: Bearer <PAT>` | Exchange a Personal Access Token for the proxy's shared token. See [Authentication](#authentication) for details. |
+
+This is the only protected endpoint that does **not** use the shared token; it gates on a PAT instead because the caller does not yet have the shared token. Every other route below requires `Authorization: Bearer <shared token>`.
+

### Deployments

| Method | Path | Description |