From 447024ba1babe52953cd982eb050349b6e9e1167 Mon Sep 17 00:00:00 2001 From: Yash Mehrotra Date: Thu, 29 Jan 2026 17:05:16 +0530 Subject: [PATCH 1/5] chore: add distributed canaries docs + blog --- .../docs/concepts/distributed-canaries.md | 102 ++++++ .../blog/distributed-canaries/index.mdx | 293 ++++++++++++++++++ 2 files changed, 395 insertions(+) create mode 100644 canary-checker/docs/concepts/distributed-canaries.md create mode 100644 mission-control/blog/distributed-canaries/index.mdx diff --git a/canary-checker/docs/concepts/distributed-canaries.md b/canary-checker/docs/concepts/distributed-canaries.md new file mode 100644 index 00000000..8ba27a8a --- /dev/null +++ b/canary-checker/docs/concepts/distributed-canaries.md @@ -0,0 +1,102 @@ +--- +title: Distributed Canaries +sidebar_custom_props: + icon: network +sidebar_position: 6 +--- + +Distributed canaries allow you to define a check once and have it automatically run on multiple agents. This is useful for monitoring services from different locations, clusters, or network segments. + +## How It Works + +When you specify an `agentSelector` on a canary: + +1. The canary does **not** run locally on the server +2. A copy of the canary is created for each matched agent +3. Each agent runs the check independently and reports results back +4. The copies are kept in sync with the parent canary + +A background job syncs agent selector canaries every 5 minutes. When agents are added or removed, the derived canaries are automatically created or cleaned up. 
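As a rough illustration, the agent-matching step described above can be sketched in Python. This is not the actual implementation (Mission Control's matcher is part of its Go codebase); the semantics below are inferred from the documented examples, including the assumption that an exclusion-only selector means "all agents except these":

```python
from fnmatch import fnmatch


def match_agents(patterns: list[str], agents: list[str]) -> list[str]:
    """Select agents using agentSelector-style glob patterns.

    Patterns starting with '!' exclude matching agents. An exclusion-only
    list is treated as "all agents except these" (assumed from the docs).
    """
    includes = [p for p in patterns if not p.startswith("!")]
    excludes = [p[1:] for p in patterns if p.startswith("!")]
    if not includes:  # exclusion-only selector implies a "*" base set
        includes = ["*"]
    return [
        agent
        for agent in agents
        if any(fnmatch(agent, p) for p in includes)
        and not any(fnmatch(agent, p) for p in excludes)
    ]


agents = ["eu-west-prod", "us-east-prod", "us-test", "staging"]
print(match_agents(["eu-*", "us-*", "!us-test"], agents))  # ['eu-west-prod', 'us-east-prod']
print(match_agents(["!staging"], agents))  # ['eu-west-prod', 'us-east-prod', 'us-test']
```

The same sketch covers the sync behavior: on each run, the server recomputes this set and creates or deletes derived canaries so that the set of copies matches the set of selected agents.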
+
+## Agent Selector Patterns
+
+The `agentSelector` field accepts a list of patterns to match agent names:
+
+| Pattern | Description |
+|---------|-------------|
+| `agent-1` | Exact match |
+| `eu-west-*` | Prefix match (glob) |
+| `*-prod` | Suffix match (glob) |
+| `!staging` | Exclude agents matching this pattern |
+| `team-*`, `!team-b` | Match all `team-*` except `team-b` |
+
+## Example: HTTP Check on All Agents
+
+This example creates an HTTP check for a Kubernetes service that runs on every agent matching the pattern:
+
+```yaml title="distributed-http-check.yaml"
+apiVersion: canaries.flanksource.com/v1
+kind: Canary
+metadata:
+  name: api-health
+  namespace: monitoring
+spec:
+  schedule: "@every 1m"
+  http:
+    - name: api-endpoint
+      url: http://api-service.default.svc.cluster.local:8080/health
+      responseCodes: [200]
+      test:
+        expr: json.status == 'healthy'
+  agentSelector:
+    - "*" # Run on all agents
+```
+
+When this canary is created:
+
+1. The check runs locally only when the `local` agent is included in the selector
+2. A derived canary is created for each registered agent
+3. Each agent executes the HTTP check against `api-service.default.svc.cluster.local:8080/health` in its own cluster
+4. 
Results from all agents are aggregated and visible in the UI + +## Example: Regional Monitoring + +Monitor an external API from specific regions: + +```yaml title="regional-monitoring.yaml" +apiVersion: canaries.flanksource.com/v1 +kind: Canary +metadata: + name: external-api-latency +spec: + schedule: "@every 5m" + http: + - name: payment-gateway + url: https://api.payment-provider.com/health + responseCodes: [200] + maxResponseTime: 500 + agentSelector: + - "eu-*" # All EU agents + - "us-*" # All US agents + - "!us-test" # Exclude test agent + - "local" # Run on local instance as well +``` + +## Example: Exclude Specific Agents + +Run checks on all agents except those in a specific environment: + +```yaml title="production-only.yaml" +apiVersion: canaries.flanksource.com/v1 +kind: Canary +metadata: + name: production-checks +spec: + schedule: "@every 2m" + http: + - name: internal-service + url: http://internal.example.com/status + agentSelector: + - "!*-dev" # Exclude all dev agents + - "!*-staging" # Exclude all staging agents +``` diff --git a/mission-control/blog/distributed-canaries/index.mdx b/mission-control/blog/distributed-canaries/index.mdx new file mode 100644 index 00000000..ad90b1dd --- /dev/null +++ b/mission-control/blog/distributed-canaries/index.mdx @@ -0,0 +1,293 @@ +--- +title: "Monitoring From Every Angle: A Guide to Distributed Canaries" +description: Learn how to run the same health check across multiple clusters and regions with a single canary definition +slug: distributed-canaries-tutorial +authors: [yash] +tags: [canary-checker, distributed, multi-cluster, agents] +hide_table_of_contents: false +--- + +# Monitoring From Every Angle: A Guide to Distributed Canaries + +If you've ever managed services across multiple Kubernetes clusters, you know the pain. You write the same health check for cluster A, copy-paste it for cluster B, tweak it for cluster C, and before you know it, you're maintaining a dozen nearly-identical YAML files. 
When something changes, you're updating them all. It's tedious, error-prone, and frankly, a waste of time. + +What if you could define a check once and have it automatically run everywhere you need it? + +That's exactly what distributed canaries do. + + + +## The Problem With Multi-Cluster Monitoring + +Let's say you're running an API service that's deployed across three clusters: one in `eu-west`, one in `us-east`, and one in `ap-south`. You want to monitor the `/health` endpoint from each cluster to ensure the service is responding correctly in all regions. + +The naive approach looks something like this: + +```yaml title="eu-west-cluster/api-health.yaml" +apiVersion: canaries.flanksource.com/v1 +kind: Canary +metadata: + name: api-health +spec: + schedule: "@every 5m" + http: + - name: api-endpoint + url: http://api-service.default.svc:8080/health + responseCodes: [200] +``` + +Now multiply that by three clusters. And then by every service you want to monitor. You see where this is going. + +## Enter Agent Selector + +Canary Checker has a feature called `agentSelector` that solves this problem elegantly. Instead of deploying canaries to each cluster individually, you deploy agents to your clusters and define your canaries centrally with an `agentSelector` that specifies where they should run. + +Here's the same check, but now it runs on all your agents: + +```yaml title="api-health.yaml" +apiVersion: canaries.flanksource.com/v1 +kind: Canary +metadata: + name: api-health +spec: + schedule: "@every 5m" + http: + - name: api-endpoint + url: http://api-service.default.svc:8080/health + responseCodes: [200] + agentSelector: + - "*" # Run on all agents +``` + +That's it. One file, all clusters. + +## How It Actually Works + +When you create a canary with an `agentSelector`, something interesting happens: the canary doesn't run on the central server at all. Instead, the system: + +1. Looks at all registered agents +2. 
Matches agent names against your selector patterns +3. Creates a copy of the canary for each matched agent +4. Each agent runs the check independently and reports results back + +The copies are kept in sync automatically. If you update the parent canary, all the derived canaries update too. If you add a new agent that matches the pattern, it gets the canary within a few minutes. If you remove an agent, its canary is cleaned up. + +## Tutorial: Setting Up Distributed Monitoring + +Let's walk through a practical example. We'll set up monitoring for an internal service that needs to be checked from multiple clusters. + +### Prerequisites + +You'll need: +- A central Mission Control instance +- At least two Kubernetes clusters with agents installed + +### Step 1: Register Your Agents + +First, make sure your agents are registered with meaningful names. When you [install the agent helm chart](/docs/installation/saas/agent), you specify the agent name: + +```bash +helm install mission-control-agent flanksource/mission-control-agent \ + --set clusterName= \ + --set upstream.agent=YOUR_LOCAL_NAME \ + --set upstream.username=token \ + --set upstream.password= \ + --set upstream.host= \ + -n mission-control --create-namespace \ + --wait +``` + +Do this for each cluster with descriptive names like `eu-west-prod`, `us-east-prod`, `ap-south-prod`. 
+ +### Step 2: Create Your Distributed Canary + +Now create a canary that targets all production agents: + +```yaml title="distributed-service-check.yaml" +apiVersion: canaries.flanksource.com/v1 +kind: Canary +metadata: + name: payment-service-health + namespace: monitoring +spec: + schedule: "@every 30s" + http: + - name: payment-api + url: http://payment-service.payments.svc.cluster.local:8080/health + responseCodes: [200] + maxResponseTime: 500 + test: + expr: json.status == 'healthy' && json.database == 'connected' + agentSelector: + - "*-prod" # All agents ending with -prod +``` + +Apply this to your central Mission Control instance: + +```bash +kubectl apply -f distributed-service-check.yaml +``` + +### Step 3: Verify It's Working + +Within a few minutes, you should see derived canaries created for each agent. You can verify this in the Mission Control UI, or by checking the canaries list: + +```bash +kubectl get canaries -A +``` + +You'll see the original canary plus one derived canary per matched agent. + +## Pattern Matching Deep Dive + +The `agentSelector` field is quite flexible. 
Here are some patterns you'll find useful: + +### Select All Agents + +```yaml +agentSelector: + - "*" +``` + +### Select by Prefix (Regional) + +```yaml +agentSelector: + - "eu-*" # All European agents + - "us-*" # All US agents +``` + +### Select by Suffix (Environment) + +```yaml +agentSelector: + - "*-prod" # All production agents + - "*-staging" # All staging agents +``` + +### Exclude Specific Agents + +```yaml +agentSelector: + - "*-prod" # All production agents + - "!us-east-prod" # Except US East (maybe it's being decommissioned) +``` + +### Exclusion-Only Patterns + +You can also just exclude, which means "all agents except these": + +```yaml +agentSelector: + - "!*-dev" # All agents except dev + - "!*-test" # And except test +``` + +## Real-World Use Cases + +### Geographic Latency Monitoring + +Monitor an external API from all your regions to compare latency: + +```yaml +apiVersion: canaries.flanksource.com/v1 +kind: Canary +metadata: + name: stripe-api-latency +spec: + schedule: "@every 5m" + http: + - name: stripe-health + url: https://api.stripe.com/v1/health + responseCodes: [200] + maxResponseTime: 1000 + agentSelector: + - "*" +``` + +Now you can see if Stripe is slower from one region than another. 
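To make that regional comparison concrete, here is a small illustrative script. The input shape is hypothetical (in practice you would export per-agent latencies from the Mission Control UI or API); the point is only to show how per-region results become comparable once the same check runs everywhere:

```python
from statistics import median


def regional_outliers(latency_ms: dict[str, float], factor: float = 1.5) -> dict[str, float]:
    """Return agents whose check latency exceeds `factor` times the fleet median.

    `latency_ms` maps agent name -> observed latency in milliseconds; this is
    a hypothetical structure, not an actual canary-checker API response.
    """
    baseline = median(latency_ms.values())
    return {agent: ms for agent, ms in latency_ms.items() if ms > factor * baseline}


results = {"eu-west-prod": 120.0, "us-east-prod": 95.0, "ap-south-prod": 480.0}
print(regional_outliers(results))  # {'ap-south-prod': 480.0}
```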
+ +### Internal Service Mesh Validation + +Verify that internal services are reachable from all clusters: + +```yaml +apiVersion: canaries.flanksource.com/v1 +kind: Canary +metadata: + name: mesh-connectivity +spec: + schedule: "@every 1m" + http: + - name: auth-service + url: http://auth.internal.example.com/health + - name: user-service + url: http://users.internal.example.com/health + - name: orders-service + url: http://orders.internal.example.com/health + agentSelector: + - "*-prod" +``` + +### Gradual Rollout Monitoring + +When rolling out a new service version, monitor it from a subset of clusters first: + +```yaml +agentSelector: + - "us-east-prod" # Canary region first +``` + +Then expand: + +```yaml +agentSelector: + - "us-*-prod" # All US production +``` + +And finally: + +```yaml +agentSelector: + - "*-prod" # All production +``` + +## What Happens Under the Hood + +The system runs a background sync job every 5 minutes that: + +1. Finds all canaries with `agentSelector` set +2. For each canary, matches agent names against the patterns +3. Creates or updates derived canaries for matched agents +4. Deletes derived canaries for agents that no longer match + +There's also an hourly cleanup job that removes orphaned derived canaries (when the parent canary is deleted). + +This means: +- Changes propagate within 5 minutes +- You don't need to restart anything when adding agents +- The system is self-healing + +## Tips and Gotchas + +**Agent names matter.** Pick a naming convention early and stick to it. Something like `{region}-{environment}` works well. + +**The parent canary doesn't run locally.** If you have an `agentSelector`, the canary only runs on the matched agents, not on the server where you applied it unless `local` is specified. + +**Results are aggregated.** In the UI, you'll see results from all agents. This gives you a single view of service health across all locations. 
+ +**Start specific, then broaden.** When testing a new canary, start with a specific agent name, verify it works, then expand to patterns. + +## Conclusion + +Distributed canaries turn a maintenance headache into a one-liner. Instead of managing N copies of the same check across N clusters, you define it once and let the system handle the distribution. + +The pattern matching is powerful enough to handle complex scenarios (regional rollouts, environment separation, gradual expansion) while staying simple for common cases. + +If you're running services across multiple clusters and haven't tried this yet, give it a shot. Your future self will thank you. + +## References + +- [Distributed Canaries Concept](/docs/guide/canary-checker/concepts/distributed-canaries) +- [Canary Spec Reference](/docs/guide/canary-checker/reference/canary-spec) +- [Agent Installation Guide](/docs/docs/installation/saas/agent) From 165ca77cddba9211abc4629e039b6a0edc6ec244 Mon Sep 17 00:00:00 2001 From: Yash Mehrotra Date: Thu, 29 Jan 2026 18:29:36 +0530 Subject: [PATCH 2/5] chore: run task fmt --- .../docs/concepts/distributed-canaries.md | 34 +++++++++---------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/canary-checker/docs/concepts/distributed-canaries.md b/canary-checker/docs/concepts/distributed-canaries.md index 8ba27a8a..9ad7ced4 100644 --- a/canary-checker/docs/concepts/distributed-canaries.md +++ b/canary-checker/docs/concepts/distributed-canaries.md @@ -22,13 +22,13 @@ A background job syncs agent selector canaries every 5 minutes. 
When agents are The `agentSelector` field accepts a list of patterns to match agent names: -| Pattern | Description | -|---------|-------------| -| `agent-1` | Exact match | -| `eu-west-*` | Prefix match (glob) | -| `*-prod` | Suffix match (glob) | -| `!staging` | Exclude agents matching this pattern | -| `team-*`, `!team-b` | Match all `team-*` except `team-b` | +| Pattern | Description | +| ------------------- | ------------------------------------ | +| `agent-1` | Exact match | +| `eu-west-*` | Prefix match (glob) | +| `*-prod` | Suffix match (glob) | +| `!staging` | Exclude agents matching this pattern | +| `team-*`, `!team-b` | Match all `team-*` except `team-b` | ## Example: HTTP Check on All Agents @@ -41,7 +41,7 @@ metadata: name: api-health namespace: monitoring spec: - schedule: "@every 1m" + schedule: '@every 1m' http: - name: api-endpoint url: http://api-service.default.svc.cluster.local:8080/health @@ -49,7 +49,7 @@ spec: test: expr: json.status == 'healthy' agentSelector: - - "*" # Run on all agents + - '*' # Run on all agents ``` When this canary is created: @@ -69,17 +69,17 @@ kind: Canary metadata: name: external-api-latency spec: - schedule: "@every 5m" + schedule: '@every 5m' http: - name: payment-gateway url: https://api.payment-provider.com/health responseCodes: [200] maxResponseTime: 500 agentSelector: - - "eu-*" # All EU agents - - "us-*" # All US agents - - "!us-test" # Exclude test agent - - "local" # Run on local instance as well + - 'eu-*' # All EU agents + - 'us-*' # All US agents + - '!us-test' # Exclude test agent + - 'local' # Run on local instance as well ``` ## Example: Exclude Specific Agents @@ -92,11 +92,11 @@ kind: Canary metadata: name: production-checks spec: - schedule: "@every 2m" + schedule: '@every 2m' http: - name: internal-service url: http://internal.example.com/status agentSelector: - - "!*-dev" # Exclude all dev agents - - "!*-staging" # Exclude all staging agents + - '!*-dev' # Exclude all dev agents + - '!*-staging' 
# Exclude all staging agents ``` From cfed6b2ad82c95bdcb8f332f7a460365ef756b92 Mon Sep 17 00:00:00 2001 From: Yash Mehrotra Date: Mon, 9 Feb 2026 19:54:13 +0530 Subject: [PATCH 3/5] chore: add multi approach --- .../skills/update-source-references/SKILL.md | 17 ++- .../docs/concepts/distributed-canaries.md | 4 + .../blog/distributed-canaries/index.mdx | 117 ++++++++++++++---- 3 files changed, 111 insertions(+), 27 deletions(-) diff --git a/.claude/skills/update-source-references/SKILL.md b/.claude/skills/update-source-references/SKILL.md index 64571d31..a0cda014 100644 --- a/.claude/skills/update-source-references/SKILL.md +++ b/.claude/skills/update-source-references/SKILL.md @@ -4,19 +4,26 @@ For all mdx/md files in @docs/canary-checker/ and @docs/mission-control/ that ha 2. For each documented struct, compare ALL public fields from the Go source against the documentation and: - Add any missing fields - - Fix incorrect field names (check json/yaml tags - use the json/yaml tag name, not the Go field name) - - If json/yaml tag differ from each other, warn user + - Fix incorrect field names (check json/yaml tags - use the json tag name, not the Go field name) + - If json/yaml tags differ from each other, prefer the json tag and warn user - Fix incorrect schemes/types (e.g., `Duration` vs `int`, `bool` vs `string`) - Fix incorrect nested structures (check if fields are inline or nested under a parent key) - Remove fields that don't exist in the Go struct - For inline embedded structs, verify which fields they provide -3. For \_canary-spec.mdx, ensure all check types from CanarySpec are listed with correct field names matching the json/yaml tags +3. 
**For nested struct types (like `ExecConnections`, `GitConnection`, etc.), you MUST:** + - Find the actual struct definition in the codebase (may be in different packages like `duty/connection/`) + - Document ALL fields from that struct, not just the ones currently in docs + - Follow type references across packages to get complete field lists + +4. For _canary-spec.mdx, ensure all check types from CanarySpec are listed with correct field names matching the json tags Pay attention to: -- yaml tags like `yaml:"env"` mean the field name in docs should be `env`, not the Go field name +- Use json tags as the canonical field name (e.g., `json:"env"` means field name in docs should be `env`) +- If yaml and json tags differ, use json tag and warn the user about the discrepancy - Inline embedded structs (e.g., `Connection`, `Description`, `Templatable`) - their fields appear at the same level - Pointer vs value types for nested structs - Deprecated fields should be marked as such -- ignore private fields +- Ignore private fields +- Connection types may be defined in `modules/duty/connection/` not just in the check's own file - always trace the import path to find the actual struct definition diff --git a/canary-checker/docs/concepts/distributed-canaries.md b/canary-checker/docs/concepts/distributed-canaries.md index 9ad7ced4..5bb928b8 100644 --- a/canary-checker/docs/concepts/distributed-canaries.md +++ b/canary-checker/docs/concepts/distributed-canaries.md @@ -7,6 +7,10 @@ sidebar_position: 6 Distributed canaries allow you to define a check once and have it automatically run on multiple agents. This is useful for monitoring services from different locations, clusters, or network segments. 
+:::info +This feature is only available in [Mission Control](https://flanksource.com/docs) since Canary Checker does not support agents +::: + ## How It Works When you specify an `agentSelector` on a canary: diff --git a/mission-control/blog/distributed-canaries/index.mdx b/mission-control/blog/distributed-canaries/index.mdx index ad90b1dd..a95b06d8 100644 --- a/mission-control/blog/distributed-canaries/index.mdx +++ b/mission-control/blog/distributed-canaries/index.mdx @@ -38,11 +38,81 @@ spec: Now multiply that by three clusters. And then by every service you want to monitor. You see where this is going. -## Enter Agent Selector +There are two ways to solve this, and each fits different situations. -Canary Checker has a feature called `agentSelector` that solves this problem elegantly. Instead of deploying canaries to each cluster individually, you deploy agents to your clusters and define your canaries centrally with an `agentSelector` that specifies where they should run. +## Two Approaches -Here's the same check, but now it runs on all your agents: +### 1. Bundle Canaries With Your Deployment (Push) + +If you're already deploying your application to multiple clusters using Helm, ArgoCD, Flux, or any other deployment tool, you can include the Canary resource right alongside your application. The canary deploys wherever your app deploys — one canary per cluster, automatically. + +### 2. Agent Selector (Pull) + +If you want to define checks centrally and have them distributed to agents, you use `agentSelector`. You write the canary once on the Mission Control server, and it gets replicated to every matched agent. + +Both approaches get you the same result — a health check running in every cluster. The difference is in how they get there. Let's look at each one. + +## Approach 1: Bundle With Your Deployment + +This is the simplest approach if you already have a deployment pipeline that targets multiple clusters. 
You add the Canary resource to your Helm chart (or Kustomize overlay, or whatever you use), and it rides along with your application. + +Say you have a Helm chart for your `payment-service`. You'd add a canary template: + +```yaml title="charts/payment-service/templates/canary.yaml" +apiVersion: canaries.flanksource.com/v1 +kind: Canary +metadata: + name: {{ .Release.Name }}-health + namespace: {{ .Release.Namespace }} +spec: + schedule: "@every 1m" + http: + - name: payment-api + url: http://{{ .Release.Name }}.{{ .Release.Namespace }}.svc:8080/health + responseCodes: [200] + test: + expr: json.status == 'healthy' +``` + +Now when you deploy your service to three clusters: + +```bash +# EU West +helm install payment-service ./charts/payment-service \ + --kube-context eu-west-prod + +# US East +helm install payment-service ./charts/payment-service \ + --kube-context us-east-prod + +# AP South +helm install payment-service ./charts/payment-service \ + --kube-context ap-south-prod +``` + +Each cluster gets its own canary, running against the local service endpoint. The canary lives and dies with the deployment — if you uninstall the chart, the canary goes with it. + +The nice thing about this approach is that each canary can be customized per environment using Helm values: + +```yaml title="values-eu-west.yaml" +canary: + schedule: "@every 30s" + maxResponseTime: 200 # Stricter for EU +``` + +```yaml title="values-ap-south.yaml" +canary: + schedule: "@every 2m" + maxResponseTime: 800 # More lenient for AP +``` + +This gives you per-cluster tuning that's version-controlled right alongside your deployment config. + +## Approach 2: Agent Selector + +Agent selector takes the opposite approach. Instead of deploying canaries alongside your application, you define them centrally on Mission Control and specify which agents should run them. 
+ +Here's the same health check, but managed centrally: ```yaml title="api-health.yaml" apiVersion: canaries.flanksource.com/v1 @@ -61,9 +131,9 @@ spec: That's it. One file, all clusters. -## How It Actually Works +### How Agent Selector Works -When you create a canary with an `agentSelector`, something interesting happens: the canary doesn't run on the central server at all. Instead, the system: +When you create a canary with an `agentSelector`, the canary doesn't run on the central server at all. Instead, the system: 1. Looks at all registered agents 2. Matches agent names against your selector patterns @@ -72,19 +142,13 @@ When you create a canary with an `agentSelector`, something interesting happens: The copies are kept in sync automatically. If you update the parent canary, all the derived canaries update too. If you add a new agent that matches the pattern, it gets the canary within a few minutes. If you remove an agent, its canary is cleaned up. -## Tutorial: Setting Up Distributed Monitoring - -Let's walk through a practical example. We'll set up monitoring for an internal service that needs to be checked from multiple clusters. - -### Prerequisites +### Setting It Up You'll need: - A central Mission Control instance - At least two Kubernetes clusters with agents installed -### Step 1: Register Your Agents - -First, make sure your agents are registered with meaningful names. When you [install the agent helm chart](/docs/installation/saas/agent), you specify the agent name: +**Register your agents** with meaningful names. 
When you [install the agent helm chart](/docs/installation/saas/agent), you specify the agent name: ```bash helm install mission-control-agent flanksource/mission-control-agent \ @@ -94,14 +158,12 @@ helm install mission-control-agent flanksource/mission-control-agent \ --set upstream.password= \ --set upstream.host= \ -n mission-control --create-namespace \ - --wait + --wait ``` Do this for each cluster with descriptive names like `eu-west-prod`, `us-east-prod`, `ap-south-prod`. -### Step 2: Create Your Distributed Canary - -Now create a canary that targets all production agents: +**Create your distributed canary** targeting all production agents: ```yaml title="distributed-service-check.yaml" apiVersion: canaries.flanksource.com/v1 @@ -128,8 +190,6 @@ Apply this to your central Mission Control instance: kubectl apply -f distributed-service-check.yaml ``` -### Step 3: Verify It's Working - Within a few minutes, you should see derived canaries created for each agent. You can verify this in the Mission Control UI, or by checking the canaries list: ```bash @@ -138,6 +198,19 @@ kubectl get canaries -A You'll see the original canary plus one derived canary per matched agent. 
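Conceptually, each derived canary is just a per-agent copy of the parent spec. A minimal sketch of that derivation, assuming hypothetical field names and naming scheme (the actual schema and naming are internal to Mission Control):

```python
import copy


def derive_canaries(parent: dict, matched_agents: list[str]) -> list[dict]:
    """Produce one derived canary per matched agent (illustrative only)."""
    derived = []
    for agent in matched_agents:
        child = copy.deepcopy(parent)
        # Hypothetical naming: parent name suffixed with the agent name.
        child["metadata"]["name"] = f"{parent['metadata']['name']}-{agent}"
        child["metadata"]["agent"] = agent  # assumed ownership marker
        child["spec"].pop("agentSelector", None)  # the copy runs directly on the agent
        derived.append(child)
    return derived


parent = {
    "metadata": {"name": "payment-service-health", "namespace": "monitoring"},
    "spec": {"schedule": "@every 30s", "agentSelector": ["*-prod"]},
}
for c in derive_canaries(parent, ["eu-west-prod", "us-east-prod"]):
    print(c["metadata"]["name"])
```

This is also why deleting the parent cleans everything up: the derived copies exist only as projections of the parent spec onto the matched agents.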
+
+## When to Use Which
+
+| | Bundled with Deployment | Agent Selector |
+|---|---|---|
+| **Model** | Push — canary deploys with your app | Pull — canary is distributed from a central server |
+| **Best for** | Application-specific checks that should live with the app | Infrastructure-wide checks or cross-cutting concerns |
+| **Per-cluster customization** | Full control via Helm values or overlays | Same check everywhere (that's the point) |
+| **Lifecycle** | Tied to the deployment — created and deleted with it | Managed centrally — independent of app deployments |
+| **Requires Mission Control** | No — works with standalone canary-checker | Yes — agents report back to Mission Control |
+| **Who owns it** | The team deploying the service | The platform or SRE team |
+
+In practice, you'll likely use both. Application teams bundle canaries in their Helm charts for service-specific checks (with per-environment tuning). The platform team uses agent selector for cross-cutting concerns like external API reachability, DNS resolution, or certificate expiry — checks that don't belong to any single application but need to run everywhere.
+
 ## Pattern Matching Deep Dive
 
 The `agentSelector` field is quite flexible. Here are some patterns you'll find useful:
@@ -280,11 +353,11 @@ This means:
 
 ## Conclusion
 
-Distributed canaries turn a maintenance headache into a one-liner. Instead of managing N copies of the same check across N clusters, you define it once and let the system handle the distribution.
+Distributed canaries turn a maintenance headache into something manageable. Whether you bundle canaries in your Helm charts or manage them centrally with agent selector, you get health checks running everywhere your services live — without the copy-paste.
 
-The pattern matching is powerful enough to handle complex scenarios (regional rollouts, environment separation, gradual expansion) while staying simple for common cases.
+Bundle with your deployment when the check is specific to the application and the team owning the service should own the canary too. Use agent selector when you need the same check running across all clusters from a single source of truth. -If you're running services across multiple clusters and haven't tried this yet, give it a shot. Your future self will thank you. +Most teams end up using both. And that's probably the right call. ## References From 712e891276a035bb4881d84f04f3d85095edbcd6 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Wed, 11 Feb 2026 06:17:58 +0000 Subject: [PATCH 4/5] Initial plan From 5348915dc1b712ae3d6f67a7e56df3a7578b2ab0 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Wed, 11 Feb 2026 06:21:39 +0000 Subject: [PATCH 5/5] docs: fix lint issues and add blog link to distributed canaries Co-authored-by: moshloop <1489660+moshloop@users.noreply.github.com> --- canary-checker/docs/concepts/distributed-canaries.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/canary-checker/docs/concepts/distributed-canaries.md b/canary-checker/docs/concepts/distributed-canaries.md index 5bb928b8..9697a842 100644 --- a/canary-checker/docs/concepts/distributed-canaries.md +++ b/canary-checker/docs/concepts/distributed-canaries.md @@ -8,7 +8,11 @@ sidebar_position: 6 Distributed canaries allow you to define a check once and have it automatically run on multiple agents. This is useful for monitoring services from different locations, clusters, or network segments. 
 :::info
-This feature is only available in [Mission Control](https://flanksource.com/docs) since Canary Checker does not support agents
+This feature is only available in [Mission Control](https://flanksource.com/docs) since Canary Checker does not support agents
+:::
+
+:::tip
+For a step-by-step tutorial and real-world examples, see the [Distributed Canaries blog post](/blog/distributed-canaries-tutorial).
+:::
 
 ## How It Works