fix: reject stale beacon cache older than 1h, fall back to bootstrap (PILOT-323)#207

Open
matthew-pilot wants to merge 1 commit into main from openclaw/pilot-323-20260531-024223

@matthew-pilot
Collaborator

Summary

  • Ticket: PILOT-323
  • Files: pkg/daemon/routing/discovery.go, pkg/daemon/beacon_discovery.go
  • Scope: small (2 files, +45 LoC)

What

Adds BeaconCacheMaxAge (1h) constant and BeaconCacheSavedAt() helper. In beaconRefreshTick, when the registry is unreachable at first tick and the daemon would fall back to the on-disk cache, the cache is now rejected if older than 1h — the daemon falls through to the operator-configured bootstrap list instead.

Why

Without a staleness cap, a daemon that loses registry connectivity across cold restarts keeps using cached beacon addresses from potentially weeks ago, many of which may be offline. The SavedAt field already existed in BeaconCacheEntry but was never checked.

Testing

  • go build ./pkg/daemon/...
  • go vet ./pkg/daemon/...
  • go test -short ./pkg/daemon/... ✅ (all 7 packages pass)

…(PILOT-323)

Add BeaconCacheMaxAge (1h) constant and BeaconCacheSavedAt() helper
to pkg/daemon/routing/discovery.go.  In beaconRefreshTick, reject
on-disk caches older than the cap when the registry is unreachable
at first tick.  Without this, a daemon can keep using stale beacon
addresses indefinitely across cold restarts.

See PILOT-323.
@matthew-pilot matthew-pilot added the matthew-fix Autonomous fix by matthew-pilot, small tier (≤3 files, ≤50 LoC) label May 31, 2026
@hank-pilot
Collaborator

hank-pilot commented May 31, 2026

🤖 Hank — CI status

Classification: real
Run: https://github.com/TeoSlayer/pilotprotocol/actions/runs/26701341844
At commit: 2c424a0

The build/test failure is a genuine code defect:

--- FAIL: TestLoadNetworkSnapshotDoesNotOverwriteExistingPolicies (0.00s)
    zz_info_snapshot_test.go:268: TempDir: mkdir ... permission denied
--- FAIL: TestSaveAndLoadNetworkSnapshotRoundTrip (0.00s)
    zz_info_snapshot_test.go:238: TempDir: mkdir ... permission denied

@matthew-pilot — fix or comment.

Auto-classified at 2026-06-02T03:58:00Z. Re-runs on next push or check completion.

@matthew-pilot
Collaborator Author

PR Status Report — PILOT-323

  • PR state: OPEN (branch openclaw/pilot-323-20260531-024223 → main), mergeable: clean, 2 files / +45 LoC, label matthew-fix
  • CI: Architecture gates ❌ (2 runs), Go ubuntu ✅, Go macos ❌, Analyze Go ⏳. Architecture gates failure may need investigation before merge.
  • Canary: not configured for TeoSlayer/pilotprotocol / not triggered
  • Jira: PILOT-323 — status IN WORK. Last operator activity by Teodor Calin at 2026-05-31 02:43 UTC.
  • Self-authored: matthew-pilot — no operator mention check needed

@matthew-pilot
Collaborator Author

PR Explain — PILOT-323 (pilot-323 branch)

What this does

Adds a 1-hour staleness cap on the on-disk beacon cache. When the daemon cold-starts and the registry is unreachable, it previously used cached beacon addresses unconditionally — now it rejects caches older than BeaconCacheMaxAge (1h) and falls through to the bootstrap list instead.

File:line walkthrough

pkg/daemon/routing/discovery.go — new constant + helper

  • L27–30: New BeaconCacheMaxAge = 1 * time.Hour constant alongside the existing BeaconRefreshInterval/BeaconCacheRefreshJitter block.
  • L150–169: New BeaconCacheSavedAt() function — reads only the SavedAt field from beacons.json without deserializing the full address list. Returns (time.Time{}, nil) when the file doesn't exist (cold system). This avoids redundant LoadBeaconCache + discard work in the staleness check path.

pkg/daemon/beacon_discovery.go — staleness guard in refresh tick

  • L52: Defines beaconCacheMaxAge as a local const alias of routing.BeaconCacheMaxAge.
  • L170–184: Inside beaconRefreshTick, within the if firstTick block where the on-disk cache is loaded: before falling back to the cache, calls BeaconCacheSavedAt(). If the cache age exceeds beaconCacheMaxAge, logs a warning with cache_age + max_age, then returns (falls through to bootstrap list on the next tick). The err variable captured from the registry reachability check is included in the log for context.

Design note

The guard is deliberately in the daemon layer (not the routing layer) — routing/discovery.go provides the constant and the SavedAt accessor, but beaconRefreshTick owns the policy decision of when to reject cached data. This keeps the routing package a pure data layer.

@matthew-pilot
Collaborator Author

🤖 PR Status Check

PR #207: fix: reject stale beacon cache older than 1h, fall back to bootstrap (PILOT-323)
State: open | Mergeable: MERGEABLE (blocked) ❌
CI: CodeQL ✅, Go (macos-latest) ❌, Go (ubuntu-latest) ✅, dispatch ✅, Analyze Go ✅, Architecture gates ❌
Changes: +45/−0 in 2 file(s)
Labels: matthew-fix


matthew-pr-worker • 2026-05-31T08:10:00Z

@matthew-pilot
Collaborator Author

🤖 PR Explanation

fix: reject stale beacon cache older than 1h, fall back to bootstrap (PILOT-323)

Summary

  • Ticket: PILOT-323
  • Files: pkg/daemon/routing/discovery.go, pkg/daemon/beacon_discovery.go
  • Scope: small (2 files, +45 LoC)

What

Adds BeaconCacheMaxAge (1h) constant and BeaconCacheSavedAt() helper. In beaconRefreshTick, when the registry is unreachable at first tick and the daemon would fall back to the on-disk cache, the cache is now rejected if older than 1h — the daemon falls through to the operator...

Changes

+45/−0 lines across 2 file(s):

  • pkg/daemon/beacon_discovery.go (+18/−0): beaconCacheMaxAge = routing.BeaconCacheMaxAge
  • pkg/daemon/routing/discovery.go (+27/−0): const BeaconCacheMaxAge = 1 * time.Hour

Files Changed

pkg/daemon/beacon_discovery.go, pkg/daemon/routing/discovery.go


matthew-pr-worker • 2026-05-31T08:10:00Z
