Add Windows node text-to-speech #253

Merged
shanselman merged 11 commits into openclaw:master from RBrid:user/rbrid/tts-capability
May 1, 2026
RBrid commented Apr 30, 2026

Implements support for the tts.speak command tracked by #252. Adds the shared TTS capability, Windows and ElevenLabs playback paths, Settings UI/persistence, gateway/MCP advertisement, Command Center classification, docs, and tests.

Summary

Adds support for Windows node text-to-speech via a new tts.speak command.

This is the focused TTS slice tracked by #252, extracted from the broader voice work prototyped in #120. Credit to @NichUK for the original reference implementation and exploration in #120.

Closes #252.

What changed

  • Added shared TtsCapability with command tts.speak.
  • Added Windows speech synthesis playback as the default local provider.
  • Added optional ElevenLabs TTS provider support.
  • Added TTS Settings UI and persisted settings.
  • Protected stored ElevenLabs API keys with Windows DPAPI.
  • Registered TTS in node mode and local MCP only when enabled.
  • Added MCP command description for tts.speak.
  • Classified tts.speak as a privacy-sensitive/dangerous Command Center command.
  • Excluded tts.speak from Mac parity diagnostics until Mac implements it.
  • Added provider-wide 5,000-character request guard.
  • Added ElevenLabs HTTP timeout and provider-side validation.
  • Added interrupt semantics so interrupted playback does not report false success.
  • Updated docs for setup, allowlisting, and manual testing.
  • Added Shared and Tray tests for capability behavior, settings, guardrails, and command grouping.
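
As a rough illustration of the request guard and provider selection listed above, the following C# sketch validates a `tts.speak` request. Only the 5,000-character limit, the `text` and `provider` fields, and the `windows` default come from this PR description. The type names, the error strings, and the exact fallback order are assumptions made for the example and may differ from `TtsCapability`.

```csharp
using System;

// Hypothetical request shape; the real argument type in the PR is TtsSpeakArgs.
public sealed record SpeakRequest(string? Text, string? Provider);

public static class SpeakRequestValidator
{
    // Limit stated in the PR description ("provider-wide 5,000-character request guard").
    public const int MaxTextLength = 5000;

    // Returns null when the request is acceptable, otherwise an error message.
    public static string? Validate(SpeakRequest request)
    {
        if (string.IsNullOrWhiteSpace(request.Text))
            return "text is required";
        if (request.Text.Length > MaxTextLength)
            return $"text exceeds {MaxTextLength} characters";
        return null;
    }

    // Assumed order: per-request provider, then configured provider, then "windows".
    public static string ResolveProvider(string? requested, string? configured)
    {
        string? candidate = string.IsNullOrWhiteSpace(requested) ? configured : requested;
        return string.IsNullOrWhiteSpace(candidate)
            ? "windows"
            : candidate.Trim().ToLowerInvariant();
    }
}
```

Checking the length before any provider is selected keeps the guard provider-wide, so an oversized request is rejected without reaching either the local synthesizer or the ElevenLabs HTTP path.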

Manual validation

Validated Windows provider invocation through the gateway:

openclaw nodes invoke --node <windows-node-id> --command tts.speak --params '{"text":"hello from OpenClaw","provider":"windows"}'

Observed successful response:

{
  "ok": true,
  "command": "tts.speak",
  "payload": {
    "spoken": true,
    "provider": "windows",
    "contentType": "audio/wav",
    "durationMs": 2447
  }
}

Automated validation

Passed locally:

.\build.ps1
dotnet test .\tests\OpenClaw.Shared.Tests\OpenClaw.Shared.Tests.csproj --no-restore
dotnet test .\tests\OpenClaw.Tray.Tests\OpenClaw.Tray.Tests.csproj --no-restore

Results:

  • Shared tests: 1059 passed / 20 skipped
  • Tray tests: 252 passed

Implements phase 1 support for the tts.speak command tracked by openclaw#252. Adds the shared TTS capability, Windows and ElevenLabs playback paths, Settings UI/persistence, gateway/MCP advertisement, Command Center classification, docs, and tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
indierawk2k2 and others added 8 commits May 1, 2026 09:12
* Snap Reactor framework as OpenClawTray.Infrastructure

- Copy microsoft/microsoft-ui-reactor src/Reactor/ (249 C# files, 12 modules)
- Rename namespace Microsoft.UI.Reactor -> OpenClawTray.Infrastructure
- Create OpenClawTray.Infrastructure.csproj (net10.0, WinAppSDK 1.8)
- Add ProjectReference from OpenClaw.Tray.WinUI
- Add project to moltbot-windows-hub.slnx
- Fix C# 14 field keyword conflict in ValidationContext.cs
- Exclude ReactorApplication.xaml (library mode, host app owns Application)
- Update global.json rollForward to latestMajor
- Full solution builds clean (0 errors)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Implement OnboardingWindow host with Reactor pages

- OnboardingWindow.cs: WindowEx host with ReactorHostControl, Mica backdrop, 720x752
- OnboardingApp.cs: Root Reactor component with UseNavigation, step indicator, back/next
- OnboardingState.cs: Shared state with mode-dependent page order (matches macOS flow)
- WelcomePage.cs: Page 0 - welcome title + security notice card
- ConnectionPage.cs: Page 1 - local/remote/later gateway selection
- ReadyPage.cs: Page 9 - feature summary with emoji rows
- Placeholder stubs for Wizard/Permissions/Chat pages (Phase 3)
- Full solution builds clean via build.ps1

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Wire first-run detection and tray menu to OnboardingWindow

- First-run: ShowOnboardingAsync() replaces ShowSetupWizardAsync() in OnLaunched
- Tray menu: 'setup' action now opens OnboardingWindow instead of SetupWizardWindow
- OnboardingCompleted event mirrors existing SetupCompleted reconnection logic
- Old ShowSetupWizardAsync() preserved for backward compatibility
- Full build passes clean

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Remove SetupWizardWindow, redirect all call sites to OnboardingWindow

- Remove ShowSetupWizardAsync() and _setupWizard field
- Redirect deep link OpenSetup handler to ShowOnboardingAsync()
- SetupWizardWindow.cs retained but no longer wired from App.xaml.cs
- All build.ps1 targets pass clean

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Sprint 1: Enhanced pages + shared widgets (4 parallel tasks)

Welcome Page (op-dlw):
- Lobster icon, security warning card with ⚠️, trust model bullet points
- Two-card layout (orange warning + gray trust explanation)

Connection Page (op-24b):
- Local/Remote/Later radio choices with ●/○ indicators and emoji icons
- Conditional gateway URL + token fields for Local/Remote modes
- Local pre-fills ws://localhost:18789, Test Connection button
- Two-way binding to OnboardingState and SettingsManager

Ready Page (op-qrh):
- 🎉 celebration icon, mode-specific info card
- Feature action rows with icon + title + subtitle
- Launch at Login toggle
- Configure Later / Remote info cards

Shared Widgets (op-5xl):
- OnboardingCard: Rounded card with white background
- FeatureRow: Icon + title + subtitle row component
- StepIndicator: Dot-based navigation indicator
- GlowingIcon: 🦞 lobster icon (animation-ready)

All 4 tasks implemented in parallel. Full build passes clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* OnboardingApp nav + localization (27 keys × 5 locales)

OnboardingApp (op-fix):
- Integrated GlowingIcon header and StepIndicator widget
- Layout matches macOS: icon → page content → nav bar
- Phase 3 placeholder pages with clear labels

Localization (op-4jl):
- 27 onboarding keys added to all 5 locale .resw files
- en-us, fr-fr, nl-nl, zh-cn, zh-tw
- Covers: title, nav buttons, welcome, connection, ready pages

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Sprint 2+3: All pages + polish (6 parallel tasks)

Wizard Page (op-y0w):
- Native offline fallback: gateway URL, token, node mode toggle
- Test Connection button with status feedback
- TODO comments for future WebSocket RPC integration

Permissions Page (op-9mr):
- 5 Windows permissions: Notifications, Camera, Mic, Screen Capture, Location
- Status indicators (✅/⚪) with Open Settings buttons
- Status message area for feedback

Chat Page (op-e38):
- 'Meet your Agent' MVP chat UI
- Agent welcome bubble (blue) + user message bubbles (gray)
- Text input + Send button, footer note about full WebView2 integration

Mica + Theming (op-dl8):
- Non-resizable window via OverlappedPresenter
- Mica backdrop confirmed, window size matches spec

Page Transitions (op-xh9):
- Spring slide transition on NavigationHost (dampingRatio: 0.86)
- Matches macOS interactiveSpring(response: 0.5, dampingFraction: 0.86)

Accessibility (op-61d):
- To be enhanced in Sprint 4 integration pass

All pages wired into OnboardingApp. Full build passes clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* WizardStepView renderer + integration validation

WizardStepView (op-2oj):
- Dynamic renderer for all 7 gateway RPC step types
- Note, Text (with Sensitive/password), Confirm, Select, MultiSelect, Progress, Action
- WizardStepProps record + WizardStepType enum
- Switch expression renders type-appropriate UI with OnSubmit callback

Integration (op-28l):
- Solution file already includes OpenClawTray.Infrastructure (done in Sprint 0)
- build.ps1 builds WinUI with ProjectReference chain — no changes needed
- All 774 tests pass (652 Shared + 122 Tray, 0 failures)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add onboarding unit tests (13 new tests, 135 total Tray tests)

OnboardingStateTests:
- GetPageOrder: Local includes Wizard, Remote excludes it, Later is minimal
- GetPageOrder: NoChat mode excludes Chat for all modes
- GetPageOrder: Always starts with Welcome, ends with Ready
- Defaults: Mode=Local, ShowChat=true
- Complete: fires Finished event, calls Settings.Save()

All 774+ tests pass (652 Shared + 135 Tray, 0 failures).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add WizardStepProps and WizardStepType unit tests

Tests WizardStepType enum (7 values) and WizardStepProps record defaults.
All 145 Tray tests pass (0 failures).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add inner-loop dev scripts for testing onboarding UX

- dev-loop.ps1: Build + kill + launch cycle with -Clean (first-run) and -Tail (logs)
- test-sandbox.wsb: Windows Sandbox config with mapped build output for clean-state testing
- setup-sandbox-network.ps1: Port proxy setup for sandbox-to-WSL gateway connectivity

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix NullRef on first render, duplicate lobster, Border(null!) crash

- OnboardingWindow: use ctx.UseState(state) in mount function for props persistence
- WelcomePage: remove duplicate lobster icon (OnboardingApp header has the persistent one)
- StepIndicator: Border(TextBlock('')) instead of Border(null!) to avoid runtime NullRef

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix nav bar positioning + visual test framework + bug fixes

Nav bar fix:
- Fixed NavigationHost height to 520px so nav bar stays at consistent position
- All pages render within the same content area, nav bar never jumps
- Replaced Spring transition with 200ms Slide (prevents overlap on fast navigation)
- Compacted WelcomePage: merged security+trust cards, reduced font sizes
- Reduced GlowingIcon from 64px to 48px, tightened margins

Bug fixes:
- Fixed NullRef on first render (ctx.UseState for mount props persistence)
- Fixed duplicate lobster icon (removed from WelcomePage, kept in OnboardingApp header)
- Fixed Border(null!) crash in StepIndicator

Visual test framework:
- visual-test.ps1: P/Invoke window finding + UIAutomation button clicking
- Screenshot capture via PrintWindow/CopyFromScreen (note: GDI capture fails on Dev Box/Cloud PC)
- Baseline + after screenshots in visual-test-output/

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* SlideInOnly transition + RenderTargetBitmap visual capture

SlideInOnlyTransition (NavigationTransition.cs + TransitionEngine.cs):
- New transition type: instantly hides old page (opacity=0), slides+fades new in
- Direction auto-reverses on back nav (Push=right, Pop=left)
- 200ms duration with cubic-bezier easing
- Zero flicker — old page is invisible before new one starts animating

RenderTargetBitmap visual capture (OnboardingWindow.cs):
- In-app capture via WinUI RenderTargetBitmap API
- Works on Dev Box/Cloud PC (no physical display needed)
- Triggered by OPENCLAW_VISUAL_TEST=1 env var
- Auto-captures on initial load and every page navigation (PageChanged event)
- Saves PNGs to OPENCLAW_VISUAL_TEST_DIR
- All 6 pages validated via LLM visual analysis

OnboardingState.cs:
- Added PageChanged event for capture integration

All 145 tests pass. Full build clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix Permissions page button alignment: use Grid layout for right-aligned buttons

Changed PermissionRow from HStack to Grid with ['1*', 'Auto'] columns so
'Open Settings' buttons are consistently right-aligned and stacked vertically,
matching the pattern used in ConnectionPage.cs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix Permissions page: left-align status emojis in own column

Move status emojis (✅, ❌, ⚠️) from inline with permission name into
a dedicated Grid column 0 with Auto width. Changes the row Grid from
2 columns [1*, Auto] to 3 columns [Auto, 1*, Auto]:
- Column 0: Status emoji, fixed width, left-aligned
- Column 1: Permission icon + name + description, fills remaining
- Column 2: Open Settings button, right-aligned

This ensures all status emojis form a clean vertical line.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* wip: latest onboarding fixes pre-upstream-merge

Checkpoint of in-progress work before merging origin/master to pick up
GatewayTopologyClassifier, SshTunnelCommandLine, SshTunnelService, and
updated SettingsWindow connection logic.

Includes:
- Permissions page alignment fixes
- ConnectionPage gateway auth + pairing flow
- New onboarding services (GatewayHealthCheck, InputValidator,
  LocalGatewayApprover, PermissionChecker, SetupCodeDecoder,
  WizardStepParser)
- Tests for those services
- Localization keys across 5 locales
- Inner-loop dev scripts and e2e helpers
- Onboarding + auth-fix proposal docs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(onboarding): redesign Connection page to match new UX mockup

Implements the redesigned Connection page from connection-page-mockup.html:

- Five gateway modes (was three): Local / WSL / Remote / SSH Tunnel /
  Configure Later. WSL and SSH are added to ConnectionMode and reuse
  the Local page-order in OnboardingState.GetPageOrder().
- Setup Code row gains explicit Paste and QR-import buttons in addition
  to the existing focus-paste behavior. QR decoding is extracted from
  SetupWizardWindow into a reusable Helpers/QrSetupCodeReader so it can
  be invoked from Reactor pages without depending on the wizard window.
- Animated SSH panel renders inline when SSH mode is selected: 2x2 grid
  of SSH User / Host / Remote Port / Local Port plus a live preview line
  generated via SshTunnelCommandLine.BuildArguments(...). Settings are
  written through to SettingsManager.SshTunnel*. App gains a
  EnsureSshTunnelStarted() shim so TestConnection can spin up the
  managed tunnel before health-checking ws://127.0.0.1:<localPort>.
- Topology detection line renders the GatewayTopologyClassifier output
  (DisplayName/Transport/Detail) live as the user changes modes / SSH
  fields, matching the mockup's '● Detected: ...' line.
- Page content is wrapped in a ScrollView and the onboarding window is
  resized to 720x900 to fit the additional rows in the SSH layout.
- App exposes GetOnboardingWindowHandle() so the QR FileOpenPicker can
  initialize against the onboarding HWND.
- Two new optional environment variables aid visual testing without
  requiring UI automation:
    * OPENCLAW_ONBOARDING_START_ROUTE = <OnboardingRoute name>
    * OPENCLAW_ONBOARDING_START_MODE  = <ConnectionMode name>

Adds new locale keys for the SSH/WSL/QR/Topology surface in all five
locales (en-us authoritative; fr-fr, nl-nl, zh-cn, zh-tw machine-
translated and flagged for human review in the PR description).

Adds tests/OpenClaw.Tray.Tests/ConnectionPageTopologyTests.cs covering:
- 5-mode page-order parity (Wsl/Ssh behave like Local).
- GatewayTopologyClassifier outputs for the canonical mode→URL mapping.
- SshTunnelCommandLine preview includes both forwards (gateway +
  browser-proxy +2) and validates user/host.

Validation (per AGENTS.md):
- ./build.ps1: all projects succeed.
- dotnet test Shared:  967 passed / 20 skipped / 0 failed.
- dotnet test Tray:    350 passed / 0 failed (8 new).
- Visual capture in OPENCLAW_VISUAL_TEST mode for both Local and SSH
  modes; matches mockup layout.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* security: remove hardcoded WSL gateway dev token from e2e test

The fallback token was a dev-gateway secret that got flagged by GitHub secret scanning. Token now must come from WSL openclaw.json (preferred) or OPENCLAW_GATEWAY_TOKEN env var; the script fails fast if neither is available.

Note: the leaked token should be rotated by regenerating the dev gateway config in WSL.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(infra): prune unused Reactor modules (Charting/Data/Yoga/FlexPanel/DataGrid/PropertyGrid)

Per PR feedback: the tray only uses Core/Hosting/Navigation/Elements/Hooks/
Animation/Markdown/Accessibility/Input from the Reactor snap. Removed:
- Charting/ (D3 charts not used by onboarding)
- Data/ (datasource/grid binding not used)
- Yoga/ (FlexPanel not used; tray uses StackElement-based HStack/VStack)
- Controls/DataGrid (cascading: depends on Data+Charting)
- Controls/PropertyGrid (cascading: depends on Data)
- Pruned Yoga/FlexPanel hooks from Core/Element.cs, ElementPool.cs,
  Reconciler.Mount.cs, Reconciler.Update.cs, Elements/Dsl.cs, ElementExtensions.cs
- Pruned Charting hooks from Core/AccessibilityScanner.cs and Hosting/ReactorHost.cs
- Removed UseDataSource from Core/Component.cs
- Removed FieldDescriptor overload from Controls/Validation/FormField.cs
- Removed ResizeGripRegistration call sites (lived in DataGrid)

Build clean. Tray tests 350/350 pass. Shared tests 967/967 pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor(onboarding): replace Reactor snap with FunctionalUI helper

Replace the vendored Reactor-derived infrastructure project with a tiny OpenClaw-owned FunctionalUI helper layer used by onboarding.

Remove unused charting, data, markdown, devtools, validation, localization, input, animation, and broad control infrastructure from the PR.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: remove local workflow files from onboarding PR

Remove Beads and Gastown hook files so the tray onboarding PR only contains product UI changes and required app support.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: remove extraneous artifacts from onboarding PR

Remove local visual outputs, sandbox/provisioning scripts, e2e scratch automation, and upstream planning docs from the tray onboarding PR.

Keep the remaining changes focused on the product onboarding flow and supporting app code.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix tray onboarding runtime issues

Remove inconsistent gray onboarding panels, stabilize connection mode selection, fix FunctionalUI reparenting during conditional renders, and add runtime hooks needed for tray window capture and WebChat error rendering.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address onboarding pairing feedback

Remove the local gateway auto-approval shortcut and use the existing pairing command copy/notification flow instead. Also scope bootstrap operator handshakes to the gateway handoff profile, skip Chat for Configure Later, and dispose onboarding state safely.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Constrain bootstrap auth to onboarding setup codes

Keep the default gateway client auth payload and chat URL construction aligned with the existing tray app, while allowing onboarding setup-code handoff to opt into bootstrap auth scopes explicitly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Tighten gateway security follow-ups

Preserve MCP-only onboarding completion routing, remove the unused public connect auth token getter, and add regression coverage for default operator scopes and paired bootstrap handoff auth.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Mike Harsh <mharsh@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…#250)

* feat: winnode CLI for invoking node commands over local MCP

Mirrors `openclaw nodes invoke`'s flag surface but routes to the local
tray's MCP HTTP server (default http://127.0.0.1:8765/) instead of the
gateway. `--node` and `--idempotency-key` are accepted for paste-from-
gateway parity and ignored.

Ships skill.md alongside winnode.exe documenting every supported
command, argument schema, and the A2UI v0.8 JSONL grammar for agent use.

Tests: 62 cases, 100% line/branch on CliRunner via in-process unit tests
plus a loopback HttpListener fake that exercises the full HTTP path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): gate MCP readiness on token-bearing client

InitializeAsync would return ready as soon as `GET /` returned 200, even
if `mcp-token.txt` had not been read yet. Against a tray binary built
before the auth-before-dispatch hardening (where `GET /` answers 200
without auth), this raced ahead and handed back a tokenless `Client` —
every subsequent POST then 401'd. Restructure the loop to require both
the token-on-disk and a 200 from a token-bearing GET before declaring
ready.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(winnode): auto-load MCP bearer token

The CLI now sends `Authorization: Bearer <token>` on every MCP request,
without the user having to plumb the token themselves. Resolution chain
mirrors the per-tool secret convention (gh, az, anthropic):

  1. `--mcp-token <literal>` flag
  2. `OPENCLAW_MCP_TOKEN` env var (literal)
  3. `mcp-token.txt` under `$OPENCLAW_TRAY_DATA_DIR` if set, else
     `%APPDATA%\OpenClawTray\` — the same location SettingsManager
     points the tray at, so a sandboxed tray is found automatically.

When the token comes from disk, run `McpAuthToken.VerifyAcl` (the same
hygiene check `NodeService.StartMcpServer` runs at startup) and route
any owner/DACL warning to stderr so the user knows to rotate. `--verbose`
reports the resolved auth source without echoing the secret value.
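
The resolution chain described in this commit could be sketched roughly as follows. The environment variable names, the `mcp-token.txt` file name, and the `%APPDATA%\OpenClawTray\` fallback come from the commit message. The class name, the delegate-based signature, and the source labels are hypothetical, and the ACL check (`McpAuthToken.VerifyAcl`) is left out.

```csharp
using System;
using System.IO;

public static class McpTokenResolver
{
    // Hypothetical helper. Order: --mcp-token flag, OPENCLAW_MCP_TOKEN, then mcp-token.txt.
    // getEnv and readFile are injected so the chain can be exercised without touching
    // the real environment or disk; readFile returns null when the file is absent.
    public static (string? Token, string Source) Resolve(
        string? flagValue,
        Func<string, string?> getEnv,
        Func<string, string?> readFile)
    {
        if (!string.IsNullOrWhiteSpace(flagValue))
            return (flagValue.Trim(), "flag");

        string? fromEnv = getEnv("OPENCLAW_MCP_TOKEN");
        if (!string.IsNullOrWhiteSpace(fromEnv))
            return (fromEnv.Trim(), "env");

        // OPENCLAW_TRAY_DATA_DIR overrides the default %APPDATA%\OpenClawTray\ location.
        string? dataDir = getEnv("OPENCLAW_TRAY_DATA_DIR");
        if (string.IsNullOrWhiteSpace(dataDir))
        {
            string? appData = getEnv("APPDATA");
            if (string.IsNullOrWhiteSpace(appData))
                return (null, "none");
            dataDir = Path.Combine(appData, "OpenClawTray");
        }

        string? fromFile = readFile(Path.Combine(dataDir, "mcp-token.txt"));
        if (string.IsNullOrWhiteSpace(fromFile))
            return (null, "none");
        return (fromFile.Trim(), "file");
    }
}
```

A caller would typically pass `Environment.GetEnvironmentVariable` and a small lambda such as `p => File.Exists(p) ? File.ReadAllText(p) : null`. Returning the source label alongside the token is one way to support a `--verbose` report that names the auth source without echoing the secret.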

Tests redirect via `OPENCLAW_TRAY_DATA_DIR` to a temp sandbox dir so they
don't pick up the developer machine's real tray token.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(winnode): apply 19 review findings (F-01..F-21)

Hardens the winnode CLI against the threat model in
C:/temp/winnode-cli-review-2026-04-30/01-findings.md. F-15 (port-0 nit)
was approved as no-action; F-17 was a positive observation.

- F-01/F-09: validate --mcp-url; refuse auto-loaded token off-loopback
- F-02: explicit SocketsHttpHandler with AllowAutoRedirect=false
- F-03: cap response body at 16 MiB with explicit overflow message
- F-04: warn unconditionally when --mcp-token is used (process-listing leak)
- F-05: warn unconditionally when --idempotency-key is supplied
- F-06: TokenLooksValid ASCII-printable check; ignore corrupt tokens
- F-07: don't echo full token-file path in --verbose
- F-08: canonicalize OPENCLAW_TRAY_DATA_DIR; reject symlink redirect
- F-10: RunAsyncTests is now IDisposable (cleans up sandbox dir)
- F-11: SkillMdDriftTests + REGENERATE-ME header in skill.md;
        McpToolBridge.KnownCommands exposes the canonical command set;
        skill.md re-synced with live capability surface
- F-12: --params @<path> loads JSON object from disk
- F-13: Token_file_with_wide_acl_emits_warn (Windows-only, gracefully
        skips when SetAccessControl is denied by hardened CI)
- F-14: BuildToolsCallBody returns (byte[], int) consumed by
        ByteArrayContent without a string round-trip
- F-16+F-21: SanitizeForStderr strips control chars, redacts ≥32-char
        base64url runs, caps at 4 KiB, default-quiet first-line-only,
        full sanitized body under --verbose
- F-18: --invoke-timeout capped at 600000 ms; long arithmetic on the
        +5000 buffer; out-of-range exits 2
- F-19: --mcp-port and OPENCLAW_MCP_PORT bounded [1, 65535]; env-var
        out-of-range falls back to default with a verbose warning
- F-20: distinguish missing/empty/unreadable/loaded token-file states;
        unreadable exits 1 with a diagnostic before any HTTP traffic
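
To make the numeric bounds in F-18 and F-19 concrete, here is a minimal C# sketch. The 600000 ms cap, the +5000 ms buffer, and the [1, 65535] port range are taken from the findings above. The class and method names are hypothetical, and the lower bound of 1 ms for the timeout is an assumption.

```csharp
using System;
using System.Globalization;

public static class CliBounds
{
    public const long MaxInvokeTimeoutMs = 600_000; // cap stated in F-18
    public const long HttpBufferMs = 5_000;         // the "+5000 buffer" from F-18

    // Returns false for out-of-range or non-numeric input; per F-18 the CLI would exit 2.
    public static bool TryGetHttpTimeout(string? raw, out TimeSpan httpTimeout)
    {
        httpTimeout = TimeSpan.Zero;
        if (!long.TryParse(raw, NumberStyles.None, CultureInfo.InvariantCulture, out long ms))
            return false;
        if (ms < 1 || ms > MaxInvokeTimeoutMs) // lower bound of 1 is an assumption
            return false;
        // The addition is done in long arithmetic, so the buffer cannot overflow an int.
        httpTimeout = TimeSpan.FromMilliseconds((double)(ms + HttpBufferMs));
        return true;
    }

    // Accepts only digits, then enforces the [1, 65535] range from F-19.
    public static bool TryParsePort(string? raw, out int port)
    {
        if (int.TryParse(raw, NumberStyles.None, CultureInfo.InvariantCulture, out port)
            && port >= 1 && port <= 65535)
            return true;
        port = 0;
        return false;
    }
}
```

`NumberStyles.None` rejects signs, whitespace, and thousands separators, so inputs such as `-1` or ` 8765` fail parsing rather than being silently normalized.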

Tests: 23 added (115/115 pass). All other suites stay green
(Shared 1046/1066, Tray 245/245, Integration 18/18, UI 62/62).
WinNode CLI line coverage: 91.6% (434/474 in Program.cs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prevent tray onboarding tests from reading real user settings by allowing SettingsManager to use an explicit settings directory and using temp settings in onboarding tests. Document the isolation rule for future agents.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…icy (openclaw#247)

ValidateExecApprovalRules rejected single '*' but missed patterns like
'**', '***', '?', '? *', '* ?' that also match any command string.

An agent that can call system.execApprovals.set could bypass the
broad-allow restriction by submitting '**' as an allow pattern:
  {"rules": [{"pattern": "**", "action": "allow"}], "baseHash": "..."}

The glob-to-regex translation turns '**' into '^.*.*$', which matches
every command, exactly like '*' does.

Fix: strip all wildcard chars ('*', '?') and whitespace from the
normalised pattern before checking. If nothing remains the pattern is
an all-wildcard glob and is rejected as broad.  The explicit shell-
prefix checks (powershell *, pwsh *, cmd *, cmd.exe *) are preserved
for patterns that contain meaningful content but are still too broad.

Tests: add '**', '***', '?', '? *', '* ?' to ExecApprovalsSet_RejectsUnsafeAllowRules.
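
The fix described above can be sketched in C# as follows. This is an illustration under assumptions, not the code from `ValidateExecApprovalRules`: the class and method names are hypothetical, and `GlobToRegex` is a simplified stand-in for the project's glob-to-regex translation, included only to show why `**` matches everything.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BroadPatternCheck
{
    // True when nothing would remain after stripping '*', '?', and whitespace.
    // An empty pattern is treated as all-wildcard as well (an assumption).
    public static bool IsAllWildcard(string pattern)
    {
        if (pattern is null) throw new ArgumentNullException(nameof(pattern));
        return pattern.All(c => c == '*' || c == '?' || char.IsWhiteSpace(c));
    }

    // Simplified glob translation: '*' becomes ".*" and '?' becomes ".".
    public static string GlobToRegex(string glob)
    {
        return "^" + Regex.Escape(glob).Replace("\\*", ".*").Replace("\\?", ".") + "$";
    }
}
```

With this translation, `GlobToRegex("**")` produces `^.*.*$`, which matches any command string, and `IsAllWildcard("**")` flags it before it can be stored as an allow rule.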

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…provals.set (openclaw#255)

ValidateExecApprovalRules previously checked for dangerous fragments that end
with a trailing space (e.g. "rm ") but missed the case where the wildcard
character replaces the space — e.g. "rm*" passes the "rm " fragment check yet
matches "rm -rf /" via the ^rm.*$ regex, effectively bypassing the intended
block.

Fix: for each dangerous fragment that has trailing whitespace, also reject
patterns containing the trimmed stem followed directly by * or ?.

Before:
  { "pattern": "rm*", "action": "allow" }  → accepted, allows "rm -rf /"
  { "pattern": "del*", "action": "allow" } → accepted, allows "del /s /q C:\\"

After:
  { "pattern": "rm*", "action": "allow" }  → rejected ("Dangerous allow rule…")
  { "pattern": "del*", "action": "allow" } → rejected

Adds 7 InlineData regression tests covering: rm*, rm?, del*, del?,
remove-item*, shutdown*, net*.
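
A minimal C# sketch of the stem check described above. The fragment list is a hypothetical subset inferred from the regression test names; the real list and normalization live in `ValidateExecApprovalRules` and are not shown in this commit message.

```csharp
using System;

public static class DangerousStemCheck
{
    // Hypothetical fragment list, each ending in a space as described in the commit.
    private static readonly string[] DangerousFragments =
    {
        "rm ", "del ", "remove-item ", "shutdown ", "net "
    };

    public static bool IsDangerousAllow(string pattern)
    {
        string normalized = pattern.Trim().ToLowerInvariant();
        foreach (string fragment in DangerousFragments)
        {
            // Original check: the fragment including its trailing space.
            if (normalized.Contains(fragment, StringComparison.Ordinal))
                return true;

            // Added check: the trimmed stem followed directly by a wildcard.
            string stem = fragment.TrimEnd();
            if (stem.Length == fragment.Length)
                continue; // fragment had no trailing whitespace
            if (normalized.Contains(stem + "*", StringComparison.Ordinal) ||
                normalized.Contains(stem + "?", StringComparison.Ordinal))
                return true;
        }
        return false;
    }
}
```

Because this sketch uses substring matching, it is deliberately conservative and can over-reject: a pattern such as `confirm*` contains `rm*` and would be refused. Whether the real validator anchors the stem at a word boundary is not stated here.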

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ge coverage gaps (openclaw#245)

- SshTunnelCommandLine: 7 new tests covering CanForwardBrowserProxyPort
  boundary values and BuildArguments whitespace trimming
- ExecApprovalV2Result: test ToString() includes code and reason
- McpToolBridge: test custom serverName/serverVersion via constructor;
  test that null arguments value is accepted (not just missing arguments)

All tests pass (Shared + Tray).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nclaw#251)

* refactor(tray): remove unused BuildTrayMenuFlyout

Method was never called. Active tray menu is driven by
BuildTrayMenuPopup(TrayMenuWindow) via ShowTrayMenuPopup().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(tray): remove legacy unused BuildTrayMenu

Method was explicitly marked "for reference" in a comment but never
called. BuildTrayMenuPopup(TrayMenuWindow) is the active implementation.
Removes the comment and the entire method body.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: AlexAlves87 <alexalves87@github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…penclaw#243)

- Remove System.Linq import; replace FirstOrDefault with foreach loop
  in HandleToolsCallAsync — avoids delegate allocation on every tool call
- Replace ms.ToArray() with ms.GetBuffer() + slice in WriteResult and
  WriteError — avoids copying the byte array before UTF-8 decoding
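
The `GetBuffer()` change can be illustrated with a small C# sketch. The helper name and the delegate parameter are hypothetical; only the `ToArray()` versus `GetBuffer()` plus slice distinction comes from the commit message.

```csharp
using System;
using System.IO;
using System.Text;

public static class BufferDecoding
{
    // Runs the supplied writer against a MemoryStream and decodes the written bytes.
    public static string WriteAndDecode(Action<Stream> write)
    {
        using var ms = new MemoryStream();
        write(ms);

        // ms.ToArray() would allocate a copy sized to ms.Length.
        // GetBuffer() exposes the underlying array, which may be longer than the data,
        // so the decode must be limited to the first ms.Length bytes.
        return Encoding.UTF8.GetString(ms.GetBuffer(), 0, (int)ms.Length);
    }
}
```

The slice bound matters: decoding the whole buffer would append the unused trailing capacity (zero bytes) to the result. `GetBuffer()` is available here because the stream was created with the parameterless constructor, which leaves the buffer publicly visible.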

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Adds a new Windows node tts.speak command (tracked by #252) by introducing a shared TTS capability plus Windows/ElevenLabs playback implementations, settings/UI, MCP advertisement text, command-center grouping, docs updates, and targeted tests.

Changes:

  • Introduces TtsCapability (tts.speak) in Shared and wires it into the Windows tray node when enabled.
  • Adds Windows built-in speech synthesis + optional ElevenLabs client, plus persisted settings (with DPAPI protection for the ElevenLabs API key).
  • Updates MCP tool descriptions, Command Center command-group classification, docs/README, and adds Shared/Tray tests.
Show a summary per file
File Description
tests/OpenClaw.Tray.Tests/TrayMenuWindowMarkupTests.cs Verifies new Settings UI elements exist (automation IDs for TTS controls).
tests/OpenClaw.Tray.Tests/SettingsRoundTripTests.cs Adds settings round-trip/defaulting assertions for new TTS settings + DPAPI protect/unprotect tests.
tests/OpenClaw.Tray.Tests/OpenClaw.Tray.Tests.csproj Adds DPAPI package + links SettingsManager and ElevenLabs client into the test project.
tests/OpenClaw.Tray.Tests/ElevenLabsTextToSpeechClientTests.cs Adds unit tests for ElevenLabs request construction, validation, error formatting, and timeout.
tests/OpenClaw.Shared.Tests/ModelsTests.cs Validates tts.speak is classified as dangerous and excluded from Mac parity set.
tests/OpenClaw.Shared.Tests/McpToolBridgeTests.cs Ensures MCP tools/list returns a curated description for tts.speak.
tests/OpenClaw.Shared.Tests/CapabilityTests.cs Adds TtsCapability execution/validation tests (required args, length guard, handler behavior).
src/OpenClaw.Tray.WinUI/Windows/SettingsWindow.xaml.cs Loads/saves new TTS settings and toggles ElevenLabs settings UI visibility based on provider.
src/OpenClaw.Tray.WinUI/Windows/SettingsWindow.xaml Adds TTS toggle, provider selection, and ElevenLabs API key/voice/model inputs.
src/OpenClaw.Tray.WinUI/Services/TextToSpeech/TextToSpeechService.cs Implements Windows/ElevenLabs playback and interrupt semantics via MediaPlayer gating.
src/OpenClaw.Tray.WinUI/Services/TextToSpeech/ElevenLabsTextToSpeechClient.cs Adds ElevenLabs HTTP client with validation, timeout, and error message construction.
src/OpenClaw.Tray.WinUI/Services/SettingsManager.cs Persists TTS settings; protects ElevenLabs API key at rest using DPAPI (dpapi: prefix).
src/OpenClaw.Tray.WinUI/Services/NodeService.cs Registers TTS capability/service when enabled; disables tts.* commands when not enabled.
src/OpenClaw.Tray.WinUI/OpenClaw.Tray.WinUI.csproj Adds DPAPI package dependency required for protected settings secrets.
src/OpenClaw.Shared/SettingsData.cs Adds shared DTO fields for TTS settings persistence.
src/OpenClaw.Shared/Models.cs Adds tts.speak to dangerous command grouping and adjusts Mac parity command list.
src/OpenClaw.Shared/Mcp/McpToolBridge.cs Adds curated MCP description for tts.speak tool.
src/OpenClaw.Shared/Capabilities/TtsCapability.cs Adds new node capability handling tts.speak with validation, guardrails, and event hook.
docs/gateway-node-integration.md Documents adding tts.speak to the gateway allowlist.
docs/WINDOWS_NODE_TESTING.md Documents tts.speak requirements and capability advertisement behavior.
docs/MCP_MODE.md Updates MCP tool list summary to include tts.speak.
README.md Updates node capability table, allowlist guidance, and adds a manual tts.speak invocation example.
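
The capability tests listed above cover required arguments and the length guard. As a rough illustration of that kind of check, here is a minimal sketch. It is not the code from this PR: `TtsRequestGuard`, `MaxTextLength`, and `Validate` are hypothetical names, and only the 5,000-character limit and the `tts.speak` command name come from the PR description.

```csharp
#nullable enable
using System;

// Hypothetical helper for illustration; the real TtsCapability may be structured differently.
public static class TtsRequestGuard
{
    // The PR description mentions a provider-wide 5,000-character request guard.
    public const int MaxTextLength = 5000;

    // Returns null when the text is acceptable, otherwise a human-readable error message.
    public static string? Validate(string? text)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return "tts.speak requires a non-empty 'text' argument.";
        }

        if (text.Length > MaxTextLength)
        {
            return $"Text exceeds the {MaxTextLength}-character limit.";
        }

        return null;
    }
}
```

Running the guard before any provider is selected would keep the limit provider-wide, so neither the Windows path nor the ElevenLabs path sees an oversized request.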

Copilot's findings

Comments suppressed due to low confidence (2)

src/OpenClaw.Tray.WinUI/Services/TextToSpeech/TextToSpeechService.cs:207

  • TextToSpeechService implements IDisposable but doesn’t dispose owned disposable resources like _playbackGate (and it also doesn’t attempt to stop/dispose an _activePlayer directly during shutdown). Please dispose the semaphore (and ensure any active MediaPlayer is stopped/disposed deterministically) to avoid leaking WinRT/media resources across node restarts/shutdown.
    public void Dispose()
    {
        InterruptActivePlayback();
        // Playback may still release the gate after an interrupt during shutdown.
        _elevenLabsClient.Dispose();
    }
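
One possible shape for the deterministic cleanup this comment asks for is sketched below. It is an assumption-laden illustration, not the PR's code: `IPlayer` and `FakePlayer` are hypothetical stand-ins for the WinRT `MediaPlayer` so the sketch compiles anywhere, `SetActivePlayer` is invented for the example, and only the field names `_playbackGate` and `_activePlayer` come from the review comment.

```csharp
#nullable enable
using System;
using System.Threading;

// Hypothetical stand-in for the WinRT MediaPlayer, which exposes Pause() and is disposable.
public interface IPlayer : IDisposable
{
    void Pause();
}

// Test double that records what happened to it.
public sealed class FakePlayer : IPlayer
{
    public bool Paused { get; private set; }
    public bool Disposed { get; private set; }
    public void Pause() => Paused = true;
    public void Dispose() => Disposed = true;
}

public sealed class TextToSpeechServiceSketch : IDisposable
{
    private readonly SemaphoreSlim _playbackGate = new(1, 1);
    private IPlayer? _activePlayer;
    private int _disposed; // 0 = live, 1 = disposed

    // Hypothetical hook: swaps in a new player and disposes any previous one.
    public void SetActivePlayer(IPlayer player) =>
        Interlocked.Exchange(ref _activePlayer, player)?.Dispose();

    public void Dispose()
    {
        // Make Dispose idempotent so a double call during shutdown is harmless.
        if (Interlocked.Exchange(ref _disposed, 1) != 0)
        {
            return;
        }

        // Detach the player first so no other thread can pick it up, then stop and release it.
        var player = Interlocked.Exchange(ref _activePlayer, null);
        if (player is not null)
        {
            player.Pause();
            player.Dispose();
        }

        // Caveat: calling Release() on a disposed SemaphoreSlim throws ObjectDisposedException,
        // so in-flight playback must tolerate that or finish before this line runs.
        _playbackGate.Dispose();
    }
}
```

The caveat in the last comment matches the concern already noted in the original `Dispose` body: playback may still release the gate after an interrupt, so disposing the semaphore is only safe once that race is handled.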

src/OpenClaw.Shared/Models.cs:1057

  • MacNodeParityCommands used to include .. DangerousCommands, but now it hard-codes a subset to exclude tts.speak. This increases the chance that future dangerous commands won’t be covered by parity diagnostics unless someone remembers to update this list. Consider constructing the Mac parity list from DangerousCommands with an explicit exclusion for tts.speak (or adding a clear comment explaining why dangerous commands must be enumerated manually).
    public static readonly string[] MacNodeParityCommands =
    [
        .. SafeCompanionCommands,
        "camera.snap",
        "camera.clip",
        "screen.record",
        "system.notify",
        "system.run",
        "system.which",
        "browser.proxy"
    ];
  • Files reviewed: 22/22 changed files
  • Comments generated: 1
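
For the `MacNodeParityCommands` comment above, deriving the list from `DangerousCommands` with an explicit exclusion might look like the following minimal sketch. The array contents are illustrative assumptions (the real lists live in `src/OpenClaw.Shared/Models.cs`), and `CommandGroupsSketch`, `MacParityExclusions`, and the `placeholder.safe` entry are hypothetical.

```csharp
using System;
using System.Linq;

public static class CommandGroupsSketch
{
    // Placeholder entry; the real SafeCompanionCommands contents are not shown in this thread.
    public static readonly string[] SafeCompanionCommands = ["placeholder.safe"];

    // Assumed to be the hard-coded parity entries from the PR plus tts.speak.
    public static readonly string[] DangerousCommands =
    [
        "camera.snap",
        "camera.clip",
        "screen.record",
        "system.notify",
        "system.run",
        "system.which",
        "browser.proxy",
        "tts.speak"
    ];

    // Commands the Mac node does not implement yet; shrink this list as parity improves.
    public static readonly string[] MacParityExclusions = ["tts.speak"];

    // Declared last because static field initializers run in textual order.
    public static readonly string[] MacNodeParityCommands =
    [
        .. SafeCompanionCommands,
        .. DangerousCommands.Where(command => !MacParityExclusions.Contains(command))
    ];
}
```

With this shape, a newly added dangerous command is covered by parity diagnostics by default, and leaving it out requires a visible entry in the exclusion list.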

Comment on lines +40 to +66
    public async Task<TtsSpeakResult> SpeakAsync(TtsSpeakArgs args, CancellationToken cancellationToken = default)
    {
        var provider = TtsCapability.ResolveProvider(args.Provider, _settings.TtsProvider);
        var stopwatch = Stopwatch.StartNew();

        if (string.Equals(provider, TtsCapability.WindowsProvider, StringComparison.OrdinalIgnoreCase))
        {
            await SpeakWithWindowsAsync(args, cancellationToken).ConfigureAwait(false);
        }
        else if (string.Equals(provider, TtsCapability.ElevenLabsProvider, StringComparison.OrdinalIgnoreCase))
        {
            await SpeakWithElevenLabsAsync(args, cancellationToken).ConfigureAwait(false);
        }
        else
        {
            throw new InvalidOperationException($"Unsupported TTS provider '{provider}'.");
        }

        stopwatch.Stop();
        return new TtsSpeakResult
        {
            Provider = provider,
            ContentType = string.Equals(provider, TtsCapability.ElevenLabsProvider, StringComparison.OrdinalIgnoreCase)
                ? "audio/mpeg"
                : "audio/wav",
            DurationMs = (int)Math.Min(stopwatch.ElapsedMilliseconds, int.MaxValue)
        };

Copilot AI May 1, 2026

durationMs is currently derived from a wall-clock stopwatch around synthesis + playback (and for ElevenLabs also includes network latency). This makes the response field misleading if callers expect audio duration. Consider either computing playback duration from MediaPlayer.PlaybackSession.NaturalDuration/position, or renaming the field (and docs) to something like elapsedMs to match its semantics.

This issue also appears on line 202 of the same file.
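
To make the distinction in this comment concrete, a result could report wall-clock time and audio length as separate fields. The sketch below is an assumption, not the PR's `TtsSpeakResult`: `TtsTimingSketch`, `TtsTiming`, `ElapsedMs`, and `AudioDurationMs` are hypothetical names, and the caller is assumed to pass in a natural duration such as the one exposed by `MediaPlayer.PlaybackSession.NaturalDuration` when it is available.

```csharp
#nullable enable
using System;

// Hypothetical result shape: elapsed time is always known, audio duration only sometimes.
public sealed record TtsTimingSketch(int ElapsedMs, int? AudioDurationMs);

public static class TtsTiming
{
    // Converts a TimeSpan to whole milliseconds, clamped to the int range.
    public static int ToClampedMs(TimeSpan value) =>
        (int)Math.Clamp(value.TotalMilliseconds, 0, int.MaxValue);

    // elapsed: wall-clock time around synthesis, network, and playback.
    // naturalDuration: length of the audio itself, or null/zero when unknown.
    public static TtsTimingSketch Create(TimeSpan elapsed, TimeSpan? naturalDuration)
    {
        int? audioMs = null;
        if (naturalDuration is { } duration && duration > TimeSpan.Zero)
        {
            audioMs = ToClampedMs(duration);
        }

        return new TtsTimingSketch(ToClampedMs(elapsed), audioMs);
    }
}
```

Keeping both values lets callers that care about latency and callers that care about audio length read the field they actually need, at the cost of a response-shape change that the docs would have to reflect.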

github-actions Bot and others added 2 commits May 1, 2026 10:39
…w#238)

Three McpHttpServerTests were failing on Linux:

1. Post_WithLocalhostHost_Accepted — HttpListener on Linux rejects
   requests with Host: localhost when only http://127.0.0.1:port/ is
   registered as a prefix (404 before reaching application code). Fix:
   also register http://localhost:port/ so clients connecting via the
   hostname form are served.

2. Post_WithRebindHost_RejectedWithForbidden — With the dual-prefix
   registration, Host: evil.com still doesn't match, but Linux returns
   404 (HttpListener filter) rather than 403 (application code). Both
   are valid rejections; relax assertion to NotEqual(OK).

3. Post_OversizedBody_RejectedWithRequestTooLarge — When the server
   sends 413 and closes the connection before the client finishes
   uploading 5 MiB, Linux surfaces a broken-pipe SocketException rather
   than letting the client see the response status. Catch the
   SocketException path as an equivalent rejection outcome.

All 15 McpHttpServer tests now pass on Linux (967 pass, 20 skip).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
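
The dual-prefix registration described in item 1 of the commit message above can be sketched as follows. This is a minimal illustration under assumptions, not the actual `McpHttpServer` code: `LoopbackListenerSketch` and `Create` are hypothetical names, and the listener is only configured here, not started.

```csharp
using System;
using System.Net;

public static class LoopbackListenerSketch
{
    // Registers both loopback forms so requests with "Host: 127.0.0.1" and
    // "Host: localhost" are both matched by the listener's prefix filter.
    public static HttpListener Create(int port)
    {
        var listener = new HttpListener();
        listener.Prefixes.Add($"http://127.0.0.1:{port}/");
        listener.Prefixes.Add($"http://localhost:{port}/");
        return listener;
    }
}
```

A request with any other `Host` value still matches neither prefix, which is consistent with item 2: such a request may be rejected by the listener itself rather than by application code, so a test can only rely on the response not being OK.
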
Resolve SettingsManager test-isolation conflict and remove the duplicate Tray test compile item after rebasing openclaw#253.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@shanselman
Contributor

Thanks, @RBrid. This focused TTS slice is a good way to land the first stable piece of Windows voice support.

This overlaps with the broader Voice Mode direction in #120 from @NichUK, so I’d like to treat this as the foundation layer: land tts.speak first, then have the remaining STT / Talk Mode / repeater / UX work build on top in smaller follow-up PRs.

I pushed a maintainer rebase commit that keeps this branch current with master, preserves the existing SettingsManager test-isolation hook, and removes the duplicate Tray test compile item. Local validation is green:

  • ./build.ps1
  • dotnet test ./tests/OpenClaw.Shared.Tests/OpenClaw.Shared.Tests.csproj --no-restore
  • dotnet test ./tests/OpenClaw.Tray.Tests/OpenClaw.Tray.Tests.csproj --no-restore

@shanselman shanselman merged commit e0c4098 into openclaw:master May 1, 2026

Development

Successfully merging this pull request may close these issues.

feat: Add Windows node text-to-speech command (tts.speak)

6 participants