Add Windows node text-to-speech #253
Implements phase 1 support for the tts.speak command tracked by openclaw#252. Adds the shared TTS capability, Windows and ElevenLabs playback paths, Settings UI/persistence, gateway/MCP advertisement, Command Center classification, docs, and tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Snap Reactor framework as OpenClawTray.Infrastructure - Copy microsoft/microsoft-ui-reactor src/Reactor/ (249 C# files, 12 modules) - Rename namespace Microsoft.UI.Reactor -> OpenClawTray.Infrastructure - Create OpenClawTray.Infrastructure.csproj (net10.0, WinAppSDK 1.8) - Add ProjectReference from OpenClaw.Tray.WinUI - Add project to moltbot-windows-hub.slnx - Fix C# 14 field keyword conflict in ValidationContext.cs - Exclude ReactorApplication.xaml (library mode, host app owns Application) - Update global.json rollForward to latestMajor - Full solution builds clean (0 errors) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Implement OnboardingWindow host with Reactor pages - OnboardingWindow.cs: WindowEx host with ReactorHostControl, Mica backdrop, 720x752 - OnboardingApp.cs: Root Reactor component with UseNavigation, step indicator, back/next - OnboardingState.cs: Shared state with mode-dependent page order (matches macOS flow) - WelcomePage.cs: Page 0 - welcome title + security notice card - ConnectionPage.cs: Page 1 - local/remote/later gateway selection - ReadyPage.cs: Page 9 - feature summary with emoji rows - Placeholder stubs for Wizard/Permissions/Chat pages (Phase 3) - Full solution builds clean via build.ps1 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Wire first-run detection and tray menu to OnboardingWindow - First-run: ShowOnboardingAsync() replaces ShowSetupWizardAsync() in OnLaunched - Tray menu: 'setup' action now opens OnboardingWindow instead of SetupWizardWindow - OnboardingCompleted event mirrors existing SetupCompleted reconnection logic - Old ShowSetupWizardAsync() preserved for backward compatibility - Full build passes clean Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Remove SetupWizardWindow, redirect all call sites to OnboardingWindow - Remove ShowSetupWizardAsync() and _setupWizard field - Redirect deep link OpenSetup handler to ShowOnboardingAsync() - 
SetupWizardWindow.cs retained but no longer wired from App.xaml.cs - All build.ps1 targets pass clean Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Sprint 1: Enhanced pages + shared widgets (4 parallel tasks) Welcome Page (op-dlw): - Lobster icon, security warning card with⚠️ , trust model bullet points - Two-card layout (orange warning + gray trust explanation) Connection Page (op-24b): - Local/Remote/Later radio choices with ●/○ indicators and emoji icons - Conditional gateway URL + token fields for Local/Remote modes - Local pre-fills ws://localhost:18789, Test Connection button - Two-way binding to OnboardingState and SettingsManager Ready Page (op-qrh): - 🎉 celebration icon, mode-specific info card - Feature action rows with icon + title + subtitle - Launch at Login toggle - Configure Later / Remote info cards Shared Widgets (op-5xl): - OnboardingCard: Rounded card with white background - FeatureRow: Icon + title + subtitle row component - StepIndicator: Dot-based navigation indicator - GlowingIcon: 🦞 lobster icon (animation-ready) All 4 tasks implemented in parallel. Full build passes clean. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * OnboardingApp nav + localization (27 keys × 5 locales) OnboardingApp (op-fix): - Integrated GlowingIcon header and StepIndicator widget - Layout matches macOS: icon → page content → nav bar - Phase 3 placeholder pages with clear labels Localization (op-4jl): - 27 onboarding keys added to all 5 locale .resw files - en-us, fr-fr, nl-nl, zh-cn, zh-tw - Covers: title, nav buttons, welcome, connection, ready pages Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Sprint 2+3: All pages + polish (6 parallel tasks) Wizard Page (op-y0w): - Native offline fallback: gateway URL, token, node mode toggle - Test Connection button with status feedback - TODO comments for future WebSocket RPC integration Permissions Page (op-9mr): - 5 Windows permissions: Notifications, Camera, Mic, Screen Capture, Location - Status indicators (✅/⚪) with Open Settings buttons - Status message area for feedback Chat Page (op-e38): - 'Meet your Agent' MVP chat UI - Agent welcome bubble (blue) + user message bubbles (gray) - Text input + Send button, footer note about full WebView2 integration Mica + Theming (op-dl8): - Non-resizable window via OverlappedPresenter - Mica backdrop confirmed, window size matches spec Page Transitions (op-xh9): - Spring slide transition on NavigationHost (dampingRatio: 0.86) - Matches macOS interactiveSpring(response: 0.5, dampingFraction: 0.86) Accessibility (op-61d): - To be enhanced in Sprint 4 integration pass All pages wired into OnboardingApp. Full build passes clean. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * WizardStepView renderer + integration validation WizardStepView (op-2oj): - Dynamic renderer for all 7 gateway RPC step types - Note, Text (with Sensitive/password), Confirm, Select, MultiSelect, Progress, Action - WizardStepProps record + WizardStepType enum - Switch expression renders type-appropriate UI with OnSubmit callback Integration (op-28l): - Solution file already includes OpenClawTray.Infrastructure (done in Sprint 0) - build.ps1 builds WinUI with ProjectReference chain — no changes needed - All 774 tests pass (652 Shared + 122 Tray, 0 failures) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add onboarding unit tests (13 new tests, 135 total Tray tests) OnboardingStateTests: - GetPageOrder: Local includes Wizard, Remote excludes it, Later is minimal - GetPageOrder: NoChat mode excludes Chat for all modes - GetPageOrder: Always starts with Welcome, ends with Ready - Defaults: Mode=Local, ShowChat=true - Complete: fires Finished event, calls Settings.Save() All 774+ tests pass (652 Shared + 135 Tray, 0 failures). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add WizardStepProps and WizardStepType unit tests Tests WizardStepType enum (7 values) and WizardStepProps record defaults. All 145 Tray tests pass (0 failures). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add inner-loop dev scripts for testing onboarding UX - dev-loop.ps1: Build + kill + launch cycle with -Clean (first-run) and -Tail (logs) - test-sandbox.wsb: Windows Sandbox config with mapped build output for clean-state testing - setup-sandbox-network.ps1: Port proxy setup for sandbox-to-WSL gateway connectivity Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix NullRef on first render, duplicate lobster, Border(null!) 
crash - OnboardingWindow: use ctx.UseState(state) in mount function for props persistence - WelcomePage: remove duplicate lobster icon (OnboardingApp header has the persistent one) - StepIndicator: Border(TextBlock('')) instead of Border(null!) to avoid runtime NullRef Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix nav bar positioning + visual test framework + bug fixes Nav bar fix: - Fixed NavigationHost height to 520px so nav bar stays at consistent position - All pages render within the same content area, nav bar never jumps - Replaced Spring transition with 200ms Slide (prevents overlap on fast navigation) - Compacted WelcomePage: merged security+trust cards, reduced font sizes - Reduced GlowingIcon from 64px to 48px, tightened margins Bug fixes: - Fixed NullRef on first render (ctx.UseState for mount props persistence) - Fixed duplicate lobster icon (removed from WelcomePage, kept in OnboardingApp header) - Fixed Border(null!) crash in StepIndicator Visual test framework: - visual-test.ps1: P/Invoke window finding + UIAutomation button clicking - Screenshot capture via PrintWindow/CopyFromScreen (note: GDI capture fails on Dev Box/Cloud PC) - Baseline + after screenshots in visual-test-output/ Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * SlideInOnly transition + RenderTargetBitmap visual capture SlideInOnlyTransition (NavigationTransition.cs + TransitionEngine.cs): - New transition type: instantly hides old page (opacity=0), slides+fades new in - Direction auto-reverses on back nav (Push=right, Pop=left) - 200ms duration with cubic-bezier easing - Zero flicker — old page is invisible before new one starts animating RenderTargetBitmap visual capture (OnboardingWindow.cs): - In-app capture via WinUI RenderTargetBitmap API - Works on Dev Box/Cloud PC (no physical display needed) - Triggered by OPENCLAW_VISUAL_TEST=1 env var - Auto-captures on initial load and every page navigation (PageChanged event) - Saves 
PNGs to OPENCLAW_VISUAL_TEST_DIR - All 6 pages validated via LLM visual analysis OnboardingState.cs: - Added PageChanged event for capture integration All 145 tests pass. Full build clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix Permissions page button alignment: use Grid layout for right-aligned buttons Changed PermissionRow from HStack to Grid with ['1*', 'Auto'] columns so 'Open Settings' buttons are consistently right-aligned and stacked vertically, matching the pattern used in ConnectionPage.cs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix Permissions page: left-align status emojis in own column Move status emojis (✅, ❌,⚠️ ) from inline with permission name into a dedicated Grid column 0 with Auto width. Changes the row Grid from 2 columns [1*, Auto] to 3 columns [Auto, 1*, Auto]: - Column 0: Status emoji, fixed width, left-aligned - Column 1: Permission icon + name + description, fills remaining - Column 2: Open Settings button, right-aligned This ensures all status emojis form a clean vertical line. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * wip: latest onboarding fixes pre-upstream-merge Checkpoint of in-progress work before merging origin/master to pick up GatewayTopologyClassifier, SshTunnelCommandLine, SshTunnelService, and updated SettingsWindow connection logic. 
Includes: - Permissions page alignment fixes - ConnectionPage gateway auth + pairing flow - New onboarding services (GatewayHealthCheck, InputValidator, LocalGatewayApprover, PermissionChecker, SetupCodeDecoder, WizardStepParser) - Tests for those services - Localization keys across 5 locales - Inner-loop dev scripts and e2e helpers - Onboarding + auth-fix proposal docs Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * feat(onboarding): redesign Connection page to match new UX mockup Implements the redesigned Connection page from connection-page-mockup.html: - Five gateway modes (was three): Local / WSL / Remote / SSH Tunnel / Configure Later. WSL and SSH are added to ConnectionMode and reuse the Local page-order in OnboardingState.GetPageOrder(). - Setup Code row gains explicit Paste and QR-import buttons in addition to the existing focus-paste behavior. QR decoding is extracted from SetupWizardWindow into a reusable Helpers/QrSetupCodeReader so it can be invoked from Reactor pages without depending on the wizard window. - Animated SSH panel renders inline when SSH mode is selected: 2x2 grid of SSH User / Host / Remote Port / Local Port plus a live preview line generated via SshTunnelCommandLine.BuildArguments(...). Settings are written through to SettingsManager.SshTunnel*. App gains a EnsureSshTunnelStarted() shim so TestConnection can spin up the managed tunnel before health-checking ws://127.0.0.1:<localPort>. - Topology detection line renders the GatewayTopologyClassifier output (DisplayName/Transport/Detail) live as the user changes modes / SSH fields, matching the mockup's '● Detected: ...' line. - Page content is wrapped in a ScrollView and the onboarding window is resized to 720x900 to fit the additional rows in the SSH layout. - App exposes GetOnboardingWindowHandle() so the QR FileOpenPicker can initialize against the onboarding HWND. 
- Two new optional environment variables aid visual testing without requiring UI automation: * OPENCLAW_ONBOARDING_START_ROUTE = <OnboardingRoute name> * OPENCLAW_ONBOARDING_START_MODE = <ConnectionMode name> Adds new locale keys for the SSH/WSL/QR/Topology surface in all five locales (en-us authoritative; fr-fr, nl-nl, zh-cn, zh-tw machine- translated and flagged for human review in the PR description). Adds tests/OpenClaw.Tray.Tests/ConnectionPageTopologyTests.cs covering: - 5-mode page-order parity (Wsl/Ssh behave like Local). - GatewayTopologyClassifier outputs for the canonical mode→URL mapping. - SshTunnelCommandLine preview includes both forwards (gateway + browser-proxy +2) and validates user/host. Validation (per AGENTS.md): - ./build.ps1: all projects succeed. - dotnet test Shared: 967 passed / 20 skipped / 0 failed. - dotnet test Tray: 350 passed / 0 failed (8 new). - Visual capture in OPENCLAW_VISUAL_TEST mode for both Local and SSH modes; matches mockup layout. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * security: remove hardcoded WSL gateway dev token from e2e test The fallback token was a dev-gateway secret that got flagged by GitHub secret scanning. Token now must come from WSL openclaw.json (preferred) or OPENCLAW_GATEWAY_TOKEN env var; the script fails fast if neither is available. Note: the leaked token should be rotated by regenerating the dev gateway config in WSL. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * chore(infra): prune unused Reactor modules (Charting/Data/Yoga/FlexPanel/DataGrid/PropertyGrid) Per PR feedback: the tray only uses Core/Hosting/Navigation/Elements/Hooks/ Animation/Markdown/Accessibility/Input from the Reactor snap. 
Removed: - Charting/ (D3 charts not used by onboarding) - Data/ (datasource/grid binding not used) - Yoga/ (FlexPanel not used; tray uses StackElement-based HStack/VStack) - Controls/DataGrid (cascading: depends on Data+Charting) - Controls/PropertyGrid (cascading: depends on Data) - Pruned Yoga/FlexPanel hooks from Core/Element.cs, ElementPool.cs, Reconciler.Mount.cs, Reconciler.Update.cs, Elements/Dsl.cs, ElementExtensions.cs - Pruned Charting hooks from Core/AccessibilityScanner.cs and Hosting/ReactorHost.cs - Removed UseDataSource from Core/Component.cs - Removed FieldDescriptor overload from Controls/Validation/FormField.cs - Removed ResizeGripRegistration call sites (lived in DataGrid) Build clean. Tray tests 350/350 pass. Shared tests 967/967 pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * refactor(onboarding): replace Reactor snap with FunctionalUI helper Replace the vendored Reactor-derived infrastructure project with a tiny OpenClaw-owned FunctionalUI helper layer used by onboarding. Remove unused charting, data, markdown, devtools, validation, localization, input, animation, and broad control infrastructure from the PR. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * chore: remove local workflow files from onboarding PR Remove Beads and Gastown hook files so the tray onboarding PR only contains product UI changes and required app support. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * chore: remove extraneous artifacts from onboarding PR Remove local visual outputs, sandbox/provisioning scripts, e2e scratch automation, and upstream planning docs from the tray onboarding PR. Keep the remaining changes focused on the product onboarding flow and supporting app code. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix tray onboarding runtime issues Remove inconsistent gray onboarding panels, stabilize connection mode selection, fix FunctionalUI reparenting during conditional renders, and add runtime hooks needed for tray window capture and WebChat error rendering. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address onboarding pairing feedback Remove the local gateway auto-approval shortcut and use the existing pairing command copy/notification flow instead. Also scope bootstrap operator handshakes to the gateway handoff profile, skip Chat for Configure Later, and dispose onboarding state safely. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Constrain bootstrap auth to onboarding setup codes Keep the default gateway client auth payload and chat URL construction aligned with the existing tray app, while allowing onboarding setup-code handoff to opt into bootstrap auth scopes explicitly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Tighten gateway security follow-ups Preserve MCP-only onboarding completion routing, remove the unused public connect auth token getter, and add regression coverage for default operator scopes and paired bootstrap handoff auth. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Mike Harsh <mharsh@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…#250) * feat: winnode CLI for invoking node commands over local MCP Mirrors `openclaw nodes invoke`'s flag surface but routes to the local tray's MCP HTTP server (default http://127.0.0.1:8765/) instead of the gateway. `--node` and `--idempotency-key` are accepted for paste-from- gateway parity and ignored. Ships skill.md alongside winnode.exe documenting every supported command, argument schema, and the A2UI v0.8 JSONL grammar for agent use. Tests: 62 cases, 100% line/branch on CliRunner via in-process unit tests plus a loopback HttpListener fake that exercises the full HTTP path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(test): gate MCP readiness on token-bearing client InitializeAsync would return ready as soon as `GET /` returned 200, even if `mcp-token.txt` had not been read yet. Against a tray binary built before the auth-before-dispatch hardening (where `GET /` answers 200 without auth), this raced ahead and handed back a tokenless `Client` — every subsequent POST then 401'd. Restructure the loop to require both the token-on-disk and a 200 from a token-bearing GET before declaring ready. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(winnode): auto-load MCP bearer token The CLI now sends `Authorization: Bearer <token>` on every MCP request, without the user having to plumb the token themselves. Resolution chain mirrors the per-tool secret convention (gh, az, anthropic): 1. `--mcp-token <literal>` flag 2. `OPENCLAW_MCP_TOKEN` env var (literal) 3. `mcp-token.txt` under `$OPENCLAW_TRAY_DATA_DIR` if set, else `%APPDATA%\OpenClawTray\` — the same location SettingsManager points the tray at, so a sandboxed tray is found automatically. When the token comes from disk, run `McpAuthToken.VerifyAcl` (the same hygiene check `NodeService.StartMcpServer` runs at startup) and route any owner/DACL warning to stderr so the user knows to rotate. 
`--verbose` reports the resolved auth source without echoing the secret value. Tests redirect via `OPENCLAW_TRAY_DATA_DIR` to a temp sandbox dir so they don't pick up the developer machine's real tray token. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(winnode): apply 19 review findings (F-01..F-21) Hardens the winnode CLI against the threat model in C:/temp/winnode-cli-review-2026-04-30/01-findings.md. F-15 (port-0 nit) was approved as no-action; F-17 was a positive observation. - F-01/F-09: validate --mcp-url; refuse auto-loaded token off-loopback - F-02: explicit SocketsHttpHandler with AllowAutoRedirect=false - F-03: cap response body at 16 MiB with explicit overflow message - F-04: warn unconditionally when --mcp-token is used (process-listing leak) - F-05: warn unconditionally when --idempotency-key is supplied - F-06: TokenLooksValid ASCII-printable check; ignore corrupt tokens - F-07: don't echo full token-file path in --verbose - F-08: canonicalize OPENCLAW_TRAY_DATA_DIR; reject symlink redirect - F-10: RunAsyncTests is now IDisposable (cleans up sandbox dir) - F-11: SkillMdDriftTests + REGENERATE-ME header in skill.md; McpToolBridge.KnownCommands exposes the canonical command set; skill.md re-synced with live capability surface - F-12: --params @<path> loads JSON object from disk - F-13: Token_file_with_wide_acl_emits_warn (Windows-only, gracefully skips when SetAccessControl is denied by hardened CI) - F-14: BuildToolsCallBody returns (byte[], int) consumed by ByteArrayContent without a string round-trip - F-16+F-21: SanitizeForStderr strips control chars, redacts ≥32-char base64url runs, caps at 4 KiB, default-quiet first-line-only, full sanitized body under --verbose - F-18: --invoke-timeout capped at 600000 ms; long arithmetic on the +5000 buffer; out-of-range exits 2 - F-19: --mcp-port and OPENCLAW_MCP_PORT bounded [1, 65535]; env-var out-of-range falls back to default with a verbose warning - F-20: distinguish 
missing/empty/unreadable/loaded token-file states; unreadable exits 1 with a diagnostic before any HTTP traffic Tests: 23 added (115/115 pass). All other suites stay green (Shared 1046/1066, Tray 245/245, Integration 18/18, UI 62/62). WinNode CLI line coverage: 91.6% (434/474 in Program.cs). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
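The bearer-token resolution order described in the winnode commits above (flag, then `OPENCLAW_MCP_TOKEN`, then `mcp-token.txt` under the tray data directory) can be sketched roughly as follows. This is a minimal illustration, not the CLI's actual code: `McpTokenResolution`, its `Resolve` signature, and the injected `getEnv`/`readFileOrNull` delegates are hypothetical, and the ACL check, loopback restriction, and token-shape validation mentioned in the findings are left out.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;

// Demo: only the environment variable is set, so the env source is chosen.
var demoEnv = new Dictionary<string, string?> { ["OPENCLAW_MCP_TOKEN"] = "token-from-env" };
var resolved = McpTokenResolution.Resolve(
    flagValue: null,
    getEnv: name => demoEnv.TryGetValue(name, out var value) ? value : null,
    readFileOrNull: _ => null);
Console.WriteLine(resolved.Source); // prints: env

// Hypothetical helper: names and signature are illustrative, not the winnode API.
static class McpTokenResolution
{
    public static (string? Token, string Source) Resolve(
        string? flagValue,
        Func<string, string?> getEnv,
        Func<string, string?> readFileOrNull)
    {
        // 1. An explicit --mcp-token literal wins.
        if (!string.IsNullOrEmpty(flagValue)) return (flagValue, "flag");

        // 2. Then the OPENCLAW_MCP_TOKEN environment variable.
        var fromEnv = getEnv("OPENCLAW_MCP_TOKEN");
        if (!string.IsNullOrEmpty(fromEnv)) return (fromEnv, "env");

        // 3. Then mcp-token.txt under OPENCLAW_TRAY_DATA_DIR, else %APPDATA%\OpenClawTray.
        var dataDir = getEnv("OPENCLAW_TRAY_DATA_DIR");
        if (string.IsNullOrEmpty(dataDir))
        {
            var appData = getEnv("APPDATA");
            if (string.IsNullOrEmpty(appData)) return (null, "none");
            dataDir = Path.Combine(appData, "OpenClawTray");
        }

        var fromDisk = readFileOrNull(Path.Combine(dataDir, "mcp-token.txt"))?.Trim();
        return string.IsNullOrEmpty(fromDisk) ? (null, "none") : (fromDisk, "file");
    }
}
```

Injecting the environment and file readers keeps the ordering testable without touching the developer machine's real tray token, which is the same concern the commits address with `OPENCLAW_TRAY_DATA_DIR`.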
Prevent tray onboarding tests from reading real user settings by allowing SettingsManager to use an explicit settings directory and using temp settings in onboarding tests. Document the isolation rule for future agents. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…icy (openclaw#247) ValidateExecApprovalRules rejected single '*' but missed patterns like '**', '***', '?', '? *', '* ?' that also match any command string. An agent that can call system.execApprovals.set could bypass the broad-allow restriction by submitting '**' as an allow pattern: {"rules": [{"pattern": "**", "action": "allow"}], "baseHash": "..."} The glob-to-regex translation turns '**' into '^.*.*$', which matches every command, exactly like '*' does. Fix: strip all wildcard chars ('*', '?') and whitespace from the normalised pattern before checking. If nothing remains the pattern is an all-wildcard glob and is rejected as broad. The explicit shell- prefix checks (powershell *, pwsh *, cmd *, cmd.exe *) are preserved for patterns that contain meaningful content but are still too broad. Tests: add **, ***, ?, '? *', '* ?' to ExecApprovalsSet_RejectsUnsafeAllowRules. Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
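The all-wildcard check from this fix can be sketched as below. It is a minimal illustration under assumptions: `BroadPatternCheck.IsAllWildcard` is a hypothetical name, and the real `ValidateExecApprovalRules` (including its normalisation step and the shell-prefix checks) is not shown in this PR.

```csharp
using System;
using System.Linq;

Console.WriteLine(BroadPatternCheck.IsAllWildcard("**"));    // prints: True
Console.WriteLine(BroadPatternCheck.IsAllWildcard("? *"));   // prints: True
Console.WriteLine(BroadPatternCheck.IsAllWildcard("git *")); // prints: False

// Hypothetical helper mirroring the described rule, not the project's code.
static class BroadPatternCheck
{
    // A pattern is "all wildcard" when nothing is left after removing
    // '*', '?' and whitespace. An empty pattern also counts as all wildcard.
    public static bool IsAllWildcard(string pattern)
    {
        ArgumentNullException.ThrowIfNull(pattern);
        return pattern.All(c => c == '*' || c == '?' || char.IsWhiteSpace(c));
    }
}
```

Checking what remains after stripping wildcards avoids enumerating every equivalent spelling (`**`, `***`, `* ?`, and so on) one by one.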
…provals.set (openclaw#255) ValidateExecApprovalRules previously checked for dangerous fragments that end with a trailing space (e.g. "rm ") but missed the case where the wildcard character replaces the space — e.g. "rm*" passes the "rm " fragment check yet matches "rm -rf /" via the ^rm.*$ regex, effectively bypassing the intended block. Fix: for each dangerous fragment that has trailing whitespace, also reject patterns containing the trimmed stem followed directly by * or ?. Before: { "pattern": "rm*", "action": "allow" } → accepted, allows "rm -rf /" { "pattern": "del*", "action": "allow" } → accepted, allows "del /s /q C:\\" After: { "pattern": "rm*", "action": "allow" } → rejected ("Dangerous allow rule…") { "pattern": "del*", "action": "allow" } → rejected Adds 7 InlineData regression tests covering: rm*, rm?, del*, del?, remove-item*, shutdown*, net*. Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
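A rough sketch of the stem-plus-wildcard rule described above, assuming a hypothetical `DangerousStemCheck` helper and an illustrative subset of fragments; the actual fragment list and matching code in `ValidateExecApprovalRules` are not part of this conversation.

```csharp
using System;
using System.Collections.Generic;

// Illustrative subset only; the real dangerous-fragment list is not shown in the PR.
string[] demoFragments = { "rm ", "del ", "shutdown " };
Console.WriteLine(DangerousStemCheck.MatchesDangerousStem("rm*", demoFragments));        // prints: True
Console.WriteLine(DangerousStemCheck.MatchesDangerousStem("git status", demoFragments)); // prints: False

// Hypothetical helper mirroring the described rule, not the project's code.
static class DangerousStemCheck
{
    public static bool MatchesDangerousStem(string pattern, IEnumerable<string> fragments)
    {
        foreach (var fragment in fragments)
        {
            var stem = fragment.TrimEnd();

            // Only fragments that end in whitespace need the extra check;
            // the others are assumed to be handled by the existing fragment test.
            if (stem.Length == 0 || stem.Length == fragment.Length) continue;

            // Reject "<stem>*" and "<stem>?", where a wildcard stands in for the space.
            if (pattern.Contains(stem + "*", StringComparison.OrdinalIgnoreCase) ||
                pattern.Contains(stem + "?", StringComparison.OrdinalIgnoreCase))
            {
                return true;
            }
        }
        return false;
    }
}
```

Note that a plain substring test like this one would also flag a pattern such as `confirm*`, because it contains `rm*`; whether the real check anchors the stem at a word boundary is not stated in the commit message.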
…ge coverage gaps (openclaw#245) - SshTunnelCommandLine: 7 new tests covering CanForwardBrowserProxyPort boundary values and BuildArguments whitespace trimming - ExecApprovalV2Result: test ToString() includes code and reason - McpToolBridge: test custom serverName/serverVersion via constructor; test that null arguments value is accepted (not just missing arguments) All tests pass (Shared + Tray). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nclaw#251) * refactor(tray): remove unused BuildTrayMenuFlyout Method was never called. Active tray menu is driven by BuildTrayMenuPopup(TrayMenuWindow) via ShowTrayMenuPopup(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(tray): remove legacy unused BuildTrayMenu Method was explicitly marked "for reference" in a comment but never called. BuildTrayMenuPopup(TrayMenuWindow) is the active implementation. Removes the comment and the entire method body. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: AlexAlves87 <alexalves87@github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…penclaw#243) - Remove System.Linq import; replace FirstOrDefault with foreach loop in HandleToolsCallAsync — avoids delegate allocation on every tool call - Replace ms.ToArray() with ms.GetBuffer() + slice in WriteResult and WriteError — avoids copying the byte array before UTF-8 decoding Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
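The `GetBuffer()` + slice change can be illustrated with a small standalone sketch. `BufferDecode.DecodeUtf8` is a hypothetical name; the real `WriteResult`/`WriteError` methods are not shown here.

```csharp
using System;
using System.IO;
using System.Text;

using var demoStream = new MemoryStream();
var demoPayload = Encoding.UTF8.GetBytes("{\"ok\":true}");
demoStream.Write(demoPayload, 0, demoPayload.Length);
Console.WriteLine(BufferDecode.DecodeUtf8(demoStream)); // prints: {"ok":true}

// Hypothetical helper illustrating the technique, not the project's code.
static class BufferDecode
{
    public static string DecodeUtf8(MemoryStream stream)
    {
        // GetBuffer() exposes the underlying array without copying it. That array
        // is usually larger than the written data, so decode only the first Length bytes.
        return Encoding.UTF8.GetString(stream.GetBuffer(), 0, checked((int)stream.Length));
    }
}
```

`ToArray()` allocates and copies a right-sized array first, which is the copy the commit removes. `GetBuffer()` throws `UnauthorizedAccessException` for streams constructed over a caller-supplied array without `publiclyVisible: true`, so the technique fits streams the code created itself.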
Pull request overview
Adds a new Windows node tts.speak command (tracked by #252) by introducing a shared TTS capability plus Windows/ElevenLabs playback implementations, settings/UI, MCP advertisement text, command-center grouping, docs updates, and targeted tests.
Changes:
- Introduces `TtsCapability` (`tts.speak`) in Shared and wires it into the Windows tray node when enabled.
- Adds Windows built-in speech synthesis + optional ElevenLabs client, plus persisted settings (with DPAPI protection for the ElevenLabs API key).
- Updates MCP tool descriptions, Command Center command-group classification, docs/README, and adds Shared/Tray tests.
| File | Description |
|---|---|
| tests/OpenClaw.Tray.Tests/TrayMenuWindowMarkupTests.cs | Verifies new Settings UI elements exist (automation IDs for TTS controls). |
| tests/OpenClaw.Tray.Tests/SettingsRoundTripTests.cs | Adds settings round-trip/defaulting assertions for new TTS settings + DPAPI protect/unprotect tests. |
| tests/OpenClaw.Tray.Tests/OpenClaw.Tray.Tests.csproj | Adds DPAPI package + links SettingsManager and ElevenLabs client into the test project. |
| tests/OpenClaw.Tray.Tests/ElevenLabsTextToSpeechClientTests.cs | Adds unit tests for ElevenLabs request construction, validation, error formatting, and timeout. |
| tests/OpenClaw.Shared.Tests/ModelsTests.cs | Validates tts.speak is classified as dangerous and excluded from Mac parity set. |
| tests/OpenClaw.Shared.Tests/McpToolBridgeTests.cs | Ensures MCP tools/list returns a curated description for tts.speak. |
| tests/OpenClaw.Shared.Tests/CapabilityTests.cs | Adds TtsCapability execution/validation tests (required args, length guard, handler behavior). |
| src/OpenClaw.Tray.WinUI/Windows/SettingsWindow.xaml.cs | Loads/saves new TTS settings and toggles ElevenLabs settings UI visibility based on provider. |
| src/OpenClaw.Tray.WinUI/Windows/SettingsWindow.xaml | Adds TTS toggle, provider selection, and ElevenLabs API key/voice/model inputs. |
| src/OpenClaw.Tray.WinUI/Services/TextToSpeech/TextToSpeechService.cs | Implements Windows/ElevenLabs playback and interrupt semantics via MediaPlayer gating. |
| src/OpenClaw.Tray.WinUI/Services/TextToSpeech/ElevenLabsTextToSpeechClient.cs | Adds ElevenLabs HTTP client with validation, timeout, and error message construction. |
| src/OpenClaw.Tray.WinUI/Services/SettingsManager.cs | Persists TTS settings; protects ElevenLabs API key at rest using DPAPI (dpapi: prefix). |
| src/OpenClaw.Tray.WinUI/Services/NodeService.cs | Registers TTS capability/service when enabled; disables tts.* commands when not enabled. |
| src/OpenClaw.Tray.WinUI/OpenClaw.Tray.WinUI.csproj | Adds DPAPI package dependency required for protected settings secrets. |
| src/OpenClaw.Shared/SettingsData.cs | Adds shared DTO fields for TTS settings persistence. |
| src/OpenClaw.Shared/Models.cs | Adds tts.speak to dangerous command grouping and adjusts Mac parity command list. |
| src/OpenClaw.Shared/Mcp/McpToolBridge.cs | Adds curated MCP description for tts.speak tool. |
| src/OpenClaw.Shared/Capabilities/TtsCapability.cs | Adds new node capability handling tts.speak with validation, guardrails, and event hook. |
| docs/gateway-node-integration.md | Documents adding tts.speak to the gateway allowlist. |
| docs/WINDOWS_NODE_TESTING.md | Documents tts.speak requirements and capability advertisement behavior. |
| docs/MCP_MODE.md | Updates MCP tool list summary to include tts.speak. |
| README.md | Updates node capability table, allowlist guidance, and adds a manual tts.speak invocation example. |
Copilot's findings
Comments suppressed due to low confidence (2)
src/OpenClaw.Tray.WinUI/Services/TextToSpeech/TextToSpeechService.cs:207
`TextToSpeechService` implements `IDisposable` but doesn’t dispose owned disposable resources like `_playbackGate` (and it also doesn’t attempt to stop/dispose an `_activePlayer` directly during shutdown). Please dispose the semaphore (and ensure any active `MediaPlayer` is stopped/disposed deterministically) to avoid leaking WinRT/media resources across node restarts/shutdown.

    public void Dispose()
    {
        InterruptActivePlayback();
        // Playback may still release the gate after an interrupt during shutdown.
        _elevenLabsClient.Dispose();
    }

src/OpenClaw.Shared/Models.cs:1057
`MacNodeParityCommands` used to include `.. DangerousCommands`, but now it hard-codes a subset to exclude `tts.speak`. This increases the chance that future dangerous commands won’t be covered by parity diagnostics unless someone remembers to update this list. Consider constructing the Mac parity list from `DangerousCommands` with an explicit exclusion for `tts.speak` (or adding a clear comment explaining why dangerous commands must be enumerated manually).

    public static readonly string[] MacNodeParityCommands =
    [
        .. SafeCompanionCommands,
        "camera.snap",
        "camera.clip",
        "screen.record",
        "system.notify",
        "system.run",
        "system.which",
        "browser.proxy"
    ];

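The suggestion in this suppressed comment could look roughly like the sketch below. The contents of `SafeCompanionCommands` and `DangerousCommands` are assumptions for illustration (the placeholder safe command is made up, and the dangerous set is inferred from the hard-coded list plus `tts.speak`), and `WindowsOnlyDangerousCommands` is a hypothetical name.

```csharp
using System;
using System.Linq;

Console.WriteLine(string.Join(", ", CommandSets.MacNodeParityCommands));

// Hypothetical stand-in for the relevant part of Models.cs.
static class CommandSets
{
    // Placeholder entry; the real safe-command list is not shown in this PR.
    public static readonly string[] SafeCompanionCommands = ["example.safe"];

    // Assumed: the seven commands from the hard-coded list plus tts.speak.
    public static readonly string[] DangerousCommands =
    [
        "camera.snap", "camera.clip", "screen.record", "system.notify",
        "system.run", "system.which", "browser.proxy", "tts.speak"
    ];

    // Dangerous commands that Mac does not implement yet.
    public static readonly string[] WindowsOnlyDangerousCommands = ["tts.speak"];

    // Derived list: a newly added dangerous command is covered by parity
    // diagnostics automatically unless it is explicitly excluded above.
    public static readonly string[] MacNodeParityCommands =
    [
        .. SafeCompanionCommands,
        .. DangerousCommands.Except(WindowsOnlyDangerousCommands)
    ];
}
```

The static fields are declared in dependency order on purpose, since C# runs static field initializers in textual order.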
- Files reviewed: 22/22 changed files
- Comments generated: 1
    public async Task<TtsSpeakResult> SpeakAsync(TtsSpeakArgs args, CancellationToken cancellationToken = default)
    {
        var provider = TtsCapability.ResolveProvider(args.Provider, _settings.TtsProvider);
        var stopwatch = Stopwatch.StartNew();

        if (string.Equals(provider, TtsCapability.WindowsProvider, StringComparison.OrdinalIgnoreCase))
        {
            await SpeakWithWindowsAsync(args, cancellationToken).ConfigureAwait(false);
        }
        else if (string.Equals(provider, TtsCapability.ElevenLabsProvider, StringComparison.OrdinalIgnoreCase))
        {
            await SpeakWithElevenLabsAsync(args, cancellationToken).ConfigureAwait(false);
        }
        else
        {
            throw new InvalidOperationException($"Unsupported TTS provider '{provider}'.");
        }

        stopwatch.Stop();
        return new TtsSpeakResult
        {
            Provider = provider,
            ContentType = string.Equals(provider, TtsCapability.ElevenLabsProvider, StringComparison.OrdinalIgnoreCase)
                ? "audio/mpeg"
                : "audio/wav",
            DurationMs = (int)Math.Min(stopwatch.ElapsedMilliseconds, int.MaxValue)
        };

durationMs is currently derived from a wall-clock stopwatch around synthesis + playback (and for ElevenLabs also includes network latency). This makes the response field misleading if callers expect audio duration. Consider either computing playback duration from MediaPlayer.PlaybackSession.NaturalDuration/position, or renaming the field (and docs) to something like elapsedMs to match its semantics.
This issue also appears on line 202 of the same file.
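One way to make the distinction in this comment explicit is to carry the wall-clock time and the audio length as separate values. The sketch below is only an illustration: `TtsTiming` is a hypothetical type, and the audio duration is passed in as a plain `TimeSpan?` (for example, whatever the player reports as its natural duration) rather than read from a real `MediaPlayer`.

```csharp
using System;

var demoTiming = TtsTiming.From(
    elapsed: TimeSpan.FromMilliseconds(2447),
    naturalDuration: TimeSpan.FromMilliseconds(1800));
Console.WriteLine($"{demoTiming.ElapsedMs} {demoTiming.AudioDurationMs}"); // prints: 2447 1800

// Hypothetical type: separates wall-clock time from audio length.
readonly record struct TtsTiming(int ElapsedMs, int? AudioDurationMs)
{
    public static TtsTiming From(TimeSpan elapsed, TimeSpan? naturalDuration) =>
        new(ToClampedMs(elapsed),
            naturalDuration.HasValue ? ToClampedMs(naturalDuration.Value) : (int?)null);

    // Clamp into [0, int.MaxValue] before narrowing to int.
    public static int ToClampedMs(TimeSpan span) =>
        (int)Math.Clamp(span.TotalMilliseconds, 0, int.MaxValue);
}
```

With two fields, a caller that wants the audio length is not handed a value that also includes synthesis and network latency; the alternative the reviewer mentions, renaming the existing field to `elapsedMs`, needs no new data at all.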
…w#238) Three McpHttpServerTests were failing on Linux: 1. Post_WithLocalhostHost_Accepted — HttpListener on Linux rejects requests with Host: localhost when only http://127.0.0.1:port/ is registered as a prefix (404 before reaching application code). Fix: also register http://localhost:port/ so clients connecting via the hostname form are served. 2. Post_WithRebindHost_RejectedWithForbidden — With the dual-prefix registration, Host: evil.com still doesn't match, but Linux returns 404 (HttpListener filter) rather than 403 (application code). Both are valid rejections; relax assertion to NotEqual(OK). 3. Post_OversizedBody_RejectedWithRequestTooLarge — When the server sends 413 and closes the connection before the client finishes uploading 5 MiB, Linux surfaces a broken-pipe SocketException rather than letting the client see the response status. Catch the SocketException path as an equivalent rejection outcome. All 15 McpHttpServer tests now pass on Linux (967 pass, 20 skip). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
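The dual-prefix registration from fix 1, together with an application-level Host check of the kind fix 2 refers to, can be sketched as follows. `LoopbackHosting` and both of its methods are hypothetical names; the port is the default MCP port mentioned earlier in this conversation, and the listener is configured but never started.

```csharp
#nullable enable
using System;
using System.Net;

const int demoPort = 8765;
using var demoListener = new HttpListener();
foreach (var prefix in LoopbackHosting.PrefixesFor(demoPort))
{
    // Registering both forms lets clients that send "Host: localhost" reach
    // application code on platforms that match prefixes strictly.
    demoListener.Prefixes.Add(prefix);
}
Console.WriteLine(demoListener.Prefixes.Count); // prints: 2

// Hypothetical helper, not the McpHttpServer implementation.
static class LoopbackHosting
{
    public static string[] PrefixesFor(int port) =>
        [$"http://127.0.0.1:{port}/", $"http://localhost:{port}/"];

    // Application-level guard against DNS rebinding: only loopback Host values pass.
    public static bool IsAllowedHostHeader(string? hostHeader, int port) =>
        string.Equals(hostHeader, $"127.0.0.1:{port}", StringComparison.OrdinalIgnoreCase) ||
        string.Equals(hostHeader, $"localhost:{port}", StringComparison.OrdinalIgnoreCase);
}
```

A request with `Host: evil.com` fails the guard in application code, while on Linux it may already be rejected by the listener's prefix matching before it gets that far, which is why the test in the commit accepts either rejection status.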
Resolve SettingsManager test-isolation conflict and remove the duplicate Tray test compile item after rebasing openclaw#253. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Thanks, @RBrid. This focused TTS slice is a good way to land the first stable piece of Windows voice support. This overlaps with the broader Voice Mode direction in #120 from @NichUK, so I’d like to treat this as the foundation layer: land I pushed a maintainer rebase commit that keeps this branch current with
Implements support for the tts.speak command tracked by #252. Adds the shared TTS capability, Windows and ElevenLabs playback paths, Settings UI/persistence, gateway/MCP advertisement, Command Center classification, docs, and tests.
Summary
Adds support for Windows node text-to-speech via a new `tts.speak` command.

This is the focused TTS slice tracked by #252, extracted from the broader voice work prototyped in #120. Credit to @NichUK for the original reference implementation and exploration in #120.
Closes #252.
What changed
- Adds a shared `TtsCapability` with command `tts.speak`.
- … `tts.speak`.
- Classifies `tts.speak` as a privacy-sensitive/dangerous Command Center command.
- Excludes `tts.speak` from Mac parity diagnostics until Mac implements it.

Manual validation
Validated Windows provider invocation through the gateway:
Observed successful response:

    {
      "ok": true,
      "command": "tts.speak",
      "payload": {
        "spoken": true,
        "provider": "windows",
        "contentType": "audio/wav",
        "durationMs": 2447
      }
    }

Automated validation
Passed locally:
Results: