
Added UI Automation Crawler to Onboarding Agent#2294

Open
robgruen wants to merge 28 commits intomainfrom
dev/robgruen/onboarding_experiment


@robgruen
Collaborator

robgruen commented May 5, 2026

This pull request introduces the initial setup and implementation of the UiAutomationHelper .NET project, providing both the project structure and the core functionality for UI automation tasks. The main focus is enabling programmatic control of Windows applications and their UI elements through a set of RPC-accessible methods. The PR also includes project configuration, dependency references, and best-practice guidance.

Project setup and configuration:

  • Added a new solution file UiAutomationHelper.sln with two projects: the main automation helper and its test project.
  • Created the initial .csproj file for UiAutomationHelper, targeting .NET 8.0 for Windows, referencing FlaUI libraries for UI automation, and configuring build and packaging settings.
  • Added a .gitignore for uiAutomationHelper to exclude build outputs and user-specific files.

Core functionality:

  • Implemented AppMethods in src/Methods/AppMethods.cs to support launching, attaching, listing, and killing applications, with robust parameter validation and error handling.
  • Implemented ActionMethods in src/Methods/ActionMethods.cs providing methods to interact with UI elements (invoke, toggle, set value, select, expand/collapse, scroll, focus, click, send keys), including parameter parsing, error handling, and support for various UI patterns.

Documentation:

  • Added Copilot instructions for Azure-related requests, specifying tool usage and best practices.

robgruen and others added 21 commits May 3, 2026 20:18
Skeleton for the UIA-based onboarding crawl: a .NET helper exposing a
JSON-RPC stdio surface (app lifecycle, tree.dump, screenshot, do.invoke)
backed by FlaUI/UIA3, and a TypeScript HelperClient. Verified end-to-end
against Windows Clock; live smoke produces a tree-dump fixture and
screenshot. SelectorParser has 20 xUnit tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Helper RPCs: do.toggle/setValue/select/expand/scroll/focus/click/sendKeys
(joining do.invoke from slice 1), plus find (with optional polling) and
events.idle (focus-change-debounce). All app.list and tree.dump calls now
retry transient COM errors that fire during UWP teardown.

Selector resolution gained two fixes from real-world failures. First,
when AutomationId is missing, capture-time selectors include ClassName as
a disambiguator, so siblings sharing a Name (UWP's nested ApplicationFrame
and CoreWindow, both named after the app) resolve correctly. Second,
app.launch now returns as mainWindow the desktop-rooted
ApplicationFrameWindow rather than the inner CoreWindow, which lives under
it in UIA's logical tree; this is resolved via Win32 GA_ROOTOWNER + name
match + a poll loop for the asynchronously created frame.
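A minimal TypeScript sketch of the ClassName-disambiguation rule, assuming a hypothetical `UiaNode` shape and a hypothetical `ControlType[Attr="..."]` selector syntax (the helper's real SelectorParser grammar is not shown in this PR):

```typescript
// Hypothetical shape of a captured UIA node; the helper's real fields may differ.
interface UiaNode {
  controlType: string;
  name?: string;
  automationId?: string;
  className?: string;
}

// Build one selector segment for `node`. AutomationId wins when present;
// otherwise ClassName is added only if a sibling of the same control type
// shares the Name, which is the UWP frame/CoreWindow case described above.
function selectorSegment(node: UiaNode, siblings: UiaNode[]): string {
  if (node.automationId) {
    return `${node.controlType}[AutomationId="${node.automationId}"]`;
  }
  const attrs: string[] = [];
  if (node.name !== undefined) attrs.push(`Name="${node.name}"`);
  const nameClash = siblings.some(
    (s) => s !== node && s.controlType === node.controlType && s.name === node.name,
  );
  if (nameClash && node.className) attrs.push(`ClassName="${node.className}"`);
  return attrs.length > 0 ? `${node.controlType}[${attrs.join(",")}]` : node.controlType;
}
```

Adding ClassName only on a clash keeps selectors short in the common case while still separating same-named windows.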

Smoke now drives Clock through invoke + select + focus + find +
events.idle and produces a clock-tree-navigated.json fixture showing the
post-navigation state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
C# helper: snapshot.capture / snapshot.restore / snapshot.delete RPCs
backed by FolderSnapshotter (recursive copy with exclude globs) and
ProcessKiller (graceful close → force kill on identity match). Restore
is replace-not-merge — the target directory is wiped before files come
back, so files added to state between snapshots disappear on restore.
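The replace-not-merge restore semantics can be sketched in Node/TypeScript as follows. This is not the helper's C# FolderSnapshotter: glob matching is replaced by a plain predicate, process killing is omitted, `captureFolder`/`restoreFolder` are hypothetical names, and `fs.cpSync` needs Node 16.7 or later.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Capture: copy `src` to `dest`, skipping relative paths the predicate
// excludes (a stand-in for exclude globs). The predicate must return false
// for "" (the root itself), or nothing is copied.
function captureFolder(
  src: string,
  dest: string,
  exclude: (relPath: string) => boolean = () => false,
): void {
  fs.cpSync(src, dest, {
    recursive: true,
    filter: (from: string) => !exclude(path.relative(src, from)),
  });
}

// Restore is replace-not-merge: wipe the live folder first so files created
// after the capture do not survive, then copy the snapshot back.
function restoreFolder(snapshot: string, target: string): void {
  fs.rmSync(target, { recursive: true, force: true });
  fs.cpSync(snapshot, target, { recursive: true });
}
```

A merging restore (copy over the top without deleting) would leave the "added" files in place, which is exactly what the smoke test checks against.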

TS: snapshotPolicy.ts library with inferSnapshotPolicy (UWP via
PowerShell Get-AppxPackage → PackageFamilyName → LocalState/Settings/
RoamingState folder enumeration), plus load/save/approve/markStateless
helpers. HelperClient gains snapshotCapture/Restore/Delete.
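The PackageFamilyName-to-folder step of inferSnapshotPolicy can be sketched as a pure path computation. `uwpStateFolders` is a hypothetical helper; it assumes the standard `%LOCALAPPDATA%\Packages\<PackageFamilyName>` layout and leaves the PowerShell call and existence checks to the caller.

```typescript
import * as path from "node:path";

// Per-user state folders of a packaged app, relative to
// %LOCALAPPDATA%\Packages\<PackageFamilyName>; these are the three folders
// the Clock smoke run detected.
const UWP_STATE_FOLDERS = ["LocalState", "Settings", "RoamingState"];

// Hypothetical helper: turn a PackageFamilyName (for example from
// `(Get-AppxPackage -Name <name>).PackageFamilyName` in PowerShell) into
// candidate snapshot folders. Existence checks are left to the caller.
function uwpStateFolders(localAppData: string, packageFamilyName: string): string[] {
  const packageRoot = path.win32.join(localAppData, "Packages", packageFamilyName);
  return UWP_STATE_FOLDERS.map((folder) => path.win32.join(packageRoot, folder));
}
```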

Smoke:
- inferSnapshotPolicy for Clock detects the 3 expected UWP folders
  (LocalState, Settings, RoamingState) under Microsoft.WindowsAlarms_*.
- Synthetic capture/restore round-trips against a sandboxed state
  directory: dirty all 3 files + add a new one + restore → all originals
  match expected content + the added file is gone.

Slice 3b (onboarding-action wiring: inferSnapshotPolicy / approveSnapshotPolicy
/ markStateless / editSnapshotPolicy actions on the manifest+grammar)
deferred to when we wire the full pipeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Helper: tree.fingerprint RPC computes a SHA-256 of the (filtered) UIA
subtree, with optional dynamic-control rules that mask `value` / `name`
/ `toggleState` of matched controls. Matchers: automationId, exact
selector, glob selectorPattern, container (subtree + controlType +
optional name/className regexes).
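A TypeScript sketch of a rules-aware fingerprint, assuming a hypothetical `FpNode` tree shape and modelling only the exact-selector matcher (automationId, glob, and container matchers are omitted). The real RPC is computed in the C# helper; this only illustrates why masking makes drifting values hash identically.

```typescript
import { createHash } from "node:crypto";

// Hypothetical node shape; the helper's real tree.dump format may differ.
interface FpNode {
  selector: string;
  controlType: string;
  name?: string;
  value?: string;
  toggleState?: string;
  children?: FpNode[];
}

type MaskField = "value" | "name" | "toggleState";
// Exact selector -> fields to mask.
type MaskRules = Map<string, MaskField[]>;

// Canonical form: masked fields become "*", and the selector itself is left
// out so a Name-based selector cannot leak a masked name into the hash.
function canonicalize(node: FpNode, rules: MaskRules): unknown {
  const masked = new Set<MaskField>(rules.get(node.selector) ?? []);
  return {
    type: node.controlType,
    name: masked.has("name") ? "*" : (node.name ?? null),
    value: masked.has("value") ? "*" : (node.value ?? null),
    toggle: masked.has("toggleState") ? "*" : (node.toggleState ?? null),
    children: (node.children ?? []).map((c) => canonicalize(c, rules)),
  };
}

// SHA-256 over a deterministic JSON serialization of the canonical tree.
function treeFingerprint(root: FpNode, rules: MaskRules = new Map()): string {
  return createHash("sha256").update(JSON.stringify(canonicalize(root, rules))).digest("hex");
}
```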

TS dynamicControls.ts: calibrateDynamicControls runs N tree dumps (3 by
default, 3s apart) with no input, diffs them by selector, and emits
DynamicControlRule[] tagged `calibration-drift` with confidence =
transitions / (N-1). Persistence (load/save) and rule-merge by matcher
identity also included.
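The calibration diff and its confidence formula can be sketched as follows. Each dump is flattened to a hypothetical `selector -> observed text` map, and the `source` field name on the rule is an assumption standing in for the `calibration-drift` tag.

```typescript
// One tree dump flattened to selector -> observed text (name/value); hypothetical shape.
type Dump = Map<string, string>;

interface DynamicControlRule {
  selector: string;
  source: "calibration-drift";
  confidence: number; // transitions / (N - 1)
}

// Diff N consecutive dumps by selector. A transition is any change between
// dump i-1 and dump i; controls that never change produce no rule.
function rulesFromDumps(dumps: Dump[]): DynamicControlRule[] {
  if (dumps.length < 2) return [];
  const selectors = new Set(dumps.flatMap((d) => Array.from(d.keys())));
  const rules: DynamicControlRule[] = [];
  for (const selector of selectors) {
    let transitions = 0;
    for (let i = 1; i < dumps.length; i++) {
      if (dumps[i].get(selector) !== dumps[i - 1].get(selector)) transitions++;
    }
    if (transitions > 0) {
      rules.push({ selector, source: "calibration-drift", confidence: transitions / (dumps.length - 1) });
    }
  }
  return rules;
}
```

With the default N = 3, a control that changes between every pair of dumps gets confidence 2 / 2 = 1.0, matching the StopwatchTimerText result below; one that changes once gets 0.5.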

Smoke validates the mechanism on Clock's running stopwatch:
  - Back-to-back fingerprints identical (deterministic hash).
  - Applying a rule that masks Close button's name → different
    fingerprint than no-rule (rule application affects the hash).
  - Calibration picks up StopwatchTimerText as dynamic with confidence
    1.0 across 3 dumps.
  - Naked fingerprints diverge across a 4s window (timer advanced);
    rules-aware fingerprints partially mask drift but don't fully (Clock
    has multiple time-display elements at different granularities).
    The residual drift is a real-world finding to address with iterative
    explore-drift rule updates in the autonomous loop (slice 6).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Helper: JSON-RPC notifications (server → client, no id), routed through
a shared write lock so UIA event-thread emissions don't interleave with
RPC responses. New events.subscribe / events.unsubscribe RPCs accept
eventTypes ["Invoked", "ValueChanged", "ToggleStateChanged",
"StructureChanged"], scoped to a selector with TreeScope.Subtree.
SubscriptionRegistry holds active subscriptions; Subscription.Dispose
unregisters via FlaUI's IDisposable handlers.

TS: HelperClient gains a notification dispatch path with `onEvent()`
registration. Recorder library subscribes, writes captured events as
JSONL into <workspaceDir>/recordings/<sessionId>/transitions.jsonl.
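Client-side routing of responses versus notifications can be sketched as below. It follows JSON-RPC 2.0 (a message without `id` is a notification) and assumes, hypothetically, one JSON message per stdout line; `RpcRouter` and the `"event"` method name are illustrative, not HelperClient's actual API.

```typescript
type JsonRpcMessage =
  | { jsonrpc: "2.0"; id: number; result?: unknown; error?: { code: number; message: string } }
  | { jsonrpc: "2.0"; method: string; params?: unknown };

type Waiter = { resolve: (v: unknown) => void; reject: (e: Error) => void };

class RpcRouter {
  private pending = new Map<number, Waiter>();
  private listeners: ((method: string, params: unknown) => void)[] = [];

  // Register interest in the response to request `id`.
  track(id: number): Promise<unknown> {
    return new Promise((resolve, reject) => this.pending.set(id, { resolve, reject }));
  }

  onEvent(listener: (method: string, params: unknown) => void): void {
    this.listeners.push(listener);
  }

  // Feed one line of helper stdout.
  handleLine(line: string): void {
    const msg = JSON.parse(line) as JsonRpcMessage;
    if ("id" in msg) {
      const waiter = this.pending.get(msg.id);
      if (!waiter) return; // unknown or already-settled id
      this.pending.delete(msg.id);
      if (msg.error) waiter.reject(new Error(msg.error.message));
      else waiter.resolve(msg.result);
    } else {
      // No id: a server-initiated notification such as a UIA event.
      for (const listener of this.listeners) listener(msg.method, msg.params);
    }
  }
}
```

A recorder could register a listener that appends `JSON.stringify(params)` plus a newline to transitions.jsonl. The shared write lock on the helper side is what guarantees each line is a whole message.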

Smoke launches Clock, drives it through navigation + click, and
captures 10 StructureChanged events end-to-end into a transitions.jsonl
fixture.

Real-world finding: UIA's InvokedEvent doesn't propagate to in-process
listeners for UWP apps even when triggered via real Mouse.LeftClick —
likely a cross-process COM marshaling quirk between ApplicationFrameHost
and the UWP package's CoreWindow. StructureChanged events DO fire reliably,
which is enough for the autonomous-explore loop in slice 6 (which
re-dumps the tree after each agent action and doesn't depend on Invoked
events for its own actions). Record-mode for genuine user-driven sessions
(separate process driving the app) is the case where Invoked events
matter, and that case is untested here.

Also: NavView Group elements can have dynamic Names that embed running
state ("Stopwatch, Paused, 12 seconds 23 centiseconds"), invalidating
selectors built on Name-only as soon as the app starts. Caught this in
the smoke; selector fallbacks beyond ClassName disambiguation will be
needed in slice 6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Outer loop with pluggable DecisionOracle:
  - capture (treeFingerprint + treeDump → upsertState dedup by fingerprint)
  - oracle.decide(input) → ExploreDecision
  - execute (dispatch verb to do.* RPCs)
  - eventsIdle → recapture → addTransition
  - persist incrementally to states.jsonl + transitions.jsonl

State graph persists every iteration (JSONL append + per-state TreeNode
JSON + optional screenshots) and rehydrates on construction, so a crashed
run can resume from disk. Budget gates: maxIterations / maxWallClockMs /
maxStates / convergenceThreshold (iterations since last new state).
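A condensed TypeScript sketch of the outer loop and its budget gates. All types are hypothetical simplifications: `AppDriver.execute` stands in for the do.* dispatch plus events.idle, and JSONL persistence, screenshots, and restore decisions are omitted.

```typescript
interface FrontierItem { selector: string; verb: string; destructive: boolean }
type ExploreDecision = { kind: "act"; item: FrontierItem } | { kind: "stop"; reason: string };
interface DecisionOracle {
  decide(input: { stateId: string; frontier: FrontierItem[] }): Promise<ExploreDecision>;
}
interface AppDriver {
  capture(): Promise<{ fingerprint: string; frontier: FrontierItem[] }>;
  execute(item: FrontierItem): Promise<void>; // do.* RPC, then wait for idle
}
interface Budget { maxIterations: number; maxWallClockMs: number; maxStates: number; convergenceThreshold: number }

async function explore(driver: AppDriver, oracle: DecisionOracle, budget: Budget) {
  const states = new Map<string, string>(); // fingerprint -> stateId
  const transitions: { from: string; to: string; item: FrontierItem }[] = [];
  const start = Date.now();
  let sinceNewState = 0;

  // Dedup by fingerprint; a revisit counts toward convergence.
  const upsert = (fp: string): string => {
    let id = states.get(fp);
    if (id === undefined) {
      id = `state-${String(states.size).padStart(3, "0")}`;
      states.set(fp, id);
      sinceNewState = 0;
    } else {
      sinceNewState++;
    }
    return id;
  };

  let snap = await driver.capture();
  let current = upsert(snap.fingerprint);
  for (let i = 0; i < budget.maxIterations; i++) {
    if (Date.now() - start > budget.maxWallClockMs) break;
    if (states.size >= budget.maxStates) break;
    if (sinceNewState >= budget.convergenceThreshold) break;
    const decision = await oracle.decide({ stateId: current, frontier: snap.frontier });
    if (decision.kind === "stop") break;
    await driver.execute(decision.item);
    snap = await driver.capture();
    const next = upsert(snap.fingerprint);
    transitions.push({ from: current, to: next, item: decision.item });
    current = next;
  }
  return { states, transitions };
}
```

Because the oracle is just an interface, the stub and LLM oracles described below can be swapped without touching the loop.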

Frontier computation: maps Pattern set → ActionVerb candidates per
actionable on-screen control, marks destructive (delete/remove/reset/
clear/erase regex), priority-sorts (Button/MenuItem/ListItem first,
unstable identifiers later, destructive last).
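The frontier computation can be sketched as follows. The pattern-to-verb table, the word-boundary form of the destructive regex, and the relative weights of the three sort criteria are all assumptions; the PR only states the ordering intent.

```typescript
// Hypothetical UIA pattern -> verb table; the real mapping may differ.
const PATTERN_VERBS: Record<string, string[]> = {
  Invoke: ["invoke"],
  Toggle: ["toggle"],
  Value: ["setValue"],
  SelectionItem: ["select"],
  ExpandCollapse: ["expand"],
  Scroll: ["scroll"],
};
const DESTRUCTIVE = /\b(delete|remove|reset|clear|erase)\b/i;
const PREFERRED_TYPES = new Set(["Button", "MenuItem", "ListItem"]);

interface Control { controlType: string; name: string; automationId?: string; patterns: string[] }
interface Candidate { control: Control; verbs: string[]; destructive: boolean }

function buildFrontier(controls: Control[]): Candidate[] {
  const candidates: Candidate[] = controls
    .map((control) => ({
      control,
      verbs: control.patterns.flatMap((p) => PATTERN_VERBS[p] ?? []),
      destructive: DESTRUCTIVE.test(control.name),
    }))
    .filter((c) => c.verbs.length > 0); // not actionable -> not in the frontier
  // Lower rank sorts first: destructive dominates, then non-preferred
  // control types, then unstable identifiers (no AutomationId).
  const rank = (c: Candidate) =>
    (c.destructive ? 4 : 0) +
    (PREFERRED_TYPES.has(c.control.controlType) ? 0 : 2) +
    (c.control.automationId ? 0 : 1);
  // Array.prototype.sort is stable, so ties keep tree order.
  return candidates.sort((a, b) => rank(a) - rank(b));
}
```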

Stub oracle picks the first non-destructive non-window-management
frontier item; smoke shows 6 iterations, 2 distinct states, 5
successful transitions persisted with correct fromStateId→toStateId
dedup on revisits.

Slice 6b will swap StubOracle for a typechat-backed LLM oracle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LlmOracle implements DecisionOracle. ExploreDecision schema (act/stop/
restore) lives at exploreLlmSchema.ts and is loaded as text by
TypeChat's TypeScript-JSON validator, the same pattern as
discoveryLlmSchema. Postbuild copies it into dist alongside the
discovery schema. lib/llm.ts gets a getExploreModel() factory tagged
"onboarding:explore".

Prompt template includes goal, frontier (rendered as a numbered list
with controlType/name/automationId/verbs), recent transitions tail,
visited-state ids, and remaining budget. On translation failure the
oracle counts consecutive failures and either falls back to the first
non-destructive frontier item (single retry) or stops.
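One reading of that failure policy, sketched in TypeScript: the first consecutive failure yields a single fallback action, and a second consecutive failure (or no safe item) stops. The `Result` shape mirrors TypeChat's success/failure result; `FallbackOracle` and the exact retry semantics are assumptions.

```typescript
interface Item { selector: string; verb: string; destructive: boolean }
type Decision = { kind: "act"; item: Item } | { kind: "stop"; reason: string };
type Result<T> = { success: true; data: T } | { success: false; message: string };
type Translate = (prompt: string) => Promise<Result<Decision>>;

class FallbackOracle {
  private consecutiveFailures = 0;
  private readonly translate: Translate;

  constructor(translate: Translate) {
    this.translate = translate;
  }

  async decide(prompt: string, frontier: Item[]): Promise<Decision> {
    const result = await this.translate(prompt);
    if (result.success) {
      this.consecutiveFailures = 0; // any success resets the counter
      return result.data;
    }
    this.consecutiveFailures++;
    const fallback = frontier.find((i) => !i.destructive);
    // Assumed policy: one fallback per failure streak, then stop.
    if (this.consecutiveFailures === 1 && fallback !== undefined) {
      return { kind: "act", item: fallback };
    }
    return { kind: "stop", reason: `translation failed: ${result.message}` };
  }
}
```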

Smoke against Windows Clock with budget(maxIterations=8): the model
systematically navigates Focus sessions → Timer → Alarm → Stopwatch →
World clock, discovering 11 distinct states across 8 successful
transitions (no failures). State dedup confirmed via revisit on iter 8
(state-004 shows up again with the same fingerprint).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three TypeChat schemas in synthesisLlmSchema.ts: NeutralStatesClassification
(per-state isNeutral + tabOrSection label), ClusteringResult (group chunks
by user-intent), and SynthesizedAction (final action with playback
recipe, parameters, preconditions). Postbuild copies the schema into dist.

synthesizer.ts pipelines:
  1. classifyNeutralStates — one LLM call covering all states, summarized
     by their actionable controls + window title
  2. chunkTransitions — deterministic; cuts the transition log at neutral
     boundaries, trailing-non-neutral chunks flagged isNeutralEnd=false
  3. clusterChunks — one LLM call, intent-naming in camelCase verb-noun
  4. synthesizeOneCluster — one LLM call per cluster, builds full
     PlaybackStep[] with valueRef/valueLiteral params extracted from
     chunk variations
  5. writeOutput — discoveredActions.json (matches schema phraseGen
     consumes) + synthesisReport.md for the human approval gate
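Step 2 is deterministic, so it can be sketched directly. The `Transition` and `Chunk` shapes are hypothetical simplifications of the synthesizer's types.

```typescript
interface Transition { from: string; to: string; step: string }
interface Chunk { transitions: Transition[]; isNeutralEnd: boolean }

// Cut the transition log each time it lands on a neutral state. Non-neutral
// intermediate states keep accumulating into the same chunk; whatever is
// left at the end is flagged isNeutralEnd=false.
function chunkTransitions(log: Transition[], isNeutral: (stateId: string) => boolean): Chunk[] {
  const chunks: Chunk[] = [];
  let current: Transition[] = [];
  for (const t of log) {
    current.push(t);
    if (isNeutral(t.to)) {
      chunks.push({ transitions: current, isNeutralEnd: true });
      current = [];
    }
  }
  if (current.length > 0) chunks.push({ transitions: current, isNeutralEnd: false });
  return chunks;
}
```

For example, an open → fill → save sequence that starts and ends on a tab landing area becomes one three-step chunk, which is the unit the clustering step groups by intent.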

End-to-end smoke: explore Clock (8 iterations, 11 states, 8 transitions)
→ synthesize (3 actions, all "navigateToTab" by destination, with full
selector paths in playback). discoveredActions.json contains valid,
replay-ready recipes.

Quality finding: clustering didn't merge functionally identical chunks
into one parameterized action. Design called for
`navigateToTab(tab: "Focus sessions" | "Timer" | "Alarm" | ...)`, got
three separate `navigateToTab` actions split by destination instead.
The clustering prompt needs stronger emphasis on parameterization-via-
variation (or a two-pass merge step). The mechanism is correct; output
quality has room for improvement.

Other findings: neutral classification produced sensible per-state
labels (focusTab.setup / timerTab.empty / alarmTab.empty / etc.) just
from actionable-control summaries; chunks merge correctly when
intermediate states are non-neutral (2-step playbacks emerged where
exploration crossed two neutrals).

This closes the seven-slice arc: helper → verbs → snapshot →
calibration → record → autonomous loop → synthesis. Branch is now a
working end-to-end UIA-based onboarding pipeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Built an end-to-end demo proving the design works on Windows Clock:

clockCrawl.ts: full crawl with snapshot/restore safety net.
  - inferSnapshotPolicy auto-detects the 3 UWP folders (LocalState +
    Settings + RoamingState under Microsoft.WindowsAlarms_*)
  - captures baseline (82KB)
  - drives Clock for 25 iterations against a task-oriented goal
    (create alarm, create timer, exercise stopwatch...)
  - synthesis produces 4 actions with parameters and playback recipes
  - restores baseline at the end

playbackExecutor.ts: generic SynthesizedAction → executed steps.
  - resolves valueRef/${param} substitution against a runtime params map
  - dispatches each step's verb to the corresponding do.* RPC
  - waits for UIA idle after invoke/select by default

clockAgentDemo.ts: replays a crawled action with NEW parameters.
  - loads discoveredActions.json from the most recent run
  - restores Clock to the baseline snapshot (known starting point)
  - runs createAlarm({alarmName: "Crawled Demo Alarm", hour: 8, minute: 15})
  - verifies a new AlarmViewGrid DataItem named "Edit alarm, Crawled
    Demo Alarm, 8:15AM, Only once, " appears in the tree
  - restores baseline again to leave Clock as we found it
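The executor's parameter substitution and verb dispatch can be sketched as follows. The step fields, the RPC parameter names (`selector`, `value`), the empty `events.idle` params, and `executePlaybackSketch` itself are assumptions, not the PR's playbackExecutor API.

```typescript
// Hypothetical step shape: valueRef names a runtime parameter; valueLiteral
// is a fixed string that may embed ${param} placeholders.
interface PlaybackStep {
  verb: "invoke" | "setValue" | "select" | "sendKeys";
  selector: string;
  valueRef?: string;
  valueLiteral?: string;
}
type Params = Record<string, string | number | boolean>;
interface RpcClient {
  call(method: string, params: object): Promise<unknown>;
}

function lookup(params: Params, name: string): string {
  if (!(name in params)) throw new Error(`missing parameter: ${name}`);
  return String(params[name]);
}

function resolveValue(step: PlaybackStep, params: Params): string | undefined {
  if (step.valueRef !== undefined) return lookup(params, step.valueRef);
  return step.valueLiteral?.replace(/\$\{(\w+)\}/g, (_m: string, name: string) => lookup(params, name));
}

// Run steps in order, dispatching verb -> do.<verb>, and wait for UIA idle
// after invoke/select (the default described above).
async function executePlaybackSketch(client: RpcClient, steps: PlaybackStep[], params: Params): Promise<void> {
  for (const step of steps) {
    const value = resolveValue(step, params);
    await client.call(`do.${step.verb}`, { selector: step.selector, value });
    if (step.verb === "invoke" || step.verb === "select") {
      await client.call("events.idle", {});
    }
  }
}
```

Replaying with new parameters, as in the createAlarm demo, then only means passing a different `params` map; the recipe itself is unchanged.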

Result: the LLM successfully crawled Clock, the synthesizer extracted
correct UIA paths through Popup → EditFlyout → ContentScrollViewer →
DurationPicker → HourPicker, and the executor replayed those exact
paths to create a brand new alarm with new parameter values not seen
during the crawl. This validates the entire design end to end:

  helper → exploration → state graph → synthesis → discoveredActions.json
  → playback executor → real Windows alarm

Quality findings still standing:
- Clustering didn't merge tab-switching chunks into one parameterized
  navigateToTab; got 3 by-destination clusters + a mislabeled "12-step
  navigateToTab" that's actually the create-alarm-then-create-timer
  full flow.
- Verification predicate in the demo had to use DataItem with
  AutomationId="AlarmViewGrid", not ListItem with Toggle — alarm rows
  don't render the way I first guessed. (Synthesis could be enhanced to
  emit better postcondition assertions.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…erge

The synthesis pipeline produces dramatically better-shaped actions when
the underlying model is a reasoning model and the prompts encode
structural rules explicitly:

- getSynthesisModel() defaults to GPT_5. Same for the explore oracle.
  Note: aiclient's getEnvSetting short-circuits on its empty-string
  default, so endpoint-suffixed timeout vars must be set explicitly
  (AZURE_OPENAI_MAX_TIMEOUT_GPT_5) — smoke tests handle this in their
  preamble.

- Tightened prompts in synthesizer.ts:
  * NEUTRAL_RULES — modal/popup/flyout/wizard is NEVER neutral; "Save"-
    bearing controls are a hard signal; tab landing areas ARE neutral.
  * CLUSTERING_RULES — aggressively merge open→fill→save flows into one
    cluster; parameterize by variation; toggle-aware (split start/pause
    despite shared selector); don't emit fragments; aim for few clusters.
  * SYNTHESIS_RULES — use the LONGEST chunk as canonical playback (don't
    take intersection); declare parameters even from one chunk if value
    is clearly user-supplied; toggle-aware (one click per logical
    action, not the repeated count).

- New validation pass (synthesisLlmSchema.ts ValidationResult): GPT-5
  reads the full synthesized set and emits per-action verdicts (ok /
  fragment / duplicate / broken / ambiguous) plus MergeRecommendations
  for duplicates that should be one parameterized action. Recommendations
  are applied automatically: target actions are removed, a single
  parameterized action replaces them.

- mergeIntoWorkspace appends/updates a workspace-level
  discoveredActions.json so successive crawls accumulate (rather than
  each run overwriting the canonical set). Per-action merge: longer
  playback wins, parameter examples union, destructive flags union.
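The per-action merge rules can be sketched as below, with a hypothetical `Action` shape. Keeping the existing action on equal playback length is an assumption the PR does not specify.

```typescript
interface ParamSpec { name: string; type: string; examples: unknown[] }
interface Action { actionName: string; playback: unknown[]; parameters: ParamSpec[]; destructive: boolean }

// Longer playback wins (ties keep the existing action), examples are unioned
// per parameter name, and destructive flags are OR-ed.
function mergeAction(existing: Action, incoming: Action): Action {
  const winner = incoming.playback.length > existing.playback.length ? incoming : existing;
  const examplesFor = (name: string): unknown[] =>
    Array.from(new Set(
      [...existing.parameters, ...incoming.parameters]
        .filter((p) => p.name === name)
        .flatMap((p) => p.examples),
    ));
  return {
    ...winner,
    parameters: winner.parameters.map((p) => ({ ...p, examples: examplesFor(p.name) })),
    destructive: existing.destructive || incoming.destructive,
  };
}

// Accumulate a run's actions into the workspace set, keyed by actionName.
function mergeIntoWorkspace(workspace: Action[], run: Action[]): Action[] {
  const byName = new Map(workspace.map((a) => [a.actionName, a] as const));
  for (const a of run) {
    const prev = byName.get(a.actionName);
    byName.set(a.actionName, prev ? mergeAction(prev, a) : a);
  }
  return Array.from(byName.values());
}
```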

Real-data result on the existing Windows Clock 54-iteration run, before
vs after this change:

  before (default model + loose prompts): 12 actions, mostly fragments:
    confirmAlarm/Clock/Timer (1-step fragments), a 1-step createAlarm,
    setAlarmDetails/Time fragments, and a startStopwatch that merged 9
    alternating clicks into one playback

  after (GPT-5 + tight prompts + validation):
    addWorldClock(city: string) — 3 steps
    createAlarm(name: string, minutes: number) — 4 steps
    createTimer() — 2 steps
    recordLap() — 1 step
    setStopwatchRunning(running: boolean) — auto-merged from start+pause
    startFocusSession() — 1 step

Also: clockFullCrawl.ts (popup-aware big-budget crawl) and
resynthesize.ts (re-runs synthesis on an existing runDir without re-
exploring — invaluable for iterating on synthesis prompts cheaply).

uiCapture/README.md documents the full pipeline: helper RPCs, explorer
loop, synthesis stages, on-disk layout, smoke tests, observed quality
patterns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two reconnaissance modes that catalog what an app supports BEFORE the
crawl, so the explore loop can drive specific actions instead of guessing.

reconLlmSchema.ts + tabReconnaissance.ts — simple per-tab variant. Walks
the NavView tabs (heuristic: largest cluster of sibling ListItems with
SelectionItem pattern), navigates to each, sends screenshot + filtered
control tree to a vision LLM, gets back a TabRecon with expectedActions.
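The tab-finding heuristic can be sketched as a tree walk. `TreeNode` is a hypothetical simplification of the dumped tree.

```typescript
interface TreeNode { controlType: string; name?: string; patterns: string[]; children: TreeNode[] }

// Return the largest set of sibling ListItems that support SelectionItem,
// searching the whole tree; an empty array means no tab strip was found.
function findTabCandidates(root: TreeNode): TreeNode[] {
  let best: TreeNode[] = [];
  const visit = (node: TreeNode): void => {
    const tabs = node.children.filter(
      (c) => c.controlType === "ListItem" && c.patterns.includes("SelectionItem"),
    );
    if (tabs.length > best.length) best = tabs;
    node.children.forEach(visit);
  };
  visit(root);
  return best;
}
```

"Largest cluster" matters because content lists (for example, alarm rows) can also be selectable ListItems; the navigation strip usually wins on count, though that is a heuristic rather than a guarantee.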

iterativeReconLlmSchema.ts + iterativeReconnaissance.ts — multi-turn
loop. Per turn: screenshot + tree + already-discovered list go to the
vision LLM, which returns newDiscoveries plus a click/back/done
decision. Drills INTO modals/dialogs to enumerate their fields, then
clicks Cancel to back out. Vastly richer than the per-tab variant
because it sees what's BEHIND the buttons, not just on the surface.

getReconModel() defaults to GPT_v (the dedicated vision deployment in
this Azure config). GPT-5 deployments here don't accept image_url
content type ("API version not supported"); GPT-4o uses a /openai/v1/
URL shape that aiclient's request builder doesn't construct correctly.
GPT-v on /openai/deployments/gpt-v/chat/completions just works.

TypeChat wiring fix: image content goes in promptHistory as a prior
user message, NOT substituted for the createRequestPrompt result. That
way TypeChat's standard schema-instruction wrapper still gets appended,
and the model knows to respond in JSON.

Smoke result on Clock (clockIterativeRecon.ts, 20 turns):
  → 34 discovered actions across 5 tabs
  → screen path: Timer → Add timer dialog → Timer → Alarm → Add alarm
    dialog → Alarm → Stopwatch → World clock → Add location → Focus
    sessions → ...
  → caught secondary features explore alone wouldn't (keepTimerOnTop,
    linkSpotify, repeatAlarm with days enum, setAlarmSound enum, etc.)
  → correctly flagged resetStopwatch as destructive
  → properly typed parameters with plausible examples (hour=7,
    cityName='New York', period='AM')

Some actions are over-decomposed (nameAlarm / setAlarmTime / saveAlarm
emitted as separate intents instead of fields of createAlarm). Expect
the synthesis pass to roll these up when it sees the actual chunks.

Also adds clockReconCrawl.ts (full pipeline: simple recon → goal-from-
recon → crawl → synthesize) and clockIterativeRecon.ts (recon-only
smoke for fast feedback).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a "Status" header at the top with what's working and what's left
(by priority: TypeAgent integration → synthesis prompts on richer
input → selector decay → focused crawl tooling → multi-id selector
fallback). Documents the reconnaissance subsystem (both per-tab and
iterative variants, vision wiring, model selection).

Updates pipeline diagram to show the optional recon phase feeding the
explore loop's goal. Adds new findings to the quality observations:
GPT-5 for synthesis, GPT-v for vision, aiclient gotchas (URL shape
+ endpoint-suffixed env-var fallback bug). Updates "adding a new
integration" to make iterativeReconnoiter the recommended starting
step.

Also updates clockReconCrawl.ts to use iterativeReconnoiter (the
richer recon variant) instead of the simpler per-tab survey.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Generates a TypeAgent agent package from a workspace's discoveredActions.json:
  packages/agents/<name>/
    package.json                  ← exports agent/manifest + agent/handlers,
                                    deps on agent-sdk + onboarding-agent
    tsconfig.json + src/tsconfig.json
    src/<name>Schema.ts           ← typed action union + per-action types
                                    with parameters mapped from ParamSpec
                                    (string/number/boolean/enum literals)
    src/<name>Manifest.json       ← schemaType.action = "<Cap>Action"
    src/<name>ActionHandler.ts    ← AppAgent.executeAction wires through
                                    HelperClient + executePlayback,
                                    auto-launches the app on first call
    data/discoveredActions.json   ← copied alongside for runtime loading
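Schema emission from ParamSpec can be sketched as below. The `ParamSpec`/`ActionSpec` shapes and function names are hypothetical. The output follows the action-schema-compiler constraints this PR reports: single-line `//` comments, `{}` for zero-parameter actions, and no comment above the entry union.

```typescript
type ParamSpec =
  | { name: string; type: "string" | "number" | "boolean" }
  | { name: string; type: "enum"; values: string[] };
interface ActionSpec { actionName: string; description: string; parameters: ParamSpec[] }

const cap = (s: string): string => s.charAt(0).toUpperCase() + s.slice(1);

// Enums become string-literal unions; primitives map to themselves.
function paramType(p: ParamSpec): string {
  return p.type === "enum" ? p.values.map((v) => JSON.stringify(v)).join(" | ") : p.type;
}

function emitActionType(a: ActionSpec): string {
  const params = a.parameters.length === 0
    ? "{}"
    : `{\n${a.parameters.map((p) => `        ${p.name}: ${paramType(p)};`).join("\n")}\n    }`;
  return [
    `// ${a.description}`,
    `export type ${cap(a.actionName)}Action = {`,
    `    actionName: "${a.actionName}";`,
    `    parameters: ${params};`,
    `};`,
  ].join("\n");
}

// The entry union comes first and carries no comment.
function emitSchema(agentName: string, actions: ActionSpec[]): string {
  const union = `export type ${cap(agentName)}Action =\n` +
    actions.map((a) => `    | ${cap(a.actionName)}Action`).join("\n") + ";";
  return [union, ...actions.map(emitActionType)].join("\n\n") + "\n";
}
```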

Public exports surface from onboarding-agent: a new "./uiCapture"
subpath entry point in package.json + dist/uiCapture/index.ts that
re-exports HelperClient, executePlayback, and the relevant types so
generated agents can import { ... } from "onboarding-agent/uiCapture"
without depending on internal paths.

Generated handler manages a per-session AgentState with the helper
client + tracked app pid + main window selector. ensureClient lazily
spawns the helper; ensureAppRunning launches the target AUMID/exePath
(taken from the scaffolder's appLaunch option) on first action and
re-launches if the prior pid has exited. Each action looks up the
SynthesizedAction by actionName and runs executePlayback with the
caller's parameters.

Manifest currently omits grammarFile — the dispatcher falls back to
LLM-based translation against the .pas.json schema. Hand-tuned grammar
or phraseGen-emitted grammar can be added later.

scaffoldClockAgent.ts is the one-shot CLI wrapper for Windows Clock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tcher

End-to-end pipeline now works through TypeAgent: 'start a timer in clock'
→ NL→typed action via TypeChat → playback recipe → real Clock UI. Verified
on Windows Clock with two distinct actions:

  $ node packages/cli/bin/run.js run request "start a timer in clock"
  [⏰ windowsClock] Translating 'start a timer in clock' into action 'startTimer'
  [⏰ windowsClock] Executing action windowsClock.startTimer
  Done: startTimer (3 steps)

  $ node packages/cli/bin/run.js run request "show me the timer tab in clock"
  [⏰ windowsClock] Executing action windowsClock.navigateToTimerTab
  Done: navigateToTimerTab (1 steps)

Changes to land the integration:

- Generated windowsClock-agent package via scaffoldUiAgent (4 actions
  from the latest crawl: navigateToTimerTab, renameTimer, startTimer,
  setTimerViewMode). Schema, manifest, action handler, package.json,
  tsconfigs all auto-generated from data/discoveredActions.json.

- Registered windowsClock in defaultAgentProvider/data/config.json and
  added windowsClock-agent as a workspace dependency in its
  package.json. Dispatcher loads it the same way as built-in agents.

- Three scaffolder fixes uncovered while integrating:
  * /** ... */ blocks rejected by action-schema-compiler — switched to
    single-line // comments per action description.
  * Record<string, never> not supported for parameter types — switched
    to {} for zero-parameter actions.
  * Comments above the entry-type union are rejected — moved
    auto-generation note out of the way.

- New scaffolder option: appTitleMatch. Each action handler probes
  app.list for an existing window matching the title before launching;
  UWP apps can't be launched twice and FlaUI returns "no main window"
  when they are. Without this fix, the second NL request in a session
  failed; with it, the handler attaches to the running Clock instance.
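Combining the lazy launch described earlier with the appTitleMatch probe, a handler-side helper might look like this. The `Helper` interface (wrapping app.list / app.launch), its field names, and treating appTitleMatch as a `RegExp` are assumptions.

```typescript
// Hypothetical subset of the helper's RPC surface; field names are assumptions.
interface WindowInfo { pid: number; title: string; selector: string }
interface Helper {
  listApps(): Promise<WindowInfo[]>;          // wraps app.list
  launch(aumid: string): Promise<WindowInfo>; // wraps app.launch
}
interface AgentState { pid?: number; mainWindow?: string }

// Reuse the tracked process if alive; else attach to a window whose title
// matches; launch only as a last resort, since a UWP app can't be launched twice.
// (Avoid a /g regex here: RegExp.test with /g is stateful.)
async function ensureAppRunning(
  helper: Helper, state: AgentState, aumid: string, appTitleMatch: RegExp,
): Promise<string> {
  const windows = await helper.listApps();
  if (state.pid !== undefined && state.mainWindow !== undefined &&
      windows.some((w) => w.pid === state.pid)) {
    return state.mainWindow;
  }
  const existing = windows.find((w) => appTitleMatch.test(w.title));
  const target = existing ?? (await helper.launch(aumid));
  state.pid = target.pid;
  state.mainWindow = target.selector;
  return target.selector;
}
```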

Quality issue still standing: the explore phase only covered the Timer
tab (28 iterations before hitting the 15-minute wall-clock cap; the LLM
oracle drilled deep into Timer instead of moving on). The crawl produced just 4 actions
instead of the ~10-15 implied by reconnaissance's 35 candidates. Fix is
either focused per-tab re-crawls (merge logic already in place) or a
prompt tweak to make the oracle move on after exhausting a tab.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-tab focused crawls for Alarm, Stopwatch, World Clock, Focus sessions
(Timer was already covered) — snapshot once at start, restore between
tabs, run an explore loop with a tightly-scoped goal naming the
specific tasks for that tab, synthesize and merge into the workspace's
discoveredActions.json. Final restore at the end.

Per-tab budget: 10-18 iterations / 4-5 min wall clock. Each tab's goal
includes a short list of concrete tasks (open dialog, fill fields,
click commit) and explicitly references the relevant AutomationIds and
container patterns. The LLM oracle was very efficient: every tab
stopped early via "Goal completed" except Focus (hit max-iterations
cap, all 12 transitions still successful).

Crawl-by-crawl results:

  alarm     → 9 iter, 8/8 successful, +3 actions  → total 7
  stopwatch → 7 iter, 3/6 successful, +3 actions  → total 10
              (3 fails on the dynamic-name parent Group selector
              decay we already documented)
  worldclock→ 5 iter, 4/4 successful, +2 actions  → total 12
  focus     → 12 iter, 12/12 successful, +2 actions → total 14

Final 14 actions:

  navigate{Alarm,Stopwatch,Timer,WorldClock,Focus}Tab — 5 tabs, 1 step each
  createAlarm(alarmName, hour, minute) — 5-step flow
  addWorldClock(cityQuery, suggestionItem) — 3-step flow
  setAlarmEnabled(enabled: boolean) — auto-merged from enable/disable
  setStopwatchRunning(running: boolean) — auto-merged from start/pause
  setFocusSessionRunning(running: boolean) — auto-merged from start/pause
  setTimerViewMode(mode: enum) — auto-merged from expand/restore
  recordLap()
  startTimer()
  renameTimer(name: string)

Verified end-to-end through TypeAgent:
  $ run request "in windows clock app, navigate to alarm tab"
    → Done: navigateToAlarmTab (1 steps)
  $ run request "in windows clock, set an alarm for 8:30 named morning"
    → Done: createAlarm (5 steps)
  → real "morning" alarm at 8:30AM appeared in Clock's tree

Quality issues still standing:
- Boolean parameter examples got nonsense values like 'stopwatch' from
  the merge-recommendation collectExamples heuristic (it derives from
  action-name suffix). Recipes still work since the boolean is unused
  in the playback (the merged action still just toggles), but the
  schema example values are misleading. Fix: pass true/false through
  proper merge-aware example synthesis.
- createAlarm assumes the Alarm tab is already active. Synthesis
  correctly extracted the alarm-creation flow but discarded the
  navigation prefix (it became its own navigateToAlarmTab action).
  Multi-step user requests through TypeAgent need to either chain
  navigate-then-create, or the runtime handler needs a precondition
  step (e.g., auto-call navigateToTab matching the action's
  tabOrSection). Current workflow: user says navigate first, then act.
- Dispatcher's construction cache is aggressive: phrases like "create
  an alarm" hit the onboarding-agent's scaffoldAgent action, "go to
  X" hits excel.navigateToCell, "switch to" hits player.selectDevice.
  Workaround: include "windows clock" in the request to disambiguate.
  Real fix: clear the cache or extend the windowsClock agent's NL
  patterns explicitly.
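The proposed fix for the boolean-example issue (merge-aware example synthesis) could take a type-first form like the sketch below. `collectExamples` here is a hypothetical signature, and `heuristic` stands in for the existing action-name-suffix fallback.

```typescript
type ParamType = "string" | "number" | "boolean" | "enum";

// Type-first example synthesis for merged actions. The name-suffix
// heuristic is only consulted for non-boolean parameters.
function collectExamples(type: ParamType, heuristic: () => unknown[]): unknown[] {
  // A boolean has exactly two meaningful examples; never derive them from names.
  if (type === "boolean") return [true, false];
  // Otherwise keep heuristic values, but only those matching the declared type.
  const wanted = type === "number" ? "number" : "string";
  return heuristic().filter((v) => typeof v === wanted);
}
```

With this ordering, a merged setStopwatchRunning(running: boolean) would get `[true, false]` instead of a suffix like `'stopwatch'`.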

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates the Status section (now 2026-05-05) to reflect TypeAgent
integration shipping end-to-end and the multi-tab crawl producing
14 actions across all 5 Clock tabs verified through the dispatcher.

Adds 3 newly-surfaced issues to the prioritized backlog:

- createAlarm assumes the right tab is already active (synthesis dropped
  the navigate prefix into its own action; needs runtime handler to
  auto-navigate or synthesizer to inject the prefix).
- Auto-merged toggle actions get nonsense boolean examples like
  'stopwatch' instead of [true, false] — applyMergeRecommendations'
  collectExamples fallback uses action-name suffix.
- Dispatcher construction cache misroutes common phrasings: "create an
  alarm" hits onboarding's scaffoldAgent, "go to X" hits excel,
  "switch to" hits player. Workaround: include "windows clock" in the
  request. Real fix: clear cache or write explicit .agr.

Plus an item for helper binary bundling once a UI-Automation agent
needs to ship outside the dev repo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
robgruen marked this pull request as ready for review May 5, 2026 18:45
…sualStudio agents

Co-authored-by: robgruen <25374553+robgruen@users.noreply.github.com>
Contributor

Copilot AI commented May 5, 2026

@copilot resolve the merge conflicts in this pull request

Resolved the merge conflicts in commit 750df170. The two conflicts were:

  1. ts/packages/defaultAgentProvider/data/config.json — kept both the windowsClock and visualStudio agent entries.
  2. ts/pnpm-lock.yaml — took the origin/main version and regenerated it to include the windowsClock-agent workspace dependency.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • www.electronjs.org
    • Triggering command: /usr/local/bin/node /usr/local/bin/node /home/REDACTED/work/TypeAgent/TypeAgent/ts/node_modules/.pnpm/@electron+rebuild@4.0.3/node_modules/@electron/rebuild/lib/module-type/node-gyp/worker.js (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

… remove invisible unicode

Agent-Logs-Url: https://github.com/microsoft/TypeAgent/sessions/49ceea26-2edf-4b12-8dcf-5ab4e7f5f22c

Co-authored-by: robgruen <25374553+robgruen@users.noreply.github.com>
Contributor

Copilot AI commented May 5, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback.

Contributor

Copilot AI commented May 5, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback.

Contributor

Copilot AI commented May 6, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback.

Copilot AI had a problem deploying to development-fork May 6, 2026 00:58 Failure
Copilot AI temporarily deployed to development-fork May 6, 2026 00:58 Inactive

2 participants