From b8e6550c25a69e2bcd4b4af227c50df8546c1578 Mon Sep 17 00:00:00 2001
From: Echo Xiao
Date: Mon, 8 Jun 2026 11:31:02 -0700
Subject: [PATCH 1/4] Iter 7: Architecture extraction, tool behavior refactor, Claude judge eval
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Major changes:
- Extract architecture knowledge from AGENTS.md to architecture.json (30 entries, source-verified)
- AGENTS.md stripped to pure rules (tool order, answer format, navigation strategy)
- implement: class skeleton mode (10K+ → ~500 tokens), ClassName.methodName support
- implement: enforce search/graph before implement (session tracking)
- search/graph/implement: navigation hints in responses
- graph: architecture context injection from architecture.json
- grep: sorted by relevance, limited to top 10
- Callee skeletons removed (graph(down) replaces at 1/10th cost)

Bug fixes:
- retriever.ts: callee skeleton bug (objects treated as strings)
- Remove 5 unused deps, unused exports, stale params

Results:
- L1: 25/34 (unchanged, no artificial inflation)
- L2 tokens: 69K → 29K avg/question (-58%)
- L2 implement: 3,070 → 544 avg tokens (-82%)
- Claude judge: 6 GOOD, 16 ACCEPTABLE, 9 WEAK, 3 WRONG (65% usable)

Eval renames: tool-eval → layer1-tool-eval, agent-eval → layer2-agent-eval
New: compare.ts generates 3-way comparison report, comparison-report.md

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 AGENTS.md | 199 +-
 README.md | 23 +-
 docs/eval-iterations.md | 117 +
 logs/agent-eval.md | 2971 --------------
 logs/comparison-report.md | 3465 +++++++++++++++++
 .../claude-01-push-notifications.md | 54 +-
 .../claude-02-msg-permissions.md | 34 +-
 logs/gemini-answers/claude-03-file-upload.md | 36 +-
 .../claude-04-e2e-encryption.md | 122 +-
 logs/gemini-answers/claude-05-call-chain.md | 66 +-
 .../claude-06-livechat-routing.md | 70 +-
 .../gemini-answers/claude-07-api-endpoints.md | 47 +-
 logs/gemini-answers/claude-08-federation.md | 42 +-
 .../new-09-realtime-streamer.md | 48 +-
 logs/gemini-answers/new-10-apps-engine.md | 48 +-
 logs/gemini-answers/new-11-settings.md | 67 +-
 logs/gemini-answers/new-12-ldap-auth.md | 93 +-
 logs/gemini-answers/new-13-room-creation.md | 86 +-
 logs/gemini-answers/new-14-ee-license.md | 58 +-
 .../gemini-answers/new-15-impact-aftersave.md | 40 +-
 logs/gemini-answers/new-16-impact-streamer.md | 138 +-
 logs/gemini-answers/new-17-slash-commands.md | 78 +-
 logs/gemini-answers/new-18-webhook.md | 67 +-
 .../new-19-message-rendering.md | 52 +-
 logs/gemini-answers/new-20-proxify.md | 54 +-
 logs/gemini-answers/new-21-impact-settings.md | 35 +-
 logs/gemini-answers/new-22-2fa.md | 103 +-
 logs/gemini-answers/new-23-omnichannel.md | 67 +-
 logs/gemini-answers/new-24-autotranslate.md | 44 +-
 logs/gemini-answers/new-25-search.md | 74 +-
 logs/gemini-answers/new-26-team.md | 116 +-
 .../gemini-answers/new-27-video-conference.md | 73 +-
 logs/gemini-answers/tour-04-msg-client.md | 40 +-
 logs/gemini-answers/tour-05-msg-server.md | 87 +-
 logs/gemini-answers/tour-06-endpoint.md | 75 +-
 .../gemini-answers/tour-07-db-model-create.md | 132 +-
 logs/gemini-answers/tour-08-db-model-use.md | 55 +-
 logs/gemini-answers/tour-10-new-service.md | 108 +-
 logs/gemini-answers/tour-11-new-package.md | 97 +-
 ...seline-eval.md => layer0-baseline-eval.md} | 0
 logs/{tool-eval.md => layer1-tool-eval.md} | 2 +-
 logs/layer2-agent-eval.md | 2873 ++++++++++++++
 package-lock.json | 326 +-
 package.json | 14 +-
 src/architecture.json | 122 +
 src/config.ts | 1 -
 src/eval/compare.ts | 215 +
 ...seline-eval.ts => layer0-baseline-eval.ts} | 2 +-
 .../{tool-eval.ts => layer1-tool-eval.ts} | 2 +-
 .../{agent-eval.ts => layer2-agent-eval.ts} | 10 +-
 src/indexer/local-db.ts | 7 -
 src/server/registry.ts | 141 +-
 src/server/retriever.ts | 96 +-
 53 files changed, 8161 insertions(+), 4831 deletions(-)
 delete mode 100644 logs/agent-eval.md
 create mode 100644 logs/comparison-report.md
 rename logs/{baseline-eval.md => layer0-baseline-eval.md} (100%)
 rename logs/{tool-eval.md => layer1-tool-eval.md} (99%)
 create mode 100644 logs/layer2-agent-eval.md
 create mode 100644 src/architecture.json
 create mode 100644 src/eval/compare.ts
 rename src/eval/{baseline-eval.ts => layer0-baseline-eval.ts} (98%)
 rename src/eval/{tool-eval.ts => layer1-tool-eval.ts} (99%)
 rename src/eval/{agent-eval.ts => layer2-agent-eval.ts} (98%)

diff --git a/AGENTS.md b/AGENTS.md
index 8d79cb4..c176dab 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -2,32 +2,38 @@

 ## Answer Rules

-1. **Always include specific file paths** in your answer (e.g., `apps/meteor/app/lib/server/functions/sendMessage.ts`). Every key file in the chain must be listed with its role.
-2. **Start from the entry point**, not the middle. For architecture questions, trace the full chain from the top-level entry to the final destination.
-3. **Keep tool calls efficient.** Use `search` → `graph` first, then `implement` only at key points (max 2-3 calls). Do NOT keep searching if you already have enough information — write your answer.
+1. **ALWAYS call at least one tool.** Never answer from memory alone — your training data has outdated file paths. Use tools to get real paths.
+2. **Always include specific file paths** in your answer (e.g., `apps/meteor/app/lib/server/functions/sendMessage.ts`). Every key file in the chain must be listed with its role.
+3. **Start from the entry point**, not the middle. For architecture questions, trace the full chain from the top-level entry to the final destination.
 4. **List the call chain explicitly** in your answer: `Entry → Step 1 → Step 2 → ... → Final`.
+5. **Follow the tool order: search → graph → implement.** You MUST call `search` or `graph` before `implement`. The system enforces this — `implement` will be rejected if you haven't searched first.

 ## Tools

 Three tools only. All other file/shell tools are disabled.

-| Tool | When to use |
-|------|-------------|
-| `search(query, layer?, question?)` | Find entry point by symbol or keyword |
-| `graph(symbol, direction, depth?, edgeTypes?, question?)` | Traverse dependency edges from a known symbol |
-| `implement(symbol, filename)` | Read full source of one specific symbol — use when you need implementation details |
+| Tool | When to use | Cost |
+|------|-------------|------|
+| `search(query, layer?)` | Find entry point by symbol or keyword | Cheap (~200 tokens) |
+| `graph(query, direction?, depth?, layer?, mode?, edgeTypes?)` | Traverse dependency edges from a known symbol | Cheap (~300 tokens) |
+| `implement(symbolName, filename)` | Read source of a specific symbol. For classes: returns method signatures — use `implement("Class.method", file)` to read a specific method. | Expensive (1K-5K tokens) |

-**`implement` is expensive (returns full source code). Use it at layer boundaries or to confirm key details (max 2-3 calls). Prefer `search` + `graph` for navigation — they return file paths and symbols without consuming excessive tokens. When you have enough information, stop calling tools and write your answer.**
+**Strategy: use `search` + `graph` to map the territory first (cheap), then `implement` only at 1-2 key points (expensive). `graph(down)` already shows what a function calls — you don't need `implement` just to see the call chain.**

 ---

 ## Navigation Rules

-**Default flow for any architectural question:**
+**Mandatory flow — follow this order every time:**

 ```
-search → graph(down) → implement only at boundaries
+Step 1: search(entry_symbol) → find files + symbols
+Step 2: graph(symbol, "down") → map the call chain (cheap, gives you the full picture)
+Step 3: implement(symbol, file) → read source ONLY at 1-2 key points
+Step 4: STOP and write your answer → include all file paths from steps 1-3
 ```

+**Do NOT skip to implement.** `graph(down)` gives you the same call chain information for 1/10th the token cost. Use `implement` only when you need to see the actual logic inside a function.
+
 **Pick direction:**
 - `graph(down)` — what does X invoke? (trace a flow forward)
 - `graph(up)` — what calls X? (find callers, assess impact)
@@ -42,178 +48,17 @@ search → graph(down) → implement only at boundaries
 - Component tree: `edgeTypes=['jsx']`
 - Full routing: `edgeTypes=['call','event_listen','pubsub_subscribe']`

-**If `search` or `graph` returns nothing:** the symbol may be dynamically registered — check the Dynamic Patterns section below before retrying.
-
 ---

 ## Question Type → Entry Strategy

 | Type | Strategy |
 |------|----------|
-| Architecture / Call chain | Check Architecture section for entry point → `search(entry)` → `graph(down)` |
-| Locate | `search(keyword)` → `implement` top result |
-| Pattern | `search` existing instance → `implement` — skip `graph` |
-| Routing | Check Architecture section → `search(dispatcher)` → `graph(down, edgeTypes=[...])` |
-| Impact | `search(target)` → `graph(up)` → `implement` top callers |
-
----- 
-
-## Architecture
-
-### Client Message Sending
-```
-RoomBody → ComposerContainer → ComposerMessage → MessageBox
- ↓ onSend
- chat.flows.sendMessage()
- ↓
- sdk.call('sendMessage') ← DDP boundary
-```
-Entry: `search('MessageBox', layer='client')` → `graph(down)`
-
-Cross DDP boundary: `sdk.call('sendMessage')` → virtual node `'sendMessage'` → server handler (see Dynamic Patterns §A)
-
----- 
-
-### Server Message Sending
-```
-Meteor.methods({ sendMessage }) ← DDP entry (virtual node 'sendMessage')
- ↓
-executeSendMessage ← permission check
- ↓
-sendMessage → Messages.insertOne ← DB write
- ↓
-afterSaveMessage callbacks ← event_emit (see Dynamic Patterns §B)
-```
-Entry: `search('executeSendMessage', layer='server')` → `graph(down)`
-
----- 
-
-### Push Notifications
-```
-afterSaveMessage → sendMessageNotifications → sendNotification (per user)
- ↓
- shouldNotifyMobile/Desktop/Email
- ↓
- NotificationQueue → PushNotification → APN / FCM
-```
-Entry: `search('sendNotificationsOnMessage')` → `graph(down)`
-
----- 
-
-### REST API
-```
-ApiClass → authenticationMiddleware → permissionsMiddleware → rate limiter → Route Handler
-```
-Entry: `search('ApiClass')` or search the specific route path → `graph(down)`
-
----- 
-
-### DDP Subscription / Real-time Sync
-```
-Meteor.subscribe('X') → Meteor.publish('X', fn) → StreamerCentral → DDP push to client
- ↓
- Streamer Client → React re-render
-```
-Entry: `search('StreamerCentral')` → `graph(down)`
-
----- 
-
-### Apps Engine
-```
-AppManager → AppListenerManager → executeListener()
- ↓
- Bridge layer (adapts core ↔ App)
- ↓
- App hook return value applied to core flow
-```
-Entry: `search('AppListenerManager')` → `graph(down)`
-
----- 
-
-### Authentication
-```
-Meteor.loginWithPassword/LDAP/OAuth
- ↓
-Accounts.registerLoginHandler → credential validation → { id, token }
- ↓ (subsequent requests)
-x-auth-token header → authenticationMiddleware → Users.findOneByIdAndLoginToken
-```
-Entry: `search('registerLoginHandler')` → `graph(down)`
-
----- 
-
-### Webhook Routing
-```
-POST /hooks/:integrationId/:token → authenticatedRoute → executeIntegrationRest → processWebhookMessage
-```
-Entry: `search('executeIntegrationRest')` → `graph(down)`
-
----- 
-
-## Dynamic Patterns
-
-These patterns are **not visible via import edges**. The graph connects them via virtual nodes — but only if the dispatch target is a string literal in source.
-
-### A. DDP Method Dispatch
-```
-sdk.call('sendMessage') → virtual node 'sendMessage'
-Meteor.methods({ sendMessage: fn }) → virtual node 'sendMessage' → fn
-```
-`graph('sendMessage', up)` shows the client caller. `graph('sendMessage', down)` shows the server handler.
-
-### B. Callbacks Event System
-```
-callbacks.run('afterSaveMessage') → virtual node 'afterSaveMessage'
-callbacks.add('afterSaveMessage', handler) → virtual node 'afterSaveMessage' → handler
-```
-Use `graph('afterSaveMessage', down, edgeTypes=['event_listen'])` to find all registered handlers.
-
-### C. Meteor Pub/Sub
-```
-Meteor.subscribe('roomMessages') → virtual node 'roomMessages'
-Meteor.publish('roomMessages', fn) → virtual node 'roomMessages' → fn
-```
-
-### D. core-services Bus
-Services do NOT call each other directly — they go through a broker.
-```
-ServiceName.method(args) → proxify('ServiceName') → LocalBroker → ServiceClass instance
-```
-If you can't find a service implementation via `graph`, search for the `ServiceClass` with `name = 'ServiceName'`.
-
-### E. Message Rendering (data pipeline, not a call chain)
-```
-message.msg → parse() → Root AST →
-```
-`graph` cannot traverse this. Use `implement` on each step directly.
-
-### F. Blaze → React (legacy portals)
-Some pages use HTML/Blaze templates. React mounts into them via `createPortal`. If you find a `.html` template, look for the React counterpart in a nearby `portals/` or `views/` directory.
-
-### G. Fuselage components
-``, `