Self-improvement: automation ledger, fact trust, hook analytics, test isolation #178
…nd test isolation

- accept low/medium/high bucket trust labels in session reflector fact proposals and clarify the numeric-trust prompt instruction
- stop persisting consecutive identical scheduler-skip run records that flooded the automation run ledger every tick
- append hook_analytics.jsonl lines via a single O_APPEND write so concurrent hook processes no longer corrupt or drop entries
- isolate branch_db_safety_test under a throwaway profile home so it stops writing corrupt branch-meta.json and stale registry rows into the real ~/.tracedecay store
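For reviewers who want to picture the trust-label change, here is a minimal sketch of a validator that accepts either a number or a bucket label. The function name `parse_trust`, the constant names, the case-insensitive matching, and the `[0.0, 1.0]` range check are assumptions for illustration; only the three labels and their 0.15/0.5/0.85 values come from the PR description.

```rust
// Hypothetical constant names; the values are the ones quoted in the PR summary.
const TRUST_LOW: f64 = 0.15;
const TRUST_MEDIUM: f64 = 0.5;
const TRUST_HIGH: f64 = 0.85;

/// Accepts a low/medium/high bucket label or a plain number.
/// Assumption: numeric trust must lie in [0.0, 1.0].
fn parse_trust(raw: &str) -> Option<f64> {
    let trimmed = raw.trim();
    match trimmed.to_ascii_lowercase().as_str() {
        "low" => return Some(TRUST_LOW),
        "medium" => return Some(TRUST_MEDIUM),
        "high" => return Some(TRUST_HIGH),
        _ => {}
    }
    // Not a label: fall back to a number, rejecting NaN and out-of-range values.
    trimmed
        .parse::<f64>()
        .ok()
        .filter(|v| (0.0..=1.0).contains(v))
}
```

With this shape a proposal carrying `"high"` resolves to 0.85 instead of being rejected, while an unknown label such as `"very high"` still fails validation.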
Graceful shutdown persists token counters and checkpoints WALs for every live project server sequentially, which can exceed systemd's stop timeout and end in a SIGKILL mid-checkpoint. Cap shutdown work at 45s, log the timeout outcome, and abort the stalled task; SQLite WAL keeps remaining state crash-safe.
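To illustrate the deadline idea only, here is a std-only sketch that runs shutdown work on a thread and waits on a channel with a timeout. This is a swapped-in technique, not the daemon's code: the PR's code uses tokio and can abort the stalled task, which a plain thread cannot. `ShutdownOutcome` and `run_with_deadline` are hypothetical names.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum ShutdownOutcome {
    Completed,
    TimedOut,
    Panicked,
}

/// Runs `work` on a helper thread and waits at most `deadline` for it.
/// Note: on timeout the thread keeps running; std threads cannot be aborted.
fn run_with_deadline<F>(work: F, deadline: Duration) -> ShutdownOutcome
where
    F: FnOnce() + Send + 'static,
{
    let (done_tx, done_rx) = mpsc::channel::<()>();
    thread::spawn(move || {
        work();
        // Only reached when `work` returned normally.
        let _ = done_tx.send(());
    });
    match done_rx.recv_timeout(deadline) {
        Ok(()) => ShutdownOutcome::Completed,
        Err(RecvTimeoutError::Timeout) => ShutdownOutcome::TimedOut,
        // Sender dropped without sending: the worker unwound from a panic.
        Err(RecvTimeoutError::Disconnected) => ShutdownOutcome::Panicked,
    }
}
```

A caller would pass the 45s cap as `Duration::from_secs(45)` and log whichever outcome comes back. Keeping the three outcomes distinct means a panic in the shutdown work is never reported as a clean finish.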
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: deb256e427
    _env_lock: tokio::sync::MutexGuard<'static, ()>,
    storage: TraceDecayStorageEnvGuard,
Keep the env lock until storage guards drop
When an IsolatedEnv is dropped, struct fields are dropped in declaration order, so _env_lock is released before storage restores HOME, TRACEDECAY_DATA_DIR, and the global DB override. If another test in this binary is waiting, it can acquire the lock and install its own isolated env while this guard's TraceDecayStorageEnvGuard then restores the old values over it, defeating the isolation this helper is meant to provide. Declare the lock after storage (or add a custom Drop) so it is dropped last.
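The drop-order point can be checked in isolation. The sketch below uses stand-in types rather than the real guards: `Tracer`, `LockFirst`, `LockLast`, and `drop_order` are hypothetical names, and each `Tracer` just records when it is dropped.

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for a guard: records its label into a shared log when dropped.
struct Tracer {
    label: &'static str,
    log: Arc<Mutex<Vec<&'static str>>>,
}

impl Drop for Tracer {
    fn drop(&mut self) {
        self.log.lock().unwrap().push(self.label);
    }
}

/// Lock declared first: released before storage is restored (the reported issue).
struct LockFirst {
    _env_lock: Tracer,
    _storage: Tracer,
}

/// Lock declared last: released after storage is restored (the suggested fix).
struct LockLast {
    _storage: Tracer,
    _env_lock: Tracer,
}

/// Builds one of the two layouts, drops it, and returns the observed drop order.
fn drop_order(lock_first: bool) -> Vec<&'static str> {
    let log = Arc::new(Mutex::new(Vec::new()));
    let tracer = |label: &'static str| Tracer { label, log: Arc::clone(&log) };
    if lock_first {
        drop(LockFirst { _env_lock: tracer("env_lock"), _storage: tracer("storage") });
    } else {
        drop(LockLast { _storage: tracer("storage"), _env_lock: tracer("env_lock") });
    }
    let order = log.lock().unwrap().clone();
    order
}
```

`drop_order(true)` yields `["env_lock", "storage"]` and `drop_order(false)` yields `["storage", "env_lock"]`, so moving the lock field below `storage` is enough to hold it until the environment has been restored.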
…20260701 # Conflicts: # src/automation/runner.rs # tests/automation_session_reflector_runner_test.rs
After refreshing the binary, plugins, and daemon, `tracedecay update` now re-execs a post-update health pass: applies idempotent global-DB schema migrations, quarantines corrupt branch-meta.json files as branch-meta.json.corrupt-<timestamp>, purges stale registry rows under the system temp dir, and summarizes remaining doctor findings. The pass is failure-tolerant (warnings, never update failure) and skippable with --no-heal.
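A rough sketch of the quarantine rename, under stated assumptions: the helper name `quarantine_corrupt` is hypothetical, and the timestamp is rendered as Unix seconds because the commit message does not say which format is used. Deciding whether a file is corrupt is left to the caller.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Renames `path` to `<file name>.corrupt-<timestamp>` in the same directory.
/// Assumption: the timestamp is Unix seconds; the real format may differ.
fn quarantine_corrupt(path: &Path) -> io::Result<PathBuf> {
    let stamp = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    let mut name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?
        .to_os_string();
    name.push(format!(".corrupt-{stamp}"));
    let target = path.with_file_name(name);
    // A rename within one directory keeps the bytes intact for later inspection.
    fs::rename(path, &target)?;
    Ok(target)
}
```

To match the failure-tolerant policy described above, a caller would treat an `Err` here as a warning and continue with the rest of the health pass.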
branch_meta now owns the one canonical parse used by both load_branch_meta and the post-update heal quarantine, so schema-corrupt files (valid JSON, wrong shape) are quarantined instead of warning on every open. Restructures the health pass into compute/render, fetches the registry once, makes stale_code_projects borrow with a named StaleRootScope predicate, adds a shared 0o600-at-create private-open helper in PrivateStoreIo, and documents the heal-by-default policy. Adds unit + integration tests for the schema-corrupt quarantine path.
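For context on the private-open helper, this is roughly what a 0o600-at-create append can look like with the standard library on Unix. It is a sketch, not the contents of `PrivateStoreIo`: the function name `append_line_private` is hypothetical and the real helper's signature is not shown in this PR text.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;

/// Appends one line to `path`, creating the file with mode 0o600 if absent.
fn append_line_private(path: &Path, line: &str) -> io::Result<()> {
    // Build the whole record, newline included, so it is handed over as one buffer.
    let mut record = Vec::with_capacity(line.len() + 1);
    record.extend_from_slice(line.as_bytes());
    record.push(b'\n');

    let mut file = OpenOptions::new()
        .append(true) // opens with O_APPEND: each write lands at the current end of file
        .create(true)
        .mode(0o600) // applied only when the file is created, and still subject to umask
        .open(path)?;
    // `write_all` retries on a short write; for small records this is normally one write.
    file.write_all(&record)
}
```

Setting the mode at creation avoids the window in which a file exists with default permissions before a later `chmod`. Opening in append mode removes the read-modify-rewrite step that lets concurrent writers overwrite each other.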
The scheduler gate now loads the run ledger once and threads the records through the run context, so gate-level and post-gate skip dedup share that one read and append_skipped_record is a pure append-unless-repeat with no second I/O pass. Also inlines tokio::time::timeout for the daemon shutdown deadline (a panic in shutdown_all no longer reads as success) and derives the session-reflector trust-label representatives from named memory::trust constants with a drift-guard test.
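The append-unless-repeat rule can be sketched as a pure function over records that are already in memory. Everything here is illustrative: `RunRecord` and its fields, `skip`, and `append_skipped_unless_repeat` are hypothetical names, and comparing against the most recent record for the same task is an assumption about what "consecutive" means.

```rust
#[derive(Debug, Clone, PartialEq)]
struct RunRecord {
    task: String,
    status: String,
    reason: String,
    manual: bool,
}

/// Convenience constructor for a skipped run (hypothetical helper).
fn skip(task: &str, reason: &str, manual: bool) -> RunRecord {
    RunRecord {
        task: task.to_string(),
        status: "skipped".to_string(),
        reason: reason.to_string(),
        manual,
    }
}

/// Appends `next` unless it repeats the latest record for the same task.
/// Manual-trigger records are always appended. Returns true when appended.
fn append_skipped_unless_repeat(ledger: &mut Vec<RunRecord>, next: RunRecord) -> bool {
    let repeats_last = !next.manual
        && ledger
            .iter()
            .rev()
            .find(|r| r.task == next.task)
            .map_or(false, |last| {
                !last.manual && last.status == next.status && last.reason == next.reason
            });
    if repeats_last {
        return false;
    }
    ledger.push(next);
    true
}
```

Because the function only inspects the records it is given, the caller can load the ledger once and reuse the same slice for both the gate check and the post-gate check.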
Moves the update/post-update wiring (plugin refresh, daemon refresh, subprocess re-exec, health pass) into src/update_cmd.rs following the *_cmd convention, bringing main.rs to 871 lines. Also promotes the branch-DB tests' IsolatedEnv into tests/common as the canonical env-isolation helper.
…20260701 # Conflicts: # src/daemon.rs
…20260701 # Conflicts: # src/doctor.rs
…20260701 # Conflicts: # src/main.rs
Summary
Findings from a TraceDecay self-audit (session transcript mining + doctor + automation run/fact-proposal logs), fixed in one pass:

- Session reflector fact proposals carried `"trust": "high"` and the validator rejected every proposal. The validator now accepts `low`/`medium`/`high` bucket labels (mapped to 0.15/0.5/0.85) and the prompt states the numeric requirement.
- The automation run ledger was flooded with `skipped / scheduler_interval_not_elapsed` records (1500+ noise rows). Consecutive identical scheduler skips per task now persist once; manual-trigger skips and reason/task transitions still persist.
- `hook_analytics.jsonl`: concurrent hook processes raced a read-modify-rewrite append, merging/dropping lines. Appends now use a single `O_APPEND` write (`PrivateStoreIo::append_line`).
- `branch_db_safety_test.rs` ran against the developer's `~/.tracedecay`, leaving 111 corrupt `branch-meta.json` files and ~7k stale registry rows (now repaired locally). The suite now runs under an isolated throwaway home (`IsolatedEnv` + `TraceDecayStorageEnvGuard`).

Follow-up commits on this branch (in progress): daemon SIGTERM/SEGV investigation and an automatic post-update health pass for `tracedecay update`.

Test plan

- `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`
- `cargo test --test automation_session_reflector_runner_test` (7/7)
- `cargo test --test branch_db_safety_test` (5/5, verified no writes to the real `~/.tracedecay` during the run)
- `cargo test --lib automation::lifecycle` (4/4, incl. 3 new skip-dedupe tests) and `cargo test --lib hooks` (36/36)