Skip to content

Fix/export workspace path override#114

Merged
wpak-ai merged 5 commits into
masterfrom
fix/export-workspace-path-override
Jun 23, 2026
Merged

Fix/export workspace path override#114
wpak-ai merged 5 commits into
masterfrom
fix/export-workspace-path-override

Conversation

@clean6378-max-it

@clean6378-max-it clean6378-max-it commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

Closes #107

Summary

Replaces the CLI --base-dir workspace-path bypass that mutated os.environ["WORKSPACE_PATH"] with an explicit resolve_workspace_path(override=...) parameter. Stacks on Monday PR 1 (refactor/export-engine-consolidation); the core override plumbing landed there during review — this PR adds dedicated regression tests and a doc comment fix.

Sprint item Chen § #4 (3 pt, Medium).

Changes

  • utils/workspace_path.pyresolve_workspace_path(*, override: str | None = None); call-time override takes precedence over module override, env, and default; does not mutate WORKSPACE_PATH
  • scripts/export.pyresolve_workspace_path(override=opts.get("base_dir")); os.environ["WORKSPACE_PATH"] mutation removed
  • tests/test_export_base_dir_override.py (new) — asserts --base-dir is passed as override= and leaves WORKSPACE_PATH unchanged
  • tests/test_workspace_path_thread_safety.py — explicit-override precedence test (from PR 1 stack)

Out of scope

Acceptance criteria

  • scripts/export.py passes --base-dir explicitly to resolve_workspace_path() instead of mutating os.environ
  • os.environ["WORKSPACE_PATH"] mutation line removed
  • resolve_workspace_path() accepts optional override parameter
  • Existing export / workspace_path / thread-safety tests pass
  • Thread-safety tests unaffected
  • PR approved by at least 1 reviewer

Test plan

  • mypy --strict utils/workspace_path.py scripts/export.py
  • pytest tests/test_export_base_dir_override.py tests/test_workspace_path_thread_safety.py tests/test_cli_export_e2e.py -q
  • pytest -k "workspace_path or export or thread" -q
  • python scripts/export.py --base-dir <workspaceStorage> --help

Merge note

Base branch: refactor/export-engine-consolidation (PR 1 must merge first, or merge both in order).

Summary by CodeRabbit

  • Bug Fixes

    • Improved export robustness by skipping invalid composer rows and treating corrupt stored values more safely.
    • Fixed CLI session exports to derive session creation timestamps from provided metadata when available.
  • Improvements

    • Improved workspace project listing performance by adding cache-hit behavior when caching is enabled.
    • Enhanced --base-dir handling to correctly override workspace path resolution without permanently changing WORKSPACE_PATH.
  • Tests

    • Added regression coverage for --base-dir override behavior and environment variable preservation.

@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9d5e46bb-da4d-41b1-b3b4-770e8d59a041

📥 Commits

Reviewing files that changed from the base of the PR and between dbec81e and 0587356.

📒 Files selected for processing (1)
  • services/export_engine.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • services/export_engine.py

📝 Walkthrough

Walkthrough

Adds a keyword-only override parameter to resolve_workspace_path to eliminate os.environ mutation in the CLI export path. Fixes TypeError handling during IDE composer JSON parsing and corrects CLI session createdAt timestamp conversion via to_epoch_ms. Refactors list_workspace_projects to use fingerprint-based cache lookups with pre-collected entries. Adds regression tests for the override parameter behavior.

Changes

Workspace Path Override, Export Engine Fixes, and Workspace Listing Caching

Layer / File(s) Summary
resolve_workspace_path override parameter and regression tests
utils/workspace_path.py, tests/test_export_base_dir_override.py
Adds override: str | None = None to resolve_workspace_path as a call-time override with highest resolution precedence. Updates module comments to document POST /api/set-workspace vs. CLI --base-dir distinction. Regression tests verify override is forwarded and WORKSPACE_PATH env var is not mutated.
Export engine robustness: composer parsing and session timestamps
services/export_engine.py, api/export_api.py
IDE composer export validates key contains : delimiter and skips rows with TypeError in JSON parsing. CLI session export converts meta["createdAt"] through to_epoch_ms instead of raw epoch-millis. api/export_api.py receives non-functional formatting normalization.
Workspace listing fingerprint-based caching and orchestration refactor
services/workspace_listing.py
Adds fingerprint_workspace_storage and global_storage_db_path imports. Refactors list_workspace_projects to pre-collect entries, compute fingerprint, and return early on cache hit. Rewrites _build_workspace_projects_uncached to accept WorkspaceOrchestration and derive ctx/workspace_entries from it.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Possibly related PRs

  • cppalliance/cppa-cursor-browser#54: Both PRs modify utils/workspace_path.py around resolve_workspace_path reading _workspace_path_override under a lock; the earlier PR introduced the lock semantics and this PR adds the call-time override parameter above it.
  • cppalliance/cppa-cursor-browser#112: Both PRs modify services/export_engine.py and services/workspace_listing.py; this PR applies incremental robustness fixes and orchestration refactoring to the shared export engine and listing logic extracted in #112.
  • cppalliance/cppa-cursor-browser#61: Both PRs modify services/workspace_listing.py around list_workspace_projects and its orchestration flow; this PR refactors the caching and orchestration integration while the earlier PR focused on loader helpers.

Suggested reviewers

  • bradjin8
  • wpak-ai
  • timon0305

Poem

🐇 Hoppity-hop, no more os.environ to poke,
The override param means no global env smoke.
TypeError caught, and createdAt now aligned,
The fingerprint cache leaves stale results behind.
This bunny approves — clean paths, no sleight of hand! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main change: fixing the export script's workspace path override mechanism by replacing environment mutation with explicit parameter passing.
Linked Issues check ✅ Passed All acceptance criteria from issue #107 are met: resolve_workspace_path() accepts override parameter, scripts/export.py passes --base-dir explicitly, environment mutation is eliminated, and new regression tests verify the behavior.
Out of Scope Changes check ✅ Passed All changes are directly aligned with issue #107 scope: workspace path override parameter, export script updates, and regression tests. No unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/export-workspace-path-override

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
services/workspace_listing.py (1)

93-100: 🚀 Performance & Scalability | 🔵 Trivial | 🏗️ Heavy lift

Keep project-cache hits from doing full orchestration work.

prepare_workspace_orchestration() resolves workspace context and display maps before get_cached_projects(), but cache hits only need the fingerprint. Consider splitting fingerprint preparation from full orchestration or making the context/display-map fields lazy so cached workspace listings return cheaply.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/workspace_listing.py` around lines 93 - 100, The issue is that
prepare_workspace_orchestration() performs expensive work resolving workspace
context and display maps before checking get_cached_projects() for a cache hit.
Restructure the logic to compute only the fingerprint first (before full
orchestration), check get_cached_projects() with that fingerprint, and only
invoke the complete prepare_workspace_orchestration() if the cache check misses,
avoiding unnecessary work when returning cached results.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/export.py`:
- Around line 171-180: In the _read_last_export_ms function, after loading the
JSON with json.load(f) into the st variable, add a type validation check to
ensure st is a dictionary before calling st.get("lastExportTime"). If st is not
a dictionary (for example, if the JSON file contains an array, string, or
number), return 0 to treat it as unreadable state and fall back to full export.
You can either add an isinstance(st, dict) check before the st.get() call or add
AttributeError to the exception handler tuple to catch this case.

In `@services/export_engine.py`:
- Around line 371-382: The createdAt value from meta.get("createdAt") is used
directly without normalization in the assignment to created_ms, but this value
may be in seconds, an ISO string, or milliseconds depending on the source. This
causes issues in the max() comparison with db_mtime_ms (which is in
milliseconds) and elsewhere in the code like messages_to_bubbles() and filename
timestamps. Normalize the createdAt value to milliseconds before assigning it to
created_ms by adding logic that detects and converts seconds timestamps, ISO
strings, or other formats to milliseconds before the fallback to
datetime.now().timestamp() * 1000.
- Around line 217-225: The exception handler for json.loads(row["value"]) in the
try-except block is missing TypeError, which can be raised when row["value"] is
a non-string type like None or int. Update the except clause that currently
catches (json.JSONDecodeError, ValueError) to also include TypeError in the
tuple of exceptions, ensuring that malformed non-string values are logged and
skipped consistently with other similar error handling patterns in the codebase.

---

Nitpick comments:
In `@services/workspace_listing.py`:
- Around line 93-100: The issue is that prepare_workspace_orchestration()
performs expensive work resolving workspace context and display maps before
checking get_cached_projects() for a cache hit. Restructure the logic to compute
only the fingerprint first (before full orchestration), check
get_cached_projects() with that fingerprint, and only invoke the complete
prepare_workspace_orchestration() if the cache check misses, avoiding
unnecessary work when returning cached results.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f08194c8-1cc4-4c57-919a-1b3fabf9336f

📥 Commits

Reviewing files that changed from the base of the PR and between ee6a6e7 and 196f2b6.

📒 Files selected for processing (10)
  • api/export_api.py
  • pyproject.toml
  • scripts/export.py
  • services/export_engine.py
  • services/workspace_listing.py
  • tests/test_api_export.py
  • tests/test_export_base_dir_override.py
  • tests/test_export_engine.py
  • tests/test_workspace_path_thread_safety.py
  • utils/workspace_path.py
💤 Files with no reviewable changes (1)
  • pyproject.toml

Comment thread scripts/export.py Outdated
Comment thread services/export_engine.py
Comment thread services/export_engine.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
services/workspace_listing.py (1)

96-118: 🚀 Performance & Scalability | 🔵 Trivial

Duplicated fingerprint computation introduces a latent cache-key divergence risk.

The inline block (lines 100–108) recomputes global_storage_db_path, get_cli_chats_path, and fingerprint_workspace_storage identically to how prepare_workspace_orchestration does so internally (export_engine.py lines 129–135). On a cache miss this duplicates I/O, and more critically, the read key (fingerprint, line 109) and write key (orch.fingerprint, line 124) are produced by two separate copies of the same logic. They are byte-identical today, but if either site's conditional gating (e.g., os.path.isfile/os.path.isdir) or the rules argument ever change, cache reads and writes would key on different fingerprints, silently breaking the cache.

Consider threading the precomputed fingerprint into prepare_workspace_orchestration to maintain a single source of truth while preserving the cheap cache-hit short-circuit before the expensive ctx/display-map resolution.

♻️ Sketch: pass the computed fingerprint through
     orch = prepare_workspace_orchestration(
         workspace_path,
         rules,
         nocache=effective_nocache,
         workspace_entries=workspace_entries,
+        fingerprint=fingerprint if not effective_nocache else None,
     )

This requires adding a keyword-only fingerprint: dict[str, Any] | None = None parameter to prepare_workspace_orchestration that, when provided, is used instead of recomputing. That keeps a single fingerprint definition and removes the duplicate stat lookups on the cache-miss path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/workspace_listing.py` around lines 96 - 118, The fingerprint
computation in the inline block (lines 100-108) duplicates the identical logic
that prepare_workspace_orchestration performs internally, creating a cache-key
divergence risk if either implementation changes. Add a keyword-only fingerprint
parameter to prepare_workspace_orchestration with a default value of None, then
modify the function to use the provided fingerprint when available instead of
recomputing it. In the calling code within workspace_listing.py, pass the
computed fingerprint value to prepare_workspace_orchestration to eliminate the
duplication while preserving the cache-hit short-circuit before expensive
orchestration operations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@services/workspace_listing.py`:
- Around line 96-118: The fingerprint computation in the inline block (lines
100-108) duplicates the identical logic that prepare_workspace_orchestration
performs internally, creating a cache-key divergence risk if either
implementation changes. Add a keyword-only fingerprint parameter to
prepare_workspace_orchestration with a default value of None, then modify the
function to use the provided fingerprint when available instead of recomputing
it. In the calling code within workspace_listing.py, pass the computed
fingerprint value to prepare_workspace_orchestration to eliminate the
duplication while preserving the cache-hit short-circuit before expensive
orchestration operations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fa81b06d-e9a9-448b-abd6-114fc4760a91

📥 Commits

Reviewing files that changed from the base of the PR and between 196f2b6 and d1d8101.

📒 Files selected for processing (3)
  • scripts/export.py
  • services/export_engine.py
  • services/workspace_listing.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • services/export_engine.py
  • scripts/export.py

Comment thread api/export_api.py
Comment thread tests/test_export_base_dir_override.py
Comment thread scripts/export.py
Comment thread services/export_engine.py Outdated
Comment thread services/export_engine.py Outdated
@clean6378-max-it clean6378-max-it force-pushed the fix/export-workspace-path-override branch from 14c256f to dbec81e Compare June 23, 2026 19:26

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
services/export_engine.py (1)

386-392: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Guard session["meta"] shape before .get() calls.

At Lines 386-392, if meta is present but not a dict (or None), this raises AttributeError before the session-level read/parse guard, causing the whole export to fail on one bad session.

Suggested fix
-            meta = session.get("meta", {})
+            raw_meta = session.get("meta")
+            meta = raw_meta if isinstance(raw_meta, dict) else {}
             session_id = session["session_id"]
             created_raw = meta.get("createdAt")
             created_ms = to_epoch_ms(created_raw) if created_raw else int(
                 datetime.now().timestamp() * 1000,
             )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/export_engine.py` around lines 386 - 392, The code retrieves `meta`
from the session dictionary but does not validate that it is actually a
dictionary before calling `.get()` on it. If the session contains a non-dict
value for the meta key (such as None or a string), calling
`meta.get("createdAt")` or `meta.get("name")` will raise an AttributeError. Add
a type guard to check that `meta` is a dictionary before using its `.get()`
methods, defaulting it to an empty dictionary if it is not the correct type, so
that subsequent `.get()` calls operate safely without raising AttributeError.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@services/export_engine.py`:
- Line 233: The composer_id extraction on row["key"].split(":")[1] will raise an
IndexError if the key doesn't contain a colon, causing the entire export to
abort instead of gracefully handling malformed rows. Add validation before
splitting to check if the key contains the colon separator, and implement proper
error handling to skip malformed rows or provide a fallback value so the export
can continue processing remaining rows without interruption.

---

Duplicate comments:
In `@services/export_engine.py`:
- Around line 386-392: The code retrieves `meta` from the session dictionary but
does not validate that it is actually a dictionary before calling `.get()` on
it. If the session contains a non-dict value for the meta key (such as None or a
string), calling `meta.get("createdAt")` or `meta.get("name")` will raise an
AttributeError. Add a type guard to check that `meta` is a dictionary before
using its `.get()` methods, defaulting it to an empty dictionary if it is not
the correct type, so that subsequent `.get()` calls operate safely without
raising AttributeError.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f1e18b2d-d529-43cd-873f-05d7721079cc

📥 Commits

Reviewing files that changed from the base of the PR and between 14c256f and dbec81e.

📒 Files selected for processing (5)
  • api/export_api.py
  • services/export_engine.py
  • services/workspace_listing.py
  • tests/test_export_base_dir_override.py
  • utils/workspace_path.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/test_export_base_dir_override.py
  • utils/workspace_path.py
  • services/workspace_listing.py

Comment thread services/export_engine.py Outdated
@timon0305 timon0305 requested a review from wpak-ai June 23, 2026 19:31
@wpak-ai wpak-ai merged commit ae468e9 into master Jun 23, 2026
16 checks passed
@wpak-ai wpak-ai deleted the fix/export-workspace-path-override branch June 23, 2026 20:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cppa-cursor-browser: Export script — fix os.environ workspace path bypass

3 participants