feat: cache retrieved docs and add PyPI package docs lookup #9
ayhammouda wants to merge 8 commits into main from fix/open-issues-cache-pypi-docs
Conversation
📝 Walkthrough
Adds persistent SQLite caching for retrieved docs and a PyPI package docs lookup.
Changes: Persistent Docs Cache and PyPI Package Lookup
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
The PR introduces two independent feature areas (persistent caching and PyPI lookup) with significant logic density across multiple services, database schema changes, subprocess-based integration tests, and careful transport/framing fixes. The changes span models, services, server wiring, retrieval logic, and comprehensive test suites with heterogeneous concerns (caching, HTTP fetching, SQLite operations, MCP protocol). While individual components are well-scoped, the interdependencies between cache/content/server and the breadth of test scenarios require careful attention to correctness and edge cases.
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 5
🧹 Nitpick comments (2)
tests/test_mcp_get_docs_cache_smoke.py (1)
10-16: ⚡ Quick win — Extract shared stdio helpers into a dedicated test utility module.
Importing private helpers from another test module tightly couples these tests and makes future refactors noisier. Move these helpers into a shared `tests` support module and have both smoke suites import from there.
Refactor sketch
```diff
-from tests.test_stdio_smoke import (
+from tests.support.stdio_helpers import (
     _assert_protocol_on_stdout_only,
     _find_response,
     _isolated_cache_env,
     _make_notification,
     _make_request,
 )
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_mcp_get_docs_cache_smoke.py` around lines 10 - 16, Move the private stdio helper functions into a shared test helpers module and update imports: create a new tests support module (e.g., tests.test_helpers or tests._helpers) and relocate the functions _assert_protocol_on_stdout_only, _find_response, _isolated_cache_env, _make_notification, and _make_request into that module; then change imports in this test (and in tests that currently import from tests.test_stdio_smoke) to import those helpers from the new module instead of importing them from _stdio_smoke. Ensure the new module exports the same helper names and update any relative import paths in affected tests to reference the new module.
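A transitional shim is one low-churn way to stage this refactor: create the shared module and have it re-export the existing helpers until their bodies are moved over. This is a sketch, not the PR's code; the module path `tests/support/stdio_helpers.py` follows the review's suggestion and the re-export step is an assumption about how to sequence the change.

```python
# tests/support/stdio_helpers.py -- hypothetical shared-helper module (sketch).
# Step 1: re-export the helpers so both smoke suites can switch their imports
# here immediately; step 2 moves the actual function bodies into this file.
from tests.test_stdio_smoke import (  # assumed current home of the private helpers
    _assert_protocol_on_stdout_only,
    _find_response,
    _isolated_cache_env,
    _make_notification,
    _make_request,
)

__all__ = [
    "_assert_protocol_on_stdout_only",
    "_find_response",
    "_isolated_cache_env",
    "_make_notification",
    "_make_request",
]
```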
PR9_MCP_TEST_PLAN.md (1)
23-26: ⚡ Quick win — Make setup steps environment-agnostic.
The hardcoded workspace path and branch-specific checkout make this plan hard to run outside your local/review environment. Prefer generic “repo root” instructions plus optional branch notes.
Proposed edit
```diff
-cd /srv/openclaw/.openclaw/workspace/tmp/python-docs-mcp-review
-git checkout fix/open-issues-cache-pypi-docs || git checkout review-pr-9
-git pull --ff-only origin fix/open-issues-cache-pypi-docs
+cd <repo-root>
+# optional: checkout the PR branch under test
+# git checkout fix/open-issues-cache-pypi-docs
+git pull --ff-only
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@PR9_MCP_TEST_PLAN.md` around lines 23 - 26, The setup steps in PR9_MCP_TEST_PLAN.md are environment-specific—replace the hardcoded path "cd /srv/openclaw/.openclaw/workspace/tmp/python-docs-mcp-review" with a generic repo-root instruction (e.g., "cd <repo-root> or open project root"), and change the branch checkout commands "git checkout fix/open-issues-cache-pypi-docs || git checkout review-pr-9" to suggest a generic branch instruction (e.g., "git fetch && git checkout <branch-or-PR-branch> (or use the PR branch name)") plus an optional note showing the exact branch names used for this review; keep "git pull --ff-only origin <branch>" and "uv sync --all-extras" but parametrize the branch placeholder so the plan is runnable in any environment.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@PR9_MCP_TEST_PLAN.md`:
- Line 42: Update the pytest summary line in PR9_MCP_TEST_PLAN.md that currently
reads "254 passed, 3 skipped" to reflect the PR's actual test totals "247
passed, 3 skipped" so the documented expected results match the reported CI/PR
summary and avoid false FAILs.
- Line 71: Change the headings and inline phrases to use hyphenated compound
adjectives for consistency: replace "Full page retrieval" with "Full-page
retrieval" and any occurrences of "page not found style response" or similar
with "page-not-found-style response" (check the heading at "### 2.1 Full page
retrieval" and the repeated occurrence around lines noted as 111-111) so all
compound adjectives in PR9_MCP_TEST_PLAN.md follow the hyphenated style.
In `@src/mcp_server_python_docs/retrieval/ranker.py`:
- Around line 83-99: The current _resolve_symbol_location logic returns empty
string "" for a missing anchor which makes get_docs treat it as a section hit;
change the return value to use None (or normalize blank anchors to None) when
there is no valid section anchor. Specifically, in the branch that checks
section_row and the final return, return doc_row["slug"], None (or convert
section_row["anchor"] or "" into None) instead of "", and apply the same change
in the other occurrences referenced (the other blocks around lines with similar
logic) so doc_row, fallback_anchor, section_row and _resolve_symbol_location
consistently return None for “no anchor.”
In `@src/mcp_server_python_docs/services/persistent_cache.py`:
- Around line 32-35: The shared sqlite connection (self._conn) is used with
check_same_thread=False but not protected against concurrent access; serialize
DB access by adding a threading.Lock (e.g., self._db_lock created where the
connection is initialized) and wrap every database operation (methods like
get(), put(), and any other places that call self._conn.execute/commit) inside a
with self._db_lock: block (or acquire/release) so execute()/commit() calls
cannot run concurrently; alternatively, change the implementation to open a
short-lived sqlite3.connect(...) per operation and close it after the
query—apply the same change for all DB use sites referenced around the
__init__/connection code and the blocks noted (lines ~66-99, 101-123).
In `@tests/test_stdio_smoke.py`:
- Around line 151-162: The helper _assert_protocol_on_stdout_only currently
verifies JSON-RPC frames but doesn't assert the process exit status; update this
function to check result.returncode == 0 (or use result.check_returncode()) and
assert a successful exit with a clear message so smoke tests fail if the server
exited non-zero; locate the function _assert_protocol_on_stdout_only and add a
return-code assertion after reading/parsing stdout/stderr responses.
---
Nitpick comments:
In `@PR9_MCP_TEST_PLAN.md`:
- Around line 23-26: The setup steps in PR9_MCP_TEST_PLAN.md are
environment-specific—replace the hardcoded path "cd
/srv/openclaw/.openclaw/workspace/tmp/python-docs-mcp-review" with a generic
repo-root instruction (e.g., "cd <repo-root> or open project root"), and change
the branch checkout commands "git checkout fix/open-issues-cache-pypi-docs ||
git checkout review-pr-9" to suggest a generic branch instruction (e.g., "git
fetch && git checkout <branch-or-PR-branch> (or use the PR branch name)") plus
an optional note showing the exact branch names used for this review; keep "git
pull --ff-only origin <branch>" and "uv sync --all-extras" but parametrize the
branch placeholder so the plan is runnable in any environment.
In `@tests/test_mcp_get_docs_cache_smoke.py`:
- Around line 10-16: Move the private stdio helper functions into a shared test
helpers module and update imports: create a new tests support module (e.g.,
tests.test_helpers or tests._helpers) and relocate the functions
_assert_protocol_on_stdout_only, _find_response, _isolated_cache_env,
_make_notification, and _make_request into that module; then change imports in
this test (and in tests that currently import from tests.test_stdio_smoke) to
import those helpers from the new module instead of importing them from
_stdio_smoke. Ensure the new module exports the same helper names and update any
relative import paths in affected tests to reference the new module.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 1968fe20-5be0-4725-92d0-2853ce1948b3
📒 Files selected for processing (17)
- PR9_MCP_TEST_PLAN.md
- PR9_SMOKE_FAILURE_FIX_PLAN.md
- README.md
- src/mcp_server_python_docs/__main__.py
- src/mcp_server_python_docs/app_context.py
- src/mcp_server_python_docs/models.py
- src/mcp_server_python_docs/retrieval/ranker.py
- src/mcp_server_python_docs/server.py
- src/mcp_server_python_docs/services/content.py
- src/mcp_server_python_docs/services/package_docs.py
- src/mcp_server_python_docs/services/persistent_cache.py
- tests/conftest.py
- tests/test_mcp_get_docs_cache_smoke.py
- tests/test_package_docs.py
- tests/test_persistent_docs_cache.py
- tests/test_services.py
- tests/test_stdio_smoke.py
```
- Ruff passes.
- Pyright passes for `src/`.
- Pytest passes: currently expected `254 passed, 3 skipped`.
```
Update the expected pytest totals to match this PR context.
Line 42 says 254 passed, 3 skipped, but the PR summary for this change set reports 247 passed, 3 skipped. This can cause false FAIL interpretation during manual validation.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@PR9_MCP_TEST_PLAN.md` at line 42, Update the pytest summary line in
PR9_MCP_TEST_PLAN.md that currently reads "254 passed, 3 skipped" to reflect the
PR's actual test totals "247 passed, 3 skipped" so the documented expected
results match the reported CI/PR summary and avoid false FAILs.
```
## Test Area 2 — `get_docs` Functional Retrieval

### 2.1 Full page retrieval
```
Use hyphenated compound adjectives for consistency.
Consider “Full-page retrieval” and “page-not-found-style response” for clearer technical writing.
Also applies to: 111-111
🧰 Tools
🪛 LanguageTool
[uncategorized] ~71-~71: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...get_docs` Functional Retrieval ### 2.1 Full page retrieval Call: ```text get_docs(slug...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@PR9_MCP_TEST_PLAN.md` at line 71, Change the headings and inline phrases to
use hyphenated compound adjectives for consistency: replace "Full page
retrieval" with "Full-page retrieval" and any occurrences of "page not found
style response" or similar with "page-not-found-style response" (check the
heading at "### 2.1 Full page retrieval" and the repeated occurrence around
lines noted as 111-111) so all compound adjectives in PR9_MCP_TEST_PLAN.md
follow the hyphenated style.
```python
    if doc_row is None:
        return fallback_slug, fallback_anchor

    if fallback_anchor:
        section_row = conn.execute(
            """
            SELECT anchor
            FROM sections
            WHERE document_id = ? AND anchor = ?
            LIMIT 1
            """,
            (doc_row["id"], fallback_anchor),
        ).fetchone()
        if section_row is not None:
            return doc_row["slug"], section_row["anchor"] or ""

    return doc_row["slug"], ""
```
Don't emit anchor="" for document-level symbol hits.
When _resolve_symbol_location() falls back to a page-only location, both symbol search paths surface an empty-string anchor. get_docs() treats any non-None anchor as a section lookup, so forwarding that hit verbatim turns a retrievable page hit into PageNotFoundError. Return None for “no anchor”, or normalize blank anchors before exposing them downstream.
Also applies to: 235-247, 352-365
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/mcp_server_python_docs/retrieval/ranker.py` around lines 83 - 99, The
current _resolve_symbol_location logic returns empty string "" for a missing
anchor which makes get_docs treat it as a section hit; change the return value
to use None (or normalize blank anchors to None) when there is no valid section
anchor. Specifically, in the branch that checks section_row and the final
return, return doc_row["slug"], None (or convert section_row["anchor"] or ""
into None) instead of "", and apply the same change in the other occurrences
referenced (the other blocks around lines with similar logic) so doc_row,
fallback_anchor, section_row and _resolve_symbol_location consistently return
None for “no anchor.”
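A minimal sketch of the suggested normalization, assuming the row shapes shown in the snippet above; `_normalize_anchor` is a hypothetical helper name, not code from the PR.

```python
def _normalize_anchor(anchor: str | None) -> str | None:
    # "" and None both mean "no section anchor"; collapse them to None so
    # get_docs() performs a page lookup instead of a section lookup.
    return anchor or None

# At the return sites in _resolve_symbol_location:
#   return doc_row["slug"], _normalize_anchor(section_row["anchor"])  # section hit
#   return doc_row["slug"], None                                      # page-only fallback
```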
```python
        try:
            self._cache_path.parent.mkdir(parents=True, exist_ok=True)
            self._conn = sqlite3.connect(str(self._cache_path), check_same_thread=False)
            self._conn.execute("PRAGMA synchronous = NORMAL")
```
🧩 Analysis chain
🌐 Web query:
Per the Python sqlite3 documentation, does setting `check_same_thread=False` make a single connection safe for concurrent `execute()`/`commit()` calls from multiple threads, or must access still be serialized by the application?
💡 Result:
No, setting check_same_thread=False does not make a single sqlite3 connection safe for concurrent execute() or commit() calls from multiple threads. Access must still be serialized by the application, particularly for write operations, to avoid data corruption.[1][2][3] The official Python sqlite3 documentation states: "If False, the connection may be accessed in multiple threads; write operations may need to be serialized by the user to avoid data corruption. See threadsafety for more information."[1][2][3] The sqlite3.threadsafety attribute indicates the underlying SQLite library's threading mode, typically 2 (multi-thread) or 3 (serialized).[1][2] In multi-thread mode (common default), no single connection should be used simultaneously by multiple threads.[4] Even in serialized mode, Python's wrapper may require additional synchronization, as noted in CPython issues.[5] Secondary sources confirm: applications must use locks to serialize access to shared connections.[6] Concurrent writes without serialization risk database corruption or errors.[7][8][9]
Citations:
- 1: https://docs.python.org/3/library/sqlite3.html
- 2: https://docs.python.org/3.13/library/sqlite3.html
- 3: https://docs.python.org/3/library/sqlite3.html?highlight=connect
- 4: https://sqlite.org/threadsafe.html
- 5: sqlite3, `check_same_thread` should be `true`? python/cpython#128697
- 6: https://stackoverflow.com/questions/22739590/how-to-share-single-sqlite-connection-in-multi-threaded-python-application
- 7: Is `check_same_thread=False` Safe? fastapi/fastapi#5199
- 8: https://stackoverflow.com/questions/393554/python-sqlite3-and-concurrency
- 9: https://stackoverflow.com/questions/35717263/how-threadsafe-is-sqlite3
Serialize access to the shared cache connection.
check_same_thread=False only disables sqlite3's thread-affinity check—it does not make a single connection safe for concurrent execute() or commit() calls. Since this cache instance is shared across tool invocations, parallel get() and put() requests will race and risk data corruption or dropped writes. Wrap all database operations with a lock, or use a short-lived connection per operation.
🔒 One straightforward fix
```diff
+import threading
 ...
 class PersistentDocsCache:
@@
     def __init__(self, cache_path: Path, index_path: Path) -> None:
@@
         self._hits = self._misses = self._writes = 0
+        self._lock = threading.Lock()
         self._conn: sqlite3.Connection | None = None
@@
         try:
-            row = self._conn.execute(
-                "SELECT result_json FROM retrieved_docs_cache WHERE index_fingerprint = ? "
-                "AND version = ? AND slug = ? AND anchor = ? AND max_chars = ? AND start_index = ?",
-                (
-                    self._fingerprint,
-                    version,
-                    slug,
-                    self._anchor_key(anchor),
-                    max_chars,
-                    start_index,
-                ),
-            ).fetchone()
+            with self._lock:
+                row = self._conn.execute(
+                    "SELECT result_json FROM retrieved_docs_cache WHERE index_fingerprint = ? "
+                    "AND version = ? AND slug = ? AND anchor = ? AND max_chars = ? AND start_index = ?",
+                    (
+                        self._fingerprint,
+                        version,
+                        slug,
+                        self._anchor_key(anchor),
+                        max_chars,
+                        start_index,
+                    ),
+                ).fetchone()
@@
         try:
-            self._conn.execute(
-                "INSERT OR REPLACE INTO retrieved_docs_cache "
-                "(index_fingerprint, version, slug, anchor, max_chars, start_index, result_json) "
-                "VALUES (?, ?, ?, ?, ?, ?, ?)",
-                (
-                    self._fingerprint,
-                    result.version,
-                    result.slug,
-                    self._anchor_key(result.anchor),
-                    max_chars,
-                    start_index,
-                    result.model_dump_json(),
-                ),
-            )
-            self._conn.commit()
+            with self._lock:
+                self._conn.execute(
+                    "INSERT OR REPLACE INTO retrieved_docs_cache "
+                    "(index_fingerprint, version, slug, anchor, max_chars, start_index, result_json) "
+                    "VALUES (?, ?, ?, ?, ?, ?, ?)",
+                    (
+                        self._fingerprint,
+                        result.version,
+                        result.slug,
+                        self._anchor_key(result.anchor),
+                        max_chars,
+                        start_index,
+                        result.model_dump_json(),
+                    ),
+                )
+                self._conn.commit()
```
Also applies to: 66-99, 101-123
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/mcp_server_python_docs/services/persistent_cache.py` around lines 32 -
35, The shared sqlite connection (self._conn) is used with
check_same_thread=False but not protected against concurrent access; serialize
DB access by adding a threading.Lock (e.g., self._db_lock created where the
connection is initialized) and wrap every database operation (methods like
get(), put(), and any other places that call self._conn.execute/commit) inside a
with self._db_lock: block (or acquire/release) so execute()/commit() calls
cannot run concurrently; alternatively, change the implementation to open a
short-lived sqlite3.connect(...) per operation and close it after the
query—apply the same change for all DB use sites referenced around the
__init__/connection code and the blocks noted (lines ~66-99, 101-123).
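For comparison, here is a minimal sketch of the connection-per-operation alternative mentioned in the prompt. The class, table, and column names are simplified placeholders, not the PR's actual schema or API.

```python
import contextlib
import sqlite3
from pathlib import Path


class PerOperationCache:
    """Each call opens its own short-lived connection, so no lock is needed."""

    def __init__(self, cache_path: Path) -> None:
        self._cache_path = cache_path
        with contextlib.closing(self._connect()) as conn:
            conn.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)")
            conn.commit()

    def _connect(self) -> sqlite3.Connection:
        # A fresh connection is never shared across threads, so sqlite3's
        # default check_same_thread=True affinity check can stay enabled.
        return sqlite3.connect(self._cache_path)

    def get(self, key: str) -> str | None:
        with contextlib.closing(self._connect()) as conn:
            row = conn.execute("SELECT value FROM cache WHERE key = ?", (key,)).fetchone()
        return row[0] if row else None

    def put(self, key: str, value: str) -> None:
        with contextlib.closing(self._connect()) as conn:
            conn.execute("INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)", (key, value))
            conn.commit()
```

The trade-off is per-call connection overhead versus the lock's serialization; for a low-traffic docs cache either approach is typically fine.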
```python
def _assert_protocol_on_stdout_only(result: subprocess.CompletedProcess) -> list[dict]:
    """Assert JSON-RPC protocol frames are emitted only on stdout."""
    responses = _read_responses(result.stdout)
    assert responses, f"No JSON-RPC responses on stdout; stderr was: {result.stderr!r}"

    for resp in responses:
        assert "_raw" not in resp, f"Non-JSON stdout pollution: {resp.get('_raw')}"
        assert resp.get("jsonrpc") == "2.0", f"Missing JSON-RPC frame on stdout: {resp}"

    stderr_frames = _jsonrpc_frames(result.stderr)
    assert stderr_frames == [], f"JSON-RPC frames leaked to stderr: {stderr_frames}"
    return responses
```
Assert successful process exit in protocol helper.
_assert_protocol_on_stdout_only validates framing but can still pass if the server emits responses and then exits non-zero. Add a return-code check to keep smoke tests aligned with the “no crash” intent.
Suggested patch
```diff
 def _assert_protocol_on_stdout_only(result: subprocess.CompletedProcess) -> list[dict]:
     """Assert JSON-RPC protocol frames are emitted only on stdout."""
+    assert result.returncode == 0, (
+        f"Server exited with non-zero status {result.returncode}; stderr was: {result.stderr!r}"
+    )
     responses = _read_responses(result.stdout)
     assert responses, f"No JSON-RPC responses on stdout; stderr was: {result.stderr!r}"
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_stdio_smoke.py` around lines 151 - 162, The helper
_assert_protocol_on_stdout_only currently verifies JSON-RPC frames but doesn't
assert the process exit status; update this function to check result.returncode
== 0 (or use result.check_returncode()) and assert a successful exit with a
clear message so smoke tests fail if the server exited non-zero; locate the
function _assert_protocol_on_stdout_only and add a return-code assertion after
reading/parsing stdout/stderr responses.
Summary
Fixes #8
Fixes #6
Changes
- Persistently cache `get_docs` results so retrieved docs survive MCP client/server restarts.
- Key cache entries by version, slug, anchor, `max_chars`, and `start_index`.
- Invalidate the cache whenever `index.db` is rebuilt or replaced (see the fingerprint sketch below).
- Add `lookup_package_docs`, a controlled PyPI metadata lookup tool that returns only PyPI/project-declared docs/homepage/source URLs and makes the `pypi-declared-metadata` trust boundary explicit.
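As a sketch of the invalidation mechanism described above (the function name and hashing scheme here are assumptions; the PR's actual implementation may differ):

```python
import hashlib
from pathlib import Path


def index_fingerprint(index_path: Path) -> str:
    # Hash the index database bytes; rebuilding or replacing index.db yields a
    # new fingerprint, so cached rows keyed on the old one are no longer matched.
    digest = hashlib.sha256()
    with index_path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Cache rows are then keyed by
# (index_fingerprint, version, slug, anchor, max_chars, start_index),
# matching the columns shown in the review snippets above.
```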
- `uv run ruff check src/ tests/`
- `uv run pyright src/`
- `uv run pytest --tb=short -q` — 247 passed, 3 skipped

Notes
`lookup_package_docs` does not crawl or fetch arbitrary package documentation pages. It queries the official PyPI JSON API and returns declared source URLs only, so it stays inside the controlled trust boundary instead of becoming a generic web search.
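For reference, a minimal sketch of this kind of PyPI JSON API query; the function name and return shape are illustrative, not the tool's actual interface.

```python
import json
from urllib.request import urlopen


def declared_doc_urls(package: str) -> dict[str, str]:
    # The official PyPI JSON API: https://pypi.org/pypi/<name>/json
    with urlopen(f"https://pypi.org/pypi/{package}/json", timeout=10) as resp:
        info = json.load(resp)["info"]
    urls = dict(info.get("project_urls") or {})
    if info.get("home_page"):
        urls.setdefault("Homepage", info["home_page"])
    # Only URLs the project itself declared on PyPI are returned;
    # nothing is crawled or fetched beyond this single metadata call.
    return urls
```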
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Documentation
Tests