
feat: cache retrieved docs and add PyPI package docs lookup#9

Open
ayhammouda wants to merge 8 commits into main from fix/open-issues-cache-pypi-docs

Conversation

@ayhammouda
Owner

@ayhammouda ayhammouda commented May 7, 2026

Summary

Fixes #8
Fixes #6

Changes

  • Adds an on-disk SQLite cache for completed get_docs results so retrieved docs survive MCP client/server restarts.
  • Keys cached docs by local index fingerprint, resolved Python docs version, slug, anchor, max_chars, and start_index.
  • Ignores stale cache entries automatically after the local index.db is rebuilt or replaced.
  • Adds lookup_package_docs, a controlled PyPI metadata lookup tool that returns only PyPI/project-declared docs/homepage/source URLs and makes the pypi-declared-metadata trust boundary explicit.
  • Documents cache location, cached fields, invalidation behavior, and the PyPI lookup scope.

Testing

  • uv run ruff check src/ tests/
  • uv run pyright src/
  • uv run pytest --tb=short -q — 247 passed, 3 skipped

Notes

  • lookup_package_docs does not crawl or fetch arbitrary package documentation pages. It queries the official PyPI JSON API and returns declared source URLs only, so it stays inside the controlled trust boundary instead of becoming a generic web search.
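The controlled scope described above can be sketched like this. The endpoint (`https://pypi.org/pypi/<name>/json`) and the `project_urls`/`home_page` metadata fields are real PyPI JSON API features, but the allowlist contents, size bound, and function names here are assumptions for illustration, not the PR's implementation:

```python
import json
from urllib.request import urlopen

# Assumed allowlist and size bound; the PR's actual values may differ.
ALLOWED_LABELS = {"documentation", "homepage", "source", "repository"}
MAX_BYTES = 1_000_000


def extract_declared_urls(info: dict) -> list[tuple[str, str]]:
    # Only project-declared metadata is considered; nothing is crawled.
    urls = dict(info.get("project_urls") or {})
    if info.get("home_page"):
        urls.setdefault("Homepage", info["home_page"])
    return [
        (label, url)
        for label, url in urls.items()
        if url and label.lower() in ALLOWED_LABELS
    ]


def lookup_package_docs(name: str) -> list[tuple[str, str]]:
    # Official PyPI JSON API only -- this is the tool's trust boundary.
    with urlopen(f"https://pypi.org/pypi/{name}/json", timeout=10) as resp:
        raw = resp.read(MAX_BYTES + 1)
    if len(raw) > MAX_BYTES:
        raise ValueError("oversized PyPI response rejected")
    return extract_declared_urls(json.loads(raw)["info"])
```

Filtering on declared labels, rather than following the URLs, is what keeps the tool from drifting into a generic web search.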

Summary by CodeRabbit

Release Notes

  • New Features

    • Added PyPI package documentation lookup tool returning controlled package-declared sources
    • Implemented persistent caching for doc retrieval results across server restarts
  • Bug Fixes

    • Fixed MCP protocol stdio framing to output JSON-RPC exclusively to stdout
  • Documentation

    • Updated README with PyPI package lookup and retrieved docs cache information
  • Tests

    • Added validation tests for persistent caching, PyPI metadata lookup, and protocol framing

@coderabbitai

coderabbitai Bot commented May 8, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

Adds persistent SQLite caching for get_docs results across server restarts, introduces PyPI-restricted package metadata lookup with controlled source filtering, and updates symbol search to resolve slug/anchor from database instead of URI. Includes transport fix for MCP stdout consistency and comprehensive test coverage.

Changes

Persistent Docs Cache and PyPI Package Lookup

  • Data Models — src/mcp_server_python_docs/models.py
    New Pydantic models PackageDocsInput, PackageDocsSource, and PackageDocsResult define the contract for PyPI package metadata lookup with trust-boundary and source labeling.
  • Persistent Cache Implementation — src/mcp_server_python_docs/services/persistent_cache.py
    SQLite-backed cache for get_docs results keyed by index fingerprint, version, slug, anchor, budget, and pagination; degrades gracefully on corruption; tracks hit/miss/write statistics.
  • PyPI Service — src/mcp_server_python_docs/services/package_docs.py
    Fetches official PyPI JSON metadata with a bounded response size, extracts and allowlists declared documentation/homepage/source/repository URLs, and returns controlled-scope error notes for missing packages and network failures.
  • Service Integration — src/mcp_server_python_docs/services/content.py, src/mcp_server_python_docs/app_context.py
    ContentService now checks the persistent cache before computing get_docs and stores results after truncation; AppContext is extended with PackageDocsService and PersistentDocsCache dependencies.
  • Server Wiring — src/mcp_server_python_docs/server.py
    Initializes the persistent cache in lifespan, injects it into ContentService, registers the new async lookup_package_docs MCP tool with openWorldHint, and manages the cache lifecycle (closed on shutdown before the DB).
  • Symbol Resolution — src/mcp_server_python_docs/retrieval/ranker.py
    Adds helpers that resolve symbol URIs into database-backed (slug, anchor) pairs via document/section queries; search_symbols and lookup_symbols_exact now use resolved locations instead of deriving them from the URI.
  • Transport Fix & Protocol Tests — src/mcp_server_python_docs/__main__.py, tests/test_stdio_smoke.py
    Restores sys.stdout to sys.__stdout__ after fd restoration; adds stricter JSON-RPC frame validation asserting that all protocol output goes to stdout and none to stderr.
  • Persistent Cache Tests — tests/test_persistent_docs_cache.py
    Validates cache persistence across restarts, version-key isolation, index-change invalidation, the anchor null/empty distinction, budget/pagination keying, and best-effort fallback for a corrupted cache or invalid JSON.
  • PyPI Service Tests — tests/test_package_docs.py
    Verifies official PyPI endpoint usage, trust-boundary application, allowlisted source filtering, missing-package 404 handling, non-404 HTTP errors, oversized-response rejection, and network/JSON decode error reporting.
  • MCP Smoke Tests — tests/test_mcp_get_docs_cache_smoke.py
    Subprocess-based integration test: verifies full-page and section-scoped get_docs, confirms cache-row creation, restarts the server to validate cache reuse, checks error handling for invalid anchors, and confirms fallback after cache corruption.
  • Tool Registration Tests — tests/test_services.py
    Updated to assert that five MCP tools are registered; adds a test for symbol hits on extensionless document slugs.
  • Documentation — README.md, PR9_MCP_TEST_PLAN.md, PR9_SMOKE_FAILURE_FIX_PLAN.md
    The README describes five tools and adds "Retrieved docs cache" and "PyPI package docs lookup" sections with cache keys and trust boundaries; the test plan specifies MCP-level validation; the fix plan documents root causes and the retest strategy.
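The transport fix mentioned above (restoring sys.stdout to sys.__stdout__ after fd restoration) can be sketched roughly as follows. The function names and the redirect-to-stderr setup shown here are assumptions for illustration; only the final `sys.stdout = sys.__stdout__` rebinding reflects the fix the walkthrough describes:

```python
import os
import sys


def redirect_stdout_for_setup() -> int:
    # Assumed setup step: route fd 1 to stderr during startup so any
    # accidental prints cannot corrupt the JSON-RPC framing on stdout.
    saved_fd = os.dup(1)
    os.dup2(2, 1)
    return saved_fd


def restore_stdout(saved_fd: int) -> None:
    # Restore the original fd 1 ...
    os.dup2(saved_fd, 1)
    os.close(saved_fd)
    # ... but the fd alone is not enough: sys.stdout may still wrap the
    # redirected stream. Rebinding it is the part this PR's fix adds.
    sys.stdout = sys.__stdout__
```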

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

The PR introduces two independent feature areas (persistent caching and PyPI lookup) with significant logic density across multiple services, database schema changes, subprocess-based integration tests, and careful transport/framing fixes. The changes span models, services, server wiring, retrieval logic, and comprehensive test suites with heterogeneous concerns (caching, HTTP fetching, SQLite operations, MCP protocol). While individual components are well-scoped, the interdependencies between cache/content/server and the breadth of test scenarios require careful attention to correctness and edge cases.

Poem

🐰 A rabbit hops through docs with glee,
Caching pages for the next spree,
PyPI's official truth now near,
Persistent across restarts, crystal clear!
Protocol streams pure from stdout's mouth,
No stderr chaos, just the honest route.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 29.76%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Description Check — ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed. The PR title accurately summarizes the two main changes: persistent caching of retrieved docs and the addition of PyPI package docs lookup functionality.
  • Linked Issues Check — ✅ Passed. The PR fully implements both linked issues: #8 (persistent cache with version/slug/anchor/budget keys, index-fingerprint invalidation, cache fallback) and #6 (PyPI metadata lookup for package-declared docs/homepage/source URLs with an explicit trust boundary).
  • Out of Scope Changes Check — ✅ Passed. All changes are directly scoped to the linked issues: the persistent cache implementation, the PyPI service, related models/tests, documentation updates, and supporting infrastructure (stdout-hygiene fix, test utilities).






@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (2)
tests/test_mcp_get_docs_cache_smoke.py (1)

10-16: ⚡ Quick win

Extract shared stdio helpers into a dedicated test utility module.

Importing private helpers from another test module tightly couples these tests and makes future refactors noisier. Move these helpers into a shared tests support module and import both smoke suites from there.

Refactor sketch
-from tests.test_stdio_smoke import (
+from tests.support.stdio_helpers import (
     _assert_protocol_on_stdout_only,
     _find_response,
     _isolated_cache_env,
     _make_notification,
     _make_request,
 )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_mcp_get_docs_cache_smoke.py` around lines 10 - 16, Move the
private stdio helper functions into a shared test helpers module and update
imports: create a new tests support module (e.g., tests.test_helpers or
tests._helpers) and relocate the functions _assert_protocol_on_stdout_only,
_find_response, _isolated_cache_env, _make_notification, and _make_request into
that module; then change imports in this test (and in tests that currently
import from tests.test_stdio_smoke) to import those helpers from the new module
instead of importing them from _stdio_smoke. Ensure the new module exports the
same helper names and update any relative import paths in affected tests to
reference the new module.
PR9_MCP_TEST_PLAN.md (1)

23-26: ⚡ Quick win

Make setup steps environment-agnostic.

The hardcoded workspace path and branch-specific checkout make this plan hard to run outside your local/review environment. Prefer generic “repo root” instructions plus optional branch notes.

Proposed edit
-cd /srv/openclaw/.openclaw/workspace/tmp/python-docs-mcp-review
-git checkout fix/open-issues-cache-pypi-docs || git checkout review-pr-9
-git pull --ff-only origin fix/open-issues-cache-pypi-docs
+cd <repo-root>
+# optional: checkout the PR branch under test
+# git checkout fix/open-issues-cache-pypi-docs
+git pull --ff-only
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@PR9_MCP_TEST_PLAN.md` around lines 23 - 26, The setup steps in
PR9_MCP_TEST_PLAN.md are environment-specific—replace the hardcoded path "cd
/srv/openclaw/.openclaw/workspace/tmp/python-docs-mcp-review" with a generic
repo-root instruction (e.g., "cd <repo-root> or open project root"), and change
the branch checkout commands "git checkout fix/open-issues-cache-pypi-docs ||
git checkout review-pr-9" to suggest a generic branch instruction (e.g., "git
fetch && git checkout <branch-or-PR-branch> (or use the PR branch name)") plus
an optional note showing the exact branch names used for this review; keep "git
pull --ff-only origin <branch>" and "uv sync --all-extras" but parametrize the
branch placeholder so the plan is runnable in any environment.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 1968fe20-5be0-4725-92d0-2853ce1948b3

📥 Commits

Reviewing files that changed from the base of the PR and between 8f6736c and 694c205.

📒 Files selected for processing (17)
  • PR9_MCP_TEST_PLAN.md
  • PR9_SMOKE_FAILURE_FIX_PLAN.md
  • README.md
  • src/mcp_server_python_docs/__main__.py
  • src/mcp_server_python_docs/app_context.py
  • src/mcp_server_python_docs/models.py
  • src/mcp_server_python_docs/retrieval/ranker.py
  • src/mcp_server_python_docs/server.py
  • src/mcp_server_python_docs/services/content.py
  • src/mcp_server_python_docs/services/package_docs.py
  • src/mcp_server_python_docs/services/persistent_cache.py
  • tests/conftest.py
  • tests/test_mcp_get_docs_cache_smoke.py
  • tests/test_package_docs.py
  • tests/test_persistent_docs_cache.py
  • tests/test_services.py
  • tests/test_stdio_smoke.py

Comment thread PR9_MCP_TEST_PLAN.md

- Ruff passes.
- Pyright passes for `src/`.
- Pytest passes: currently expected `254 passed, 3 skipped`.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the expected pytest totals to match this PR context.

Line 42 says 254 passed, 3 skipped, but the PR summary for this change set reports 247 passed, 3 skipped. This can cause false FAIL interpretation during manual validation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@PR9_MCP_TEST_PLAN.md` at line 42, Update the pytest summary line in
PR9_MCP_TEST_PLAN.md that currently reads "254 passed, 3 skipped" to reflect the
PR's actual test totals "247 passed, 3 skipped" so the documented expected
results match the reported CI/PR summary and avoid false FAILs.

Comment thread PR9_MCP_TEST_PLAN.md

## Test Area 2 — `get_docs` Functional Retrieval

### 2.1 Full page retrieval


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use hyphenated compound adjectives for consistency.

Consider “Full-page retrieval” and “page-not-found-style response” for clearer technical writing.

Also applies to: 111-111


🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@PR9_MCP_TEST_PLAN.md` at line 71, Change the headings and inline phrases to
use hyphenated compound adjectives for consistency: replace "Full page
retrieval" with "Full-page retrieval" and any occurrences of "page not found
style response" or similar with "page-not-found-style response" (check the
heading at "### 2.1 Full page retrieval" and the repeated occurrence around
lines noted as 111-111) so all compound adjectives in PR9_MCP_TEST_PLAN.md
follow the hyphenated style.

Comment on lines +83 to +99
    if doc_row is None:
        return fallback_slug, fallback_anchor

    if fallback_anchor:
        section_row = conn.execute(
            """
            SELECT anchor
            FROM sections
            WHERE document_id = ? AND anchor = ?
            LIMIT 1
            """,
            (doc_row["id"], fallback_anchor),
        ).fetchone()
        if section_row is not None:
            return doc_row["slug"], section_row["anchor"] or ""

    return doc_row["slug"], ""


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't emit anchor="" for document-level symbol hits.

When _resolve_symbol_location() falls back to a page-only location, both symbol search paths surface an empty-string anchor. get_docs() treats any non-None anchor as a section lookup, so forwarding that hit verbatim turns a retrievable page hit into PageNotFoundError. Return None for “no anchor”, or normalize blank anchors before exposing them downstream.

Also applies to: 235-247, 352-365

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/mcp_server_python_docs/retrieval/ranker.py` around lines 83 - 99, The
current _resolve_symbol_location logic returns empty string "" for a missing
anchor which makes get_docs treat it as a section hit; change the return value
to use None (or normalize blank anchors to None) when there is no valid section
anchor. Specifically, in the branch that checks section_row and the final
return, return doc_row["slug"], None (or convert section_row["anchor"] or ""
into None) instead of "", and apply the same change in the other occurrences
referenced (the other blocks around lines with similar logic) so doc_row,
fallback_anchor, section_row and _resolve_symbol_location consistently return
None for “no anchor.”

Comment on lines +32 to +35
        try:
            self._cache_path.parent.mkdir(parents=True, exist_ok=True)
            self._conn = sqlite3.connect(str(self._cache_path), check_same_thread=False)
            self._conn.execute("PRAGMA synchronous = NORMAL")


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Per the Python sqlite3 documentation, does setting check_same_thread=False make a single connection safe for concurrent execute()/commit() calls from multiple threads, or must access still be serialized by the application?

💡 Result:

No, setting check_same_thread=False does not make a single sqlite3 connection safe for concurrent execute() or commit() calls from multiple threads. Access must still be serialized by the application, particularly for write operations, to avoid data corruption.[1][2][3] The official Python sqlite3 documentation states: "If False, the connection may be accessed in multiple threads; write operations may need to be serialized by the user to avoid data corruption. See threadsafety for more information."[1][2][3] The sqlite3.threadsafety attribute indicates the underlying SQLite library's threading mode, typically 2 (multi-thread) or 3 (serialized).[1][2] In multi-thread mode (common default), no single connection should be used simultaneously by multiple threads.[4] Even in serialized mode, Python's wrapper may require additional synchronization, as noted in CPython issues.[5] Secondary sources confirm: applications must use locks to serialize access to shared connections.[6] Concurrent writes without serialization risk database corruption or errors.[7][8][9]


Serialize access to the shared cache connection.

check_same_thread=False only disables sqlite3's thread-affinity check—it does not make a single connection safe for concurrent execute() or commit() calls. Since this cache instance is shared across tool invocations, parallel get() and put() requests will race and risk data corruption or dropped writes. Wrap all database operations with a lock, or use a short-lived connection per operation.

🔒 One straightforward fix
+import threading
...
 class PersistentDocsCache:
@@
     def __init__(self, cache_path: Path, index_path: Path) -> None:
@@
         self._hits = self._misses = self._writes = 0
+        self._lock = threading.Lock()
         self._conn: sqlite3.Connection | None = None
@@
         try:
-            row = self._conn.execute(
-                "SELECT result_json FROM retrieved_docs_cache WHERE index_fingerprint = ? "
-                "AND version = ? AND slug = ? AND anchor = ? AND max_chars = ? AND start_index = ?",
-                (
-                    self._fingerprint,
-                    version,
-                    slug,
-                    self._anchor_key(anchor),
-                    max_chars,
-                    start_index,
-                ),
-            ).fetchone()
+            with self._lock:
+                row = self._conn.execute(
+                    "SELECT result_json FROM retrieved_docs_cache WHERE index_fingerprint = ? "
+                    "AND version = ? AND slug = ? AND anchor = ? AND max_chars = ? AND start_index = ?",
+                    (
+                        self._fingerprint,
+                        version,
+                        slug,
+                        self._anchor_key(anchor),
+                        max_chars,
+                        start_index,
+                    ),
+                ).fetchone()
@@
         try:
-            self._conn.execute(
-                "INSERT OR REPLACE INTO retrieved_docs_cache "
-                "(index_fingerprint, version, slug, anchor, max_chars, start_index, result_json) "
-                "VALUES (?, ?, ?, ?, ?, ?, ?)",
-                (
-                    self._fingerprint,
-                    result.version,
-                    result.slug,
-                    self._anchor_key(result.anchor),
-                    max_chars,
-                    start_index,
-                    result.model_dump_json(),
-                ),
-            )
-            self._conn.commit()
+            with self._lock:
+                self._conn.execute(
+                    "INSERT OR REPLACE INTO retrieved_docs_cache "
+                    "(index_fingerprint, version, slug, anchor, max_chars, start_index, result_json) "
+                    "VALUES (?, ?, ?, ?, ?, ?, ?)",
+                    (
+                        self._fingerprint,
+                        result.version,
+                        result.slug,
+                        self._anchor_key(result.anchor),
+                        max_chars,
+                        start_index,
+                        result.model_dump_json(),
+                    ),
+                )
+                self._conn.commit()

Also applies to: 66-99, 101-123

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/mcp_server_python_docs/services/persistent_cache.py` around lines 32 -
35, The shared sqlite connection (self._conn) is used with
check_same_thread=False but not protected against concurrent access; serialize
DB access by adding a threading.Lock (e.g., self._db_lock created where the
connection is initialized) and wrap every database operation (methods like
get(), put(), and any other places that call self._conn.execute/commit) inside a
with self._db_lock: block (or acquire/release) so execute()/commit() calls
cannot run concurrently; alternatively, change the implementation to open a
short-lived sqlite3.connect(...) per operation and close it after the
query—apply the same change for all DB use sites referenced around the
__init__/connection code and the blocks noted (lines ~66-99, 101-123).

Comment thread tests/test_stdio_smoke.py
Comment on lines +151 to +162
def _assert_protocol_on_stdout_only(result: subprocess.CompletedProcess) -> list[dict]:
    """Assert JSON-RPC protocol frames are emitted only on stdout."""
    responses = _read_responses(result.stdout)
    assert responses, f"No JSON-RPC responses on stdout; stderr was: {result.stderr!r}"

    for resp in responses:
        assert "_raw" not in resp, f"Non-JSON stdout pollution: {resp.get('_raw')}"
        assert resp.get("jsonrpc") == "2.0", f"Missing JSON-RPC frame on stdout: {resp}"

    stderr_frames = _jsonrpc_frames(result.stderr)
    assert stderr_frames == [], f"JSON-RPC frames leaked to stderr: {stderr_frames}"
    return responses


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert successful process exit in protocol helper.

_assert_protocol_on_stdout_only validates framing but can still pass if the server emits responses and then exits non-zero. Add a return-code check to keep smoke tests aligned with the “no crash” intent.

Suggested patch
 def _assert_protocol_on_stdout_only(result: subprocess.CompletedProcess) -> list[dict]:
     """Assert JSON-RPC protocol frames are emitted only on stdout."""
+    assert result.returncode == 0, (
+        f"Server exited with non-zero status {result.returncode}; stderr was: {result.stderr!r}"
+    )
     responses = _read_responses(result.stdout)
     assert responses, f"No JSON-RPC responses on stdout; stderr was: {result.stderr!r}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_stdio_smoke.py` around lines 151 - 162, The helper
_assert_protocol_on_stdout_only currently verifies JSON-RPC frames but doesn't
assert the process exit status; update this function to check result.returncode
== 0 (or use result.check_returncode()) and assert a successful exit with a
clear message so smoke tests fail if the server exited non-zero; locate the
function _assert_protocol_on_stdout_only and add a return-code assertion after
reading/parsing stdout/stderr responses.



Development

Successfully merging this pull request may close these issues.

  • Cache retrieved docs across AI sessions by Python version
  • Add official PyPI package-docs lookup

1 participant