
feat: Add setup_statements parameter to execute_sql for session setup #96

Draft

robertlacok wants to merge 3 commits into main from robert/sal-51-reuse-engine-per-cell

Conversation


@robertlacok robertlacok commented Apr 23, 2026

Summary

  • Add an optional setup_statements: list[str] parameter to execute_sql and execute_sql_with_connection_json.
  • When provided, each statement is run via connection.exec_driver_sql on the same DBAPI connection inside the same engine.begin() block as the main query, so any session state it sets (Snowflake USE WAREHOUSE, USE ROLE, SET ..., etc.) is in effect for the main query.
  • Single call, single engine, single tunnel, single connection. Engine lifecycle is unchanged from today.

Why

Each execute_sql* call currently does create_engine(...) and then engine.dispose() in finally. That makes it impossible to run a sequence like

USE WAREHOUSE abc;
SELECT 123

as two separate execute_sql calls and have the second inherit session state from the first — the connection is gone before the second call runs.

setup_statements solves that without changing engine, pool, or SSH tunnel lifecycle: the setup statements piggy-back on the same connection the main query uses.

How

  • New setup_statements kwarg on execute_sql and execute_sql_with_connection_json, plumbed through _execute_sql_with_caching and _query_data_source to _execute_sql_on_engine.
  • Inside _execute_sql_on_engine, before pandas.read_sql_query:
    with engine.begin() as connection:
        for stmt in setup_statements or []:
            connection.exec_driver_sql(stmt)
        ...
        return pd.read_sql_query(query, ...)
  • Setup statements are not Jinja-rendered, parameter-bound, or audit-commented — they're raw SQL executed in order. A failing setup statement aborts the main query (the engine.begin() rolls back).
  • DuckDB branch in _execute_sql_with_caching runs setup statements on the process-wide DuckDB singleton before the main query, for API parity (DuckDB session state already persists across calls anyway).
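The same-connection requirement behind the bullets above can be sketched without SQLAlchemy at all. Below is a hypothetical stand-in for `_execute_sql_on_engine` against SQLite's DB-API: a TEMP table is per-connection session state (like Snowflake's `USE WAREHOUSE`), so the main query only sees it because the setup statements ran on the very connection that executes it. The helper name is illustrative, not the toolkit's API.

```python
import sqlite3


def execute_with_setup(database, query, setup_statements=None):
    # Hypothetical stand-in for _execute_sql_on_engine: setup statements
    # run first, in order, on the same connection as the main query.
    conn = sqlite3.connect(database)
    try:
        for stmt in setup_statements or []:
            conn.execute(stmt)  # raw SQL; a failure aborts before the query
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# TEMP tables are per-connection session state, analogous to USE WAREHOUSE.
rows = execute_with_setup(
    ":memory:",
    "SELECT n FROM session_scratch",
    setup_statements=[
        "CREATE TEMP TABLE session_scratch (n INTEGER)",
        "INSERT INTO session_scratch VALUES (123)",
    ],
)
```

Running the two setup statements as separate `execute_with_setup` calls would fail for the same reason the PR description gives: the connection is gone before the second call runs.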

Test plan

  • New tests in tests/unit/test_sql_execution_internal.py:
    • setup statements execute in order, on the same connection, before the main query
    • empty list and None are no-ops
    • a failing setup statement propagates and the main query is not run
  • New tests in tests/unit/test_sql_execution.py:
    • setup_statements is plumbed from execute_sql down to _query_data_source
    • default is None
  • poetry run pytest tests/unit/ — 755 passed (excludes pre-existing pyspark collection errors in test_chart.py / test_ocelots.py).
  • poetry run black and poetry run flake8 clean.

Usage

df = _dntk.execute_sql(
    "SELECT 123",
    "SQL_INTEG",
    setup_statements=["USE WAREHOUSE abc"],
)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added support for setup SQL statements that execute before the main query on the same database connection. Setup statements run sequentially in the order provided, and any execution failure prevents the main query from running.

…-51)

Cache the SQLAlchemy engine per IPython cell execution so that two
back-to-back `_dntk.execute_sql*` calls in the same generated cell share
the same physical DBAPI connection. This lets Snowflake session state
set by `USE WAREHOUSE`, `USE ROLE`, `SET ...` etc. persist onto the next
query, which unblocks supporting `USE WAREHOUSE abc; SELECT 123` in
Snowflake SQL blocks.

Cached engines are created with `pool_size=1, max_overflow=0` so
`engine.begin()` always checks out the same connection. Disposed via a
`post_run_cell` IPython hook. Caching is skipped for SSH-tunneled
engines, user-supplied custom pool config, and when no IPython shell is
available (script/CLI use) — all of those fall back to the previous
create-then-dispose-per-call behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@robertlacok robertlacok requested a review from a team as a code owner April 23, 2026 11:45
@robertlacok robertlacok requested a review from m1so April 23, 2026 11:45

linear Bot commented Apr 23, 2026


coderabbitai Bot commented Apr 23, 2026

📝 Walkthrough

Walkthrough

This PR introduces a setup_statements parameter to the SQL execution APIs (execute_sql_with_connection_json and execute_sql). The parameter accepts a list of SQL statements that execute in order immediately before the main query on the same underlying connection. For DuckDB, setup statements run via the existing execute_duckdb_sql function; for SQLAlchemy-backed engines, they execute within the transaction block using connection.exec_driver_sql before the main query. If any setup statement fails, the main query is prevented from executing. The parameter is threaded through five internal functions and covered by unit tests verifying correct execution order, no-op behavior when absent, and error propagation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 35.71%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Updates Docs (⚠️ Warning): the PR adds the setup_statements parameter to public API functions without updating documentation. Resolution: describe setup_statements, its behavior, and usage examples in docs/user/configuration.md or README.md.

✅ Passed checks (4 passed)

  • Description Check (✅ Passed): check skipped; CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): the title accurately describes the main change of adding a setup_statements parameter to execute_sql for session initialization.
  • Linked Issues check (✅ Passed): check skipped because no linked issues were found.
  • Out of Scope Changes check (✅ Passed): check skipped because no linked issues were found.



github-actions Bot commented Apr 23, 2026

📦 Python package built successfully!

  • Version: 2.2.1.dev4+c040781
  • Wheel: deepnote_toolkit-2.2.1.dev4+c040781-py3-none-any.whl
  • Install:
    pip install "deepnote-toolkit @ https://deepnote-staging-runtime-artifactory.s3.amazonaws.com/deepnote-toolkit-packages/2.2.1.dev4%2Bc040781/deepnote_toolkit-2.2.1.dev4%2Bc040781-py3-none-any.whl"


codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 66.66667% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.25%. Comparing base (f2ac75a) to head (029e245).
✅ All tests successful. No failed tests found.

Files with missing lines:
  • deepnote_toolkit/sql/sql_execution.py: 66.66% patch coverage (1 missing, 1 partial) ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #96      +/-   ##
==========================================
- Coverage   74.32%   74.25%   -0.08%     
==========================================
  Files          94       94              
  Lines        5535     5539       +4     
  Branches      824      826       +2     
==========================================
- Hits         4114     4113       -1     
- Misses       1155     1158       +3     
- Partials      266      268       +2     
Flag Coverage Δ
combined 74.25% <66.66%> (-0.08%) ⬇️


@robertlacok robertlacok marked this pull request as draft April 23, 2026 11:47

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deepnote_toolkit/ipython_utils.py (1)

6-34: 🛠️ Refactor suggestion | 🟠 Major

Type the hook registry and callback explicitly.

Lines 6-18 add a new hook API, but both _registered_post_run_cell_callbacks and callback are untyped. That weakens static checking around event registration.

Suggested fix
+from collections.abc import Callable
+from typing import Any
+
-_registered_post_run_cell_callbacks: set = set()
+_registered_post_run_cell_callbacks: set[Callable[..., Any]] = set()
 
-
-def register_post_run_cell_hook(callback) -> bool:
+def register_post_run_cell_hook(callback: Callable[..., Any]) -> bool:

As per coding guidelines, "Use explicit type hints for function parameters and return values" and "Use type hints consistently".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/ipython_utils.py` around lines 6 - 34, Add explicit type
hints for the hook registry and callback: import typing symbols (Set, Callable,
Any) and change the module-level variable to
_registered_post_run_cell_callbacks: Set[Callable[..., Any]] = set(); update the
register_post_run_cell_hook signature to def
register_post_run_cell_hook(callback: Callable[..., Any]) -> bool: so static
type checkers know the stored callable shape when registering with ip.events.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b6e1c41b-ba25-4588-aa46-4a16f4340388

📥 Commits

Reviewing files that changed from the base of the PR and between f2ac75a and c716237.

📒 Files selected for processing (4)
  • deepnote_toolkit/ipython_utils.py
  • deepnote_toolkit/sql/sql_execution.py
  • tests/unit/test_ipython_utils.py
  • tests/unit/test_sql_execution_internal.py

Comment thread deepnote_toolkit/sql/sql_execution.py Outdated
Comment on lines +465 to +485
def _compute_engine_cache_key(sql_alchemy_dict) -> Optional[str]:
    """Stable key identifying the engine for *sql_alchemy_dict*, or None when
    this connection should not be cached.

    SSH-tunneled engines can't be cached because the tunnel only lives inside
    `_create_sql_ssh_uri`'s context manager. Connections that already specify
    a custom pool config in user `params` are also skipped to avoid clashing
    with the `pool_size=1` we set on cached engines.
    """
    if sql_alchemy_dict.get("ssh_options", {}).get("enabled"):
        return None

    params = sql_alchemy_dict.get("params") or {}
    if "pool_size" in params or "max_overflow" in params or "poolclass" in params:
        return None

    integration_id = sql_alchemy_dict.get("integration_id")
    if integration_id:
        return f"integration:{integration_id}"

    return f"url:{sql_alchemy_dict['url']}"


⚠️ Potential issue | 🟠 Major

Include engine-defining params in the cache key.

Line 481 collapses the key to integration_id, and Line 485 falls back to just the raw URL. create_engine() also depends on params, so two calls in one cell can incorrectly share an engine even when connect_args, auth headers, or other connection options differ. That can silently run the second query with the wrong connection/session configuration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/sql/sql_execution.py` around lines 465 - 485, The cache key
in _compute_engine_cache_key currently uses only ssh_options, pool-related
params, integration_id, or url which ignores other engine-defining options;
update it to incorporate a deterministic serialization of the relevant
connection params (e.g., the params dict and any connect_args/auth headers
present in sql_alchemy_dict) into the returned key while preserving the existing
early returns for ssh_options-enabled and explicit pool config; specifically,
when integration_id exists include it plus a stable encoding (sorted/JSON) of
params/connect_args in the key (and if no integration_id, include the URL plus
the same stable encoding) so that differing params produce distinct cache keys.


deepnote-bot commented Apr 23, 2026

🚀 Review App Deployment Started

  • 🌍 Review application: ra-96
  • Last deployed: 2026-04-23 12:40:59 (UTC)
  • 📜 Deployed commit: b4fc42a6593b8a694f0ada1e9a5993aa72dde677
  • 🛠️ Toolkit version: c040781

Drops the implicit cache + IPython post_run_cell hook in favor of an
explicit `_dntk.sql_session()` context manager. Inside the with-block
each unique connection (keyed by integration_id, or URL when not
present) opens its SSH tunnel and SQLAlchemy engine once and reuses
both for every `execute_sql*` call inside the block. On exit the
session disposes the engine and closes the tunnel together.

This makes SSH tunnels work with SAL-51 reuse — the previous cache
had to skip SSH because the tunnel's lifetime was bound to a single
call by `_create_sql_ssh_uri`'s context manager. With the session
owning both, the tunnel's lifetime now matches the engine's.

Other gains:
- No IPython coupling. Lifetime is whatever the caller (deepnote-
  internal codegen) wraps in the `with` block.
- No "magic" cache key vs. live state divergence — outside a session
  behavior is exactly the previous per-call create+dispose.
- Nested `sql_session()` blocks no-op on the inner block so the outer
  retains ownership.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
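The nesting rule this commit describes (inner `sql_session()` blocks no-op so the outer block retains ownership) can be sketched with a plain context manager. Names and the dict-shaped cache are illustrative, not the toolkit's implementation.

```python
from contextlib import contextmanager

_session_cache = None  # key -> (engine, tunnel) pairs in the real code


@contextmanager
def sql_session():
    # The outermost block owns the cache; nested blocks reuse it and do
    # not dispose anything, so resources live until the outer block exits.
    global _session_cache
    if _session_cache is not None:
        yield _session_cache  # inner block: no-op, outer keeps ownership
        return
    _session_cache = {}
    try:
        yield _session_cache
    finally:
        # Real code would dispose each engine and close each tunnel here.
        _session_cache = None


with sql_session() as outer:
    outer["integration:abc"] = "engine"
    with sql_session() as inner:  # nested: same cache as the outer block
        reused = inner is outer
closed = _session_cache is None  # everything released on outer exit
```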
@robertlacok robertlacok changed the title feat: Reuse SQLAlchemy engine across execute_sql calls in a cell (SAL-51) feat: Add _dntk.sql_session() to share engine + SSH tunnel across calls (SAL-51) Apr 23, 2026
@robertlacok robertlacok changed the title feat: Add _dntk.sql_session() to share engine + SSH tunnel across calls (SAL-51) feat: Add sql_session() to share engine + SSH tunnel across calls Apr 23, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
deepnote_toolkit/sql/sql_execution.py (1)

475-488: ⚠️ Potential issue | 🟠 Major

Expand the session cache key beyond integration_id / URL.

_session_resource_key() still aliases every call for the same integration or raw URL. Inside one sql_session(), that lets a later call reuse an engine and SSH tunnel created with different params, auth headers, or SSH settings.

Possible fix
 def _session_resource_key(sql_alchemy_dict) -> Optional[str]:
     params = sql_alchemy_dict.get("params") or {}
     if "pool_size" in params or "max_overflow" in params or "poolclass" in params:
         return None
-
-    integration_id = sql_alchemy_dict.get("integration_id")
-    if integration_id:
-        return f"integration:{integration_id}"
-
-    return f"url:{sql_alchemy_dict['url']}"
+    key_payload = {
+        "integration_id": sql_alchemy_dict.get("integration_id"),
+        "url": sql_alchemy_dict["url"],
+        "params": params,
+        "ssh_options": sql_alchemy_dict.get("ssh_options"),
+    }
+    return json.dumps(key_payload, sort_keys=True, default=str)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/sql/sql_execution.py` around lines 475 - 488,
_session_resource_key currently returns the same key for all connections sharing
an integration_id or URL, allowing reuse of engines/tunnels with different
connection-specific settings; change _session_resource_key(sql_alchemy_dict) to
include a stable fingerprint of the connection-specific attributes (e.g.,
serialize and include params, auth headers, and SSH settings such as
sql_alchemy_dict.get("params"), sql_alchemy_dict.get("headers") or auth token
field, and sql_alchemy_dict.get("ssh_config")/ssh-related keys) when computing
the key (but keep the existing early return None for explicit
pool_size/max_overflow/poolclass). Use a deterministic serialization or short
hash of those fields to build the returned string (e.g.,
f"integration:{integration_id}:{hash}" or f"url:{url}:{hash}") so
engines/tunnels are only reused when all relevant settings match.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 22352dcb-ac84-4524-b98f-c797b6e3f455

📥 Commits

Reviewing files that changed from the base of the PR and between c716237 and 6762926.

📒 Files selected for processing (3)
  • deepnote_toolkit/__init__.py
  • deepnote_toolkit/sql/sql_execution.py
  • tests/unit/test_sql_execution_internal.py

Comment thread deepnote_toolkit/sql/sql_execution.py Outdated
Comment on lines +365 to +399
def _open_ssh_tunnel(sql_alchemy_dict) -> tuple[Any, Any]:
    """Open an SSH tunnel for *sql_alchemy_dict* and return ``(server, url)``.

    Caller is responsible for eventually closing the returned server with
    ``_close_ssh_tunnel``.
    """
    base64_encoded_key = dnenv.get_env("PRIVATE_SSH_KEY_BLOB")
    if not base64_encoded_key:
        raise Exception(
            "The private key needed to establish the SSH connection is missing. Please try again or contact support."
        )
    original_url = make_url(sql_alchemy_dict["url"])
    server = create_ssh_tunnel(
        ssh_host=sql_alchemy_dict["ssh_options"]["host"],
        ssh_port=int(sql_alchemy_dict["ssh_options"]["port"]),
        ssh_user=sql_alchemy_dict["ssh_options"]["user"],
        remote_host=original_url.host,
        remote_port=int(original_url.port),
        private_key=base64.b64decode(base64_encoded_key).decode("utf-8"),
    )
    url = URL.create(
        drivername=original_url.drivername,
        username=original_url.username,
        password=original_url.password,
        host=server.local_bind_host,
        port=server.local_bind_port,
        database=original_url.database,
        query=original_url.query,
    )
    return server, url


def _close_ssh_tunnel(server) -> None:
    if server is not None and server.is_active:
        server.close()


🛠️ Refactor suggestion | 🟠 Major

Type the new SSH/session helpers and use a specific exception.

These helpers are new control-path code, but their signatures are still untyped and _open_ssh_tunnel() raises a bare Exception. Please make the contract explicit so callers can reason about ownership and failures.

As per coding guidelines, "Use explicit type hints for function parameters and return values", "Use appropriate exception types for error handling", and "Document functions and classes with docstrings".

Also applies to: 475-581

🧰 Tools
🪛 Ruff (0.15.10)

[warning] 373-375: Create your own exception

(TRY002)


[warning] 373-375: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/sql/sql_execution.py` around lines 365 - 399, Change
_open_ssh_tunnel and _close_ssh_tunnel to have explicit type hints and raise a
specific exception type instead of a bare Exception: add a small custom
exception class (e.g., SSHTunnelError or SSHTunnelSetupError) and use it when
PRIVATE_SSH_KEY_BLOB is missing or create_ssh_tunnel fails; annotate
_open_ssh_tunnel signature to return Tuple[Any, URL] (or the concrete tunnel
type if available) and accept sql_alchemy_dict: Mapping[str, Any], and annotate
_close_ssh_tunnel(server: Optional[Any]) -> None; update docstrings to state
ownership/closing contract and failure modes; keep references to the existing
symbols (_open_ssh_tunnel, _close_ssh_tunnel, create_ssh_tunnel, dnenv.get_env,
make_url, URL) so reviewers can find and validate the changes.

Comment thread deepnote_toolkit/sql/sql_execution.py Outdated
Comment on lines +561 to +572
extra_engine_params: dict[str, Any] = {"pool_pre_ping": True}
if in_session:
# See sql_session() docstring for why pool_size=1 is required to make
# session state (USE WAREHOUSE, ...) persist across calls.
extra_engine_params["pool_size"] = 1
extra_engine_params["max_overflow"] = 0

try:
with suppress_third_party_deprecation_warnings():
engine = create_engine(
url, **sql_alchemy_dict["params"], **extra_engine_params
)


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

In Python, what happens when the same keyword argument is supplied by two **kwargs expansions in a single function call?

💡 Result:

In Python (3.5+), supplying the same keyword argument from two `**kwargs` expansions in a single function call, such as `f(**d1, **d2)` where d1 and d2 both contain the same key, raises `TypeError: f() got multiple values for keyword argument 'keyname'`. This behavior is specified in PEP 448, which introduced multiple unpacking: "Currently, if an argument is given multiple times — such as a positional argument given both positionally and by keyword — a TypeError is raised. This remains true for duplicate arguments provided through multiple unpackings, e.g. `f(**{'x': 2}, **{'x': 3})`, except that the error will be detected at runtime." This is unlike dictionary merging `{**d1, **d2}`, which keeps the last value without error. Example: `def f(**kwargs): pass`; `d1 = {'a': 1}`; `d2 = {'a': 2}`; `f(**d1, **d2)` raises `TypeError: f() got multiple values for keyword argument 'a'`.
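A minimal demonstration of that behavior, mirroring the `params` / `extra_engine_params` collision at issue (the dict contents are illustrative):

```python
def create(**kwargs):
    return kwargs


params = {"pool_pre_ping": False}
extra = {"pool_pre_ping": True}

try:
    create(**params, **extra)
    duplicated_ok = True
except TypeError:
    # Duplicate keyword across two ** expansions is rejected at call time.
    duplicated_ok = False

merged = {**params, **extra}  # dict merge keeps the rightmost value instead
```

This is why the suggested fix merges into one dict (with `setdefault`) before calling `create_engine`.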



🏁 Script executed:

cd deepnote_toolkit && find . -type f -name "sql_execution.py" | head -1

Repository: deepnote/deepnote-toolkit

Length of output: 91


🏁 Script executed:

# Check the exact file and read lines around 561-572
wc -l deepnote_toolkit/sql/sql_execution.py

Repository: deepnote/deepnote-toolkit

Length of output: 110


🏁 Script executed:

# Read the relevant section
sed -n '555,580p' deepnote_toolkit/sql/sql_execution.py

Repository: deepnote/deepnote-toolkit

Length of output: 973


🏁 Script executed:

# Search for sql_alchemy_dict definition and usage
rg "sql_alchemy_dict" deepnote_toolkit/sql/sql_execution.py -B 3 -A 3

Repository: deepnote/deepnote-toolkit

Length of output: 8322


🏁 Script executed:

# Check where pool_pre_ping is used in the codebase
rg "pool_pre_ping" deepnote_toolkit/

Repository: deepnote/deepnote-toolkit

Length of output: 172


🏁 Script executed:

# Check for tests that might show if pool_pre_ping is passed in params
find . -name "*test*sql*" -type f | head -10

Repository: deepnote/deepnote-toolkit

Length of output: 349


🏁 Script executed:

# Search for any configuration or default params that might include pool_pre_ping
rg "pool_pre_ping\|params.*=" deepnote_toolkit/sql/ -B 2 -A 2

Repository: deepnote/deepnote-toolkit

Length of output: 51


🏁 Script executed:

# Check the _session_resource_key function more carefully to understand the pattern
sed -n '730,750p' deepnote_toolkit/sql/sql_execution.py

Repository: deepnote/deepnote-toolkit

Length of output: 910


🏁 Script executed:

# Search for the _session_resource_key function definition
rg "_session_resource_key" deepnote_toolkit/sql/sql_execution.py -A 15

Repository: deepnote/deepnote-toolkit

Length of output: 1266


🏁 Script executed:

# Look for test files that might test the engine creation with params
grep -l "pool_pre_ping\|create_engine\|_acquire_engine" tests/unit/test_sql_execution*.py

Repository: deepnote/deepnote-toolkit

Length of output: 143


🏁 Script executed:

# Check test_sql_execution.py for relevant tests
head -100 tests/unit/test_sql_execution.py

Repository: deepnote/deepnote-toolkit

Length of output: 3193


🏁 Script executed:

# Check if there are any tests showing pool_pre_ping in params
rg "pool_pre_ping" tests/

Repository: deepnote/deepnote-toolkit

Length of output: 238


🏁 Script executed:

# Search for any examples of params being passed in tests
rg "params.*:" tests/unit/test_sql_execution.py | head -20

Repository: deepnote/deepnote-toolkit

Length of output: 1661


🏁 Script executed:

# Look at the full context of _acquire_engine to understand the flow
sed -n '760,800p' deepnote_toolkit/sql/sql_execution.py

Repository: deepnote/deepnote-toolkit

Length of output: 1689


🏁 Script executed:

# Look at the test_sql_execution_internal.py file to see the pool_pre_ping tests
grep -B 10 -A 10 "pool_pre_ping" tests/unit/test_sql_execution_internal.py

Repository: deepnote/deepnote-toolkit

Length of output: 1726


🏁 Script executed:

# Check if sql_alchemy_dict["params"] could ever contain pool_pre_ping by searching all uses
rg "\"params\".*pool_pre_ping|pool_pre_ping.*\"params\"" deepnote_toolkit/

Repository: deepnote/deepnote-toolkit

Length of output: 51


🏁 Script executed:

# Look at how sql_alchemy_dict["params"] is built/populated
sed -n '420,480p' deepnote_toolkit/sql/sql_execution.py

Repository: deepnote/deepnote-toolkit

Length of output: 2670


🏁 Script executed:

# Check the _make_sql_alchemy_dict function used in tests to see what params typically look like
rg "_make_sql_alchemy_dict" tests/unit/test_sql_execution_internal.py -A 20

Repository: deepnote/deepnote-toolkit

Length of output: 6504


🏁 Script executed:

# Look at the execute_sql_with_connection_json to see how sql_alchemy_dict is created in real usage
sed -n '200,250p' deepnote_toolkit/sql/sql_execution.py

Repository: deepnote/deepnote-toolkit

Length of output: 1598


🏁 Script executed:

# Check if there's any documentation or comments about where pool_pre_ping could come from
rg -B 3 -A 3 "pool_pre_ping|extra_engine_params" deepnote_toolkit/sql/sql_execution.py

Repository: deepnote/deepnote-toolkit

Length of output: 691


Merge engine kwargs before calling create_engine().

If sql_alchemy_dict["params"] contains pool_pre_ping, passing create_engine(url, **params, **extra_engine_params) raises TypeError due to duplicate kwargs. Merge to a single dict first and use setdefault() to avoid conflicts.

Suggested fix
-    extra_engine_params: dict[str, Any] = {"pool_pre_ping": True}
+    engine_params: dict[str, Any] = dict(sql_alchemy_dict["params"])
+    engine_params.setdefault("pool_pre_ping", True)
     if in_session:
         # See sql_session() docstring for why pool_size=1 is required to make
         # session state (USE WAREHOUSE, ...) persist across calls.
-        extra_engine_params["pool_size"] = 1
-        extra_engine_params["max_overflow"] = 0
+        engine_params["pool_size"] = 1
+        engine_params["max_overflow"] = 0

     try:
         with suppress_third_party_deprecation_warnings():
-            engine = create_engine(
-                url, **sql_alchemy_dict["params"], **extra_engine_params
-            )
+            engine = create_engine(url, **engine_params)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/sql/sql_execution.py` around lines 561 - 572, The call to
create_engine currently passes two kwarg dicts separately which can cause a
TypeError if sql_alchemy_dict["params"] contains overlapping keys like
pool_pre_ping; before calling create_engine (in the block using
suppress_third_party_deprecation_warnings and the in_session logic), merge
sql_alchemy_dict["params"] and extra_engine_params into a single dict (e.g.,
copy sql_alchemy_dict["params"] then use setdefault or update carefully) so
extra_engine_params values only apply when keys are absent, then call
create_engine(url, **merged_params) instead of passing two dicts.
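The suggested merge can be illustrated standalone. A minimal sketch, with a hypothetical helper name (merge_engine_params is not in the codebase); connection-supplied params win, and toolkit defaults apply only when absent:

```python
from typing import Any, Dict


def merge_engine_params(
    connection_params: Dict[str, Any], in_session: bool
) -> Dict[str, Any]:
    # Copy first so the caller's dict is never mutated.
    params = dict(connection_params)
    # Toolkit default applies only when the connection didn't set it,
    # avoiding the duplicate-kwarg TypeError from **params, **extras.
    params.setdefault("pool_pre_ping", True)
    if in_session:
        # Session mode pins a single pooled connection.
        params["pool_size"] = 1
        params["max_overflow"] = 0
    return params


# A connection that explicitly disables pre-ping keeps its own value:
assert merge_engine_params({"pool_pre_ping": False}, in_session=False) == {
    "pool_pre_ping": False
}
```

With this in place the call site collapses to create_engine(url, **merge_engine_params(sql_alchemy_dict["params"], in_session)).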

Comment on lines +380 to +386
def _make_sql_alchemy_dict(integration_id="integration_a", url=None, params=None):
return {
"url": url or "postgresql://u:p@localhost:5432/db",
"params": params if params is not None else {},
"param_style": "qmark",
"integration_id": integration_id,
}
🛠️ Refactor suggestion | 🟠 Major

Annotate _make_sql_alchemy_dict() explicitly.

url=None and params=None should be Optional[...], and this helper should declare its return type. It is small, but it defines most of the new session test setup.

As per coding guidelines, "Always use Optional[T] for parameters that can be None (not T = None)" and "Use explicit type hints for function parameters and return values".

🧰 Tools
🪛 Ruff (0.15.10)

[warning] 380-380: Missing return type annotation for private function _make_sql_alchemy_dict

(ANN202)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_sql_execution_internal.py` around lines 380 - 386, Update the
helper _make_sql_alchemy_dict to add explicit type hints: annotate parameters
url and params as Optional[str] and Optional[Dict[str, Any]] (or appropriate
param dict type) respectively, and declare the function return type as Dict[str,
Any] (or a more specific mapping type used in tests). Adjust the function
signature and any necessary imports (from typing import Optional, Dict, Any) so
the function reads e.g. def _make_sql_alchemy_dict(integration_id: str =
"integration_a", url: Optional[str] = None, params: Optional[Dict[str, Any]] =
None) -> Dict[str, Any].

Drops the sql_session() context manager and the engine lifecycle
changes it required (per-cell registry, pool_size=1, SSH tunnel
lifetime extension). Restores the original per-call create_engine +
dispose flow.

In its place, execute_sql / execute_sql_with_connection_json grow an
optional setup_statements: list[str] parameter. When provided, those
statements are run via connection.exec_driver_sql on the same DBAPI
connection inside the same engine.begin() block as the main query, so
session state set by `USE WAREHOUSE`, `USE ROLE`, `SET ...`, etc. is
in effect for the main query without changing the engine's lifetime.

Single call, single engine, single tunnel, single connection — no
shared mutable state across calls and no special-casing for SSH or
custom pool config.

DuckDB's process-wide singleton means setup statements naturally
persist there too; the duckdb branch executes them on the singleton
before the main query for API parity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
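The connection-scoped behavior this commit relies on can be demonstrated with stdlib sqlite3 standing in for Snowflake and friends (none of the names below are the toolkit's API): session state set on one DBAPI connection is invisible to a fresh connection, which is why setup statements must share the main query's connection.

```python
import sqlite3

# Same connection: session-scoped setup (a TEMP table, analogous to
# USE WAREHOUSE) is visible to the query that follows it.
conn_a = sqlite3.connect(":memory:")
conn_a.execute("CREATE TEMP TABLE session_state (warehouse TEXT)")
conn_a.execute("INSERT INTO session_state VALUES ('abc')")
assert conn_a.execute("SELECT warehouse FROM session_state").fetchone() == ("abc",)

# Fresh connection (what per-call create_engine + dispose gives you):
# the session state is gone.
conn_b = sqlite3.connect(":memory:")
state_lost = False
try:
    conn_b.execute("SELECT warehouse FROM session_state")
except sqlite3.OperationalError:
    state_lost = True
assert state_lost
```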
@robertlacok robertlacok changed the title feat: Add sql_session() to share engine + SSH tunnel across calls feat: Add setup_statements parameter to execute_sql for session setup Apr 23, 2026
@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deepnote_toolkit/sql/sql_execution.py (1)

419-447: ⚠️ Potential issue | 🟠 Major

setup_statements are not part of cache semantics.

Line 435 enables cache even when setup_statements changes session context (e.g., warehouse/role/schema). That can return stale/wrong data from a different setup context.

Suggested fix (safe default: disable cache when setup statements are present)
 def _execute_sql_with_caching(
@@
-    sql_caching_enabled = (
-        sql_cache_mode != "cache_disabled" and return_variable_type == "dataframe"
-    )
+    has_setup_statements = bool(setup_statements)
+    sql_caching_enabled = (
+        sql_cache_mode != "cache_disabled"
+        and return_variable_type == "dataframe"
+        and not has_setup_statements
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/sql/sql_execution.py` around lines 419 - 447, The current
cache path (controlled by sql_caching_enabled / can_get_sql_cache and
get_sql_cache) ignores setup_statements, which can change session context and
produce stale/wrong results; update the logic that computes sql_caching_enabled
/ can_get_sql_cache (or add an explicit guard before calling get_sql_cache) to
disable caching when setup_statements is non-empty (i.e., treat any truthy
setup_statements as a reason to skip get_sql_cache), ensuring the existing
duckdb short-circuit (requires_duckdb) and return_variable_type checks remain
unchanged.
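The guard the diff proposes reduces to a small predicate; a sketch with a hypothetical function name:

```python
from typing import Optional, Sequence


def sql_caching_enabled(
    sql_cache_mode: str,
    return_variable_type: str,
    setup_statements: Optional[Sequence[str]],
) -> bool:
    # Any setup statement may change session context (warehouse/role/schema),
    # so its results must never be served from or written to the cache.
    return (
        sql_cache_mode != "cache_disabled"
        and return_variable_type == "dataframe"
        and not setup_statements
    )


assert sql_caching_enabled("cache_enabled", "dataframe", None)
assert not sql_caching_enabled("cache_enabled", "dataframe", ["USE ROLE analyst"])
```

An alternative would be to fold the setup statements into the cache key, but disabling the cache is the safer default since two identical queries under different warehouses can legitimately return different data.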

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1e59874a-01db-41f1-9708-1368007f2686

📥 Commits

Reviewing files that changed from the base of the PR and between 6762926 and 029e245.

📒 Files selected for processing (3)
  • deepnote_toolkit/sql/sql_execution.py
  • tests/unit/test_sql_execution.py
  • tests/unit/test_sql_execution_internal.py

Comment on lines +105 to 106
setup_statements=None,
):

⚠️ Potential issue | 🟠 Major

Add type hints to all function parameters and return annotations.

The five functions at lines 99, 234, 410, 490, and 631 lack any parameter or return type annotations. Replace setup_statements=None with setup_statements: Optional[...] and add explicit return type annotations to each function. Per coding guidelines, use Optional[T] instead of T = None, and include return types for all functions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/sql/sql_execution.py` around lines 105 - 106, Several
functions in this module are missing parameter and return type annotations;
replace any parameter defaults like setup_statements=None with an explicit
Optional type (e.g., setup_statements: Optional[Sequence[str]] = None) and add
explicit return type annotations for each function that currently lacks them.
Import typing primitives you need (from typing import Optional, Sequence,
Iterable, Any) and update the function signatures (including the function that
declares setup_statements) to use Optional[T] rather than T = None, and add
concrete return types (e.g., -> None, -> int, -> List[Any], or -> Iterable[str]
as appropriate) to the five un-annotated functions in this module so all
parameters and returns are fully typed. Ensure the parameter name
setup_statements is changed to setup_statements: Optional[<appropriate type>] =
None and that each function signature and its return type reflect the actual
values they produce.



def _execute_sql_on_engine(engine, query, bind_params):
def _execute_sql_on_engine(engine, query, bind_params, setup_statements=None):

⚠️ Potential issue | 🟠 Major

Validate setup_statements type before iterating.

Line 669 accepts any iterable. If a caller passes a string, it runs one-character “statements” and fails with misleading SQL errors.

Suggested fix
 def _execute_sql_on_engine(engine, query, bind_params, setup_statements=None):
@@
+    if setup_statements is None:
+        normalized_setup_statements: list[str] = []
+    elif isinstance(setup_statements, list) and all(
+        isinstance(stmt, str) for stmt in setup_statements
+    ):
+        normalized_setup_statements = setup_statements
+    else:
+        raise TypeError("setup_statements must be a list[str] or None")
+
     with engine.begin() as connection:
@@
-        for stmt in setup_statements or []:
+        for stmt in normalized_setup_statements:
             connection.exec_driver_sql(stmt)

Also applies to: 667-670

🧰 Tools
🪛 Ruff (0.15.10)

[warning] 631-631: Missing return type annotation for private function _execute_sql_on_engine

(ANN202)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/sql/sql_execution.py` at line 631, In
_execute_sql_on_engine, validate setup_statements before iterating: ensure it's
either None or an iterable of statements (not a plain string). Add a guard that
detects isinstance(setup_statements, str) and either wrap it as a single-item
list or raise a TypeError, and also validate items are strings (or convert them)
before the loop that executes each statement to avoid iterating over characters
and producing misleading SQL errors.
