
feat: Add setup_statements parameter to execute_sql for session setup #96

Draft

robertlacok wants to merge 3 commits into main from robert/sal-51-reuse-engine-per-cell

Conversation


@robertlacok robertlacok commented Apr 23, 2026

Summary

  • Add an optional setup_statements: list[str] parameter to execute_sql and execute_sql_with_connection_json.
  • When provided, each statement is run via connection.exec_driver_sql on the same DBAPI connection inside the same engine.begin() block as the main query, so any session state it sets (Snowflake USE WAREHOUSE, USE ROLE, SET ..., etc.) is in effect for the main query.
  • Single call, single engine, single tunnel, single connection. Engine lifecycle is unchanged from today.

Why

Each execute_sql* call currently does create_engine(...) and then engine.dispose() in finally. That makes it impossible to run a sequence like

USE WAREHOUSE abc;
SELECT 123

as two separate execute_sql calls and have the second inherit session state from the first — the connection is gone before the second call runs.

setup_statements solves that without changing engine, pool, or SSH tunnel lifecycle: the setup statements piggy-back on the same connection the main query uses.

How

  • New setup_statements kwarg on execute_sql and execute_sql_with_connection_json, plumbed through _execute_sql_with_caching and _query_data_source to _execute_sql_on_engine.
  • Inside _execute_sql_on_engine, before pandas.read_sql_query:
    with engine.begin() as connection:
        for stmt in setup_statements or []:
            connection.exec_driver_sql(stmt)
        ...
        return pd.read_sql_query(query, ...)
  • Setup statements are not Jinja-rendered, parameter-bound, or audit-commented — they're raw SQL executed in order. A failing setup statement aborts the main query (the engine.begin() rolls back).
  • DuckDB branch in _execute_sql_with_caching runs setup statements on the process-wide DuckDB singleton before the main query, for API parity (DuckDB session state already persists across calls anyway).
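The same-connection requirement behind the bullets above can be sketched without SQLAlchemy at all. Below is a hypothetical stand-in for `_execute_sql_on_engine` against SQLite's DB-API: a TEMP table is per-connection session state (like Snowflake's `USE WAREHOUSE`), so the main query only sees it because the setup statements ran on the very connection that executes it. The helper name is illustrative, not the toolkit's API.

```python
import sqlite3


def execute_with_setup(database, query, setup_statements=None):
    # Hypothetical stand-in for _execute_sql_on_engine: setup statements
    # run first, in order, on the same connection as the main query.
    conn = sqlite3.connect(database)
    try:
        for stmt in setup_statements or []:
            conn.execute(stmt)  # raw SQL; a failure aborts before the query
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# TEMP tables are per-connection session state, analogous to USE WAREHOUSE.
rows = execute_with_setup(
    ":memory:",
    "SELECT n FROM session_scratch",
    setup_statements=[
        "CREATE TEMP TABLE session_scratch (n INTEGER)",
        "INSERT INTO session_scratch VALUES (123)",
    ],
)
```

Running the two setup statements as separate `execute_with_setup` calls would fail for the same reason the PR description gives: the connection is gone before the second call runs.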

Test plan

  • New tests in tests/unit/test_sql_execution_internal.py:
    • setup statements execute in order, on the same connection, before the main query
    • empty list and None are no-ops
    • a failing setup statement propagates and the main query is not run
  • New tests in tests/unit/test_sql_execution.py:
    • setup_statements is plumbed from execute_sql down to _query_data_source
    • default is None
  • poetry run pytest tests/unit/ — 755 passed (excludes pre-existing pyspark collection errors in test_chart.py / test_ocelots.py).
  • poetry run black and poetry run flake8 clean.

Usage

df = _dntk.execute_sql(
    "SELECT 123",
    "SQL_INTEG",
    setup_statements=["USE WAREHOUSE abc"],
)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added support for setup SQL statements that execute before the main query on the same database connection. Setup statements run sequentially in the order provided, and any execution failure prevents the main query from running.

…-51)

Cache the SQLAlchemy engine per IPython cell execution so that two
back-to-back `_dntk.execute_sql*` calls in the same generated cell share
the same physical DBAPI connection. This lets Snowflake session state
set by `USE WAREHOUSE`, `USE ROLE`, `SET ...` etc. persist onto the next
query, which unblocks supporting `USE WAREHOUSE abc; SELECT 123` in
Snowflake SQL blocks.

Cached engines are created with `pool_size=1, max_overflow=0` so
`engine.begin()` always checks out the same connection. Disposed via a
`post_run_cell` IPython hook. Caching is skipped for SSH-tunneled
engines, user-supplied custom pool config, and when no IPython shell is
available (script/CLI use) — all of those fall back to the previous
create-then-dispose-per-call behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@robertlacok robertlacok requested a review from a team as a code owner April 23, 2026 11:45
@robertlacok robertlacok requested a review from m1so April 23, 2026 11:45

linear Bot commented Apr 23, 2026


coderabbitai Bot commented Apr 23, 2026

📝 Walkthrough

Walkthrough

This PR introduces a setup_statements parameter to the SQL execution APIs (execute_sql_with_connection_json and execute_sql). The parameter accepts a list of SQL statements that execute in order immediately before the main query on the same underlying connection. For DuckDB, setup statements run via the existing execute_duckdb_sql function; for SQLAlchemy-backed engines, they execute within the transaction block using connection.exec_driver_sql before the main query. If any setup statement fails, the main query is prevented from executing. The parameter is threaded through five internal functions and covered by unit tests verifying correct execution order, no-op behavior when absent, and error propagation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 35.71%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Updates Docs (⚠️ Warning): the PR adds the setup_statements parameter to public API functions without updating documentation. Resolution: describe setup_statements, its behavior, and usage examples in docs/user/configuration.md or README.md.

✅ Passed checks (4 passed)

  • Description Check (✅ Passed): check skipped; CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): the title accurately describes the main change of adding a setup_statements parameter to execute_sql for session initialization.
  • Linked Issues check (✅ Passed): check skipped because no linked issues were found.
  • Out of Scope Changes check (✅ Passed): check skipped because no linked issues were found.



github-actions Bot commented Apr 23, 2026

📦 Python package built successfully!

  • Version: 2.2.1.dev4+c040781
  • Wheel: deepnote_toolkit-2.2.1.dev4+c040781-py3-none-any.whl
  • Install:
    pip install "deepnote-toolkit @ https://deepnote-staging-runtime-artifactory.s3.amazonaws.com/deepnote-toolkit-packages/2.2.1.dev4%2Bc040781/deepnote_toolkit-2.2.1.dev4%2Bc040781-py3-none-any.whl"


codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 66.66667% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.25%. Comparing base (f2ac75a) to head (029e245).
✅ All tests successful. No failed tests found.

Files with missing lines:
  • deepnote_toolkit/sql/sql_execution.py: 66.66% patch coverage (1 missing, 1 partial) ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #96      +/-   ##
==========================================
- Coverage   74.32%   74.25%   -0.08%     
==========================================
  Files          94       94              
  Lines        5535     5539       +4     
  Branches      824      826       +2     
==========================================
- Hits         4114     4113       -1     
- Misses       1155     1158       +3     
- Partials      266      268       +2     
Flag Coverage Δ
combined 74.25% <66.66%> (-0.08%) ⬇️


@robertlacok robertlacok marked this pull request as draft April 23, 2026 11:47

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deepnote_toolkit/ipython_utils.py (1)

6-34: 🛠️ Refactor suggestion | 🟠 Major

Type the hook registry and callback explicitly.

Lines 6-18 add a new hook API, but both _registered_post_run_cell_callbacks and callback are untyped. That weakens static checking around event registration.

Suggested fix
+from collections.abc import Callable
+from typing import Any
+
-_registered_post_run_cell_callbacks: set = set()
+_registered_post_run_cell_callbacks: set[Callable[..., Any]] = set()
 
-
-def register_post_run_cell_hook(callback) -> bool:
+def register_post_run_cell_hook(callback: Callable[..., Any]) -> bool:

As per coding guidelines, "Use explicit type hints for function parameters and return values" and "Use type hints consistently".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/ipython_utils.py` around lines 6 - 34, Add explicit type
hints for the hook registry and callback: import typing symbols (Set, Callable,
Any) and change the module-level variable to
_registered_post_run_cell_callbacks: Set[Callable[..., Any]] = set(); update the
register_post_run_cell_hook signature to def
register_post_run_cell_hook(callback: Callable[..., Any]) -> bool: so static
type checkers know the stored callable shape when registering with ip.events.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b6e1c41b-ba25-4588-aa46-4a16f4340388

📥 Commits

Reviewing files that changed from the base of the PR and between f2ac75a and c716237.

📒 Files selected for processing (4)
  • deepnote_toolkit/ipython_utils.py
  • deepnote_toolkit/sql/sql_execution.py
  • tests/unit/test_ipython_utils.py
  • tests/unit/test_sql_execution_internal.py

Comment thread deepnote_toolkit/sql/sql_execution.py Outdated
Comment on lines +465 to +485
def _compute_engine_cache_key(sql_alchemy_dict) -> Optional[str]:
    """Stable key identifying the engine for *sql_alchemy_dict*, or None when
    this connection should not be cached.

    SSH-tunneled engines can't be cached because the tunnel only lives inside
    `_create_sql_ssh_uri`'s context manager. Connections that already specify
    a custom pool config in user `params` are also skipped to avoid clashing
    with the `pool_size=1` we set on cached engines.
    """
    if sql_alchemy_dict.get("ssh_options", {}).get("enabled"):
        return None

    params = sql_alchemy_dict.get("params") or {}
    if "pool_size" in params or "max_overflow" in params or "poolclass" in params:
        return None

    integration_id = sql_alchemy_dict.get("integration_id")
    if integration_id:
        return f"integration:{integration_id}"

    return f"url:{sql_alchemy_dict['url']}"


⚠️ Potential issue | 🟠 Major

Include engine-defining params in the cache key.

Line 481 collapses the key to integration_id, and Line 485 falls back to just the raw URL. create_engine() also depends on params, so two calls in one cell can incorrectly share an engine even when connect_args, auth headers, or other connection options differ. That can silently run the second query with the wrong connection/session configuration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/sql/sql_execution.py` around lines 465 - 485, The cache key
in _compute_engine_cache_key currently uses only ssh_options, pool-related
params, integration_id, or url which ignores other engine-defining options;
update it to incorporate a deterministic serialization of the relevant
connection params (e.g., the params dict and any connect_args/auth headers
present in sql_alchemy_dict) into the returned key while preserving the existing
early returns for ssh_options-enabled and explicit pool config; specifically,
when integration_id exists include it plus a stable encoding (sorted/JSON) of
params/connect_args in the key (and if no integration_id, include the URL plus
the same stable encoding) so that differing params produce distinct cache keys.


deepnote-bot commented Apr 23, 2026

🚀 Review App Deployment Started

  • 🌍 Review application: ra-96
  • Last deployed: 2026-04-23 12:40:59 (UTC)
  • 📜 Deployed commit: b4fc42a6593b8a694f0ada1e9a5993aa72dde677
  • 🛠️ Toolkit version: c040781

Drops the implicit cache + IPython post_run_cell hook in favor of an
explicit `_dntk.sql_session()` context manager. Inside the with-block
each unique connection (keyed by integration_id, or URL when not
present) opens its SSH tunnel and SQLAlchemy engine once and reuses
both for every `execute_sql*` call inside the block. On exit the
session disposes the engine and closes the tunnel together.

This makes SSH tunnels work with SAL-51 reuse — the previous cache
had to skip SSH because the tunnel's lifetime was bound to a single
call by `_create_sql_ssh_uri`'s context manager. With the session
owning both, the tunnel's lifetime now matches the engine's.

Other gains:
- No IPython coupling. Lifetime is whatever the caller (deepnote-
  internal codegen) wraps in the `with` block.
- No "magic" cache key vs. live state divergence — outside a session
  behavior is exactly the previous per-call create+dispose.
- Nested `sql_session()` blocks no-op on the inner block so the outer
  retains ownership.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
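The nesting rule this commit describes (inner `sql_session()` blocks no-op so the outer block retains ownership) can be sketched with a plain context manager. Names and the dict-shaped cache are illustrative, not the toolkit's implementation.

```python
from contextlib import contextmanager

_session_cache = None  # key -> (engine, tunnel) pairs in the real code


@contextmanager
def sql_session():
    # The outermost block owns the cache; nested blocks reuse it and do
    # not dispose anything, so resources live until the outer block exits.
    global _session_cache
    if _session_cache is not None:
        yield _session_cache  # inner block: no-op, outer keeps ownership
        return
    _session_cache = {}
    try:
        yield _session_cache
    finally:
        # Real code would dispose each engine and close each tunnel here.
        _session_cache = None


with sql_session() as outer:
    outer["integration:abc"] = "engine"
    with sql_session() as inner:  # nested: same cache as the outer block
        reused = inner is outer
closed = _session_cache is None  # everything released on outer exit
```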
@robertlacok robertlacok changed the title feat: Reuse SQLAlchemy engine across execute_sql calls in a cell (SAL-51) feat: Add _dntk.sql_session() to share engine + SSH tunnel across calls (SAL-51) Apr 23, 2026
@robertlacok robertlacok changed the title feat: Add _dntk.sql_session() to share engine + SSH tunnel across calls (SAL-51) feat: Add sql_session() to share engine + SSH tunnel across calls Apr 23, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
deepnote_toolkit/sql/sql_execution.py (1)

475-488: ⚠️ Potential issue | 🟠 Major

Expand the session cache key beyond integration_id / URL.

_session_resource_key() still aliases every call for the same integration or raw URL. Inside one sql_session(), that lets a later call reuse an engine and SSH tunnel created with different params, auth headers, or SSH settings.

Possible fix
 def _session_resource_key(sql_alchemy_dict) -> Optional[str]:
     params = sql_alchemy_dict.get("params") or {}
     if "pool_size" in params or "max_overflow" in params or "poolclass" in params:
         return None
-
-    integration_id = sql_alchemy_dict.get("integration_id")
-    if integration_id:
-        return f"integration:{integration_id}"
-
-    return f"url:{sql_alchemy_dict['url']}"
+    key_payload = {
+        "integration_id": sql_alchemy_dict.get("integration_id"),
+        "url": sql_alchemy_dict["url"],
+        "params": params,
+        "ssh_options": sql_alchemy_dict.get("ssh_options"),
+    }
+    return json.dumps(key_payload, sort_keys=True, default=str)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/sql/sql_execution.py` around lines 475 - 488,
_session_resource_key currently returns the same key for all connections sharing
an integration_id or URL, allowing reuse of engines/tunnels with different
connection-specific settings; change _session_resource_key(sql_alchemy_dict) to
include a stable fingerprint of the connection-specific attributes (e.g.,
serialize and include params, auth headers, and SSH settings such as
sql_alchemy_dict.get("params"), sql_alchemy_dict.get("headers") or auth token
field, and sql_alchemy_dict.get("ssh_config")/ssh-related keys) when computing
the key (but keep the existing early return None for explicit
pool_size/max_overflow/poolclass). Use a deterministic serialization or short
hash of those fields to build the returned string (e.g.,
f"integration:{integration_id}:{hash}" or f"url:{url}:{hash}") so
engines/tunnels are only reused when all relevant settings match.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 22352dcb-ac84-4524-b98f-c797b6e3f455

📥 Commits

Reviewing files that changed from the base of the PR and between c716237 and 6762926.

📒 Files selected for processing (3)
  • deepnote_toolkit/__init__.py
  • deepnote_toolkit/sql/sql_execution.py
  • tests/unit/test_sql_execution_internal.py

Comment thread deepnote_toolkit/sql/sql_execution.py Outdated
Comment on lines +365 to +399
def _open_ssh_tunnel(sql_alchemy_dict) -> tuple[Any, Any]:
    """Open an SSH tunnel for *sql_alchemy_dict* and return ``(server, url)``.

    Caller is responsible for eventually closing the returned server with
    ``_close_ssh_tunnel``.
    """
    base64_encoded_key = dnenv.get_env("PRIVATE_SSH_KEY_BLOB")
    if not base64_encoded_key:
        raise Exception(
            "The private key needed to establish the SSH connection is missing. Please try again or contact support."
        )
    original_url = make_url(sql_alchemy_dict["url"])
    server = create_ssh_tunnel(
        ssh_host=sql_alchemy_dict["ssh_options"]["host"],
        ssh_port=int(sql_alchemy_dict["ssh_options"]["port"]),
        ssh_user=sql_alchemy_dict["ssh_options"]["user"],
        remote_host=original_url.host,
        remote_port=int(original_url.port),
        private_key=base64.b64decode(base64_encoded_key).decode("utf-8"),
    )
    url = URL.create(
        drivername=original_url.drivername,
        username=original_url.username,
        password=original_url.password,
        host=server.local_bind_host,
        port=server.local_bind_port,
        database=original_url.database,
        query=original_url.query,
    )
    return server, url


def _close_ssh_tunnel(server) -> None:
    if server is not None and server.is_active:
        server.close()


🛠️ Refactor suggestion | 🟠 Major

Type the new SSH/session helpers and use a specific exception.

These helpers are new control-path code, but their signatures are still untyped and _open_ssh_tunnel() raises a bare Exception. Please make the contract explicit so callers can reason about ownership and failures.

As per coding guidelines, "Use explicit type hints for function parameters and return values", "Use appropriate exception types for error handling", and "Document functions and classes with docstrings".

Also applies to: 475-581

🧰 Tools
🪛 Ruff (0.15.10)

[warning] 373-375: Create your own exception

(TRY002)


[warning] 373-375: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/sql/sql_execution.py` around lines 365 - 399, Change
_open_ssh_tunnel and _close_ssh_tunnel to have explicit type hints and raise a
specific exception type instead of a bare Exception: add a small custom
exception class (e.g., SSHTunnelError or SSHTunnelSetupError) and use it when
PRIVATE_SSH_KEY_BLOB is missing or create_ssh_tunnel fails; annotate
_open_ssh_tunnel signature to return Tuple[Any, URL] (or the concrete tunnel
type if available) and accept sql_alchemy_dict: Mapping[str, Any], and annotate
_close_ssh_tunnel(server: Optional[Any]) -> None; update docstrings to state
ownership/closing contract and failure modes; keep references to the existing
symbols (_open_ssh_tunnel, _close_ssh_tunnel, create_ssh_tunnel, dnenv.get_env,
make_url, URL) so reviewers can find and validate the changes.

Comment thread deepnote_toolkit/sql/sql_execution.py Outdated
Comment on lines +561 to +572
extra_engine_params: dict[str, Any] = {"pool_pre_ping": True}
if in_session:
# See sql_session() docstring for why pool_size=1 is required to make
# session state (USE WAREHOUSE, ...) persist across calls.
extra_engine_params["pool_size"] = 1
extra_engine_params["max_overflow"] = 0

try:
with suppress_third_party_deprecation_warnings():
engine = create_engine(
url, **sql_alchemy_dict["params"], **extra_engine_params
)


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

In Python, what happens when the same keyword argument is supplied by two **kwargs expansions in a single function call?

💡 Result:

In Python (3.5+), supplying the same keyword argument from two `**kwargs` expansions in a single function call, such as `f(**d1, **d2)` where d1 and d2 both contain the same key, raises `TypeError: f() got multiple values for keyword argument 'keyname'`. This behavior is specified in PEP 448, which introduced multiple unpacking: "Currently, if an argument is given multiple times — such as a positional argument given both positionally and by keyword — a TypeError is raised. This remains true for duplicate arguments provided through multiple unpackings, e.g. `f(**{'x': 2}, **{'x': 3})`, except that the error will be detected at runtime." This is unlike dictionary merging `{**d1, **d2}`, which keeps the last value without error. Example: `def f(**kwargs): pass`; `d1 = {'a': 1}`; `d2 = {'a': 2}`; `f(**d1, **d2)` raises `TypeError: f() got multiple values for keyword argument 'a'`.
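A minimal demonstration of that behavior, mirroring the `params` / `extra_engine_params` collision at issue (the dict contents are illustrative):

```python
def create(**kwargs):
    return kwargs


params = {"pool_pre_ping": False}
extra = {"pool_pre_ping": True}

try:
    create(**params, **extra)
    duplicated_ok = True
except TypeError:
    # Duplicate keyword across two ** expansions is rejected at call time.
    duplicated_ok = False

merged = {**params, **extra}  # dict merge keeps the rightmost value instead
```

This is why the suggested fix merges into one dict (with `setdefault`) before calling `create_engine`.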



🏁 Script executed:

cd deepnote_toolkit && find . -type f -name "sql_execution.py" | head -1

Repository: deepnote/deepnote-toolkit

Length of output: 91


🏁 Script executed:

# Check the exact file and read lines around 561-572
wc -l deepnote_toolkit/sql/sql_execution.py

Repository: deepnote/deepnote-toolkit

Length of output: 110


🏁 Script executed:

# Read the relevant section
sed -n '555,580p' deepnote_toolkit/sql/sql_execution.py

Repository: deepnote/deepnote-toolkit

Length of output: 973


🏁 Script executed:

# Search for sql_alchemy_dict definition and usage
rg "sql_alchemy_dict" deepnote_toolkit/sql/sql_execution.py -B 3 -A 3

Repository: deepnote/deepnote-toolkit

Length of output: 8322


🏁 Script executed:

# Check where pool_pre_ping is used in the codebase
rg "pool_pre_ping" deepnote_toolkit/

Repository: deepnote/deepnote-toolkit

Length of output: 172


🏁 Script executed:

# Check for tests that might show if pool_pre_ping is passed in params
find . -name "*test*sql*" -type f | head -10

Repository: deepnote/deepnote-toolkit

Length of output: 349


🏁 Script executed:

# Search for any configuration or default params that might include pool_pre_ping
rg "pool_pre_ping\|params.*=" deepnote_toolkit/sql/ -B 2 -A 2

Repository: deepnote/deepnote-toolkit

Length of output: 51


🏁 Script executed:

# Check the _session_resource_key function more carefully to understand the pattern
sed -n '730,750p' deepnote_toolkit/sql/sql_execution.py

Repository: deepnote/deepnote-toolkit

Length of output: 910


🏁 Script executed:

# Search for the _session_resource_key function definition
rg "_session_resource_key" deepnote_toolkit/sql/sql_execution.py -A 15

Repository: deepnote/deepnote-toolkit

Length of output: 1266


🏁 Script executed:

# Look for test files that might test the engine creation with params
grep -l "pool_pre_ping\|create_engine\|_acquire_engine" tests/unit/test_sql_execution*.py

Repository: deepnote/deepnote-toolkit

Length of output: 143


🏁 Script executed:

# Check test_sql_execution.py for relevant tests
head -100 tests/unit/test_sql_execution.py

Repository: deepnote/deepnote-toolkit

Length of output: 3193


🏁 Script executed:

# Check if there are any tests showing pool_pre_ping in params
rg "pool_pre_ping" tests/

Repository: deepnote/deepnote-toolkit

Length of output: 238


🏁 Script executed:

# Search for any examples of params being passed in tests
rg "params.*:" tests/unit/test_sql_execution.py | head -20

Repository: deepnote/deepnote-toolkit

Length of output: 1661


🏁 Script executed:

# Look at the full context of _acquire_engine to understand the flow
sed -n '760,800p' deepnote_toolkit/sql/sql_execution.py

Repository: deepnote/deepnote-toolkit

Length of output: 1689


🏁 Script executed:

# Look at the test_sql_execution_internal.py file to see the pool_pre_ping tests
grep -B 10 -A 10 "pool_pre_ping" tests/unit/test_sql_execution_internal.py

Repository: deepnote/deepnote-toolkit

Length of output: 1726


🏁 Script executed:

# Check if sql_alchemy_dict["params"] could ever contain pool_pre_ping by searching all uses
rg "\"params\".*pool_pre_ping|pool_pre_ping.*\"params\"" deepnote_toolkit/

Repository: deepnote/deepnote-toolkit

Length of output: 51


🏁 Script executed:

# Look at how sql_alchemy_dict["params"] is built/populated
sed -n '420,480p' deepnote_toolkit/sql/sql_execution.py

Repository: deepnote/deepnote-toolkit

Length of output: 2670


🏁 Script executed:

# Check the _make_sql_alchemy_dict function used in tests to see what params typically look like
rg "_make_sql_alchemy_dict" tests/unit/test_sql_execution_internal.py -A 20

Repository: deepnote/deepnote-toolkit

Length of output: 6504


🏁 Script executed:

# Look at the execute_sql_with_connection_json to see how sql_alchemy_dict is created in real usage
sed -n '200,250p' deepnote_toolkit/sql/sql_execution.py

Repository: deepnote/deepnote-toolkit

Length of output: 1598


🏁 Script executed:

# Check if there's any documentation or comments about where pool_pre_ping could come from
rg -B 3 -A 3 "pool_pre_ping|extra_engine_params" deepnote_toolkit/sql/sql_execution.py

Repository: deepnote/deepnote-toolkit

Length of output: 691


Merge engine kwargs before calling create_engine().

If sql_alchemy_dict["params"] contains pool_pre_ping, passing create_engine(url, **params, **extra_engine_params) raises TypeError due to duplicate kwargs. Merge to a single dict first and use setdefault() to avoid conflicts.

Suggested fix
-    extra_engine_params: dict[str, Any] = {"pool_pre_ping": True}
+    engine_params: dict[str, Any] = dict(sql_alchemy_dict["params"])
+    engine_params.setdefault("pool_pre_ping", True)
     if in_session:
         # See sql_session() docstring for why pool_size=1 is required to make
         # session state (USE WAREHOUSE, ...) persist across calls.
-        extra_engine_params["pool_size"] = 1
-        extra_engine_params["max_overflow"] = 0
+        engine_params["pool_size"] = 1
+        engine_params["max_overflow"] = 0

     try:
         with suppress_third_party_deprecation_warnings():
-            engine = create_engine(
-                url, **sql_alchemy_dict["params"], **extra_engine_params
-            )
+            engine = create_engine(url, **engine_params)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/sql/sql_execution.py` around lines 561 - 572, The call to
create_engine currently passes two kwarg dicts separately which can cause a
TypeError if sql_alchemy_dict["params"] contains overlapping keys like
pool_pre_ping; before calling create_engine (in the block using
suppress_third_party_deprecation_warnings and the in_session logic), merge
sql_alchemy_dict["params"] and extra_engine_params into a single dict (e.g.,
copy sql_alchemy_dict["params"] then use setdefault or update carefully) so
extra_engine_params values only apply when keys are absent, then call
create_engine(url, **merged_params) instead of passing two dicts.
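The suggested merge can be illustrated standalone. A minimal sketch, with a hypothetical helper name (merge_engine_params is not in the codebase); connection-supplied params win, and toolkit defaults apply only when absent:

```python
from typing import Any, Dict


def merge_engine_params(
    connection_params: Dict[str, Any], in_session: bool
) -> Dict[str, Any]:
    # Copy first so the caller's dict is never mutated.
    params = dict(connection_params)
    # Toolkit default applies only when the connection didn't set it,
    # avoiding the duplicate-kwarg TypeError from **params, **extras.
    params.setdefault("pool_pre_ping", True)
    if in_session:
        # Session mode pins a single pooled connection.
        params["pool_size"] = 1
        params["max_overflow"] = 0
    return params


# A connection that explicitly disables pre-ping keeps its own value:
assert merge_engine_params({"pool_pre_ping": False}, in_session=False) == {
    "pool_pre_ping": False
}
```

With this in place the call site collapses to create_engine(url, **merge_engine_params(sql_alchemy_dict["params"], in_session)).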

Comment on lines +380 to +386
def _make_sql_alchemy_dict(integration_id="integration_a", url=None, params=None):
return {
"url": url or "postgresql://u:p@localhost:5432/db",
"params": params if params is not None else {},
"param_style": "qmark",
"integration_id": integration_id,
}
🛠️ Refactor suggestion | 🟠 Major

Annotate _make_sql_alchemy_dict() explicitly.

url=None and params=None should be Optional[...], and this helper should declare its return type. It is small, but it defines most of the new session test setup.

As per coding guidelines, "Always use Optional[T] for parameters that can be None (not T = None)" and "Use explicit type hints for function parameters and return values".

🧰 Tools
🪛 Ruff (0.15.10)

[warning] 380-380: Missing return type annotation for private function _make_sql_alchemy_dict

(ANN202)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_sql_execution_internal.py` around lines 380 - 386, Update the
helper _make_sql_alchemy_dict to add explicit type hints: annotate parameters
url and params as Optional[str] and Optional[Dict[str, Any]] (or appropriate
param dict type) respectively, and declare the function return type as Dict[str,
Any] (or a more specific mapping type used in tests). Adjust the function
signature and any necessary imports (from typing import Optional, Dict, Any) so
the function reads e.g. def _make_sql_alchemy_dict(integration_id: str =
"integration_a", url: Optional[str] = None, params: Optional[Dict[str, Any]] =
None) -> Dict[str, Any].

Drops the sql_session() context manager and the engine lifecycle
changes it required (per-cell registry, pool_size=1, SSH tunnel
lifetime extension). Restores the original per-call create_engine +
dispose flow.

In its place, execute_sql / execute_sql_with_connection_json grow an
optional setup_statements: list[str] parameter. When provided, those
statements are run via connection.exec_driver_sql on the same DBAPI
connection inside the same engine.begin() block as the main query, so
session state set by `USE WAREHOUSE`, `USE ROLE`, `SET ...`, etc. is
in effect for the main query without changing the engine's lifetime.

Single call, single engine, single tunnel, single connection — no
shared mutable state across calls and no special-casing for SSH or
custom pool config.

DuckDB's process-wide singleton means setup statements naturally
persist there too; the duckdb branch executes them on the singleton
before the main query for API parity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
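The connection-scoped behavior this commit relies on can be demonstrated with stdlib sqlite3 standing in for Snowflake and friends (none of the names below are the toolkit's API): session state set on one DBAPI connection is invisible to a fresh connection, which is why setup statements must share the main query's connection.

```python
import sqlite3

# Same connection: session-scoped setup (a TEMP table, analogous to
# USE WAREHOUSE) is visible to the query that follows it.
conn_a = sqlite3.connect(":memory:")
conn_a.execute("CREATE TEMP TABLE session_state (warehouse TEXT)")
conn_a.execute("INSERT INTO session_state VALUES ('abc')")
assert conn_a.execute("SELECT warehouse FROM session_state").fetchone() == ("abc",)

# Fresh connection (what per-call create_engine + dispose gives you):
# the session state is gone.
conn_b = sqlite3.connect(":memory:")
state_lost = False
try:
    conn_b.execute("SELECT warehouse FROM session_state")
except sqlite3.OperationalError:
    state_lost = True
assert state_lost
```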
@robertlacok robertlacok changed the title feat: Add sql_session() to share engine + SSH tunnel across calls feat: Add setup_statements parameter to execute_sql for session setup Apr 23, 2026
@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deepnote_toolkit/sql/sql_execution.py (1)

419-447: ⚠️ Potential issue | 🟠 Major

setup_statements are not part of cache semantics.

Line 435 enables cache even when setup_statements changes session context (e.g., warehouse/role/schema). That can return stale/wrong data from a different setup context.

Suggested fix (safe default: disable cache when setup statements are present)
 def _execute_sql_with_caching(
@@
-    sql_caching_enabled = (
-        sql_cache_mode != "cache_disabled" and return_variable_type == "dataframe"
-    )
+    has_setup_statements = bool(setup_statements)
+    sql_caching_enabled = (
+        sql_cache_mode != "cache_disabled"
+        and return_variable_type == "dataframe"
+        and not has_setup_statements
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/sql/sql_execution.py` around lines 419 - 447, The current
cache path (controlled by sql_caching_enabled / can_get_sql_cache and
get_sql_cache) ignores setup_statements, which can change session context and
produce stale/wrong results; update the logic that computes sql_caching_enabled
/ can_get_sql_cache (or add an explicit guard before calling get_sql_cache) to
disable caching when setup_statements is non-empty (i.e., treat any truthy
setup_statements as a reason to skip get_sql_cache), ensuring the existing
duckdb short-circuit (requires_duckdb) and return_variable_type checks remain
unchanged.
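The guard the diff proposes reduces to a small predicate; a sketch with a hypothetical function name:

```python
from typing import Optional, Sequence


def sql_caching_enabled(
    sql_cache_mode: str,
    return_variable_type: str,
    setup_statements: Optional[Sequence[str]],
) -> bool:
    # Any setup statement may change session context (warehouse/role/schema),
    # so its results must never be served from or written to the cache.
    return (
        sql_cache_mode != "cache_disabled"
        and return_variable_type == "dataframe"
        and not setup_statements
    )


assert sql_caching_enabled("cache_enabled", "dataframe", None)
assert not sql_caching_enabled("cache_enabled", "dataframe", ["USE ROLE analyst"])
```

An alternative would be to fold the setup statements into the cache key, but disabling the cache is the safer default since two identical queries under different warehouses can legitimately return different data.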

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1e59874a-01db-41f1-9708-1368007f2686

📥 Commits

Reviewing files that changed from the base of the PR and between 6762926 and 029e245.

📒 Files selected for processing (3)
  • deepnote_toolkit/sql/sql_execution.py
  • tests/unit/test_sql_execution.py
  • tests/unit/test_sql_execution_internal.py

Comment on lines +105 to 106
setup_statements=None,
):

⚠️ Potential issue | 🟠 Major

Add type hints to all function parameters and return annotations.

The five functions at lines 99, 234, 410, 490, and 631 lack any parameter or return type annotations. Replace setup_statements=None with setup_statements: Optional[...] and add explicit return type annotations to each function. Per coding guidelines, use Optional[T] instead of T = None, and include return types for all functions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/sql/sql_execution.py` around lines 105 - 106, Several
functions in this module are missing parameter and return type annotations;
replace any parameter defaults like setup_statements=None with an explicit
Optional type (e.g., setup_statements: Optional[Sequence[str]] = None) and add
explicit return type annotations for each function that currently lacks them.
Import typing primitives you need (from typing import Optional, Sequence,
Iterable, Any) and update the function signatures (including the function that
declares setup_statements) to use Optional[T] rather than T = None, and add
concrete return types (e.g., -> None, -> int, -> List[Any], or -> Iterable[str]
as appropriate) to the five un-annotated functions in this module so all
parameters and returns are fully typed. Ensure the parameter name
setup_statements is changed to setup_statements: Optional[<appropriate type>] =
None and that each function signature and its return type reflect the actual
values they produce.



def _execute_sql_on_engine(engine, query, bind_params):
def _execute_sql_on_engine(engine, query, bind_params, setup_statements=None):

⚠️ Potential issue | 🟠 Major

Validate setup_statements type before iterating.

Line 669 accepts any iterable. If a caller passes a string, it runs one-character “statements” and fails with misleading SQL errors.

Suggested fix
 def _execute_sql_on_engine(engine, query, bind_params, setup_statements=None):
@@
+    if setup_statements is None:
+        normalized_setup_statements: list[str] = []
+    elif isinstance(setup_statements, list) and all(
+        isinstance(stmt, str) for stmt in setup_statements
+    ):
+        normalized_setup_statements = setup_statements
+    else:
+        raise TypeError("setup_statements must be a list[str] or None")
+
     with engine.begin() as connection:
@@
-        for stmt in setup_statements or []:
+        for stmt in normalized_setup_statements:
             connection.exec_driver_sql(stmt)

Also applies to: 667-670

🧰 Tools
🪛 Ruff (0.15.10)

[warning] 631-631: Missing return type annotation for private function _execute_sql_on_engine

(ANN202)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/sql/sql_execution.py` at line 631, In
_execute_sql_on_engine, validate setup_statements before iterating: ensure it's
either None or an iterable of statements (not a plain string). Add a guard that
detects isinstance(setup_statements, str) and either wrap it as a single-item
list or raise a TypeError, and also validate items are strings (or convert them)
before the loop that executes each statement to avoid iterating over characters
and producing misleading SQL errors.
