feat: Add setup_statements parameter to execute_sql for session setup #96
robertlacok wants to merge 3 commits into main from
Conversation
…-51) Cache the SQLAlchemy engine per IPython cell execution so that two back-to-back `_dntk.execute_sql*` calls in the same generated cell share the same physical DBAPI connection. This lets Snowflake session state set by `USE WAREHOUSE`, `USE ROLE`, `SET ...` etc. persist onto the next query, which unblocks supporting `USE WAREHOUSE abc; SELECT 123` in Snowflake SQL blocks.

Cached engines are created with `pool_size=1, max_overflow=0` so `engine.begin()` always checks out the same connection. They are disposed via a `post_run_cell` IPython hook.

Caching is skipped for SSH-tunneled engines, user-supplied custom pool config, and when no IPython shell is available (script/CLI use); all of those fall back to the previous create-then-dispose-per-call behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📝 Walkthrough
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
📦 Python package built successfully!
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@           Coverage Diff           @@
##             main      #96   +/-   ##
==========================================
- Coverage   74.32%   74.25%   -0.08%
==========================================
  Files          94       94
  Lines        5535     5539       +4
  Branches      824      826       +2
==========================================
- Hits         4114     4113       -1
- Misses       1155     1158       +3
- Partials      266      268       +2
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
deepnote_toolkit/ipython_utils.py (1)
6-34: 🛠️ Refactor suggestion | 🟠 Major
Type the hook registry and callback explicitly.
Lines 6-18 add a new hook API, but both `_registered_post_run_cell_callbacks` and `callback` are untyped. That weakens static checking around event registration.
Suggested fix

```diff
+from collections.abc import Callable
+from typing import Any
+
-_registered_post_run_cell_callbacks: set = set()
+_registered_post_run_cell_callbacks: set[Callable[..., Any]] = set()
-
-def register_post_run_cell_hook(callback) -> bool:
+def register_post_run_cell_hook(callback: Callable[..., Any]) -> bool:
```

As per coding guidelines, "Use explicit type hints for function parameters and return values" and "Use type hints consistently".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepnote_toolkit/ipython_utils.py` around lines 6 - 34, Add explicit type hints for the hook registry and callback: import typing symbols (Set, Callable, Any) and change the module-level variable to _registered_post_run_cell_callbacks: Set[Callable[..., Any]] = set(); update the register_post_run_cell_hook signature to def register_post_run_cell_hook(callback: Callable[..., Any]) -> bool: so static type checkers know the stored callable shape when registering with ip.events.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deepnote_toolkit/sql/sql_execution.py`:
- Around line 465-485: The cache key in _compute_engine_cache_key currently uses
only ssh_options, pool-related params, integration_id, or url which ignores
other engine-defining options; update it to incorporate a deterministic
serialization of the relevant connection params (e.g., the params dict and any
connect_args/auth headers present in sql_alchemy_dict) into the returned key
while preserving the existing early returns for ssh_options-enabled and explicit
pool config; specifically, when integration_id exists include it plus a stable
encoding (sorted/JSON) of params/connect_args in the key (and if no
integration_id, include the URL plus the same stable encoding) so that differing
params produce distinct cache keys.
---
Outside diff comments:
In `@deepnote_toolkit/ipython_utils.py`:
- Around line 6-34: Add explicit type hints for the hook registry and callback:
import typing symbols (Set, Callable, Any) and change the module-level variable
to _registered_post_run_cell_callbacks: Set[Callable[..., Any]] = set(); update
the register_post_run_cell_hook signature to def
register_post_run_cell_hook(callback: Callable[..., Any]) -> bool: so static
type checkers know the stored callable shape when registering with ip.events.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: b6e1c41b-ba25-4588-aa46-4a16f4340388
📒 Files selected for processing (4)
- deepnote_toolkit/ipython_utils.py
- deepnote_toolkit/sql/sql_execution.py
- tests/unit/test_ipython_utils.py
- tests/unit/test_sql_execution_internal.py
```python
def _compute_engine_cache_key(sql_alchemy_dict) -> Optional[str]:
    """Stable key identifying the engine for *sql_alchemy_dict*, or None when
    this connection should not be cached.

    SSH-tunneled engines can't be cached because the tunnel only lives inside
    `_create_sql_ssh_uri`'s context manager. Connections that already specify
    a custom pool config in user `params` are also skipped to avoid clashing
    with the `pool_size=1` we set on cached engines.
    """
    if sql_alchemy_dict.get("ssh_options", {}).get("enabled"):
        return None

    params = sql_alchemy_dict.get("params") or {}
    if "pool_size" in params or "max_overflow" in params or "poolclass" in params:
        return None

    integration_id = sql_alchemy_dict.get("integration_id")
    if integration_id:
        return f"integration:{integration_id}"

    return f"url:{sql_alchemy_dict['url']}"
```
Include engine-defining params in the cache key.
Line 481 collapses the key to integration_id, and Line 485 falls back to just the raw URL. create_engine() also depends on params, so two calls in one cell can incorrectly share an engine even when connect_args, auth headers, or other connection options differ. That can silently run the second query with the wrong connection/session configuration.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deepnote_toolkit/sql/sql_execution.py` around lines 465 - 485, The cache key
in _compute_engine_cache_key currently uses only ssh_options, pool-related
params, integration_id, or url which ignores other engine-defining options;
update it to incorporate a deterministic serialization of the relevant
connection params (e.g., the params dict and any connect_args/auth headers
present in sql_alchemy_dict) into the returned key while preserving the existing
early returns for ssh_options-enabled and explicit pool config; specifically,
when integration_id exists include it plus a stable encoding (sorted/JSON) of
params/connect_args in the key (and if no integration_id, include the URL plus
the same stable encoding) so that differing params produce distinct cache keys.
🚀 Review App Deployment Started
Drops the implicit cache + IPython post_run_cell hook in favor of an explicit `_dntk.sql_session()` context manager. Inside the with-block each unique connection (keyed by integration_id, or URL when not present) opens its SSH tunnel and SQLAlchemy engine once and reuses both for every `execute_sql*` call inside the block. On exit the session disposes the engine and closes the tunnel together.

This makes SSH tunnels work with SAL-51 reuse — the previous cache had to skip SSH because the tunnel's lifetime was bound to a single call by `_create_sql_ssh_uri`'s context manager. With the session owning both, the tunnel's lifetime now matches the engine's.

Other gains:
- No IPython coupling. Lifetime is whatever the caller (deepnote-internal codegen) wraps in the `with` block.
- No "magic" cache key vs. live state divergence — outside a session, behavior is exactly the previous per-call create+dispose.
- Nested `sql_session()` blocks no-op on the inner block so the outer retains ownership.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Actionable comments posted: 3
♻️ Duplicate comments (1)
deepnote_toolkit/sql/sql_execution.py (1)
475-488: ⚠️ Potential issue | 🟠 Major
Expand the session cache key beyond `integration_id` / URL.
`_session_resource_key()` still aliases every call for the same integration or raw URL. Inside one `sql_session()`, that lets a later call reuse an engine and SSH tunnel created with different `params`, auth headers, or SSH settings.
Possible fix

```diff
 def _session_resource_key(sql_alchemy_dict) -> Optional[str]:
     params = sql_alchemy_dict.get("params") or {}
     if "pool_size" in params or "max_overflow" in params or "poolclass" in params:
         return None
-
-    integration_id = sql_alchemy_dict.get("integration_id")
-    if integration_id:
-        return f"integration:{integration_id}"
-
-    return f"url:{sql_alchemy_dict['url']}"
+    key_payload = {
+        "integration_id": sql_alchemy_dict.get("integration_id"),
+        "url": sql_alchemy_dict["url"],
+        "params": params,
+        "ssh_options": sql_alchemy_dict.get("ssh_options"),
+    }
+    return json.dumps(key_payload, sort_keys=True, default=str)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepnote_toolkit/sql/sql_execution.py` around lines 475 - 488, _session_resource_key currently returns the same key for all connections sharing an integration_id or URL, allowing reuse of engines/tunnels with different connection-specific settings; change _session_resource_key(sql_alchemy_dict) to include a stable fingerprint of the connection-specific attributes (e.g., serialize and include params, auth headers, and SSH settings such as sql_alchemy_dict.get("params"), sql_alchemy_dict.get("headers") or auth token field, and sql_alchemy_dict.get("ssh_config")/ssh-related keys) when computing the key (but keep the existing early return None for explicit pool_size/max_overflow/poolclass). Use a deterministic serialization or short hash of those fields to build the returned string (e.g., f"integration:{integration_id}:{hash}" or f"url:{url}:{hash}") so engines/tunnels are only reused when all relevant settings match.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deepnote_toolkit/sql/sql_execution.py`:
- Around line 365-399: Change _open_ssh_tunnel and _close_ssh_tunnel to have
explicit type hints and raise a specific exception type instead of a bare
Exception: add a small custom exception class (e.g., SSHTunnelError or
SSHTunnelSetupError) and use it when PRIVATE_SSH_KEY_BLOB is missing or
create_ssh_tunnel fails; annotate _open_ssh_tunnel signature to return
Tuple[Any, URL] (or the concrete tunnel type if available) and accept
sql_alchemy_dict: Mapping[str, Any], and annotate _close_ssh_tunnel(server:
Optional[Any]) -> None; update docstrings to state ownership/closing contract
and failure modes; keep references to the existing symbols (_open_ssh_tunnel,
_close_ssh_tunnel, create_ssh_tunnel, dnenv.get_env, make_url, URL) so reviewers
can find and validate the changes.
- Around line 561-572: The call to create_engine currently passes two kwarg
dicts separately which can cause a TypeError if sql_alchemy_dict["params"]
contains overlapping keys like pool_pre_ping; before calling create_engine (in
the block using suppress_third_party_deprecation_warnings and the in_session
logic), merge sql_alchemy_dict["params"] and extra_engine_params into a single
dict (e.g., copy sql_alchemy_dict["params"] then use setdefault or update
carefully) so extra_engine_params values only apply when keys are absent, then
call create_engine(url, **merged_params) instead of passing two dicts.
In `@tests/unit/test_sql_execution_internal.py`:
- Around line 380-386: Update the helper _make_sql_alchemy_dict to add explicit
type hints: annotate parameters url and params as Optional[str] and
Optional[Dict[str, Any]] (or appropriate param dict type) respectively, and
declare the function return type as Dict[str, Any] (or a more specific mapping
type used in tests). Adjust the function signature and any necessary imports
(from typing import Optional, Dict, Any) so the function reads e.g. def
_make_sql_alchemy_dict(integration_id: str = "integration_a", url: Optional[str]
= None, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any].
---
Duplicate comments:
In `@deepnote_toolkit/sql/sql_execution.py`:
- Around line 475-488: _session_resource_key currently returns the same key for
all connections sharing an integration_id or URL, allowing reuse of
engines/tunnels with different connection-specific settings; change
_session_resource_key(sql_alchemy_dict) to include a stable fingerprint of the
connection-specific attributes (e.g., serialize and include params, auth
headers, and SSH settings such as sql_alchemy_dict.get("params"),
sql_alchemy_dict.get("headers") or auth token field, and
sql_alchemy_dict.get("ssh_config")/ssh-related keys) when computing the key (but
keep the existing early return None for explicit
pool_size/max_overflow/poolclass). Use a deterministic serialization or short
hash of those fields to build the returned string (e.g.,
f"integration:{integration_id}:{hash}" or f"url:{url}:{hash}") so
engines/tunnels are only reused when all relevant settings match.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 22352dcb-ac84-4524-b98f-c797b6e3f455
📒 Files selected for processing (3)
- deepnote_toolkit/__init__.py
- deepnote_toolkit/sql/sql_execution.py
- tests/unit/test_sql_execution_internal.py
```python
def _open_ssh_tunnel(sql_alchemy_dict) -> tuple[Any, Any]:
    """Open an SSH tunnel for *sql_alchemy_dict* and return ``(server, url)``.

    Caller is responsible for eventually closing the returned server with
    ``_close_ssh_tunnel``.
    """
    base64_encoded_key = dnenv.get_env("PRIVATE_SSH_KEY_BLOB")
    if not base64_encoded_key:
        raise Exception(
            "The private key needed to establish the SSH connection is missing. Please try again or contact support."
        )
    original_url = make_url(sql_alchemy_dict["url"])
    server = create_ssh_tunnel(
        ssh_host=sql_alchemy_dict["ssh_options"]["host"],
        ssh_port=int(sql_alchemy_dict["ssh_options"]["port"]),
        ssh_user=sql_alchemy_dict["ssh_options"]["user"],
        remote_host=original_url.host,
        remote_port=int(original_url.port),
        private_key=base64.b64decode(base64_encoded_key).decode("utf-8"),
    )
    url = URL.create(
        drivername=original_url.drivername,
        username=original_url.username,
        password=original_url.password,
        host=server.local_bind_host,
        port=server.local_bind_port,
        database=original_url.database,
        query=original_url.query,
    )
    return server, url


def _close_ssh_tunnel(server) -> None:
    if server is not None and server.is_active:
        server.close()
```
🛠️ Refactor suggestion | 🟠 Major
Type the new SSH/session helpers and use a specific exception.
These helpers are new control-path code, but their signatures are still untyped and _open_ssh_tunnel() raises a bare Exception. Please make the contract explicit so callers can reason about ownership and failures.
As per coding guidelines, "Use explicit type hints for function parameters and return values", "Use appropriate exception types for error handling", and "Document functions and classes with docstrings".
Also applies to: 475-581
🧰 Tools
🪛 Ruff (0.15.10)
[warning] 373-375: Create your own exception
(TRY002)
[warning] 373-375: Avoid specifying long messages outside the exception class
(TRY003)
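One way to satisfy the suggestion, sketched with placeholder logic. The exception name follows the reviewer's example; `open_tunnel` and `close_tunnel` are illustrative stand-ins for the private helpers, and the tunnel creation itself is stubbed out:

```python
from typing import Any, Mapping, Optional, Tuple


class SSHTunnelSetupError(Exception):
    """Raised when an SSH tunnel cannot be established."""


def open_tunnel(sql_alchemy_dict: Mapping[str, Any]) -> Tuple[Any, Any]:
    """Open a tunnel and return ``(server, rewritten_url)``.

    The caller owns the returned server and must pass it to
    ``close_tunnel`` when done.
    """
    key_blob = sql_alchemy_dict.get("private_key_blob")
    if not key_blob:
        raise SSHTunnelSetupError("private SSH key is missing")
    # The real helper would call create_ssh_tunnel(...) and rebuild the
    # SQLAlchemy URL against the tunnel's local bind address here.
    return object(), sql_alchemy_dict["url"]


def close_tunnel(server: Optional[Any]) -> None:
    """Idempotently close a tunnel; accepts None for convenience."""
    if server is not None and getattr(server, "is_active", False):
        server.close()


try:
    open_tunnel({"url": "postgresql://u@h/db"})
    raised = False
except SSHTunnelSetupError:
    raised = True
assert raised  # specific exception type, catchable by callers
```

A dedicated exception type lets callers distinguish "tunnel setup failed" from driver errors instead of catching bare `Exception`.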
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deepnote_toolkit/sql/sql_execution.py` around lines 365 - 399, Change
_open_ssh_tunnel and _close_ssh_tunnel to have explicit type hints and raise a
specific exception type instead of a bare Exception: add a small custom
exception class (e.g., SSHTunnelError or SSHTunnelSetupError) and use it when
PRIVATE_SSH_KEY_BLOB is missing or create_ssh_tunnel fails; annotate
_open_ssh_tunnel signature to return Tuple[Any, URL] (or the concrete tunnel
type if available) and accept sql_alchemy_dict: Mapping[str, Any], and annotate
_close_ssh_tunnel(server: Optional[Any]) -> None; update docstrings to state
ownership/closing contract and failure modes; keep references to the existing
symbols (_open_ssh_tunnel, _close_ssh_tunnel, create_ssh_tunnel, dnenv.get_env,
make_url, URL) so reviewers can find and validate the changes.
```python
    extra_engine_params: dict[str, Any] = {"pool_pre_ping": True}
    if in_session:
        # See sql_session() docstring for why pool_size=1 is required to make
        # session state (USE WAREHOUSE, ...) persist across calls.
        extra_engine_params["pool_size"] = 1
        extra_engine_params["max_overflow"] = 0

    try:
        with suppress_third_party_deprecation_warnings():
            engine = create_engine(
                url, **sql_alchemy_dict["params"], **extra_engine_params
            )
```
🧩 Analysis chain
🌐 Web query:
In Python, what happens when the same keyword argument is supplied by two **kwargs expansions in a single function call?
💡 Result:
In Python (3.5+), supplying the same keyword argument from two `**kwargs` expansions in a single function call, such as `f(**d1, **d2)` where `d1` and `d2` both contain the same key, raises `TypeError: f() got multiple values for keyword argument 'keyname'`. This behavior is specified in PEP 448, which introduced multiple unpacking: "Currently, if an argument is given multiple times — such as a positional argument given both positionally and by keyword — a TypeError is raised. This remains true for duplicate arguments provided through multiple unpackings, e.g. `f(**{'x': 2}, **{'x': 3})`, except that the error will be detected at runtime." Python merges the kwargs from left to right, but detects duplicates during argument processing in getargs.c and raises the error. This is unlike dictionary merging `{**d1, **d2}`, which keeps the last value without error. Example:

```python
def f(**kwargs):
    pass

d1 = {'a': 1}
d2 = {'a': 2}
f(**d1, **d2)  # TypeError: f() got multiple values for keyword argument 'a'
```
Citations:
- 1: https://peps.python.org/pep-0448/
- 2: https://stackoverflow.com/questions/38860676/multiple-kwargs-in-a-function-call
- 3: kwargs regression when there are multiple entries with the same key python/cpython#79815
- 4: python/cpython@e036ef8
- 5: https://discuss.python.org/t/why-dict-expansion-allows-dup-keys-but-kwargs-does-not/71291
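The failure mode and the `setdefault()` remedy can be demonstrated directly with a self-contained toy (no SQLAlchemy involved; `create` stands in for `create_engine`):

```python
def create(**kwargs):
    """Stand-in for create_engine: just echoes its keyword arguments."""
    return kwargs


params = {"pool_pre_ping": False}               # user-supplied engine params
extra = {"pool_pre_ping": True, "pool_size": 1}  # our defaults/overrides

# Duplicate key across two ** expansions raises at call time (PEP 448),
# even though create() accepts arbitrary **kwargs.
try:
    create(**params, **extra)
    failed = False
except TypeError:
    failed = True
assert failed

# The fix: merge into one dict first. setdefault() only fills missing keys,
# so the user's explicit pool_pre_ping=False wins over our default.
merged = dict(params)
for key, value in extra.items():
    merged.setdefault(key, value)
assert merged == {"pool_pre_ping": False, "pool_size": 1}
assert create(**merged) == merged
```

Note `setdefault()` lets user params take precedence; if our values must win instead (e.g. `pool_size=1` inside a session), use plain assignment for those keys, as the suggested fix does.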
🏁 Scripts executed (verification against deepnote/deepnote-toolkit; outputs omitted):

```shell
cd deepnote_toolkit && find . -type f -name "sql_execution.py" | head -1
# Check the exact file and read the lines around 561-572
wc -l deepnote_toolkit/sql/sql_execution.py
sed -n '555,580p' deepnote_toolkit/sql/sql_execution.py
# Search for sql_alchemy_dict definition and usage
rg "sql_alchemy_dict" deepnote_toolkit/sql/sql_execution.py -B 3 -A 3
# Check where pool_pre_ping is used in the codebase and in tests
rg "pool_pre_ping" deepnote_toolkit/
find . -name "*test*sql*" -type f | head -10
rg "pool_pre_ping\|params.*=" deepnote_toolkit/sql/ -B 2 -A 2
rg "pool_pre_ping" tests/
grep -B 10 -A 10 "pool_pre_ping" tests/unit/test_sql_execution_internal.py
# Inspect _session_resource_key and _acquire_engine
sed -n '730,750p' deepnote_toolkit/sql/sql_execution.py
rg "_session_resource_key" deepnote_toolkit/sql/sql_execution.py -A 15
sed -n '760,800p' deepnote_toolkit/sql/sql_execution.py
# Tests exercising engine creation with params
grep -l "pool_pre_ping\|create_engine\|_acquire_engine" tests/unit/test_sql_execution*.py
head -100 tests/unit/test_sql_execution.py
rg "params.*:" tests/unit/test_sql_execution.py | head -20
rg "_make_sql_alchemy_dict" tests/unit/test_sql_execution_internal.py -A 20
# Could sql_alchemy_dict["params"] ever contain pool_pre_ping? How is it built?
rg "\"params\".*pool_pre_ping|pool_pre_ping.*\"params\"" deepnote_toolkit/
sed -n '420,480p' deepnote_toolkit/sql/sql_execution.py
sed -n '200,250p' deepnote_toolkit/sql/sql_execution.py
rg -B 3 -A 3 "pool_pre_ping|extra_engine_params" deepnote_toolkit/sql/sql_execution.py
```
Merge engine kwargs before calling `create_engine()`.
If `sql_alchemy_dict["params"]` contains `pool_pre_ping`, passing `create_engine(url, **params, **extra_engine_params)` raises `TypeError` due to duplicate kwargs. Merge to a single dict first and use `setdefault()` to avoid conflicts.
Suggested fix

```diff
-    extra_engine_params: dict[str, Any] = {"pool_pre_ping": True}
+    engine_params: dict[str, Any] = dict(sql_alchemy_dict["params"])
+    engine_params.setdefault("pool_pre_ping", True)
     if in_session:
         # See sql_session() docstring for why pool_size=1 is required to make
         # session state (USE WAREHOUSE, ...) persist across calls.
-        extra_engine_params["pool_size"] = 1
-        extra_engine_params["max_overflow"] = 0
+        engine_params["pool_size"] = 1
+        engine_params["max_overflow"] = 0
     try:
         with suppress_third_party_deprecation_warnings():
-            engine = create_engine(
-                url, **sql_alchemy_dict["params"], **extra_engine_params
-            )
+            engine = create_engine(url, **engine_params)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deepnote_toolkit/sql/sql_execution.py` around lines 561 - 572, The call to
create_engine currently passes two kwarg dicts separately which can cause a
TypeError if sql_alchemy_dict["params"] contains overlapping keys like
pool_pre_ping; before calling create_engine (in the block using
suppress_third_party_deprecation_warnings and the in_session logic), merge
sql_alchemy_dict["params"] and extra_engine_params into a single dict (e.g.,
copy sql_alchemy_dict["params"] then use setdefault or update carefully) so
extra_engine_params values only apply when keys are absent, then call
create_engine(url, **merged_params) instead of passing two dicts.
```python
def _make_sql_alchemy_dict(integration_id="integration_a", url=None, params=None):
    return {
        "url": url or "postgresql://u:p@localhost:5432/db",
        "params": params if params is not None else {},
        "param_style": "qmark",
        "integration_id": integration_id,
    }
```
🛠️ Refactor suggestion | 🟠 Major
Annotate _make_sql_alchemy_dict() explicitly.
url=None and params=None should be Optional[...], and this helper should declare its return type. It is small, but it defines most of the new session test setup.
As per coding guidelines, "Always use Optional[T] for parameters that can be None (not T = None)" and "Use explicit type hints for function parameters and return values".
🧰 Tools
🪛 Ruff (0.15.10)
[warning] 380-380: Missing return type annotation for private function _make_sql_alchemy_dict
(ANN202)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/test_sql_execution_internal.py` around lines 380 - 386, Update the
helper _make_sql_alchemy_dict to add explicit type hints: annotate parameters
url and params as Optional[str] and Optional[Dict[str, Any]] (or appropriate
param dict type) respectively, and declare the function return type as Dict[str,
Any] (or a more specific mapping type used in tests). Adjust the function
signature and any necessary imports (from typing import Optional, Dict, Any) so
the function reads e.g. def _make_sql_alchemy_dict(integration_id: str =
"integration_a", url: Optional[str] = None, params: Optional[Dict[str, Any]] =
None) -> Dict[str, Any].
Drops the sql_session() context manager and the engine lifecycle changes it required (per-cell registry, pool_size=1, SSH tunnel lifetime extension). Restores the original per-call create_engine + dispose flow.

In its place, execute_sql / execute_sql_with_connection_json grow an optional setup_statements: list[str] parameter. When provided, those statements are run via connection.exec_driver_sql on the same DBAPI connection inside the same engine.begin() block as the main query, so session state set by `USE WAREHOUSE`, `USE ROLE`, `SET ...`, etc. is in effect for the main query without changing the engine's lifetime.

Single call, single engine, single tunnel, single connection — no shared mutable state across calls and no special-casing for SSH or custom pool config. DuckDB's process-wide singleton means setup statements naturally persist there too; the duckdb branch executes them on the singleton before the main query for API parity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
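The same-connection guarantee described above can be sketched with a plain SQLAlchemy engine. The function below illustrates the mechanism only; it is not the toolkit's actual `execute_sql` signature, and SQLite stands in for a real warehouse:

```python
from sqlalchemy import create_engine, text


def execute_with_setup(url, query, setup_statements=None):
    """Run setup SQL then the main query on one connection, one transaction."""
    engine = create_engine(url)
    try:
        with engine.begin() as conn:
            for stmt in setup_statements or []:
                # exec_driver_sql sends the statement verbatim to the driver;
                # any session state it sets is live on this same connection...
                conn.exec_driver_sql(stmt)
            # ...so the main query executes under that session state.
            return conn.execute(text(query)).fetchall()
    finally:
        engine.dispose()  # per-call lifetime: nothing persists across calls


rows = execute_with_setup(
    "sqlite://",
    "SELECT x FROM t",
    setup_statements=["CREATE TABLE t (x INT)", "INSERT INTO t VALUES (1)"],
)
assert rows == [(1, )]
```

Because setup and query share one `engine.begin()` block, there is no window for the pool to hand the main query a different connection, which is the whole point of the setup_statements design.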
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
deepnote_toolkit/sql/sql_execution.py (1)
419-447: ⚠️ Potential issue | 🟠 Major
`setup_statements` are not part of cache semantics. Line 435 enables cache even when `setup_statements` changes session context (e.g., warehouse/role/schema). That can return stale/wrong data from a different setup context.
Suggested fix (safe default: disable cache when setup statements are present)

```diff
 def _execute_sql_with_caching(
@@
-    sql_caching_enabled = (
-        sql_cache_mode != "cache_disabled" and return_variable_type == "dataframe"
-    )
+    has_setup_statements = bool(setup_statements)
+    sql_caching_enabled = (
+        sql_cache_mode != "cache_disabled"
+        and return_variable_type == "dataframe"
+        and not has_setup_statements
+    )
```
Verify each finding against the current code and only fix it if needed. In `@deepnote_toolkit/sql/sql_execution.py` around lines 419 - 447, The current cache path (controlled by sql_caching_enabled / can_get_sql_cache and get_sql_cache) ignores setup_statements, which can change session context and produce stale/wrong results; update the logic that computes sql_caching_enabled / can_get_sql_cache (or add an explicit guard before calling get_sql_cache) to disable caching when setup_statements is non-empty (i.e., treat any truthy setup_statements as a reason to skip get_sql_cache), ensuring the existing duckdb short-circuit (requires_duckdb) and return_variable_type checks remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deepnote_toolkit/sql/sql_execution.py`:
- Line 631: In _execute_sql_on_engine, validate setup_statements before
iterating: ensure it's either None or an iterable of statements (not a plain
string). Add a guard that detects isinstance(setup_statements, str) and either
wrap it as a single-item list or raise a TypeError, and also validate items are
strings (or convert them) before the loop that executes each statement to avoid
iterating over characters and producing misleading SQL errors.
- Around line 105-106: Several functions in this module are missing parameter
and return type annotations; replace any parameter defaults like
setup_statements=None with an explicit Optional type (e.g., setup_statements:
Optional[Sequence[str]] = None) and add explicit return type annotations for
each function that currently lacks them. Import typing primitives you need (from
typing import Optional, Sequence, Iterable, Any) and update the function
signatures (including the function that declares setup_statements) to use
Optional[T] rather than T = None, and add concrete return types (e.g., -> None,
-> int, -> List[Any], or -> Iterable[str] as appropriate) to the five
un-annotated functions in this module so all parameters and returns are fully
typed. Ensure the parameter name setup_statements is changed to
setup_statements: Optional[<appropriate type>] = None and that each function
signature and its return type reflect the actual values they produce.
---
Outside diff comments:
In `@deepnote_toolkit/sql/sql_execution.py`:
- Around line 419-447: The current cache path (controlled by sql_caching_enabled
/ can_get_sql_cache and get_sql_cache) ignores setup_statements, which can
change session context and produce stale/wrong results; update the logic that
computes sql_caching_enabled / can_get_sql_cache (or add an explicit guard
before calling get_sql_cache) to disable caching when setup_statements is
non-empty (i.e., treat any truthy setup_statements as a reason to skip
get_sql_cache), ensuring the existing duckdb short-circuit (requires_duckdb) and
return_variable_type checks remain unchanged.
ℹ️ Review info
⚙️ Run configuration
- Configuration used: Organization UI
- Review profile: ASSERTIVE
- Plan: Pro
- Run ID: 1e59874a-01db-41f1-9708-1368007f2686

📒 Files selected for processing (3)
- deepnote_toolkit/sql/sql_execution.py
- tests/unit/test_sql_execution.py
- tests/unit/test_sql_execution_internal.py
    setup_statements=None,
):
🧩 Analysis chain
🏁 Scripts executed against deepnote/deepnote-toolkit:

wc -l deepnote_toolkit/sql/sql_execution.py
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '100,115p'
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '235,250p'
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '414,430p'
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '492,505p'
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '625,640p'
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '95,110p'
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '230,250p'
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '408,435p'
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '485,510p'
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '625,660p'
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '99,107p'
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '234,242p'
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '410,421p'
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '490,499p'
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '631,647p'
Add type hints to all function parameters and return annotations.
The five functions at lines 99, 234, 410, 490, and 631 lack any parameter or return type annotations. Replace setup_statements=None with setup_statements: Optional[...] and add explicit return type annotations to each function. Per coding guidelines, use Optional[T] instead of T = None, and include return types for all functions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deepnote_toolkit/sql/sql_execution.py` around lines 105 - 106, Several
functions in this module are missing parameter and return type annotations;
replace any parameter defaults like setup_statements=None with an explicit
Optional type (e.g., setup_statements: Optional[Sequence[str]] = None) and add
explicit return type annotations for each function that currently lacks them.
Import typing primitives you need (from typing import Optional, Sequence,
Iterable, Any) and update the function signatures (including the function that
declares setup_statements) to use Optional[T] rather than T = None, and add
concrete return types (e.g., -> None, -> int, -> List[Any], or -> Iterable[str]
as appropriate) to the five un-annotated functions in this module so all
parameters and returns are fully typed. Ensure the parameter name
setup_statements is changed to setup_statements: Optional[<appropriate type>] =
None and that each function signature and its return type reflect the actual
values they produce.
-def _execute_sql_on_engine(engine, query, bind_params):
+def _execute_sql_on_engine(engine, query, bind_params, setup_statements=None):
Validate setup_statements type before iterating.
Line 669 accepts any iterable. If a caller passes a string, it runs one-character “statements” and fails with misleading SQL errors.
Suggested fix
def _execute_sql_on_engine(engine, query, bind_params, setup_statements=None):
@@
+ if setup_statements is None:
+ normalized_setup_statements: list[str] = []
+ elif isinstance(setup_statements, list) and all(
+ isinstance(stmt, str) for stmt in setup_statements
+ ):
+ normalized_setup_statements = setup_statements
+ else:
+ raise TypeError("setup_statements must be a list[str] or None")
+
with engine.begin() as connection:
@@
- for stmt in setup_statements or []:
+ for stmt in normalized_setup_statements:
             connection.exec_driver_sql(stmt)

Also applies to: 667-670
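The failure mode this comment warns about comes from `str` being an iterable of one-character strings. A standalone version of the suggested normalization, separate from the diff above:

```python
def normalize_setup_statements(setup_statements):
    """Mirror of the suggested fix: accept None or list[str], reject anything else."""
    if setup_statements is None:
        return []
    if isinstance(setup_statements, list) and all(
        isinstance(stmt, str) for stmt in setup_statements
    ):
        return setup_statements
    raise TypeError("setup_statements must be a list[str] or None")


# Without the guard, a bare string would yield one-character "statements":
assert list("USE") == ["U", "S", "E"]
```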
🧰 Tools
🪛 Ruff (0.15.10)
[warning] 631-631: Missing return type annotation for private function _execute_sql_on_engine
(ANN202)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deepnote_toolkit/sql/sql_execution.py` at line 631, In
_execute_sql_on_engine, validate setup_statements before iterating: ensure it's
either None or an iterable of statements (not a plain string). Add a guard that
detects isinstance(setup_statements, str) and either wrap it as a single-item
list or raise a TypeError, and also validate items are strings (or convert them)
before the loop that executes each statement to avoid iterating over characters
and producing misleading SQL errors.
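The same-connection behavior at the heart of this change can be illustrated with stdlib `sqlite3` rather than SQLAlchemy (an analogy, not the toolkit's code): setup statements and the main query share one connection, so connection-scoped state set by the former is visible to the latter.

```python
import sqlite3


def execute_with_setup(query, setup_statements=None):
    """Run setup statements, then the query, on one shared connection."""
    connection = sqlite3.connect(":memory:")
    try:
        for stmt in setup_statements or []:
            connection.execute(stmt)  # analogous to exec_driver_sql
        return connection.execute(query).fetchall()
    finally:
        connection.close()


# A TEMP table is connection-scoped state, like a Snowflake USE WAREHOUSE.
rows = execute_with_setup(
    "SELECT n FROM session_tmp ORDER BY n",
    setup_statements=[
        "CREATE TEMP TABLE session_tmp (n INTEGER)",
        "INSERT INTO session_tmp VALUES (1), (2)",
    ],
)
```

Run the query on a fresh connection instead and the temp table is gone, which is exactly the create-then-dispose-per-call problem this PR addresses.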
Summary
setup_statements: list[str]parameter toexecute_sqlandexecute_sql_with_connection_json.connection.exec_driver_sqlon the same DBAPI connection inside the sameengine.begin()block as the main query, so any session state it sets (SnowflakeUSE WAREHOUSE,USE ROLE,SET ..., etc.) is in effect for the main query.Why
Each
execute_sql*call currently doescreate_engine(...)and thenengine.dispose()infinally. That makes it impossible to run a sequence likeas two separate
execute_sqlcalls and have the second inherit session state from the first — the connection is gone before the second call runs.setup_statementssolves that without changing engine, pool, or SSH tunnel lifecycle: the setup statements piggy-back on the same connection the main query uses.How
- New `setup_statements` kwarg on `execute_sql` and `execute_sql_with_connection_json`, plumbed through `_execute_sql_with_caching` and `_query_data_source` to `_execute_sql_on_engine`.
- In `_execute_sql_on_engine`, setup statements run before `pandas.read_sql_query` via `connection.exec_driver_sql`, so a failing setup statement aborts the whole call (`engine.begin()` rolls back).
- `_execute_sql_with_caching` runs setup statements on the process-wide DuckDB singleton before the main query, for API parity (DuckDB session state already persists across calls anyway).

Test plan
- `tests/unit/test_sql_execution_internal.py`: `None` setup statements are no-ops.
- `tests/unit/test_sql_execution.py`: `setup_statements` is plumbed from `execute_sql` down to `_query_data_source`, defaulting to `None`.
- `poetry run pytest tests/unit/`: 755 passed (excludes pre-existing pyspark collection errors in `test_chart.py`/`test_ocelots.py`).
- `poetry run black` and `poetry run flake8` clean.

Usage
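The original usage snippet did not survive extraction. A sketch of the call shape only: the `execute_sql` below is a stand-in stub that records its arguments (the real toolkit function executes the query and returns a DataFrame), and the warehouse name is purely illustrative.

```python
def execute_sql(query, setup_statements=None):
    """Stand-in stub with the PR's call shape; records arguments instead of querying."""
    return {"query": query, "setup_statements": list(setup_statements or [])}


# Setup statements run first, on the same connection the main query uses.
call = execute_sql(
    "SELECT 123",
    setup_statements=["USE WAREHOUSE abc"],
)
```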
🤖 Generated with Claude Code
Summary by CodeRabbit