Skip to content

Add changes for compatibility with WASM components and collocated UDF servers#121

Open
kesmit13 wants to merge 42 commits into
mainfrom
wasm-compat
Open

Add changes for compatibility with WASM components and collocated UDF servers#121
kesmit13 wants to merge 42 commits into
mainfrom
wasm-compat

Conversation

@kesmit13
Copy link
Copy Markdown
Collaborator

@kesmit13 kesmit13 commented Apr 1, 2026

This PR makes several changes to allow the singlestoredb work in WASM environments. Many of these changes benefit standard installations as well such as lazy loading of numpy, pandas, polars, and pyarrow. Others move imports that are only needed in certain environments, but not within WASM.

A new collocated UDF server implementation is also included that uses a high-performance loop in the C extension to parse and call Python functions on each row. This function is used both by standard collocated servers as well as WASM-based UDF handlers.


Note

High Risk
High risk because it adds a new Unix-socket UDF execution mode with dynamic exec()-based function registration and significantly expands the C accelerator for ROWDAT_1 parsing/serialization and socket/mmap I/O, which can affect stability and memory-safety/performance on the UDF hot path.

Overview
Adds a new plugin-mode UDF server (singlestoredb.functions.ext.plugin) that serves Python UDFs over a Unix socket with thread or pre-fork process pools, including a CLI entry point (python-udf-server) and control signals like @@health, @@functions, and @@register (dynamic function registration).

Expands the C extension (accel.c) to support ROWDAT_1 DECIMAL/DATE/TIME/DATETIME/TIMESTAMP encode/decode, introduces a combined call_function_accel load→call→dump fast path, and adds low-level helpers (mmap_read, mmap_write, recv_exact) with WASM stubs. Updates JSON/ASGI UDF typing/encoding for datetime/decimal, implements VECTOR() dtype, and shifts numpy/pandas/polars/pyarrow handling toward lazy/type-map access.

Also refactors internal result handling in Connection._iquery to accept multiple dataframe/table return types via duck-typing, tweaks SHOW output key handling, and moves jwt to a local import to reduce baseline imports; docs are updated to describe the new plugin mode and environment variables.

Reviewed by Cursor Bugbot for commit 87def99. Bugbot is set up for automated code reviews on this repo. Configure here.

kesmit13 and others added 12 commits March 19, 2026 15:38
Defer top-level `import jwt` to function scope in auth.py,
management/manager.py, and management/utils.py (jwt unavailable in WASM).
Catch OSError in mysql/connection.py getpass handling (pwd module
unavailable in WASM). Broaden except clause for IPython import in
utils/events.py.

Add singlestoredb/functions/ext/wasm/ package with udf_handler.py and
numpy_stub.py so componentize-py components can `pip install` this
branch and import directly from singlestoredb.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Required by componentize-py to build function-handler components.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Build complete @udf-decorated Python functions from signature metadata
and raw function body instead of requiring full source code. This adds
dtype-to-Python type mapping and constructs properly annotated functions
at registration time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Heavy optional dependencies (numpy, pandas, polars, pyarrow) were
imported at module load time, causing failures in WASM environments
where these packages may not be available. This adds a lazy import
utility module and converts all eager try/except import patterns to
use cached lazy accessors. Type maps in dtypes.py are also converted
from module-level dicts to lru_cached factory functions. The pandas
DataFrame isinstance check in connection.py is replaced with a
duck-type hasattr check to avoid importing pandas at module scope.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace `str | None` with `Optional[str]` to maintain compatibility
with Python 3.9 and earlier.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add the call_function_accel function directly to accel.c, implementing
a combined load/call/dump operation for UDF function calls. This function
handles rowdat_1 deserialization, Python UDF invocation, and result
serialization in a single optimized C implementation.

Previously this function was injected at build time via a patch script
in the wasm-udf-server repository. Moving it into the source tree is a
prerequisite for cleaning up the custom componentize-py builder and
simplifying the WASM component build process.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add resources/build_wasm.sh that cross-compiles the package as a WASM
wheel targeting wasm32-wasip2. The script sets up a host venv, configures
the WASI SDK toolchain (clang, ar, linker flags), and uses `python -m
build` to produce the wheel, then unpacks it into build/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
numpy is lazy-loaded throughout the codebase via the _lazy_import
helpers, so the WASM numpy_stub that patched sys.modules['numpy']
is no longer needed. Delete the stub module and remove its
references from udf_handler.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a standalone collocated UDF server package that can run as a
drop-in replacement for the Rust wasm-udf-server. Uses pre-fork
worker processes (default) for true CPU parallelism, avoiding GIL
contention in the C-accelerated call path. Thread pool mode is
available via --process-mode thread.

Collapse the wasm subpackage into a single wasm.py module since it
only contained one class re-exported through __init__.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Each forked worker previously created its own independent SharedRegistry
and FunctionRegistry. When @@register arrived at a worker, only that
worker's local registry was updated — the main process and sibling
workers never learned about the new function.

Add Unix pipe-based IPC (matching the R UDF server fix): each worker
gets a pipe back to the main process. When a worker handles @@register,
it writes the registration payload to its pipe. The main process reads
it via select.poll(), applies the registration to its own SharedRegistry,
then kills and re-forks all workers so they inherit the updated state.

Thread mode is unaffected — pipe_write_fd is None and the pipe write
is a no-op.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add poll()-based timeout to C recv_exact to avoid the interaction between
Python's settimeout() (which sets O_NONBLOCK on the fd) and direct
fd-level recv() in the C code. When the fd was non-blocking, recv()
returned EAGAIN immediately when no data was available, which the C code
treated as an error, closing the connection and causing EPIPE on the
client side.

- accel.c: Add optional timeout_ms parameter to recv_exact that uses
  poll(POLLIN) before each recv() call, raising TimeoutError on timeout.
  Also add mmap_read and mmap_write C helpers for fd-level I/O.
- connection.py: Only call settimeout() for the Python fallback path;
  keep fd blocking for C accel path. Pass 100ms timeout to C recv_exact.
  Catch TimeoutError instead of socket.timeout. Replace select() loop
  with timeout-based recv. Add C accel paths for mmap read/write.
  Add optional per-request profiling via SINGLESTOREDB_UDF_PROFILE=1.
- registry.py: Consolidate accel imports (mmap_read, mmap_write,
  recv_exact) under single _has_accel flag.
- wasm.py: Update to use renamed _has_accel flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread accel.c
Comment thread accel.c
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces WASM-compatibility improvements (primarily by lazy-loading heavyweight optional dependencies and moving environment-specific imports into call sites) and adds a new collocated Python UDF server implementation, including a new C-extension hot path to accelerate rowdat_1 decode → Python call → rowdat_1 encode.

Changes:

  • Added a WIT interface definition and a WASM build helper script for external UDF component workflows.
  • Refactored optional dependency handling (numpy/pandas/polars/pyarrow, IPython, JWT) to be more robust in constrained/WASM-like environments.
  • Added a new collocated UDF server (socket + mmap protocol, thread/process modes, dynamic registration) and a C-extension accelerator entry point (call_function_accel).

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
wit/udf.wit Defines the external UDF WIT interface and exported world.
singlestoredb/utils/_lazy_import.py Adds cached lazy imports for heavy optional deps.
singlestoredb/utils/dtypes.py Converts dtype maps to lazily-evaluated, cached getters.
singlestoredb/utils/results.py Switches result formatting to lazy imports + cached type maps.
singlestoredb/utils/events.py Broadens IPython import failure handling.
singlestoredb/converters.py Uses lazy numpy import in vector converters.
singlestoredb/connection.py Adjusts internal result-to-dict conversion to avoid importing pandas.
singlestoredb/mysql/connection.py Adds WASM-friendly DEFAULT_USER detection (handles OSError).
singlestoredb/auth.py Moves jwt import into call site.
singlestoredb/management/utils.py Moves jwt import into call sites for WASM-friendliness.
singlestoredb/management/manager.py Moves jwt import into is_jwt call site.
singlestoredb/functions/dtypes.py Updates exports to use dtype-map getter functions.
singlestoredb/functions/ext/rowdat_1.py Replaces eager dtype maps with lazy getter functions.
singlestoredb/functions/ext/json.py Replaces eager dtype maps with lazy getter functions.
singlestoredb/functions/ext/collocated/* Adds collocated server, protocol handling, registry, control signals, and WASM adapter.
singlestoredb/tests/test_connection.py Makes pandas string dtype assertions version-tolerant.
resources/build_wasm.sh Adds a build helper for wasm32-wasip2 wheels.
pyproject.toml Adds python-udf-server CLI entry point.
accel.c Adds call_function_accel C hot path and exports it from the extension module.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread accel.c
Comment thread accel.c Outdated
Comment thread accel.c
Comment thread singlestoredb/functions/ext/plugin/registry.py Outdated
Comment thread singlestoredb/functions/ext/plugin/connection.py
Comment thread singlestoredb/functions/ext/plugin/connection.py
Comment thread singlestoredb/connection.py Outdated
Comment thread singlestoredb/utils/events.py Outdated
Comment thread singlestoredb/functions/ext/collocated/server.py Outdated
Comment thread singlestoredb/functions/ext/plugin/connection.py
When numpy is not available (e.g., WASM), the `np` name is undefined.
The has_numpy flag was already used elsewhere but this check was missed
when the numpy_stub was removed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The mmap_read, mmap_write, and recv_exact functions use poll.h,
sys/mman.h, and sys/socket.h which are unavailable in WASI. Wrap
these includes, function bodies, and PyMethodDef entries with
#ifndef __wasi__ guards so the C extension compiles for wasm32-wasip2.
The core call_function_accel optimization remains available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Without this, the accel status log messages ("Using accelerated C
call_function_accel loop" / "Using pure Python call_function loop")
are silently dropped because no logging handler is configured in the
WASM handler path. setup_logging() was only called from __main__.py
(collocated server CLI).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The _singlestoredb_accel C extension ifdef'd out the mmap and socket
functions for __wasi__ builds, but registry.py imports all four symbols
(call_function_accel, mmap_read, mmap_write, recv_exact) in a single
try block. The missing exports caused the entire import to fail,
silently falling back to the pure Python call_function loop.

Add #else stubs that raise NotImplementedError if called, so the
symbols are importable and call_function_accel works in WASM. Also
capture the accel import error and log it in initialize() for future
diagnostics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 11 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread singlestoredb/functions/ext/collocated/registry.py Outdated
Comment thread accel.c Outdated
Comment thread accel.c
Comment thread accel.c
Comment thread singlestoredb/functions/ext/collocated/connection.py Outdated
Comment thread singlestoredb/functions/ext/collocated/connection.py Outdated
Comment thread singlestoredb/functions/ext/collocated/server.py Outdated
Comment thread singlestoredb/functions/ext/plugin/server.py
Comment thread wit/udf.wit Outdated
Comment thread singlestoredb/functions/ext/collocated/wasm.py Outdated
accel.c:
- Replace empty TODO type stubs with NotImplementedError raises
- Add CHECK_REMAINING macro for bounds checking on buffer reads
- Replace unaligned pointer-cast reads with memcpy for WASM/ARM safety
- Fix double-decref in output error paths (set to NULL before goto)
- Fix Py_None reference leak by removing pre-switch INCREF
- Fix MYSQL_TYPE_NULL consuming an extra byte from next column
- Add PyErr_Format in default switch cases
- Add PyErr_Occurred() checks after PyLong/PyFloat conversions

Python:
- Align list/tuple multi-return handling in registry.py with C path
- Add _write_all_fd helper for partial os.write() handling
- Harden handshake recvmsg: name length bound, ancdata validation,
  MSG_CTRUNC check, FD cleanup on error
- Wrap get_context('fork') with platform safety error
- Narrow events.py exception catch to (ImportError, OSError)
- Fix _iquery DataFrame check ordering (check before list())
- Expand setblocking(False) warning comment
- Update WIT and wasm.py docstrings for code parameter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
kesmit13 and others added 2 commits April 22, 2026 14:34
…est coverage

Add missing CHECKSIZE/CHECK_REMAINING guards for MYSQL_TYPE_NULL in both
the C accelerator and Python rowdat_1 paths. Refactor decimal and datetime
unpacking in rowdat_1.py to properly propagate null values. Expand tests
for ext func data parsing, plugin UDF server components, and VECTOR type
assertions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract shared datetime encode/decode into static inline helpers,
replace repeated 11-case string/binary label blocks with macros,
and replace the numpy pre-scan switch with a type descriptor table.
Reduces ~500 lines of duplication across 5 functions with zero
runtime overhead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread singlestoredb/connection.py
Connection._iquery now applies under2camel to dict results via fix_names,
making the second conversion in ShowAccessor._iquery unnecessary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lean up tests

Fix a bug in json.py _dump_vectors where the null mask was never applied
because `m is not None` is always True for boolean mask values. Also add
15 new tests for call_function_accel covering datetime/date/time/decimal
types, error paths, and edge cases. Remove leftover debug pprint, fix
test data inconsistency, and document @@register security boundary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 33 out of 33 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread accel.c Outdated
Comment on lines +2678 to +2681
u64 = (uint64_t)py_dec;
memcpy(out_cols[i] + j * 8, &u64, 8);
CHECKRC(PyDict_SetItem(py_objs, PyLong_FromUnsignedLongLong(u64), py_dec));
Py_CLEAR(py_dec);
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These PyDict_SetItem calls pass a freshly-created PyLong key (PyLong_FromUnsignedLongLong(...)) without DECREF'ing it afterward. Since PyDict_SetItem increments references to both key and value, the temporary key object leaks here (and similarly for other inserted objects). Store the key in a local PyObject*, call PyDict_SetItem, then Py_DECREF(key), or use PyDict_SetItemString with a stable key strategy if possible.

Copilot uses AI. Check for mistakes.
f'Expected 2 FDs, got {len(received_fds)}',
)
return

Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handshake reads the function name using a single recvmsg(namelen, ...), but recvmsg is not guaranteed to return all requested bytes. If it returns a partial name, msg.decode() will succeed with truncated data and the protocol will desync. Treat short reads as an error (e.g., verify len(msg) == namelen and abort/close), or loop until the full name is received (while ensuring SCM_RIGHTS fds are only accepted once).

Suggested change
if len(msg) != namelen:
logger.warning(
f'Expected function name of {namelen} bytes, got {len(msg)}',
)
return

Copilot uses AI. Check for mistakes.
Comment on lines +202 to +205
sock_path = self.config['socket']
if os.path.exists(sock_path):
os.unlink(sock_path)

Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_bind_socket unconditionally unlinks any existing path. If the socket path is user-configurable (often under /tmp), this can be exploited to delete arbitrary files (e.g., via symlink or replacing the path with a regular file) when the server starts. Before unlinking, validate that the existing path is a Unix domain socket (and ideally owned by the current user), or use a safer strategy like binding to a uniquely generated path and refusing to overwrite non-sockets.

Copilot uses AI. Check for mistakes.
Comment on lines +383 to +391
name = '__main__'
compiled = compile(full_code, f'<{name}>', 'exec')

if name in sys.modules:
module = sys.modules[name]
else:
module = types.ModuleType(name)
module.__file__ = f'<{name}>'
sys.modules[name] = module
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_function executes generated code into the 'main' module when it exists (which it typically will for any running process). This can pollute/overwrite the server's own main globals and makes dynamic registrations interfere with unrelated code in the process. Use a dedicated module namespace for dynamic UDFs (e.g., a unique module name per registration batch or a private module like 'singlestoredb.functions.ext.plugin._dynamic'), and avoid reusing the process' real main module.

Suggested change
name = '__main__'
compiled = compile(full_code, f'<{name}>', 'exec')
if name in sys.modules:
module = sys.modules[name]
else:
module = types.ModuleType(name)
module.__file__ = f'<{name}>'
sys.modules[name] = module
name = (
'singlestoredb.functions.ext.plugin._dynamic.'
f'{func_name}_{os.urandom(8).hex()}'
)
compiled = compile(full_code, f'<{name}>', 'exec')
module = types.ModuleType(name)
module.__file__ = f'<{name}>'
module.__package__ = 'singlestoredb.functions.ext.plugin._dynamic'
sys.modules[name] = module

Copilot uses AI. Check for mistakes.
Comment on lines +4 to +42
from functools import lru_cache
from typing import Any
from typing import Optional


@lru_cache(maxsize=None)
def get_numpy() -> Optional[Any]:
"""Return numpy module or None if not installed."""
try:
return importlib.import_module('numpy')
except ImportError:
return None


@lru_cache(maxsize=None)
def get_pandas() -> Optional[Any]:
"""Return pandas module or None if not installed."""
try:
return importlib.import_module('pandas')
except ImportError:
return None


@lru_cache(maxsize=None)
def get_polars() -> Optional[Any]:
"""Return polars module or None if not installed."""
try:
return importlib.import_module('polars')
except ImportError:
return None


@lru_cache(maxsize=None)
def get_pyarrow() -> Optional[Any]:
"""Return pyarrow module or None if not installed."""
try:
return importlib.import_module('pyarrow')
except ImportError:
return None
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lazy import helpers only catch ImportError. In constrained environments (e.g., WASM/WASI, missing shared library loaders, binary wheels built for the wrong platform), importing optional deps often raises OSError (or other exceptions) instead. To preserve the intended "optional" behavior, broaden the exception handling (at least ImportError + OSError) and consider storing/logging the exception for debugging when a module is present but fails to import.

Suggested change
from functools import lru_cache
from typing import Any
from typing import Optional
@lru_cache(maxsize=None)
def get_numpy() -> Optional[Any]:
"""Return numpy module or None if not installed."""
try:
return importlib.import_module('numpy')
except ImportError:
return None
@lru_cache(maxsize=None)
def get_pandas() -> Optional[Any]:
"""Return pandas module or None if not installed."""
try:
return importlib.import_module('pandas')
except ImportError:
return None
@lru_cache(maxsize=None)
def get_polars() -> Optional[Any]:
"""Return polars module or None if not installed."""
try:
return importlib.import_module('polars')
except ImportError:
return None
@lru_cache(maxsize=None)
def get_pyarrow() -> Optional[Any]:
"""Return pyarrow module or None if not installed."""
try:
return importlib.import_module('pyarrow')
except ImportError:
return None
import logging
from functools import lru_cache
from typing import Any
from typing import Optional
logger = logging.getLogger(__name__)
def _import_optional_module(name: str) -> Optional[Any]:
"""Return imported optional module or None if it cannot be imported."""
try:
return importlib.import_module(name)
except (ImportError, OSError) as exc:
logger.debug('Optional dependency %r could not be imported: %s', name, exc)
return None
@lru_cache(maxsize=None)
def get_numpy() -> Optional[Any]:
"""Return numpy module or None if not installed."""
return _import_optional_module('numpy')
@lru_cache(maxsize=None)
def get_pandas() -> Optional[Any]:
"""Return pandas module or None if not installed."""
return _import_optional_module('pandas')
@lru_cache(maxsize=None)
def get_polars() -> Optional[Any]:
"""Return polars module or None if not installed."""
return _import_optional_module('polars')
@lru_cache(maxsize=None)
def get_pyarrow() -> Optional[Any]:
"""Return pyarrow module or None if not installed."""
return _import_optional_module('pyarrow')

Copilot uses AI. Check for mistakes.
Comment on lines +140 to +150
def _pack_time(td: _dt.timedelta) -> int:
"""Pack a timedelta into int64 per rowdat_1 spec."""
total_us = int(td.total_seconds() * 1_000_000)
sign = -1 if total_us < 0 else 1
total_us = abs(total_us)
us = total_us % 1_000_000
total_secs = total_us // 1_000_000
ss = total_secs % 60
mm = (total_secs // 60) % 60
hh = total_secs // 3600
return sign * (hh * 10000 + mm * 100 + ss) * 1_000_000 + (sign * us)
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_pack_time uses td.total_seconds() (float) to derive microseconds, which can lose precision for large timedeltas and round-trip incorrectly. Prefer integer arithmetic based on td.days / td.seconds / td.microseconds to compute total microseconds exactly (and keep sign handling consistent).

Copilot uses AI. Check for mistakes.
Comment thread singlestoredb/functions/ext/plugin/registry.py
…ocol hardening

Fix PyDict_SetItem key reference leaks in load_rowdat_1_numpy by creating
temporary key objects and decrementing after use (7 call sites on hot path).

Remove NEWDECIMAL (246) from string_types so decimal_types handler is
reachable, returning decimal.Decimal instead of strings. Fix _pack_time
to use integer arithmetic instead of float total_seconds(). Reject
datetime.time UDF annotations with a clear TypeError (timedelta required).
Normalize VECTOR element_type to uppercase before SQL emission.

Add recvmsg partial-read check in plugin handshake to prevent protocol
desync. Validate socket path before unlink in _bind_socket to prevent
arbitrary file deletion. Use private module namespace for dynamic UDF
registration instead of __main__.

Broaden lazy import exception handling to catch OSError for WASM/WASI
environments where optional deps may not raise ImportError.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Required since Python 3.12 for '#' format codes in PyArg_ParseTuple.
Without this, mmap_write and other C accelerator functions fail with
"PY_SSIZE_T_CLEAN macro must be defined for '#' formats".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The _discover_udf_functions() method required user plugin modules to
import the WASM-specific Plugin class for their @udf functions to be
found. This broke native plugin servers where Plugin is irrelevant.
Replace the Plugin identity check with a direct scan for the
_singlestoredb_attrs marker set by @udf.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread accel.c Outdated
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread resources/build_wasm.sh
@@ -0,0 +1,41 @@
#!/bin/bash

set -eou pipefail
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WASM build script exits on startup

Medium Severity

The script uses set -eou pipefail, which parses -o incorrectly and treats u as an option name. Bash exits before running the build steps, so resources/build_wasm.sh cannot execute its WASM packaging workflow.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 5f84e84. Configure here.

Comment thread accel.c
Comment thread accel.c
} else {
py_str = PyUnicode_FromStringAndSize(data, (Py_ssize_t)i64);
data += i64;
if (!py_str) goto error;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing bounds check on decimal payload length

High Severity

The new DECIMAL decoding paths consume i64 bytes from data without validating remaining buffer size. Truncated or malformed rowdat_1 input can advance past the end of the byte buffer, causing out-of-bounds reads and potential crashes in load_rowdat_1 and load_rowdat_1_numpy.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 5f84e84. Configure here.

logger.warning(
f'Short read on function name: expected {namelen}, '
f'got {len(msg)}',
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handshake assumes single recvmsg reads full name

Medium Severity

The handshake reads the function name with one conn.recvmsg(...) and treats any short read as protocol failure. On SOCK_STREAM, partial reads are valid, so legitimate requests can be rejected intermittently when the name payload arrives in multiple segments.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 5f84e84. Configure here.

The pointer-to-object remap switch only handled string/blob types,
so DECIMAL, DATE, TIME, DATETIME, and TIMESTAMP columns returned
raw pointer integers instead of Python objects in numpy arrays.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 6 total unresolved issues (including 5 from previous reviews).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit aeab5ff. Configure here.

Comment thread singlestoredb/functions/ext/plugin/connection.py
Extract FDs from ancdata eagerly before validation so the
try/finally cleanup covers all early-return paths after recvmsg.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
kesmit13 and others added 2 commits May 12, 2026 10:40
Instead of scanning all sys.modules, pass the imported plugin module
directly so discovery is targeted and avoids false positives from
infrastructure modules. Also hardens the sys.modules fallback path
against RuntimeError (dict changed during iteration) and TypeError
from problematic module attributes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Detect vectorized functions (numpy, pandas, polars, arrow) via
args_data_format in the function signature and route through the
existing C-accelerated load_rowdat_1_numpy/dump_rowdat_1_numpy
infrastructure instead of the per-row scalar path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants