Fix handling of Python scalars in get_incompatible_dtype for NumPy 2.…#3143

Open
LuigiGonnella wants to merge 1 commit into activeloopai:release/3.9.52 from LuigiGonnella:fix/numpy2-nep50-get-incompatible-dtype

@LuigiGonnella

NumPy 2.x (NEP 50) compatibility fix in get_incompatible_dtype

🚀 🚀 Pull Request

Impact

  • Bug fix (non-breaking change which fixes expected existing functionality)

Description

NumPy 2.0 adopted NEP 50 and, as part of that change, removed support for passing raw Python scalars (int, float, bool) as the first argument to np.can_cast. Under NumPy 2.x, calling np.can_cast(1, np.float32) raises a TypeError instead of returning a boolean.

This caused get_incompatible_dtype in deeplake/util/casting.py to crash whenever a Python scalar sample was validated against a tensor dtype.

Steps to reproduce (NumPy 2.x):

import numpy as np
np.can_cast(1, np.float32)  # TypeError: Cannot cast scalar of type int

Expected behavior: dtype compatibility check succeeds and returns True/False.
Actual behavior: TypeError: Cannot cast scalar of type int
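
For reviewers who want to confirm the behavior on their own NumPy install, the following is a minimal, version-tolerant sketch of the reproduction. The helper name scalar_can_cast_raises is hypothetical and is not part of deeplake or of this PR.

```python
import numpy as np


def scalar_can_cast_raises() -> bool:
    """Return True if np.can_cast rejects a raw Python int with TypeError."""
    try:
        np.can_cast(1, np.float32)
    except TypeError:
        return True
    return False


# Major version of the installed NumPy, e.g. 1 or 2.
numpy_major = int(np.__version__.split(".")[0])

print(f"NumPy {np.__version__}: raw scalar raises = {scalar_can_cast_raises()}")

# The dtype-to-dtype form is accepted by both NumPy 1.x and 2.x.
print(np.can_cast(np.dtype(np.int64), np.dtype(np.float64)))
```

Only the scalar form is affected; passing dtypes as both arguments keeps working across versions, which is what the fix relies on.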

Fix:

  • Added a module-level _PYTHON_TYPE_TO_NUMPY_DTYPE dict mapping Python scalar types (float, int, bool, complex) to their canonical NumPy dtype equivalents.
  • Refactored the scalar branch of get_incompatible_dtype to resolve Python scalars to a NumPy dtype first, then call np.can_cast(from_dtype, to_dtype, casting="same_kind") — an API that is valid in both NumPy 1.x and 2.x.
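
The two changes above can be sketched roughly as follows. This is an illustrative reconstruction of the scalar branch only, not the actual diff: the mapping entries are assumed from the description, and get_incompatible_dtype_sketch is a hypothetical stand-in for the real function, which also handles arrays and sequences.

```python
import numpy as np

# Assumed contents of the mapping described in the PR; the real entries may differ.
_PYTHON_TYPE_TO_NUMPY_DTYPE = {
    bool: np.dtype(np.bool_),
    int: np.dtype(np.int64),
    float: np.dtype(np.float64),
    complex: np.dtype(np.complex128),
}


def get_incompatible_dtype_sketch(sample, dtype):
    """Return None if `sample` is castable to `dtype`, else the offending dtype."""
    to_dtype = np.dtype(dtype)

    if hasattr(sample, "dtype"):
        # NumPy scalars and arrays already carry their dtype.
        from_dtype = np.dtype(sample.dtype)
    else:
        # Exact-type lookup: type(True) is bool, so bool is not treated as int.
        from_dtype = _PYTHON_TYPE_TO_NUMPY_DTYPE.get(type(sample))
        if from_dtype is None:
            raise TypeError(f"Unsupported scalar type: {type(sample).__name__}")

    # dtype-to-dtype check, valid on both NumPy 1.x and 2.x.
    if np.can_cast(from_dtype, to_dtype, casting="same_kind"):
        return None
    return from_dtype


print(get_incompatible_dtype_sketch(1, np.float32))    # compatible
print(get_incompatible_dtype_sketch(1.5, np.int32))    # incompatible
```

Looking the type up with type(sample) rather than isinstance keeps bool and int distinct, since bool is a subclass of int in Python.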

Things to be aware of

  • casting="same_kind" is used instead of np.can_cast's default "safe". It still rejects genuinely incompatible casts (e.g. float → int) but, unlike "safe", also accepts narrowing within a kind (e.g. float64 → float32) in addition to safe widening (e.g. float32 → float64). The downstream intelligent_cast call handles the actual value-level conversion.
  • NumPy scalars and arrays that already carry a .dtype attribute continue to be handled via their .dtype directly, unchanged in behavior.
  • No change to public API or existing function signatures.
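
To make the effect of the casting level concrete, here is a small sketch that compares casting="safe" with casting="same_kind" for a few dtype pairs. The pairs are chosen for illustration and are not taken from the PR's tests.

```python
import numpy as np

# (source dtype, target dtype) pairs to compare under two casting levels.
pairs = [
    (np.float32, np.float64),  # widening within a kind
    (np.float64, np.float32),  # narrowing within a kind
    (np.int64, np.float32),    # int to float
    (np.float64, np.int64),    # float to int
]

results = {}
for src, dst in pairs:
    key = (np.dtype(src).name, np.dtype(dst).name)
    results[key] = {
        "safe": bool(np.can_cast(src, dst, casting="safe")),
        "same_kind": bool(np.can_cast(src, dst, casting="same_kind")),
    }

for (src_name, dst_name), levels in results.items():
    print(f"{src_name} -> {dst_name}: {levels}")
```

float64 → float32 is rejected under "safe" but accepted under "same_kind", while float64 → int64 is rejected under both, so the check still catches float samples written to integer tensors.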

Things to worry about

  • The _PYTHON_TYPE_TO_NUMPY_DTYPE mapping uses fixed-width, platform-independent dtypes (np.int64, np.float64). On 32-bit platforms, and on Windows under NumPy 1.x, the default int dtype reported by np.array(0).dtype may be int32. The fixed mapping is intentional: this is a type-level cast check, not a value-level one, so using the wider dtype is conservative and safe.
  • _get_bigger_dtype at the top of the file still uses np.object (deprecated in NumPy 1.20, removed in 1.24). This is a pre-existing issue and out of scope for this PR.
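
Both concerns can be checked with a short sketch. It assumes the int entry of the mapping is np.int64, and the list of target dtypes is illustrative. Under casting="same_kind", an int32 source and an int64 source give the same answer for each target, so fixing the mapping to int64 should not change the outcome of the type-level check. For the np.object issue, the builtin object (or np.object_) is the usual replacement.

```python
import numpy as np

# Platform default integer dtype; may be int32 or int64 depending on
# the platform and the NumPy version.
platform_int = np.dtype(int)
fixed_int = np.dtype(np.int64)  # assumed mapping entry for Python int

targets = [np.int8, np.int32, np.int64, np.uint8, np.float32, np.float64, np.bool_]

# For each target, does an int32 source agree with an int64 source?
agreement = {
    np.dtype(t).name: (
        np.can_cast(np.dtype(np.int32), t, casting="same_kind")
        == np.can_cast(fixed_int, t, casting="same_kind")
    )
    for t in targets
}
print(platform_int, agreement)

# The removed np.object alias can be replaced by the builtin object;
# both spell the same NumPy dtype as np.object_.
object_dtype = np.dtype(object)
print(object_dtype == np.dtype(np.object_))
```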

Additional Context

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@sonarqubecloud

sonarqubecloud bot commented Mar 3, 2026
