Conversation
Hey @solegalli, I have fixed everything related to pandas (#885). Can you please take a look at it?
Will do. Thanks a lot! I only have internet again from Saturday though. I'll do my best to have it ready by the beginning of next week.
* fix: Pandas 3 compatibility - robust dtype checks and test fixes
  - Fix UnboundLocalError in _variable_type_checks.py by initializing is_cat/is_dt
  - Add robust dtype checking using both is_object_dtype and is_string_dtype
  - Update find_variables.py with same robust logic for consistency
  - Fix warning count assertions in encoder tests (Pandas 3 adds extra deprecation warnings)
  - Fix floating point precision assertion in recursive feature elimination test
  - Apply ruff formatting and fix linting errors
  - All 1900 tests passing
* fix: Remove whitespace before colon in slice notation (flake8 E203)
* feat: finalize Pandas 3 compatibility fixes and test updates
* style: fix flake8 line length and linting issues
* style: fix remaining flake8 C416 issue
* Fix Pandas 3 regressions in check_y, _check_contains_inf, and StringSimilarityEncoder
* Fix E501 line too long in dataframe_checks.py
* Fix StringSimilarityEncoder NaN issues and fragile test assertions
* fix: Pandas 3 stability - mock datasets and fix FutureWarnings
* style: fix flake8 linting errors E501, E302, E305, SIM102
* test: improve patch coverage for Pandas 3 stability fixes
* style: fix E501 line too long in similarity encoder tests
* style: revert unrelated flake8 and formatting changes
* fix: restore Pandas 3 test logic and silence Pandas4Warning
* style: move numpy import to top of math_features.py
* style: fix spacing in MatchVariables verbose error message
* test: revert dynamic std values to hardcoded values in MathFeatures tests
* style: combine imports in _variable_type_checks.py
* refactor: centralize is_object function and use it across the codebase
* refactor: further simplify check_y dtype checks using is_object
* revert: remove unnecessary complexity in _check_contains_inf and associated tests
* docs: rename _normalize_func to _map_unnamed_func_to_str and add comments
* perf: optimize casting logic in SimilarityEncoder
* fix: address remaining code review feedback - follow sklearn convention for init params - make tests conditional on pandas version - restore encoder_dict_ assertion
* style: fix linting and follow sklearn convention for MathFeatures
* revert: remove california housing mock from conftest.py
* revert: restore original error message assertion in DatetimeFeatures test
* fix: use robust datetime normalization and flexible error assertions in tests
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

@@            Coverage Diff             @@
##             main     #884      +/-   ##
==========================================
+ Coverage   98.11%   98.22%   +0.10%
==========================================
  Files         113      113
  Lines        4829     4842      +13
  Branches      768      768
==========================================
+ Hits         4738     4756      +18
+ Misses         56       55       -1
+ Partials       35       31       -4

☔ View full report in Codecov by Sentry.
assert (
    "<NA>" in encoder.encoder_dict_["var_A"]
    or "" in encoder.encoder_dict_["var_A"]
)
Hi @ankitlade12
Quick question: why do we need this line: "<NA>" in encoder.encoder_dict_["var_A"]?
This line alone ("" in encoder.encoder_dict_["var_A"]) seems enough to pass the test in both pandas 3 and 2.
Hi @solegalli,
You're absolutely right—the "<NA>" check is redundant in that assertion.
Because we use .mask(series.isna(), "") in the impute logic, any pd.NA or NaN value is guaranteed to be converted to an empty string "" before the vocabulary is learned. Therefore, encoder_dict_ will only ever contain "" for missing values, and the check for "<NA>" can be removed to keep the test cleaner.
I've updated the local tests to reflect this and confirmed that checking for "" alone is sufficient for both pandas 2 and 3.
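For reference, a minimal sketch of the masking behaviour described above (the series values are illustrative, not taken from the test suite):

import numpy as np
import pandas as pd

s = pd.Series(["A", np.nan, pd.NA, "B"], dtype=object)

# Replace every missing value (np.nan, pd.NA, None) with an empty string,
# as the impute logic does before the vocabulary is learned.
imputed = s.mask(s.isna(), "")

print(imputed.tolist())  # ['A', '', '', 'B']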
| "var_A": ["B", "D", "G", "A", "C", "E", "F"], | ||
| "var_B": ["A", "D", "B", "G", "C", "E", "F"], | ||
| "var_C": ["C", "D", "B", "G", "A", "E", "F"], | ||
| } |
There is a discrepancy in how SimilarityEncoder treats None compared with np.nan or pd.NA.
Briefly, when missing_values is "raise" or "impute", the treatment is the same. But when missing_values is "ignore", SimilarityEncoder will not include the missing value in encoder_dict_ if it is np.nan or pd.NA, but it will if it is None.
Is this something we are happy with?
Apparently None is handled differently in pandas >=3 vs older versions. In >=3 it is handled like np.nan and pd.NA; in older versions it is handled as a string.
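A minimal illustration of the difference, assuming an object-dtype column on pandas 2.x (the values are made up for the example):

import numpy as np
import pandas as pd

# On pandas < 3, astype(str) turns None into the literal string "None",
# which is not caught by a filter that only drops "nan" and "<NA>".
print(pd.Series(["A", "B", None], dtype=object).astype(str).tolist())
# pandas 2.x: ['A', 'B', 'None']

# np.nan and pd.NA become the strings "nan" and "<NA>" instead,
# so the existing .drop(["nan", "<NA>"]) removes them from the categories.
print(pd.Series(["A", "B", np.nan], dtype=object).astype(str).tolist())
# pandas 2.x: ['A', 'B', 'nan']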
Hi @solegalli,
Great catch on the None discrepancy. You’re right—it’s caused by None being converted to the literal string "None" in older pandas versions, which slips through our current .drop(["nan", "<NA>"]) filter as it isn't recognized as a "null" string.
To make this consistent across all versions, I propose we normalize it by using .dropna() before converting to a string in the ignore case:
# Proposed consistent fix in fit():
self.encoder_dict_[var] = (
    X[var]
    .dropna()        # Consistently removes None, np.nan, pd.NA first
    .astype(str)     # Then convert remaining categories to string
    .value_counts()
    .head(self.top_categories)
    .index.tolist()
)
Thank you. That resolves things :)
Force-pushed from b2b248d to 2c0c51c.
fyi @ankitlade12