fix: Pandas 3 compatibility - robust dtype checks and test fixes by ankitlade12 · Pull Request #885 · feature-engine/feature_engine

ankitlade12 · 2026-01-28T21:39:13Z

Fix UnboundLocalError in _variable_type_checks.py by initializing is_cat/is_dt
Add robust dtype checking using both is_object_dtype and is_string_dtype
Update find_variables.py with same robust logic for consistency
Fix warning count assertions in encoder tests (Pandas 3 adds extra deprecation warnings)
Fix floating point precision assertion in recursive feature elimination test
Apply ruff formatting and fix linting errors
All 1900 tests passing
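A minimal sketch of the first two bullets, for illustration only. `classify_variable` is a hypothetical helper, not the code in `_variable_type_checks.py`. It initialises both flags before any branch runs, so no path can read an unassigned name. It also accepts text columns whether pandas reports them as `object` (the pandas < 3 default) or as a string dtype (the pandas >= 3 default).

```python
import pandas as pd
from pandas.api.types import (
    is_datetime64_any_dtype,
    is_object_dtype,
    is_string_dtype,
)


def is_object_like(series: pd.Series) -> bool:
    # pandas < 3 stores text as object dtype by default; pandas >= 3 infers a
    # dedicated string dtype, so accept either one.
    return is_object_dtype(series) or is_string_dtype(series)


def classify_variable(series: pd.Series) -> str:
    # Hypothetical helper. Both flags are set before any branch, so every
    # path below can read them; assigning a flag only inside some branches
    # is what leads to UnboundLocalError.
    is_cat = False
    is_dt = False

    if isinstance(series.dtype, pd.CategoricalDtype) or is_object_like(series):
        is_cat = True
    elif is_datetime64_any_dtype(series):
        is_dt = True

    if is_cat:
        return "categorical"
    if is_dt:
        return "datetime"
    return "other"
```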



codecov · 2026-01-28T23:16:52Z

Codecov Report

❌ Patch coverage is 98.27586% with 1 line in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (mnt-pandas3@de4d663).

Files with missing lines	Patch %	Lines
feature_engine/creation/math_features.py	94.44%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@              Coverage Diff               @@
##             mnt-pandas3     #885   +/-   ##
==============================================
  Coverage               ?   98.20%           
==============================================
  Files                  ?      113           
  Lines                  ?     4853           
  Branches               ?      770           
==============================================
  Hits                   ?     4766           
  Misses                 ?       55           
  Partials               ?       32


solegalli

Hi @ankitlade12

Thank you so much for fixing the pandas compatibility issue. Much appreciated :)

Could it be that you applied flake8 to the entire code base and then committed all modified files? We need to remove all files unrelated to the pandas fix changes.

For some reason flake8 does different things on different computers. I generally suggest applying flake8 to modified files only, to avoid this kind of behavior.

Could you update the PR leaving only the changed files that solve the pandas issue?

Thank you very much!

ankitlade12 · 2026-02-03T04:32:30Z

ankitlade12 · 2026-02-03T04:36:21Z

Hi @solegalli,

Thank you for the feedback! You're right: I accidentally applied styling fixes to the entire codebase. I've now cleaned up the PR and reverted all unrelated changes.

The PR now includes only the 18 files needed for Pandas 3 compatibility. These cover:

Core logic fixes in dataframe_checks.py, similarity_encoder.py, and variable handling.
Silencing a new Pandas4Warning in the forecasting transformers (lag_features.py and window_features.py).
Necessary infrastructure updates in tox.ini and .circleci/config.yml to support the new testing environments.
Restored compatibility logic in test_mean_encoder.py, test_ordinal_encoder.py, and test_woe_encoder.py to match Pandas 3's warning and dtype behavior.

All relevant tests are passing locally. Please let me know if there's anything else you'd like me to adjust!
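Asserting on the total number of recorded warnings breaks whenever a new pandas release adds its own deprecation warnings. One alternative, sketched with the standard-library `warnings` module only: record everything, then assert on the category under test. `warnings_of_type` and `fake_transform` are hypothetical stand-ins for an encoder call, and treating the encoder's warning as a `UserWarning` is an assumption.

```python
import warnings


def warnings_of_type(func, category=UserWarning):
    """Run func and return only the recorded warnings of the given category."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" makes every warning show up, including repeats and
        # DeprecationWarning, which is hidden by the default filters.
        warnings.simplefilter("always")
        func()
    return [w for w in caught if issubclass(w.category, category)]


def fake_transform():
    # Hypothetical stand-in for an encoder call: one warning the test cares
    # about, plus the kind of extra noise a new pandas release can add.
    warnings.warn("warning the test asserts on", UserWarning)
    warnings.warn("some pandas API is deprecated", DeprecationWarning)
    warnings.warn("behaviour will change", FutureWarning)


# Only the UserWarning is counted, however many library warnings appear.
user_warnings = warnings_of_type(fake_transform)
```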

feature_engine/creation/math_features.py

feature_engine/preprocessing/match_columns.py

tests/test_creation/test_math_features.py

feature_engine/variable_handling/_variable_type_checks.py

feature_engine/variable_handling/find_variables.py

feature_engine/variable_handling/_variable_type_checks.py

feature_engine/dataframe_checks.py

solegalli · 2026-02-03T12:54:41Z

tests/test_encoding/test_similarity_encoder.py

+    X = encoder.fit_transform(df)
+    assert (X.isna().sum() == 0).all(axis=None)
+    assert "nan" in encoder.encoder_dict_["var_A"]
+    assert "<NA>" in encoder.encoder_dict_["var_A"]


We also need to assert that the resulting dataframe is the one we expect.

I am a bit confused: does this resolve a backward-compatibility issue introduced by the pandas 3 release, or does it enhance the functionality of StringSimilarityEncoder? If the latter, we need to move all related files to a new PR.

You're right to call this out. Let me clarify:

Pandas 3 fix: The change from astype(object).fillna().astype(str) to astype(str).mask(isna(), "") is the Pandas 3 compatibility fix (handles Categorical dtype and is faster).

Coverage tests: The new tests (test_string_dtype_with_pd_na, test_string_dtype_with_literal_nan_strings) and the "nan"/"<NA>" handling are for robustness/coverage.

I'll add the proper dataframe assertions to verify the expected output and keep everything in the same PR for simplicity.
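A sketch of the casting pattern described above. `to_str_blank_missing` is a hypothetical name, and the encoder's real code may differ in detail. Because the mask is computed from the input's `isna()`, it covers `NaN`, `None`, `pd.NA` and missing categories alike. Literal strings such as `"nan"` and `"<NA>"` survive as values, which is what the `encoder_dict_` assertions in the test above rely on.

```python
import numpy as np
import pandas as pd


def to_str_blank_missing(s: pd.Series) -> pd.Series:
    # Cast to text first, then blank out the positions that were missing in
    # the *input*. Whatever astype(str) produced for a missing value
    # ("nan", "<NA>", or a preserved NaN in pandas 3) is replaced by "".
    return s.astype(str).mask(s.isna(), "")


# Object, categorical and nullable-string inputs all end up the same way.
example = to_str_blank_missing(pd.Series(["a", np.nan, "b"]))
```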

tests/test_encoding/test_similarity_encoder.py

tests/test_preprocessing/test_match_columns.py

tests/test_wrappers/test_sklearn_wrapper.py

solegalli · 2026-02-03T13:03:01Z

tests/conftest.py

 import numpy as np
 import pandas as pd
 import pytest
+from unittest.mock import patch


I am not sure I want unittest to be a dependency. Could we find a different solution?

unittest.mock is part of Python's standard library, not an external package, so it doesn't add any dependencies. However, if you prefer using pytest's native approach, I can switch to using a conftest.py autouse fixture with monkeypatch.

The alternative would look like:

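@pytest.fixture(autouse=True)
def mock_california_housing(monkeypatch):
    # Default (function) scope: pytest's built-in monkeypatch fixture is
    # function-scoped, so it cannot be requested from a session-scoped fixture.
    monkeypatch.setattr(
        "sklearn.datasets.fetch_california_housing",
        mock_fetch_california_housing,
    )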

tests/conftest.py

solegalli

Hi @ankitlade12

This is an incredible amount of work! Thank you so much! I really appreciate it.

I went through all the changes and had a few questions about some of them. A few changes seem to address things beyond the pandas 3 release; I think those would be better handled in a separate PR.

For tests where results differ by pandas version, it would be great to explicitly assert the expected behavior for < 3 versus >= 3, so we can clean that up later.

For now, we should keep compatibility with older versions. After this is merged, I’ll add tests for older pandas versions on CircleCI, and we’ll need those to pass as well.
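A minimal sketch of such a version-split assertion, assuming a plain assert-style test. The test name and the behaviour it pins down are illustrative, not taken from this PR. It uses `astype(str)` on a missing value, which pandas 3 changed as part of its new default string dtype: older versions produce the literal string `"nan"`, while pandas 3 keeps the value missing. With each side in its own branch, the `< 3` branch can simply be deleted once older pandas versions are dropped.

```python
import numpy as np
import pandas as pd

# Major version of the installed pandas, e.g. 2 for "2.2.3", 3 for "3.0.0".
PANDAS_MAJOR = int(pd.__version__.split(".")[0])


def test_astype_str_on_missing_value():
    # Hypothetical test illustrating the pattern, not one from this PR.
    out = pd.Series(["a", np.nan], dtype=object).astype(str)
    if PANDAS_MAJOR >= 3:
        # pandas >= 3: astype(str) keeps the missing value missing
        assert out.isna().tolist() == [False, True]
    else:
        # pandas < 3: the missing value becomes the literal string "nan"
        assert out.tolist() == ["a", "nan"]
```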

feature_engine/creation/math_features.py

ankitlade12 · 2026-02-04T16:19:27Z

Hi @solegalli,

I have updated all the changes as requested across the PRs. Once the pandas-related compatibility issues are fully sorted out, we can proceed to merge the two branches (PR #880 and PR #879).

Thanks for the thorough review!

solegalli · 2026-02-06T20:38:16Z

feature_engine/creation/math_features.py

-                        "The number of new feature names must coincide with the number "
-                        "of functions."
-                    )
+            elif len(new_variables_names) != 1:


Neat! Thank you!

* update dt functions
* expand tests
* expand tests
* update fpr new pandas behaviour
* fix: Pandas 3 compatibility - robust dtype checks and test fixes (#885)
* fix: Pandas 3 compatibility - robust dtype checks and test fixes
  - Fix UnboundLocalError in _variable_type_checks.py by initializing is_cat/is_dt
  - Add robust dtype checking using both is_object_dtype and is_string_dtype
  - Update find_variables.py with same robust logic for consistency
  - Fix warning count assertions in encoder tests (Pandas 3 adds extra deprecation warnings)
  - Fix floating point precision assertion in recursive feature elimination test
  - Apply ruff formatting and fix linting errors
  - All 1900 tests passing
* fix: Remove whitespace before colon in slice notation (flake8 E203)
* feat: finalize Pandas 3 compatibility fixes and test updates
* style: fix flake8 line length and linting issues
* style: fix remaining flake8 C416 issue
* Fix Pandas 3 regressions in check_y, _check_contains_inf, and StringSimilarityEncoder
* Fix E501 line too long in dataframe_checks.py
* Fix StringSimilarityEncoder NaN issues and fragile test assertions
* fix: Pandas 3 stability - mock datasets and fix FutureWarnings
* style: fix flake8 linting errors E501, E302, E305, SIM102
* test: improve patch coverage for Pandas 3 stability fixes
* style: fix E501 line too long in similarity encoder tests
* style: revert unrelated flake8 and formatting changes
* fix: restore Pandas 3 test logic and silence Pandas4Warning
* style: move numpy import to top of math_features.py
* style: fix spacing in MatchVariables verbose error message
* test: revert dynamic std values to hardcoded values in MathFeatures tests
* style: combine imports in _variable_type_checks.py
* refactor: centralize is_object function and use it across the codebase
* refactor: further simplify check_y dtype checks using is_object
* revert: remove unnecessary complexity in _check_contains_inf and associated tests
* docs: rename _normalize_func to _map_unnamed_func_to_str and add comments
* perf: optimize casting logic in SimilarityEncoder
* fix: address remaining code review feedback
  - follow sklearn convention for init params
  - make tests conditional on pandas version
  - restore encoder_dict_ assertion
* style: fix linting and follow sklearn convention for MathFeatures
* revert: remove california housing mock from conftest.py
* revert: restore original error message assertion in DatetimeFeatures test
* fix: use robust datetime normalization and flexible error assertions in tests
* remove extra learned parameter, pass function to transform
* rolled back mapping function
* refactor creation of array of nan
* expand test to cover different expressions of nan values
* refactor match columns update
* refactor code variable checks
* refactor inf tests
* split test by pandas version
* solve additional errors specific to pandas 3
* add pandas version tests to tox.ini
* move versioned tests to python 12
* add tests to circleci config
* reformat circleci config"
* test revering pandas version on None
* move dropna a level up when missing values ignore in string similarity
* refactor condition logic for codecoverage
* fix decreased coverage
* revert math features to main
* remove test from math features

---------

Co-authored-by: Ankit Hemant Lade <ankitlade12@gmail.com>
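The commit "expand test to cover different expressions of nan values" points at a useful testing idea: check several spellings of a missing value at once. A sketch assuming only pandas and NumPy; the value lists and `missing_flags` are hypothetical, not the repository's test. `isna()` flags `np.nan`, `None`, `pd.NA` and `float("nan")` alike in both `object` and `"string"` columns, while text that merely looks missing is kept as a value.

```python
import numpy as np
import pandas as pd

# Hypothetical lists: real missing values in several spellings, and
# strings that only look like missing values.
MISSING = [np.nan, None, pd.NA, float("nan")]
LOOKALIKES = ["nan", "<NA>", "None", ""]


def missing_flags(values, dtype):
    # isna() recognises each missing-value spelling regardless of dtype.
    return pd.Series(values, dtype=dtype).isna().tolist()


for dtype in ["object", "string"]:
    # Every real missing value is flagged; no look-alike string is.
    assert missing_flags(MISSING, dtype) == [True] * len(MISSING)
    assert missing_flags(LOOKALIKES, dtype) == [False] * len(LOOKALIKES)
```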

ankitlade12 added 10 commits January 28, 2026 15:37

fix: Remove whitespace before colon in slice notation (flake8 E203)

e0c3292

feat: finalize Pandas 3 compatibility fixes and test updates

ccbfa05

style: fix flake8 line length and linting issues

fd43124

style: fix remaining flake8 C416 issue

8367d4a

Fix Pandas 3 regressions in check_y, _check_contains_inf, and StringSimilarityEncoder

3225500

Fix E501 line too long in dataframe_checks.py

bde0b9b

Fix StringSimilarityEncoder NaN issues and fragile test assertions

dedf500

fix: Pandas 3 stability - mock datasets and fix FutureWarnings

765e102

style: fix flake8 linting errors E501, E302, E305, SIM102

28894c5

ankitlade12 mentioned this pull request Jan 28, 2026

[MNT] add support for pandas 3 #884

Merged

ankitlade12 added 2 commits January 28, 2026 17:24

test: improve patch coverage for Pandas 3 stability fixes

08821a6

style: fix E501 line too long in similarity encoder tests

972a4b7

mo1998 mentioned this pull request Feb 2, 2026

[BUG ] fix performance warning in AddMissingIndicator #887

Merged