Update normalization parameters and add estimator params validation #210
base: main
Changes from all commits
567b241
fd6e4a5
ac310c1
7df7818
5536db7
a2a7515
a627a86
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -23,7 +23,12 @@` |
| 23 | 23 | `import pandas as pd` |
| 24 | 24 | `from scipy.sparse import csr_matrix` |
| 25 | 25 | `from sklearn.model_selection import train_test_split` |
| 26 | | `- from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder` |
| | 26 | `+ from sklearn.preprocessing import (` |
| | 27 | `+ MinMaxScaler,` |
| | 28 | `+ OneHotEncoder,` |
| | 29 | `+ OrdinalEncoder,` |
| | 30 | `+ StandardScaler,` |
| | 31 | `+ )` |
| 27 | 32 | |
| 28 | 33 | `from ..utils.custom_types import Array` |
| 29 | 34 | `from ..utils.logger import logger` |
| | | `@@ -167,7 +172,7 @@ def preprocess_x(` |
| 167 | 172 | `x: Array,` |
| 168 | 173 | `replace_nan="auto",` |
| 169 | 174 | `category_encoding="ordinal",` |
| 170 | | `- normalize=False,` |
| | 175 | `+ normalize=None,` |
| 171 | 176 | `force_for_sparse=True,` |
| 172 | 177 | `**kwargs,` |
| 173 | 178 | `) -> Array:` |
| | | `@@ -219,9 +224,18 @@ def preprocess_x(` |
| 219 | 224 | `pass` |
| 220 | 225 | `else:` |
| 221 | 226 | `logger.warning(f'Unknown "{category_encoding}" category encoding type.')` |
| 222 | | `- # Mean-Standard normalization` |
| | 227 | `+ # Normalization` |
| 223 | 228 | `if normalize:` |
| 224 | | `- x = (x - x.mean()) / x.std()` |
| | 229 | `+ if normalize == "standard":` |
| | 230 | `+ scaler = StandardScaler(with_mean=True, with_std=True)` |
| | 231 | `+ elif normalize == "mean":` |
| | 232 | `+ scaler = StandardScaler(with_mean=True, with_std=False)` |
| | 233 | `+ elif normalize == "minmax":` |
| | 234 | `+ scaler = MinMaxScaler(feature_range=(0, 1))` |
| | 235 | `+ else:` |
| | 236 | `+ logger.warning(f'Unknown "{normalize}" normalization type.')` |
| | 237 | `+ if scaler is not None and return_type == pd.DataFrame:` |
| | 238 | `+ return pd.DataFrame(scaler.fit_transform(x), columns=x.columns, index=x.index)` |
Contributor
Wouldn't this make it ignore

Collaborator (Author)
Currently it works correctly for all `return_type` values, as intermediate data is always represented in pandas format. However, this conversion is indeed redundant if `return_type` is not a pandas DataFrame.

Contributor
Is that because it then goes through `train_test_split`? Isn't that step optional?
| 225 | 239 | `if return_type == np.ndarray:` |
| 226 | 240 | `return x.values` |
| 227 | 241 | `else:` |
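
For reference, the normalization dispatch in the hunk above can be sketched as a standalone helper. This is a minimal sketch under stated assumptions, not the PR's code: `make_scaler` and `normalize_frame` are hypothetical names, the standard `logging` module stands in for the project's `logger`, and the input is assumed to be a pandas DataFrame, as the author's reply describes the intermediate data. The sketch also returns `None` for an unknown type, since the visible hunk does not show `scaler` being initialized before the `scaler is not None` check.

```python
import logging

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Stand-in for the project's logger (assumption for this sketch).
logger = logging.getLogger(__name__)


def make_scaler(normalize):
    """Map a normalization name to a scikit-learn scaler, or None."""
    if not normalize:
        # None, False, or an empty string: normalization is disabled.
        return None
    if normalize == "standard":
        # Subtract the mean and divide by the standard deviation.
        return StandardScaler(with_mean=True, with_std=True)
    if normalize == "mean":
        # Center only; leave the scale unchanged.
        return StandardScaler(with_mean=True, with_std=False)
    if normalize == "minmax":
        # Rescale each column to the [0, 1] range.
        return MinMaxScaler(feature_range=(0, 1))
    logger.warning(f'Unknown "{normalize}" normalization type.')
    return None


def normalize_frame(x: pd.DataFrame, normalize=None) -> pd.DataFrame:
    """Return a normalized copy of x, or x itself if no scaler applies."""
    scaler = make_scaler(normalize)
    if scaler is None:
        return x
    # fit_transform returns a NumPy array, so restore the labels.
    return pd.DataFrame(scaler.fit_transform(x), columns=x.columns, index=x.index)


if __name__ == "__main__":
    df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 40.0]})
    print(normalize_frame(df, "minmax"))
```

One behavioral difference worth keeping in mind: the removed expression `(x - x.mean()) / x.std()` uses pandas' default sample standard deviation (`ddof=1`), while `StandardScaler` divides by the population standard deviation (`ddof=0`), so `"standard"` will not reproduce the old values exactly.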