Hi,

I see the concept of Great Expectations as a kind of "proof of actions". As such, I think it would be a suitable estimator to include in scikit-learn pipelines: you would specify expectations for each column, and specify whether additional, undeclared columns are allowed.
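To make the "undeclared columns" switch concrete, here is a minimal sketch in plain Python. `check_columns` and its `allow_extra` flag are hypothetical names for illustration, not part of the Great Expectations or scikit-learn APIs:

```python
from typing import List, Sequence


def check_columns(columns: Sequence[str], declared: Sequence[str],
                  allow_extra: bool = False) -> List[str]:
    """Hypothetical helper: compare incoming column names with the declared ones.

    Raises ValueError if a declared column is missing, or if undeclared
    columns are present while allow_extra is False. Returns the undeclared
    columns that were tolerated.
    """
    missing = [c for c in declared if c not in columns]
    if missing:
        raise ValueError(f"missing declared columns: {missing}")
    extra = [c for c in columns if c not in declared]
    if extra and not allow_extra:
        raise ValueError(f"undeclared columns not allowed: {extra}")
    return extra


print(check_columns(["age", "name", "notes"], ["age", "name"], allow_extra=True))  # → ['notes']
```

Missing declared columns are always treated as an error here; whether that should also be configurable is an open design question.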
You should be able to declare for each column:
1. Requirements (an exact literal value, less-than/greater-than bounds, length, or even an arbitrary callable).
2. Whether a warning or an error is raised for that column if the requirements are not met.
3. The response: either a callable (such as mean, median, or model-based imputation) or exclusion of the sample (provided option 2 is set to "warn" for that column).
4. Whether the response also applies when the value is null (since pipelines often already handle null values).
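The four per-column options could be modelled roughly as below. This is a plain-Python sketch, assuming rows arrive as a list of dicts and that every declared column is present; `ColumnExpectation`, `validate_rows` and their fields are hypothetical names, not an existing Great Expectations or scikit-learn API:

```python
import statistics
import warnings
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Union

Check = Callable[[Any], bool]


@dataclass
class ColumnExpectation:
    """Hypothetical per-column spec mirroring options 1-4 above."""
    checks: List[Check] = field(default_factory=list)          # 1. requirements
    on_fail: str = "error"                                       # 2. "warn" or "error"
    response: Union[None, str, Callable[[list], Any]] = None     # 3. "drop" or an imputer callable
    check_nulls: bool = False                                    # 4. treat None as a failure?

    def __post_init__(self):
        if self.on_fail not in ("warn", "error"):
            raise ValueError("on_fail must be 'warn' or 'error'")
        if self.response == "drop" and self.on_fail != "warn":
            raise ValueError("dropping samples requires on_fail='warn'")

    def passes(self, value: Any) -> bool:
        if value is None:
            # Nulls pass through untouched unless check_nulls is set,
            # leaving them to a downstream imputer.
            return not self.check_nulls
        return all(check(value) for check in self.checks)


def validate_rows(rows: List[Dict[str, Any]],
                  spec: Dict[str, ColumnExpectation]) -> List[Dict[str, Any]]:
    """Apply each column's expectation; return a new, possibly shorter, list of rows."""
    dropped = set()
    fills = {}  # (row index, column) -> replacement value
    for col, exp in spec.items():
        bad = {i for i, row in enumerate(rows) if not exp.passes(row[col])}
        if not bad:
            continue
        if exp.on_fail == "error":
            raise ValueError(f"{col!r}: {len(bad)} value(s) failed expectations")
        warnings.warn(f"{col!r}: {len(bad)} value(s) failed expectations")
        if exp.response == "drop":
            dropped |= bad
        elif callable(exp.response):
            # The fill value is computed from the passing, non-null values of this column.
            good = [row[col] for i, row in enumerate(rows)
                    if i not in bad and row[col] is not None]
            fill = exp.response(good)
            for i in bad:
                fills[(i, col)] = fill
    return [
        {c: fills.get((i, c), v) for c, v in row.items()}
        for i, row in enumerate(rows)
        if i not in dropped
    ]


rows = [
    {"age": 34, "name": "Ann"},
    {"age": -5, "name": "Bob"},
    {"age": None, "name": "Cy"},
    {"age": 40, "name": "Dee"},
]
spec = {
    # Bounds check: warn, then impute failures (including nulls) with the median.
    "age": ColumnExpectation(checks=[lambda v: 0 <= v <= 120], on_fail="warn",
                             response=statistics.median, check_nulls=True),
    # Length check: any failure raises.
    "name": ColumnExpectation(checks=[lambda v: len(v) <= 3]),
}
cleaned = validate_rows(rows, spec)
print([r["age"] for r in cleaned])  # → [34, 37.0, 37.0, 40]
```

Literal, bound and length requirements are all expressed as callables here, which keeps the spec small; a real design might offer named helpers for the common cases.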
Imputation is quite often part of a pipeline as a response to missing values, so I think Great Expectations, used as an estimator in this way, would be a highly appropriate module.
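Since scikit-learn imputers such as `SimpleImputer` learn their fill statistics in `fit` and reuse them in `transform`, an expectation-driven imputer would presumably do the same, so that test data is filled with training statistics. The sketch below follows that `fit`/`transform` convention in plain Python; `ExpectationImputer` is a hypothetical name, rows are assumed to be lists indexed by column position, and missing values are assumed to be `None`. To use it inside an actual `Pipeline` one would normally also inherit from `sklearn.base.BaseEstimator` and `TransformerMixin`.

```python
import statistics
import warnings
from typing import Callable, Dict, List


class ExpectationImputer:
    """Hypothetical validator/imputer following scikit-learn's fit/transform convention.

    checks:   column index -> predicate each value in that column must satisfy
    strategy: statistic used to replace failing or missing values
    """

    def __init__(self, checks: Dict[int, Callable[[float], bool]],
                 strategy: Callable[[List[float]], float] = statistics.median):
        self.checks = checks
        self.strategy = strategy

    def _ok(self, col, value):
        return value is not None and self.checks[col](value)

    def fit(self, X, y=None):
        # Learn one fill value per checked column from passing training values only.
        self.fill_values_ = {}
        for col in self.checks:
            good = [row[col] for row in X if self._ok(col, row[col])]
            if not good:
                raise ValueError(f"column {col}: no training values meet the expectation")
            self.fill_values_[col] = self.strategy(good)
        return self

    def transform(self, X):
        # Replace failing or missing values with the fill values learned in fit().
        out, n_bad = [], 0
        for row in X:
            new = list(row)
            for col, fill in self.fill_values_.items():
                if not self._ok(col, new[col]):
                    new[col] = fill
                    n_bad += 1
            out.append(new)
        if n_bad:
            warnings.warn(f"{n_bad} value(s) failed expectations and were imputed")
        return out

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


X_train = [[25, 1.0], [-3, 2.0], [40, None], [31, 4.0]]
imputer = ExpectationImputer({0: lambda v: 0 <= v <= 120}).fit(X_train)
print(imputer.transform([[200, 5.0], [50, 6.0]]))  # → [[31, 5.0], [50, 6.0]]
```

This sketch only imputes. A transformer in a standard scikit-learn `Pipeline` transforms `X` alone, so excluding samples (which would also require dropping the matching `y`) does not fit that interface directly.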
Let me know your thoughts!