Walmart's marketing service has asked you to build a machine learning model able to estimate the weekly sales in their stores, with the best precision possible on the predictions made. Such a model would help them understand better how the sales are influenced by economic indicators, and might be used to plan future marketing campaigns.

Data-Science-Designer-and-Developer/Project_Walmart


🏪 Walmart Weekly Sales Prediction

CDSD Certification Project: Linear & Regularized Regression


📋 Executive Summary

Objective: Predict weekly sales for 45 Walmart stores to support inventory and marketing-campaign planning, while keeping overfitting to a minimum.

Target KPI: R² ≥ 0.90 on unseen data

Dataset:

  • 6,435 weekly records, 45 stores, 7 features + temporal variables
  • Target: Weekly_Sales ($)
  • Preprocessing: outlier removal (Z-score, 3σ threshold) and temporal feature engineering, leaving 5,912 clean rows and 80 features
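The two preprocessing steps named above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the `Date` column name, the `%d-%m-%Y` date format, and both helper names are assumptions; only `Weekly_Sales` is taken from this README.

```python
import pandas as pd


def remove_outliers_zscore(df, columns, threshold=3.0):
    """Keep rows whose value in every listed column lies within
    `threshold` standard deviations of that column's mean."""
    z = (df[columns] - df[columns].mean()) / df[columns].std(ddof=0)
    # Missing values are not treated as outliers; they are left for the imputer.
    inlier = (z.abs() <= threshold) | z.isna()
    return df.loc[inlier.all(axis=1)]


def add_temporal_features(df, date_col="Date", date_format="%d-%m-%Y"):
    """Replace a date column with year / month / ISO week / weekday columns.
    The column name and date format are assumptions about the raw file."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col], format=date_format)
    out["Year"] = dates.dt.year
    out["Month"] = dates.dt.month
    out["Week"] = dates.dt.isocalendar().week.astype("Int64")
    out["DayOfWeek"] = dates.dt.dayofweek  # Monday = 0
    return out.drop(columns=[date_col])
```

With a 3σ threshold, a single extreme value can only be flagged when the column has at least 11 rows, so very small samples pass through unchanged.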

Pipeline Highlights:

  • ColumnTransformer + GridSearchCV
  • Numerical: KNNImputer → StandardScaler
  • Categorical: OneHotEncoder (handle_unknown='ignore')
  • Target leakage fully prevented
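A preprocessor matching the highlights above might be assembled like this. It is a sketch under assumptions: the helper name `build_preprocessor`, the `n_neighbors` default, and the example column names are hypothetical; the transformer classes and `handle_unknown='ignore'` come from the list above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import KNNImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(numeric_cols, categorical_cols, n_neighbors=5):
    """Numerical: KNNImputer -> StandardScaler. Categorical: one-hot encoding
    that maps categories unseen during fit to an all-zero row."""
    numeric = Pipeline([
        ("impute", KNNImputer(n_neighbors=n_neighbors)),
        ("scale", StandardScaler()),
    ])
    categorical = OneHotEncoder(handle_unknown="ignore")
    return ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])


# Hypothetical column names, for illustration only.
train = pd.DataFrame({
    "CPI": [210.0, np.nan, 212.0, 214.0],
    "Unemployment": [8.0, 7.5, 7.2, 7.0],
    "Store": [1, 2, 3, 1],
})
preprocessor = build_preprocessor(["CPI", "Unemployment"], ["Store"], n_neighbors=2)
X_train = preprocessor.fit_transform(train)  # fit on training rows only
```

Fitting the preprocessor on the training split alone, then calling only `transform` on the test split, is one common way to keep test-set statistics out of the imputer and scaler.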

Models Evaluated: Linear Regression, Ridge (α=0.01), Lasso (α=500)
Validation: Train/Test split + 5-fold CV
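The validation scheme above (hold-out split plus 5-fold cross-validation over the regularization strength) could look roughly like the following. The data here is synthetic, and the α grid, the 80/20 split, and the random seeds are assumptions chosen for the example, not the values used in the notebook.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the engineered Walmart features.
X, y = make_regression(n_samples=300, n_features=10, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Baseline: plain linear regression scored with 5-fold CV on the training split.
baseline = Pipeline([("scale", StandardScaler()), ("model", LinearRegression())])
baseline_cv_r2 = cross_val_score(baseline, X_train, y_train, cv=5, scoring="r2").mean()

# Regularized models: alpha is tuned by GridSearchCV (illustrative grid).
candidates = {
    "ridge": (Ridge(), {"model__alpha": [0.01, 0.1, 1.0, 10.0]}),
    "lasso": (Lasso(max_iter=10_000), {"model__alpha": [0.01, 0.1, 1.0, 10.0]}),
}
results = {}
for name, (estimator, grid) in candidates.items():
    pipe = Pipeline([("scale", StandardScaler()), ("model", estimator)])
    search = GridSearchCV(pipe, grid, cv=5, scoring="r2")
    search.fit(X_train, y_train)
    results[name] = {
        "best_alpha": search.best_params_["model__alpha"],
        "cv_r2": search.best_score_,
        "test_r2": search.score(X_test, y_test),  # held-out split, never used for tuning
    }
```

Keeping the scaler inside the pipeline means it is refitted on each CV training fold, so the validation folds do not influence the scaling statistics.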


🔬 Model Evaluation & Results

| Model | R² Train | R² Test | Overfit (Train − Test) | RMSE ($) | MAE ($) |
|---|---|---|---|---|---|
| Linear Regression | 0.9714 | 0.9640 | 0.0074 | 130,948 | 103,671 |
| Ridge (α=0.01) | 0.9713 | 0.9630 | 0.0083 | 132,698 | 104,789 |
| Lasso (α=500) | 0.9708 | 0.9634 | 0.0073 | 131,977 | 102,517 |
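The columns of the results table can be computed with standard scikit-learn metrics. In this sketch the overfit figure is taken as R² on train minus R² on test, which matches the Linear Regression row (0.9714 − 0.9640 = 0.0074). Reporting RMSE and MAE on the test split is an assumption, and `regression_report` is a hypothetical helper name.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_train, pred_train, y_test, pred_test):
    """Return R² on both splits, their gap, and test-set RMSE / MAE."""
    r2_train = r2_score(y_train, pred_train)
    r2_test = r2_score(y_test, pred_test)
    return {
        "r2_train": r2_train,
        "r2_test": r2_test,
        "overfit": r2_train - r2_test,  # larger gap = more overfitting
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred_test))),
        "mae": float(mean_absolute_error(y_test, pred_test)),
    }
```

RMSE and MAE share the unit of the target, so for `Weekly_Sales` they read directly as dollars per store-week.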

Chosen model: Lasso Regression

  • Excellent predictive performance
  • Minimal overfitting
  • Sparse coefficients (~60% zeroed)
  • Improved interpretability for business stakeholders
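The share of zeroed coefficients can be read straight off a fitted Lasso model. The sketch below uses synthetic data and an illustrative α, so its numbers are unrelated to the ~60% reported above; `sparsity_summary` is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso


def sparsity_summary(coefs, feature_names):
    """Return the fraction of coefficients that are exactly zero and the
    surviving (name, coefficient) pairs, largest magnitude first."""
    coefs = np.asarray(coefs)
    zero_mask = coefs == 0  # Lasso sets dropped coefficients to exactly 0.0
    kept = [(n, float(c)) for n, c, z in zip(feature_names, coefs, zero_mask) if not z]
    kept.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return float(zero_mask.mean()), kept


# Synthetic example: 20 features, only 5 of which carry signal.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=1.0, random_state=0)
model = Lasso(alpha=1.0).fit(X, y)
names = [f"x{i}" for i in range(X.shape[1])]
zero_fraction, kept = sparsity_summary(model.coef_, names)
print(f"{zero_fraction:.0%} of coefficients are zero; {len(kept)} features kept")
```

The sorted `kept` list is the kind of short ranking that can be shown to business stakeholders, since every remaining feature has a non-zero, directly readable weight.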

📊 Key Business Insights

| Insight | Impact | Recommended Action |
|---|---|---|
| Store dominance | Top 10 stores = 45% of total sales | Focus inventory on high performers |
| Holiday effect | +22% sales | Pre-stock 2–3 weeks before holidays |
| Economic sensitivity | Sales negatively correlated with unemployment | Adjust promotions during downturns |
| Seasonality | Nov/Dec peaks | Plan staffing & marketing campaigns |
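Figures such as the top-store share and the holiday uplift in the table above can be derived with simple aggregations. The following is one possible way to compute them, not necessarily how the notebook does it; the `Store` and `Holiday_Flag` column names and both helper names are assumptions.

```python
import pandas as pd


def top_n_share(df, n=10, store_col="Store", sales_col="Weekly_Sales"):
    """Share of total sales generated by the n best-selling stores."""
    per_store = df.groupby(store_col)[sales_col].sum()
    return float(per_store.nlargest(n).sum() / per_store.sum())


def holiday_uplift(df, flag_col="Holiday_Flag", sales_col="Weekly_Sales"):
    """Relative difference between mean holiday-week and mean regular-week sales."""
    means = df.groupby(flag_col)[sales_col].mean()
    return float(means.loc[1] / means.loc[0] - 1.0)


# Toy data, for illustration only.
toy = pd.DataFrame({
    "Store": [1, 1, 2, 3],
    "Weekly_Sales": [100.0, 200.0, 100.0, 100.0],
    "Holiday_Flag": [0, 1, 0, 0],
})
print(top_n_share(toy, n=1))  # store 1 holds 300 of 500 total
print(holiday_uplift(toy))    # holiday mean 200 vs regular mean 100
```

This uplift is a raw comparison of means; it does not control for store mix or seasonality, so it should be read as descriptive rather than causal.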

💰 Estimated annual business impact: ~$120M (forecast accuracy + inventory & holiday optimization)


🛠️ Production-Ready Pipeline
  • ColumnTransformer + GridSearchCV
  • Pipeline export: preprocessor.pkl, lasso_model.pkl
  • FastAPI endpoint: POST /predict_sales → store-specific weekly forecast
  • Docker / AWS Lambda ready (<100 ms inference)
  • Drift monitoring: retrain automatically if R² < 0.90
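The export and drift-monitoring items above could be implemented along these lines. This is a sketch only: the artifact file names and the 0.90 threshold come from this README, while the use of `joblib`, the function names, and the toy model are assumptions. The FastAPI layer is not shown.

```python
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
from sklearn.preprocessing import StandardScaler

R2_THRESHOLD = 0.90  # retraining trigger stated in this README


def export_artifacts(preprocessor, model, out_dir):
    """Persist the fitted preprocessor and model as two .pkl files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    joblib.dump(preprocessor, out / "preprocessor.pkl")
    joblib.dump(model, out / "lasso_model.pkl")


def load_and_predict(out_dir, X_raw):
    """Reload both artifacts and score raw feature rows."""
    out = Path(out_dir)
    preprocessor = joblib.load(out / "preprocessor.pkl")
    model = joblib.load(out / "lasso_model.pkl")
    return model.predict(preprocessor.transform(X_raw))


def needs_retraining(y_true, y_pred, threshold=R2_THRESHOLD):
    """Flag drift when R² on recently observed sales falls below the threshold."""
    return bool(r2_score(y_true, y_pred) < threshold)


# Toy stand-ins for the real preprocessor and Lasso model.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([3.0, -2.0, 1.0]) + 5.0
demo_scaler = StandardScaler().fit(X_demo)
demo_model = Lasso(alpha=0.01).fit(demo_scaler.transform(X_demo), y_demo)
```

A scheduled job could call `needs_retraining` on the most recent weeks with known sales and rerun the training notebook or script when it returns `True`.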

✅ CDSD Certification Coverage
  • EDA & preprocessing
  • Linear regression baseline
  • Regularized models (Ridge & Lasso)
  • Cross-validation & overfitting control
  • Feature importance & business interpretation
  • Production-ready ML pipeline & deployment artifacts

🚀 Quick Start
# Clone the repository
git clone https://github.com/Data-Science-Designer-and-Developer/Project_Walmart.git
cd Project_Walmart

# Install dependencies
pip install -r requirements.txt

# Run the notebook
jupyter notebook
  1. Run the notebook sequentially
  2. Use deploy_pipeline.py to generate production artifacts (.pkl)
  3. Use predict.py to forecast store sales

👨‍💻 Author

Dreipfelt, CDSD Data Science Certification Candidate
GitHub: https://github.com/Dreipfelt
