A polished, 10-notebook university-quality introduction to Python for Data Science and Machine Learning. From "What's a variable?" to building, evaluating, and interpreting your own ML models in scikit-learn.
Click any badge below to launch the notebook in Colab with zero setup: no install, no Python, no terminal. Just a Google account.
Notebook 1 - Python Basics (25-30 min). Variables, data types, arithmetic, strings, f-strings, a first applied calculation.
Notebook 2 - Control Structures (30-35 min). if / elif / else, for, while, break / continue, try / except.
Notebook 3 - Lists and Sequences (30-35 min). Indexing, slicing, list comprehensions, tuples, strings as sequences, nested lists.
Notebook 4 - Dictionaries (30-35 min). Key-value lookup, nested dicts, list of dicts, counting / grouping, JSON.
Notebook 5 - Pandas Preview (25-30 min). Series, DataFrames, indexing with loc / iloc, filtering, groupby.
Notebook 6 - Functions and Modules (30-35 min). Parameters, defaults, *args / **kwargs, scope, docstrings, type hints, imports.
Notebook 7 - NumPy Fundamentals (30-40 min). Arrays, vectorisation, broadcasting, axes, reproducible randomness.
Notebook 8 - Matplotlib Basics (35-45 min). Figure / Axes model, line / bar / scatter / hist / box / heatmap, subplots, annotations.
Notebook 9 - Scikit-Learn Basics (60-75 min). Train/test split, classification + regression, pipelines, metrics, GridSearchCV, feature importance.
Notebook 10 - Capstone: Weather Data Analysis (60-90 min). Full end-to-end project: data, EDA, dashboard, regression, executive summary.
Pro tip. Google Colab provides a free Python environment with all course libraries (NumPy, pandas, matplotlib, scikit-learn) pre-installed.
Total time: ~6 hours of focused learning, plus 2-4 hours of practice.
This course is designed for complete beginners who want to use Python specifically for data science, machine learning, and analytical work, not generic application development.
You will benefit if you are:
- A business professional who wants to move from spreadsheets to code.
- A student in a quantitative field (statistics, economics, biology, physics, social science).
- A researcher who wants to script analyses instead of clicking through dropdowns.
- A career-switcher targeting data analyst / data scientist / ML engineer roles.
- A developer in another stack adding "data" to your skillset.
Prerequisites: none. A laptop, a browser, and curiosity are enough.
Data-science focused from day one. Every concept connects to real workflows. List slicing is taught as X[0:3], the same syntax you'll see in scikit-learn. Dictionaries are taught as the JSON-shaped records you'll meet in most web APIs.
Intuition before syntax. Each section opens with why a concept matters before showing how it works. Analogies, diagrams, mental models, not just code.
Real-world, not toy. Financial calculations, weather analysis, customer data, machine-learning pipelines: examples that mirror what data scientists actually do.
Modular & progressive. Notebooks build on each other. By Notebook 7 you're vectorising in NumPy; by Notebook 9 you're training random forests with cross-validation; by Notebook 10 you're shipping a small end-to-end project.
Exercises with full solutions. Every notebook has 5+ exercises, including a "Debug me" exercise, and every exercise has a detailed solution that explains the reasoning, not just the code.
Polished visuals. Charts are clean, professionally styled, and chosen for didactic value.
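As a small illustration of the first point, the sketch below slices a list of rows with X[0:3] and then treats a JSON string as a list of dictionary records. The data, the field names (city, temp_c) and the variable names are invented for this example and are not taken from the notebooks.

```python
import json
from collections import Counter

# Hypothetical feature rows, standing in for the X matrix seen in ML code.
X = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.0], [6.7, 3.1]]

first_three = X[0:3]  # rows 0, 1 and 2 (the stop index is excluded)
last_two = X[-2:]     # negative indices count from the end

# Hypothetical JSON payload, shaped like a typical API response.
payload = (
    '[{"city": "Oslo", "temp_c": 4.5},'
    ' {"city": "Rome", "temp_c": 17.0},'
    ' {"city": "Oslo", "temp_c": 6.5}]'
)
records = json.loads(payload)  # a list of dicts, one per reading

# Counting and grouping with plain dictionaries.
readings_per_city = Counter(record["city"] for record in records)
oslo_temps = [r["temp_c"] for r in records if r["city"] == "Oslo"]
oslo_mean = sum(oslo_temps) / len(oslo_temps)

print(first_three)
print(readings_per_city)
print(f"Mean Oslo temperature: {oslo_mean:.1f} °C")
```

The same slice syntax carries over to NumPy arrays and pandas objects, which is presumably why the course introduces it on plain lists first.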
By the end of this course you will be able to:
- Write clean Python code with appropriate data structures, control flow, and functions.
- Manipulate tabular data with pandas and numerical data with NumPy.
- Build clear, publication-quality visualisations with matplotlib.
- Train, evaluate, and interpret classification and regression models in scikit-learn.
- Apply the full ML workflow (split, fit, evaluate, tune) without making the classic beginner mistakes.
- Communicate findings via a short executive summary and a 2×2 dashboard.
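The workflow named in the list above (split, fit, evaluate, tune) might look roughly like this in scikit-learn. This is a minimal sketch, not the code from Notebook 9: the Iris dataset, the logistic-regression model and the parameter grid are assumptions chosen to keep the example small. Keeping the scaler inside a pipeline is one common way to avoid leaking held-out data into preprocessing, which we assume is among the beginner mistakes the course has in mind.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data: Iris ships with scikit-learn, so no download is needed.
X, y = load_iris(return_X_y=True)

# 1. Split: set the test set aside before anything is fitted.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# 2. Fit: the pipeline refits the scaler on each cross-validation
#    training fold, so validation folds never influence the scaling.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# 3. Tune: 5-fold cross-validated grid search on the training set only.
param_grid = {"model__C": [0.1, 1.0, 10.0]}  # assumed grid, for illustration
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

# 4. Evaluate: score the best pipeline once on the untouched test set.
test_accuracy = search.score(X_test, y_test)
print(f"Best parameters: {search.best_params_}")
print(f"Test accuracy: {test_accuracy:.2f}")
```

The test set is used exactly once, at the end. All model selection happens inside GridSearchCV on the training portion.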
Click any of the Open in Colab badges above. Sign in with a Google account. That's it.
If you'd rather have a local environment:
# clone the repo
git clone https://github.com/BridgingAISocietySummerSchools/Data-Science-AI-Python-Course.git
cd Data-Science-AI-Python-Course
# either: one-shot setup script
./setup.sh
# or: manual venv
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
jupyter notebook
Start with 01_python_basics.ipynb and work through the notebooks in order.
Data-Science-AI-Python-Course/
├── 01_python_basics.ipynb              # Variables, types, arithmetic, f-strings
├── 02_control_structures.ipynb         # if/elif/else, loops, try/except
├── 03_lists_data_structures.ipynb      # Lists, indexing, slicing, comprehensions
├── 04_dictionaries_advanced.ipynb      # Dictionaries, nested data, JSON
├── 05_pandas_preview.ipynb             # DataFrames, groupby, plotting
├── 06_functions_modules.ipynb          # Functions, defaults, scope, imports
├── 07_numpy_fundamentals.ipynb         # Arrays, vectorisation, broadcasting
├── 08_matplotlib_basics.ipynb          # Professional plotting
├── 09_scikit_learn_basics.ipynb        # Classification + regression
├── 10_capstone_project.ipynb           # End-to-end weather analysis
├── README.md                           # You are here
├── Python Data Science Cheat Sheet.md  # Quick syntax reference
├── CHANGELOG.md                        # Version history
├── CONTRIBUTING.md                     # How to contribute
├── requirements.txt                    # Python dependencies
├── requirements-dev.txt                # Dev-only dependencies
└── setup.sh                            # One-shot local setup
The notebooks are designed to be done in order. Each notebook assumes you've internalised the previous ones.
Recommended order
1 ──► 2 ──► 3 ──► 4 ──► 5 ──► 6 ──► 7 ──► 8 ──► 9 ──► 10
│     │     │     │     │     │     │     │     │     │
│     │     │     │     │     │     │     │     │     └─ Capstone
│     │     │     │     │     │     │     │     └─────── ML models
│     │     │     │     │     │     │     └───────────── Visualisation
│     │     │     │     │     │     └─────────────────── NumPy arrays
│     │     │     │     │     └───────────────────────── Functions / modules
│     │     │     │     └─────────────────────────────── First pandas
│     │     │     └───────────────────────────────────── Dictionaries / JSON
│     │     └─────────────────────────────────────────── Lists & slicing
│     └───────────────────────────────────────────────── Decisions & loops
└─────────────────────────────────────────────────────── Python fundamentals
A typical schedule:
| Pace | Plan |
|---|---|
| 1 hour / day | 1 notebook per day, done in ~2 weeks |
| 3 hours / weekend | 3 notebooks per weekend, done in 3-4 weekends |
| Bootcamp weekend | All 10 in 2 days (~6 hours pure + breaks) |
Every notebook follows the same modern structure:
- Header β module, time estimate, learning objectives, prerequisites.
- Sections with intuition first, then code, then a brief reflection on the output.
- Small examples → larger applied examples → exercises.
- 5+ practice exercises, including at least one "Debug me" exercise.
- Complete solutions with explanations (collapsed in <details>).
- Key takeaways + self-assessment checklist + next-step pointer.
Imports fail in Colab. This almost never happens, because Colab ships with the full stack. If it does, run !pip install <package> in a fresh cell.
Matplotlib plots don't show locally. Make sure you're running a Jupyter notebook (not a .py file). In some setups you may need %matplotlib inline in the first cell.
Jupyter won't start locally. pip install --upgrade jupyter usually fixes it. Fresh venv: rm -rf venv && python3 -m venv venv && source venv/bin/activate && pip install -r requirements.txt.
scikit-learn load_boston errors. It was removed in scikit-learn 1.2; Notebook 9 uses the California Housing dataset instead.
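When diagnosing import or version problems like the ones above, it can help to see which versions are actually installed in the active environment. The following is a hedged sketch using only the standard library. The helper name installed_versions is hypothetical, and the package list is assumed from the core dependencies named in this README, so adjust it to match requirements.txt.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI (assumed list, see requirements.txt).
PACKAGES = [
    "numpy", "pandas", "matplotlib", "scikit-learn",
    "scipy", "seaborn", "jupyter",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(PACKAGES)
for name, ver in report.items():
    print(f"{name:<14} {ver or 'NOT INSTALLED'}")
```

Run it in a notebook cell or as a script inside the activated venv. Any package reported as missing can then be installed with pip.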
The full list lives in requirements.txt. The core minimum versions:
- numpy ≥ 1.24
- pandas ≥ 2.0
- matplotlib ≥ 3.7
- scikit-learn ≥ 1.3
- scipy ≥ 1.10
- seaborn ≥ 0.12 (optional, used briefly)
- jupyter ≥ 1.0
Python 3.10 or newer is recommended (we use PEP 604 union types).
- Official Python tutorial
- NumPy user guide
- Pandas getting-started
- Matplotlib tutorials
- Scikit-learn user guide
- Kaggle Learn: free, project-based DS courses
- Hands-On Machine Learning by A. Géron, the gold-standard book
Related reading: Learn Python for Data Science
We welcome contributions! Check CONTRIBUTING.md for guidelines. Found a typo, a bug, or a clearer way to explain something? Open an issue or a PR.
MIT. See LICENSE.
Every expert was once a beginner. The only difference is they started.
Welcome, and have fun. What will you build with your data-science skills?
Made with ❤️ for the Data Science community