BridgingAISocietySummerSchools/Data-Science-AI-Python-Course

Learn Python: A Course Designed Specifically for Data Science and AI

A polished, 10-notebook university-quality introduction to Python for Data Science and Machine Learning. From "What's a variable?" to building, evaluating, and interpreting your own ML models in scikit-learn.


🚀 Quick start: open any notebook in Google Colab

Click any badge below to launch the notebook in Colab with zero setup: no install, no local Python, no terminal. All you need is a Google account.

🧱 Module 1: Python Fundamentals

  • Open in Colab Notebook 1: Python Basics (25–30 min). Variables, data types, arithmetic, strings, f-strings, a first applied calculation.
  • Open in Colab Notebook 2: Control Structures (30–35 min). if / elif / else, for, while, break / continue, try / except.

📦 Module 2: Data Structures

  • Open in Colab Notebook 3: Lists and Sequences (30–35 min). Indexing, slicing, list comprehensions, tuples, strings as sequences, nested lists.
  • Open in Colab Notebook 4: Dictionaries (30–35 min). Key-value lookup, nested dicts, list of dicts, counting / grouping, JSON.

🧰 Module 3: Data Science Libraries

  • Open in Colab Notebook 5: Pandas Preview (25–30 min). Series, DataFrames, indexing with loc / iloc, filtering, groupby.
  • Open in Colab Notebook 6: Functions and Modules (30–35 min). Parameters, defaults, *args / **kwargs, scope, docstrings, type hints, imports.
  • Open in Colab Notebook 7: NumPy Fundamentals (30–40 min). Arrays, vectorisation, broadcasting, axes, reproducible randomness.
  • Open in Colab Notebook 8: Matplotlib Basics (35–45 min). Figure / Axes model, line / bar / scatter / hist / box / heatmap, subplots, annotations.

🤖 Module 4: Machine Learning

  • Open in Colab Notebook 9: Scikit-Learn Basics (60–75 min). Train/test split, classification + regression, pipelines, metrics, GridSearchCV, feature importance.

🏆 Capstone

  • Open in Colab Notebook 10: Capstone: Weather Data Analysis (60–90 min). Full end-to-end project: data, EDA, dashboard, regression, executive summary.

💡 Pro tip. Google Colab provides a free Python environment with all course libraries (NumPy, pandas, matplotlib, scikit-learn) pre-installed.

Total time: roughly 6 to 7.5 hours of focused learning (the sum of the notebook estimates above), plus 2–4 hours of practice.
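
As a rough cross-check, the per-notebook estimates listed above can simply be added up. The snippet below is an illustrative sketch only; the names `ESTIMATES_MIN` and `total_hours` are made up for this example and are not part of the course materials.

```python
# Per-notebook time estimates in minutes (low, high), copied from the module list above.
ESTIMATES_MIN = {
    1: (25, 30),
    2: (30, 35),
    3: (30, 35),
    4: (30, 35),
    5: (25, 30),
    6: (30, 35),
    7: (30, 40),
    8: (35, 45),
    9: (60, 75),
    10: (60, 90),
}


def total_hours(estimates: dict[int, tuple[int, int]]) -> tuple[float, float]:
    """Return the (low, high) total in hours for a dict of minute ranges."""
    low = sum(lo for lo, _ in estimates.values())
    high = sum(hi for _, hi in estimates.values())
    return low / 60, high / 60


low_h, high_h = total_hours(ESTIMATES_MIN)
print(f"Focused learning time: {low_h:.1f} to {high_h:.1f} hours")
# prints: Focused learning time: 5.9 to 7.5 hours
```

The low end assumes every notebook is finished in its minimum time, so most learners should plan for something closer to the upper figure.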


🎯 Who this course is for

This course is designed for complete beginners who want to use Python specifically for data science, machine learning, and analytical work, not generic application development.

You will benefit if you are:

  • A business professional who wants to move from spreadsheets to code.
  • A student in a quantitative field (statistics, economics, biology, physics, social science).
  • A researcher who wants to script analyses instead of clicking through dropdowns.
  • A career-switcher targeting data analyst / data scientist / ML engineer roles.
  • A developer in another stack adding "data" to your skillset.

Prerequisites: none. A laptop, a browser, and curiosity are enough.

🌟 What makes this course different?

🎯 Data-science focused from day one. Every concept connects to real workflows. List slicing is taught as X[0:3], the same syntax you'll later use on the feature arrays you pass to scikit-learn. Dictionaries are taught as the JSON-shaped records you'll meet in most web APIs.
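
To make those two ideas concrete, here is a minimal sketch using only the standard library. The data, the `payload` string, and the variable names are hypothetical and chosen just for illustration; the notebooks may use different examples.

```python
import json

# A tiny "feature matrix" as a list of rows (hypothetical data).
X = [
    [5.1, 3.5],
    [4.9, 3.0],
    [6.2, 2.9],
    [5.9, 3.2],
]

# Slicing: rows 0, 1 and 2. The stop index (3) is excluded.
first_three = X[0:3]
print(len(first_three))  # prints: 3

# A JSON-shaped record, roughly as a web API might return it (hypothetical payload).
payload = '{"city": "Berlin", "readings": [{"day": 1, "temp_c": 18.5}, {"day": 2, "temp_c": 21.0}]}'
record = json.loads(payload)  # JSON object -> dict, JSON array -> list

# Nested lookup: pull one field out of each reading, then average it.
temps = [reading["temp_c"] for reading in record["readings"]]
mean_temp = sum(temps) / len(temps)
print(record["city"], mean_temp)  # prints: Berlin 19.75
```

The same `[start:stop]` slice works unchanged on NumPy arrays and, via `iloc`, on pandas DataFrames, which is why it is worth learning on plain lists first.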

🧠 Intuition before syntax. Each section opens with why a concept matters before showing how it works. Analogies, diagrams, and mental models, not just code.

🛠️ Real-world, not toy. Financial calculations, weather analysis, customer data, machine-learning pipelines: examples that mirror what data scientists actually do.

🧩 Modular & progressive. Notebooks build on each other. By Notebook 7 you're vectorising in NumPy; by Notebook 9 you're training random forests with cross-validation; by Notebook 10 you're shipping a small end-to-end project.

💡 Exercises with full solutions. Every notebook has 5+ exercises, including a "Debug me 🐞" task, and every exercise has a detailed solution that explains the reasoning, not just the code.

📊 Polished visuals. Charts are clean, professionally styled, and chosen for didactic value.

📚 Learning objectives

By the end of this course you will be able to:

  • Write clean Python code with appropriate data structures, control flow, and functions.
  • Manipulate tabular data with pandas and numerical data with NumPy.
  • Build clear, publication-quality visualisations with matplotlib.
  • Train, evaluate, and interpret classification and regression models in scikit-learn.
  • Apply the full ML workflow (split, fit, evaluate, tune) without making the classic beginner mistakes.
  • Communicate findings via a short executive summary and a 2×2 dashboard.
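
The ML workflow named in the objectives above (split, fit, evaluate, tune) might look roughly like this in scikit-learn. This is a sketch under stated assumptions, not the code from Notebook 9: the synthetic dataset, the parameter grid, and the choice of a random forest are all illustrative. Tuning is done with cross-validation on the training set, so the held-out test set is used only once, at the end.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (an assumption for this sketch).
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)

# 1. Split first, so the test set never influences fitting or tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 2. Fit inside a pipeline, so preprocessing is learned from training data only.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestClassifier(random_state=0)),
])

# 3. Tune hyperparameters with cross-validation on the training set.
search = GridSearchCV(
    pipeline,
    param_grid={"model__n_estimators": [25, 50], "model__max_depth": [3, None]},
    cv=3,
)
search.fit(X_train, y_train)

# 4. Evaluate once on the held-out test set.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print(search.best_params_, round(test_accuracy, 3))
```

Scaling is not strictly needed for tree-based models; it is kept here only to show that preprocessing belongs inside the pipeline rather than before the split.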

▶️ Getting started

Option A: Google Colab (recommended, zero setup)

Click any of the Open in Colab badges above. Sign in with a Google account. That's it.

Option B: Run locally

If you'd rather have a local environment:

# clone the repo
git clone https://github.com/BridgingAISocietySummerSchools/Data-Science-AI-Python-Course.git
cd Data-Science-AI-Python-Course

# either: one-shot setup script
./setup.sh

# or: manual venv
python3 -m venv venv
source venv/bin/activate          # Windows: venv\Scripts\activate
pip install -r requirements.txt
jupyter notebook

Start with 01_python_basics.ipynb and work through in order.

πŸ—‚οΈ Repository layout

πŸ“ Data-Science-AI-Python-Course/
β”œβ”€β”€ πŸ““ 01_python_basics.ipynb          # Variables, types, arithmetic, f-strings
β”œβ”€β”€ πŸ““ 02_control_structures.ipynb     # if/elif/else, loops, try/except
β”œβ”€β”€ πŸ““ 03_lists_data_structures.ipynb  # Lists, indexing, slicing, comprehensions
β”œβ”€β”€ πŸ““ 04_dictionaries_advanced.ipynb  # Dictionaries, nested data, JSON
β”œβ”€β”€ πŸ““ 05_pandas_preview.ipynb         # DataFrames, groupby, plotting
β”œβ”€β”€ πŸ““ 06_functions_modules.ipynb      # Functions, defaults, scope, imports
β”œβ”€β”€ πŸ““ 07_numpy_fundamentals.ipynb     # Arrays, vectorisation, broadcasting
β”œβ”€β”€ πŸ““ 08_matplotlib_basics.ipynb      # Professional plotting
β”œβ”€β”€ πŸ““ 09_scikit_learn_basics.ipynb    # Classification + regression
β”œβ”€β”€ πŸ““ 10_capstone_project.ipynb       # End-to-end weather analysis
β”œβ”€β”€ πŸ“„ README.md                       # ← you are here
β”œβ”€β”€ πŸ“„ Python Data Science Cheat Sheet.md  # Quick syntax reference
β”œβ”€β”€ πŸ“„ CHANGELOG.md                    # Version history
β”œβ”€β”€ πŸ“„ CONTRIBUTING.md                 # How to contribute
β”œβ”€β”€ πŸ“„ requirements.txt                # Python dependencies
β”œβ”€β”€ πŸ“„ requirements-dev.txt            # Dev-only dependencies
└── πŸ› οΈ setup.sh                        # One-shot local setup

🧭 Suggested learning path

The notebooks are designed to be done in order. Each notebook assumes you've internalised the previous ones.

                                Recommended order
   1 ──► 2 ──► 3 ──► 4 ──► 5 ──► 6 ──► 7 ──► 8 ──► 9 ──► 10
   │     │     │     │     │     │     │     │     │     │
   │     │     │     │     │     │     │     │     │     └─ 🏆 Capstone
   │     │     │     │     │     │     │     │     └─────── ML models
   │     │     │     │     │     │     │     └─────────── Visualisation
   │     │     │     │     │     │     └─────────────── NumPy arrays
   │     │     │     │     │     └─────────────────── Functions / modules
   │     │     │     │     └─────────────────────── First pandas
   │     │     │     └─────────────────────────── Dictionaries / JSON
   │     │     └─────────────────────────────── Lists & slicing
   │     └─────────────────────────────────── Decisions & loops
   └──────────────────────────────────── Python fundamentals

A typical schedule:

| Pace | Plan |
| --- | --- |
| 1 hour / day | 1 notebook per day → done in ~2 weeks |
| 3 hours / weekend | 3 notebooks per weekend → done in ~4 weekends |
| Bootcamp weekend | All 10 in 2 days (~6 to 7.5 hours of focused work, plus breaks) |

🧪 What's inside each notebook?

Every notebook follows the same modern structure:

  1. Header: module, time estimate, learning objectives, prerequisites.
  2. Sections with intuition first, then code, then a brief reflection on the output.
  3. Small examples → larger applied examples → exercises.
  4. 5+ practice exercises, including at least one "Debug me 🐞".
  5. Complete solutions with explanations (collapsed in <details>).
  6. Key takeaways + self-assessment checklist + next-step pointer.

🧯 Troubleshooting

Imports fail in Colab. This rarely happens, because Colab ships with the full stack. If it does, run !pip install <package> in a fresh cell.

Matplotlib plots don't show locally. Make sure you're running a Jupyter notebook (not a .py file). In some setups you may need %matplotlib inline in the first cell.

Jupyter won't start locally. pip install --upgrade jupyter usually fixes it. If it doesn't, rebuild the virtual environment from scratch: rm -rf venv && python3 -m venv venv && source venv/bin/activate && pip install -r requirements.txt.

scikit-learn load_boston errors. load_boston was removed in scikit-learn 1.2, so Notebook 9 uses the California Housing dataset instead.

📦 Dependencies

The full list lives in requirements.txt. The core packages and their minimum versions:

  • numpy ≥ 1.24
  • pandas ≥ 2.0
  • matplotlib ≥ 3.7
  • scikit-learn ≥ 1.3
  • scipy ≥ 1.10
  • seaborn ≥ 0.12 (optional, used briefly)
  • jupyter ≥ 1.0

Python 3.10 or newer is recommended: the notebooks use PEP 604 union types (for example int | None), a syntax introduced in Python 3.10.
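
If you are unsure whether a local environment meets these minimums, a quick check along the following lines can help. This is a hedged sketch, not a script shipped with the repository: `MINIMUMS`, `as_tuple`, `meets_minimum`, and `check_environment` are hypothetical names, only a subset of the list above is checked, and the version parsing is deliberately simplified (the third-party `packaging` library handles pre-releases and other edge cases properly).

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the list above (core packages only).
MINIMUMS = {
    "numpy": "1.24",
    "pandas": "2.0",
    "matplotlib": "3.7",
    "scikit-learn": "1.3",
    "scipy": "1.10",
}


def as_tuple(version_string: str) -> tuple[int, ...]:
    """Turn '1.24.3' into (1, 24, 3), stopping at the first non-numeric part."""
    parts = []
    for part in version_string.split("."):
        if not part.isdigit():
            break  # e.g. '0rc1' in a pre-release version
        parts.append(int(part))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, so '1.10' counts as newer than '1.9'."""
    return as_tuple(installed) >= as_tuple(minimum)


def check_environment() -> dict[str, str]:
    """Return 'ok', 'too old', or 'missing' for each package in MINIMUMS."""
    report = {}
    for name, minimum in MINIMUMS.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else "too old"
    return report


print("Python 3.10+:", sys.version_info >= (3, 10))
for package, status in check_environment().items():
    print(f"{package}: {status}")
```

Comparing versions as tuples of integers rather than as strings matters here: as plain strings, "1.9" would sort after "1.10".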

📚 Further reading

📰 Related reading: Learn Python for Data Science


🤝 Contributing

We welcome contributions! Check CONTRIBUTING.md for guidelines. Found a typo, a bug, or a clearer way to explain something? Open an issue or a PR.

📄 License

MIT; see LICENSE.


Every expert was once a beginner. The only difference is they started.

Welcome, and have fun. What will you build with your data-science skills? 🚀

Made with ❤️ for the Data Science community
