examples

Runnable, single-file demos of the raincloud.load(slug) datasets-like API: each one loads a real catalog dataset and runs an actual query. For authoring templates (new manifest entries and streaming handlers), see ../templates/.

| File | What it does | Engine | Cost on first run |
| --- | --- | --- | --- |
| use_loader.py | API basics: metadata, format override, .to_arrow / .scan / .to_pandas, env vars, the typed exception hierarchy. | | none (metadata is network-free; --materialize resolves one artifact) |
| nyc_taxi_tip_rate.py | Of "probably-valid" 2025 yellow-cab trips, what % left no recorded tip? Broken down by payment_type to expose that the TLC only records card tips. | DuckDB over .scan() | ~900 MB (48.7M rows, 12 monthly parquets) |
| kepler_exoplanets.py | How many Kepler candidates are CONFIRMED vs FALSE POSITIVE, and what's the smallest confirmed planet? | pandas | ~3 MB, seconds |
| wine_quality_correlations.py | Which physicochemical features correlate with a wine's quality score? | pandas .corr() | ~80 KB, instant |
| olympic_medals.py | Top medal-winning nations and medals per decade across 120 years of the Games. | DuckDB over .scan() | ~5 MB, seconds |
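To give a feel for the kind of query nyc_taxi_tip_rate.py describes, here is a minimal sketch of a "percent of trips with no recorded tip, grouped by payment_type" aggregation. It is not the example's actual code. It swaps in the standard-library sqlite3 module for DuckDB so it runs with no extras installed, and the rows are synthetic, made up purely for illustration. The column names payment_type and tip_amount follow the TLC trip-record schema, where payment_type 1 is credit card and 2 is cash.

```python
import sqlite3

# Synthetic (payment_type, tip_amount) rows for illustration only; NOT real TLC data.
# In the TLC schema, payment_type 1 is credit card and 2 is cash.
ROWS = [
    (1, 3.50), (1, 0.00), (1, 2.00), (1, 4.25),
    (2, 0.00), (2, 0.00), (2, 0.00),
]


def no_tip_rate_by_payment_type(rows):
    """Return [(payment_type, trips, pct_no_tip), ...] ordered by payment_type."""
    con = sqlite3.connect(":memory:")  # stand-in for DuckDB; an assumption of this sketch
    try:
        con.execute("CREATE TABLE trips (payment_type INTEGER, tip_amount REAL)")
        con.executemany("INSERT INTO trips VALUES (?, ?)", rows)
        cur = con.execute(
            """
            SELECT payment_type,
                   COUNT(*) AS trips,
                   100.0 * SUM(CASE WHEN tip_amount = 0 THEN 1 ELSE 0 END)
                         / COUNT(*) AS pct_no_tip
            FROM trips
            GROUP BY payment_type
            ORDER BY payment_type
            """
        )
        return cur.fetchall()
    finally:
        con.close()


result = no_tip_rate_by_payment_type(ROWS)
for payment_type, trips, pct_no_tip in result:
    print(f"payment_type={payment_type} trips={trips} no_tip={pct_no_tip:.1f}%")
```

In the synthetic rows every cash trip shows a zero tip, which mirrors the point the example makes: a zero in tip_amount for a cash trip means "not recorded", not "no tip given".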

Running them

pip install "raincloud[build,duckdb,pandas] @ git+https://github.com/spiraldb/raincloud"
python examples/kepler_exoplanets.py

There is no public Raincloud mirror, so the first run of an example fetches the upstream data and builds the artifact locally (this is what the [build] extra is for); subsequent runs hit the local cache and are fast. If your team runs a private mirror, set RAINCLOUD_MIRROR=s3://bucket/prefix (or file:///path) and the examples pull from it instead of building. The [duckdb] extra backs .scan(), and the [pandas] extra backs .to_pandas().
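Before a first run, it can help to confirm which of the two paths described above an example will take. The sketch below is a hypothetical pre-flight helper, not part of raincloud: describe_source is a name invented here, and the only thing it relies on from the text above is the RAINCLOUD_MIRROR environment variable and its s3:// and file:// forms. It only inspects the variable; it does not contact the mirror or call the library.

```python
import os
from urllib.parse import urlparse


def describe_source(env=None):
    """Report whether an example would pull from a mirror or build locally.

    Hypothetical helper for illustration. Returns ("mirror", scheme) when
    RAINCLOUD_MIRROR is set to a non-empty value, else ("local-build", None).
    """
    env = os.environ if env is None else env
    mirror = env.get("RAINCLOUD_MIRROR", "").strip()
    if not mirror:
        # No mirror configured: the first run fetches upstream data and builds locally.
        return ("local-build", None)
    # Only the URL scheme is reported (for example "s3" or "file").
    return ("mirror", urlparse(mirror).scheme)


if __name__ == "__main__":
    kind, scheme = describe_source()
    if kind == "mirror":
        print(f"RAINCLOUD_MIRROR is set ({scheme}://...); examples should pull from it.")
    else:
        print("RAINCLOUD_MIRROR is not set; expect a local build on first run.")
```

Passing a plain dict instead of relying on os.environ keeps the helper easy to check in isolation, for example describe_source({"RAINCLOUD_MIRROR": "file:///path"}).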