Cache online data #500
Open
teunbrand wants to merge 7 commits into
Prepares for the addition of an `online_data` module by giving the existing builtin-dataset module a more specific name.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the ggsql-specific `extract_builtin_dataset_names` with a prefix-parameterised `extract_prefixed_dataset_names` and extend `rewrite_namespaced_sql` to also handle the `online:` prefix. Adds `naming::online_data_table()` for the new namespace.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
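To make the prefix parameterisation concrete, here is a std-only Rust sketch of extracting and rewriting `prefix:name` references in SQL text. It is not the PR's code: the function names `extract_prefixed_names`, `rewrite_prefixed` and `prefixed_spans`, the allowed name characters (ASCII alphanumerics, `_`, `-`), and the `__online_<name>` table naming are all hypothetical. It is a plain text scan, so unlike a real rewriter it would also touch matches inside string literals or comments.

```rust
/// Characters assumed (hypothetically) to be legal in a dataset name after the prefix.
fn is_name_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_' || c == '-'
}

/// Byte spans `(start, end, name)` of every `prefix<name>` occurrence in `sql`.
/// Naive text scan: string literals and comments are not skipped.
fn prefixed_spans(sql: &str, prefix: &str) -> Vec<(usize, usize, String)> {
    let mut spans = Vec::new();
    if prefix.is_empty() {
        return spans; // an empty prefix would match everywhere
    }
    let mut from = 0;
    while let Some(pos) = sql[from..].find(prefix) {
        let start = from + pos;
        let after = start + prefix.len();
        // Require a word boundary so `xonline:world` is not read as `online:world`.
        let at_boundary = sql[..start]
            .chars()
            .next_back()
            .map_or(true, |c| !is_name_char(c));
        let name: String = sql[after..].chars().take_while(|&c| is_name_char(c)).collect();
        let end = after + name.len(); // name chars are ASCII, so chars == bytes
        if at_boundary && !name.is_empty() {
            spans.push((start, end, name));
        }
        from = end;
    }
    spans
}

/// Hypothetical prefix-parameterised extractor: distinct names in first-seen order.
fn extract_prefixed_names(sql: &str, prefix: &str) -> Vec<String> {
    let mut names: Vec<String> = Vec::new();
    for (_, _, name) in prefixed_spans(sql, prefix) {
        if !names.contains(&name) {
            names.push(name);
        }
    }
    names
}

/// Replace each `prefix<name>` with a quoted table name in a single left-to-right pass.
fn rewrite_prefixed(sql: &str, prefix: &str, table_for: impl Fn(&str) -> String) -> String {
    let mut out = String::with_capacity(sql.len());
    let mut last = 0;
    for (start, end, name) in prefixed_spans(sql, prefix) {
        out.push_str(&sql[last..start]);
        out.push('"');
        out.push_str(&table_for(&name));
        out.push('"');
        last = end;
    }
    out.push_str(&sql[last..]);
    out
}

fn main() {
    let sql = "SELECT * FROM online:world JOIN online:world_hires USING (id)";
    // prints ["world", "world_hires"]
    println!("{:?}", extract_prefixed_names(sql, "online:"));
    // prints SELECT * FROM "__online_world" JOIN "__online_world_hires" USING (id)
    println!("{}", rewrite_prefixed(sql, "online:", |n| format!("__online_{n}")));
}
```

Because the prefix is a parameter, `ggsql:` and `online:` can share one scanner. Rewriting from recorded spans in a single pass, rather than with repeated string replacement, also keeps a short name like `online:world` from clobbering the start of a longer one (the `world_hires` name above is made up for the example).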
Introduces `online:world`, `online:states`, `online:us-counties` (and resolution/alias variants) as data sources that auto-download and cache Natural Earth parquet files. The download logic is native-only (ureq); the registry and parquet parsing are platform-independent, so wasm can supply its own fetch implementation later.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
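The registry side needs neither network nor file system, which is what lets it stay platform-independent. A minimal sketch of resolving a source name, alias, or resolution variant to a cache file follows. Everything beyond the three canonical names is assumed for illustration: the aliases (`countries`, `provinces`), the `-10m`/`-50m`/`-110m` suffixes (borrowed from Natural Earth's published map scales), the `110m` default, and the `<name>_<scale>.parquet` file naming.

```rust
/// Canonical dataset names, from the `online:` sources listed in the commit.
const DATASETS: &[&str] = &["world", "states", "us-counties"];
/// Illustrative aliases only (hypothetical, not from the PR).
const ALIASES: &[(&str, &str)] = &[("countries", "world"), ("provinces", "states")];
/// Assumed resolution suffixes, modelled on Natural Earth's scales.
const SCALES: &[&str] = &["10m", "50m", "110m"];
const DEFAULT_SCALE: &str = "110m";

#[derive(Debug, PartialEq)]
struct Resolved {
    name: String,
    scale: String,
}

impl Resolved {
    /// Assumed file name for the cached parquet copy.
    fn cache_file(&self) -> String {
        format!("{}_{}.parquet", self.name, self.scale)
    }
}

/// Resolve `name` or `name-<scale>` (after alias expansion) to a known dataset.
fn resolve(key: &str) -> Option<Resolved> {
    // Treat the last `-segment` as a scale only if it is a known one,
    // so `us-counties` keeps its hyphen.
    let (base, scale) = match key.rsplit_once('-') {
        Some((b, s)) if SCALES.contains(&s) => (b, s),
        _ => (key, DEFAULT_SCALE),
    };
    // Expand an alias to its canonical name; unknown names pass through unchanged.
    let name = ALIASES
        .iter()
        .find(|(alias, _)| *alias == base)
        .map_or(base, |(_, target)| *target);
    DATASETS.contains(&name).then(|| Resolved {
        name: name.to_string(),
        scale: scale.to_string(),
    })
}

fn main() {
    for key in ["world", "countries-50m", "us-counties", "us-counties-10m", "moon"] {
        match resolve(key) {
            Some(r) => println!("{key} -> {}", r.cache_file()),
            None => println!("{key} -> unknown dataset"),
        }
    }
}
```

With these assumptions, `countries-50m` resolves to `world_50m.parquet`, while an unknown name such as `moon` (or an unknown scale such as `world-5m`) resolves to nothing rather than triggering a download.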
This PR advances #453.
This mechanism is separate enough from #453's goal that I think it could be reviewed as a separate PR.
Essentially, it adds cached, online versions of the builtin data, so that e.g. `FROM online:us_states` will download and cache a preformatted parquet file.
Notably, the caching mechanism needs a different implementation for WASM, as I don't think the file system works the same way there, and there may also be some trusted-source restrictions involved. I've only implemented the non-WASM path, but left a hook to slot WASM into.
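One way such a hook can look is to put network access behind a trait, so the caching code never names a concrete HTTP client. The sketch below assumes a hypothetical `Fetch` trait and `cached_or_fetch` helper, neither taken from the PR. A native build might implement `Fetch` with ureq and a WASM build with a browser fetch; here only a fake in-memory fetcher is shown so the example runs offline, and the URL is a placeholder.

```rust
use std::cell::Cell;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical hook: anything that can turn a URL into bytes.
trait Fetch {
    fn fetch(&self, url: &str) -> io::Result<Vec<u8>>;
}

/// Return the cached file for `file_name`, downloading it via `fetcher` on a miss.
fn cached_or_fetch(cache_dir: &Path, file_name: &str, url: &str, fetcher: &dyn Fetch) -> io::Result<PathBuf> {
    let target = cache_dir.join(file_name);
    if target.exists() {
        return Ok(target); // cache hit: no network access
    }
    fs::create_dir_all(cache_dir)?;
    let bytes = fetcher.fetch(url)?;
    // Write to a temporary name first, then rename, so an interrupted
    // download never leaves a truncated file at `target`.
    let tmp = cache_dir.join(format!("{file_name}.part"));
    fs::write(&tmp, &bytes)?;
    fs::rename(&tmp, &target)?;
    Ok(target)
}

/// Stand-in fetcher for illustration: returns fixed fake bytes and counts calls.
struct FakeFetch {
    calls: Cell<u32>,
}

impl Fetch for FakeFetch {
    fn fetch(&self, _url: &str) -> io::Result<Vec<u8>> {
        self.calls.set(self.calls.get() + 1);
        Ok(b"PAR1...PAR1".to_vec())
    }
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("online-cache-sketch");
    let _ = fs::remove_dir_all(&dir); // start from an empty cache
    let fetcher = FakeFetch { calls: Cell::new(0) };
    let url = "https://example.com/world.parquet"; // placeholder URL
    let first = cached_or_fetch(&dir, "world.parquet", url, &fetcher)?;
    let second = cached_or_fetch(&dir, "world.parquet", url, &fetcher)?;
    assert_eq!(first, second);
    println!("fetch called {} time(s)", fetcher.calls.get()); // prints 1
    Ok(())
}
```

This only abstracts the download. Since the file system itself may behave differently under WASM, a real WASM path would presumably need to swap out the storage half (`exists`, `write`, `rename`) as well.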