Implement GLOFAS dataset by StuberSimon · Pull Request #498 · PyPSA/atlite

StuberSimon · 2026-04-06T10:18:18Z

Changes proposed in this Pull Request

Implement GLOFAS dataset for hydro functionality
Add option to use glofas data for plant inflow

Checklist

Code changes are sufficiently documented; i.e. new functions contain docstrings and further explanations may be given in doc.
Unit tests for new features were added (if applicable).
Newly introduced dependencies are added to environment.yaml, environment_docs.yaml and setup.py (if applicable).
A note for the release notes doc/release_notes.rst of the upcoming release is included.
I consent to the release of this PR's code under the MIT license.

StuberSimon · 2026-04-06T10:28:37Z

Hey there, I'm still new to the GitHub workflow, so please let me know how I can improve :)
This is part of my bachelor's thesis supervised by @doneachh.
Thank you @ekatef and @euronion for offering your support.
This PR is a work in progress; the next step for me is to make the river discharge provided by GloFAS selectable in cutout.hydro().


ekatef

Great, @StuberSimon! This looks like very good progress, and I'm happy to see GloFAS being implemented. I have taken the liberty of doing a preliminary code review and added a few comments, hoping to help you get used to GitHub. My general impression is that you are getting things right and that the implementation looks quite neat.

ekatef · 2026-04-08T19:56:20Z

+            "dis24": "discharge",
+        }
+    )
+    # round coords since cds coords are float32 which would lead to mismatches


Is it the case also for GloFAS data?
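One way to answer this is to look at what the rounding guards against. The sketch below uses a made-up 0.05-degree longitude axis (not real GloFAS coordinates) to show how a float32 copy of a float64 grid stops matching exactly, and how rounding both sides realigns them. Whether GloFAS files actually store float32 coordinates is not established in this thread; checking the coordinate dtype of a downloaded file would settle it.

```python
import numpy as np

# Hypothetical 0.05-degree longitude axis, for illustration only.
lon64 = np.linspace(10.0, 10.45, 10)       # float64 reference grid
lon32 = lon64.astype(np.float32)           # as if read from a float32 file

# Exact comparison fails for every value float32 cannot represent.
exact_matches = int((lon32.astype(np.float64) == lon64).sum())

# Rounding both grids to a fixed number of decimals restores alignment.
rounded_equal = bool(
    np.array_equal(
        np.round(lon32.astype(np.float64), 5),
        np.round(lon64, 5),
    )
)
```

If the GloFAS coordinates turn out to be float64 already, the rounding is harmless but unnecessary, and the comment should say so.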

ekatef · 2026-04-08T19:57:48Z

+        coords_as_attributes=[
+            "surface",
+            "depthBelowLandLayer",
+            "entireAtmosphere",
+            "heightAboveGround",
+            "meanSea",


It looks like the variable names must be updated.

ekatef · 2026-04-08T19:58:18Z

+    rename_vars = {
+        "time": "forecast_reference_time",
+        "step": "forecast_period",
+        "isobaricInhPa": "pressure_level",
+        "hybrid": "model_level",
+    }


A revision may also be needed here.

ekatef · 2026-04-08T20:00:27Z

+    Download data like Glofas from the Climate Data Store (CDS).
+
+    If you want to track the state of your request go to
+    https://cds-beta.climate.copernicus.eu/requests?tab=all


I guess the link needs updating to match the EWDS server (btw, it looks like the docstring in era5.py may need an update as well):

Suggested change

-    https://cds-beta.climate.copernicus.eu/requests?tab=all
+    https://ewds.climate.copernicus.eu/requests?tab=all

ekatef · 2026-04-08T20:09:33Z

+    If static is False, this function creates a query for each month and year
+    in the time axis in coords. This ensures not running into size query limits
+    of the cdsapi even with very (spatially) large cutouts.
+    If static is True, the function return only one set of parameters
+    for the very first time point.


I have noticed that the EWDS server does not favour downloading a dataset for periods longer than one year, even for really small spatial areas (like 2x2 degrees). So temporal disaggregation looks even more crucial for GloFAS retrieval than for ERA5.
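The temporal disaggregation described in the docstring could look roughly like the sketch below, which turns a time index into one request per year and month. The function name `monthly_request_periods` and the request keys `year`, `month` and `day` are assumptions made for illustration; they are not taken from this PR or from the EWDS API.

```python
import pandas as pd


def monthly_request_periods(time_index):
    """Yield one dict of request keys per (year, month) in the index."""
    # Hypothetical key names; the real GloFAS request keys may differ.
    grouped = time_index.to_series().groupby(
        [time_index.year, time_index.month]
    )
    for (year, month), chunk in grouped:
        yield {
            "year": str(int(year)),
            "month": f"{int(month):02d}",
            "day": sorted({f"{int(d):02d}" for d in chunk.dt.day}),
        }


# A range crossing a year boundary yields two separate requests.
times = pd.date_range("2020-12-30", "2021-01-02", freq="D")
requests = list(monthly_request_periods(times))
```

Each request stays well below a one-year span regardless of how long the cutout's time axis is.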

ekatef · 2026-04-08T20:11:44Z

+    monthly_requests : bool, optional
+        If True, the data is requested on a monthly basis. This is useful for
+        large cutouts, where the data is requested in smaller chunks. The
+        default is False


Given the EWDS limitations, I'd expect monthly_requests=True to be a more reasonable default. It would be good to test whether monthly_requests=False actually works for GloFAS retrieval.

ekatef · 2026-04-08T20:18:41Z

+    Get inflow time-series for `plants` by extracting the discharge time series for
+    the nearest grid points.


Completely agree that it's worth making sure the naming reflects the different nature of the variables in the ERA5 and GloFAS datasets.
It could also be a good idea to mention both datasets in the docstrings of the respective functions (that is, mention here that _hydro_from_discharge is intended for use with GloFAS data).

ekatef · 2026-04-08T20:28:11Z


Thank you @StuberSimon, this looks like a great contribution! From a user's perspective, I can strongly confirm that having GloFAS would be an amazing feature 🙂

I have added a few preliminary technical comments, leaving the more in-depth analysis to @euronion, who has a much deeper understanding of the atlite architecture.

For the next step, do you need any support?

ValeMTo · 2026-04-10T13:10:25Z

Hi @StuberSimon, @ekatef, and @euronion

First off, great work on this PR. I completely agree with @ekatef that having GloFAS natively integrated is a fantastic and much-needed feature for the community.

I'm jumping into this conversation because @Asdominet34 and I have worked on improving the hydro modelling in PyPSA-Eur by first connecting it with GloFAS, "calibrating" it, and then validating it against real hydro production in Europe. Before pushing the work, we would like to finalize the pumped-storage logic and the working paper we have in mind. I think there is room for collaboration. What do you think about organising a call next week?

StuberSimon · 2026-04-14T12:40:30Z

Hi @ValeMTo, that sounds great!
This week is already quite full. What do you think about next Tuesday, April 21st, between 11:45 and 15:15 (UTC+2)?
You can reach me on Discord under the name simonstuber.

Asdominet34 · 2026-04-17T07:20:59Z


Hi @simon, thanks for your message!

Unfortunately next week is quite busy on our side as well, so we won’t be available then. We’ll get back to you shortly with our availability starting from the following week.

Sorry again and looking forward to connecting!

coroa

Sorry for dragging my feet on this PR. Actually, I retract my earlier statement that this has no place within atlite. I think you showed that it can be accommodated here in a way that is useful and backwards compatible.

A couple of changes would be good:

abstract common code shared between era5.py and glofas.py in some cds_helper.py module
clean up the coordinates and variable names in use (i.e. the comments by @ekatef)
maybe an evaluation of potential heuristics for selecting the correct discharge grid cells

coroa · 2026-06-02T13:10:48Z

+    for plant in plants.itertuples():
+        # Extract the discharge time series for the nearest point
+        inflow.loc[dict(plant=plant.Index)] = discharge.sel(
+            x=plant.lon, y=plant.lat, method="nearest"
+        )


My previous look into these datasets suggested that it is quite easy to miss the correct river cells due to small misalignments between the datasets. So I'd expect one would want a more sophisticated heuristic that finds the closest cells which are actually part of the river, rather than simply the closest cell. This would also need testing.
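One possible heuristic of this kind, sketched here purely as an assumption and not as something proposed in the PR, is to start from the nearest cell and then pick the cell with the largest long-term mean discharge in a small window around it. The function name `pick_river_cell`, the `radius` parameter and the toy grid are all hypothetical.

```python
import numpy as np


def pick_river_cell(mean_discharge, iy, ix, radius=1):
    """Return (row, col) of the largest mean discharge within `radius` cells.

    `mean_discharge` is a 2D array of time-averaged discharge and
    (iy, ix) is the index of the cell nearest to the plant.
    """
    # Clip the search window to the grid boundaries.
    y0 = max(iy - radius, 0)
    y1 = min(iy + radius + 1, mean_discharge.shape[0])
    x0 = max(ix - radius, 0)
    x1 = min(ix + radius + 1, mean_discharge.shape[1])
    window = mean_discharge[y0:y1, x0:x1]
    # nanargmax ignores NaN cells but raises if the window is all NaN.
    dy, dx = np.unravel_index(np.nanargmax(window), window.shape)
    return int(y0 + dy), int(x0 + dx)


# Toy grid: the plant's nearest cell (1, 1) sits next to the river cell.
grid = np.array(
    [
        [0.1, 0.2, 0.1],
        [0.3, 0.2, 85.0],
        [0.1, 0.4, 0.2],
    ]
)
cell = pick_river_cell(grid, iy=1, ix=1)
```

A window that is too large can snap a plant on a small tributary onto a bigger neighbouring river, so the radius would need validation against known plant inflows, in line with the testing concern above.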

coroa · 2026-06-02T14:29:16Z

+def retrieve_data(
+    product: str,
+    chunks: dict[str, int] | None = None,
+    tmpdir: str | Path | None = None,
+    lock: SerializableLock | None = None,
+    **updates,
+) -> xr.Dataset:


This looks like a copy from era5. Could we add a cds_helper.py module and move it there so that both files can use it, together with methods like noisy_unlink?
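As a rough idea of what such a shared module might hold, here is a minimal sketch of a `noisy_unlink` helper living in a hypothetical cds_helper.py. Only the names `cds_helper.py` and `noisy_unlink` come from the comment above; the body is an assumption about its behaviour (delete a temporary file and log, rather than raise, if it is still locked), not a copy of the atlite code.

```python
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


def noisy_unlink(path):
    """Delete `path`, logging an error instead of raising if it is locked."""
    logger.debug("Deleting file %s", path)
    try:
        Path(path).unlink()
    except PermissionError:
        # Assumed behaviour: keep going and leave the file in place.
        logger.error("Unable to delete file %s, as it is still in use.", path)


# Usage: clean up a temporary download target.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    tmp_path = Path(handle.name)
noisy_unlink(tmp_path)
```

Both era5.py and glofas.py could then import the helper from one place instead of each keeping its own copy.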

coroa · 2026-06-02T14:30:50Z

+def _hydro_from_discharge(
+    cutout,
+    plants,
+):


I'd prefer to move the _hydro_from_discharge and _hydro_from_inflow into the hydro.py module

StuberSimon and others added 3 commits April 8, 2026 15:46

add GLOFAS dataset module

61810fa

split hydro functionality into runoff and discharge based

5f3cfec

[pre-commit.ci] auto fixes from pre-commit.com hooks

33ef0bf

for more information, see https://pre-commit.ci

StuberSimon force-pushed the glofas-support branch from 79878e8 to 33ef0bf Compare April 8, 2026 13:52

ekatef reviewed Apr 8, 2026


ekatef mentioned this pull request Apr 14, 2026

[PARENT] Enable using custom data for hydro potential open-energy-transition/pypsa-zambia#162

Closed

coroa requested changes Jun 2, 2026

