Implement GLOFAS dataset#498
Conversation
|
Hey there, I'm still new to the github workflow, so please let me know how I can improve :) |
79878e8 to
33ef0bf
Compare
ekatef
left a comment
There was a problem hiding this comment.
Great @StuberSimon! Looks a very good progress, and happy to see GloFAS is being implemented. Have taken a liberty to do a preliminary code review and added a few comments hoping to assist you in getting used to github. My general impression that you are getting things perfectly right and the implementation looks quite neat in general.
| "dis24": "discharge", | ||
| } | ||
| ) | ||
| # round coords since cds coords are float32 which would lead to mismatches |
There was a problem hiding this comment.
Is it the case also for GloFAS data?
| coords_as_attributes=[ | ||
| "surface", | ||
| "depthBelowLandLayer", | ||
| "entireAtmosphere", | ||
| "heightAboveGround", | ||
| "meanSea", |
There was a problem hiding this comment.
It looks the names of the variables must be updated
| rename_vars = { | ||
| "time": "forecast_reference_time", | ||
| "step": "forecast_period", | ||
| "isobaricInhPa": "pressure_level", | ||
| "hybrid": "model_level", | ||
| } |
There was a problem hiding this comment.
A revision may be needed also here
| Download data like Glofas from the Climate Data Store (CDS). | ||
|
|
||
| If you want to track the state of your request go to | ||
| https://cds-beta.climate.copernicus.eu/requests?tab=all |
There was a problem hiding this comment.
I guess the link needs update to match EWDS server (btw, it looks the doc string may need in era5.py may require an update as well):
| https://cds-beta.climate.copernicus.eu/requests?tab=all | |
| https://ewds.climate.copernicus.eu/requests?tab=all |
| If static is False, this function creates a query for each month and year | ||
| in the time axis in coords. This ensures not running into size query limits | ||
| of the cdsapi even with very (spatially) large cutouts. | ||
| If static is True, the function return only one set of parameters | ||
| for the very first time point. |
There was a problem hiding this comment.
I have noticed that EWDS server doesn't favour download a dataset for periods more than one year even for really small spatial areas (like 2x2 degrees). So, it looks even more crucial to have temporal disaggregation for GloFAS retrieval as compared with ERA5
| monthly_requests : bool, optional | ||
| If True, the data is requested on a monthly basis. This is useful for | ||
| large cutouts, where the data is requested in smaller chunks. The | ||
| default is False |
There was a problem hiding this comment.
Given EWDS limitations, I'd expect monthly_requests=True could be a more reasonable default. Could be good to test if monthly_requests=False actually works for GloFAS retrieval
| Get inflow time-series for `plants` by extracting the discharge time series for | ||
| the nearest grid points. |
There was a problem hiding this comment.
Completely agree that it's worth to make sure the naming reflects different nature of variables in ERA5 and GloFAS datasets.
It could be a good idea also to mention both datasets in docstrings of the respective functions (that is mention here that _hydro_from_discharge is intended for usage on GloFAS data)
Thank you @StuberSimon, looks a great contribution! From my users' perspective, I strongly confirm that having GloFAS would be an amazing feature 🙂 Have added a few preliminary technical comments while leaving more in-depth analysis for @euronion who has much deeper understanding of For the next step, do you need any support? |
|
Hi @StuberSimon, @ekatef, and @euronion First off, great work on this PR. I completely agree with @ekatef that having GloFAS natively integrated is a fantastic and much-needed feature for the community. I'm jumping into this conversation since @Asdominet34 and I have worked on improving the hydromodeling in PyPSA-Eur, by firstly connecting it with GloFAS, "calibrating" it, and then validating it with the real hydro production in Europe. Before pushing the work, we would like to finalize the pumped-storage logic and the working paper we had in mind. I think that there is space for collaboration. What do you think about organising a call next week? |
|
Hi @ValeMTo that sounds great! |
Hi @simon, thanks for your message! Unfortunately next week is quite busy on our side as well, so we won’t be available then. We’ll get back to you shortly with our availability starting from the following week. Sorry again and looking forward to connecting! |
coroa
left a comment
There was a problem hiding this comment.
Sorry for dragging my feet on this PR. Actually, i retract my earlier statement that this has no space within atlite. I think you showed that this can be accommodated here in a way that is useful and backwards compatible.
A couple of changes would be good:
- abstract common code shared between era5.py and glofas.py in some cds_helper.py module
- clean up the coordinates and variable names in use (ie. comments by @ekatef )
- maybe an evaluation of potential heuristics for selecting the correct discharge grid cells
| for plant in plants.itertuples(): | ||
| # Extract the discharge time series for the nearest point | ||
| inflow.loc[dict(plant=plant.Index)] = discharge.sel( | ||
| x=plant.lon, y=plant.lat, method="nearest" | ||
| ) |
There was a problem hiding this comment.
My previous look into these datasets suggested that it is quite easy to miss the correct river cells due to small misalignments in the datasets, so i'd expect one would like to have some more sophisticated find closest river cells which are actually part of the river, rather than find closest cell heuristic. And that this would need testing.
| def retrieve_data( | ||
| product: str, | ||
| chunks: dict[str, int] | None = None, | ||
| tmpdir: str | Path | None = None, | ||
| lock: SerializableLock | None = None, | ||
| **updates, | ||
| ) -> xr.Dataset: |
There was a problem hiding this comment.
This looks like a copy from era5. Could we add a cds_helper.py module and move that there in a way that it can be used by both files. Together with methods like noisy_unlink
| def _hydro_from_discharge( | ||
| cutout, | ||
| plants, | ||
| ): |
There was a problem hiding this comment.
I'd prefer to move the _hydro_from_discharge and _hydro_from_inflow into the hydro.py module
Refs #450
Changes proposed in this Pull Request
Checklist
doc.environment.yaml,environment_docs.yamlandsetup.py(if applicable).doc/release_notes.rstof the upcoming release is included.