
Performance of extract_points vs rasterio sample #81

@martinfleis


One of the questions in the recent Earthmover's webinar on xvec was about the performance of extract_points compared to rasterio's sample method. I had never tested this before, so I wanted to give it a go, and for a large lazily loaded raster (a digital terrain model) our extract_points is far slower. See https://notebooksharing.space/view/4459f651d27b2f214f8590c30aba0782f7515d4c16b00dcd3283492f19f8e694#displayOptions=

The DTM is from https://geoportalpraha.cz/en/data-and-services/97d2c9c11aa9478cb21b469b8a4f820e in case you'd like to run the same test, but I assume any raster would do.

Under the hood, extract_points simply passes the coordinates to .sel with method='nearest', which should do the same thing as rasterio's sample.

xvec/xvec/accessor.py, lines 1261 to 1263 in 66b541b:

```python
subset = self._obj.sel(
    {x_coords: x_, y_coords: y_}, method="nearest", tolerance=tolerance
)
```

This is not optimal.
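For reference, the label lookup that method='nearest' performs on a sorted 1D coordinate can be sketched in numpy as a binary search per point plus a choice between the two neighbouring labels, with an optional tolerance. This is a simplified illustration, not xarray's code: for default indexes xarray hands the lookup to pandas, whose tie-breaking may differ, and nearest_indices is a hypothetical helper.

```python
import numpy as np


def nearest_indices(coords, values, tolerance=None):
    """Hypothetical helper: index of the nearest entry in ``coords`` per value.

    ``coords`` must be sorted (ascending or descending) and have at least two
    entries. Values farther than ``tolerance`` from every coordinate get -1.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    descending = coords[0] > coords[-1]
    # Work on an ascending view so np.searchsorted can be used.
    asc = coords[::-1] if descending else coords
    # Binary search: asc[right - 1] < value <= asc[right], clipped to valid pairs.
    right = np.clip(np.searchsorted(asc, values), 1, asc.size - 1)
    left = right - 1
    # Pick whichever neighbour is closer (ties go to the left neighbour here).
    choose_right = np.abs(asc[right] - values) < np.abs(values - asc[left])
    idx = np.where(choose_right, right, left)
    if tolerance is not None:
        too_far = np.abs(asc[idx] - values) > tolerance
        idx = np.where(too_far, -1, idx)
    # Map positions back to the original (possibly descending) order.
    if descending:
        idx = np.where(idx >= 0, asc.size - 1 - idx, -1)
    return idx
```

For example, nearest_indices([0.5, 1.5, 2.5, 3.5], [0.9, 2.1]) returns array([0, 2]).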

We could possibly use sample via rioxarray if the raster is loaded via xarray, as it is available through dtm_da.rio._manager.acquire().sample(list(zip(x, y))), but that relies on a private API of rioxarray. I'll open an issue there if there's an appetite to expose sample on the rio accessor.
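Conceptually, rasterio's sampling works in pixel space: coordinates are mapped to row and column through the inverse of the raster's affine transform (as DatasetReader.index does, rounding down by default), and the pixels are then read. A minimal numpy sketch of that mapping, assuming a north-up grid with no rotation and an array already in memory; sample_north_up and its parameters are hypothetical names, not rasterio API:

```python
import numpy as np


def sample_north_up(data, x0, y0, dx, dy, xs, ys, nodata=np.nan):
    """Hypothetical sketch: sample a 2D array at points (xs, ys).

    Assumes a north-up grid without rotation: x = x0 + col * dx and
    y = y0 + row * dy, where (x0, y0) is the upper-left corner and dy < 0.
    """
    data = np.asarray(data, dtype=float)
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    # Inverse affine transform, then floor to get the containing pixel.
    cols = np.floor((xs - x0) / dx).astype(int)
    rows = np.floor((ys - y0) / dy).astype(int)
    nrows, ncols = data.shape
    inside = (rows >= 0) & (rows < nrows) & (cols >= 0) & (cols < ncols)
    # Points outside the raster get the nodata value.
    out = np.full(xs.shape, nodata, dtype=float)
    out[inside] = data[rows[inside], cols[inside]]
    return out
```

Flooring on pixel edges picks the same pixel as a nearest lookup on pixel centres for any point strictly inside a pixel, which is why the two approaches should agree.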

Outside of relying on rasterio, is there a way of speeding it up using some xarray magic?
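One possible direction, assuming the x and y coordinates are evenly spaced cell centres (as they usually are when rioxarray builds them from the affine transform, but worth checking): skip the label search, compute integer positions arithmetically, and do a single positional, pointwise gather. In xarray that gather would be .isel with two xr.DataArray indexers sharing one dimension, which xarray treats as vectorized (pointwise) indexing rather than an outer product. A numpy sketch of the idea; regular_grid_indices and extract_pointwise are hypothetical helpers, not xvec API:

```python
import numpy as np


def regular_grid_indices(coord, values):
    """Hypothetical helper: nearest positions on an evenly spaced 1D coordinate."""
    coord = np.asarray(coord, dtype=float)
    values = np.asarray(values, dtype=float)
    step = coord[1] - coord[0]
    # Only valid if the coordinate really is evenly spaced.
    if not np.allclose(np.diff(coord), step):
        raise ValueError("coordinate is not evenly spaced")
    # Round to the nearest cell centre; works for ascending and descending coords.
    return np.rint((values - coord[0]) / step).astype(int)


def extract_pointwise(data, xcoord, ycoord, xs, ys):
    """Hypothetical helper: one value per (x, y) point from a 2D (y, x) array."""
    cols = regular_grid_indices(xcoord, xs)
    rows = regular_grid_indices(ycoord, ys)
    nrows, ncols = data.shape
    if ((rows < 0) | (rows >= nrows) | (cols < 0) | (cols >= ncols)).any():
        raise IndexError("some points fall outside the grid")
    # Pointwise gather: result has shape (n_points,).
    # data[np.ix_(rows, cols)] would instead return an (n, n) outer product.
    return data[rows, cols]
```

Rounding to the nearest cell centre matches flooring on cell edges except exactly on a cell boundary, where np.rint rounds half to even.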
