extract
- geowombat.extract(data, aoi, bands=None, time_names=None, band_names=None, frac=1.0, min_frac_area=None, all_touched=False, id_column='id', time_format='%Y%m%d', mask=None, n_jobs=8, verbose=0, n_workers=1, n_threads=-1, use_client=False, address=None, total_memory=24, processes=False, pool_kwargs=None, **kwargs)
Extracts data within an area or points of interest. Projections do not need to match, as they are handled ‘on-the-fly’.
- Parameters:
data (DataArray) – The xarray.DataArray to extract data from.
aoi (str or GeoDataFrame) – A file or geopandas.GeoDataFrame defining the areas or points to extract data for.
bands (Optional[int or 1d array-like]) – A band or list of bands to extract. If not given, all bands are used. Bands should be GDAL-indexed (i.e., the first band is 1, not 0).
band_names (Optional[list]) – A list of band names. The length should match that of bands.
time_names (Optional[list]) – A list of time names.
frac (Optional[float]) – The fraction of points to extract from each polygon feature.
min_frac_area (Optional[int | float]) – The minimum polygon area at which frac is applied. Otherwise, all samples within a polygon are used.
all_touched (Optional[bool]) – The all_touched argument passed to rasterio.features.rasterize().
id_column (Optional[str]) – The id column name.
time_format (Optional[str]) – The datetime conversion format if time_names are datetime objects.
mask (Optional[GeoDataFrame or Shapely Polygon]) – A shapely.geometry.Polygon mask to subset to.
n_jobs (Optional[int]) – The number of features to rasterize in parallel.
verbose (Optional[int]) – The verbosity level.
n_workers (Optional[int]) – The number of process workers. Only applies when use_client=True.
n_threads (Optional[int]) – The number of thread workers. Only applies when use_client=True.
use_client (Optional[bool]) – Whether to use a dask client.
address (Optional[str]) – A cluster address to pass to the client. Only used when use_client=True.
total_memory (Optional[int]) – The total memory (in GB) required when use_client=True.
processes (Optional[bool]) – Whether to use process workers with the dask.distributed client. Only applies when use_client=True.
pool_kwargs (Optional[dict]) – Keyword arguments passed to multiprocessing.Pool().imap().
kwargs (Optional[dict]) – Keyword arguments passed to dask.compute().
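Two of the conventions above can be illustrated without any raster data: GDAL-style band numbers start at 1, and time_format is a datetime format string. The following is a minimal sketch using only the standard library. The helper names gdal_to_zero_based and format_time_names are hypothetical and are not part of geowombat, and the use of strftime is an assumption about how such a format string is typically applied, not a description of the library's internals.

```python
from datetime import datetime


def gdal_to_zero_based(bands):
    """Convert GDAL-style band numbers (first band is 1) to 0-based positions.

    Hypothetical helper for illustration only; not a geowombat function.
    """
    if isinstance(bands, int):
        bands = [bands]
    if any(b < 1 for b in bands):
        raise ValueError("GDAL band indices start at 1, not 0")
    return [b - 1 for b in bands]


def format_time_names(time_names, time_format="%Y%m%d"):
    """Format datetime objects as strings and leave other values unchanged.

    Assumes the format string is applied with datetime.strftime.
    """
    return [
        t.strftime(time_format) if isinstance(t, datetime) else t
        for t in time_names
    ]


print(gdal_to_zero_based([1, 3]))  # [0, 2]
print(format_time_names([datetime(2020, 1, 15), datetime(2020, 7, 4)]))
# ['20200115', '20200704']
```

With the default time_format of '%Y%m%d', a date such as 15 January 2020 would be written as '20200115'.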
- Return type:
GeoDataFrame
- Returns:
geopandas.GeoDataFrame
Examples
>>> import geowombat as gw
>>> from dask.distributed import LocalCluster
>>>
>>> with gw.open('image.tif') as src:
...     df = gw.extract(src, 'poly.gpkg')
>>>
>>> # On a cluster
>>> # Use a local cluster
>>> with gw.open('image.tif') as src:
...     df = gw.extract(src, 'poly.gpkg', use_client=True, n_threads=16)
>>>
>>> # Specify the client address with a local cluster
>>> with LocalCluster(
...     n_workers=1,
...     threads_per_worker=8,
...     scheduler_port=0,
...     processes=False,
...     memory_limit='4GB'
... ) as cluster:
...     with gw.open('image.tif') as src:
...         df = gw.extract(
...             src,
...             'poly.gpkg',
...             use_client=True,
...             address=cluster
...         )
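The band options can be checked before calling extract. The sketch below assembles keyword arguments and enforces the documented rule that band_names should have the same length as bands. Both build_extract_kwargs and run_extract are hypothetical helpers, not part of geowombat, and the file names in the usage comment are placeholders. geowombat is imported inside run_extract so that the argument check can be exercised on its own.

```python
def build_extract_kwargs(bands=None, band_names=None, **extra):
    """Assemble keyword arguments for geowombat.extract.

    Hypothetical helper for illustration only. It checks that
    band_names, when given with bands, has a matching length.
    """
    kwargs = dict(extra)
    if bands is not None:
        kwargs["bands"] = bands
    if band_names is not None:
        if bands is not None:
            n_bands = 1 if isinstance(bands, int) else len(bands)
            if len(band_names) != n_bands:
                raise ValueError(
                    "band_names should have the same length as bands"
                )
        kwargs["band_names"] = list(band_names)
    return kwargs


def run_extract(image_path, aoi_path, **kwargs):
    """Open an image and extract values for the given area of interest."""
    import geowombat as gw  # imported lazily; requires geowombat

    with gw.open(image_path) as src:
        return gw.extract(src, aoi_path, **kwargs)


# Bands are GDAL-indexed, so 1 and 2 are the first two bands.
kwargs = build_extract_kwargs(
    bands=[1, 2], band_names=["blue", "green"], frac=0.5
)
print(kwargs)

# Placeholder paths; uncomment with real files:
# df = run_extract("image.tif", "poly.gpkg", **kwargs)
```

Keeping the check separate from the call means a mismatch between bands and band_names surfaces before any raster is opened.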