extract
- geowombat.extract(data, aoi, bands=None, time_names=None, band_names=None, frac=1.0, min_frac_area=None, all_touched=False, id_column='id', time_format='%Y%m%d', mask=None, n_jobs=8, verbose=0, n_workers=1, n_threads=-1, use_client=False, address=None, total_memory=24, processes=False, pool_kwargs=None, **kwargs)
Extracts data within areas or at points of interest. Projections do not need to match, as they are handled ‘on-the-fly’.
- Parameters:
data (DataArray) – The xarray.DataArray to extract data from.
aoi (str or GeoDataFrame) – A file or geopandas.GeoDataFrame containing the polygons or points at which to extract data.
bands (Optional[int or 1d array-like]) – A band or list of bands to extract. If not given, all bands are used. Bands should be GDAL-indexed (i.e., the first band is 1, not 0).
band_names (Optional[list]) – A list of band names. Length should be the same as bands.
time_names (Optional[list]) – A list of time names.
frac (Optional[float]) – A fractional subset of points to extract in each polygon feature.
min_frac_area (Optional[int | float]) – The minimum polygon area at which frac is applied. Polygons smaller than this use all samples within the polygon.
all_touched (Optional[bool]) – The all_touched argument is passed to rasterio.features.rasterize().
id_column (Optional[str]) – The id column name.
time_format (Optional[str]) – The datetime conversion format if time_names are datetime objects.
mask (Optional[GeoDataFrame or Shapely Polygon]) – A shapely.geometry.Polygon mask to subset to.
n_jobs (Optional[int]) – The number of features to rasterize in parallel.
verbose (Optional[int]) – The verbosity level.
n_workers (Optional[int]) – The number of process workers. Only applies when use_client=True.
n_threads (Optional[int]) – The number of thread workers. Only applies when use_client=True.
use_client (Optional[bool]) – Whether to use a dask client.
address (Optional[str]) – A cluster address to pass to client. Only used when use_client=True.
total_memory (Optional[int]) – The total memory (in GB) required when use_client=True.
processes (Optional[bool]) – Whether to use process workers with the dask.distributed client. Only applies when use_client=True.
pool_kwargs (Optional[dict]) – Keyword arguments passed to multiprocessing.Pool().imap().
kwargs (Optional[dict]) – Keyword arguments passed to dask.compute().
- Return type:
GeoDataFrame
- Returns:
geopandas.GeoDataFrame
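The interaction between frac and min_frac_area can be easier to follow in code. The sketch below is not geowombat's implementation. It is a standard-library illustration of the documented rule, and the helper name subsample_points, its polygon_area argument, and the seed argument are all hypothetical.

```python
import random


def subsample_points(points, polygon_area, frac=1.0, min_frac_area=None, seed=0):
    """Hypothetical helper illustrating the documented frac / min_frac_area rule.

    Assumption: polygons smaller than ``min_frac_area`` keep every sample,
    and all other polygons keep a random fraction ``frac`` of their samples.
    """
    points = list(points)

    # Polygons below the area threshold are not subsampled.
    if min_frac_area is not None and polygon_area < min_frac_area:
        return points

    # A fraction of 1.0 (the default) keeps everything.
    if frac >= 1.0:
        return points

    # Keep at least one sample so that no polygon is dropped entirely.
    n_keep = max(1, int(round(len(points) * frac)))
    rng = random.Random(seed)
    return rng.sample(points, n_keep)


# 100 candidate sample locations inside one polygon.
points = [(x, y) for x in range(10) for y in range(10)]

small = subsample_points(points, polygon_area=50.0, frac=0.1, min_frac_area=100.0)
large = subsample_points(points, polygon_area=500.0, frac=0.1, min_frac_area=100.0)
print(len(small), len(large))  # 100 10
```

The small polygon falls below the threshold and keeps all 100 samples, while the large polygon is reduced to 10% of its samples.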
Examples
>>> import geowombat as gw
>>> from dask.distributed import LocalCluster
>>>
>>> with gw.open('image.tif') as src:
>>>     df = gw.extract(src, 'poly.gpkg')
>>>
>>> # On a cluster
>>> # Use a local cluster
>>> with gw.open('image.tif') as src:
>>>     df = gw.extract(src, 'poly.gpkg', use_client=True, n_threads=16)
>>>
>>> # Specify the client address with a local cluster
>>> with LocalCluster(
>>>     n_workers=1,
>>>     threads_per_worker=8,
>>>     scheduler_port=0,
>>>     processes=False,
>>>     memory_limit='4GB'
>>> ) as cluster:
>>>
>>>     with gw.open('image.tif') as src:
>>>         df = gw.extract(
>>>             src,
>>>             'poly.gpkg',
>>>             use_client=True,
>>>             address=cluster
>>>         )
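The time_format parameter describes how datetime entries in time_names are turned into strings. The sketch below shows what the default '%Y%m%d' format produces, assuming the conversion is a plain strftime call (the internals are not shown in this reference). The helper name format_time_names is hypothetical and not part of geowombat.

```python
from datetime import datetime


def format_time_names(time_names, time_format='%Y%m%d'):
    """Hypothetical helper: format datetime entries, pass other values through as strings."""
    return [
        t.strftime(time_format) if isinstance(t, datetime) else str(t)
        for t in time_names
    ]


labels = format_time_names([datetime(2020, 1, 15), datetime(2020, 7, 4)])
print(labels)  # ['20200115', '20200704']
```

A different format string, such as '%Y-%m-%d', would give '2020-01-15' for the first entry.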