GeoWombat DataArray (DataArray.gw)
- class geowombat.core.geoxarray.GeoWombatAccessor(xarray_obj)
Bases: _UpdateConfig, DataProperties
A method to access an xarray.DataArray. This class is typically not accessed directly, but rather through a call to geowombat.open.
An xarray.DataArray object will have a gw method.
To access GeoWombat methods, use xarray.DataArray.gw.
- Attributes:
affine – Get the affine transform object.
altitude – Get satellite altitudes (in km)
array_is_dask – Get whether the array is a Dask array.
avail_sensors – Get supported sensors.
band_chunks – Get the band chunk size.
bottom – Get the array bounding box bottom coordinate.
bounds – Get the array bounding box (left, bottom, right, top)
bounds_as_namedtuple – Get the array bounding box as a rasterio.coords.BoundingBox
cellx – Get the cell size in the x direction.
cellxh – Get the half width of the cell size in the x direction.
celly – Get the cell size in the y direction.
cellyh – Get the half width of the cell size in the y direction.
central_um – Get a dictionary of central wavelengths (in micrometers)
chunk_grid – Get the image chunk grid.
col_chunks – Get the column chunk size.
crs_to_pyproj – Get the CRS as a pyproj.CRS object.
data_are_separate – Checks whether the data are loaded separately.
data_are_stacked – Checks whether the data are stacked.
dtype – Get the data type of the DataArray.
filenames – Gets the data filenames.
footprint_grid – Get the image footprint grid.
geodataframe – Get a geopandas.GeoDataFrame of the array bounds.
geometry – Get the polygon geometry of the array bounding box.
has_band – Check whether the DataArray has a band attribute.
has_band_coord – Check whether the DataArray has a band coordinate.
has_band_dim – Check whether the DataArray has a band dimension.
has_time – Check whether the DataArray has a time attribute.
has_time_coord – Check whether the DataArray has a time coordinate.
has_time_dim – Check whether the DataArray has a time dimension.
left – Get the array bounding box left coordinate.
meta – Get the array metadata.
nbands – Get the number of array bands.
ncols – Get the number of array columns.
ndims – Get the number of array dimensions.
nodataval – Get the ‘no data’ value from the attributes.
nrows – Get the number of array rows.
ntime – Get the number of time dimensions.
offsetval – Get the offset value.
pydatetime – Get Python datetime objects from the time dimension.
right – Get the array bounding box right coordinate.
row_chunks – Get the row chunk size.
scaleval – Get the scale factor value.
sensor_names – Get sensor full names.
time_chunks – Get the time chunk size.
top – Get the array bounding box top coordinate.
transform – Get the data transform (cell x, 0, left, 0, cell y, top)
unary_union – Get a representation of the union of the image bounds.
wavelengths – Get a dictionary of sensor wavelengths.
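The transform attribute is described as the tuple (cell x, 0, left, 0, cell y, top). As a minimal sketch of how such a tuple relates pixel indices to map coordinates, assuming a north-up image whose y cell size is negative; pixel_to_map and the grid values are hypothetical and not part of GeoWombat:

```python
def pixel_to_map(transform, row, col, center=True):
    """Map a (row, col) pixel index to (x, y) map coordinates.

    `transform` is (cell_x, 0, left, 0, cell_y, top). The y cell size is
    assumed to be negative, as is usual for a north-up image.
    """
    cell_x, _, left, _, cell_y, top = transform
    # Shift by half a cell to return the pixel center rather than its corner.
    shift = 0.5 if center else 0.0
    x = left + (col + shift) * cell_x
    y = top + (row + shift) * cell_y
    return x, y


# Hypothetical 30 m grid with its upper-left corner at (100000, 500000).
transform = (30.0, 0.0, 100000.0, 0.0, -30.0, 500000.0)
corner_xy = pixel_to_map(transform, 0, 0, center=False)
center_xy = pixel_to_map(transform, 0, 0)
```

The half-cell shift corresponds to the cellxh and cellyh attributes, which report half of the cell size in each direction.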
Methods
apply(filename, user_func[, n_jobs]) – Applies a user function to an Xarray Dataset or DataArray and writes to file.
assign_nodata_attrs(nodata) – Assigns 'no data' attributes.
avi([nodata, mask, sensor, scale_factor]) – Calculates the advanced vegetation index
band_mask(valid_bands[, src_nodata, ...]) – Creates a mask from band nonzeros.
bounds_overlay(bounds[, how]) – Checks whether the bounds overlay the image bounds.
calc_area(values[, op, units, row_chunks, ...]) – Calculates the area of data values.
check_chunksize(chunksize, array_size) – Ensures the chunk size is a multiple of 16 and fits within the array dimension.
clip(df[, query, mask_data, expand_by]) – Clips a DataArray by vector polygon geometry.
clip_by_polygon(df[, query, mask_data, ...]) – Clips a DataArray by vector polygon geometry.
compare(op, b[, return_binary]) – Comparison operation.
compute(**kwargs) – Computes data.
detect(detector, **kwargs) – Run tiled, georeferenced object detection over this raster.
evi([nodata, mask, sensor, scale_factor]) – Calculates the enhanced vegetation index
evi2([nodata, mask, sensor, scale_factor]) – Calculates the two-band modified enhanced vegetation index
extract(aoi[, bands, time_names, ...]) – Extracts data within an area or points of interest.
gcvi([nodata, mask, sensor, scale_factor]) – Calculates the green chlorophyll vegetation index
imshow([mask, nodata, flip, text_color, rot]) – Shows an image on a plot.
kndvi([nodata, mask, sensor, scale_factor]) – Calculates the kernel normalized difference vegetation index
mask(df[, query, keep]) – Masks a DataArray.
mask_nodata() – Masks 'no data' values with nans.
match_data(data, band_names) – Coerces the xarray.DataArray to match another xarray.DataArray.
moving([stat, perc, w, nodata, weights]) – Applies a moving window function to the DataArray.
n_windows([row_chunks, col_chunks]) – Calculates the number of windows in a row/column iteration.
nbr([nodata, mask, sensor, scale_factor]) – Calculates the normalized burn ratio
ndvi([nodata, mask, sensor, scale_factor]) – Calculates the normalized difference vegetation index
norm_brdf(solar_za, solar_az, sensor_za, ...) – Applies Bidirectional Reflectance Distribution Function (BRDF) normalization.
norm_diff(b1, b2[, nodata, mask, sensor, ...]) – Calculates the normalized difference band ratio.
read(band, **kwargs) – Reads data for a band or bands.
recode(polygon, to_replace[, num_workers]) – Recodes a DataArray with polygon mappings.
replace(to_replace) – Replace values given in to_replace with value.
sample([method, band, n, strata, spacing, ...]) – Generates samples from a raster.
save(filename[, overwrite, scatter, client, ...]) – Saves a DataArray to raster using rasterio/dask.
set_nodata([src_nodata, dst_nodata, ...]) – Sets 'no data' values and applies scaling to an xarray.DataArray.
subset([left, top, right, bottom, rows, ...]) – Subsets a DataArray.
tasseled_cap([nodata, sensor, scale_factor]) – Applies a tasseled cap transformation
to_netcdf(filename, *args, **kwargs) – Writes an Xarray DataArray to a NetCDF file.
to_polygon([mask, connectivity]) – Converts a dask array to a GeoDataFrame
to_raster(filename[, readxsize, readysize, ...]) – Writes an Xarray DataArray to a raster file.
to_vector(filename[, mask, connectivity]) – Writes an Xarray DataArray to a vector file.
to_vrt(filename[, overwrite, resampling, ...]) – Writes a file to a VRT file.
to_yolo_dataset(labels, class_col, out_dir) – Write a YOLO-format training dataset from this raster + labels.
transform_crs([dst_crs, dst_res, dst_width, ...]) – Transforms an xarray.DataArray to a new coordinate reference system.
wi([nodata, mask, sensor, scale_factor]) – Calculates the woody vegetation index
windows([row_chunks, col_chunks, ...]) – Generates windows for a row/column iteration.
Methods Documentation
- avi(nodata=None, mask=False, sensor=None, scale_factor=1.0)
Calculates the advanced vegetation index
- Parameters:
data (DataArray) – The xarray.DataArray to process.
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
Equation:
\[AVI = {(NIR \times (1.0 - red) \times (NIR - red))}^{0.3334}\]
- Returns:
Data range: 0 to 1
- Return type:
xarray.DataArray
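As a worked instance of the AVI equation, the sketch below evaluates it for a single pixel. It assumes reflectance values already scaled to the 0 to 1 range and NIR >= red; the avi_value helper and the sample reflectances are hypothetical, and this is not GeoWombat's implementation:

```python
def avi_value(nir, red):
    """Advanced vegetation index for one pixel of 0-1 scaled reflectance.

    Assumes nir >= red; a negative base raised to a fractional power is
    not a real number, so such pixels need separate handling.
    """
    return (nir * (1.0 - red) * (nir - red)) ** 0.3334


# Hypothetical vegetated pixel: high NIR, low red reflectance.
value = avi_value(nir=0.5, red=0.1)
```

Here the product inside the parentheses is 0.5 × 0.9 × 0.4 = 0.18, so the result is roughly the cube root of 0.18.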
- apply(filename, user_func, n_jobs=1, **kwargs)
Applies a user function to an Xarray Dataset or DataArray and writes to file.
- Parameters:
filename (str | Path) – The output file name to write to.
user_func (func) – The user function to apply.
n_jobs (Optional[int]) – The number of parallel jobs for the cluster.
kwargs (Optional[dict]) – Keyword arguments passed to to_raster().
Example
>>> import geowombat as gw
>>>
>>> def user_func(ds_):
>>>     return ds_.max(axis=0)
>>>
>>> with gw.open('image.tif', chunks=512) as ds:
>>>     ds.gw.apply(
>>>         'output.tif',
>>>         user_func,
>>>         n_jobs=8,
>>>         overwrite=True,
>>>         blockxsize=512,
>>>         blockysize=512
>>>     )
- assign_nodata_attrs(nodata)
Assigns ‘no data’ attributes.
- Parameters:
nodata (float | int) – The ‘no data’ value to assign.
- Return type:
DataArray
- Returns:
xarray.DataArray
- bounds_overlay(bounds, how='intersects')
Checks whether the bounds overlay the image bounds.
- Parameters:
bounds (tuple | rasterio.coords.BoundingBox | shapely.geometry) – The bounds to check. If given as a tuple, the order should be (left, bottom, right, top).
how (Optional[str]) – Choices are any shapely.geometry binary predicates.
- Return type:
bool
- Returns:
bool
Example
>>> import geowombat as gw
>>>
>>> bounds = (left, bottom, right, top)
>>>
>>> with gw.open('image.tif') as src:
>>>     intersects = src.gw.bounds_overlay(bounds)
>>>
>>> from rasterio.coords import BoundingBox
>>>
>>> bounds = BoundingBox(left, bottom, right, top)
>>>
>>> with gw.open('image.tif') as src:
>>>     contains = src.gw.bounds_overlay(bounds, how='contains')
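To illustrate what the 'intersects' and 'contains' predicates mean for axis-aligned bounding boxes, here is a minimal sketch in plain Python. The Box type and both helper functions are hypothetical stand-ins for the shapely predicates, and which geometry GeoWombat treats as the container is not specified in this section:

```python
from collections import namedtuple

# Same field order as the (left, bottom, right, top) tuple described above.
Box = namedtuple("Box", ["left", "bottom", "right", "top"])


def intersects(a, b):
    """True when the two boxes share at least one point."""
    return (
        a.left <= b.right
        and b.left <= a.right
        and a.bottom <= b.top
        and b.bottom <= a.top
    )


def contains(a, b):
    """True when box b lies entirely inside box a."""
    return (
        a.left <= b.left
        and a.bottom <= b.bottom
        and a.right >= b.right
        and a.top >= b.top
    )


image = Box(0.0, 0.0, 10.0, 10.0)
overlapping = Box(5.0, 5.0, 15.0, 15.0)
inner = Box(2.0, 2.0, 8.0, 8.0)
```

A box that only partly overlaps the image satisfies intersects but not contains, which is why the choice of how matters when filtering tiles.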
- calc_area(values, op='eq', units='km2', row_chunks=None, col_chunks=None, n_workers=1, n_threads=1, scheduler='threads', n_chunks=100)
Calculates the area of data values.
- Parameters:
values (list) – A list of values.
op (Optional[str]) – The value sign. Choices are [‘gt’, ‘ge’, ‘lt’, ‘le’, ‘eq’].
units (Optional[str]) – The units to return. Choices are [‘km2’, ‘ha’].
row_chunks (Optional[int]) – The row chunk size to process in parallel.
col_chunks (Optional[int]) – The column chunk size to process in parallel.
n_workers (Optional[int]) – The number of parallel workers for scheduler.
n_threads (Optional[int]) – The number of parallel threads for dask.compute().
scheduler (Optional[str]) – The parallel task scheduler to use. Choices are [‘processes’, ‘threads’, ‘mpool’].
mpool: process pool of workers using multiprocessing.Pool
processes: process pool of workers using concurrent.futures
threads: thread pool of workers using concurrent.futures
n_chunks (Optional[int]) – The chunk size of windows. If not given, equal to n_workers x 50.
- Return type:
DataFrame
- Returns:
pandas.DataFrame
Example
>>> import geowombat as gw
>>>
>>> # Read a land cover image with 512x512 chunks
>>> with gw.open('land_cover.tif', chunks=512) as src:
>>>
>>>     df = src.gw.calc_area(
>>>         [1, 2, 5],         # calculate the area of classes 1, 2, and 5
>>>         units='km2',       # return area in kilometers squared
>>>         n_workers=4,
>>>         row_chunks=1024,   # iterate over larger chunks to use 512 chunks in parallel
>>>         col_chunks=1024
>>>     )
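The arithmetic behind an area calculation of this kind can be sketched as a pixel count multiplied by the cell footprint, then converted to the requested units. The sketch assumes a projected CRS with cell sizes in meters; pixel_area and its inputs are hypothetical and do not reproduce GeoWombat's internals:

```python
def pixel_area(n_pixels, cellx, celly, units="km2"):
    """Convert a pixel count to an area, assuming cell sizes in meters."""
    square_meters = n_pixels * abs(cellx) * abs(celly)
    # 1 km2 = 1,000,000 m2 and 1 ha = 10,000 m2.
    factors = {"km2": 1e-6, "ha": 1e-4}
    return square_meters * factors[units]


# 1,000 pixels of a hypothetical 30 m x 30 m grid.
area_km2 = pixel_area(1000, 30.0, -30.0, units="km2")
area_ha = pixel_area(1000, 30.0, -30.0, units="ha")
```

With these inputs the footprint is 900,000 square meters, or about 0.9 km2 and 90 ha.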
- clip(df, query=None, mask_data=False, expand_by=0)
Clips a DataArray by vector polygon geometry.
Deprecated since version 2.1.7: Use xarray.DataArray.gw.clip_by_polygon().
- Parameters:
df (GeoDataFrame) – The geopandas.GeoDataFrame to clip to.
query (Optional[str]) – A query to apply to df.
mask_data (Optional[bool]) – Whether to mask values outside of the df geometry envelope.
expand_by (Optional[int]) – Expand the clip array bounds by expand_by pixels on each side.
- Returns:
xarray.DataArray
- compare(op, b, return_binary=False)
Comparison operation.
- Parameters:
op (str) – The comparison operation.
b (float | int) – The value to compare to.
return_binary (Optional[bool]) – Whether to return a binary (1 or 0) array.
- Returns:
Valid data where op meets criteria b, otherwise nans.
- Return type:
xarray.DataArray
Example
>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as src:
>>>     # Keep values less than 10 and mask the rest
>>>     thresh = src.gw.compare(op='lt', b=10)
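The behaviour described above, valid data where the comparison holds and NaN elsewhere, can be sketched on a plain list of values. This is an illustration only: the compare_values helper is hypothetical, and the operator codes are assumed to be the same ones listed for calc_area ('gt', 'ge', 'lt', 'le', 'eq'):

```python
import math
import operator

# Assumed operator codes, borrowed from the calc_area `op` choices.
OPS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
}


def compare_values(values, op, b, return_binary=False):
    """Keep values that satisfy `op` against `b`; others become NaN (or 0)."""
    func = OPS[op]
    if return_binary:
        return [1 if func(v, b) else 0 for v in values]
    return [v if func(v, b) else math.nan for v in values]


kept = compare_values([5, 10, 15], op="lt", b=10)
binary = compare_values([5, 10, 15], op="lt", b=10, return_binary=True)
```

Note that with op='lt' a value equal to the threshold is masked as well, since 10 is not less than 10.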
- compute(**kwargs)
Computes data.
- Parameters:
kwargs (Optional[dict]) – Keyword arguments to pass to dask.compute().
- Return type:
ndarray
- Returns:
numpy.ndarray
- evi(nodata=None, mask=False, sensor=None, scale_factor=1.0)
Calculates the enhanced vegetation index
- Parameters:
data (DataArray) – The xarray.DataArray to process.
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
Equation:
\[EVI = 2.5 \times \frac{NIR - red}{NIR + 6 \times red - 7.5 \times blue + 1}\]
- Returns:
Data range: 0 to 1
- Return type:
xarray.DataArray
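A single-pixel evaluation of EVI, as a sketch. It assumes the standard form of the index, whose denominator is NIR + 6 × red − 7.5 × blue + 1, and reflectance values scaled to 0 to 1; evi_value and the sample reflectances are hypothetical:

```python
def evi_value(nir, red, blue):
    """Enhanced vegetation index for one pixel of 0-1 scaled reflectance."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


# Hypothetical vegetated pixel.
value = evi_value(nir=0.5, red=0.1, blue=0.05)
```

For these inputs the denominator is 0.5 + 0.6 − 0.375 + 1 = 1.725, giving an EVI of about 0.58.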
- evi2(nodata=None, mask=False, sensor=None, scale_factor=1.0)
Calculates the two-band modified enhanced vegetation index
- Parameters:
data (DataArray) – The xarray.DataArray to process.
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
Equation:
\[EVI2 = 2.5 \times \frac{NIR - red}{NIR + 1 + 2.4 \times red}\]
- Returns:
Data range: 0 to 1
- Return type:
xarray.DataArray
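The two-band form drops the blue band, which the following single-pixel sketch makes explicit. It assumes reflectance values scaled to 0 to 1; evi2_value and the sample reflectances are hypothetical:

```python
def evi2_value(nir, red):
    """Two-band enhanced vegetation index for one pixel (0-1 reflectance)."""
    return 2.5 * (nir - red) / (nir + 1.0 + 2.4 * red)


# Hypothetical vegetated pixel.
value = evi2_value(nir=0.5, red=0.1)
```

The denominator here is 0.5 + 1 + 0.24 = 1.74, so the index is about 0.57.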
- extract(aoi, bands=None, time_names=None, band_names=None, frac=1.0, min_frac_area=None, all_touched=False, id_column='id', time_format='%Y%m%d', mask=None, n_jobs=8, verbose=0, n_workers=1, n_threads=-1, use_client=False, address=None, total_memory=24, processes=False, pool_kwargs=None, **kwargs)
Extracts data within an area or points of interest. Projections do not need to match, as they are handled ‘on-the-fly’.
- Parameters:
aoi (str or GeoDataFrame) – A file or geopandas.GeoDataFrame to extract data from.
bands (Optional[int or 1d array-like]) – A band or list of bands to extract. If not given, all bands are used. Bands should be GDAL-indexed (i.e., the first band is 1, not 0).
band_names (Optional[list]) – A list of band names. Length should be the same as bands.
time_names (Optional[list]) – A list of time names.
frac (Optional[float]) – A fractional subset of points to extract in each polygon feature.
min_frac_area (Optional[int | float]) – A minimum polygon area to use frac. Otherwise, use all samples within a polygon.
all_touched (Optional[bool]) – The all_touched argument is passed to rasterio.features.rasterize().
id_column (Optional[str]) – The id column name.
time_format (Optional[str]) – The datetime conversion format if time_names are datetime objects.
mask (Optional[GeoDataFrame or Shapely Polygon]) – A shapely.geometry.Polygon mask to subset to.
n_jobs (Optional[int]) – The number of features to rasterize in parallel.
verbose (Optional[int]) – The verbosity level.
n_workers (Optional[int]) – The number of process workers. Only applies when use_client=True.
n_threads (Optional[int]) – The number of thread workers. Only applies when use_client=True.
use_client (Optional[bool]) – Whether to use a dask client.
address (Optional[str]) – A cluster address to pass to client. Only used when use_client=True.
total_memory (Optional[int]) – The total memory (in GB) required when use_client=True.
processes (Optional[bool]) – Whether to use process workers with the dask.distributed client. Only applies when use_client=True.
pool_kwargs (Optional[dict]) – Keyword arguments passed to multiprocessing.Pool().imap().
kwargs (Optional[dict]) – Keyword arguments passed to dask.compute().
- Return type:
GeoDataFrame
- Returns:
geopandas.GeoDataFrame
Examples
>>> import geowombat as gw
>>> from dask.distributed import LocalCluster
>>>
>>> with gw.open('image.tif') as src:
>>>     df = src.gw.extract('poly.gpkg')
>>>
>>> # On a cluster
>>> # Use a local cluster
>>> with gw.open('image.tif') as src:
>>>     df = src.gw.extract('poly.gpkg', use_client=True, n_threads=16)
>>>
>>> # Specify the client address with a local cluster
>>> with LocalCluster(
>>>     n_workers=1,
>>>     threads_per_worker=8,
>>>     scheduler_port=0,
>>>     processes=False,
>>>     memory_limit='4GB'
>>> ) as cluster:
>>>
>>>     with gw.open('image.tif') as src:
>>>         df = src.gw.extract(
>>>             'poly.gpkg',
>>>             use_client=True,
>>>             address=cluster
>>>         )
- imshow(mask=False, nodata=0, flip=False, text_color='black', rot=30, **kwargs)
Shows an image on a plot.
- Parameters:
mask (Optional[bool]) – Whether to mask ‘no data’ values (given by nodata).
nodata (Optional[int or float]) – The ‘no data’ value.
flip (Optional[bool]) – Whether to flip an RGB array’s band order.
text_color (Optional[str]) – The text color.
rot (Optional[int]) – The degree rotation for the x-axis tick labels.
kwargs (Optional[dict]) – Keyword arguments passed to xarray.plot.imshow.
- Return type:
None
- Returns:
None
Examples
>>> with gw.open('image.tif') as ds:
>>>     ds.gw.imshow(band_names=['red', 'green', 'red'], mask=True, vmin=0.1, vmax=0.9, robust=True)
- kndvi(nodata=None, mask=False, sensor=None, scale_factor=1.0)
Calculates the kernel normalized difference vegetation index
- Parameters:
data (DataArray) – The xarray.DataArray to process.
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
Equation:
\[kNDVI = tanh({NDVI}^2)\]
- Returns:
Data range: -1 to 1
- Return type:
xarray.DataArray
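The kNDVI equation composes two steps: compute NDVI, then pass its square through tanh. A minimal single-pixel sketch, assuming 0 to 1 scaled reflectance; kndvi_value and the sample values are hypothetical:

```python
import math


def kndvi_value(nir, red):
    """Kernel NDVI for one pixel: tanh of the squared NDVI."""
    ndvi = (nir - red) / (nir + red)
    return math.tanh(ndvi ** 2)


# Hypothetical vegetated pixel, then the same bands swapped.
value = kndvi_value(nir=0.5, red=0.1)
swapped = kndvi_value(nir=0.1, red=0.5)
```

Because NDVI is squared before tanh is applied, this form of the expression returns the same non-negative value when the two bands are swapped.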
- mask(df, query=None, keep='in')
Masks a DataArray.
- Parameters:
df (GeoDataFrame or str) – The geopandas.GeoDataFrame or filename to use for masking.
query (Optional[str]) – A query to apply to df.
keep (Optional[str]) – If keep = ‘in’, mask values outside of the geometry (keep inside). Otherwise, if keep = ‘out’, mask values inside (keep outside).
- Return type:
DataArray
- Returns:
xarray.DataArray
- mask_nodata()
Masks ‘no data’ values with nans.
- Return type:
DataArray
- Returns:
xarray.DataArray
- match_data(data, band_names)
Coerces the xarray.DataArray to match another xarray.DataArray.
- Parameters:
data (DataArray) – The xarray.DataArray to match to.
band_names (1d array-like) – The output band names.
- Return type:
DataArray
- Returns:
xarray.DataArray
Example
>>> import geowombat as gw
>>> import xarray as xr
>>>
>>> other_array = xr.DataArray()
>>>
>>> with gw.open('image.tif') as src:
>>>     new_array = other_array.gw.match_data(src, ['bd1'])
- moving(stat='mean', perc=50, w=3, nodata=None, weights=False)
Applies a moving window function to the DataArray.
- Parameters:
stat (Optional[str]) – The statistic to compute. Choices are [‘mean’, ‘std’, ‘var’, ‘min’, ‘max’, ‘perc’].
perc (Optional[int]) – The percentile to return if stat = ‘perc’.
w (Optional[int]) – The moving window size (in pixels).
nodata (Optional[int or float]) – A ‘no data’ value to ignore.
weights (Optional[bool]) – Whether to weight values by distance from window center.
- Return type:
DataArray
- Returns:
xarray.DataArray
Examples
>>> import geowombat as gw
>>>
>>> # Calculate the mean within a 5x5 window
>>> with gw.open('image.tif') as src:
>>>     res = src.gw.moving(stat='mean', w=5, nodata=32767.0)
>>>
>>> # Calculate the 90th percentile within a 15x15 window
>>> with gw.open('image.tif') as src:
>>>     res = src.gw.moving(stat='perc', w=15, perc=90, nodata=32767.0)
>>>     res.data.compute(num_workers=4)
- nbr(nodata=None, mask=False, sensor=None, scale_factor=1.0)
Calculates the normalized burn ratio
- Parameters:
data (DataArray) – The xarray.DataArray to process.
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
Equation:
\[NBR = \frac{NIR - SWIR1}{NIR + SWIR1}\]
- Returns:
Data range: -1 to 1
- Return type:
xarray.DataArray
- ndvi(nodata=None, mask=False, sensor=None, scale_factor=1.0)
Calculates the normalized difference vegetation index
- Parameters:
data (DataArray) – The xarray.DataArray to process.
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
Equation:
\[NDVI = \frac{NIR - red}{NIR + red}\]
- Returns:
Data range: -1 to 1
- Return type:
xarray.DataArray
- norm_brdf(solar_za, solar_az, sensor_za, sensor_az, sensor=None, wavelengths=None, nodata=None, mask=None, scale_factor=1.0, scale_angles=True)
Applies Bidirectional Reflectance Distribution Function (BRDF) normalization.
- Parameters:
solar_za (2d DataArray) – The solar zenith angles (degrees).
solar_az (2d DataArray) – The solar azimuth angles (degrees).
sensor_za (2d DataArray) – The sensor zenith angles (degrees).
sensor_az (2d DataArray) – The sensor azimuth angles (degrees).
sensor (Optional[str]) – The satellite sensor.
wavelengths (str list) – The wavelength(s) to normalize.
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[DataArray]) – A data mask, where clear values are 0.
scale_factor (Optional[float]) – A scale factor to apply to the input data.
scale_angles (Optional[bool]) – Whether to scale the pixel angle arrays.
- Returns:
xarray.DataArray
Examples
>>> import geowombat as gw
>>>
>>> # Example where pixel angles are stored in separate GeoTiff files
>>> with gw.config.update(sensor='l7', scale_factor=0.0001, nodata=0):
>>>
>>>     with gw.open('solarz.tif') as solarz, \
>>>             gw.open('solara.tif') as solara, \
>>>             gw.open('sensorz.tif') as sensorz, \
>>>             gw.open('sensora.tif') as sensora:
>>>
>>>         with gw.open('landsat.tif') as ds:
>>>             ds_brdf = ds.gw.norm_brdf(solarz, solara, sensorz, sensora)
- norm_diff(b1, b2, nodata=None, mask=False, sensor=None, scale_factor=1.0)
Calculates the normalized difference band ratio.
- Parameters:
b1 (str) – The band name of the first band.
b2 (str) – The band name of the second band.
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
Equation:
\[{norm}_{diff} = \frac{b2 - b1}{b2 + b1}\]
- Returns:
Data range: -1 to 1
- Return type:
xarray.DataArray
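The generic normalized difference subsumes the NDVI and NBR equations: with b1 = red and b2 = NIR it gives NDVI, and with b1 = SWIR1 and b2 = NIR it gives NBR. The sketch below shows this on single reflectance values rather than band names; norm_diff_value and the sample reflectances are hypothetical:

```python
def norm_diff_value(b1, b2):
    """Normalized difference (b2 - b1) / (b2 + b1) for one pixel."""
    return (b2 - b1) / (b2 + b1)


# Hypothetical reflectances for one pixel.
red, nir, swir1 = 0.1, 0.5, 0.2

ndvi = norm_diff_value(b1=red, b2=nir)    # (NIR - red) / (NIR + red)
nbr = norm_diff_value(b1=swir1, b2=nir)   # (NIR - SWIR1) / (NIR + SWIR1)
```

Note the argument order: the second band is the one with the positive sign in the numerator.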
- read(band, **kwargs)
Reads data for a band or bands.
- Parameters:
band (int | list) – A band or list of bands to read.
- Return type:
ndarray
- Returns:
xarray.DataArray
- recode(polygon, to_replace, num_workers=1)
Recodes a DataArray with polygon mappings.
- Parameters:
polygon (GeoDataFrame | str) – The geopandas.GeoDataFrame or file with polygon geometry.
to_replace (dict) – How to find the values to replace. Dictionary mappings should be given as {from: to} pairs. If to_replace is an integer/string mapping, the to string should be ‘mode’.
- {1: 5}: recode values of 1 to 5
- {1: ‘mode’}: recode values of 1 to the polygon mode
num_workers (Optional[int]) – The number of parallel Dask workers (only used if to_replace has a ‘mode’ mapping).
- Return type:
DataArray
- Returns:
xarray.DataArray
Example
>>> import geowombat as gw
>>>
>>> with gw.open('image.tif', chunks=512) as ds:
>>>     # Recode 1 with 5 within a polygon
>>>     res = ds.gw.recode('poly.gpkg', {1: 5})
- replace(to_replace)
Replace values given in to_replace with value.
- Parameters:
to_replace (dict) – How to find the values to replace. Dictionary mappings should be given as {from: to} pairs. If to_replace is an integer/string mapping, the to string should be ‘mode’.
- {1: 5}: recode values of 1 to 5
- {1: ‘mode’}: recode values of 1 to the polygon mode
- Return type:
DataArray
- Returns:
xarray.DataArray
Example
>>> import geowombat as gw
>>>
>>> with gw.open('image.tif', chunks=512) as ds:
>>>     # Replace 1 with 5
>>>     res = ds.gw.replace({1: 5})
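The {from: to} mapping can be pictured as a per-value lookup in which unmapped values pass through unchanged. A minimal sketch on a plain list, covering only integer-to-integer mappings and not the ‘mode’ case; replace_values is a hypothetical helper, not GeoWombat's implementation:

```python
def replace_values(values, to_replace):
    """Apply a {from: to} mapping; values without a mapping are kept."""
    return [to_replace.get(v, v) for v in values]


recoded = replace_values([1, 2, 1, 3], {1: 5})
```

Every 1 becomes 5 here, while 2 and 3 are left as they were.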
- sample(method='random', band=None, n=None, strata=None, spacing=None, min_dist=None, max_attempts=10, num_workers=1, verbose=1, **kwargs)
Generates samples from a raster.
- Parameters:
data (DataArray) – The xarray.DataArray to extract data from.
method (Optional[str]) – The sampling method. Choices are [‘random’, ‘systematic’].
band (Optional[int or str]) – The band name to extract from. Only required if method = ‘random’ and strata is given.
n (Optional[int]) – The total number of samples. Only required if method = ‘random’.
strata (Optional[dict]) – The strata to sample within. The dictionary key -> value pairs should be {‘conditional,value’: proportion}.
E.g.,
strata = {‘==,1’: 0.5, ‘>=,2’: 0.5} … would sample 50% of total samples within class 1 and 50% of total samples in class >= 2.
strata = {‘==,1’: 10, ‘>=,2’: 20} … would sample 10 samples within class 1 and 20 samples in class >= 2.
spacing (Optional[float]) – The spacing (in map projection units) when method = ‘systematic’.
min_dist (Optional[float or int]) – A minimum distance allowed between samples. Only applies when method = ‘random’.
max_attempts (Optional[int]) – The maximum number of attempts to sample points > min_dist from each other.
num_workers (Optional[int]) – The number of parallel workers for dask.compute().
verbose (Optional[int]) – The verbosity level.
kwargs (Optional[dict]) – Keyword arguments passed to geowombat.extract.
- Return type:
GeoDataFrame
- Returns:
geopandas.GeoDataFrame
Examples
>>> import geowombat as gw
>>>
>>> # Sample 100 points randomly across the image
>>> with gw.open('image.tif') as ds:
>>>     df = ds.gw.sample(n=100)
>>>
>>> # Sample points systematically (with 10km spacing) across the image
>>> with gw.open('image.tif') as ds:
>>>     df = ds.gw.sample(method='systematic', spacing=10000.0)
>>>
>>> # Sample 50% of 100 in class 1 and 50% in classes >= 2
>>> strata = {'==,1': 0.5, '>=,2': 0.5}
>>> with gw.open('image.tif') as ds:
>>>     df = ds.gw.sample(band=1, n=100, strata=strata)
>>>
>>> # Specify a per-stratum minimum allowed point distance of 1,000 meters
>>> with gw.open('image.tif') as ds:
>>>     df = ds.gw.sample(band=1, n=100, min_dist=1000, strata=strata)
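To make the strata format concrete, the sketch below splits each ‘conditional,value’ key and turns the associated number into a per-stratum sample count. It rests on two assumptions that are not confirmed by this section: float values are treated as proportions of n and integer values as absolute counts, and only the listed comparison operators are supported. parse_strata is a hypothetical helper:

```python
import operator

# Assumed set of conditionals; the section only shows '==' and '>='.
OPS = {
    "==": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_strata(strata, n):
    """Return (comparison, class value, sample count) for each stratum."""
    parsed = []
    for key, amount in strata.items():
        conditional, value = key.split(",")
        # Assumption: floats are proportions of n, integers are counts.
        if isinstance(amount, float):
            count = round(amount * n)
        else:
            count = int(amount)
        parsed.append((OPS[conditional], float(value), count))
    return parsed


by_proportion = parse_strata({"==,1": 0.5, ">=,2": 0.5}, n=100)
by_count = parse_strata({"==,1": 10, ">=,2": 20}, n=100)
```

Each parsed comparison can then be applied to a pixel value, for example by_proportion[1][0](3, by_proportion[1][1]) checks whether class 3 falls in the ‘>=,2’ stratum.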
- save(filename, overwrite=False, scatter=None, client=None, compute=True, tags=None, compress='none', compression=None, num_workers=1, log_progress=True, tqdm_kwargs=None, bigtiff=None)
Saves a DataArray to raster using rasterio/dask.
- Parameters:
filename (str | Path) – The output file name to write to.
nodata (Optional[float | int]) – The ‘no data’ value. If None (default), the ‘no data’ value is taken from the DataArray metadata.
overwrite (Optional[bool]) – Whether to overwrite an existing file. Default is False.
scatter (Optional[str]) – Scatter ‘band’ or ‘time’ to separate files. Default is None.
client (Optional[Client object]) – A dask.distributed.Client client object to persist data. Default is None.
compute (Optional[bool]) – If True, compute and write to filename. If False, return the dask task graph. Default is True.
tags (Optional[dict]) – Metadata tags to write to file. Default is None.
compress (Optional[str]) – The file compression type. Default is ‘none’, or no compression.
Note
When using a client, it is advised to use threading. E.g., dask.distributed.LocalCluster(processes=False). Process-based concurrency could result in corrupted file blocks.
compression (Optional[str]) – The file compression type. Default is ‘none’, or no compression.
Deprecated since version 2.1.4: Use ‘compress’ – ‘compression’ will be removed in >=2.2.0.
num_workers (Optional[int]) – The number of dask workers (i.e., chunks) to write concurrently. Default is 1.
log_progress (Optional[bool]) – Whether to log the progress bar during writing. Default is True.
tqdm_kwargs (Optional[dict]) – Keyword arguments to pass to tqdm.
bigtiff (Optional[str]) – A GDAL BIGTIFF flag. Choices are [“YES”, “NO”, “IF_NEEDED”, “IF_SAFER”].
- Return type:
None
- Returns:
None, writes to filename
Example
>>> import geowombat as gw
>>>
>>> with gw.open('file.tif') as src:
>>>     result = ...
>>>     result.gw.save('output.tif', compress='lzw', num_workers=8)
- set_nodata(src_nodata=None, dst_nodata=None, out_range=None, dtype=None, scale_factor=None, offset=None)
Sets ‘no data’ values and applies scaling to an xarray.DataArray.
- Parameters:
src_nodata (int | float) – The ‘no data’ values to replace. Default is None.
dst_nodata (int | float) – The ‘no data’ value to set. Default is nan.
out_range (Optional[tuple]) – The output clip range. Default is None.
dtype (Optional[str]) – The output data type. Default is None.
scale_factor (Optional[float | int]) – A scale factor to apply. Default is None.
offset (Optional[float | int]) – An offset to apply. Default is None.
- Return type:
DataArray
- Returns:
xarray.DataArray
Example
>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as src:
>>>     src = src.gw.set_nodata(0, 65535, out_range=(0, 10000), dtype='uint16')
- subset(left=None, top=None, right=None, bottom=None, rows=None, cols=None, center=False, mask_corners=False)
Subsets a DataArray.
- Parameters:
left (Optional[float]) – The left coordinate.
top (Optional[float]) – The top coordinate.
right (Optional[float]) – The right coordinate.
bottom (Optional[float]) – The bottom coordinate.
rows (Optional[int]) – The number of output rows.
cols (Optional[int]) – The number of output columns.
center (Optional[bool]) – Whether to center the subset on left and top.
mask_corners (Optional[bool]) – Whether to mask corners (requires pymorph).
chunksize (Optional[tuple]) – A new chunk size for the output.
- Return type:
DataArray
- Returns:
xarray.DataArray
Example
>>> import geowombat as gw
>>>
>>> with gw.open('image.tif', chunks=512) as ds:
>>>     ds_sub = ds.gw.subset(
>>>         left=-263529.884,
>>>         top=953985.314,
>>>         rows=2048,
>>>         cols=2048
>>>     )
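When a subset is given by its upper-left corner plus a row and column count, the remaining two bounds follow from the cell size. The sketch below shows that arithmetic for the non-centered case; the 30 m cell size is an assumed value and subset_bounds is a hypothetical helper, not the library's code:

```python
def subset_bounds(left, top, rows, cols, cellx, celly):
    """Return (left, bottom, right, top) for a rows x cols window."""
    right = left + cols * abs(cellx)
    bottom = top - rows * abs(celly)
    return left, bottom, right, top


# Upper-left corner from the example above, with an assumed 30 m cell size.
bounds = subset_bounds(
    left=-263529.884, top=953985.314, rows=2048, cols=2048,
    cellx=30.0, celly=30.0,
)
```

Under that assumption the 2048 x 2048 window spans 61,440 map units in each direction.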
- tasseled_cap(nodata=None, sensor=None, scale_factor=1.0)
Applies a tasseled cap transformation
- Parameters:
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
- Return type:
DataArray
- Returns:
xarray.DataArray
Examples
>>> import geowombat as gw
>>>
>>> with gw.config.update(sensor='qb', scale_factor=0.0001):
>>>     with gw.open(
>>>         'image.tif', band_names=['blue', 'green', 'red', 'nir']
>>>     ) as ds:
>>>         tcap = ds.gw.tasseled_cap()
- to_polygon(mask=None, connectivity=4)
Converts a dask array to a GeoDataFrame
- Parameters:
mask (Optional[numpy ndarray or rasterio Band object]) – Must evaluate to bool (rasterio.bool_ or rasterio.uint8). Values of False or 0 will be excluded from feature generation. Note well that this is the inverse sense from Numpy’s, where a mask value of True indicates invalid data in an array. If source is a Numpy masked array and mask is None, the source’s mask will be inverted and used in place of mask.
connectivity (Optional[int]) – Use 4 or 8 pixel connectivity for grouping pixels into features.
- Return type:
GeoDataFrame
- Returns:
geopandas.GeoDataFrame
Example
>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as src:
>>>
>>>     # Convert the input image to a GeoDataFrame
>>>     df = src.gw.to_polygon(mask='source', num_workers=8)
- to_netcdf(filename, *args, **kwargs)
Writes an Xarray DataArray to a NetCDF file.
- Parameters:
filename (Path | str) – The output file name to write to.
args (DataArray) – Additional DataArrays to stack.
kwargs (dict) – Encoding arguments.
- Return type:
None
Examples
>>> import geowombat as gw
>>> import xarray as xr
>>>
>>> # Write a single DataArray to a .nc file
>>> with gw.config.update(sensor='l7'):
>>>     with gw.open('LC08_L1TP_225078_20200219_20200225_01_T1.tif') as src:
>>>         src.gw.to_netcdf('filename.nc', zlib=True, complevel=5)
>>>
>>> # Add extra layers
>>> with gw.config.update(sensor='l7'):
>>>     with gw.open(
>>>         'LC08_L1TP_225078_20200219_20200225_01_T1.tif'
>>>     ) as src, gw.open(
>>>         'LC08_L1TP_225078_20200219_20200225_01_T1_angles.tif',
>>>         band_names=['zenith', 'azimuth']
>>>     ) as ang:
>>>         src = (
>>>             xr.where(
>>>                 src == 0, -32768, src
>>>             )
>>>             .astype('int16')
>>>             .assign_attrs(**src.attrs)
>>>         )
>>>
>>>         src.gw.to_netcdf(
>>>             'filename.nc',
>>>             ang.astype('int16'),
>>>             zlib=True,
>>>             complevel=5,
>>>             _FillValue=-32768
>>>         )
>>>
>>> # Open the data and convert to a DataArray
>>> with xr.open_dataset(
>>>     'filename.nc', engine='h5netcdf', chunks=256
>>> ) as ds:
>>>     src = ds.to_array(dim='band')
- to_raster(filename, readxsize=None, readysize=None, separate=False, out_block_type='gtiff', keep_blocks=False, verbose=0, overwrite=False, gdal_cache=512, scheduler='processes', n_jobs=1, n_workers=None, n_threads=None, n_chunks=None, overviews=False, resampling='nearest', driver='GTiff', nodata=None, blockxsize=512, blockysize=512, tags=None, **kwargs)[source]#
Writes an Xarray DataArray to a raster file.
Note
We advise using save() in place of this method.
- Parameters:
filename (str) – The output file name to write to.
readxsize (Optional[int]) – The size of column chunks to read. If not given, readxsize defaults to Dask chunk size.
readysize (Optional[int]) – The size of row chunks to read. If not given, readysize defaults to Dask chunk size.
separate (Optional[bool]) – Whether to write blocks as separate files. Otherwise, write to a single file.
out_block_type (Optional[str]) – The output block type. Choices are [‘gtiff’, ‘zarr’]. Only used if separate=True.
keep_blocks (Optional[bool]) – Whether to keep the blocks stored on disk. Only used if separate=True.
verbose (Optional[int]) – The verbosity level.
overwrite (Optional[bool]) – Whether to overwrite an existing file.
gdal_cache (Optional[int]) – The GDAL cache size (in MB).
scheduler (Optional[str]) – The concurrent.futures scheduler to use. Choices are [‘processes’, ‘threads’].
n_jobs (Optional[int]) – The total number of parallel jobs.
n_workers (Optional[int]) – The number of processes.
n_threads (Optional[int]) – The number of threads.
n_chunks (Optional[int]) – The chunk size of windows. If not given, equal to n_workers x 3.
overviews (Optional[bool or list]) – Whether to build overview layers.
resampling (Optional[str]) – The resampling method for overviews when overviews is True or a list. Choices are [‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘nearest’].
driver (Optional[str]) – The raster driver.
nodata (Optional[int]) – A ‘no data’ value.
blockxsize (Optional[int]) – The output x block size. Ignored if separate=True.
blockysize (Optional[int]) – The output y block size. Ignored if separate=True.
tags (Optional[dict]) – Image tags to write to file.
kwargs (Optional[dict]) – Additional keyword arguments to pass to rasterio.write.
- Return type:
None
- Returns:
None
Examples
>>> import geowombat as gw
>>>
>>> # Use dask.compute()
>>> with gw.open('input.tif') as ds:
>>>     ds.gw.to_raster('output.tif', n_jobs=8)
>>>
>>> # Use a dask client
>>> with gw.open('input.tif') as ds:
>>>     ds.gw.to_raster('output.tif', use_client=True, n_workers=8, n_threads=4)
>>>
>>> # Compress the output
>>> with gw.open('input.tif') as ds:
>>>     ds.gw.to_raster('output.tif', n_jobs=8, compress='lzw')
- to_vector(filename, mask=None, connectivity=4)[source]#
Writes an Xarray DataArray to a vector file.
- Parameters:
filename (str) – The output file name to write to.
mask (numpy ndarray or rasterio Band object, optional) – Must evaluate to bool (rasterio.bool_ or rasterio.uint8). Values of False or 0 will be excluded from feature generation. Note well that this is the inverse sense from Numpy’s, where a mask value of True indicates invalid data in an array. If source is a Numpy masked array and mask is None, the source’s mask will be inverted and used in place of mask.
connectivity (Optional[int]) – Use 4 or 8 pixel connectivity for grouping pixels into features.
- Return type:
None
- Returns:
None
- to_vrt(filename, overwrite=False, resampling=None, nodata=None, init_dest_nodata=True, warp_mem_limit=128)[source]#
Writes a file to a VRT file.
- Parameters:
filename (str | Path) – The output file name to write to.
overwrite (Optional[bool]) – Whether to overwrite an existing VRT file.
resampling (Optional[object]) – The resampling algorithm for rasterio.vrt.WarpedVRT.
nodata (Optional[float or int]) – The ‘no data’ value for rasterio.vrt.WarpedVRT.
init_dest_nodata (Optional[bool]) – Whether or not to initialize output to nodata for rasterio.vrt.WarpedVRT.
warp_mem_limit (Optional[int]) – The GDAL memory limit for rasterio.vrt.WarpedVRT.
- Return type:
None
Examples
>>> import geowombat as gw
>>> from rasterio.enums import Resampling
>>>
>>> # Transform a CRS and save to VRT
>>> with gw.config.update(ref_crs=102033):
>>>     with gw.open('image.tif') as src:
>>>         src.gw.to_vrt(
>>>             'output.vrt',
>>>             resampling=Resampling.cubic,
>>>             warp_mem_limit=256
>>>         )
>>>
>>> # Load multiple files set to a common geographic extent
>>> bounds = (left, bottom, right, top)
>>> with gw.config.update(ref_bounds=bounds):
>>>     with gw.open(
>>>         ['image1.tif', 'image2.tif'], mosaic=True
>>>     ) as src:
>>>         src.gw.to_vrt('output.vrt')
- transform_crs(dst_crs=None, dst_res=None, dst_width=None, dst_height=None, dst_bounds=None, src_nodata=None, dst_nodata=None, coords_only=False, resampling='nearest', warp_mem_limit=512, num_threads=1)[source]#
Transforms an xarray.DataArray to a new coordinate reference system.
- Parameters:
dst_crs (Optional[CRS | int | dict | str]) – The destination CRS.
dst_res (Optional[tuple]) – The destination resolution.
dst_width (Optional[int]) – The destination width. Cannot be used with dst_res.
dst_height (Optional[int]) – The destination height. Cannot be used with dst_res.
dst_bounds (Optional[BoundingBox | tuple]) – The destination bounds, as a rasterio.coords.BoundingBox or as a tuple of (left, bottom, right, top).
src_nodata (Optional[int | float]) – The source nodata value. Pixels with this value will not be used for interpolation. If not set, it will default to the nodata value of the source image if a masked ndarray or rasterio band, if available.
dst_nodata (Optional[int | float]) – The nodata value used to initialize the destination; it will remain in all areas not covered by the reprojected source. Defaults to the nodata value of the destination image (if set), the value of src_nodata, or 0 (GDAL default).
coords_only (Optional[bool]) – Whether to return transformed coordinates. If coords_only=True then the array is not warped and the size is unchanged. It also avoids in-memory computations.
resampling (Optional[str]) – The resampling method. Choices are [‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘nearest’].
warp_mem_limit (Optional[int]) – The warp memory limit.
num_threads (Optional[int]) – The number of parallel threads.
- Return type:
DataArray
- Returns:
xarray.DataArray
Example
>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as src:
>>>     dst = src.gw.transform_crs(4326)
- wi(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#
Calculates the woody vegetation index
- Parameters:
data (DataArray) – The xarray.DataArray to process.
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
- Return type:
DataArray
Equation:
\[WI = \begin{cases} 0, & \text{if } red + SWIR1 \ge 0.5 \\ 1 - \frac{red + SWIR1}{0.5}, & \text{otherwise} \end{cases}\]
- Returns:
Data range: 0 to 1
- Return type:
xarray.DataArray
- Parameters:
nodata (float | int) –
mask (bool) –
sensor (str | None) –
scale_factor (float | None) –
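As a worked illustration of the WI equation, the scalar sketch below evaluates it for single reflectance values. The function name `woody_index` is hypothetical and not part of GeoWombat, and the sketch assumes reflectance is already scaled to the 0 to 1 range.

```python
def woody_index(red: float, swir1: float) -> float:
    """Scalar form of the WI equation (hypothetical helper, not the library code)."""
    total = red + swir1
    # The index is clamped to 0 once red + SWIR1 reaches 0.5
    if total >= 0.5:
        return 0.0
    return 1.0 - total / 0.5


dark_canopy = woody_index(0.125, 0.125)   # red + SWIR1 = 0.25
bright_soil = woody_index(0.3, 0.3)       # red + SWIR1 = 0.6
```

Low combined red and SWIR1 reflectance gives values near 1, while bright surfaces fall to 0.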
- windows(row_chunks=None, col_chunks=None, return_type='window', ndim=2)[source]#
Generates windows for a row/column iteration.
- Parameters:
row_chunks (Optional[int]) – The row chunk size. If not given, defaults to opened DataArray chunks.
col_chunks (Optional[int]) – The column chunk size. If not given, defaults to opened DataArray chunks.
return_type (Optional[str]) – The data to return. Choices are [‘data’, ‘slice’, ‘window’].
ndim (Optional[int]) – The number of required dimensions if return_type = ‘data’ or ‘slice’.
- Returns:
yields xarray.DataArray, tuple, or rasterio.windows.Window
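The row/column iteration can be pictured with a minimal sketch that tiles a 2d shape into slice pairs. This is an illustration under assumptions, not GeoWombat's implementation: `iter_window_slices` is a hypothetical helper, and it assumes partial windows at the right and bottom edges are kept rather than dropped.

```python
from typing import Iterator, Tuple


def iter_window_slices(
    nrows: int, ncols: int, row_chunks: int, col_chunks: int
) -> Iterator[Tuple[slice, slice]]:
    """Yield (row_slice, col_slice) pairs in row-major order."""
    for row_start in range(0, nrows, row_chunks):
        for col_start in range(0, ncols, col_chunks):
            # Clip the final window in each direction to the array edge
            row_stop = min(row_start + row_chunks, nrows)
            col_stop = min(col_start + col_chunks, ncols)
            yield slice(row_start, row_stop), slice(col_start, col_stop)


windows = list(iter_window_slices(1000, 1200, 512, 512))
```

Each pair could be used to index a 2d array as `array[row_slice, col_slice]`.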
- apply(filename, user_func, n_jobs=1, **kwargs)[source]#
Applies a user function to an Xarray Dataset or DataArray and writes to file.
- Parameters:
filename (str | Path) – The output file name to write to.
user_func (func) – The user function to apply.
n_jobs (Optional[int]) – The number of parallel jobs for the cluster.
kwargs (Optional[dict]) – Keyword arguments passed to to_raster().
Example
>>> import geowombat as gw
>>>
>>> def user_func(ds_):
>>>     return ds_.max(axis=0)
>>>
>>> with gw.open('image.tif', chunks=512) as ds:
>>>     ds.gw.apply(
>>>         'output.tif',
>>>         user_func,
>>>         n_jobs=8,
>>>         overwrite=True,
>>>         blockxsize=512,
>>>         blockysize=512
>>>     )
- assign_nodata_attrs(nodata)[source]#
Assigns ‘no data’ attributes.
- Parameters:
nodata (float | int) – The ‘no data’ value to assign.
- Return type:
DataArray
- Returns:
xarray.DataArray
- avi(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#
Calculates the advanced vegetation index
- Parameters:
data (DataArray) – The xarray.DataArray to process.
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
- Return type:
DataArray
Equation:
\[AVI = {(NIR \times (1.0 - red) \times (NIR - red))}^{0.3334}\]
- Returns:
Data range: 0 to 1
- Return type:
xarray.DataArray
- Parameters:
nodata (float | int) –
mask (bool) –
sensor (str | None) –
scale_factor (float | None) –
- band_mask(valid_bands, src_nodata=None, dst_clear_val=0, dst_mask_val=1)[source]#
Creates a mask from band nonzeros.
- Parameters:
valid_bands (list) – The bands considered valid.
src_nodata (Optional[float | int]) – The source ‘no data’ value.
dst_clear_val (Optional[int]) – The destination clear value.
dst_mask_val (Optional[int]) – The destination mask value.
- Return type:
DataArray
- Returns:
xarray.DataArray
- bounds_overlay(bounds, how='intersects')[source]#
Checks whether the bounds overlay the image bounds.
- Parameters:
bounds (tuple | rasterio.coords.BoundingBox | shapely.geometry) – The bounds to check. If given as a tuple, the order should be (left, bottom, right, top).
how (Optional[str]) – Choices are any shapely.geometry binary predicates.
- Return type:
bool
- Returns:
bool
Example
>>> import geowombat as gw
>>>
>>> bounds = (left, bottom, right, top)
>>>
>>> with gw.open('image.tif') as src:
>>>     intersects = src.gw.bounds_overlay(bounds)
>>>
>>> from rasterio.coords import BoundingBox
>>>
>>> bounds = BoundingBox(left, bottom, right, top)
>>>
>>> with gw.open('image.tif') as src:
>>>     contains = src.gw.bounds_overlay(bounds, how='contains')
- calc_area(values, op='eq', units='km2', row_chunks=None, col_chunks=None, n_workers=1, n_threads=1, scheduler='threads', n_chunks=100)[source]#
Calculates the area of data values.
- Parameters:
values (list) – A list of values.
op (Optional[str]) – The value sign. Choices are [‘gt’, ‘ge’, ‘lt’, ‘le’, ‘eq’].
units (Optional[str]) – The units to return. Choices are [‘km2’, ‘ha’].
row_chunks (Optional[int]) – The row chunk size to process in parallel.
col_chunks (Optional[int]) – The column chunk size to process in parallel.
n_workers (Optional[int]) – The number of parallel workers for scheduler.
n_threads (Optional[int]) – The number of parallel threads for dask.compute().
scheduler (Optional[str]) –
The parallel task scheduler to use. Choices are [‘processes’, ‘threads’, ‘mpool’].
mpool: process pool of workers using multiprocessing.Pool
processes: process pool of workers using concurrent.futures
threads: thread pool of workers using concurrent.futures
n_chunks (Optional[int]) – The chunk size of windows. If not given, equal to n_workers x 50.
- Return type:
DataFrame
- Returns:
pandas.DataFrame
Example
>>> import geowombat as gw
>>>
>>> # Read a land cover image with 512x512 chunks
>>> with gw.open('land_cover.tif', chunks=512) as src:
>>>     df = src.gw.calc_area(
>>>         [1, 2, 5],        # calculate the area of classes 1, 2, and 5
>>>         units='km2',      # return area in kilometers squared
>>>         n_workers=4,
>>>         row_chunks=1024,  # iterate over larger chunks to use 512 chunks in parallel
>>>         col_chunks=1024
>>>     )
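The arithmetic behind an area calculation can be sketched as a pixel count multiplied by the cell area. The helper `pixels_to_area` below is hypothetical and is not the method's implementation; it assumes the cell sizes are expressed in meters.

```python
def pixels_to_area(count: int, cellx: float, celly: float, units: str = 'km2') -> float:
    """Convert a pixel count to an area (assumes cell sizes in meters)."""
    square_meters = count * abs(cellx) * abs(celly)
    if units == 'km2':
        return square_meters / 1e6   # 1 km2 = 1,000,000 m2
    if units == 'ha':
        return square_meters / 1e4   # 1 ha = 10,000 m2
    raise ValueError("units must be 'km2' or 'ha'")


area_km2 = pixels_to_area(1000, 30.0, 30.0, units='km2')
area_ha = pixels_to_area(1000, 30.0, 30.0, units='ha')
```

For 1,000 pixels at 30 m resolution this corresponds to 900,000 square meters.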
- check_chunksize(chunksize, array_size)[source]#
Ensures the chunk size is a multiple of 16 and fits within the array dimension.
- Parameters:
chunksize (int) – The chunk size to check.
array_size (int) – The array dimension size to check against.
- Return type:
int
- Returns:
int
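One way to satisfy both conditions in the description (a multiple of 16 that fits the array dimension) is sketched below. This is an assumption about how such a check could work, not the library's code; `fit_chunksize` is a hypothetical name, and dimensions smaller than 16 are not handled.

```python
def fit_chunksize(chunksize: int, array_size: int, multiple: int = 16) -> int:
    """Return a chunk size that is a multiple of `multiple` and no larger than the array."""
    # Never request a chunk larger than the dimension itself
    capped = min(chunksize, array_size)
    # Round down to the nearest multiple
    rounded = (capped // multiple) * multiple
    # Fall back to one multiple when rounding reaches zero
    return max(rounded, multiple)


already_fits = fit_chunksize(500, 2000)
too_large = fit_chunksize(512, 300)
too_small = fit_chunksize(8, 100)
```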
- clip(df, query=None, mask_data=False, expand_by=0)[source]#
Clips a DataArray by vector polygon geometry.
Deprecated since version 2.1.7: Use xarray.DataArray.gw.clip_by_polygon().
- Parameters:
df (GeoDataFrame) – The geopandas.GeoDataFrame to clip to.
query (Optional[str]) – A query to apply to df.
mask_data (Optional[bool]) – Whether to mask values outside of the df geometry envelope.
expand_by (Optional[int]) – Expand the clip array bounds by expand_by pixels on each side.
- Returns:
xarray.DataArray
- clip_by_polygon(df, query=None, mask_data=False, expand_by=0)[source]#
Clips a DataArray by vector polygon geometry.
- Parameters:
df (GeoDataFrame) – The geopandas.GeoDataFrame to clip to.
query (Optional[str]) – A query to apply to df.
mask_data (Optional[bool]) – Whether to mask values outside of the df geometry envelope.
expand_by (Optional[int]) – Expand the clip array bounds by expand_by pixels on each side.
- Return type:
DataArray
- Returns:
xarray.DataArray
Examples
>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as ds:
>>>     ds = ds.gw.clip_by_polygon(df, query="Id == 1")
- compare(op, b, return_binary=False)[source]#
Comparison operation.
- Parameters:
op (str) – The comparison operation.
b (float | int) – The value to compare to.
return_binary (Optional[bool]) – Whether to return a binary (1 or 0) array.
- Returns:
Valid data where op meets criteria b, otherwise nans.
- Return type:
xarray.DataArray
Example
>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as src:
>>>     # Keep values less than 10; all other values become NaN
>>>     thresh = src.gw.compare(op='lt', b=10)
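The comparison semantics (keep values that satisfy the operation, otherwise NaN, or return 1/0 when a binary result is requested) can be sketched on a plain list. The helper `compare_values` is hypothetical, and the operator names are an assumption borrowed from the choices listed for `calc_area`.

```python
import math
import operator

# Assumed operator names; the compare() entry does not list its choices
OPS = {
    'lt': operator.lt,
    'le': operator.le,
    'gt': operator.gt,
    'ge': operator.ge,
    'eq': operator.eq,
}


def compare_values(values, op, b, return_binary=False):
    """Keep values where `value <op> b` holds; otherwise NaN (or 0 if binary)."""
    keep = OPS[op]
    if return_binary:
        return [1 if keep(v, b) else 0 for v in values]
    return [v if keep(v, b) else math.nan for v in values]


kept = compare_values([5, 10, 15], 'lt', 10)
binary = compare_values([5, 10, 15], 'lt', 10, return_binary=True)
```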
- compute(**kwargs)[source]#
Computes data.
- Parameters:
kwargs (Optional[dict]) – Keyword arguments to pass to dask.compute().
- Return type:
ndarray
- Returns:
numpy.ndarray
- property data_are_separate: bool#
Checks whether the data are loaded separately.
- Returns:
bool
- property data_are_stacked: bool#
Checks whether the data are stacked.
- Returns:
bool
- detect(detector, **kwargs)[source]#
Run tiled, georeferenced object detection over this raster.
Thin accessor over detector.predict(src, **kwargs) so that detection follows the same src.gw.<method>(...) shape as src.gw.ndvi() or src.gw.extract(). The detector instance is built once (loading model weights) and passed in.
- Parameters:
detector – A YOLODetector or TorchGeoDetector instance (see geowombat.detect).
**kwargs – Forwarded to detector.predict. Typical arguments are tile_size, overlap, conf, band_indices, scale, nms_iou, max_det, and progress. If band_indices is omitted, it is resolved from the active gw.config(sensor=...) band names.
- Returns:
geopandas.GeoDataFrame with one row per detection and geometry, class_id, class_name, score, and tile_id columns, in the source CRS.
Examples
>>> import geowombat as gw
>>> from geowombat.detect import YOLODetector
>>>
>>> det = YOLODetector(weights='yolov8n.pt')
>>> with gw.config.update(sensor='rgb'):
...     with gw.open('aerial.tif', chunks=512) as src:
...         preds = src.gw.detect(det, conf=0.25)
- evi(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#
Calculates the enhanced vegetation index
- Parameters:
data (DataArray) – The xarray.DataArray to process.
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
- Return type:
DataArray
Equation:
\[EVI = 2.5 \times \frac{NIR - red}{NIR + 6 \times red - 7.5 \times blue + 1}\]
- Returns:
Data range: 0 to 1
- Return type:
xarray.DataArray
- Parameters:
nodata (float | int) –
mask (bool) –
sensor (str | None) –
scale_factor (float | None) –
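A scalar sketch of the standard EVI formulation, whose denominator is NIR + 6 × red − 7.5 × blue + 1, is given below for reference. The function `evi_scalar` is a hypothetical name, not GeoWombat's implementation, and it assumes surface reflectance scaled to the 0 to 1 range.

```python
def evi_scalar(nir: float, red: float, blue: float) -> float:
    """Standard EVI with gain 2.5 and coefficients C1=6, C2=7.5, L=1."""
    numerator = nir - red
    denominator = nir + 6.0 * red - 7.5 * blue + 1.0
    return 2.5 * numerator / denominator


vegetated = evi_scalar(nir=0.5, red=0.1, blue=0.05)
no_contrast = evi_scalar(nir=0.2, red=0.2, blue=0.1)
```

When NIR and red are equal the numerator vanishes, so the index is 0 regardless of the blue band.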
- evi2(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#
Calculates the two-band modified enhanced vegetation index
- Parameters:
data (DataArray) – The xarray.DataArray to process.
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
- Return type:
DataArray
Equation:
\[EVI2 = 2.5 \times \frac{NIR - red}{NIR + 1 + 2.4 \times red}\]
- Returns:
Data range: 0 to 1
- Return type:
xarray.DataArray
- Parameters:
nodata (float | int) –
mask (bool) –
sensor (str | None) –
scale_factor (float | None) –
- extract(aoi, bands=None, time_names=None, band_names=None, frac=1.0, min_frac_area=None, all_touched=False, id_column='id', time_format='%Y%m%d', mask=None, n_jobs=8, verbose=0, n_workers=1, n_threads=-1, use_client=False, address=None, total_memory=24, processes=False, pool_kwargs=None, **kwargs)[source]#
Extracts data within an area or points of interest. Projections do not need to match, as they are handled ‘on-the-fly’.
- Parameters:
aoi (str or GeoDataFrame) – A file or geopandas.GeoDataFrame to extract data from.
bands (Optional[int or 1d array-like]) – A band or list of bands to extract. If not given, all bands are used. Bands should be GDAL-indexed (i.e., the first band is 1, not 0).
band_names (Optional[list]) – A list of band names. Length should be the same as bands.
time_names (Optional[list]) – A list of time names.
frac (Optional[float]) – A fractional subset of points to extract in each polygon feature.
min_frac_area (Optional[int | float]) – A minimum polygon area to use frac. Otherwise, use all samples within a polygon.
all_touched (Optional[bool]) – The all_touched argument is passed to rasterio.features.rasterize().
id_column (Optional[str]) – The id column name.
time_format (Optional[str]) – The datetime conversion format if time_names are datetime objects.
mask (Optional[GeoDataFrame or Shapely Polygon]) – A shapely.geometry.Polygon mask to subset to.
n_jobs (Optional[int]) – The number of features to rasterize in parallel.
verbose (Optional[int]) – The verbosity level.
n_workers (Optional[int]) – The number of process workers. Only applies when use_client=True.
n_threads (Optional[int]) – The number of thread workers. Only applies when use_client=True.
use_client (Optional[bool]) – Whether to use a dask client.
address (Optional[str]) – A cluster address to pass to client. Only used when use_client=True.
total_memory (Optional[int]) – The total memory (in GB) required when use_client=True.
processes (Optional[bool]) – Whether to use process workers with the dask.distributed client. Only applies when use_client=True.
pool_kwargs (Optional[dict]) – Keyword arguments passed to multiprocessing.Pool().imap().
kwargs (Optional[dict]) – Keyword arguments passed to dask.compute().
- Return type:
GeoDataFrame
- Returns:
geopandas.GeoDataFrame
Examples
>>> import geowombat as gw
>>> from dask.distributed import LocalCluster
>>>
>>> with gw.open('image.tif') as src:
>>>     df = src.gw.extract('poly.gpkg')
>>>
>>> # On a cluster
>>> # Use a local cluster
>>> with gw.open('image.tif') as src:
>>>     df = src.gw.extract('poly.gpkg', use_client=True, n_threads=16)
>>>
>>> # Specify the client address with a local cluster
>>> with LocalCluster(
>>>     n_workers=1,
>>>     threads_per_worker=8,
>>>     scheduler_port=0,
>>>     processes=False,
>>>     memory_limit='4GB'
>>> ) as cluster:
>>>     with gw.open('image.tif') as src:
>>>         df = src.gw.extract(
>>>             'poly.gpkg',
>>>             use_client=True,
>>>             address=cluster
>>>         )
- property filenames: Sequence[str | Path]#
Gets the data filenames.
- Returns:
list
- gcvi(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#
Calculates the green chlorophyll vegetation index
- Parameters:
data (DataArray) – The xarray.DataArray to process.
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
- Return type:
DataArray
Equation:
\[GCVI = \frac{NIR}{green} - 1\]
- Returns:
Data range: -1 to 1
- Return type:
xarray.DataArray
- Parameters:
nodata (float | int) –
mask (bool) –
sensor (str | None) –
scale_factor (float | None) –
- imshow(mask=False, nodata=0, flip=False, text_color='black', rot=30, **kwargs)[source]#
Shows an image on a plot.
- Parameters:
mask (Optional[bool]) – Whether to mask ‘no data’ values (given by nodata).
nodata (Optional[int or float]) – The ‘no data’ value.
flip (Optional[bool]) – Whether to flip an RGB array’s band order.
text_color (Optional[str]) – The text color.
rot (Optional[int]) – The degree rotation for the x-axis tick labels.
kwargs (Optional[dict]) – Keyword arguments passed to xarray.plot.imshow.
- Return type:
None
- Returns:
None
Examples
>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as ds:
>>>     ds.gw.imshow(band_names=['red', 'green', 'red'], mask=True, vmin=0.1, vmax=0.9, robust=True)
- kndvi(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#
Calculates the kernel normalized difference vegetation index
- Parameters:
data (DataArray) – The xarray.DataArray to process.
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
- Return type:
DataArray
Equation:
\[kNDVI = \tanh({NDVI}^2)\]
- Returns:
Data range: -1 to 1
- Return type:
xarray.DataArray
- Parameters:
nodata (float | int) –
mask (bool) –
sensor (str | None) –
scale_factor (float | None) –
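The kNDVI equation composes two steps: the usual NDVI ratio, then the hyperbolic tangent of its square. A scalar sketch follows; `kndvi_scalar` is a hypothetical name and not the library's implementation, and it assumes NIR + red is non-zero.

```python
import math


def kndvi_scalar(nir: float, red: float) -> float:
    """Compute tanh(NDVI^2) for a single pair of reflectance values."""
    ndvi = (nir - red) / (nir + red)
    return math.tanh(ndvi ** 2)


dense = kndvi_scalar(0.8, 0.1)
bare = kndvi_scalar(0.5, 0.5)
```

Because NDVI is squared before the tanh is applied, this sketch cannot return a negative value.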
- mask(df, query=None, keep='in')[source]#
Masks a DataArray.
- Parameters:
df (GeoDataFrame or str) – The geopandas.GeoDataFrame or filename to use for masking.
query (Optional[str]) – A query to apply to df.
keep (Optional[str]) – If keep = ‘in’, mask values outside of the geometry (keep inside). Otherwise, if keep = ‘out’, mask values inside (keep outside).
- Return type:
DataArray
- Returns:
xarray.DataArray
- mask_nodata()[source]#
Masks ‘no data’ values with nans.
- Return type:
DataArray
- Returns:
xarray.DataArray
- match_data(data, band_names)[source]#
Coerces the xarray.DataArray to match another xarray.DataArray.
- Parameters:
data (DataArray) – The xarray.DataArray to match to.
band_names (1d array-like) – The output band names.
- Return type:
DataArray
- Returns:
xarray.DataArray
Example
>>> import geowombat as gw
>>> import xarray as xr
>>>
>>> other_array = xr.DataArray()
>>>
>>> with gw.open('image.tif') as src:
>>>     new_array = other_array.gw.match_data(src, ['bd1'])
- moving(stat='mean', perc=50, w=3, nodata=None, weights=False)[source]#
Applies a moving window function to the DataArray.
- Parameters:
stat (Optional[str]) – The statistic to compute. Choices are [‘mean’, ‘std’, ‘var’, ‘min’, ‘max’, ‘perc’].
perc (Optional[int]) – The percentile to return if stat = ‘perc’.
w (Optional[int]) – The moving window size (in pixels).
nodata (Optional[int or float]) – A ‘no data’ value to ignore.
weights (Optional[bool]) – Whether to weight values by distance from window center.
- Return type:
DataArray
- Returns:
xarray.DataArray
Examples
>>> import geowombat as gw
>>>
>>> # Calculate the mean within a 5x5 window
>>> with gw.open('image.tif') as src:
>>>     res = src.gw.moving(stat='mean', w=5, nodata=32767.0)
>>>
>>> # Calculate the 90th percentile within a 15x15 window
>>> with gw.open('image.tif') as src:
>>>     res = src.gw.moving(stat='perc', w=15, perc=90, nodata=32767.0)
>>>     res.data.compute(num_workers=4)
- n_windows(row_chunks=None, col_chunks=None)[source]#
Calculates the number of windows in a row/column iteration.
- Parameters:
row_chunks (Optional[int]) – The row chunk size. If not given, defaults to opened DataArray chunks.
col_chunks (Optional[int]) – The column chunk size. If not given, defaults to opened DataArray chunks.
- Return type:
int
- Returns:
int
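A window count for a row/column iteration can be estimated with ceiling division in each direction. The sketch below is an assumption about the arithmetic, not the method's code: `count_windows` is a hypothetical helper, and it counts partial windows at the array edges.

```python
import math


def count_windows(nrows: int, ncols: int, row_chunks: int, col_chunks: int) -> int:
    """Number of chunk-sized windows needed to cover a 2d array."""
    windows_down = math.ceil(nrows / row_chunks)
    windows_across = math.ceil(ncols / col_chunks)
    return windows_down * windows_across


total = count_windows(1000, 1200, 512, 512)
```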
- nbr(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#
Calculates the normalized burn ratio
- Parameters:
data (DataArray) – The xarray.DataArray to process.
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
- Return type:
DataArray
Equation:
\[NBR = \frac{NIR - SWIR1}{NIR + SWIR1}\]
- Returns:
Data range: -1 to 1
- Return type:
xarray.DataArray
- Parameters:
nodata (float | int) –
mask (bool) –
sensor (str | None) –
scale_factor (float | None) –
- ndvi(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#
Calculates the normalized difference vegetation index
- Parameters:
data (DataArray) – The xarray.DataArray to process.
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
- Return type:
DataArray
Equation:
\[NDVI = \frac{NIR - red}{NIR + red}\]
- Returns:
Data range: -1 to 1
- Return type:
xarray.DataArray
- Parameters:
nodata (float | int) –
mask (bool) –
sensor (str | None) –
scale_factor (float | None) –
- norm_brdf(solar_za, solar_az, sensor_za, sensor_az, sensor=None, wavelengths=None, nodata=None, mask=None, scale_factor=1.0, scale_angles=True)[source]#
Applies Bidirectional Reflectance Distribution Function (BRDF) normalization.
- Parameters:
solar_za (2d DataArray) – The solar zenith angles (degrees).
solar_az (2d DataArray) – The solar azimuth angles (degrees).
sensor_za (2d DataArray) – The sensor zenith angles (degrees).
sensor_az (2d DataArray) – The sensor azimuth angles (degrees).
sensor (Optional[str]) – The satellite sensor.
wavelengths (str list) – The wavelength(s) to normalize.
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[DataArray]) – A data mask, where clear values are 0.
scale_factor (Optional[float]) – A scale factor to apply to the input data.
scale_angles (Optional[bool]) – Whether to scale the pixel angle arrays.
- Returns:
xarray.DataArray
Examples
>>> import geowombat as gw
>>>
>>> # Example where pixel angles are stored in separate GeoTiff files
>>> with gw.config.update(sensor='l7', scale_factor=0.0001, nodata=0):
>>>     with gw.open('solarz.tif') as solarz, \
>>>             gw.open('solara.tif') as solara, \
>>>             gw.open('sensorz.tif') as sensorz, \
>>>             gw.open('sensora.tif') as sensora:
>>>         with gw.open('landsat.tif') as ds:
>>>             ds_brdf = ds.gw.norm_brdf(solarz, solara, sensorz, sensora)
- norm_diff(b1, b2, nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#
Calculates the normalized difference band ratio.
- Parameters:
b1 (str) – The band name of the first band.
b2 (str) – The band name of the second band.
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
- Return type:
DataArray
Equation:
\[{norm}_{diff} = \frac{b2 - b1}{b2 + b1}\]
- Returns:
Data range: -1 to 1
- Return type:
xarray.DataArray
- Parameters:
b1 (Any) –
b2 (Any) –
nodata (float | int) –
mask (bool) –
sensor (str | None) –
scale_factor (float | None) –
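Note the band order in the equation: the first band is subtracted from the second. A scalar sketch makes this concrete; `norm_diff_scalar` is a hypothetical helper operating on single reflectance values rather than band names.

```python
def norm_diff_scalar(b1: float, b2: float) -> float:
    """Normalized difference (b2 - b1) / (b2 + b1) for two band values."""
    return (b2 - b1) / (b2 + b1)


# With b1 = red and b2 = NIR, the result matches the NDVI equation
ndvi_like = norm_diff_scalar(b1=0.25, b2=0.75)
swapped = norm_diff_scalar(b1=0.75, b2=0.25)
```

Swapping the two bands flips the sign of the result.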
- read(band, **kwargs)[source]#
Reads data for a band or bands.
- Parameters:
band (int | list) – A band or list of bands to read.
- Return type:
ndarray
- Returns:
numpy.ndarray
- recode(polygon, to_replace, num_workers=1)[source]#
Recodes a DataArray with polygon mappings.
- Parameters:
polygon (GeoDataFrame | str) – The geopandas.GeoDataFrame or file with polygon geometry.
to_replace (dict) –
How to find the values to replace. Dictionary mappings should be given as {from: to} pairs. If to_replace is an integer/string mapping, the to string should be ‘mode’.
- {1: 5}:
recode values of 1 to 5
- {1: ‘mode’}:
recode values of 1 to the polygon mode
num_workers (Optional[int]) – The number of parallel Dask workers (only used if to_replace has a ‘mode’ mapping).
- Return type:
DataArray
- Returns:
xarray.DataArray
Example
>>> import geowombat as gw
>>>
>>> with gw.open('image.tif', chunks=512) as ds:
>>>     # Recode 1 with 5 within a polygon
>>>     res = ds.gw.recode('poly.gpkg', {1: 5})
- replace(to_replace)[source]#
Replaces values according to the to_replace mapping.
- Parameters:
to_replace (dict) –
How to find the values to replace. Dictionary mappings should be given as {from: to} pairs. If to_replace is an integer/string mapping, the to string should be ‘mode’.
- {1: 5}:
recode values of 1 to 5
- {1: ‘mode’}:
recode values of 1 to the polygon mode
- Return type:
DataArray
- Returns:
xarray.DataArray
Example
>>> import geowombat as gw
>>>
>>> with gw.open('image.tif', chunks=512) as ds:
>>>     # Replace 1 with 5
>>>     res = ds.gw.replace({1: 5})
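The {from: to} mapping convention can be illustrated on a plain list of class values. This is only a sketch of the integer-to-integer case: `replace_values` is a hypothetical helper, and the ‘mode’ mapping is deliberately not handled.

```python
def replace_values(values, to_replace):
    """Apply a {from: to} mapping; values without a mapping are left unchanged."""
    return [to_replace.get(value, value) for value in values]


recoded = replace_values([1, 2, 1, 3], {1: 5})
multi = replace_values([1, 2, 1, 3], {1: 5, 3: 0})
```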
- sample(method='random', band=None, n=None, strata=None, spacing=None, min_dist=None, max_attempts=10, num_workers=1, verbose=1, **kwargs)[source]#
Generates samples from a raster.
- Parameters:
data (DataArray) – The xarray.DataArray to extract data from.
method (Optional[str]) – The sampling method. Choices are [‘random’, ‘systematic’].
band (Optional[int or str]) – The band name to extract from. Only required if method = ‘random’ and strata is given.
n (Optional[int]) – The total number of samples. Only required if method = ‘random’.
strata (Optional[dict]) –
The strata to sample within. The dictionary key–>value pairs should be {‘conditional,value’: proportion}.
E.g.,
strata = {‘==,1’: 0.5, ‘>=,2’: 0.5} … would sample 50% of total samples within class 1 and 50% of total samples in class >= 2.
strata = {‘==,1’: 10, ‘>=,2’: 20} … would sample 10 samples within class 1 and 20 samples in class >= 2.
spacing (Optional[float]) – The spacing (in map projection units) when method = ‘systematic’.
min_dist (Optional[float or int]) – A minimum distance allowed between samples. Only applies when method = ‘random’.
max_attempts (Optional[int]) – The maximum number of attempts to sample points > min_dist from each other.
num_workers (Optional[int]) – The number of parallel workers for dask.compute().
verbose (Optional[int]) – The verbosity level.
kwargs (Optional[dict]) – Keyword arguments passed to geowombat.extract.
- Return type:
GeoDataFrame
- Returns:
geopandas.GeoDataFrame
Examples
>>> import geowombat as gw
>>>
>>> # Sample 100 points randomly across the image
>>> with gw.open('image.tif') as ds:
>>>     df = ds.gw.sample(n=100)
>>>
>>> # Sample points systematically (with 10km spacing) across the image
>>> with gw.open('image.tif') as ds:
>>>     df = ds.gw.sample(method='systematic', spacing=10000.0)
>>>
>>> # Sample 50% of 100 in class 1 and 50% in classes >= 2
>>> strata = {'==,1': 0.5, '>=,2': 0.5}
>>> with gw.open('image.tif') as ds:
>>>     df = ds.gw.sample(band=1, n=100, strata=strata)
>>>
>>> # Specify a per-stratum minimum allowed point distance of 1,000 meters
>>> with gw.open('image.tif') as ds:
>>>     df = ds.gw.sample(band=1, n=100, min_dist=1000, strata=strata)
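The ‘conditional,value’ key format used by `strata` can be unpacked with a small parser. The sketch below is illustrative only: `parse_stratum` and `stratum_of` are hypothetical helpers, the set of supported conditionals is an assumption, and how GeoWombat itself interprets the keys is not shown here.

```python
import operator

# Assumed set of conditionals; only '==' and '>=' appear in the documented examples
CONDITIONS = {
    '==': operator.eq,
    '>=': operator.ge,
    '<=': operator.le,
    '>': operator.gt,
    '<': operator.lt,
}


def parse_stratum(key):
    """Split a 'conditional,value' key into a comparison function and a number."""
    conditional, value = key.split(',')
    return CONDITIONS[conditional], float(value)


def stratum_of(pixel_value, strata):
    """Return the first stratum key whose condition the pixel value satisfies."""
    for key in strata:
        test, value = parse_stratum(key)
        if test(pixel_value, value):
            return key
    return None


strata = {'==,1': 0.5, '>=,2': 0.5}
assigned = [stratum_of(v, strata) for v in (1, 3, 0)]
```

A pixel value that satisfies no condition is assigned to no stratum.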
- save(filename, overwrite=False, scatter=None, client=None, compute=True, tags=None, compress='none', compression=None, num_workers=1, log_progress=True, tqdm_kwargs=None, bigtiff=None)[source]#
Saves a DataArray to raster using rasterio/dask.
- Parameters:
filename (str | Path) – The output file name to write to.
nodata (Optional[float | int]) – The ‘no data’ value. If
None (default), the ‘no data’ value is taken from the DataArray metadata.
overwrite (Optional[bool]) – Whether to overwrite an existing file. Default is False.
scatter (Optional[str]) – Scatter ‘band’ or ‘time’ to separate files. Default is None.
client (Optional[Client object]) – A
dask.distributed.Client client object to persist data. Default is None.
compute (Optional[bool]) – Whether to compute and write to
filename. If True, compute and write to filename. If False, return the dask task graph. Default is True.
tags (Optional[dict]) – Metadata tags to write to file. Default is None.
compress (Optional[str]) –
The file compression type. Default is ‘none’, or no compression.
Note
When using a client, it is advised to use threading. E.g.,
dask.distributed.LocalCluster(processes=False). Process-based concurrency could result in corrupted file blocks.
compression (Optional[str]) –
The file compression type. Default is ‘none’, or no compression.
Deprecated since version 2.1.4: Use ‘compress’ – ‘compression’ will be removed in >=2.2.0.
num_workers (Optional[int]) – The number of dask workers (i.e., chunks) to write concurrently. Default is 1.
log_progress (Optional[bool]) – Whether to log the progress bar during writing. Default is True.
tqdm_kwargs (Optional[dict]) – Keyword arguments to pass to
tqdm.
bigtiff (Optional[str]) – A GDAL BIGTIFF flag. Choices are [“YES”, “NO”, “IF_NEEDED”, “IF_SAFER”].
- Return type:
None
- Returns:
None, writes to filename
Example
>>> import geowombat as gw
>>>
>>> with gw.open('file.tif') as src:
>>>     result = ...
>>>     result.gw.save('output.tif', compress='lzw', num_workers=8)
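The note above recommends thread-based concurrency when a client is passed. The sketch below shows one way that advice might be followed, using only the parameters listed in the signature (overwrite, client, compress). The function name and file paths are placeholders, and the snippet has not been checked against a particular geowombat release.

```python
def save_with_threaded_client(src_path, dst_path):
    """Write a raster through a thread-based Dask client (illustrative)."""
    # Imports live inside the function so the sketch can be read and
    # imported without geowombat or dask.distributed installed.
    import geowombat as gw
    from dask.distributed import Client, LocalCluster

    # processes=False keeps the workers as threads in this process,
    # which is the setup the note above advises.
    with LocalCluster(processes=False) as cluster, Client(cluster) as client:
        with gw.open(src_path) as src:
            src.gw.save(dst_path, overwrite=True, client=client, compress='lzw')
```

Calling save_with_threaded_client('file.tif', 'output.tif') would open the input, write a compressed copy, and shut the cluster down on exit.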
- set_nodata(src_nodata=None, dst_nodata=None, out_range=None, dtype=None, scale_factor=None, offset=None)[source]#
Sets ‘no data’ values and applies scaling to an
xarray.DataArray.
- Parameters:
src_nodata (int | float) – The ‘no data’ values to replace. Default is
None.
dst_nodata (int | float) – The ‘no data’ value to set. Default is
nan.
out_range (Optional[tuple]) – The output clip range. Default is
None.
dtype (Optional[str]) – The output data type. Default is
None.
scale_factor (Optional[float | int]) – A scale factor to apply. Default is
None.
offset (Optional[float | int]) – An offset to apply. Default is
None.
- Return type:
DataArray
- Returns:
xarray.DataArray
Example
>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as src:
>>>     src = src.gw.set_nodata(0, 65535, out_range=(0, 10000), dtype='uint16')
- subset(left=None, top=None, right=None, bottom=None, rows=None, cols=None, center=False, mask_corners=False)[source]#
Subsets a DataArray.
- Parameters:
left (Optional[float]) – The left coordinate.
top (Optional[float]) – The top coordinate.
right (Optional[float]) – The right coordinate.
bottom (Optional[float]) – The bottom coordinate.
rows (Optional[int]) – The number of output rows.
cols (Optional[int]) – The number of output columns.
center (Optional[bool]) – Whether to center the subset on
left and top.
mask_corners (Optional[bool]) – Whether to mask corners (requires
pymorph).
chunksize (Optional[tuple]) – A new chunk size for the output.
- Return type:
DataArray
- Returns:
xarray.DataArray
Example
>>> import geowombat as gw
>>>
>>> with gw.open('image.tif', chunks=512) as ds:
>>>     ds_sub = ds.gw.subset(
>>>         left=-263529.884,
>>>         top=953985.314,
>>>         rows=2048,
>>>         cols=2048
>>>     )
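When only left, top, rows, and cols are given, as in the example above, the remaining bounds follow from the cell size. The sketch below illustrates that arithmetic, including one reading of center=True. The helper name and the cell sizes are assumptions; the documentation does not state that geowombat computes the bounds exactly this way.

```python
def subset_bounds(left, top, rows, cols, cellx, celly, center=False):
    """Return (left, bottom, right, top) for a rows x cols subset.

    Hypothetical helper, not part of geowombat.
    """
    width = cols * abs(cellx)
    height = rows * abs(celly)
    if center:
        # Treat (left, top) as the subset center rather than its corner
        left -= width / 2.0
        top += height / 2.0
    return left, top - height, left + width, top


# A 10-row by 20-column subset of 1-unit cells anchored at (0, 100)
print(subset_bounds(0.0, 100.0, rows=10, cols=20, cellx=1.0, celly=1.0))
# (0.0, 90.0, 20.0, 100.0)

# The same subset centered on (0, 100)
print(subset_bounds(0.0, 100.0, rows=10, cols=20, cellx=1.0, celly=1.0, center=True))
# (-10.0, 95.0, 10.0, 105.0)
```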
- tasseled_cap(nodata=None, sensor=None, scale_factor=1.0)[source]#
Applies a tasseled cap transformation.
- Parameters:
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
- Return type:
DataArray
- Returns:
xarray.DataArray
Examples
>>> import geowombat as gw
>>>
>>> with gw.config.update(sensor='qb', scale_factor=0.0001):
>>>     with gw.open(
>>>         'image.tif', band_names=['blue', 'green', 'red', 'nir']
>>>     ) as ds:
>>>         tcap = ds.gw.tasseled_cap()
- to_netcdf(filename, *args, **kwargs)[source]#
Writes an Xarray DataArray to a NetCDF file.
- Parameters:
filename (Path | str) – The output file name to write to.
args (DataArray) – Additional
DataArrays to stack.
kwargs (dict) – Encoding arguments.
- Return type:
None
Examples
>>> import geowombat as gw
>>> import xarray as xr
>>>
>>> # Write a single DataArray to a .nc file
>>> with gw.config.update(sensor='l7'):
>>>     with gw.open('LC08_L1TP_225078_20200219_20200225_01_T1.tif') as src:
>>>         src.gw.to_netcdf('filename.nc', zlib=True, complevel=5)
>>>
>>> # Add extra layers
>>> with gw.config.update(sensor='l7'):
>>>     with gw.open(
>>>         'LC08_L1TP_225078_20200219_20200225_01_T1.tif'
>>>     ) as src, gw.open(
>>>         'LC08_L1TP_225078_20200219_20200225_01_T1_angles.tif',
>>>         band_names=['zenith', 'azimuth']
>>>     ) as ang:
>>>         src = (
>>>             xr.where(
>>>                 src == 0, -32768, src
>>>             )
>>>             .astype('int16')
>>>             .assign_attrs(**src.attrs)
>>>         )
>>>
>>>         src.gw.to_netcdf(
>>>             'filename.nc',
>>>             ang.astype('int16'),
>>>             zlib=True,
>>>             complevel=5,
>>>             _FillValue=-32768
>>>         )
>>>
>>> # Open the data and convert to a DataArray
>>> with xr.open_dataset(
>>>     'filename.nc', engine='h5netcdf', chunks=256
>>> ) as ds:
>>>     src = ds.to_array(dim='band')
- to_polygon(mask=None, connectivity=4)[source]#
Converts a
dask array to a GeoDataFrame
- Parameters:
mask (Optional[numpy ndarray or rasterio Band object]) – Must evaluate to bool (
rasterio.bool_ or rasterio.uint8). Values of False or 0 will be excluded from feature generation. Note well that this is the inverse sense from Numpy’s, where a mask value of True indicates invalid data in an array. If source is a Numpy masked array and mask is None, the source’s mask will be inverted and used in place of mask.
connectivity (Optional[int]) – Use 4 or 8 pixel connectivity for grouping pixels into features.
- Return type:
GeoDataFrame
- Returns:
geopandas.GeoDataFrame
Example
>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as src:
>>>
>>>     # Convert the input image to a GeoDataFrame
>>>     df = src.gw.to_polygon(mask='source')
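The connectivity parameter decides whether diagonally touching pixels belong to the same feature. The pure-Python sketch below illustrates the difference on a tiny grid. It groups all non-zero cells, which is a simplification (polygonization normally groups pixels of equal value), and the function is illustrative only; it is not the routine geowombat or rasterio uses.

```python
def count_regions(grid, connectivity=4):
    """Count groups of connected non-zero cells in a 2D list of ints."""
    if connectivity == 4:
        # Edge neighbors only
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        # Edge and diagonal neighbors
        steps = [
            (dr, dc)
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
        ]
    else:
        raise ValueError('connectivity must be 4 or 8')

    nrows, ncols = len(grid), len(grid[0])
    seen = set()
    regions = 0
    for r in range(nrows):
        for c in range(ncols):
            if not grid[r][c] or (r, c) in seen:
                continue
            # New region found: flood fill from this cell
            regions += 1
            stack = [(r, c)]
            seen.add((r, c))
            while stack:
                cr, cc = stack.pop()
                for dr, dc in steps:
                    nr, nc = cr + dr, cc + dc
                    inside = 0 <= nr < nrows and 0 <= nc < ncols
                    if inside and grid[nr][nc] and (nr, nc) not in seen:
                        seen.add((nr, nc))
                        stack.append((nr, nc))
    return regions


diagonal = [[1, 0], [0, 1]]
print(count_regions(diagonal, connectivity=4))  # 2
print(count_regions(diagonal, connectivity=8))  # 1
```

Two pixels that touch only at a corner form two features with 4-connectivity and one feature with 8-connectivity.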
- to_raster(filename, readxsize=None, readysize=None, separate=False, out_block_type='gtiff', keep_blocks=False, verbose=0, overwrite=False, gdal_cache=512, scheduler='processes', n_jobs=1, n_workers=None, n_threads=None, n_chunks=None, overviews=False, resampling='nearest', driver='GTiff', nodata=None, blockxsize=512, blockysize=512, tags=None, **kwargs)[source]#
Writes an Xarray DataArray to a raster file.
Note
We advise using
save() in place of this method.
- Parameters:
filename (str) – The output file name to write to.
readxsize (Optional[int]) – The size of column chunks to read. If not given,
readxsize defaults to Dask chunk size.
readysize (Optional[int]) – The size of row chunks to read. If not given,
readysize defaults to Dask chunk size.
separate (Optional[bool]) – Whether to write blocks as separate files. Otherwise, write to a single file.
out_block_type (Optional[str]) – The output block type. Choices are [‘gtiff’, ‘zarr’]. Only used if
separate=True.
keep_blocks (Optional[bool]) – Whether to keep the blocks stored on disk. Only used if
separate=True.
verbose (Optional[int]) – The verbosity level.
overwrite (Optional[bool]) – Whether to overwrite an existing file.
gdal_cache (Optional[int]) – The
GDAL cache size (in MB).
scheduler (Optional[str]) – The
concurrent.futures scheduler to use. Choices are [‘processes’, ‘threads’].
n_jobs (Optional[int]) – The total number of parallel jobs.
n_workers (Optional[int]) – The number of processes.
n_threads (Optional[int]) – The number of threads.
n_chunks (Optional[int]) – The chunk size of windows. If not given, equal to
n_workers x 3.
overviews (Optional[bool or list]) – Whether to build overview layers.
resampling (Optional[str]) – The resampling method for overviews when
overviews is True or a list. Choices are [‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘nearest’].
driver (Optional[str]) – The raster driver.
nodata (Optional[int]) – A ‘no data’ value.
blockxsize (Optional[int]) – The output x block size. Ignored if
separate=True.
blockysize (Optional[int]) – The output y block size. Ignored if
separate=True.
tags (Optional[dict]) – Image tags to write to file.
kwargs (Optional[dict]) – Additional keyword arguments to pass to
rasterio.write.
- Return type:
None
- Returns:
None
Examples
>>> import geowombat as gw
>>>
>>> # Use dask.compute()
>>> with gw.open('input.tif') as ds:
>>>     ds.gw.to_raster('output.tif', n_jobs=8)
>>>
>>> # Use a dask client
>>> with gw.open('input.tif') as ds:
>>>     ds.gw.to_raster('output.tif', use_client=True, n_workers=8, n_threads=4)
>>>
>>> # Compress the output
>>> with gw.open('input.tif') as ds:
>>>     ds.gw.to_raster('output.tif', n_jobs=8, compress='lzw')
- to_vector(filename, mask=None, connectivity=4)[source]#
Writes an Xarray DataArray to a vector file.
- Parameters:
filename (str) – The output file name to write to.
mask (numpy ndarray or rasterio Band object, optional) – Must evaluate to bool (
rasterio.bool_ or rasterio.uint8). Values of False or 0 will be excluded from feature generation. Note well that this is the inverse sense from Numpy’s, where a mask value of True indicates invalid data in an array. If source is a Numpy masked array and mask is None, the source’s mask will be inverted and used in place of mask.
connectivity (Optional[int]) – Use 4 or 8 pixel connectivity for grouping pixels into features.
- Return type:
None
- Returns:
None
- to_vrt(filename, overwrite=False, resampling=None, nodata=None, init_dest_nodata=True, warp_mem_limit=128)[source]#
Writes a file to a VRT file.
- Parameters:
filename (str | Path) – The output file name to write to.
overwrite (Optional[bool]) – Whether to overwrite an existing VRT file.
resampling (Optional[object]) – The resampling algorithm for
rasterio.vrt.WarpedVRT.
nodata (Optional[float or int]) – The ‘no data’ value for
rasterio.vrt.WarpedVRT.
init_dest_nodata (Optional[bool]) – Whether or not to initialize output to
nodata for rasterio.vrt.WarpedVRT.
warp_mem_limit (Optional[int]) – The GDAL memory limit for
rasterio.vrt.WarpedVRT.
- Return type:
None
Examples
>>> import geowombat as gw
>>> from rasterio.enums import Resampling
>>>
>>> # Transform a CRS and save to VRT
>>> with gw.config.update(ref_crs=102033):
>>>     with gw.open('image.tif') as src:
>>>         src.gw.to_vrt(
>>>             'output.vrt',
>>>             resampling=Resampling.cubic,
>>>             warp_mem_limit=256
>>>         )
>>>
>>> # Load multiple files set to a common geographic extent
>>> bounds = (left, bottom, right, top)
>>> with gw.config.update(ref_bounds=bounds):
>>>     with gw.open(
>>>         ['image1.tif', 'image2.tif'], mosaic=True
>>>     ) as src:
>>>         src.gw.to_vrt('output.vrt')
- to_yolo_dataset(labels, class_col, out_dir, tile_size=640, overlap=0.1, **kwargs)[source]#
Write a YOLO-format training dataset from this raster + labels.
Accessor wrapper around
geowombat.detect.build_dataset so users can stay inside the with gw.open(...) as src: flow.
- Parameters:
labels –
geopandas.GeoDataFrame, path, or URL of vector labels. Reprojected to the raster CRS automatically.
class_col (str) – Column in
labels holding class name/id.
out_dir (str | Path) – Output directory. Ultralytics layout (
images/{train,val} + labels/{train,val} + a data.yaml) is written under it.
tile_size (int) – Tile edge in pixels. Default 640.
overlap (float) – Fractional overlap between tiles. Default 0.1.
**kwargs – Forwarded to
build_dataset, e.g. val_split, min_box_pixels, background_ratio, band_indices, scale, oriented, image_format, seed, class_names. If band_indices is omitted it is resolved from the active gw.config(sensor=...).
- Returns:
dict summary with keys out_dir, classes, n_train, n_val, n_boxes.
Examples
>>> import geopandas as gpd, geowombat as gw
>>> gdf = gpd.read_file('buildings.gpkg')
>>> with gw.open('naip.tif', chunks=512) as src:
...     summary = src.gw.to_yolo_dataset(
...         gdf, class_col='class_name',
...         out_dir='./yolo', tile_size=640,
...     )
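For readers unfamiliar with the label files in an Ultralytics layout, each line of a detection label stores a class id followed by the box center and size, all normalized by the tile dimensions. The sketch below converts a pixel-space box inside a square tile to such a line. The helper name and the six-decimal formatting are assumptions; how build_dataset formats its output is not documented here.

```python
def to_yolo_line(class_id, xmin, ymin, xmax, ymax, tile_size=640):
    """Format a pixel-space box as a normalized YOLO label line.

    Hypothetical helper, not part of geowombat.
    """
    x_center = (xmin + xmax) / 2.0 / tile_size
    y_center = (ymin + ymax) / 2.0 / tile_size
    width = (xmax - xmin) / tile_size
    height = (ymax - ymin) / tile_size
    return f'{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}'


# A 128 x 128 pixel box with its top-left corner at (64, 128) in a 640 tile
line = to_yolo_line(0, 64, 128, 192, 256)
print(line)  # 0 0.200000 0.300000 0.200000 0.200000
```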
- transform_crs(dst_crs=None, dst_res=None, dst_width=None, dst_height=None, dst_bounds=None, src_nodata=None, dst_nodata=None, coords_only=False, resampling='nearest', warp_mem_limit=512, num_threads=1)[source]#
Transforms an
xarray.DataArray to a new coordinate reference system.
- Parameters:
dst_crs (Optional[CRS | int | dict | str]) – The destination CRS.
dst_res (Optional[tuple]) – The destination resolution.
dst_width (Optional[int]) – The destination width. Cannot be used with
dst_res.
dst_height (Optional[int]) – The destination height. Cannot be used with
dst_res.
dst_bounds (Optional[BoundingBox | tuple]) – The destination bounds, as a
rasterio.coords.BoundingBox or as a tuple of (left, bottom, right, top).
src_nodata (Optional[int | float]) – The source nodata value. Pixels with this value will not be used for interpolation. If not set, it will default to the nodata value of the source image if a masked ndarray or rasterio band, if available.
dst_nodata (Optional[int | float]) – The nodata value used to initialize the destination; it will remain in all areas not covered by the reprojected source. Defaults to the nodata value of the destination image (if set), the value of src_nodata, or 0 (GDAL default).
coords_only (Optional[bool]) – Whether to return transformed coordinates. If
coords_only=True then the array is not warped and the size is unchanged. It also avoids in-memory computations.
resampling (Optional[str]) – The resampling method if
filename is a list. Choices are [‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘nearest’].
warp_mem_limit (Optional[int]) – The warp memory limit.
num_threads (Optional[int]) – The number of parallel threads.
- Return type:
DataArray
- Returns:
xarray.DataArray
Example
>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as src:
>>>     dst = src.gw.transform_crs(4326)
- wi(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#
Calculates the woody vegetation index
- Parameters:
data (DataArray) – The
xarray.DataArray to process.
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
- Return type:
DataArray
Equation:
\[WI = \begin{cases} 0, & \text{if } red + SWIR1 \ge 0.5 \\ 1 - \dfrac{red + SWIR1}{0.5}, & \text{otherwise} \end{cases}\]
- Returns:
Data range: 0 to 1
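As a worked instance of the WI equation for a single pixel, the sketch below evaluates the piecewise formula in plain Python. It assumes red and SWIR1 are already scaled to reflectance (0 to 1). The function name is illustrative, and this is not the array-based routine geowombat runs.

```python
def woody_index(red, swir1):
    """Evaluate the piecewise WI formula for one pixel of reflectance."""
    total = red + swir1
    if total >= 0.5:
        return 0.0
    return 1.0 - total / 0.5


print(woody_index(0.30, 0.25))  # 0.0, because red + SWIR1 >= 0.5
print(woody_index(0.0, 0.0))    # 1.0, the upper end of the data range
print(round(woody_index(0.05, 0.15), 6))  # 0.6
```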
- windows(row_chunks=None, col_chunks=None, return_type='window', ndim=2)[source]#
Generates windows for a row/column iteration.
- Parameters:
row_chunks (Optional[int]) – The row chunk size. If not given, defaults to opened DataArray chunks.
col_chunks (Optional[int]) – The column chunk size. If not given, defaults to opened DataArray chunks.
return_type (Optional[str]) – The data to return. Choices are [‘data’, ‘slice’, ‘window’].
ndim (Optional[int]) – The number of required dimensions if
return_type= ‘data’ or ‘slice’.
- Returns:
yields xarray.DataArray, tuple, or rasterio.windows.Window
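The row/column iteration can be pictured as stepping over the array in chunk-sized blocks and clipping the last block in each direction. The sketch below yields plain (row_off, col_off, height, width) tuples under that assumption. The function name is hypothetical and it is not geowombat's generator. Note that rasterio.windows.Window takes its arguments in the order (col_off, row_off, width, height).

```python
def iter_windows(nrows, ncols, row_chunks, col_chunks):
    """Yield (row_off, col_off, height, width) tuples covering an array."""
    for row_off in range(0, nrows, row_chunks):
        # Clip the last block of rows to the array edge
        height = min(row_chunks, nrows - row_off)
        for col_off in range(0, ncols, col_chunks):
            # Clip the last block of columns to the array edge
            width = min(col_chunks, ncols - col_off)
            yield row_off, col_off, height, width


windows = list(iter_windows(nrows=1000, ncols=1200, row_chunks=512, col_chunks=512))
print(len(windows))  # 6
print(windows[-1])   # (512, 1024, 488, 176)
```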