API reference#
geowombat.core.geoxarray Module#
- class geowombat.core.geoxarray.GeoWombatAccessor(xarray_obj)[source]#
Bases:
_UpdateConfig,DataPropertiesA method to access a
xarray.DataArray. This class is typically not accessed directly, but rather through a call togeowombat.open.An
xarray.DataArrayobject will have agwmethod.To access GeoWombat methods, use
xarray.DataArray.gw.
- Attributes:
affineGet the affine transform object.
altitudeGet satellite altitudes (in km)
array_is_daskGet whether the array is a Dask array.
avail_sensorsGet supported sensors.
band_chunksGet the band chunk size.
bottomGet the array bounding box bottom coordinate.
boundsGet the array bounding box (left, bottom, right, top)
bounds_as_namedtupleGet the array bounding box as a
rasterio.coords.BoundingBoxcellxGet the cell size in the x direction.
cellxhGet the half width of the cell size in the x direction.
cellyGet the cell size in the y direction.
cellyhGet the half width of the cell size in the y direction.
central_umGet a dictionary of central wavelengths (in micrometers)
chunk_gridGet the image chunk grid.
col_chunksGet the column chunk size.
crs_to_pyprojGet the CRS as a
pyproj.CRSobject.data_are_separateChecks whether the data are loaded separately.
data_are_stackedChecks whether the data are stacked.
dtypeGet the data type of the DataArray.
filenamesGets the data filenames.
footprint_gridGet the image footprint grid.
geodataframeGet a
geopandas.GeoDataFrameof the array bounds.geometryGet the polygon geometry of the array bounding box.
has_bandCheck whether the DataArray has a band attribute.
has_band_coordCheck whether the DataArray has a band coordinate.
has_band_dimCheck whether the DataArray has a band dimension.
has_timeCheck whether the DataArray has a time attribute.
has_time_coordCheck whether the DataArray has a time coordinate.
has_time_dimCheck whether the DataArray has a time dimension.
leftGet the array bounding box left coordinate.
metaGet the array metadata.
nbandsGet the number of array bands.
ncolsGet the number of array columns.
ndimsGet the number of array dimensions.
nodatavalGet the ‘no data’ value from the attributes.
nrowsGet the number of array rows.
ntimeGet the number of time dimensions.
offsetvalGet the offset value.
pydatetimeGet Python datetime objects from the time dimension.
rightGet the array bounding box right coordinate.
row_chunksGet the row chunk size.
scalevalGet the scale factor value.
sensor_namesGet sensor full names.
time_chunksGet the time chunk size.
topGet the array bounding box top coordinate.
transformGet the data transform (cell x, 0, left, 0, cell y, top)
unary_unionGet a representation of the union of the image bounds.
wavelengthsGet a dictionary of sensor wavelengths.
Methods
apply(filename, user_func[, n_jobs])Applies a user function to an Xarray Dataset or DataArray and writes to file.
assign_nodata_attrs(nodata)Assigns 'no data' attributes.
avi([nodata, mask, sensor, scale_factor])Calculates the advanced vegetation index
band_mask(valid_bands[, src_nodata, ...])Creates a mask from band nonzeros.
bounds_overlay(bounds[, how])Checks whether the bounds overlay the image bounds.
calc_area(values[, op, units, row_chunks, ...])Calculates the area of data values.
check_chunksize(chunksize, array_size)Ensures the chunk size is a multiple of 16 and fits within the array dimension.
clip(df[, query, mask_data, expand_by])Clips a DataArray by vector polygon geometry.
clip_by_polygon(df[, query, mask_data, ...])Clips a DataArray by vector polygon geometry.
compare(op, b[, return_binary])Comparison operation.
compute(**kwargs)Computes data.
detect(detector, **kwargs)Run tiled, georeferenced object detection over this raster.
evi([nodata, mask, sensor, scale_factor])Calculates the enhanced vegetation index
evi2([nodata, mask, sensor, scale_factor])Calculates the two-band modified enhanced vegetation index
extract(aoi[, bands, time_names, ...])Extracts data within an area or points of interest.
gcvi([nodata, mask, sensor, scale_factor])Calculates the green chlorophyll vegetation index
imshow([mask, nodata, flip, text_color, rot])Shows an image on a plot.
kndvi([nodata, mask, sensor, scale_factor])Calculates the kernel normalized difference vegetation index
mask(df[, query, keep])Masks a DataArray.
Masks 'no data' values with nans.
match_data(data, band_names)Coerces the
xarray.DataArrayto match anotherxarray.DataArray.moving([stat, perc, w, nodata, weights])Applies a moving window function to the DataArray.
n_windows([row_chunks, col_chunks])Calculates the number of windows in a row/column iteration.
nbr([nodata, mask, sensor, scale_factor])Calculates the normalized burn ratio
ndvi([nodata, mask, sensor, scale_factor])Calculates the normalized difference vegetation index
norm_brdf(solar_za, solar_az, sensor_za, ...)Applies Bidirectional Reflectance Distribution Function (BRDF) normalization.
norm_diff(b1, b2[, nodata, mask, sensor, ...])Calculates the normalized difference band ratio.
read(band, **kwargs)Reads data for a band or bands.
recode(polygon, to_replace[, num_workers])Recodes a DataArray with polygon mappings.
replace(to_replace)Replace values given in
to_replacewith value.sample([method, band, n, strata, spacing, ...])Generates samples from a raster.
save(filename[, overwrite, scatter, client, ...])Saves a DataArray to raster using rasterio/dask.
set_nodata([src_nodata, dst_nodata, ...])Sets 'no data' values and applies scaling to an
xarray.DataArray.subset([left, top, right, bottom, rows, ...])Subsets a DataArray.
tasseled_cap([nodata, sensor, scale_factor])Applies a tasseled cap transformation
to_netcdf(filename, *args, **kwargs)Writes an Xarray DataArray to a NetCDF file.
to_polygon([mask, connectivity])Converts a
daskarray to aGeoDataFrameto_raster(filename[, readxsize, readysize, ...])Writes an Xarray DataArray to a raster file.
to_vector(filename[, mask, connectivity])Writes an Xarray DataArray to a vector file.
to_vrt(filename[, overwrite, resampling, ...])Writes a file to a VRT file.
to_yolo_dataset(labels, class_col, out_dir)Write a YOLO-format training dataset from this raster + labels.
transform_crs([dst_crs, dst_res, dst_width, ...])Transforms an
xarray.DataArrayto a new coordinate reference system.wi([nodata, mask, sensor, scale_factor])Calculates the woody vegetation index
windows([row_chunks, col_chunks, ...])Generates windows for a row/column iteration.
- apply(filename, user_func, n_jobs=1, **kwargs)[source]#
Applies a user function to an Xarray Dataset or DataArray and writes to file.
- Parameters:
filename (str | Path) – The output file name to write to.
user_func (func) – The user function to apply.
n_jobs (Optional[int]) – The number of parallel jobs for the cluster.
kwargs (Optional[dict]) – Keyword arguments passed to
to_raster().
Example
>>> import geowombat as gw >>> >>> def user_func(ds_): >>> return ds_.max(axis=0) >>> >>> with gw.open('image.tif', chunks=512) as ds: >>> ds.gw.apply( >>> 'output.tif', >>> user_func, >>> n_jobs=8, >>> overwrite=True, >>> blockxsize=512, >>> blockysize=512 >>> )
- assign_nodata_attrs(nodata)[source]#
Assigns ‘no data’ attributes.
- Parameters:
nodata (float | int) – The ‘no data’ value to assign.
- Return type:
DataArray- Returns:
xarray.DataArray
- avi(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#
Calculates the advanced vegetation index
- Parameters:
data (DataArray) – The
xarray.DataArrayto process.nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
- Return type:
DataArray
Equation:
\[AVI = {(NIR \times (1.0 - red) \times (NIR - red))}^{0.3334}\]- Returns:
Data range: 0 to 1
- Return type:
xarray.DataArray- Parameters:
nodata (float | int) –
mask (bool) –
sensor (str | None) –
scale_factor (float | None) –
- band_mask(valid_bands, src_nodata=None, dst_clear_val=0, dst_mask_val=1)[source]#
Creates a mask from band nonzeros.
- Parameters:
valid_bands (list) – The bands considered valid.
src_nodata (Optional[float | int]) – The source ‘no data’ value.
dst_clear_val (Optional[int]) – The destination clear value.
dst_mask_val (Optional[int]) – The destination mask value.
- Return type:
DataArray- Returns:
xarray.DataArray
- bounds_overlay(bounds, how='intersects')[source]#
Checks whether the bounds overlay the image bounds.
- Parameters:
bounds (tuple | rasterio.coords.BoundingBox | shapely.geometry) – The bounds to check. If given as a tuple, the order should be (left, bottom, right, top).
how (Optional[str]) – Choices are any
shapely.geometrybinary predicates.
- Return type:
bool- Returns:
bool
Example
>>> import geowombat as gw >>> >>> bounds = (left, bottom, right, top) >>> >>> with gw.open('image.tif') as src >>> intersects = src.gw.bounds_overlay(bounds) >>> >>> from rasterio.coords import BoundingBox >>> >>> bounds = BoundingBox(left, bottom, right, top) >>> >>> with gw.open('image.tif') as src >>> contains = src.gw.bounds_overlay(bounds, how='contains')
- calc_area(values, op='eq', units='km2', row_chunks=None, col_chunks=None, n_workers=1, n_threads=1, scheduler='threads', n_chunks=100)[source]#
Calculates the area of data values.
- Parameters:
values (list) – A list of values.
op (Optional[str]) – The value sign. Choices are [‘gt’, ‘ge’, ‘lt’, ‘le’, ‘eq’].
units (Optional[str]) – The units to return. Choices are [‘km2’, ‘ha’].
row_chunks (Optional[int]) – The row chunk size to process in parallel.
col_chunks (Optional[int]) – The column chunk size to process in parallel.
n_workers (Optional[int]) – The number of parallel workers for
scheduler.n_threads (Optional[int]) – The number of parallel threads for
dask.compute().scheduler (Optional[str]) –
The parallel task scheduler to use. Choices are [‘processes’, ‘threads’, ‘mpool’].
mpool: process pool of workers using
multiprocessing.Poolprocesses: process pool of workers usingconcurrent.futuresthreads: thread pool of workers usingconcurrent.futuresn_chunks (Optional[int]) – The chunk size of windows. If not given, equal to
n_workersx 50.
- Return type:
DataFrame- Returns:
pandas.DataFrame
Example
>>> import geowombat as gw >>> >>> # Read a land cover image with 512x512 chunks >>> with gw.open('land_cover.tif', chunks=512) as src: >>> >>> df = src.gw.calc_area( >>> [1, 2, 5], # calculate the area of classes 1, 2, and 5 >>> units='km2', # return area in kilometers squared >>> n_workers=4, >>> row_chunks=1024, # iterate over larger chunks to use 512 chunks in parallel >>> col_chunks=1024 >>> )
- check_chunksize(chunksize, array_size)[source]#
Ensures the chunk size is a multiple of 16 and fits within the array dimension.
- Parameters:
chunksize (int) – The chunk size to check.
array_size (int) – The array dimension size to check against.
- Return type:
int- Returns:
int
- clip(df, query=None, mask_data=False, expand_by=0)[source]#
Clips a DataArray by vector polygon geometry.
Deprecated since version 2.1.7: Use
xarray.DataArray.gw.clip_by_polygon().- Parameters:
df (GeoDataFrame) – The
geopandas.GeoDataFrameto clip to.query (Optional[str]) – A query to apply to
df.mask_data (Optional[bool]) – Whether to mask values outside of the
dfgeometry envelope.expand_by (Optional[int]) – Expand the clip array bounds by
expand_bypixels on each side.
- Returns:
xarray.DataArray
- clip_by_polygon(df, query=None, mask_data=False, expand_by=0)[source]#
Clips a DataArray by vector polygon geometry.
- Parameters:
df (GeoDataFrame) – The
geopandas.GeoDataFrameto clip to.query (Optional[str]) – A query to apply to
df.mask_data (Optional[bool]) – Whether to mask values outside of the
dfgeometry envelope.expand_by (Optional[int]) – Expand the clip array bounds by
expand_bypixels on each side.
- Return type:
DataArray- Returns:
xarray.DataArray
Examples
>>> import geowombat as gw >>> >>> with gw.open('image.tif') as ds: >>> ds = ds.gw.clip_by_polygon(df, query="Id == 1")
- compare(op, b, return_binary=False)[source]#
Comparison operation.
- Parameters:
op (str) – The comparison operation.
b (float | int) – The value to compare to.
return_binary (Optional[bool]) – Whether to return a binary (1 or 0) array.
- Returns:
Valid data where
opmeets criteriab, otherwise nans.- Return type:
xarray.DataArray
Example
>>> import geowombat as gw >>> >>> with gw.open('image.tif') as src: >>> # Mask all values greater than 10 >>> thresh = src.gw.compare(op='lt', b=10)
- compute(**kwargs)[source]#
Computes data.
- Parameters:
kwargs (Optional[dict]) – Keyword arguments to pass to
dask.compute().- Return type:
ndarray- Returns:
numpy.ndarray
- property data_are_separate: bool#
Checks whether the data are loaded separately.
- Returns:
bool
- property data_are_stacked: bool#
Checks whether the data are stacked.
- Returns:
bool
- detect(detector, **kwargs)[source]#
Run tiled, georeferenced object detection over this raster.
Thin accessor over
detector.predict(src, **kwargs)so that detection follows the samesrc.gw.<method>(...)shape assrc.gw.ndvi()orsrc.gw.extract(). The detector instance is built once (loading model weights) and passed in.- Parameters:
detector – A
YOLODetectororTorchGeoDetectorinstance (seegeowombat.detect).**kwargs – Forwarded to
detector.predict— typical args aretile_size,overlap,conf,band_indices,scale,nms_iou,max_det,progress. Ifband_indicesis omitted it is resolved from the activegw.config(sensor=...)band names.
- Returns:
geopandas.GeoDataFrame— one row per detection withgeometry,class_id,class_name,scoreandtile_idcolumns, in the source CRS.
Examples
>>> import geowombat as gw >>> from geowombat.detect import YOLODetector >>> >>> det = YOLODetector(weights='yolov8n.pt') >>> with gw.config.update(sensor='rgb'): ... with gw.open('aerial.tif', chunks=512) as src: ... preds = src.gw.detect(det, conf=0.25)
- evi(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#
Calculates the enhanced vegetation index
- Parameters:
data (DataArray) – The
xarray.DataArrayto process.nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
- Return type:
DataArray
Equation:
\[EVI = 2.5 \times \frac{NIR - red}{NIR \times 6 \times red - 7.5 \times blue + 1}\]- Returns:
Data range: 0 to 1
- Return type:
xarray.DataArray- Parameters:
nodata (float | int) –
mask (bool) –
sensor (str | None) –
scale_factor (float | None) –
- evi2(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#
Calculates the two-band modified enhanced vegetation index
- Parameters:
data (DataArray) – The
xarray.DataArrayto process.nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
- Return type:
DataArray
Equation:
\[EVI2 = 2.5 \times \frac{NIR - red}{NIR + 1 + 2.4 \times red}\]- Returns:
Data range: 0 to 1
- Return type:
xarray.DataArray- Parameters:
nodata (float | int) –
mask (bool) –
sensor (str | None) –
scale_factor (float | None) –
- extract(aoi, bands=None, time_names=None, band_names=None, frac=1.0, min_frac_area=None, all_touched=False, id_column='id', time_format='%Y%m%d', mask=None, n_jobs=8, verbose=0, n_workers=1, n_threads=-1, use_client=False, address=None, total_memory=24, processes=False, pool_kwargs=None, **kwargs)[source]#
Extracts data within an area or points of interest. Projections do not need to match, as they are handled ‘on-the-fly’.
- Parameters:
aoi (str or GeoDataFrame) – A file or
geopandas.GeoDataFrameto extract data frame.bands (Optional[int or 1d array-like]) – A band or list of bands to extract. If not given, all bands are used. Bands should be GDAL-indexed (i.e., the first band is 1, not 0).
band_names (Optional[list]) – A list of band names. Length should be the same as bands.
time_names (Optional[list]) – A list of time names.
frac (Optional[float]) – A fractional subset of points to extract in each polygon feature.
min_frac_area (Optional[int | float]) – A minimum polygon area to use
frac. Otherwise, use all samples within a polygon.all_touched (Optional[bool]) – The
all_touchedargument is passed torasterio.features.rasterize().id_column (Optional[str]) – The id column name.
time_format (Optional[str]) – The
datetimeconversion format iftime_namesaredatetimeobjects.mask (Optional[GeoDataFrame or Shapely Polygon]) – A
shapely.geometry.Polygonmask to subset to.n_jobs (Optional[int]) – The number of features to rasterize in parallel.
verbose (Optional[int]) – The verbosity level.
n_workers (Optional[int]) – The number of process workers. Only applies when
use_client=True.n_threads (Optional[int]) – The number of thread workers. Only applies when
use_client=True.use_client (Optional[bool]) – Whether to use a
daskclient.address (Optional[str]) – A cluster address to pass to client. Only used when
use_client=True.total_memory (Optional[int]) – The total memory (in GB) required when
use_client=True.processes (Optional[bool]) – Whether to use process workers with the
dask.distributedclient. Only applies whenuse_client=True.pool_kwargs (Optional[dict]) – Keyword arguments passed to
multiprocessing.Pool().imap().kwargs (Optional[dict]) – Keyword arguments passed to
dask.compute().
- Return type:
GeoDataFrame- Returns:
geopandas.GeoDataFrame
Examples
>>> import geowombat as gw >>> >>> with gw.open('image.tif') as src: >>> df = src.gw.extract('poly.gpkg') >>> >>> # On a cluster >>> # Use a local cluster >>> with gw.open('image.tif') as src: >>> df = src.gw.extract('poly.gpkg', use_client=True, n_threads=16) >>> >>> # Specify the client address with a local cluster >>> with LocalCluster( >>> n_workers=1, >>> threads_per_worker=8, >>> scheduler_port=0, >>> processes=False, >>> memory_limit='4GB' >>> ) as cluster: >>> >>> with gw.open('image.tif') as src: >>> df = src.gw.extract( >>> 'poly.gpkg', >>> use_client=True, >>> address=cluster >>> )
- property filenames: Sequence[str | Path]#
Gets the data filenames.
- Returns:
list
- gcvi(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#
Calculates the green chlorophyll vegetation index
- Parameters:
data (DataArray) – The
xarray.DataArrayto process.nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
- Return type:
DataArray
Equation:
\[GCVI = \frac{NIR}{green} - 1\]- Returns:
Data range: -1 to 1
- Return type:
xarray.DataArray- Parameters:
nodata (float | int) –
mask (bool) –
sensor (str | None) –
scale_factor (float | None) –
- imshow(mask=False, nodata=0, flip=False, text_color='black', rot=30, **kwargs)[source]#
Shows an image on a plot.
- Parameters:
mask (Optional[bool]) – Whether to mask ‘no data’ values (given by
nodata).nodata (Optional[int or float]) – The ‘no data’ value.
flip (Optional[bool]) – Whether to flip an RGB array’s band order.
text_color (Optional[str]) – The text color.
rot (Optional[int]) – The degree rotation for the x-axis tick labels.
kwargs (Optional[dict]) – Keyword arguments passed to
xarray.plot.imshow.
- Return type:
None- Returns:
None
Examples
>>> with gw.open('image.tif') as ds: >>> ds.gw.imshow(band_names=['red', 'green', 'red'], mask=True, vmin=0.1, vmax=0.9, robust=True)
- kndvi(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#
Calculates the kernel normalized difference vegetation index
- Parameters:
data (DataArray) – The
xarray.DataArrayto process.nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
- Return type:
DataArray
Equation:
\[kNDVI = tanh({NDVI}^2)\]- Returns:
Data range: -1 to 1
- Return type:
xarray.DataArray- Parameters:
nodata (float | int) –
mask (bool) –
sensor (str | None) –
scale_factor (float | None) –
- mask(df, query=None, keep='in')[source]#
Masks a DataArray.
- Parameters:
df (GeoDataFrame or str) – The
geopandas.GeoDataFrameor filename to use for masking.query (Optional[str]) – A query to apply to
df.keep (Optional[str]) – If
keep= ‘in’, mask values outside of the geometry (keep inside). Otherwise, ifkeep= ‘out’, mask values inside (keep outside).
- Return type:
DataArray- Returns:
xarray.DataArray
- mask_nodata()[source]#
Masks ‘no data’ values with nans.
- Return type:
DataArray- Returns:
xarray.DataArray
- match_data(data, band_names)[source]#
Coerces the
xarray.DataArrayto match anotherxarray.DataArray.- Parameters:
data (DataArray) – The
xarray.DataArrayto match to.band_names (1d array-like) – The output band names.
- Return type:
DataArray- Returns:
xarray.DataArray
Example
>>> import geowombat as gw >>> import xarray as xr >>> >>> other_array = xr.DataArray() >>> >>> with gw.open('image.tif') as src: >>> new_array = other_array.gw.match_data(src, ['bd1'])
- moving(stat='mean', perc=50, w=3, nodata=None, weights=False)[source]#
Applies a moving window function to the DataArray.
- Parameters:
stat (Optional[str]) – The statistic to compute. Choices are [‘mean’, ‘std’, ‘var’, ‘min’, ‘max’, ‘perc’].
perc (Optional[int]) – The percentile to return if
stat= ‘perc’.w (Optional[int]) – The moving window size (in pixels).
nodata (Optional[int or float]) – A ‘no data’ value to ignore.
weights (Optional[bool]) – Whether to weight values by distance from window center.
- Return type:
DataArray- Returns:
xarray.DataArray
Examples
>>> import geowombat as gw >>> >>> # Calculate the mean within a 5x5 window >>> with gw.open('image.tif') as src: >>> res = src.gw.moving(stat='mean', w=5, nodata=32767.0) >>> >>> # Calculate the 90th percentile within a 15x15 window >>> with gw.open('image.tif') as src: >>> res = src.gw.moving(stat='perc', w=15, perc=90, nodata=32767.0) >>> res.data.compute(num_workers=4)
- n_windows(row_chunks=None, col_chunks=None)[source]#
Calculates the number of windows in a row/column iteration.
- Parameters:
row_chunks (Optional[int]) – The row chunk size. If not given, defaults to opened DataArray chunks.
col_chunks (Optional[int]) – The column chunk size. If not given, defaults to opened DataArray chunks.
- Return type:
int- Returns:
int
- nbr(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#
Calculates the normalized burn ratio
- Parameters:
data (DataArray) – The
xarray.DataArrayto process.nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
- Return type:
DataArray
Equation:
\[NBR = \frac{NIR - SWIR1}{NIR + SWIR1}\]- Returns:
Data range: -1 to 1
- Return type:
xarray.DataArray- Parameters:
nodata (float | int) –
mask (bool) –
sensor (str | None) –
scale_factor (float | None) –
- ndvi(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#
Calculates the normalized difference vegetation index
- Parameters:
data (DataArray) – The
xarray.DataArrayto process.nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
- Return type:
DataArray
Equation:
\[NDVI = \frac{NIR - red}{NIR + red}\]- Returns:
Data range: -1 to 1
- Return type:
xarray.DataArray- Parameters:
nodata (float | int) –
mask (bool) –
sensor (str | None) –
scale_factor (float | None) –
- norm_brdf(solar_za, solar_az, sensor_za, sensor_az, sensor=None, wavelengths=None, nodata=None, mask=None, scale_factor=1.0, scale_angles=True)[source]#
Applies Bidirectional Reflectance Distribution Function (BRDF) normalization.
- Parameters:
solar_za (2d DataArray) – The solar zenith angles (degrees).
solar_az (2d DataArray) – The solar azimuth angles (degrees).
sensor_za (2d DataArray) – The sensor azimuth angles (degrees).
sensor_az (2d DataArray) – The sensor azimuth angles (degrees).
sensor (Optional[str]) – The satellite sensor.
wavelengths (str list) – The wavelength(s) to normalize.
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[DataArray]) – A data mask, where clear values are 0.
scale_factor (Optional[float]) – A scale factor to apply to the input data.
scale_angles (Optional[bool]) – Whether to scale the pixel angle arrays.
- Returns:
xarray.DataArray
Examples
>>> import geowombat as gw >>> >>> # Example where pixel angles are stored in separate GeoTiff files >>> with gw.config.update(sensor='l7', scale_factor=0.0001, nodata=0): >>> >>> with gw.open('solarz.tif') as solarz, >>> gw.open('solara.tif') as solara, >>> gw.open('sensorz.tif') as sensorz, >>> gw.open('sensora.tif') as sensora: >>> >>> with gw.open('landsat.tif') as ds: >>> ds_brdf = ds.gw.norm_brdf(solarz, solara, sensorz, sensora)
- norm_diff(b1, b2, nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#
Calculates the normalized difference band ratio.
- Parameters:
b1 (str) – The band name of the first band.
b2 (str) – The band name of the second band.
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
- Return type:
DataArray
Equation:
\[{norm}_{diff} = \frac{b2 - b1}{b2 + b1}\]- Returns:
Data range: -1 to 1
- Return type:
xarray.DataArray- Parameters:
b1 (Any) –
b2 (Any) –
nodata (float | int) –
mask (bool) –
sensor (str | None) –
scale_factor (float | None) –
- read(band, **kwargs)[source]#
Reads data for a band or bands.
- Parameters:
band (int | list) – A band or list of bands to read.
- Return type:
ndarray- Returns:
xarray.DataArray
- recode(polygon, to_replace, num_workers=1)[source]#
Recodes a DataArray with polygon mappings.
- Parameters:
polygon (GeoDataFrame | str) – The
geopandas.DataFrameor file with polygon geometry.to_replace (dict) –
How to find the values to replace. Dictionary mappings should be given as {from: to} pairs. If
to_replaceis an integer/string mapping, the to string should be ‘mode’.- {1: 5}:
recode values of 1 to 5
- {1: ‘mode’}:
recode values of 1 to the polygon mode
num_workers (Optional[int]) – The number of parallel Dask workers (only used if
to_replacehas a ‘mode’ mapping).
- Return type:
DataArray- Returns:
xarray.DataArray
Example
>>> import geowombat as gw >>> >>> with gw.open('image.tif', chunks=512) as ds: >>> # Recode 1 with 5 within a polygon >>> res = ds.gw.recode('poly.gpkg', {1: 5})
- replace(to_replace)[source]#
Replace values given in
to_replacewith value.- Parameters:
to_replace (dict) –
How to find the values to replace. Dictionary mappings should be given as {from: to} pairs. If
to_replaceis an integer/string mapping, the to string should be ‘mode’.- {1: 5}:
recode values of 1 to 5
- {1: ‘mode’}:
recode values of 1 to the polygon mode
- Return type:
DataArray- Returns:
xarray.DataArray
Example
>>> import geowombat as gw >>> >>> with gw.open('image.tif', chunks=512) as ds: >>> # Replace 1 with 5 >>> res = ds.gw.replace({1: 5})
- sample(method='random', band=None, n=None, strata=None, spacing=None, min_dist=None, max_attempts=10, num_workers=1, verbose=1, **kwargs)[source]#
Generates samples from a raster.
- Parameters:
data (DataArray) – The
xarray.DataArrayto extract data from.method (Optional[str]) – The sampling method. Choices are [‘random’, ‘systematic’].
band (Optional[int or str]) – The band name to extract from. Only required if
method= ‘random’ andstratais given.n (Optional[int]) – The total number of samples. Only required if
method= ‘random’.strata (Optional[dict]) –
The strata to sample within. The dictionary key–>value pairs should be {‘conditional,value’: proportion}.
E.g.,
strata = {‘==,1’: 0.5, ‘>=,2’: 0.5} … would sample 50% of total samples within class 1 and 50% of total samples in class >= 2.
strata = {‘==,1’: 10, ‘>=,2’: 20} … would sample 10 samples within class 1 and 20 samples in class >= 2.
spacing (Optional[float]) – The spacing (in map projection units) when
method= ‘systematic’.min_dist (Optional[float or int]) – A minimum distance allowed between samples. Only applies when
method= ‘random’.max_attempts (Optional[int]) – The maximum numer of attempts to sample points >
min_distfrom each other.num_workers (Optional[int]) – The number of parallel workers for
dask.compute().verbose (Optional[int]) – The verbosity level.
kwargs (Optional[dict]) – Keyword arguments passed to
geowombat.extract.
- Return type:
GeoDataFrame- Returns:
geopandas.GeoDataFrame
Examples
>>> import geowombat as gw >>> >>> # Sample 100 points randomly across the image >>> with gw.open('image.tif') as ds: >>> df = ds.gw.sample(n=100) >>> >>> # Sample points systematically (with 10km spacing) across the image >>> with gw.open('image.tif') as ds: >>> df = ds.gw.sample(method='systematic', spacing=10000.0) >>> >>> # Sample 50% of 100 in class 1 and 50% in classes >= 2 >>> strata = {'==,1': 0.5, '>=,2': 0.5} >>> with gw.open('image.tif') as ds: >>> df = ds.gw.sample(band=1, n=100, strata=strata) >>> >>> # Specify a per-stratum minimum allowed point distance of 1,000 meters >>> with gw.open('image.tif') as ds: >>> df = ds.gw.sample(band=1, n=100, min_dist=1000, strata=strata)
- save(filename, overwrite=False, scatter=None, client=None, compute=True, tags=None, compress='none', compression=None, num_workers=1, log_progress=True, tqdm_kwargs=None, bigtiff=None)[source]#
Saves a DataArray to raster using rasterio/dask.
- Parameters:
filename (str | Path) – The output file name to write to.
nodata (Optional[float | int]) – The ‘no data’ value. If
None(default), the ‘no data’ value is taken from theDataArraymetadata.overwrite (Optional[bool]) – Whether to overwrite an existing file. Default is False.
scatter (Optional[str]) – Scatter ‘band’ or ‘time’ to separate file. Default is None.
client (Optional[Client object]) – A
dask.distributed.Clientclient object to persist data. Default is None.compute (Optinoal[bool]) – Whether to compute and write to
filename. Otherwise, return thedasktask graph. IfTrue, compute and write tofilename. IfFalse, return thedasktask graph. Default isTrue.tags (Optional[dict]) – Metadata tags to write to file. Default is None.
compress (Optional[str]) –
The file compression type. Default is ‘none’, or no compression.
Note
When using a client, it is advised to use threading. E.g.,
dask.distributed.LocalCluster(processes=False). Process-based concurrency could result in corrupted file blocks.compression (Optional[str]) –
The file compression type. Default is ‘none’, or no compression.
Deprecated since version 2.1.4: Use ‘compress’ – ‘compression’ will be removed in >=2.2.0.
num_workers (Optional[int]) – The number of dask workers (i.e., chunks) to write concurrently. Default is 1.
log_progress (Optional[bool]) – Whether to log the progress bar during writing. Default is True.
tqdm_kwargs (Optional[dict]) – Keyword arguments to pass to
tqdm.bigtiff (Optional[str]) – A GDAL BIGTIFF flag. Choices are [“YES”, “NO”, “IF_NEEDED”, “IF_SAFER”].
- Return type:
None- Returns:
None, writes tofilename
Example
>>> import geowombat as gw >>> >>> with gw.open('file.tif') as src: >>> result = ... >>> result.gw.save('output.tif', compress='lzw', num_workers=8)
- set_nodata(src_nodata=None, dst_nodata=None, out_range=None, dtype=None, scale_factor=None, offset=None)[source]#
Sets ‘no data’ values and applies scaling to an
xarray.DataArray.- Parameters:
src_nodata (int | float) – The ‘no data’ values to replace. Default is
None.dst_nodata (int | float) – The ‘no data’ value to set. Default is
nan.out_range (Optional[tuple]) – The output clip range. Default is
None.dtype (Optional[str]) – The output data type. Default is
None.scale_factor (Optional[float | int]) – A scale factor to apply. Default is
None.offset (Optional[float | int]) – An offset to apply. Default is
None.
- Return type:
DataArray- Returns:
xarray.DataArray
Example
>>> import geowombat as gw >>> >>> with gw.open('image.tif') as src: >>> src = src.gw.set_nodata(0, 65535, out_range=(0, 10000), dtype='uint16')
- subset(left=None, top=None, right=None, bottom=None, rows=None, cols=None, center=False, mask_corners=False)[source]#
Subsets a DataArray.
- Parameters:
left (Optional[float]) – The left coordinate.
top (Optional[float]) – The top coordinate.
right (Optional[float]) – The right coordinate.
bottom (Optional[float]) – The bottom coordinate.
rows (Optional[int]) – The number of output rows.
cols (Optional[int]) – The number of output rows.
center (Optional[bool]) – Whether to center the subset on
leftandtop.mask_corners (Optional[bool]) – Whether to mask corners (requires
pymorph).chunksize (Optional[tuple]) – A new chunk size for the output.
- Return type:
DataArray- Returns:
xarray.DataArray
Example
>>> import geowombat as gw >>> >>> with gw.open('image.tif', chunks=512) as ds: >>> ds_sub = ds.gw.subset( >>> left=-263529.884, >>> top=953985.314, >>> rows=2048, >>> cols=2048 >>> )
- tasseled_cap(nodata=None, sensor=None, scale_factor=1.0)[source]#
Applies a tasseled cap transformation
- Parameters:
nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
- Return type:
DataArray- Returns:
xarray.DataArray
Examples
>>> import geowombat as gw >>> >>> with gw.config.update(sensor='qb', scale_factor=0.0001): >>> with gw.open( >>> 'image.tif', band_names=['blue', 'green', 'red', 'nir'] >>> ) as ds: >>> tcap = ds.gw.tasseled_cap()
- to_netcdf(filename, *args, **kwargs)[source]#
Writes an Xarray DataArray to a NetCDF file.
- Parameters:
filename (Path | str) – The output file name to write to.
args (DataArray) – Additional
DataArraysto stack.kwargs (dict) – Encoding arguments.
- Return type:
None
Examples
>>> import geowombat as gw >>> import xarray as xr >>> >>> # Write a single DataArray to a .nc file >>> with gw.config.update(sensor='l7'): >>> with gw.open('LC08_L1TP_225078_20200219_20200225_01_T1.tif') as src: >>> src.gw.to_netcdf('filename.nc', zlib=True, complevel=5) >>> >>> # Add extra layers >>> with gw.config.update(sensor='l7'): >>> with gw.open( >>> 'LC08_L1TP_225078_20200219_20200225_01_T1.tif' >>> ) as src, gw.open( >>> 'LC08_L1TP_225078_20200219_20200225_01_T1_angles.tif', >>> band_names=['zenith', 'azimuth'] >>> ) as ang: >>> src = ( >>> xr.where( >>> src == 0, -32768, src >>> ) >>> .astype('int16') >>> .assign_attrs(**src.attrs) >>> ) >>> >>> src.gw.to_netcdf( >>> 'filename.nc', >>> ang.astype('int16'), >>> zlib=True, >>> complevel=5, >>> _FillValue=-32768 >>> ) >>> >>> # Open the data and convert to a DataArray >>> with xr.open_dataset( >>> 'filename.nc', engine='h5netcdf', chunks=256 >>> ) as ds: >>> src = ds.to_array(dim='band')
- to_polygon(mask=None, connectivity=4)[source]#
Converts a
daskarray to aGeoDataFrame- Parameters:
mask (Optional[numpy ndarray or rasterio Band object]) – Must evaluate to bool (
rasterio.bool_orrasterio.uint8). Values of False or 0 will be excluded from feature generation. Note well that this is the inverse sense from Numpy’s, where a mask value of True indicates invalid data in an array. If source is a Numpy masked array and mask is None, the source’s mask will be inverted and used in place of mask.connectivity (Optional[int]) – Use 4 or 8 pixel connectivity for grouping pixels into features.
- Return type:
GeoDataFrame- Returns:
geopandas.GeoDataFrame
Example
>>> import geowombat as gw >>> >>> with gw.open('image.tif') as src: >>> >>> # Convert the input image to a GeoDataFrame >>> df = src.gw.to_polygon(mask='source', num_workers=8)
- to_raster(filename, readxsize=None, readysize=None, separate=False, out_block_type='gtiff', keep_blocks=False, verbose=0, overwrite=False, gdal_cache=512, scheduler='processes', n_jobs=1, n_workers=None, n_threads=None, n_chunks=None, overviews=False, resampling='nearest', driver='GTiff', nodata=None, blockxsize=512, blockysize=512, tags=None, **kwargs)[source]#
Writes an Xarray DataArray to a raster file.
Note
We advise using
save()in place of this method.- Parameters:
filename (str) – The output file name to write to.
readxsize (Optional[int]) – The size of column chunks to read. If not given,
readxsizedefaults to Dask chunk size.readysize (Optional[int]) – The size of row chunks to read. If not given,
readysizedefaults to Dask chunk size.separate (Optional[bool]) – Whether to write blocks as separate files. Otherwise, write to a single file.
out_block_type (Optional[str]) – The output block type. Choices are [‘gtiff’, ‘zarr’]. Only used if
separate=True.keep_blocks (Optional[bool]) – Whether to keep the blocks stored on disk. Only used if
separate=True.verbose (Optional[int]) – The verbosity level.
overwrite (Optional[bool]) – Whether to overwrite an existing file.
gdal_cache (Optional[int]) – The
GDALcache size (in MB).scheduler (Optional[str]) – The
concurrent.futuresscheduler to use. Choices are [‘processes’, ‘threads’].n_jobs (Optional[int]) – The total number of parallel jobs.
n_workers (Optional[int]) – The number of processes.
n_threads (Optional[int]) – The number of threads.
n_chunks (Optional[int]) – The chunk size of windows. If not given, equal to
n_workersx 3.overviews (Optional[bool or list]) – Whether to build overview layers.
resampling (Optional[str]) – The resampling method for overviews when
overviewsisTrueor alist. Choices are [‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘nearest’].driver (Optional[str]) – The raster driver.
nodata (Optional[int]) – A ‘no data’ value.
blockxsize (Optional[int]) – The output x block size. Ignored if
separate=True.blockysize (Optional[int]) – The output y block size. Ignored if
separate=True.tags (Optional[dict]) – Image tags to write to file.
kwargs (Optional[dict]) – Additional keyword arguments to pass to
rasterio.write.
- Return type:
None- Returns:
None
Examples
>>> import geowombat as gw >>> >>> # Use dask.compute() >>> with gw.open('input.tif') as ds: >>> ds.gw.to_raster('output.tif', n_jobs=8) >>> >>> # Use a dask client >>> with gw.open('input.tif') as ds: >>> ds.gw.to_raster('output.tif', use_client=True, n_workers=8, n_threads=4) >>> >>> # Compress the output >>> with gw.open('input.tif') as ds: >>> ds.gw.to_raster('output.tif', n_jobs=8, compress='lzw')
- to_vector(filename, mask=None, connectivity=4)[source]#
Writes an Xarray DataArray to a vector file.
- Parameters:
filename (str) – The output file name to write to.
mask (numpy ndarray or rasterio Band object, optional) – Must evaluate to bool (
rasterio.bool_orrasterio.uint8). Values of False or 0 will be excluded from feature generation. Note well that this is the inverse sense from Numpy’s, where a mask value of True indicates invalid data in an array. If source is a Numpy masked array and mask is None, the source’s mask will be inverted and used in place of mask.connectivity (Optional[int]) – Use 4 or 8 pixel connectivity for grouping pixels into features.
- Return type:
None- Returns:
None
- to_vrt(filename, overwrite=False, resampling=None, nodata=None, init_dest_nodata=True, warp_mem_limit=128)[source]#
Writes a file to a VRT file.
- Parameters:
filename (str | Path) – The output file name to write to.
overwrite (Optional[bool]) – Whether to overwrite an existing VRT file.
resampling (Optional[object]) – The resampling algorithm for
rasterio.vrt.WarpedVRT.nodata (Optional[float or int]) – The ‘no data’ value for
rasterio.vrt.WarpedVRT.init_dest_nodata (Optional[bool]) – Whether or not to initialize output to
nodataforrasterio.vrt.WarpedVRT.warp_mem_limit (Optional[int]) – The GDAL memory limit for
rasterio.vrt.WarpedVRT.
- Return type:
None
Examples
>>> import geowombat as gw >>> from rasterio.enums import Resampling >>> >>> # Transform a CRS and save to VRT >>> with gw.config.update(ref_crs=102033): >>> with gw.open('image.tif') as src: >>> src.gw.to_vrt( >>> 'output.vrt', >>> resampling=Resampling.cubic, >>> warp_mem_limit=256 >>> ) >>> >>> # Load multiple files set to a common geographic extent >>> bounds = (left, bottom, right, top) >>> with gw.config.update(ref_bounds=bounds): >>> with gw.open( >>> ['image1.tif', 'image2.tif'], mosaic=True >>> ) as src: >>> src.gw.to_vrt('output.vrt')
- to_yolo_dataset(labels, class_col, out_dir, tile_size=640, overlap=0.1, **kwargs)[source]#
Write a YOLO-format training dataset from this raster + labels.
Accessor wrapper around
geowombat.detect.build_datasetso users can stay inside thewith gw.open(...) as src:flow.- Parameters:
labels –
geopandas.GeoDataFrame, path, or URL of vector labels. Reprojected to the raster CRS automatically.class_col (str) – Column in
labelsholding class name/id.out_dir (str | Path) – Output directory. Ultralytics layout (
images/{train,val}+labels/{train,val}+ adata.yaml) is written under it.tile_size (int) – Tile edge in pixels. Default 640.
overlap (float) – Fractional overlap between tiles. Default 0.1.
**kwargs – Forwarded to
build_dataset— e.g.val_split,min_box_pixels,background_ratio,band_indices,scale,oriented,image_format,seed,class_names. Ifband_indicesis omitted it is resolved from the activegw.config(sensor=...).
- Returns:
dictsummary with keysout_dir,classes,n_train,n_val,n_boxes.
Examples
>>> import geopandas as gpd, geowombat as gw >>> gdf = gpd.read_file('buildings.gpkg') >>> with gw.open('naip.tif', chunks=512) as src: ... summary = src.gw.to_yolo_dataset( ... gdf, class_col='class_name', ... out_dir='./yolo', tile_size=640, ... )
- transform_crs(dst_crs=None, dst_res=None, dst_width=None, dst_height=None, dst_bounds=None, src_nodata=None, dst_nodata=None, coords_only=False, resampling='nearest', warp_mem_limit=512, num_threads=1)[source]#
Transforms an
xarray.DataArrayto a new coordinate reference system.- Parameters:
dst_crs (Optional[CRS | int | dict | str]) – The destination CRS.
dst_res (Optional[tuple]) – The destination resolution.
dst_width (Optional[int]) – The destination width. Cannot be used with
dst_res.dst_height (Optional[int]) – The destination height. Cannot be used with
dst_res.dst_bounds (Optional[BoundingBox | tuple]) – The destination bounds, as a
rasterio.coords.BoundingBoxor as a tuple of (left, bottom, right, top).src_nodata (Optional[int | float]) – The source nodata value. Pixels with this value will not be used for interpolation. If not set, it will default to the nodata value of the source image if a masked ndarray or rasterio band, if available.
dst_nodata (Optional[int | float]) – The nodata value used to initialize the destination; it will remain in all areas not covered by the reprojected source. Defaults to the nodata value of the destination image (if set), the value of src_nodata, or 0 (GDAL default).
coords_only (Optional[bool]) – Whether to return transformed coordinates. If
coords_only=Truethen the array is not warped and the size is unchanged. It also avoids in-memory computations.resampling (Optional[str]) – The resampling method if
filenameis alist. Choices are [‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘nearest’].warp_mem_limit (Optional[int]) – The warp memory limit.
num_threads (Optional[int]) – The number of parallel threads.
- Return type:
DataArray- Returns:
xarray.DataArray
Example
>>> import geowombat as gw >>> >>> with gw.open('image.tif') as src: >>> dst = src.gw.transform_crs(4326)
- wi(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#
Calculates the woody vegetation index
- Parameters:
data (DataArray) – The
xarray.DataArrayto process.nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.
mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor.
scale_factor (Optional[float]) – A scale factor to apply to the data.
- Return type:
DataArray
Equation:
\[WI = \Biggl \lbrace { 0,\text{ if } { red + SWIR1 \ge 0.5 } \atop 1 - \frac{red + SWIR1}{0.5}, \text{ otherwise } }\]- Returns:
Data range: 0 to 1
- Return type:
xarray.DataArray- Parameters:
nodata (float | int) –
mask (bool) –
sensor (str | None) –
scale_factor (float | None) –
- windows(row_chunks=None, col_chunks=None, return_type='window', ndim=2)[source]#
Generates windows for a row/column iteration.
- Parameters:
row_chunks (Optional[int]) – The row chunk size. If not given, defaults to opened DataArray chunks.
col_chunks (Optional[int]) – The column chunk size. If not given, defaults to opened DataArray chunks.
return_type (Optional[str]) – The data to return. Choices are [‘data’, ‘slice’, ‘window’].
ndim (Optional[int]) – The number of required dimensions if
return_type= ‘data’ or ‘slice’.
- Returns:
yieldsxarray.DataArray,tuple, orrasterio.windows.Window
Classes#
|
A method to access a |
geowombat Package#
- geowombat.apply(infile, outfile, block_func, args=None, count=1, scheduler='processes', gdal_cache=512, n_jobs=4, overwrite=False, tags=None, **kwargs)[source]#
Applies a function and writes results to file.
- Parameters:
infile (str) – The input file to process.
outfile (str) – The output file.
block_func (func) – The user function to apply to each block. The function should always return the window, the data, and at least one argument. The block data inside the function will be a 2d array if the input image has 1 band, otherwise a 3d array.
args (Optional[tuple]) – Additional arguments to pass to
block_func.count (Optional[int]) – The band count for the output file.
scheduler (Optional[str]) – The
concurrent.futuresscheduler to use. Choices are [‘threads’, ‘processes’].gdal_cache (Optional[int]) – The
GDALcache size (in MB).n_jobs (Optional[int]) – The number of blocks to process in parallel.
overwrite (Optional[bool]) – Whether to overwrite an existing output file.
tags (Optional[dict]) – Image tags to write to file.
kwargs (Optional[dict]) – Additional keyword arguments to pass to
rasterio.open.
- Returns:
None, writes tooutfile
Examples
>>> import geowombat as gw >>> >>> # Here is a function with no arguments >>> def my_func0(w, block, arg): >>> return w, block >>> >>> gw.apply('input.tif', >>> 'output.tif', >>> my_func0, >>> n_jobs=8) >>> >>> # Here is a function with 1 argument >>> def my_func1(w, block, arg): >>> return w, block * arg >>> >>> gw.apply('input.tif', >>> 'output.tif', >>> my_func1, >>> args=(10.0,), >>> n_jobs=8)
- geowombat.array_to_polygon(data, mask=None, connectivity=4, num_workers=1)#
Converts an
xarray.DataArray` to a ``geopandas.GeoDataFrame- Parameters:
data (DataArray) – The
xarray.DataArrayto convert.mask (Optional[str, numpy ndarray, or rasterio Band object]) – Must evaluate to bool (
rasterio.bool_orrasterio.uint8). Values of False or 0 will be excluded from feature generation. Note well that this is the inverse sense from Numpy’s, where a mask value of True indicates invalid data in an array. If source is a Numpy masked array and mask is None, the source’s mask will be inverted and used in place of mask. Ifmaskis equal to ‘source’, thendatais used as the mask.connectivity (Optional[int]) – Use 4 or 8 pixel connectivity for grouping pixels into features.
num_workers (Optional[int]) – The number of parallel workers to send to
dask.compute().
- Return type:
GeoDataFrame- Returns:
geopandas.GeoDataFrame
Example
>>> import geowombat as gw >>> >>> with gw.open('image.tif') as src: >>> >>> # Convert the input image to a GeoDataFrame >>> df = gw.array_to_polygon( >>> src, >>> mask='source', >>> num_workers=8 >>> )
- geowombat.avi(data, nodata=None, mask=False, sensor=None, scale_factor=None)#
Calculates the advanced vegetation index
- Parameters:
data (DataArray) – The
xarray.DataArrayto process.nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If
None, the ‘no data’ value is taken from thexarray.DataArrayattributes.mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor. If
None, the band names should reflect the index being calculated.scale_factor (Optional[float]) – A scale factor to apply to the data. If
None, the scale value is taken from thexarray.DataArrayattributes.
Equation:
\[AVI = {(NIR \times (1.0 - red) \times (NIR - red))}^{0.3334}\]- Returns:
Data range: 0 to 1
- Return type:
xarray.DataArray
- geowombat.bounds_to_coords(bounds, dst_crs)#
Converts bounds from longitude and latitude to native map coordinates.
- Parameters:
bounds (
tuple|rasterio.coords.BoundingBox) – The lat/lon bounds to transform.dst_crs (str, object, or DataArray) – The CRS to transform to. It can be provided as a string, a CRS instance (e.g.,
pyproj.crs.CRS), or ageowombat.DataArray.
- Returns:
(left, bottom, right, top)
- Return type:
tuple
- geowombat.calc_area(data, values, op='eq', units='km2', row_chunks=None, col_chunks=None, n_workers=1, n_threads=1, scheduler='threads', n_chunks=100)#
Calculates the area of data values.
- Parameters:
data (DataArray) – The
xarray.DataArrayto calculate area.values (list) – A list of values.
op (Optional[str]) – The value sign. Choices are [‘gt’, ‘ge’, ‘lt’, ‘le’, ‘eq’].
units (Optional[str]) – The units to return. Choices are [‘km2’, ‘ha’].
row_chunks (Optional[int]) – The row chunk size to process in parallel.
col_chunks (Optional[int]) – The column chunk size to process in parallel.
n_workers (Optional[int]) – The number of parallel workers for
scheduler.n_threads (Optional[int]) – The number of parallel threads for
dask.compute().scheduler (Optional[str]) –
The parallel task scheduler to use. Choices are [‘processes’, ‘threads’, ‘mpool’].
mpool: process pool of workers using
multiprocessing.Poolprocesses: process pool of workers usingconcurrent.futuresthreads: thread pool of workers usingconcurrent.futuresn_chunks (Optional[int]) – The chunk size of windows. If not given, equal to
n_workersx 50.
- Return type:
DataFrame- Returns:
pandas.DataFrame
Example
>>> import geowombat as gw >>> >>> # Read a land cover image with 512x512 chunks >>> with gw.open('land_cover.tif', chunks=512) as src: >>> >>> df = gw.calc_area( >>> src, >>> [1, 2, 5], # calculate the area of classes 1, 2, and 5 >>> units='km2', # return area in kilometers squared >>> n_workers=4, >>> row_chunks=1024, # iterate over larger chunks to use 512 chunks in parallel >>> col_chunks=1024 >>> )
- geowombat.clip(data, df, query=None, mask_data=False, expand_by=0)#
Clips a DataArray by vector polygon geometry.
Deprecated since version 2.1.7: Use
geowombat.clip_by_polygon().- Parameters:
data (DataArray) – The
xarray.DataArrayto subset.df (GeoDataFrame or str) – The
geopandas.GeoDataFrameor filename to clip to.query (Optional[str]) – A query to apply to
df.mask_data (Optional[bool]) – Whether to mask values outside of the
dfgeometry envelope.expand_by (Optional[int]) – Expand the clip array bounds by
expand_bypixels on each side.
- Return type:
DataArray- Returns:
xarray.DataArray
Examples
>>> import geowombat as gw >>> >>> with gw.open('image.tif') as ds: >>> ds = gw.clip(ds, df, query="Id == 1") >>> >>> # or >>> >>> with gw.open('image.tif') as ds: >>> ds = ds.gw.clip(df, query="Id == 1")
- geowombat.clip_by_polygon(data, df, query=None, mask_data=False, expand_by=0)#
Clips a DataArray by vector polygon geometry.
- Parameters:
data (DataArray) – The
xarray.DataArrayto subset.df (GeoDataFrame or str) – The
geopandas.GeoDataFrameor filename to clip to.query (Optional[str]) – A query to apply to
df.mask_data (Optional[bool]) – Whether to mask values outside of the
dfgeometry envelope.expand_by (Optional[int]) – Expand the clip array bounds by
expand_bypixels on each side.
- Return type:
DataArray- Returns:
xarray.DataArray
Examples
>>> import geowombat as gw >>> >>> with gw.open('image.tif') as ds: >>> ds = gw.clip_by_polygon(ds, df, query="Id == 1")
- geowombat.coords_to_indices(x, y, transform)#
Converts map coordinates to array indices.
- Parameters:
x (float or 1d array) – The x coordinates.
y (float or 1d array) – The y coordinates.
transform (object) – The affine transform.
- Returns:
(col_index, row_index)
- Return type:
tuple
Example
>>> import geowombat as gw >>> from geowombat.core import coords_to_indices >>> >>> with gw.open('image.tif') as src: >>> j, i = coords_to_indices(x, y, src)
- geowombat.coregister(target, reference, band_names_reference=None, band_names_target=None, wkt_version=None, **kwargs)#
Co-registers an image, or images, using AROSICS.
While the required inputs are DataArrays, the intermediate results are stored as NumPy arrays. Therefore, memory usage is constrained to the size of the input data. Dask is not used for any of the computation in this function.
- Parameters:
target (DataArray or str) – The target
xarray.DataArrayor file name to co-register toreference.reference (DataArray or str) – The reference
xarray.DataArrayor file name used to co-registertarget.band_names_reference (Optional[list | tuple]) – Band names to open for the reference data.
band_names_target (Optional[list | tuple]) – Band names to open for the target data.
wkt_version (Optional[str]) – The WKT version to use with
to_wkt().kwargs (Optional[dict]) – Keyword arguments passed to
arosics.
- Return type:
DataArray
- Reference:
- Return type:
DataArray- Returns:
xarray.DataArray- Parameters:
target (str | Path | DataArray) –
reference (str | Path | DataArray) –
band_names_reference (Sequence[str] | None) –
band_names_target (Sequence[str] | None) –
wkt_version (str | None) –
Example
>>> import geowombat as gw >>> >>> # Co-register a single image to a reference image >>> with gw.open('target.tif') as tar, gw.open('reference.tif') as ref: >>> results = gw.coregister( >>> tar, ref, q=True, ws=(512, 512), max_shift=3, CPUs=4 >>> ) >>> >>> # or >>> >>> results = gw.coregister( >>> 'target.tif', >>> 'reference.tif', >>> q=True, >>> ws=(512, 512), >>> max_shift=3, >>> CPUs=4 >>> )
- geowombat.evi(data, nodata=None, mask=False, sensor=None, scale_factor=None)#
Calculates the enhanced vegetation index
- Parameters:
data (DataArray) – The
xarray.DataArrayto process.nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If
None, the ‘no data’ value is taken from thexarray.DataArrayattributes.mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor. If
None, the band names should reflect the index being calculated.scale_factor (Optional[float]) – A scale factor to apply to the data. If
None, the scale value is taken from thexarray.DataArrayattributes.
Equation:
\[EVI = 2.5 \times \frac{NIR - red}{NIR + 6 \times red - 7.5 \times blue + 1}\]- Returns:
Data range: 0 to 1
- Return type:
xarray.DataArray
- geowombat.evi2(data, nodata=None, mask=False, sensor=None, scale_factor=None)#
Calculates the two-band modified enhanced vegetation index
- Parameters:
data (DataArray) – The
xarray.DataArrayto process.nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If
None, the ‘no data’ value is taken from thexarray.DataArrayattributes.mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor. If
None, the band names should reflect the index being calculated.scale_factor (Optional[float]) – A scale factor to apply to the data. If
None, the scale value is taken from thexarray.DataArrayattributes.
Equation:
\[EVI2 = 2.5 \times \frac{NIR - red}{NIR + 1 + 2.4 \times red}\]- Reference:
See [JHDM08]
- Returns:
Data range: 0 to 1
- Return type:
xarray.DataArray
- geowombat.extract(data, aoi, bands=None, time_names=None, band_names=None, frac=1.0, min_frac_area=None, all_touched=False, id_column='id', time_format='%Y%m%d', mask=None, n_jobs=8, verbose=0, n_workers=1, n_threads=-1, use_client=False, address=None, total_memory=24, processes=False, pool_kwargs=None, **kwargs)#
Extracts data within an area or points of interest. Projections do not need to match, as they are handled ‘on-the-fly’.
- Parameters:
data (DataArray) – The
xarray.DataArrayto extract data from.aoi (str or GeoDataFrame) – A file or
geopandas.GeoDataFrameto extract data frame.bands (Optional[int or 1d array-like]) – A band or list of bands to extract. If not given, all bands are used. Bands should be GDAL-indexed (i.e., the first band is 1, not 0).
band_names (Optional[list]) – A list of band names. Length should be the same as bands.
time_names (Optional[list]) – A list of time names.
frac (Optional[float]) – A fractional subset of points to extract in each polygon feature.
min_frac_area (Optional[int | float]) – A minimum polygon area to use
frac. Otherwise, use all samples within a polygon.all_touched (Optional[bool]) – The
all_touchedargument is passed torasterio.features.rasterize().id_column (Optional[str]) – The id column name.
time_format (Optional[str]) – The
datetimeconversion format iftime_namesaredatetimeobjects.mask (Optional[GeoDataFrame or Shapely Polygon]) – A
shapely.geometry.Polygonmask to subset to.n_jobs (Optional[int]) – The number of features to rasterize in parallel.
verbose (Optional[int]) – The verbosity level.
n_workers (Optional[int]) – The number of process workers. Only applies when
use_client=True.n_threads (Optional[int]) – The number of thread workers. Only applies when
use_client=True.use_client (Optional[bool]) – Whether to use a
daskclient.address (Optional[str]) – A cluster address to pass to client. Only used when
use_client=True.total_memory (Optional[int]) – The total memory (in GB) required when
use_client=True.processes (Optional[bool]) – Whether to use process workers with the
dask.distributedclient. Only applies whenuse_client=True.pool_kwargs (Optional[dict]) – Keyword arguments passed to
multiprocessing.Pool().imap().kwargs (Optional[dict]) – Keyword arguments passed to
dask.compute().
- Return type:
GeoDataFrame- Returns:
geopandas.GeoDataFrame
Examples
>>> import geowombat as gw >>> >>> with gw.open('image.tif') as src: >>> df = gw.extract(src, 'poly.gpkg') >>> >>> # On a cluster >>> # Use a local cluster >>> with gw.open('image.tif') as src: >>> df = gw.extract(src, 'poly.gpkg', use_client=True, n_threads=16) >>> >>> # Specify the client address with a local cluster >>> with LocalCluster( >>> n_workers=1, >>> threads_per_worker=8, >>> scheduler_port=0, >>> processes=False, >>> memory_limit='4GB' >>> ) as cluster: >>> >>> with gw.open('image.tif') as src: >>> df = gw.extract( >>> src, >>> 'poly.gpkg', >>> use_client=True, >>> address=cluster >>> )
- geowombat.indices_to_coords(col_index, row_index, transform)#
Converts array indices to map coordinates.
- Parameters:
col_index (float or 1d array) – The column index.
row_index (float or 1d array) – The row index.
transform (Affine, DataArray, or tuple) – The affine transform.
- Returns:
(x, y)
- Return type:
tuple
Example
>>> import geowombat as gw >>> from geowombat.core import indices_to_coords >>> >>> with gw.open('image.tif') as src: >>> x, y = indices_to_coords(j, i, src)
- geowombat.kndvi(data, nodata=None, mask=False, sensor=None, scale_factor=None)#
Calculates the kernel normalized difference vegetation index
- Parameters:
data (DataArray) – The
xarray.DataArrayto process.nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If
None, the ‘no data’ value is taken from thexarray.DataArrayattributes.mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor. If
None, the band names should reflect the index being calculated.scale_factor (Optional[float]) – A scale factor to apply to the data. If
None, the scale value is taken from thexarray.DataArrayattributes.
Equation:
\[kNDVI = tanh({NDVI}^2)\]- Reference:
See [CVCTMMartinez+21]
- Returns:
Data range: -1 to 1
- Return type:
xarray.DataArray
- geowombat.load(image_list, time_names, band_names, chunks=512, nodata=65535, in_range=None, out_range=None, data_slice=None, num_workers=1, src=None, scheduler='ray')[source]#
Loads data into memory using
xarray.open_mfdataset()andray. This function does not check data alignments and CRSs. It assumes each image inimage_listhas the same y and x dimensions and that the coordinates align.The
loadfunction cannot be used ifdataclasseswas pip installed.- Parameters:
image_list (list) – The list of image file paths.
time_names (list) – The list of image
datetimeobjects.band_names (list) – The list of bands to open.
chunks (Optional[int]) – The dask chunk size.
nodata (Optional[float | int]) – The ‘no data’ value.
in_range (Optional[tuple]) – The input (min, max) range. If not given, defaults to (0, 10000).
out_range (Optional[tuple]) – The output (min, max) range. If not given, defaults to (0, 1).
data_slice (Optional[tuple]) – The slice object to read, given as (time, bands, rows, columns).
num_workers (Optional[int]) – The number of threads.
scheduler (Optional[str]) – The distributed scheduler. Currently not implemented.
- Returns:
Datetime list, array of (time x bands x rows x columns)
- Return type:
list,numpy.ndarray
Example
>>> import datetime >>> import geowombat as gw >>> >>> image_names = ['LT05_L1TP_227082_19990311_20161220_01_T1.nc', >>> 'LT05_L1TP_227081_19990311_20161220_01_T1.nc', >>> 'LT05_L1TP_227082_19990327_20161220_01_T1.nc'] >>> >>> image_dates = [datetime.datetime(1999, 3, 11, 0, 0), >>> datetime.datetime(1999, 3, 11, 0, 0), >>> datetime.datetime(1999, 3, 27, 0, 0)] >>> >>> data_slice = (slice(0, None), slice(0, None), slice(0, 64), slice(0, 64)) >>> >>> # Load data into memory >>> dates, y = gw.load(image_names, >>> image_dates, >>> ['red', 'nir'], >>> chunks=512, >>> nodata=65535, >>> data_slice=data_slice, >>> num_workers=4)
- geowombat.lonlat_to_xy(lon, lat, dst_crs)#
Converts from longitude and latitude to native map coordinates.
- Parameters:
lon (float) – The longitude to convert.
lat (float) – The latitude to convert.
dst_crs (str, object, or DataArray) – The CRS to transform to. It can be provided as a string, a CRS instance (e.g.,
pyproj.crs.CRS), or ageowombat.DataArray.
- Returns:
(x, y)
- Return type:
tuple
Example
>>> import geowombat as gw >>> from geowombat.core import lonlat_to_xy >>> >>> lon, lat = -55.56822206, -25.46214220 >>> >>> with gw.open('image.tif') as src: >>> x, y = lonlat_to_xy(lon, lat, src)
- geowombat.mask(data, dataframe, query=None, keep='in')#
Masks a DataArray by vector polygon geometry.
- Parameters:
data (DataArray) – The
xarray.DataArrayto mask.dataframe (GeoDataFrame or str) – The
geopandas.GeoDataFrameor filename to use for masking.query (Optional[str]) – A query to apply to
dataframe.keep (Optional[str]) – If
keep= ‘in’, mask values outside of the geometry (keep inside). Otherwise, ifkeep= ‘out’, mask values inside (keep outside).
- Return type:
DataArray- Returns:
xarray.DataArray
Examples
>>> import geowombat as gw >>> >>> with gw.open('image.tif') as ds: >>> ds = ds.gw.mask(df)
- geowombat.moving(data, stat='mean', perc=50, w=3, nodata=None, weights=False)#
Applies a moving window function over Dask array blocks.
- Parameters:
data (DataArray) – The
xarray.DataArrayto process.stat (Optional[str]) – The statistic to compute. Choices are [‘mean’, ‘std’, ‘var’, ‘min’, ‘max’, ‘perc’].
perc (Optional[int]) – The percentile to return if
stat= ‘perc’.w (Optional[int]) – The moving window size (in pixels).
nodata (Optional[int or float]) – A ‘no data’ value to ignore.
weights (Optional[bool]) – Whether to weight values by distance from window center.
- Return type:
DataArray- Returns:
xarray.DataArray
Examples
>>> import geowombat as gw >>> >>> # Calculate the mean within a 5x5 window >>> with gw.open('image.tif') as src: >>> res = gw.moving(ds, stat='mean', w=5, nodata=32767.0) >>> >>> # Calculate the 90th percentile within a 15x15 window >>> with gw.open('image.tif') as src: >>> res = gw.moving(stat='perc', w=15, perc=90, nodata=32767.0) >>> res.data.compute(num_workers=4)
- geowombat.nbr(data, nodata=None, mask=False, sensor=None, scale_factor=None)#
Calculates the normalized burn ratio
- Parameters:
data (DataArray) – The
xarray.DataArrayto process.nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If
None, the ‘no data’ value is taken from thexarray.DataArrayattributes.mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor. If
None, the band names should reflect the index being calculated.scale_factor (Optional[float]) – A scale factor to apply to the data. If
None, the scale value is taken from thexarray.DataArrayattributes.
Equation:
\[NBR = \frac{NIR - SWIR2}{NIR + SWIR2}\]- Returns:
Data range: -1 to 1
- Return type:
xarray.DataArray
- geowombat.ndvi(data, nodata=None, mask=False, sensor=None, scale_factor=None)#
Calculates the normalized difference vegetation index
- Parameters:
data (DataArray) – The
xarray.DataArrayto process.nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If
None, the ‘no data’ value is taken from thexarray.DataArrayattributes.mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor. If
None, the band names should reflect the index being calculated.scale_factor (Optional[float]) – A scale factor to apply to the data. If
None, the scale value is taken from thexarray.DataArrayattributes.
Equation:
\[NDVI = \frac{NIR - red}{NIR + red}\]- Returns:
Data range: -1 to 1
- Return type:
xarray.DataArray
- geowombat.norm_diff(data, b1, b2, sensor=None, nodata=None, mask=False, scale_factor=None)#
Calculates the normalized difference band ratio
- Parameters:
data (DataArray) – The
xarray.DataArrayto process.b1 (str) – The band name of the first band.
b2 (str) – The band name of the second band.
sensor (Optional[str]) – sensor (Optional[str]): The data’s sensor. If
None, the band names should reflect the index being calculated.nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If
None, the ‘no data’ value is taken from thexarray.DataArrayattributes.mask (Optional[bool]) – Whether to mask the results.
scale_factor (Optional[float]) – A scale factor to apply to the data. If
None, the scale value is taken from thexarray.DataArrayattributes.
Equation:
\[{norm}_{diff} = \frac{b2 - b1}{b2 + b1}\]- Returns:
Data range: -1 to 1
- Return type:
xarray.DataArray
- class geowombat.open(filename, band_names=None, time_names=None, stack_dim='time', bounds=None, bounds_by='reference', resampling='nearest', persist_filenames=False, netcdf_vars=None, mosaic=False, overlap='max', nodata=None, scale_factor=None, offset=None, dtype=None, scale_data=False, num_workers=1, **kwargs)[source]#
Bases:
objectOpens one or more raster files.
- Parameters:
filename (str or list) – The file name, search string, or a list of files to open.
band_names (Optional[1d array-like]) – A list of band names if
boundsis given orwindowis given. Default is None.time_names (Optional[1d array-like]) – A list of names to give the time dimension if
boundsis given. Default is None.stack_dim (Optional[str]) – The stack dimension. Choices are [‘time’, ‘band’].
bounds (Optional[1d array-like]) – A bounding box to subset to, given as [minx, maxy, miny, maxx]. Default is None.
bounds_by (Optional[str]) – How to concatenate the output extent if
filenameis alistandmosaic=False. Choices are'intersection'(use the minimum extent of all image bounds),'union'(use the maximum extent of all image bounds), or'reference'(use the bounds of the reference image; if aref_imageis not given, the first image in thefilenamelist is used).resampling (Optional[str]) – The resampling method if
filenameis alist. Choices are [‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘nearest’].persist_filenames (Optional[bool]) – Whether to persist the filenames list with the
xarray.DataArrayattributes. By default,persist_filenames=Falseto avoid storing large file lists.netcdf_vars (Optional[list]) – NetCDF variables to open as a band stack.
mosaic (Optional[bool]) – If
filenameis alist, whether to mosaic the arrays instead of stacking.overlap (Optional[str]) – The keyword that determines how to handle overlapping data if
filenamesis alist. Choices are [‘min’, ‘max’, ‘mean’].nodata (Optional[float | int]) –
A ‘no data’ value to set. Default is
None. IfnodataisNone, the ‘no data’ value is set from the file metadata. Ifnodatais given, then the file ‘no data’ value is overridden. See docstring examples for use ofnodataingeowombat.config.update.Note
The
geowombat.config.updateoverrides this argument. Thus, preference is always given in the following order:geowombat.config.update(nodata not None)open(nodata not None)file ‘no data’ value from metadata ‘_FillValue’ or ‘nodatavals’
scale_factor (Optional[float | int]) –
A scale value to apply to the opened data. The same rules used in
nodataapply. I.e.,Note
The
geowombat.config.updateoverrides this argument. Thus, preference is always given in the following order:geowombat.config.update(scale_factor not None)open(scale_factor not None)file scale value from metadata ‘scales’
offset (Optional[float | int]) –
An offset value to apply to the opened data. The same rules used in
nodataapply. I.e.,Note
The
geowombat.config.updateoverrides this argument. Thus, preference is always given in the following order:geowombat.config.update(offset not None)open(offset not None)file offset value from metadata ‘offsets’
dtype (Optional[str]) – A data type to force the output to. If not given, the data type is extracted from the file.
scale_data (Optional[bool]) –
Whether to apply scaling to the opened data. Default is
False. Scaled data are returned as:scaled = data * gain + offset
See the arguments
nodata,scale_factor, andoffsetfor rules regarding how scaling is applied.num_workers (Optional[int]) – The number of parallel workers for Dask if
boundsis given orwindowis given. Default is 1.kwargs (Optional[dict]) – Keyword arguments passed to the file opener.
- Returns:
xarray.DataArrayorxarray.Dataset
Examples
>>> import geowombat as gw >>> >>> # Open an image >>> with gw.open('image.tif') as ds: >>> print(ds) >>> >>> # Open a list of images, stacking along the 'time' dimension >>> with gw.open(['image1.tif', 'image2.tif']) as ds: >>> print(ds) >>> >>> # Open all GeoTiffs in a directory, stack along the 'time' dimension >>> with gw.open('*.tif') as ds: >>> print(ds) >>> >>> # Use a context manager to handle images of difference sizes and projections >>> with gw.config.update(ref_image='image1.tif'): >>> # Use 'time' names to stack and mosaic non-aligned images with identical dates >>> with gw.open(['image1.tif', 'image2.tif', 'image3.tif'], >>> >>> # The first two images were acquired on the same date >>> # and will be merged into a single time layer >>> time_names=['date1', 'date1', 'date2']) as ds: >>> >>> print(ds) >>> >>> # Mosaic images across space using a reference >>> # image for the CRS and cell resolution >>> with gw.config.update(ref_image='image1.tif'): >>> with gw.open(['image1.tif', 'image2.tif'], mosaic=True) as ds: >>> print(ds) >>> >>> # Mix configuration keywords >>> with gw.config.update(ref_crs='image1.tif', ref_res='image1.tif', ref_bounds='image2.tif'): >>> # The ``bounds_by`` keyword overrides the extent bounds >>> with gw.open(['image1.tif', 'image2.tif'], bounds_by='union') as ds: >>> print(ds) >>> >>> # Resample an image to 10m x 10m cell size >>> with gw.config.update(ref_crs=(10, 10)): >>> with gw.open('image.tif', resampling='cubic') as ds: >>> print(ds) >>> >>> # Open a list of images at a window slice >>> from rasterio.windows import Window >>> # Stack two images, opening band 3 >>> with gw.open( >>> ['image1.tif', 'image2.tif'], >>> band_names=['date1', 'date2'], >>> num_workers=8, >>> indexes=3, >>> window=Window(row_off=0, col_off=0, height=100, width=100), >>> dtype='float32' >>> ) as ds: >>> print(ds) >>> >>> # Scale data upon opening, using the image metadata to get scales and offsets >>> with gw.open('image.tif', scale_data=True) as ds: >>> print(ds) >>> >>> # Scale data upon opening, specifying scales and overriding metadata >>> with gw.open('image.tif', scale_data=True, scale_factor=1e-4) as ds: >>> print(ds) >>> >>> # Scale data upon opening, specifying scales and overriding metadata >>> with gw.config.update(scale_factor=1e-4): >>> with gw.open('image.tif', scale_data=True) as ds: >>> print(ds) >>> >>> # Open a NetCDF variable, specifying a NetCDF prefix and variable to open >>> with gw.open('netcdf:image.nc:blue') as src: >>> print(src) >>> >>> # Open a NetCDF image without access to transforms by providing full file path >>> # NOTE: This will be faster than the above method >>> # as it uses ``xarray.open_dataset`` and bypasses CRS checks. >>> # NOTE: The chunks must be provided by the user. >>> # NOTE: Providing band names will ensure the correct order when reading from a NetCDF dataset. >>> with gw.open( >>> 'image.nc', >>> chunks={'band': -1, 'y': 256, 'x': 256}, >>> band_names=['blue', 'green', 'red', 'nir', 'swir1', 'swir2'], >>> engine='h5netcdf' >>> ) as src: >>> print(src) >>> >>> # Open multiple NetCDF variables as an array stack >>> with gw.open('netcdf:image.nc', netcdf_vars=['blue', 'green', 'red']) as src: >>> print(src)
Methods
close
- geowombat.polygon_to_array(polygon, col=None, data=None, cellx=None, celly=None, band_name=None, row_chunks=512, col_chunks=512, src_res=None, fill=0, default_value=1, all_touched=True, dtype='uint8', sindex=None, tap=False, bounds_by='intersection')#
Converts a polygon geometry to an
xarray.DataArray.- Parameters:
polygon (GeoDataFrame | str) – The
geopandas.DataFrameor file with polygon geometry.col (Optional[str]) – The column in
polygonyou want to assign values from. If not set, creates a binary raster.data (Optional[DataArray]) – An
xarray.DataArrayto use as a reference for rasterizing.cellx (Optional[float]) – The output cell x size.
celly (Optional[float]) – The output cell y size.
band_name (Optional[list]) – The
xarray.DataArrayband name.row_chunks (Optional[int]) – The
daskrow chunk size.col_chunks (Optional[int]) – The
daskcolumn chunk size.(Optional[tuple] (src_res) – A source resolution to align to.
fill (Optional[int]) – Used as fill value for all areas not covered by input geometries to
rasterio.features.rasterize.default_value (Optional[int]) – Used as value for all geometries, if not provided in shapes to
rasterio.features.rasterize.all_touched (Optional[bool]) – If True, all pixels touched by geometries will be burned in. If false, only pixels whose center is within the polygon or that are selected by Bresenham’s line algorithm will be burned in. The
all_touchedvalue forrasterio.features.rasterize().dtype (Optional[str | numpy data type]) – The output data type for
rasterio.features.rasterize().sindex (Optional[object]) – An instanced of
geopandas.GeoDataFrame.sindex.tap (Optional[bool]) – Whether to target align pixels.
bounds_by (Optional[str]) –
How to concatenate the output extent. Choices are [‘intersection’, ‘union’, ‘reference’].
reference: Use the bounds of the reference image
intersection: Use the intersection (i.e., minimum extent) of all the image bounds
union: Use the union (i.e., maximum extent) of all the image bounds
src_res (Sequence[float] | None) –
- Return type:
DataArray- Returns:
xarray.DataArray
Example
>>> import geowombat as gw >>> import geopandas as gpd >>> >>> df = gpd.read_file('polygons.gpkg') >>> >>> # 100x100 cell size >>> data = gw.polygon_to_array(df, 100.0, 100.0) >>> >>> # Align to an existing image >>> with gw.open('image.tif') as src: >>> data = gw.polygon_to_array(df, data=src)
- geowombat.polygons_to_points(data, df, frac=1.0, min_frac_area=None, all_touched=False, id_column='id', n_jobs=1, **kwargs)#
Converts polygons to points.
- Parameters:
data (DataArray or Dataset) – The
xarray.DataArrayorxarray.Dataset.df (GeoDataFrame) – The
geopandas.GeoDataFramecontaining the geometry to rasterize.frac (Optional[float]) – A fractional subset of points to extract in each feature.
min_frac_area (Optional[int | float]) – A minimum polygon area to use
frac. Otherwise, use all samples within a polygon.all_touched (Optional[bool]) – The
all_touchedargument is passed torasterio.features.rasterize.id_column (Optional[str]) – The ‘id’ column.
n_jobs (Optional[int]) – The number of features to rasterize in parallel.
kwargs (Optional[dict]) – Keyword arguments passed to
multiprocessing.Pool().imap.
- Returns:
geopandas.GeoDataFrame
- geowombat.recode(data, polygon, to_replace, num_workers=1)#
Recodes a DataArray with polygon mappings.
- Parameters:
data (DataArray) – The
xarray.DataArrayto recode.polygon (GeoDataFrame | str) – The
geopandas.DataFrameor file with polygon geometry.to_replace (dict) –
How to find the values to replace. Dictionary mappings should be given as {from: to} pairs. If
to_replaceis an integer/string mapping, the to string should be ‘mode’.- {1: 5}:
recode values of 1 to 5
- {1: ‘mode’}:
recode values of 1 to the polygon mode
num_workers (Optional[int]) – The number of parallel Dask workers (only used if
to_replacehas a ‘mode’ mapping).
- Return type:
DataArray- Returns:
xarray.DataArray
Example
>>> import geowombat as gw >>> >>> with gw.open('image.tif', chunks=512) as ds: >>> # Recode 1 with 5 within a polygon >>> res = gw.recode(ds, 'poly.gpkg', {1: 5})
- geowombat.replace(data, to_replace)#
Replace values given in to_replace with value.
- Parameters:
data (DataArray) – The
xarray.DataArrayto recode.to_replace (dict) –
How to find the values to replace. Dictionary mappings should be given as {from: to} pairs. If
to_replaceis an integer/string mapping, the to string should be ‘mode’.- {1: 5}:
recode values of 1 to 5
- {1: ‘mode’}:
recode values of 1 to the polygon mode
- Return type:
DataArray- Returns:
xarray.DataArray
Example
>>> import geowombat as gw >>> >>> with gw.open('image.tif', chunks=512) as ds: >>> # Replace 1 with 5 >>> res = gw.replace(ds, {1: 5})
- geowombat.sample(data, method='random', band=None, n=None, strata=None, spacing=None, min_dist=None, max_attempts=10, num_workers=1, verbose=1, **kwargs)#
Generates samples from a raster.
- Parameters:
data (DataArray) – The
xarray.DataArrayto extract data from.method (Optional[str]) – The sampling method. Choices are [‘random’, ‘systematic’].
band (Optional[int or str]) – The band name to extract from. Only required if
method= ‘random’ andstratais given.n (Optional[int]) – The total number of samples. Only required if
method= ‘random’.strata (Optional[dict]) –
The strata to sample within. The dictionary key–>value pairs should be {‘conditional,value’: sample size}.
E.g.,
strata = {‘==,1’: 0.5, ‘>=,2’: 0.5} … would sample 50% of total samples within class 1 and 50% of total samples in class >= 2.
strata = {‘==,1’: 10, ‘>=,2’: 20} … would sample 10 samples within class 1 and 20 samples in class >= 2.
spacing (Optional[float]) – The spacing (in map projection units) when
method= ‘systematic’.min_dist (Optional[float or int]) – A minimum distance allowed between samples. Only applies when
method= ‘random’.max_attempts (Optional[int]) – The maximum numer of attempts to sample points >
min_distfrom each other.num_workers (Optional[int]) – The number of parallel workers for
dask.compute().verbose (Optional[int]) – The verbosity level.
kwargs (Optional[dict]) – Keyword arguments passed to
geowombat.extract.
- Return type:
GeoDataFrame- Returns:
geopandas.GeoDataFrame
Examples
>>> import geowombat as gw >>> >>> # Sample 100 points randomly across the image >>> with gw.open('image.tif') as src: >>> df = gw.sample(src, n=100) >>> >>> # Sample points systematically (with 10km spacing) across the image >>> with gw.open('image.tif') as src: >>> df = gw.sample(src, method='systematic', spacing=10000.0) >>> >>> # Sample 50% of 100 in class 1 and 50% in classes >= 2 >>> strata = {'==,1': 0.5, '>=,2': 0.5} >>> with gw.open('image.tif') as src: >>> df = gw.sample(src, band=1, n=100, strata=strata) >>> >>> # Specify a per-stratum minimum allowed point distance of 1,000 meters >>> with gw.open('image.tif') as src: >>> df = gw.sample(src, band=1, n=100, min_dist=1000, strata=strata)
- geowombat.save(data, filename, overwrite=False, scatter=None, client=None, compute=True, tags=None, compress='none', compression=None, num_workers=1, log_progress=True, tqdm_kwargs=None, bigtiff=None)[source]#
Saves a DataArray to raster using rasterio/dask.
- Parameters:
data (xarray.DataArray) – The data to write.
filename (str | Path) – The output file name to write to.
overwrite (Optional[bool]) – Whether to overwrite an existing file. Default is False.
scatter (Optional[str]) – Scatter ‘band’ or ‘time’ to separate file. Default is None.
client (Optional[Client object]) – A
dask.distributed.Clientclient object to persist data. Default is None.compute (Optinoal[bool]) – Whether to compute and write to
filename. Otherwise, return thedasktask graph. IfTrue, compute and write tofilename. IfFalse, return thedasktask graph. Default isTrue.tags (Optional[dict]) – Metadata tags to write to file. Default is None.
compress (Optional[str]) –
The file compression type. Default is ‘none’, or no compression.
Note
When using a client, it is advised to use threading. E.g.,
dask.distributed.LocalCluster(processes=False). Process-based concurrency could result in corrupted file blocks.compression (Optional[str]) –
The file compression type. Default is ‘none’, or no compression.
Deprecated since version 2.1.4: Use ‘compress’ – ‘compression’ will be removed in >=2.2.0.
num_workers (Optional[int]) – The number of dask workers (i.e., chunks) to write concurrently. Default is 1.
log_progress (Optional[bool]) – Whether to log the progress bar during writing. Default is True.
tqdm_kwargs (Optional[dict]) – Keyword arguments to pass to
tqdm.bigtiff (Optional[str]) – A GDAL BIGTIFF flag. Choices are [“YES”, “NO”, “IF_NEEDED”, “IF_SAFER”].
- Returns:
None, writes tofilename
Example
>>> import geowombat as gw >>> >>> with gw.open('file.tif') as src: >>> result = ... >>> gw.save(result, 'output.tif', compress='lzw', num_workers=8) >>> >>> # Create delayed write tasks and compute later >>> tasks = [gw.save(array, 'output.tif', compute=False) for array in array_list] >>> # Write and close files >>> dask.compute(tasks, num_workers=8)
- class geowombat.series(filenames, time_names=None, band_names=None, transfer_lib='jax', crs=None, res=None, bounds=None, resampling='nearest', nodata=0, warp_mem_limit=256, num_threads=1, window_size=None, padding=None)[source]#
Bases:
BaseSeriesA class for time series concurrent processing on a GPU.
- Parameters:
filenames (list) – The list of filenames to open.
band_names (Optional[list]) – The band associated names.
transfer_lib (Optional[str]) – The library to transfer data to. Choices are [‘jax’, ‘keras’, ‘numpy’, ‘pytorch’, ‘tensorflow’].
crs (Optional[str]) – The coordinate reference system.
res (Optional[list | tuple]) – The cell resolution.
bounds (Optional[object]) – The coordinate bounds.
resampling (Optional[str]) – The resampling method.
nodata (Optional[float | int]) – The ‘no data’ value.
warp_mem_limit (Optional[int]) – The
rasteriowarping memory limit (in MB).num_threads (Optional[int]) – The number of
rasteriowarping threads.window_size (Optional[int | list | tuple]) – The concurrent processing window size (height, width) or -1 (i.e., entire array).
padding (Optional[list | tuple]) – Padding for each window.
paddingshould be given as a tuple of (left pad, bottom pad, right pad, top pad). Ifpaddingis given, the returned list will contain a tuple ofrasterio.windows.Windowobjects as (w1, w2), where w1 contains the normal window offsets and w2 contains the padded window offsets.time_names (list) –
- Requirement:
# CUDA 11.1 pip install --upgrade "jax[cuda111]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
- Attributes:
- band_dict
- blockxsize
- blockysize
- count
- crs
- height
- nchunks
- nodata
- transform
- width
- Parameters:
filenames (list) –
time_names (list) –
band_names (list) –
transfer_lib (str) –
crs (str) –
res (list | tuple) –
bounds (BoundingBox | list | tuple) –
resampling (str) –
nodata (float | int) –
warp_mem_limit (int) –
num_threads (int) –
window_size (int | list | tuple) –
padding (list | tuple) –
Methods
apply(func, bands[, gain, offset, ...])Applies a function concurrently over windows.
group_dates(data, image_dates, band_names)Groups data by dates.
read(bands[, window, gain, offset, pool, ...])Reads a window.
ndarray_to_darray
open
warp
- apply(func, bands, gain=1.0, offset=0.0, processes=False, num_workers=1, monitor_progress=True, outfile=None, bigtiff='NO', kwargs={})[source]#
Applies a function concurrently over windows.
- Parameters:
func (object | str | list | tuple) – The function to apply. If
funcis a string, choices are [‘cv’, ‘max’, ‘mean’, ‘min’].bands (list | int) – The bands to read.
gain (Optional[float]) – A gain factor to apply.
offset (Optional[float | int]) – An offset factor to apply.
processes (Optional[bool]) – Whether to use process workers, otherwise use threads.
num_workers (Optional[int]) – The number of concurrent workers.
monitor_progress (Optional[bool]) – Whether to monitor progress with a
tqdmbar.outfile (Optional[Path | str]) – The output file.
bigtiff (Optional[str]) – Whether to create a BigTIFF file. Choices are [‘YES’, ‘NO’,”IF_NEEDED”, “IF_SAFER”]. Default is ‘NO’.
kwargs (Optional[dict]) – Keyword arguments passed to rasterio open profile.
- Returns:
If
outfileis None, yields(Window, array, [datetime, ...]). Ifoutfileis not None, returns None and writes tooutfile.
Example
>>> import itertools >>> import geowombat as gw >>> import rasterio as rio >>> >>> # Import an image with 3 bands >>> from geowombat.data import l8_224078_20200518 >>> >>> # Create a custom class >>> class TemporalMean(gw.TimeModule): >>> >>> def __init__(self): >>> super(TemporalMean, self).__init__() >>> >>> # The main function >>> def calculate(self, array): >>> >>> sl1 = (slice(0, None), slice(self.band_dict['red'], self.band_dict['red']+1), slice(0, None), slice(0, None)) >>> sl2 = (slice(0, None), slice(self.band_dict['green'], self.band_dict['green']+1), slice(0, None), slice(0, None)) >>> >>> vi = (array[sl1] - array[sl2]) / ((array[sl1] + array[sl2]) + 1e-9) >>> >>> return vi.mean(axis=0).squeeze() >>> >>> with rio.open(l8_224078_20200518) as src: >>> res = src.res >>> bounds = src.bounds >>> nodata = 0 >>> >>> # Open many files, each with 3 bands >>> with gw.series([l8_224078_20200518]*100, >>> band_names=['blue', 'green', 'red'], >>> crs='epsg:32621', >>> res=res, >>> bounds=bounds, >>> nodata=nodata, >>> num_threads=4, >>> window_size=(1024, 1024)) as src: >>> >>> src.apply(TemporalMean(), >>> bands=-1, # open all bands >>> gain=0.0001, # scale from [0,10000] -> [0,1] >>> processes=False, # use threads >>> num_workers=4, # use 4 concurrent threads, one per window >>> outfile='vi_mean.tif') >>> >>> # Import a single-band image >>> from geowombat.data import l8_224078_20200518_B4 >>> >>> # Open many files, each with 1 band >>> with gw.series([l8_224078_20200518_B4]*100, >>> band_names=['red'], >>> crs='epsg:32621', >>> res=res, >>> bounds=bounds, >>> nodata=nodata, >>> num_threads=4, >>> window_size=(1024, 1024)) as src: >>> >>> src.apply('mean', # built-in function over single-band images >>> bands=1, # open all bands >>> gain=0.0001, # scale from [0,10000] -> [0,1] >>> num_workers=4, # use 4 concurrent threads, one per window >>> outfile='red_mean.tif') >>> >>> with gw.series([l8_224078_20200518_B4]*100, >>> band_names=['red'], >>> crs='epsg:32621', >>> res=res, >>> bounds=bounds, >>> nodata=nodata, >>> num_threads=4, >>> window_size=(1024, 1024)) as src: >>> >>> src.apply(['mean', 'max', 'cv'], # calculate multiple statistics >>> bands=1, # open all bands >>> gain=0.0001, # scale from [0,10000] -> [0,1] >>> num_workers=4, # use 4 concurrent threads, one per window >>> outfile='stack_mean.tif')
- geowombat.subset(data, left=None, top=None, right=None, bottom=None, rows=None, cols=None, center=False, mask_corners=False)#
Subsets a DataArray.
- Parameters:
data (DataArray) – The
xarray.DataArrayto subset.left (Optional[float]) – The left coordinate.
top (Optional[float]) – The top coordinate.
right (Optional[float]) – The right coordinate.
bottom (Optional[float]) – The bottom coordinate.
rows (Optional[int]) – The number of output rows.
cols (Optional[int]) – The number of output rows.
center (Optional[bool]) – Whether to center the subset on
leftandtop.mask_corners (Optional[bool]) – Whether to mask corners (requires
pymorph).
- Return type:
DataArray- Returns:
xarray.DataArray
Example
>>> import geowombat as gw >>> >>> with gw.open('image.tif', chunks=512) as ds: >>> ds_sub = gw.subset( >>> ds, >>> left=-263529.884, >>> top=953985.314, >>> rows=2048, >>> cols=2048 >>> )
- geowombat.tasseled_cap(data, nodata=None, sensor=None, scale_factor=None)#
Applies a tasseled cap transformation
- Parameters:
data (DataArray) – The
xarray.DataArrayto process.nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If
None, the ‘no data’ value is taken from thexarray.DataArrayattributes.sensor (Optional[str]) – The data’s sensor. If
None, the band names should reflect the index being calculated.scale_factor (Optional[float]) – A scale factor to apply to the data. If
None, the scale value is taken from thexarray.DataArrayattributes.
- Return type:
DataArray
Examples
>>> import geowombat as gw >>> >>> with gw.config.update(sensor='qb', scale_factor=0.0001): >>> with gw.open('image.tif', band_names=['blue', 'green', 'red', 'nir']) as ds: >>> tcap = gw.tasseled_cap(ds)
- Return type:
DataArray- Returns:
xarray.DataArray- Parameters:
data (DataArray) –
nodata (float | int) –
sensor (str) –
scale_factor (float) –
References
- geowombat.to_netcdf(data, filename, overwrite=False, compute=True, *args, **kwargs)[source]#
Writes an Xarray DataArray to a NetCDF file.
- Parameters:
data (DataArray) – The
xarray.DataArrayto write.filename (str) – The output file name to write to.
overwrite (Optional[bool]) – Whether to overwrite an existing file. Default is
False.compute (Optinoal[bool]) – Whether to compute and write to
filename. Otherwise, return thedasktask graph. Default isTrue.args (DataArray) – Additional
DataArraysto stack.kwargs (dict) – Encoding arguments.
- Returns:
None, writes tofilename
Examples
>>> import geowombat as gw >>> import xarray as xr >>> >>> # Write a single DataArray to a .nc file >>> with gw.config.update(sensor='l7'): >>> with gw.open('LC08_L1TP_225078_20200219_20200225_01_T1.tif') as src: >>> gw.to_netcdf(src, 'filename.nc', zlib=True, complevel=5) >>> >>> # Add extra layers >>> with gw.config.update(sensor='l7'): >>> with gw.open( >>> 'LC08_L1TP_225078_20200219_20200225_01_T1.tif' >>> ) as src, gw.open( >>> 'LC08_L1TP_225078_20200219_20200225_01_T1_angles.tif', >>> band_names=['zenith', 'azimuth'] >>> ) as ang: >>> src = ( >>> xr.where( >>> src == 0, -32768, src >>> ) >>> .astype('int16') >>> .assign_attrs(**src.attrs) >>> ) >>> >>> gw.to_netcdf( >>> src, >>> 'filename.nc', >>> ang.astype('int16'), >>> zlib=True, >>> complevel=5, >>> _FillValue=-32768 >>> ) >>> >>> # Open the data and convert to a DataArray >>> with xr.open_dataset( >>> 'filename.nc', engine='h5netcdf', chunks=256 >>> ) as ds: >>> src = ds.to_array(dim='band')
- geowombat.to_raster(data, filename, readxsize=None, readysize=None, separate=False, out_block_type='gtiff', keep_blocks=False, verbose=0, overwrite=False, gdal_cache=512, scheduler='mpool', n_jobs=1, n_workers=None, n_threads=None, n_chunks=None, padding=None, tags=None, tqdm_kwargs=None, **kwargs)[source]#
Writes a
daskarray to a raster file.Note
We advise using
save()in place of this method.- Parameters:
data (DataArray) – The
xarray.DataArrayto write.filename (str) – The output file name to write to.
readxsize (Optional[int]) – The size of column chunks to read. If not given,
readxsizedefaults to Dask chunk size.readysize (Optional[int]) – The size of row chunks to read. If not given,
readysizedefaults to Dask chunk size.separate (Optional[bool]) – Whether to write blocks as separate files. Otherwise, write to a single file.
out_block_type (Optional[str]) – The output block type. Choices are [‘gtiff’, ‘zarr’]. Only used if
separate=True.keep_blocks (Optional[bool]) – Whether to keep the blocks stored on disk. Only used if
separate=True.verbose (Optional[int]) – The verbosity level.
overwrite (Optional[bool]) – Whether to overwrite an existing file.
gdal_cache (Optional[int]) – The
GDALcache size (in MB).scheduler (Optional[str]) –
The parallel task scheduler to use. Choices are [‘processes’, ‘threads’, ‘mpool’].
mpool: process pool of workers using
multiprocessing.Poolprocesses: process pool of workers usingconcurrent.futuresthreads: thread pool of workers usingconcurrent.futuresn_jobs (Optional[int]) – The total number of parallel jobs.
n_workers (Optional[int]) – The number of process workers.
n_threads (Optional[int]) – The number of thread workers.
n_chunks (Optional[int]) – The chunk size of windows. If not given, equal to
n_workersx 50.overviews (Optional[bool or list]) – Whether to build overview layers.
resampling (Optional[str]) – The resampling method for overviews when
overviewsisTrueor alist. Choices are [‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘nearest’].padding (Optional[tuple]) – Padding for each window.
paddingshould be given as a tuple of (left pad, bottom pad, right pad, top pad). Ifpaddingis given, the returned list will contain a tuple ofrasterio.windows.Windowobjects as (w1, w2), where w1 contains the normal window offsets and w2 contains the padded window offsets.tags (Optional[dict]) – Image tags to write to file.
tqdm_kwargs (Optional[dict]) – Additional keyword arguments to pass to
tqdm.kwargs (Optional[dict]) – Additional keyword arguments to pass to
rasterio.write.
- Returns:
None, writes tofilename
Examples
>>> import geowombat as gw >>> >>> # Use 8 parallel workers >>> with gw.open('input.tif') as ds: >>> gw.to_raster(ds, 'output.tif', n_jobs=8) >>> >>> # Use 4 process workers and 2 thread workers >>> with gw.open('input.tif') as ds: >>> gw.to_raster(ds, 'output.tif', n_workers=4, n_threads=2) >>> >>> # Control the window chunks passed to concurrent.futures >>> with gw.open('input.tif') as ds: >>> gw.to_raster(ds, 'output.tif', n_workers=4, n_threads=2, n_chunks=16) >>> >>> # Compress the output and build overviews >>> with gw.open('input.tif') as ds: >>> gw.to_raster(ds, 'output.tif', n_jobs=8, overviews=True, compress='lzw')
- geowombat.to_vrt(data, filename, overwrite=False, resampling=None, nodata=None, init_dest_nodata=True, warp_mem_limit=128)[source]#
Writes a file to a VRT file.
- Parameters:
data (DataArray) – The
xarray.DataArrayto write.filename (str) – The output file name to write to.
overwrite (Optional[bool]) – Whether to overwrite an existing VRT file.
resampling (Optional[object]) – The resampling algorithm for
rasterio.vrt.WarpedVRT. Default is ‘nearest’.nodata (Optional[float or int]) – The ‘no data’ value for
rasterio.vrt.WarpedVRT.init_dest_nodata (Optional[bool]) – Whether or not to initialize output to
nodataforrasterio.vrt.WarpedVRT.warp_mem_limit (Optional[int]) – The GDAL memory limit for
rasterio.vrt.WarpedVRT.
- Returns:
None, writes tofilename
Examples
>>> import geowombat as gw >>> from rasterio.enums import Resampling >>> >>> # Transform a CRS and save to VRT >>> with gw.config.update(ref_crs=102033): >>> with gw.open('image.tif') as src: >>> gw.to_vrt( >>> src, >>> 'output.vrt', >>> resampling=Resampling.cubic, >>> warp_mem_limit=256 >>> ) >>> >>> # Load multiple files set to a common geographic extent >>> bounds = (left, bottom, right, top) >>> with gw.config.update(ref_bounds=bounds): >>> with gw.open( >>> ['image1.tif', 'image2.tif'], mosaic=True >>> ) as src: >>> gw.to_vrt(src, 'output.vrt')
- geowombat.transform_crs(data_src, dst_crs=None, dst_res=None, dst_width=None, dst_height=None, dst_bounds=None, src_nodata=None, dst_nodata=None, coords_only=False, resampling='nearest', warp_mem_limit=512, num_threads=1)[source]#
Transforms a DataArray to a new coordinate reference system.
- Parameters:
data_src (DataArray) – The data to transform.
dst_crs (Optional[CRS | int | dict | str]) – The destination CRS.
dst_res (Optional[tuple]) – The destination resolution.
dst_width (Optional[int]) – The destination width. Cannot be used with
dst_res.dst_height (Optional[int]) – The destination height. Cannot be used with
dst_res.dst_bounds (Optional[BoundingBox | tuple]) – The destination bounds, as a
rasterio.coords.BoundingBoxor as a tuple of (left, bottom, right, top).src_nodata (Optional[int | float]) – The source nodata value. Pixels with this value will not be used for interpolation. If not set, it will default to the nodata value of the source image if a masked ndarray or rasterio band, if available.
dst_nodata (Optional[int | float]) – The nodata value used to initialize the destination; it will remain in all areas not covered by the reprojected source. Defaults to the nodata value of the destination image (if set), the value of src_nodata, or 0 (GDAL default).
coords_only (Optional[bool]) – Whether to return transformed coordinates. If
coords_only=Truethen the array is not warped and the size is unchanged. It also avoids in-memory computations.resampling (Optional[str]) – The resampling method if
filenameis alist. Choices are [‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘nearest’].warp_mem_limit (Optional[int]) – The warp memory limit.
num_threads (Optional[int]) – The number of parallel threads.
- Returns:
xarray.DataArray
- geowombat.wi(data, nodata=None, mask=False, sensor=None, scale_factor=None)#
Calculates the woody vegetation index
- Parameters:
data (DataArray) – The
xarray.DataArrayto process.nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If
None, the ‘no data’ value is taken from thexarray.DataArrayattributes.mask (Optional[bool]) – Whether to mask the results.
sensor (Optional[str]) – The data’s sensor. If
None, the band names should reflect the index being calculated.scale_factor (Optional[float]) – A scale factor to apply to the data. If
None, the scale value is taken from thexarray.DataArrayattributes.
Equation:
\[WI = \Biggl \lbrace { 0,\text{ if } { red + SWIR1 \ge 0.5 } \atop 1 - \frac{red + SWIR1}{0.5}, \text{ otherwise } }\]- Reference:
See [LWC+13]
- Returns:
Data range: 0 to 1
- Return type:
xarray.DataArray
- geowombat.xy_to_lonlat(x, y, dst_crs)#
Converts from native map coordinates to longitude and latitude.
- Parameters:
x (float) – The x coordinate to convert.
y (float) – The y coordinate to convert.
dst_crs (str, object, or DataArray) – The CRS to transform to. It can be provided as a string, a CRS instance (e.g.,
pyproj.crs.CRS), or ageowombat.DataArray.
- Returns:
(longitude, latitude)
- Return type:
tuple
Example
>>> import geowombat as gw >>> from geowombat.core import xy_to_lonlat >>> >>> x, y = 643944.6956113526, 7183104.984484519 >>> >>> with gw.open('image.tif') as src: >>> lon, lat = xy_to_lonlat(x, y, src)
Functions#
|
Loads data into memory using |
|
Extracts data within an area or points of interest. |
|
Generates samples from a raster. |
|
Calculates the area of data values. |
|
Subsets a DataArray. |
|
Clips a DataArray by vector polygon geometry. |
|
Clips a DataArray by vector polygon geometry. |
|
Masks a DataArray by vector polygon geometry. |
|
Replace values given in to_replace with value. |
|
Recodes a DataArray with polygon mappings. |
|
Co-registers an image, or images, using AROSICS. |
|
Converts polygons to points. |
|
Applies a function and writes results to file. |
|
Transforms a DataArray to a new coordinate reference system. |
|
Saves a DataArray to raster using rasterio/dask. |
|
Writes a |
|
Writes an Xarray DataArray to a NetCDF file. |
|
Writes a file to a VRT file. |
|
Converts an |
|
Converts a polygon geometry to an |
|
Applies a moving window function over Dask array blocks. |
|
Calculates the normalized difference band ratio |
|
Calculates the advanced vegetation index |
|
Calculates the enhanced vegetation index |
|
Calculates the two-band modified enhanced vegetation index |
|
Calculates the kernel normalized difference vegetation index |
|
Calculates the normalized burn ratio |
|
Calculates the normalized difference vegetation index |
|
Calculates the woody vegetation index |
|
Applies a tasseled cap transformation |
|
Converts map coordinates to array indices. |
|
Converts array indices to map coordinates. |
|
Converts bounds from longitude and latitude to native map coordinates. |
|
Converts from longitude and latitude to native map coordinates. |
|
Converts from native map coordinates to longitude and latitude. |
Classes#
|
Opens one or more raster files. |
|
A class for time series concurrent processing on a GPU. |
|
Methods |
Methods |
geowombat.core.stac Module#
- class geowombat.core.stac.STACNames(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
Bases:
StrEnumSTAC names.
Methods
capitalize(/)Return a capitalized version of the string.
casefold(/)Return a version of the string suitable for caseless comparisons.
center(width[, fillchar])Return a centered string of length width.
count(sub[, start[, end]])Return the number of non-overlapping occurrences of substring sub in string S[start:end].
encode(/[, encoding, errors])Encode the string using the codec registered for encoding.
endswith(suffix[, start[, end]])Return True if S ends with the specified suffix, False otherwise.
expandtabs(/[, tabsize])Return a copy where all tab characters are expanded using spaces.
find(sub[, start[, end]])Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end].
format(*args, **kwargs)Return a formatted version of S, using substitutions from args and kwargs.
format_map(mapping)Return a formatted version of S, using substitutions from mapping.
index(sub[, start[, end]])Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end].
isalnum(/)Return True if the string is an alpha-numeric string, False otherwise.
isalpha(/)Return True if the string is an alphabetic string, False otherwise.
isascii(/)Return True if all characters in the string are ASCII, False otherwise.
isdecimal(/)Return True if the string is a decimal string, False otherwise.
isdigit(/)Return True if the string is a digit string, False otherwise.
isidentifier(/)Return True if the string is a valid Python identifier, False otherwise.
islower(/)Return True if the string is a lowercase string, False otherwise.
isnumeric(/)Return True if the string is a numeric string, False otherwise.
isprintable(/)Return True if the string is printable, False otherwise.
isspace(/)Return True if the string is a whitespace string, False otherwise.
istitle(/)Return True if the string is a title-cased string, False otherwise.
isupper(/)Return True if the string is an uppercase string, False otherwise.
join(iterable, /)Concatenate any number of strings.
ljust(width[, fillchar])Return a left-justified string of length width.
lower(/)Return a copy of the string converted to lowercase.
lstrip([chars])Return a copy of the string with leading whitespace removed.
maketrans(x[, y, z])Return a translation table usable for str.translate().
partition(sep, /)Partition the string into three parts using the given separator.
removeprefix(prefix, /)Return a str with the given prefix string removed if present.
removesuffix(suffix, /)Return a str with the given suffix string removed if present.
replace(old, new[, count])Return a copy with all occurrences of substring old replaced by new.
rfind(sub[, start[, end]])Return the highest index in S where substring sub is found, such that sub is contained within S[start:end].
rindex(sub[, start[, end]])Return the highest index in S where substring sub is found, such that sub is contained within S[start:end].
rjust(width[, fillchar])Return a right-justified string of length width.
rpartition(sep, /)Partition the string into three parts using the given separator.
rsplit(/[, sep, maxsplit])Return a list of the substrings in the string, using sep as the separator string.
rstrip([chars])Return a copy of the string with trailing whitespace removed.
split(/[, sep, maxsplit])Return a list of the substrings in the string, using sep as the separator string.
splitlines(/[, keepends])Return a list of the lines in the string, breaking at line boundaries.
startswith(prefix[, start[, end]])Return True if S starts with the specified prefix, False otherwise.
strip([chars])Return a copy of the string with leading and trailing whitespace removed.
swapcase(/)Convert uppercase characters to lowercase and lowercase characters to uppercase.
title(/)Return a version of the string where each word is titlecased.
translate(table, /)Replace each character in the string using the given translation table.
upper(/)Return a copy of the string converted to uppercase.
zfill(width, /)Pad a numeric string with zeros on the left, to fill a field of the given width.
- class geowombat.core.stac.StrEnum(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
Bases:
str,Enum- Source:
Methods
capitalize(/)Return a capitalized version of the string.
casefold(/)Return a version of the string suitable for caseless comparisons.
center(width[, fillchar])Return a centered string of length width.
count(sub[, start[, end]])Return the number of non-overlapping occurrences of substring sub in string S[start:end].
encode(/[, encoding, errors])Encode the string using the codec registered for encoding.
endswith(suffix[, start[, end]])Return True if S ends with the specified suffix, False otherwise.
expandtabs(/[, tabsize])Return a copy where all tab characters are expanded using spaces.
find(sub[, start[, end]])Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end].
format(*args, **kwargs)Return a formatted version of S, using substitutions from args and kwargs.
format_map(mapping)Return a formatted version of S, using substitutions from mapping.
index(sub[, start[, end]])Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end].
isalnum(/)Return True if the string is an alpha-numeric string, False otherwise.
isalpha(/)Return True if the string is an alphabetic string, False otherwise.
isascii(/)Return True if all characters in the string are ASCII, False otherwise.
isdecimal(/)Return True if the string is a decimal string, False otherwise.
isdigit(/)Return True if the string is a digit string, False otherwise.
isidentifier(/)Return True if the string is a valid Python identifier, False otherwise.
islower(/)Return True if the string is a lowercase string, False otherwise.
isnumeric(/)Return True if the string is a numeric string, False otherwise.
isprintable(/)Return True if the string is printable, False otherwise.
isspace(/)Return True if the string is a whitespace string, False otherwise.
istitle(/)Return True if the string is a title-cased string, False otherwise.
isupper(/)Return True if the string is an uppercase string, False otherwise.
join(iterable, /)Concatenate any number of strings.
ljust(width[, fillchar])Return a left-justified string of length width.
lower(/)Return a copy of the string converted to lowercase.
lstrip([chars])Return a copy of the string with leading whitespace removed.
maketrans(x[, y, z])Return a translation table usable for str.translate().
partition(sep, /)Partition the string into three parts using the given separator.
removeprefix(prefix, /)Return a str with the given prefix string removed if present.
removesuffix(suffix, /)Return a str with the given suffix string removed if present.
replace(old, new[, count])Return a copy with all occurrences of substring old replaced by new.
rfind(sub[, start[, end]])Return the highest index in S where substring sub is found, such that sub is contained within S[start:end].
rindex(sub[, start[, end]])Return the highest index in S where substring sub is found, such that sub is contained within S[start:end].
rjust(width[, fillchar])Return a right-justified string of length width.
rpartition(sep, /)Partition the string into three parts using the given separator.
rsplit(/[, sep, maxsplit])Return a list of the substrings in the string, using sep as the separator string.
rstrip([chars])Return a copy of the string with trailing whitespace removed.
split(/[, sep, maxsplit])Return a list of the substrings in the string, using sep as the separator string.
splitlines(/[, keepends])Return a list of the lines in the string, breaking at line boundaries.
startswith(prefix[, start[, end]])Return True if S starts with the specified prefix, False otherwise.
strip([chars])Return a copy of the string with leading and trailing whitespace removed.
swapcase(/)Convert uppercase characters to lowercase and lowercase characters to uppercase.
title(/)Return a version of the string where each word is titlecased.
translate(table, /)Replace each character in the string using the given translation table.
upper(/)Return a copy of the string converted to uppercase.
zfill(width, /)Pad a numeric string with zeros on the left, to fill a field of the given width.
- geowombat.core.stac.composite_stac(stac_catalog=STACNames.ELEMENT84_V1, collection=None, bounds=None, proj_bounds=None, start_date=None, end_date=None, cloud_cover_perc=None, bands=None, chunksize=256, mask_items=None, bounds_query=None, epsg=None, resolution=None, resampling=Resampling.nearest, nodata_fill=None, frequency='MS', agg='median', max_items=100, compute=True, num_workers=4)[source]#
Creates cloud-free temporal composites from STAC data.
Wraps
open_stac()to produce composites at a specified temporal frequency. Data is cloud-masked using pixel-level QA (Landsatqa_pixel), SCL (Sentinel-2), or Fmask (HLS) bands, then aggregated using the chosen method (default: median).- Parameters:
stac_catalog (str) – The STAC catalog. See
open_stac()for options. Ignored whencollection='hls'(usesnasa_lp_cloud).collection (str) – The STAC collection. See
open_stac()for options. Use'hls'to query bothhls_l30andhls_s30, merge observations, then composite. Only bands common to both sensors are allowed (blue,green,red,nir,swir1,swir2,coastal,cirrus).bounds (
Union[Sequence[float],str,Path,GeoDataFrame]) – The search bounding box. Seeopen_stac().proj_bounds (
Sequence[float]) – The projected bounds. Seeopen_stac().start_date (str) – The start search date (yyyy-mm-dd).
end_date (str) – The end search date (yyyy-mm-dd).
cloud_cover_perc (float | int) – Maximum cloud cover percentage for scene-level filtering.
bands (sequence) – The bands to open. Do not include
qa_pixel,scl, orFmask; these are added automatically for masking.chunksize (int) – The dask chunk size.
mask_items (sequence) – Items to mask. See
open_stac()for sensor-specific defaults.bounds_query (str) – A query for GeoDataFrame bounds.
epsg (int) – An EPSG code to warp to.
resolution (float | int) – Cell resolution.
resampling (
Optional[Resampling]) – The resampling method.nodata_fill (float | int) – Fill value for nodata.
frequency (str) –
Pandas offset alias for temporal grouping. Default
'MS'(month start). Common values:'D'— daily'W'— weekly'2W'— biweekly (every 2 weeks)'MS'— monthly (month start)'QS'— quarterly (quarter start)'YS'— yearly (year start)
Multiplied forms (e.g.,
'15D','2MS') are supported. See pandas offset aliases.agg (str) – Aggregation method for compositing. Default
'median'. Options:'median','mean','min','max'.max_items (int) – Maximum STAC search items.
compute (bool) – Whether to eagerly load data.
num_workers (int) – Number of threads for parallel downloads when
compute=True. Default is 4.
- Return type:
Optional[Tuple[DataArray,DataFrame]]- Returns:
tuple of (
xarray.DataArray,pandas.DataFrame) or(None, None)if no data found.
Examples
>>> from geowombat.core.stac import composite_stac >>> >>> # Monthly median composite of Sentinel-2 >>> composite, df = composite_stac( ... collection='sentinel_s2_l2a', ... start_date='2022-01-01', ... end_date='2022-12-31', ... bounds='aoi.geojson', ... bands=['blue', 'green', 'red', 'nir'], ... cloud_cover_perc=50, ... frequency='MS', ... resolution=10.0, ... ) >>> >>> # Quarterly composite of Landsat >>> composite, df = composite_stac( ... stac_catalog='microsoft_v1', ... collection='landsat_c2_l2', ... start_date='2022-01-01', ... end_date='2022-12-31', ... bounds='aoi.geojson', ... bands=['red', 'green', 'blue'], ... cloud_cover_perc=30, ... frequency='QS', ... resolution=30.0, ... ) >>> >>> # Combined HLS (Landsat + Sentinel-2) composite >>> composite, df = composite_stac( ... collection='hls', ... start_date='2023-06-01', ... end_date='2023-08-31', ... bounds=(-77.1, 38.85, -76.95, 38.95), ... bands=['blue', 'green', 'red', 'nir'], ... epsg=32618, ... resolution=30.0, ... frequency='MS', ... )
- geowombat.core.stac.merge_stac(data, *other)[source]#
Merges DataArrays by time.
- Parameters:
data (DataArray) – The
DataArrayto merge to.other (list of DataArrays) – The
DataArraysto merge.
- Return type:
DataArray- Returns:
xarray.DataArray
- geowombat.core.stac.open_stac(stac_catalog=STACNames.ELEMENT84_V1, collection=None, bounds=None, proj_bounds=None, start_date=None, end_date=None, cloud_cover_perc=None, bands=None, chunksize=256, mask_items=None, bounds_query=None, mask_data=False, epsg=None, resolution=None, resampling=Resampling.nearest, nodata_fill=None, view_asset_keys=False, extra_assets=None, out_path='.', max_items=100, max_extra_workers=1, compute=True, num_workers=4)[source]#
Opens a collection from a spatio-temporal asset catalog (STAC).
- Parameters:
stac_catalog (str) – Choices are [‘element84_v0’, ‘element84_v1’, ‘microsoft_v1’, ‘nasa_lp_cloud’].
collection (str) –
The STAC collection to open. Catalog options:
- element84_v0:
sentinel_s2_l2a_cogs
- element84_v1:
cop_dem_glo_30 landsat_c2_l2 sentinel_s2_l2a sentinel_s2_l1c sentinel_s1_l1c naip
- microsoft_v1:
cop_dem_glo_30 landsat_c2_l1 landsat_c2_l2 landsat_l8_c2_l2 sentinel_s2_l2a sentinel_s1_l1c io_lulc usda_cdl esa_worldcover naip
- nasa_lp_cloud:
hls_l30 (HLS Landsat 30m) hls_s30 (HLS Sentinel-2 30m)
bounds (sequence | str | Path | GeoDataFrame) – The search bounding box. This can also be given with the configuration manager (e.g.,
gw.config.update(ref_bounds=bounds)). The bounds CRS must be given in WGS/84 lat/lon (i.e., EPSG=4326).proj_bounds (sequence) – The projected bounds to return data. If
None(default), the returned bounds are the union of all collection scenes. Seeboundsin gjoseph92/stackstac for details.start_date (str) – The start search date (yyyy-mm-dd).
end_date (str) – The end search date (yyyy-mm-dd).
cloud_cover_perc (float | int) – The maximum percentage cloud cover.
bands (sequence) – The bands to open.
chunksize (int) – The dask chunk size.
mask_items (sequence) – The items to mask. For Landsat: QA bit names. Defaults to
['fill', 'dilated_cloud', 'cirrus', 'cloud', 'cloud_shadow', 'snow']. For Sentinel-2: SCL class names. Defaults to['no_data', 'saturated_defective', 'cloud_shadow', 'cloud_medium_prob', 'cloud_high_prob', 'thin_cirrus']. For HLS: Fmask bit names. Defaults to['cirrus', 'cloud', 'adjacent_cloud', 'cloud_shadow', 'snow_ice'].bounds_query (Optional[str]) – A query to select bounds from the
geopandas.GeoDataFrame.mask_data (Optional[bool]) – Whether to mask the data. When
True, the appropriate QA/SCL/Fmask band is automatically loaded and used for masking.epsg (Optional[int]) – An EPSG code to warp to.
resolution (Optional[float | int]) – The cell resolution to resample to.
resampling (Optional[rasterio.enumsResampling enum]) – The resampling method.
nodata_fill (Optional[float | int]) – A fill value to replace ‘no data’ NaNs.
view_asset_keys (Optional[bool]) – Whether to view asset ids.
extra_assets (Optional[list]) – Extra assets (non-image assets) to download.
out_path (Optional[str | Path]) – The output path to save files to.
max_items (Optional[int]) – The maximum number of items to return from the search, even if there are more matching results, passed to
pystac_client.ItemSearch. See https://pystac-client.readthedocs.io/en/latest/api.html#pystac_client.ItemSearch for details.max_extra_workers (Optional[int]) – The maximum number of extra assets to download concurrently.
compute (Optional[bool]) – Whether to eagerly load data into memory. If
True(default), downloads all remote data with a progress bar using parallel threads. IfFalse, returns a lazy dask-backed array.num_workers (Optional[int]) – Number of threads for parallel downloads when
compute=True. Default is 4. Higher values can speed up I/O-bound downloads from cloud storage.
- Return type:
DataArray- Returns:
xarray.DataArray
Examples
>>> from geowombat.core.stac import open_stac, merge_stac >>> >>> data_l, df_l = open_stac( >>> stac_catalog='microsoft_v1', >>> collection='landsat_c2_l2', >>> start_date='2020-01-01', >>> end_date='2021-01-01', >>> bounds='map.geojson', >>> bands=['red', 'green', 'blue', 'qa_pixel'], >>> mask_data=True, >>> extra_assets=['ang', 'mtl.txt', 'mtl.xml'] >>> ) >>> >>> from rasterio.enums import Resampling >>> >>> data_s2, df_s2 = open_stac( >>> stac_catalog='element84_v1', >>> collection='sentinel_s2_l2a', >>> start_date='2020-01-01', >>> end_date='2021-01-01', >>> bounds='map.geojson', >>> bands=['blue', 'green', 'red'], >>> resampling=Resampling.cubic, >>> epsg=int(data_l.epsg.values), >>> extra_assets=['granule_metadata'] >>> ) >>> >>> # Merge two temporal stacks >>> stack = ( >>> merge_stac(data_l, data_s2) >>> .sel(band='red') >>> .mean(dim='time') >>> )
Functions#
|
Creates cloud-free temporal composites from STAC data. |
|
Merges DataArrays by time. |
|
Opens a collection from a spatio-temporal asset catalog (STAC). |
Classes#
|
alias of |
|
Methods |
|
Methods |
|
STAC names. |
|
Source: |
geowombat.ml Package#
- geowombat.ml.fit(data, clf, labels=None, col=None, targ_name='targ', targ_dim_name='sample', temporal_mode='panel')#
Fits a classifier given class labels.
- Parameters:
data (DataArray) – The data to predict on.
clf (object) – The classifier or classification pipeline.
labels (Optional[str | Path | GeoDataFrame]) – Class labels as polygon geometry.
col (Optional[str]) – The column in
labelsyou want to assign values from. IfNone, creates a binary raster.targ_name (Optional[str]) – The target name.
targ_dim_name (Optional[str]) – The target coordinate name.
temporal_mode (Optional[str]) – How to handle time-dimensioned data.
'panel'— each pixel-time is an independent sample (B features); output has time dimension with per-time predictions.'flatten'— flatten time into band (T*B features per pixel); output has no time dimension, one prediction per pixel. Ignored when data has no time dimension.
- Returns:
Tuple of
(X, Xna, clf), whereXis the originalxarray.DataArrayaugmented to accept a prediction dimension;Xnais a tuple(xarray.DataArray, sklearn_xarray.Target)of the reshaped feature data (with NAs removed for supervised classifiers, retained for unsupervised) and the target array (None for unsupervised); andclfis the fitted sklearn pipeline.
Example
>>> import geowombat as gw >>> from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons >>> from geowombat.ml import fit >>> >>> import geopandas as gpd >>> from sklearn.pipeline import Pipeline >>> from sklearn.preprocessing import StandardScaler, LabelEncoder >>> from sklearn.decomposition import PCA >>> from sklearn.naive_bayes import GaussianNB >>> >>> le = LabelEncoder() >>> >>> labels = gpd.read_file(l8_224078_20200518_polygons) >>> labels['lc'] = le.fit(labels.name).transform(labels.name) >>> >>> # Use supervised classification pipeline >>> pl = Pipeline([('scaler', StandardScaler()), >>> ('pca', PCA()), >>> ('clf', GaussianNB())]) >>> >>> with gw.open(l8_224078_20200518) as src: >>> X, Xy, clf = fit(src, pl, labels, col='lc')
>>> # Fit an unsupervised classifier >>> cl = Pipeline([('pca', PCA()), >>> ('cst', KMeans())]) >>> with gw.open(l8_224078_20200518) as src: >>> X, Xy, clf = fit(src, cl)
- geowombat.ml.fit_predict(data, clf, labels=None, col=None, targ_name='targ', targ_dim_name='sample', mask_nodataval=True, temporal_mode='panel')#
Fits a classifier given class labels and predicts on a DataArray.
- Parameters:
data (DataArray) – The data to predict on.
clf (object) – The classifier or classification pipeline.
labels (optional[str | Path | GeoDataFrame]) – Class labels as polygon geometry.
col (Optional[str]) – The column in
labelsyou want to assign values from. IfNone, creates a binary raster.targ_name (Optional[str]) – The target name.
targ_dim_name (Optional[str]) – The target coordinate name.
mask_nodataval (Optional[Bool]) – If true, data.attrs[“nodatavals”][0] are replaced with np.nan and the array is returned as type float
temporal_mode (Optional[str]) – How to handle time-dimensioned data. ‘panel’ — each pixel-time is an independent sample. ‘flatten’ — flatten time into band.
- Returns:
Predictions shaped (‘time’ x ‘band’ x ‘y’ x ‘x’)
- Return type:
xarray.DataArray
Example
>>> import geowombat as gw >>> from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons >>> from geowombat.ml import fit_predict >>> >>> import geopandas as gpd >>> from sklearn.pipeline import Pipeline >>> from sklearn.preprocessing import StandardScaler, LabelEncoder >>> from sklearn.decomposition import PCA >>> from sklearn.naive_bayes import GaussianNB >>> from sklearn.cluster import KMeans >>> >>> le = LabelEncoder() >>> >>> labels = gpd.read_file(l8_224078_20200518_polygons) >>> labels['lc'] = le.fit(labels.name).transform(labels.name) >>> >>> # Use a supervised classification pipeline >>> pl = Pipeline([('scaler', StandardScaler()), >>> ('pca', PCA()), >>> ('clf', GaussianNB())]) >>> >>> with gw.open(l8_224078_20200518, nodata=0) as src: >>> y = fit_predict(src, pl, labels, col='lc') >>> y.sel(band='targ').gw.imshow() >>> >>> # Use an unsupervised classification pipeline >>> cl = Pipeline([('pca', PCA()), >>> ('cst', KMeans())]) >>> with gw.open(l8_224078_20200518, nodata=0) as src: >>> y2 = fit_predict(src, cl)
- geowombat.ml.predict(data, X, clf, targ_name='targ', targ_dim_name='sample', mask_nodataval=True, temporal_mode='panel')#
Predicts on a DataArray using a fitted classifier.
- Parameters:
data (DataArray) – The data to predict on.
X (DataArray) – Data array generated by geowombat.ml.fit.
clf (object) – The classifier or classification pipeline.
targ_name (Optional[str]) – The target name.
targ_dim_name (Optional[str]) – The target coordinate name.
mask_nodataval (Optional[Bool]) – If true, data.attrs[“nodatavals”][0] are replaced with np.nan and the array is returned as type float
temporal_mode (Optional[str]) – How to handle time-dimensioned data. ‘panel’ — each pixel-time is an independent sample. ‘flatten’ — flatten time into band. Must match the temporal_mode used in fit().
- Returns:
Predictions shaped (‘time’ x ‘band’ x ‘y’ x ‘x’)
- Return type:
xarray.DataArray
Example
>>> import geowombat as gw >>> from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons >>> from geowombat.ml import fit, predict >>> import geopandas as gpd >>> from sklearn.pipeline import Pipeline >>> from sklearn.preprocessing import LabelEncoder, StandardScaler >>> from sklearn.decomposition import PCA >>> from sklearn.naive_bayes import GaussianNB
>>> le = LabelEncoder() >>> labels = gpd.read_file(l8_224078_20200518_polygons) >>> labels["lc"] = le.fit(labels.name).transform(labels.name)
>>> # Use a data pipeline >>> pl = Pipeline([('scaler', StandardScaler()), >>> ('pca', PCA()), >>> ('clf', GaussianNB())])
>>> # Fit and predict the classifier >>> with gw.config.update(ref_res=100): >>> with gw.open(l8_224078_20200518, nodata=0) as src: >>> X, Xy, clf = fit(src, pl, labels, col="lc") >>> y = predict(src, X, clf) >>> print(y)
>>> # Fit and predict an unsupervised classifier >>> cl = Pipeline([('pca', PCA()), >>> ('cst', KMeans())]) >>> with gw.open(l8_224078_20200518) as src: >>> X, Xy, clf = fit(src, cl) >>> y1 = predict(src, X, clf)
Functions#
|
Fits a classifier given class labels. |
|
Fits a classifier given class labels and predicts on a DataArray. |
|
Predicts on a DataArray using a fitted classifier. |
geowombat.detect Package#
Object detection for geowombat.
Tiled, georeferenced bounding-box detection on top of Ultralytics YOLO
and TorchGeo / torchvision Faster R-CNN / RetinaNet, plus
training-dataset construction, accuracy metrics, and SAM-based polygon
refinement. Mirrors the fit / predict / fit_predict shape of
geowombat.ml.
Quick example#
>>> import geowombat as gw
>>> from geowombat.detect import YOLODetector
>>> det = YOLODetector(weights='yolov8n.pt')
>>> with gw.open('aerial.tif') as src:
... preds = src.gw.detect(det, conf=0.25)
- class geowombat.detect.SAMRefiner(checkpoint, model_type='vit_b', device='auto')[source]#
Bases:
objectRefine bounding-box detections into polygons using SAM.
Each input box is used as a prompt to the Segment Anything model (or SAM2 if installed) to produce a precise polygon mask, then polygonized back into vector geometry in the source CRS.
Requires:
pip install geowombat[sam]- Parameters:
- checkpointstr or Path
Path to a SAM checkpoint (
sam_vit_b.pth, etc).- model_type{‘vit_b’, ‘vit_l’, ‘vit_h’}
SAM backbone size. Default ‘vit_b’.
- devicestr
‘cpu’, ‘cuda’, or ‘auto’. Default ‘auto’.
Methods
refine(src, boxes_gdf[, band_indices, ...])Refine boxes to polygon masks.
- refine(src, boxes_gdf, band_indices=None, scale=None, pad_pixels=8, simplify_tolerance=0.5)[source]#
Refine boxes to polygon masks.
- Parameters:
- srcxarray.DataArray
Source raster (must match
boxes_gdf.crs).- boxes_gdfgeopandas.GeoDataFrame
Detector output. Each row’s geometry should be the bbox.
- band_indiceslist of int, optional
RGB bands for SAM input.
- scaletuple of (lo, hi), optional
Stretch range.
- pad_pixelsint
Pad around each box when reading the chip. Default 8.
- simplify_tolerancefloat
Polygon simplification tolerance in CRS units. Default 0.5.
- Returns:
- geopandas.GeoDataFrame
Same columns as input; geometry replaced by polygons.
- class geowombat.detect.TorchGeoDetector(model='faster-rcnn', weights=None, num_classes=None, classes=None, device='auto')[source]#
Bases:
GeoWombatDetectorDetection wrapper around TorchGeo / torchvision detection models.
Supports Faster R-CNN and RetinaNet via torchvision with optional pretrained weights from TorchGeo (e.g. xView). Axis-aligned only.
- Parameters:
- model{‘faster-rcnn’, ‘retinanet’}
Detection head. Default ‘faster-rcnn’.
- weightsstr, optional
TorchGeo weights enum string (e.g. ‘FCN_RESNET50_XVIEW’) or path to a state dict. If None, uses torchvision COCO pretrained.
- num_classesint, optional
Number of classes including background. Required when loading a custom-trained checkpoint.
- classeslist of str, optional
Class names corresponding to non-background ids 1..N.
- devicestr
‘cpu’, ‘cuda’, or ‘auto’. Default ‘auto’.
- Attributes:
- class_names
Methods
predict(src[, tile_size, overlap, conf, ...])Run tiled, georeferenced inference over a raster.
Notes
For aerial / satellite imagery, the most useful TorchGeo weights are typically ‘FASTERRCNN_RESNET50_FPN_XVIEW’ (when available in your TorchGeo version). Check
torchgeo.modelsfor the current catalogue.
- class geowombat.detect.YOLODetector(weights='yolov8n.pt', classes=None, oriented=None, device='auto', imgsz=None)[source]#
Bases:
GeoWombatDetectorUltralytics YOLO detector for georeferenced rasters.
Supports axis-aligned (default) and oriented bounding boxes. The underlying model is any path accepted by
ultralytics.YOLO— pretrained weights (‘yolov8n.pt’, ‘yolo11n.pt’, ‘yolov8n-obb.pt’) or a custom-trained checkpoint.NOTE: Ultralytics is licensed AGPL-3.0; ensure your use case is compatible before deploying.
- Parameters:
- weightsstr or Path
YOLO weights file. Default ‘yolov8n.pt’.
- classeslist of str, optional
Override class names. If None, names come from the model.
- orientedbool
Set to True when using an OBB weight (file ends in
-obb.pt). Auto-detected from the filename if not specified.- devicestr
‘cpu’, ‘cuda’, or ‘auto’. Default ‘auto’.
- imgszint
Inference size passed to YOLO. Default matches
tile_sizeinpredict().
- Attributes:
- class_names
Methods
fit(dataset_yaml[, epochs, imgsz])Fine-tune YOLO on a dataset produced by
build_yolo_dataset.predict(src[, tile_size, overlap, conf, ...])Run tiled, georeferenced inference over a raster.
- fit(dataset_yaml, epochs=50, imgsz=640, **kwargs)[source]#
Fine-tune YOLO on a dataset produced by
build_yolo_dataset.Thin wrapper around
ultralytics.YOLO.train.- Parameters:
- dataset_yamlstr or Path
Path to
data.yamlwritten bybuild_yolo_dataset.- epochsint
Training epochs. Default 50.
- imgszint
Training image size. Default 640.
- **kwargs
Additional kwargs forwarded to
YOLO.train.
- geowombat.detect.boxes_from_polygons(gdf, oriented=False)[source]#
Convert polygon geometries to bounding-box geometries.
Two flavors of bounding box are supported:
Axis-aligned (AABB) —
oriented=False(default). Sides parallel to the image axes. Right for objects that line up with the grid: buildings in a nadir aerial frame, cars in a parking lot, parcels.Oriented (OBB) —
oriented=True. Minimum rotated rectangle around each polygon. Right for objects that appear at arbitrary angles in overhead imagery — ships, planes on a tarmac, vehicles on diagonal roads, storage tanks viewed off-nadir.
For overhead / aerial work, OBB is almost always the better choice and should be paired with weights pretrained on the DOTA-v1 benchmark (e.g.
yolov8n-obb.pt). Mixing OBB labels with non-OBB weights will fail at training time.Quality of OBB output depends on the input polygon. The minimum rotated rectangle uses the polygon’s extreme points, so loose blob-shaped digitization yields a sloppy OBB. Trace tightly along the object’s long axis when labeling.
- Parameters:
- gdfgeopandas.GeoDataFrame
Input labels. Geometries may be
PolygonorMultiPolygon. Pass-through for any geometry that is already a box.- orientedbool
If True, return minimum rotated rectangles (4-corner polygons). If False, return axis-aligned envelopes (default).
- Returns:
- geopandas.GeoDataFrame
Same columns as input with geometries replaced by boxes. A new column
_box_kindis added:'aabb'or'obb'.
See also
build_yolo_datasetCalls this internally when
oriented=True.
- geowombat.detect.build_dataset(src, labels, class_col, out_dir, tile_size=640, overlap=0.1, val_split=0.2, min_box_pixels=8, background_ratio=0.0, band_indices=None, scale=None, oriented=False, image_format='jpg', seed=42, class_names=None)#
Write a YOLO-format training dataset from a raster + label GDF.
- Parameters:
- srcxarray.DataArray
Raster opened with
gw.open().- labelsgeopandas.GeoDataFrame, str, or Path
Vector labels. Polygons are converted to bounding boxes; existing box geometries are used as-is.
- class_colstr
Column in
labelsholding class name/id.- out_dirstr or Path
Output directory. Will be created if missing. The Ultralytics layout
images/{train,val}+labels/{train,val}is written plus adata.yamlat the root.- tile_sizeint
Square tile edge in pixels. Default 640.
- overlapfloat
Fractional overlap between adjacent tiles (0..0.9). Default 0.1.
- val_splitfloat
Fraction of tiles assigned to the validation split. Default 0.2.
- min_box_pixelsint
Minimum width or height (in pixels) for a box to be kept after tile clipping. Default 8.
- background_ratiofloat
Fraction (0..1) of empty tiles to retain. Default 0 (drop all).
- band_indiceslist of int, optional
Three band indices (0-based) for the R, G, B channels. Required for non-3-band rasters or non-uint8 data unless the source is already 3-band uint8.
- scaletuple of (lo, hi), optional
Linear stretch applied before writing. If None and dtype is uint8, no stretch is applied; otherwise a per-tile 2-98 pct stretch is used.
- orientedbool
If True, write OBB labels (8 corner coords). Default False.
- image_format{‘jpg’, ‘png’}
Tile image format. Default ‘jpg’.
- seedint
RNG seed for train/val split. Default 42.
- class_nameslist of str, optional
Override class ordering. If None, classes are taken from
labels[class_col]sorted alphabetically.
- Returns:
- dict
Summary with keys
out_dir,classes,n_train,n_val,n_boxes.
- geowombat.detect.build_yolo_dataset(src, labels, class_col, out_dir, tile_size=640, overlap=0.1, val_split=0.2, min_box_pixels=8, background_ratio=0.0, band_indices=None, scale=None, oriented=False, image_format='jpg', seed=42, class_names=None)[source]#
Write a YOLO-format training dataset from a raster + label GDF.
- Parameters:
- srcxarray.DataArray
Raster opened with
gw.open().- labelsgeopandas.GeoDataFrame, str, or Path
Vector labels. Polygons are converted to bounding boxes; existing box geometries are used as-is.
- class_colstr
Column in
labelsholding class name/id.- out_dirstr or Path
Output directory. Will be created if missing. The Ultralytics layout
images/{train,val}+labels/{train,val}is written plus adata.yamlat the root.- tile_sizeint
Square tile edge in pixels. Default 640.
- overlapfloat
Fractional overlap between adjacent tiles (0..0.9). Default 0.1.
- val_splitfloat
Fraction of tiles assigned to the validation split. Default 0.2.
- min_box_pixelsint
Minimum width or height (in pixels) for a box to be kept after tile clipping. Default 8.
- background_ratiofloat
Fraction (0..1) of empty tiles to retain. Default 0 (drop all).
- band_indiceslist of int, optional
Three band indices (0-based) for the R, G, B channels. Required for non-3-band rasters or non-uint8 data unless the source is already 3-band uint8.
- scaletuple of (lo, hi), optional
Linear stretch applied before writing. If None and dtype is uint8, no stretch is applied; otherwise a per-tile 2-98 pct stretch is used.
- orientedbool
If True, write OBB labels (8 corner coords). Default False.
- image_format{‘jpg’, ‘png’}
Tile image format. Default ‘jpg’.
- seedint
RNG seed for train/val split. Default 42.
- class_nameslist of str, optional
Override class ordering. If None, classes are taken from
labels[class_col]sorted alphabetically.
- Returns:
- dict
Summary with keys
out_dir,classes,n_train,n_val,n_boxes.
- geowombat.detect.detection_accuracy(predictions, truth, class_col='class_name', iou_thresholds=(0.5,), score_col='score', class_agnostic=False)[source]#
Compute detection accuracy metrics + a review-ready GeoDataFrame.
- Parameters:
- predictionsgeopandas.GeoDataFrame
Detector output. Must include
geometry,score(orscore_col), and a class column.- truthgeopandas.GeoDataFrame
Ground-truth boxes/polygons in the same CRS.
- class_colstr
Column with class labels in both inputs.
- iou_thresholdssequence of float, or ‘coco’
IoU thresholds to evaluate.
'coco'expands to 0.5..0.95 step 0.05 and returns mAP@[.5:.95].- score_colstr
Score column in predictions. Default ‘score’.
- class_agnosticbool
If True, ignore class labels (treat as a single class).
- Returns:
- dict
- Keys:
metrics: DataFrame indexed by class with columnsprecision, recall, f1, ap, tp, fp, fn, supportfor each IoU threshold.summary: dict of overall mAP per threshold + mAP@[.5:.95].matched: GeoDataFrame of TP/FP/FN rows ready for QGIS review (seeexport_for_review).confusion: DataFrame of class confusion (per-class truth vs. matched-prediction class) at the lowest IoU.
- geowombat.detect.export_for_review(matched_gdf, out_path, layer='detections_review')[source]#
Write a matched GeoDataFrame to GeoPackage for QGIS review.
The output file is suitable for the QGIS attribute form workflow and feature-stepping plugins (e.g. GoToNextFeature3+). The
reviewer_labelfield is empty for the user to fill in (values like'TP','FP','FN','unclear'are recommended);notesis free text.- Parameters:
- matched_gdfgeopandas.GeoDataFrame
Output from
detection_accuracy.- out_pathstr or Path
Destination .gpkg path.
- layerstr
GeoPackage layer name.
- geowombat.detect.fit(detector, dataset_yaml, **kwargs)[source]#
Fine-tune a detector on a YOLO-format dataset.
- Return type:
Any- Parameters:
- detectorYOLODetector
Detector to fine-tune (only YOLO supports
.fittoday).- dataset_yamlstr or Path
Path to
data.yamlwritten bybuild_dataset.- **kwargs
Forwarded to
detector.fit(epochs,imgsz, …).
- Returns:
- The underlying training results object.
- geowombat.detect.fit_predict(src, detector, labels, class_col, out_dir, tile_size=640, overlap=0.1, epochs=50, predict_kwargs=None, **dataset_kwargs)[source]#
Build a training dataset, fine-tune, and predict in one call.
Mirrors
gw.ml.fit_predictfor classification: end-to-end from raster + labels to predictions.- Return type:
Tuple[GeoDataFrame,dict]- Parameters:
- srcxarray.DataArray
Raster opened with
gw.open().- detectorYOLODetector
Detector to fine-tune and run inference with.
- labelsgeopandas.GeoDataFrame, str, or Path
Vector labels.
- class_colstr
Column in
labelsholding class name/id.- out_dirstr or Path
Output directory for the generated YOLO dataset.
- tile_sizeint
Tile edge in pixels. Default 640.
- overlapfloat
Tile overlap for both dataset creation and inference. Default 0.1.
- epochsint
Fine-tuning epochs. Default 50.
- predict_kwargsdict, optional
Extra kwargs passed to
detector.predict.- **dataset_kwargs
Extra kwargs passed to
build_dataset(e.g.val_split,min_box_pixels,background_ratio,band_indices,scale,oriented).
- Returns:
- (geopandas.GeoDataFrame, dict)
Predictions and the dataset-build summary.
- geowombat.detect.plot_detections(src, predictions=None, truth=None, ax=None, band_indices=None, scale=None, max_features=2000)[source]#
Plot a raster with detections color-coded by TP / FP / FN.
- Parameters:
- srcxarray.DataArray
Background raster opened with
gw.open().- predictionsgeopandas.GeoDataFrame, optional
Detector output. Plotted in solid lines.
- truthgeopandas.GeoDataFrame, optional
Ground truth. Plotted as dashed lines.
- axmatplotlib.axes.Axes, optional
Existing axes. A new figure is created if None.
- band_indiceslist of int, optional
RGB bands for the background.
- scaletuple of (lo, hi), optional
Stretch range for background display.
- max_featuresint
Subsample to this many features before plotting to keep the figure responsive on large outputs. Default 2000.
- Returns:
- matplotlib.axes.Axes
- geowombat.detect.predict(src, detector, **kwargs)[source]#
Run tiled, georeferenced inference over a raster.
Thin wrapper around
detector.predict(src, **kwargs)so detection follows the same module-level call shape asgw.ml.predict.- Return type:
GeoDataFrame- Parameters:
- srcxarray.DataArray
Raster opened with
gw.open().- detectorYOLODetector or TorchGeoDetector
Pre-built detector instance.
- **kwargs
Forwarded to
detector.predict(tile_size,overlap,conf,band_indices,scale,nms_iou,max_det,progress).
- Returns:
- geopandas.GeoDataFrame
Detections in the source CRS.
- geowombat.detect.recompute_from_review(review_input, class_col='class_name', layer='detections_review')[source]#
Recompute metrics after a human has edited
reviewer_label.The reviewer is expected to fill
reviewer_labelwith one of:'TP','FP','FN', or'unclear'(case-insensitive). Rows where the label is blank are treated as the automatic status.'unclear'rows are excluded from the metrics entirely.- Parameters:
- review_inputstr, Path, or GeoDataFrame
Reviewed GeoPackage path or in-memory GeoDataFrame.
- class_colstr
Column to use for per-class breakdown.
- layerstr
GeoPackage layer if reading from disk.
- Returns:
- dict
Per-class and overall counts/precision/recall/F1 after incorporating human review.
Functions#
|
Convert polygon geometries to bounding-box geometries. |
|
Write a YOLO-format training dataset from a raster + label GDF. |
|
Write a YOLO-format training dataset from a raster + label GDF. |
|
Compute detection accuracy metrics + a review-ready GeoDataFrame. |
|
Write a matched GeoDataFrame to GeoPackage for QGIS review. |
|
Fine-tune a detector on a YOLO-format dataset. |
|
Build a training dataset, fine-tune, and predict in one call. |
|
Plot a raster with detections color-coded by TP / FP / FN. |
|
Run tiled, georeferenced inference over a raster. |
|
Recompute metrics after a human has edited |
Classes#
|
Refine bounding-box detections into polygons using SAM. |
|
Detection wrapper around TorchGeo / torchvision detection models. |
|
Ultralytics YOLO detector for georeferenced rasters. |
geowombat.radiometry Package#
- class geowombat.radiometry.BRDF[source]#
Bases:
GeoVolKernelsA class for Bidirectional Reflectance Distribution Function (BRDF) normalization.
Methods
get_mean_sza(central_latitude)Returns the mean solar zenith angle (SZA) as a function of the central latitude.
norm_brdf(data, solar_za, solar_az, ...[, ...])Applies Nadir Bidirectional Reflectance Distribution Function (BRDF) normalization using the global c-factor method
get_kernels
- norm_brdf(data, solar_za, solar_az, sensor_za, sensor_az, central_latitude=None, sensor=None, wavelengths=None, src_nodata=-32768, dst_nodata=-32768, mask=None, scale_factor=1.0, out_range=None, scale_angles=True, vol_weight=1.0)[source]#
Applies Nadir Bidirectional Reflectance Distribution Function (BRDF) normalization using the global c-factor method
- Parameters:
data (2d or 3d DataArray) – The data to normalize.
solar_za (2d DataArray) – The solar zenith angles (degrees).
solar_az (2d DataArray) – The solar azimuth angles (degrees).
sensor_za (2d DataArray) – The sensor azimuth angles (degrees).
sensor_az (2d DataArray) – The sensor azimuth angles (degrees).
central_latitude (Optional[float or 2d DataArray]) – The central latitude.
sensor (Optional[str]) – The satellite sensor.
wavelengths (str list) – The wavelength(s) to normalize.
src_nodata (Optional[int or float]) – The input ‘no data’ value.
dst_nodata (Optional[int or float]) – The output ‘no data’ value.
mask (Optional[DataArray]) – A data mask, where clear values are 0.
scale_factor (Optional[float]) – A scale factor to apply to the input data.
out_range (Optional[float]) – The out data range. If not given, the output data are return in a 0-1 range.
scale_angles (Optional[bool]) – Whether to scale the pixel angle arrays.
vol_weight (Optional[float]) – Weight to be applied to the volumetric kernel in the c-factor equation
References
See [RZJ+16] for the c-factor method.
For further background on BRDF:
- Returns:
xarray.DataArray
Examples
>>> import geowombat as gw >>> from geowombat.radiometry import BRDF >>> >>> brdf = BRDF() >>> >>> # Example where pixel angles are stored in separate GeoTiff files >>> with gw.config.update(sensor='l7', scale_factor=0.0001): >>> >>> with gw.open('solarz.tif') as solarz, >>> gw.open('solara.tif') as solara, >>> gw.open('sensorz.tif') as sensorz, >>> gw.open('sensora.tif') as sensora: >>> >>> with gw.open('landsat.tif') as src: >>> src_norm = brdf.norm_brdf(src, solarz, solara, sensorz, sensora)
- class geowombat.radiometry.HLSFmaskBits(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
Bases:
EnumHLS Fmask quality bit definitions (uint8).
- class geowombat.radiometry.LinearAdjustments[source]#
Bases:
objectA class for linear bandpass adjustments.
Methods
bandpass(data[, sensor, to, band_names, ...])Applies a bandpass adjustment by applying a linear function to surface reflectance values.
- bandpass(data, sensor=None, to='l8', band_names=None, scale_factor=1, src_nodata=0, dst_nodata=0)[source]#
Applies a bandpass adjustment by applying a linear function to surface reflectance values.
- Parameters:
data (DataArray) – The data to adjust.
sensor (Optional[str]) – The sensor to adjust.
to (Optional[str]) – The sensor to adjust to.
band_names (Optional[list]) – The bands to adjust. If not given, all bands are adjusted.
scale_factor (Optional[float]) – A scale factor to apply to the input data.
src_nodata (Optional[int or float]) – The input ‘no data’ value.
dst_nodata (Optional[int or float]) – The output ‘no data’ value.
Reference:
Sentinel-2 and Landsat 8:
https://hls.gsfc.nasa.gov/algorithms/bandpass-adjustment/
See [CHGF19] for further details
Landsat 7 and Landsat 8:
See [RZJ+16] (Table 2)
- Returns:
xarray.DataArray
Examples
>>> import geowombat as gw >>> from geowombat.radiometry import LinearAdjustments >>> >>> la = LinearAdjustments() >>> >>> # Adjust all Sentinel-2 bands to Landsat 8 >>> with gw.config.update(sensor='s2'): >>> with gw.open('sentinel-2.tif') as ds: >>> ds_adjusted = la.bandpass(ds, to='l8')
- class geowombat.radiometry.QABits(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
Bases:
EnumQA bits.
- class geowombat.radiometry.QAMasker(qa, sensor, mask_items=None, modis_qa_band=1, modis_quality=2, confidence_level='yes')[source]#
Bases:
objectA class for masking bit-packed quality flags.
- Parameters:
qa (DataArray) – The band quality array.
sensor (str) –
The sensor name. Choices are [‘ard’, ‘hls’, ‘l8-pre’, ‘l8-c1’, ‘l-c1’, ‘modis’, ‘s2a’, ‘s2c’].
- Codes:
- ’ard’:
USGS Landsat Analysis Ready Data <https://www.usgs.gov/land-resources/nli/landsat/us-landsat-analysis-ready-data?qt-science_support_page_related_con=0#qt-science_support_page_related_con>`_
- ’hls’:
- ’l-c1’:
Landsat Collection 1 L4-5 and L7
- ’l8-c1’:
Landsat Collection 1 L8
- ’s2a’:
Sentinel 2A (surface reflectance)
- ’s2c’:
Sentinel 2C (top of atmosphere)
mask_items (str list) – A list of items to mask.
modis_qa_position (Optional[int]) – The MODIS QA band position. Default is 1.
modis_quality (Optional[int]) – The MODIS quality level. Default is 2.
confidence_level (Optional[str]) – The confidence level. Choices are [‘notdet’, ‘no’, ‘maybe’, ‘yes’].
References
- Landsat Collection 1:
Examples
>>> import geowombat as gw >>> from geowombat.radiometry import QAMasker >>> >>> # Get the MODIS cloud mask. >>> with gw.open('qa.tif') as qa: >>> mask = QAMasker(qs, 'modis').to_mask() >>> >>> # NASA HLS >>> with gw.open('qa.tif') as qa: >>> mask = QAMasker(qs, 'hls', ['cloud']).to_mask()
Methods
to_mask()Converts QA bit-packed data to an integer mask.
- class geowombat.radiometry.RadTransforms[source]#
Bases:
MetaDataA class for radiometric transformations.
Methods
dn_to_radiance(dn, gain, bias)Converts digital numbers to radiance.
dn_to_sr(dn, solar_za, solar_az, sensor_za, ...)Converts digital numbers to surface reflectance.
dn_to_toar(dn, gain, bias[, solar_za, ...])Converts digital numbers to top-of-atmosphere reflectance.
get_landsat_coefficients(meta_file)Gets coefficients from a Landsat metadata file.
get_sentinel_coefficients(meta_file)Gets coefficients from a Sentinel metadata file.
radiance_to_toar(radiance, solar_za, global_args)Converts radiance to top-of-atmosphere reflectance.
toar_to_rad(toar, meta)Converts top of atmosphere reflectance to top of atmosphere radiance.
toar_to_sr(toar, solar_za, solar_az, ...[, ...])Converts top-of-atmosphere reflectance to surface reflectance.
- dn_to_radiance(dn, gain, bias)[source]#
Converts digital numbers to radiance.
- Parameters:
dn (DataArray) – The digital number data to calibrate.
gain (DataArray) – A gain value.
bias (DataArray) – A bias value.
- Returns:
xarray.DataArray
- dn_to_sr(dn, solar_za, solar_az, sensor_za, sensor_az, src_nodata=-32768, dst_nodata=-32768, sensor=None, method='srem', angle_factor=0.01, meta=None, interp_method='fast', **kwargs)[source]#
Converts digital numbers to surface reflectance.
- Parameters:
dn (DataArray) – The digital number data to calibrate.
solar_za (DataArray) – The solar zenith angle.
solar_az (DataArray) – The solar azimuth angle.
sensor_za (DataArray) – The sensor, or view, zenith angle.
sensor_az (DataArray) – The sensor, or view, azimuth angle.
src_nodata (Optional[int or float]) – The input ‘no data’ value.
dst_nodata (Optional[int or float]) – The output ‘no data’ value.
sensor (Optional[str]) – The data’s sensor.
method (Optional[str]) – The correction method to use. Choices are [‘srem’, ‘6s’].
angle_factor (Optional[float]) – The scale factor for angles.
meta (Optional[namedtuple]) – A metadata object with gain and bias coefficients.
interp_method (Optional[str]) – The LUT interpolation method if
method= ‘6s’. Choices are [‘fast’, ‘slow’]. ‘fast’: Uses nearest neighbor lookup withscipy.interpolate.NearestNDInterpolator. ‘slow’: Uses linear interpolation withscipy.interpolate.LinearNDInterpolator.kwargs (Optional[dict]) – Extra keyword arguments passed to
radiometry.sixs.SixS().rad_to_sr.
References
https://www.usgs.gov/land-resources/nli/landsat/using-usgs-landsat-level-1-data-product
- Returns:
Data range: 0-1
- Return type:
xarray.DataArray
Examples
>>> from geowombat.radiometry import RadTransforms >>> >>> sr = RadTransforms() >>> meta = sr.get_landsat_coefficients('file.MTL') >>> >>> # Convert DNs to surface reflectance using Landsat metadata >>> with gw.open('dn.tif') as ds: >>> sr_data = sr.dn_to_sr(ds, solar_za, sensor_za, meta=meta)
- dn_to_toar(dn, gain, bias, solar_za=None, angle_factor=0.01, sun_angle=True)[source]#
Converts digital numbers to top-of-atmosphere reflectance.
- Parameters:
dn (DataArray) – The digital number data to calibrate.
gain (DataArray | dict) – A gain value.
bias (DataArray | dict) – A bias value.
solar_za (DataArray) – The solar zenith angle.
angle_factor (Optional[float]) – The scale factor for angles.
sun_angle (Optional[bool]) – Whether to correct for the sun angle.
- Returns:
xarray.DataArray
- static radiance_to_toar(radiance, solar_za, global_args)[source]#
Converts radiance to top-of-atmosphere reflectance.
- Parameters:
radiance (DataArray) – The radiance data to calibrate.
solar_za (DataArray) – The solar zenith angle.
global_args (namedtuple) – Global arguments.
- Returns:
xarray.DataArray
- static toar_to_rad(toar, meta)[source]#
Converts top of atmosphere reflectance to top of atmosphere radiance.
- Parameters:
toar (DataArray) – The top of atmosphere reflectance (0-1).
sza (float | DataArray) – The solar zenith angle (in degrees).
meta (Optional[namedtuple]) – A metadata object with gain and bias coefficients.
- Returns:
xarray.DataArray
- toar_to_sr(toar, solar_za, solar_az, sensor_za, sensor_az, sensor=None, src_nodata=-32768, dst_nodata=-32768, method='srem', angle_factor=0.01, meta=None, interp_method='fast', **kwargs)[source]#
Converts top-of-atmosphere reflectance to surface reflectance.
- Parameters:
toar (DataArray) – The top-of-atmosphere reflectance (0-1).
solar_za (float | DataArray) – The solar zenith angle.
solar_az (DataArray) – The solar azimuth angle.
sensor_za (DataArray) – The sensor zenith angle.
sensor_az (DataArray) – The sensor azimuth angle.
sensor (Optional[str]) – The data’s sensor.
src_nodata (Optional[int or float]) – The input ‘no data’ value.
dst_nodata (Optional[int or float]) – The output ‘no data’ value.
method (Optional[str]) –
The method to use. Choices are [‘srem’, ‘6s’].
- Choices:
’srem’: A Simplified and Robust Surface Reflectance Estimation Method (SREM)
angle_factor (Optional[float]) – The scale factor for angles.
meta (Optional[namedtuple]) – A metadata object with gain and bias coefficients.
interp_method (Optional[str]) – The LUT interpolation method if
method= ‘6s’. Choices are [‘fast’, ‘slow’]. ‘fast’: Uses nearest neighbor lookup withscipy.interpolate.NearestNDInterpolator. ‘slow’: Uses linear interpolation withscipy.interpolate.LinearNDInterpolator.kwargs (Optional[dict]) – Extra keyword arguments passed to
radiometry.sixs.SixS().toar_to_sr.
References
See [BNN+19] for the SREM method.
See [PFRS10] and [LLS+20] for the 6S method.
- Returns:
Data range: 0-1
- Return type:
xarray.DataArray
- class geowombat.radiometry.SCLValues(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
Bases:
EnumSentinel-2 Scene Classification Layer (SCL) values.
- Reference:
https://sentinels.copernicus.eu/web/sentinel/ technical-guides/sentinel-2-msi/level-2a/algorithm-overview
- class geowombat.radiometry.SixS[source]#
Bases:
Altitude,SixSMixinA class to handle loading, downloading and interpolating of LUTs (look up tables) used by the 6S emulator.
- Parameters:
sensor (str) – The sensor to adjust.
rad_scale (Optional[float]) – The radiance scale factor. Scaled values should be in the range [0,1000].
angle_factor (Optional[float]) – The angle scale factor.
Example
>>> sixs = SixS('l5', verbose=1) >>> >>> with gw.config.update(sensor='l7'): >>> with gw.open('image.tif') as src, gw.open('solar_za') as sza: >>> sixs.rad_to_sr(src, 'blue', sza, doy, h2o=1.0, o3=0.4, aot=0.3)
Methods
rad_to_sr(data, sensor, wavelength, sza, doy)Converts radiance to surface reflectance using a 6S radiative transfer model lookup table.
toar_to_sr(data, sensor, wavelength, sza, doy)Converts top of atmosphere reflectance to surface reflectance using 6S outputs.
get_mean_altitude
prepare_coeff
- rad_to_sr(data, sensor, wavelength, sza, doy, src_nodata=-32768, dst_nodata=-32768, angle_factor=0.01, interp_method='fast', h2o=1.0, o3=0.4, aot=0.3, altitude=0.0, n_jobs=1)[source]#
Converts radiance to surface reflectance using a 6S radiative transfer model lookup table.
- Parameters:
data (DataArray) – The data to correct, in radiance.
sensor (str) – The sensor name.
wavelength (str) – The band wavelength to process.
sza (float | DataArray) – The solar zenith angle.
doy (int) – The day of year.
src_nodata (Optional[int or float]) – The input ‘no data’ value.
dst_nodata (Optional[int or float]) – The output ‘no data’ value.
angle_factor (Optional[float]) – The scale factor for angles.
interp_method (Optional[str]) – The LUT interpolation method. Choices are [‘fast’, ‘slow’]. ‘fast’: Uses nearest neighbor lookup with
scipy.interpolate.NearestNDInterpolator. ‘slow’: Uses linear interpolation withscipy.interpolate.LinearNDInterpolator.h2o (Optional[float]) – The water vapor (g/m^2). [0,8.5].
o3 (Optional[float]) – The ozone (cm-atm). [0,8].
aot (Optional[float | DataArray]) – The aerosol optical thickness (unitless). [0,3].
altitude (Optional[float]) – The altitude over the sensor acquisition location.
n_jobs (Optional[int]) – The number of parallel jobs for
dask.compute.
- Returns:
Data range: 0-1
- Return type:
xarray.DataArray
- toar_to_sr(data, sensor, wavelength, sza, doy, src_nodata=-32768, dst_nodata=-32768, angle_factor=0.01, interp_method='fast', h2o=1.0, o3=0.4, aot=0.3, altitude=0.0, n_jobs=1)[source]#
Converts top of atmosphere reflectance to surface reflectance using 6S outputs.
- Parameters:
data (DataArray) – The top of atmosphere reflectance.
sensor (str) – The sensor name.
wavelength (str) – The band wavelength to process.
sza (float | DataArray) – The solar zenith angle.
doy (int) – The day of year.
src_nodata (Optional[int or float]) – The input ‘no data’ value.
dst_nodata (Optional[int or float]) – The output ‘no data’ value.
angle_factor (Optional[float]) – The scale factor for angles.
interp_method (Optional[str]) – The LUT interpolation method. Choices are [‘fast’, ‘slow’]. ‘fast’: Uses nearest neighbor lookup with
scipy.interpolate.NearestNDInterpolator. ‘slow’: Uses linear interpolation withscipy.interpolate.LinearNDInterpolator.h2o (Optional[float]) – The water vapor (g/m^2). [0,8.5].
o3 (Optional[float]) – The ozone (cm-atm). [0,8].
aot (Optional[float | DataArray]) – The aerosol optical thickness (unitless). [0,3].
altitude (Optional[float]) – The altitude over the sensor acquisition location.
n_jobs (Optional[int]) – The number of parallel jobs for
dask.compute.
- 6S model outputs:
- t_g (float): The total gaseous transmission of the atmosphere.
s.run() –> s.outputs.total_gaseous_transmittance
- p_alpha (float): The atmospheric reflectance.
s.run() –> s.outputs.atmospheric_intrinsic_reflectance
- s (float): The spherical albedo of the atmosphere.
s.run() –> s.outputs.spherical_albedo
- t_s (float): The atmospheric transmittance from sun to target.
s.run() –> s.outputs.transmittance_total_scattering.downward
- t_v (float): The atmospheric transmittance from target to satellite.
s.run() –> s.outputs.transmittance_total_scattering.upward
- class geowombat.radiometry.Topo[source]#
Bases:
objectA class for topographic normalization.
Methods
norm_topo(data, elev, solar_za, solar_az[, ...])Applies topographic normalization.
- norm_topo(data, elev, solar_za, solar_az, slope=None, aspect=None, method='empirical-rotation', slope_thresh=2, nodata=0, elev_nodata=-32768, scale_factor=1, angle_scale=0.01, n_jobs=1, robust=False, min_samples=100, slope_kwargs=None, aspect_kwargs=None, band_coeffs=None)[source]#
Applies topographic normalization.
- Parameters:
data (2d or 3d DataArray) – The data to normalize, in the range 0-1.
elev (2d DataArray) – The elevation data.
solar_za (2d DataArray) – The solar zenith angles (degrees).
solar_az (2d DataArray) – The solar azimuth angles (degrees).
slope (2d DataArray) – The slope data. If not given, slope is calculated from
elev.aspect (2d DataArray) – The aspect data. If not given, aspect is calculated from
elev.method (Optional[str]) – The method to apply. Choices are [‘c’, ‘empirical-rotation’].
slope_thresh (Optional[float or int]) – The slope threshold. Any samples with values <
slope_threshare not adjusted.nodata (Optional[int or float]) – The ‘no data’ value for
data.elev_nodata (Optional[float or int]) – The ‘no data’ value for
elev.scale_factor (Optional[float]) – A scale factor to apply to the input data.
angle_scale (Optional[float]) – The angle scale factor.
n_jobs (Optional[int]) – The number of parallel workers for
LinearRegression.fit.robust (Optional[bool]) – Whether to fit a robust regression.
min_samples (Optional[int]) – The minimum number of samples required to fit a regression.
slope_kwargs (Optional[dict]) – Keyword arguments passed to
gdal.DEMProcessingOptionsto calculate the slope.aspect_kwargs (Optional[dict]) – Keyword arguments passed to
gdal.DEMProcessingOptionsto calculate the aspect.band_coeffs (Optional[dict]) – Slope and intercept coefficients for each band.
References
See [TGG82] for the C-correction method. See [TWM+10] for the Empirical Rotation method.
- Returns:
xarray.DataArray
Examples
>>> import geowombat as gw >>> from geowombat.radiometry import Topo >>> >>> topo = Topo() >>> >>> # Example where pixel angles are stored in separate GeoTiff files >>> with gw.config.update(sensor='l7', scale_factor=0.0001, nodata=0): >>> >>> with gw.open('landsat.tif') as src, >>> gw.open('srtm') as elev, >>> gw.open('solarz.tif') as solarz, >>> gw.open('solara.tif') as solara: >>> >>> src_norm = topo.norm_topo(src, elev, solarz, solara, n_jobs=-1)
- geowombat.radiometry.landsat_pixel_angles(angles_file, ref_file, out_dir, sensor, l57_angles_path=None, l8_angles_path=None, subsample=1, resampling='bilinear', num_workers=1, verbose=0, chunks=256)[source]#
Generates Landsat pixel angle files.
- Parameters:
angles_file (str) – The angles file.
ref_file (str) – A reference file.
out_dir (str) – The output directory.
sensor (str) – The sensor.
l57_angles_path (str) – The path to the Landsat 5 and 7 angles bin.
l8_angles_path (str) – The path to the Landsat 8 angles bin.
subsample (Optional[int]) – The sub-sample factor when calculating the angles.
resampling (Optional[str]) – The resampling method if
filenameis alist. Choices are [‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘nearest’].num_workers (Optional[int]) – The maximum number of concurrent workers.
verbose (Optional[int]) – The verbosity level.
chunks (Optional[int]) – The file chunk size. Default is 256.
- Return type:
AngleInfo- Returns:
zenith and azimuth angles as a
namedtupleof angle file names
- geowombat.radiometry.sentinel_pixel_angles(metadata, ref_file, nodata=-32768, resampling='bilinear', num_workers=1, resample_res=60.0, chunksize=None)[source]#
Generates Sentinel pixel angle files.
- Parameters:
metadata (str) – The metadata file.
ref_file (str) – A reference image to use for geo-information.
nodata (Optional[int or float]) – The ‘no data’ value.
num_workers (Optional[int]) – The maximum number of concurrent workers.
resample_res (Optional[float]) – The resolution to resample to.
resampling (str) –
chunksize (Tuple[int, int]) –
- Return type:
AngleInfo
References
https://www.sentinel-hub.com/faq/how-can-i-access-meta-data-information-sentinel-2-l2a marujore/sentinel_angle_bands
- Return type:
AngleInfo- Returns:
zenith and azimuth angles as a
namedtupleof angle file names- Parameters:
metadata (str | Path) –
ref_file (str | Path) –
nodata (float | int) –
resampling (str) –
num_workers (int) –
resample_res (float | int) –
chunksize (Tuple[int, int]) –
Functions#
|
Generates Landsat pixel angle files. |
|
Generates Sentinel pixel angle files. |
Classes#
|
A class for Bidirectional Reflectance Distribution Function (BRDF) normalization. |
|
A class for topographic normalization. |
A class for linear bandpass adjustments. |
|
A class for radiometric transformations. |
|
|
Methods |
|
A class to handle loading, downloading and interpolating of LUTs (look up tables) used by the 6S emulator. |
|
A class for masking bit-packed quality flags. |
|
QA bits. |
|
Sentinel-2 Scene Classification Layer (SCL) values. |
|
HLS Fmask quality bit definitions (uint8). |