geowombat.core package#

Submodules#

geowombat.core.api module#

geowombat.core.api.load(image_list, time_names, band_names, chunks=512, nodata=65535, in_range=None, out_range=None, data_slice=None, num_workers=1, src=None, scheduler='ray')[source]#

Loads data into memory using xarray.open_mfdataset() and ray. This function does not check data alignments and CRSs. It assumes each image in image_list has the same y and x dimensions and that the coordinates align.

The load function cannot be used if the dataclasses package was installed with pip.

Parameters:
  • image_list (list) – The list of image file paths.

  • time_names (list) – The list of image datetime objects.

  • band_names (list) – The list of bands to open.

  • chunks (Optional[int]) – The dask chunk size.

  • nodata (Optional[float | int]) – The ‘no data’ value.

  • in_range (Optional[tuple]) – The input (min, max) range. If not given, defaults to (0, 10000).

  • out_range (Optional[tuple]) – The output (min, max) range. If not given, defaults to (0, 1).

  • data_slice (Optional[tuple]) – The slice object to read, given as (time, bands, rows, columns).

  • num_workers (Optional[int]) – The number of threads.

  • scheduler (Optional[str]) – The distributed scheduler. Currently not implemented.

Returns:

Datetime list, array of (time x bands x rows x columns)

Return type:

list, numpy.ndarray

Example

>>> import datetime
>>> import geowombat as gw
>>>
>>> image_names = ['LT05_L1TP_227082_19990311_20161220_01_T1.nc',
>>>                'LT05_L1TP_227081_19990311_20161220_01_T1.nc',
>>>                'LT05_L1TP_227082_19990327_20161220_01_T1.nc']
>>>
>>> image_dates = [datetime.datetime(1999, 3, 11, 0, 0),
>>>                datetime.datetime(1999, 3, 11, 0, 0),
>>>                datetime.datetime(1999, 3, 27, 0, 0)]
>>>
>>> data_slice = (slice(0, None), slice(0, None), slice(0, 64), slice(0, 64))
>>>
>>> # Load data into memory
>>> dates, y = gw.load(image_names,
>>>                    image_dates,
>>>                    ['red', 'nir'],
>>>                    chunks=512,
>>>                    nodata=65535,
>>>                    data_slice=data_slice,
>>>                    num_workers=4)
class geowombat.core.api.open(filename, band_names=None, time_names=None, stack_dim='time', bounds=None, bounds_by='reference', resampling='nearest', persist_filenames=False, netcdf_vars=None, mosaic=False, overlap='max', nodata=None, scale_factor=None, offset=None, dtype=None, scale_data=False, num_workers=1, **kwargs)[source]#

Bases: object

Opens one or more raster files.

Parameters:
  • filename (str or list) – The file name, search string, or a list of files to open.

  • band_names (Optional[1d array-like]) – A list of band names if bounds or a window is given. Default is None.

  • time_names (Optional[1d array-like]) – A list of names to give the time dimension if bounds is given. Default is None.

  • stack_dim (Optional[str]) – The stack dimension. Choices are [‘time’, ‘band’].

  • bounds (Optional[1d array-like]) – A bounding box to subset to, given as [minx, miny, maxx, maxy]. Default is None.

  • bounds_by (Optional[str]) –

    How to concatenate the output extent if filename is a list and mosaic = False. Choices are [‘intersection’, ‘union’, ‘reference’].

    • reference: Use the bounds of the reference image. If a ref_image is not given, the first image in the filename list is used.

    • intersection: Use the intersection (i.e., minimum extent) of all the image bounds.

    • union: Use the union (i.e., maximum extent) of all the image bounds.

  • resampling (Optional[str]) – The resampling method if filename is a list. Choices are [‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘nearest’].

  • persist_filenames (Optional[bool]) – Whether to persist the filenames list with the xarray.DataArray attributes. By default, persist_filenames=False to avoid storing large file lists.

  • netcdf_vars (Optional[list]) – NetCDF variables to open as a band stack.

  • mosaic (Optional[bool]) – If filename is a list, whether to mosaic the arrays instead of stacking.

  • overlap (Optional[str]) – The keyword that determines how to handle overlapping data if filename is a list. Choices are [‘min’, ‘max’, ‘mean’].

  • nodata (Optional[float | int]) –

    A ‘no data’ value to set. Default is None. If nodata is None, the ‘no data’ value is set from the file metadata. If nodata is given, then the file ‘no data’ value is overridden. See docstring examples for use of nodata in geowombat.config.update.

    Note

    The geowombat.config.update overrides this argument. Thus, preference is always given in the following order:

    1. geowombat.config.update(nodata not None)

    2. open(nodata not None)

    3. file ‘no data’ value from metadata ‘_FillValue’ or ‘nodatavals’

  • scale_factor (Optional[float | int]) –

    A scale value to apply to the opened data. The same precedence rules described for nodata apply, i.e.:

    Note

    The geowombat.config.update overrides this argument. Thus, preference is always given in the following order:

    1. geowombat.config.update(scale_factor not None)

    2. open(scale_factor not None)

    3. file scale value from metadata ‘scales’

  • offset (Optional[float | int]) –

    An offset value to apply to the opened data. The same precedence rules described for nodata apply, i.e.:

    Note

    The geowombat.config.update overrides this argument. Thus, preference is always given in the following order:

    1. geowombat.config.update(offset not None)

    2. open(offset not None)

    3. file offset value from metadata ‘offsets’

  • dtype (Optional[str]) – A data type to force the output to. If not given, the data type is extracted from the file.

  • scale_data (Optional[bool]) –

    Whether to apply scaling to the opened data. Default is False. Scaled data are returned as:

    scaled = data * gain + offset

    See the arguments nodata, scale_factor, and offset for rules regarding how scaling is applied.

  • num_workers (Optional[int]) – The number of parallel workers for Dask if bounds or a window is given. Default is 1.

  • kwargs (Optional[dict]) – Keyword arguments passed to the file opener.

Returns:

xarray.DataArray or xarray.Dataset

Examples

>>> import geowombat as gw
>>>
>>> # Open an image
>>> with gw.open('image.tif') as ds:
>>>     print(ds)
>>>
>>> # Open a list of images, stacking along the 'time' dimension
>>> with gw.open(['image1.tif', 'image2.tif']) as ds:
>>>     print(ds)
>>>
>>> # Open all GeoTiffs in a directory, stack along the 'time' dimension
>>> with gw.open('*.tif') as ds:
>>>     print(ds)
>>>
>>> # Use a context manager to handle images of different sizes and projections
>>> with gw.config.update(ref_image='image1.tif'):
>>>     # Use 'time' names to stack and mosaic non-aligned images with identical dates
>>>     with gw.open(['image1.tif', 'image2.tif', 'image3.tif'],
>>>
>>>         # The first two images were acquired on the same date
>>>         #   and will be merged into a single time layer
>>>         time_names=['date1', 'date1', 'date2']) as ds:
>>>
>>>         print(ds)
>>>
>>> # Mosaic images across space using a reference
>>> #   image for the CRS and cell resolution
>>> with gw.config.update(ref_image='image1.tif'):
>>>     with gw.open(['image1.tif', 'image2.tif'], mosaic=True) as ds:
>>>         print(ds)
>>>
>>> # Mix configuration keywords
>>> with gw.config.update(ref_crs='image1.tif', ref_res='image1.tif', ref_bounds='image2.tif'):
>>>     # The ``bounds_by`` keyword overrides the extent bounds
>>>     with gw.open(['image1.tif', 'image2.tif'], bounds_by='union') as ds:
>>>         print(ds)
>>>
>>> # Resample an image to 10m x 10m cell size
>>> with gw.config.update(ref_crs=(10, 10)):
>>>     with gw.open('image.tif', resampling='cubic') as ds:
>>>         print(ds)
>>>
>>> # Open a list of images at a window slice
>>> from rasterio.windows import Window
>>> # Stack two images, opening band 3
>>> with gw.open(
>>>     ['image1.tif', 'image2.tif'],
>>>     band_names=['date1', 'date2'],
>>>     num_workers=8,
>>>     indexes=3,
>>>     window=Window(row_off=0, col_off=0, height=100, width=100),
>>>     dtype='float32'
>>> ) as ds:
>>>     print(ds)
>>>
>>> # Scale data upon opening, using the image metadata to get scales and offsets
>>> with gw.open('image.tif', scale_data=True) as ds:
>>>     print(ds)
>>>
>>> # Scale data upon opening, specifying scales and overriding metadata
>>> with gw.open('image.tif', scale_data=True, scale_factor=1e-4) as ds:
>>>     print(ds)
>>>
>>> # Scale data upon opening, specifying scales and overriding metadata
>>> with gw.config.update(scale_factor=1e-4):
>>>     with gw.open('image.tif', scale_data=True) as ds:
>>>         print(ds)
>>>
>>> # Open a NetCDF variable, specifying a NetCDF prefix and variable to open
>>> with gw.open('netcdf:image.nc:blue') as src:
>>>     print(src)
>>>
>>> # Open a NetCDF image without access to transforms by providing full file path
>>> # NOTE: This will be faster than the above method
>>> # as it uses ``xarray.open_dataset`` and bypasses CRS checks.
>>> # NOTE: The chunks must be provided by the user.
>>> # NOTE: Providing band names will ensure the correct order when reading from a NetCDF dataset.
>>> with gw.open(
>>>     'image.nc',
>>>     chunks={'band': -1, 'y': 256, 'x': 256},
>>>     band_names=['blue', 'green', 'red', 'nir', 'swir1', 'swir2'],
>>>     engine='h5netcdf'
>>> ) as src:
>>>     print(src)
>>>
>>> # Open multiple NetCDF variables as an array stack
>>> with gw.open('netcdf:image.nc', netcdf_vars=['blue', 'green', 'red']) as src:
>>>     print(src)

Methods

close

close()[source]#
geowombat.core.api.read(filename, band_names=None, time_names=None, bounds=None, chunks=256, num_workers=1, **kwargs)[source]#

Reads a window slice in-memory.

Parameters:
  • filename (str or list) – A file name or list of file names to read.

  • band_names (Optional[list]) – A list of names to give the output band dimension.

  • time_names (Optional[list]) – A list of names to give the time dimension.

  • bounds (Optional[1d array-like]) – A bounding box to subset to, given as [minx, miny, maxx, maxy] or [left, bottom, right, top].

  • chunks (Optional[tuple]) – The data chunk size.

  • num_workers (Optional[int]) – The number of parallel dask workers.

  • kwargs (Optional[dict]) – Keyword arguments to pass to rasterio.write.

Returns:

xarray.DataArray
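
Example

A minimal usage sketch; the file and band names are placeholders:

>>> from geowombat.core.api import read
>>>
>>> # Read the image into memory as an xarray.DataArray
>>> data = read('image.tif', band_names=['blue', 'green', 'red'], num_workers=4)
>>> print(data)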

geowombat.core.api.read_list(file_list, chunks, **kwargs)[source]#
class geowombat.core.api.series(filenames, time_names=None, band_names=None, transfer_lib='jax', crs=None, res=None, bounds=None, resampling='nearest', nodata=0, warp_mem_limit=256, num_threads=1, window_size=None, padding=None)[source]#

Bases: BaseSeries

A class for concurrent time series processing on a GPU.

Parameters:
  • filenames (list) – The list of filenames to open.

  • band_names (Optional[list]) – The band associated names.

  • transfer_lib (Optional[str]) – The library to transfer data to. Choices are [‘jax’, ‘keras’, ‘numpy’, ‘pytorch’, ‘tensorflow’].

  • crs (Optional[str]) – The coordinate reference system.

  • res (Optional[list | tuple]) – The cell resolution.

  • bounds (Optional[object]) – The coordinate bounds.

  • resampling (Optional[str]) – The resampling method.

  • nodata (Optional[float | int]) – The ‘no data’ value.

  • warp_mem_limit (Optional[int]) – The rasterio warping memory limit (in MB).

  • num_threads (Optional[int]) – The number of rasterio warping threads.

  • window_size (Optional[int | list | tuple]) – The concurrent processing window size (height, width) or -1 (i.e., entire array).

  • padding (Optional[list | tuple]) – Padding for each window. padding should be given as a tuple of (left pad, bottom pad, right pad, top pad). If padding is given, the returned list will contain a tuple of rasterio.windows.Window objects as (w1, w2), where w1 contains the normal window offsets and w2 contains the padded window offsets.

  • time_names (list) –

Requirement:

    # CUDA 11.1
    pip install --upgrade "jax[cuda111]" -f https://storage.googleapis.com/jax-releases/jax_releases.html

Attributes:
band_dict
blockxsize
blockysize
count
crs
height
nchunks
nodata
transform
width
Parameters:
  • filenames (list) –

  • time_names (list) –

  • band_names (list) –

  • transfer_lib (str) –

  • crs (str) –

  • res (list | tuple) –

  • bounds (BoundingBox | list | tuple) –

  • resampling (str) –

  • nodata (float | int) –

  • warp_mem_limit (int) –

  • num_threads (int) –

  • window_size (int | list | tuple) –

  • padding (list | tuple) –

Methods

apply(func, bands[, gain, offset, ...])

Applies a function concurrently over windows.

group_dates(data, image_dates, band_names)

Groups data by dates.

read(bands[, window, gain, offset, pool, ...])

Reads a window.

ndarray_to_darray

open

warp

apply(func, bands, gain=1.0, offset=0.0, processes=False, num_workers=1, monitor_progress=True, outfile=None, bigtiff='NO', kwargs={})[source]#

Applies a function concurrently over windows.

Parameters:
  • func (object | str | list | tuple) – The function to apply. If func is a string, choices are [‘cv’, ‘max’, ‘mean’, ‘min’].

  • bands (list | int) – The bands to read.

  • gain (Optional[float]) – A gain factor to apply.

  • offset (Optional[float | int]) – An offset factor to apply.

  • processes (Optional[bool]) – Whether to use process workers, otherwise use threads.

  • num_workers (Optional[int]) – The number of concurrent workers.

  • monitor_progress (Optional[bool]) – Whether to monitor progress with a tqdm bar.

  • outfile (Optional[Path | str]) – The output file.

  • bigtiff (Optional[str]) – Whether to create a BigTIFF file. Choices are [‘YES’, ‘NO’, ‘IF_NEEDED’, ‘IF_SAFER’]. Default is ‘NO’.

  • kwargs (Optional[dict]) – Keyword arguments passed to rasterio open profile.

Returns:

If outfile is None: Window, array, [datetime, …]

If outfile is not None: None, writes to outfile

Example

>>> import itertools
>>> import geowombat as gw
>>> import rasterio as rio
>>>
>>> # Import an image with 3 bands
>>> from geowombat.data import l8_224078_20200518
>>>
>>> # Create a custom class
>>> class TemporalMean(gw.TimeModule):
>>>
>>>     def __init__(self):
>>>         super(TemporalMean, self).__init__()
>>>
>>>     # The main function
>>>     def calculate(self, array):
>>>
>>>         sl1 = (slice(0, None), slice(self.band_dict['red'], self.band_dict['red']+1), slice(0, None), slice(0, None))
>>>         sl2 = (slice(0, None), slice(self.band_dict['green'], self.band_dict['green']+1), slice(0, None), slice(0, None))
>>>
>>>         vi = (array[sl1] - array[sl2]) / ((array[sl1] + array[sl2]) + 1e-9)
>>>
>>>         return vi.mean(axis=0).squeeze()
>>>
>>> with rio.open(l8_224078_20200518) as src:
>>>     res = src.res
>>>     bounds = src.bounds
>>>     nodata = 0
>>>
>>> # Open many files, each with 3 bands
>>> with gw.series([l8_224078_20200518]*100,
>>>                band_names=['blue', 'green', 'red'],
>>>                crs='epsg:32621',
>>>                res=res,
>>>                bounds=bounds,
>>>                nodata=nodata,
>>>                num_threads=4,
>>>                window_size=(1024, 1024)) as src:
>>>
>>>     src.apply(TemporalMean(),
>>>               bands=-1,             # open all bands
>>>               gain=0.0001,          # scale from [0,10000] -> [0,1]
>>>               processes=False,      # use threads
>>>               num_workers=4,        # use 4 concurrent threads, one per window
>>>               outfile='vi_mean.tif')
>>>
>>> # Import a single-band image
>>> from geowombat.data import l8_224078_20200518_B4
>>>
>>> # Open many files, each with 1 band
>>> with gw.series([l8_224078_20200518_B4]*100,
>>>                band_names=['red'],
>>>                crs='epsg:32621',
>>>                res=res,
>>>                bounds=bounds,
>>>                nodata=nodata,
>>>                num_threads=4,
>>>                window_size=(1024, 1024)) as src:
>>>
>>>     src.apply('mean',               # built-in function over single-band images
>>>               bands=1,              # open band 1 (the only band)
>>>               gain=0.0001,          # scale from [0,10000] -> [0,1]
>>>               num_workers=4,        # use 4 concurrent threads, one per window
>>>               outfile='red_mean.tif')
>>>
>>> with gw.series([l8_224078_20200518_B4]*100,
>>>                band_names=['red'],
>>>                crs='epsg:32621',
>>>                res=res,
>>>                bounds=bounds,
>>>                nodata=nodata,
>>>                num_threads=4,
>>>                window_size=(1024, 1024)) as src:
>>>
>>>     src.apply(['mean', 'max', 'cv'],    # calculate multiple statistics
>>>               bands=1,                  # open band 1 (the only band)
>>>               gain=0.0001,              # scale from [0,10000] -> [0,1]
>>>               num_workers=4,            # use 4 concurrent threads, one per window
>>>               outfile='stack_mean.tif')
read(bands, window=None, gain=1.0, offset=0.0, pool=None, num_workers=None, tqdm_obj=None)[source]#

Reads a window.

Return type:

Any

Parameters:
  • bands (int | list) –

  • window (Window | None) –

  • gain (float) –

  • offset (float | int) –

  • pool (Any | None) –

  • num_workers (int | None) –

  • tqdm_obj (Any | None) –
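
Example

A minimal sketch of reading a single window inside a series context; the file list and window size are placeholders:

>>> import geowombat as gw
>>> from rasterio.windows import Window
>>>
>>> with gw.series(['image1.tif', 'image2.tif'],
>>>                band_names=['red'],
>>>                window_size=(256, 256)) as src:
>>>
>>>     # Read band 1 for a single 256x256 window
>>>     data = src.read(1, window=Window(row_off=0, col_off=0, height=256, width=256))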

geowombat.core.base module#

class geowombat.core.base.PropertyMixin[source]#

Bases: object

Methods

check_sensor(data[, sensor, return_error])

Checks if a sensor name is provided.

check_sensor_band_names(data, sensor, band_names)

Checks if band names can be collected from a sensor's wavelength names.

static check_sensor(data, sensor=None, return_error=True)[source]#

Checks if a sensor name is provided.

static check_sensor_band_names(data, sensor, band_names)[source]#

Checks if band names can be collected from a sensor’s wavelength names.

geowombat.core.conversion module#

class geowombat.core.conversion.Converters[source]#

Bases: object

Methods

array_to_polygon(data[, mask, connectivity, ...])

Converts an xarray.DataArray to a geopandas.GeoDataFrame.

bounds_to_coords(bounds, dst_crs)

Converts bounds from longitude and latitude to native map coordinates.

coords_to_indices(x, y, transform)

Converts map coordinates to array indices.

dask_to_xarray(data, dask_data, band_names)

Converts a Dask array to an Xarray DataArray.

indices_to_coords(col_index, row_index, ...)

Converts array indices to map coordinates.

lonlat_to_xy(lon, lat, dst_crs)

Converts from longitude and latitude to native map coordinates.

ndarray_to_xarray(data, numpy_data, band_names)

Converts a NumPy array to an Xarray DataArray.

polygon_to_array(polygon[, col, data, ...])

Converts a polygon geometry to an xarray.DataArray.

polygons_to_points(data, df[, frac, ...])

Converts polygons to points.

xarray_to_xdataset(data_array, band_names, ...)

Converts an Xarray DataArray to a Xarray Dataset.

xy_to_lonlat(x, y, dst_crs)

Converts from native map coordinates to longitude and latitude.

prepare_points

static array_to_polygon(data, mask=None, connectivity=4, num_workers=1)[source]#

Converts an xarray.DataArray to a geopandas.GeoDataFrame.

Parameters:
  • data (DataArray) – The xarray.DataArray to convert.

  • mask (Optional[str, numpy ndarray, or rasterio Band object]) – Must evaluate to bool (rasterio.bool_ or rasterio.uint8). Values of False or 0 will be excluded from feature generation. Note well that this is the inverse sense from Numpy’s, where a mask value of True indicates invalid data in an array. If source is a Numpy masked array and mask is None, the source’s mask will be inverted and used in place of mask. If mask is equal to ‘source’, then data is used as the mask.

  • connectivity (Optional[int]) – Use 4 or 8 pixel connectivity for grouping pixels into features.

  • num_workers (Optional[int]) – The number of parallel workers to send to dask.compute().

Return type:

GeoDataFrame

Returns:

geopandas.GeoDataFrame

Example

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as src:
>>>
>>>     # Convert the input image to a GeoDataFrame
>>>     df = gw.array_to_polygon(
>>>         src,
>>>         mask='source',
>>>         num_workers=8
>>>     )
bounds_to_coords(bounds, dst_crs)[source]#

Converts bounds from longitude and latitude to native map coordinates.

Parameters:
  • bounds (tuple | rasterio.coords.BoundingBox) – The lat/lon bounds to transform.

  • dst_crs (str, object, or DataArray) – The CRS to transform to. It can be provided as a string, a CRS instance (e.g., pyproj.crs.CRS), or a geowombat.DataArray.

Returns:

(left, bottom, right, top)

Return type:

tuple
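
Example

A minimal sketch, assuming the converter is importable from geowombat.core like the other converters in this class; the lat/lon bounds are placeholders:

>>> import geowombat as gw
>>> # Assumes bounds_to_coords is exposed at the geowombat.core level
>>> from geowombat.core import bounds_to_coords
>>>
>>> with gw.open('image.tif') as src:
>>>     # Transform lat/lon bounds to the CRS of the opened image
>>>     left, bottom, right, top = bounds_to_coords((-55.6, -25.5, -55.4, -25.3), src)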

static coords_to_indices(x, y, transform)[source]#

Converts map coordinates to array indices.

Parameters:
  • x (float or 1d array) – The x coordinates.

  • y (float or 1d array) – The y coordinates.

  • transform (object) – The affine transform.

Returns:

(col_index, row_index)

Return type:

tuple

Example

>>> import geowombat as gw
>>> from geowombat.core import coords_to_indices
>>>
>>> with gw.open('image.tif') as src:
>>>     j, i = coords_to_indices(x, y, src)
static dask_to_xarray(data, dask_data, band_names)[source]#

Converts a Dask array to an Xarray DataArray.

Parameters:
  • data (DataArray) – The DataArray with attribute information.

  • dask_data (Dask Array) – The Dask array to convert.

  • band_names (1d array-like) – The output band names.

Return type:

DataArray

Returns:

xarray.DataArray
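
Example

A minimal sketch, assuming the converter is importable from geowombat.core like coords_to_indices above:

>>> import dask.array as da
>>> import geowombat as gw
>>> # Assumes dask_to_xarray is exposed at the geowombat.core level
>>> from geowombat.core import dask_to_xarray
>>>
>>> with gw.open('image.tif') as src:
>>>     # Wrap a single-band Dask array with the coordinates and attributes of ``src``
>>>     dask_data = da.zeros((1, src.gw.nrows, src.gw.ncols), chunks=(1, 512, 512))
>>>     new_array = dask_to_xarray(src, dask_data, ['mask'])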

static indices_to_coords(col_index, row_index, transform)[source]#

Converts array indices to map coordinates.

Parameters:
  • col_index (float or 1d array) – The column index.

  • row_index (float or 1d array) – The row index.

  • transform (Affine, DataArray, or tuple) – The affine transform.

Returns:

(x, y)

Return type:

tuple

Example

>>> import geowombat as gw
>>> from geowombat.core import indices_to_coords
>>>
>>> with gw.open('image.tif') as src:
>>>     x, y = indices_to_coords(j, i, src)
static lonlat_to_xy(lon, lat, dst_crs)[source]#

Converts from longitude and latitude to native map coordinates.

Parameters:
  • lon (float) – The longitude to convert.

  • lat (float) – The latitude to convert.

  • dst_crs (str, object, or DataArray) – The CRS to transform to. It can be provided as a string, a CRS instance (e.g., pyproj.crs.CRS), or a geowombat.DataArray.

Returns:

(x, y)

Return type:

tuple

Example

>>> import geowombat as gw
>>> from geowombat.core import lonlat_to_xy
>>>
>>> lon, lat = -55.56822206, -25.46214220
>>>
>>> with gw.open('image.tif') as src:
>>>     x, y = lonlat_to_xy(lon, lat, src)
static ndarray_to_xarray(data, numpy_data, band_names, row_chunks=None, col_chunks=None, y=None, x=None, attrs=None)[source]#

Converts a NumPy array to an Xarray DataArray.

Parameters:
  • data (DataArray) – The DataArray with attribute information.

  • numpy_data (ndarray) – The ndarray to convert.

  • band_names (1d array-like) – The output band names.

  • row_chunks (Optional[int]) – Overrides data.gw.row_chunks.

  • col_chunks (Optional[int]) – Overrides data.gw.col_chunks.

  • y (Optional[1d array]) – Overrides data.y.

  • x (Optional[1d array]) – Overrides data.x.

  • attrs (Optional[dict]) – Overrides data.attrs.

Return type:

DataArray

Returns:

xarray.DataArray
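
Example

A minimal sketch, assuming the converter is importable from geowombat.core like the other converters in this class:

>>> import numpy as np
>>> import geowombat as gw
>>> # Assumes ndarray_to_xarray is exposed at the geowombat.core level
>>> from geowombat.core import ndarray_to_xarray
>>>
>>> with gw.open('image.tif') as src:
>>>     # Wrap a single-band NumPy result with the coordinates and attributes of ``src``
>>>     result = np.zeros((1, src.gw.nrows, src.gw.ncols), dtype='float64')
>>>     new_array = ndarray_to_xarray(src, result, ['result'])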

polygon_to_array(polygon, col=None, data=None, cellx=None, celly=None, band_name=None, row_chunks=512, col_chunks=512, src_res=None, fill=0, default_value=1, all_touched=True, dtype='uint8', sindex=None, tap=False, bounds_by='intersection')[source]#

Converts a polygon geometry to an xarray.DataArray.

Parameters:
  • polygon (GeoDataFrame | str) – The geopandas.DataFrame or file with polygon geometry.

  • col (Optional[str]) – The column in polygon you want to assign values from. If not set, creates a binary raster.

  • data (Optional[DataArray]) – An xarray.DataArray to use as a reference for rasterizing.

  • cellx (Optional[float]) – The output cell x size.

  • celly (Optional[float]) – The output cell y size.

  • band_name (Optional[list]) – The xarray.DataArray band name.

  • row_chunks (Optional[int]) – The dask row chunk size.

  • col_chunks (Optional[int]) – The dask column chunk size.

  • src_res (Optional[tuple]) – A source resolution to align to.

  • fill (Optional[int]) – Used as fill value for all areas not covered by input geometries to rasterio.features.rasterize.

  • default_value (Optional[int]) – Used as value for all geometries, if not provided in shapes to rasterio.features.rasterize.

  • all_touched (Optional[bool]) – If True, all pixels touched by geometries will be burned in. If False, only pixels whose center is within the polygon or that are selected by Bresenham’s line algorithm will be burned in. The all_touched value for rasterio.features.rasterize().

  • dtype (Optional[str | numpy data type]) – The output data type for rasterio.features.rasterize().

  • sindex (Optional[object]) – An instanced of geopandas.GeoDataFrame.sindex.

  • tap (Optional[bool]) – Whether to target align pixels.

  • bounds_by (Optional[str]) –

    How to concatenate the output extent. Choices are [‘intersection’, ‘union’, ‘reference’].

    • reference: Use the bounds of the reference image

    • intersection: Use the intersection (i.e., minimum extent) of all the image bounds

    • union: Use the union (i.e., maximum extent) of all the image bounds

  • src_res (Sequence[float] | None) –

Return type:

DataArray

Returns:

xarray.DataArray

Example

>>> import geowombat as gw
>>> import geopandas as gpd
>>>
>>> df = gpd.read_file('polygons.gpkg')
>>>
>>> # 100x100 cell size
>>> data = gw.polygon_to_array(df, cellx=100.0, celly=100.0)
>>>
>>> # Align to an existing image
>>> with gw.open('image.tif') as src:
>>>     data = gw.polygon_to_array(df, data=src)
static polygons_to_points(data, df, frac=1.0, min_frac_area=None, all_touched=False, id_column='id', n_jobs=1, **kwargs)[source]#

Converts polygons to points.

Parameters:
  • data (DataArray or Dataset) – The xarray.DataArray or xarray.Dataset.

  • df (GeoDataFrame) – The geopandas.GeoDataFrame containing the geometry to rasterize.

  • frac (Optional[float]) – A fractional subset of points to extract in each feature.

  • min_frac_area (Optional[int | float]) – A minimum polygon area to use frac. Otherwise, use all samples within a polygon.

  • all_touched (Optional[bool]) – The all_touched argument is passed to rasterio.features.rasterize.

  • id_column (Optional[str]) – The ‘id’ column.

  • n_jobs (Optional[int]) – The number of features to rasterize in parallel.

  • kwargs (Optional[dict]) – Keyword arguments passed to multiprocessing.Pool().imap.

Returns:

geopandas.GeoDataFrame
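
Example

A minimal sketch, assuming the method is exposed as geowombat.polygons_to_points in the same way as geowombat.polygon_to_array above; the file names are placeholders:

>>> import geowombat as gw
>>> import geopandas as gpd
>>>
>>> df = gpd.read_file('polygons.gpkg')
>>>
>>> with gw.open('image.tif') as src:
>>>     # Sample 10% of the pixel centers within each polygon
>>>     # (assumes polygons_to_points is exposed at the geowombat level)
>>>     points = gw.polygons_to_points(src, df, frac=0.1, n_jobs=4)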

prepare_points(data, aoi, frac=1.0, min_frac_area=None, all_touched=False, id_column='id', mask=None, n_jobs=8, verbose=0, **kwargs)[source]#
static xarray_to_xdataset(data_array, band_names, time_names, ycoords=None, xcoords=None, attrs=None)[source]#

Converts an Xarray DataArray to a Xarray Dataset.

Parameters:
  • data_array (DataArray) –

  • band_names (list) –

  • time_names (list) –

  • ycoords (1d array-like) –

  • xcoords (1d array-like) –

  • attrs (dict) –

Return type:

Dataset

Returns:

xarray.Dataset

static xy_to_lonlat(x, y, dst_crs)[source]#

Converts from native map coordinates to longitude and latitude.

Parameters:
  • x (float) – The x coordinate to convert.

  • y (float) – The y coordinate to convert.

  • dst_crs (str, object, or DataArray) – The CRS to transform to. It can be provided as a string, a CRS instance (e.g., pyproj.crs.CRS), or a geowombat.DataArray.

Returns:

(longitude, latitude)

Return type:

tuple

Example

>>> import geowombat as gw
>>> from geowombat.core import xy_to_lonlat
>>>
>>> x, y = 643944.6956113526, 7183104.984484519
>>>
>>> with gw.open('image.tif') as src:
>>>     lon, lat = xy_to_lonlat(x, y, src)

geowombat.core.geoxarray module#

class geowombat.core.geoxarray.GeoWombatAccessor(xarray_obj)[source]#

Bases: _UpdateConfig, DataProperties

An accessor for an xarray.DataArray. This class is typically not accessed directly, but rather through a call to geowombat.open.

  • An xarray.DataArray object will have a gw method.

  • To access GeoWombat methods, use xarray.DataArray.gw.

Attributes:
affine

Get the affine transform object.

altitude

Get satellite altitudes (in km)

array_is_dask

Get whether the array is a Dask array.

avail_sensors

Get supported sensors.

band_chunks

Get the band chunk size.

bottom

Get the array bounding box bottom coordinate.

bounds

Get the array bounding box (left, bottom, right, top)

bounds_as_namedtuple

Get the array bounding box as a rasterio.coords.BoundingBox

cellx

Get the cell size in the x direction.

cellxh

Get the half width of the cell size in the x direction.

celly

Get the cell size in the y direction.

cellyh

Get the half width of the cell size in the y direction.

central_um

Get a dictionary of central wavelengths (in micrometers)

chunk_grid

Get the image chunk grid.

col_chunks

Get the column chunk size.

crs_to_pyproj

Get the CRS as a pyproj.CRS object.

data_are_separate

Checks whether the data are loaded separately.

data_are_stacked

Checks whether the data are stacked.

dtype

Get the data type of the DataArray.

filenames

Gets the data filenames.

footprint_grid

Get the image footprint grid.

geodataframe

Get a geopandas.GeoDataFrame of the array bounds.

geometry

Get the polygon geometry of the array bounding box.

has_band

Check whether the DataArray has a band attribute.

has_band_coord

Check whether the DataArray has a band coordinate.

has_band_dim

Check whether the DataArray has a band dimension.

has_time

Check whether the DataArray has a time attribute.

has_time_coord

Check whether the DataArray has a time coordinate.

has_time_dim

Check whether the DataArray has a time dimension.

left

Get the array bounding box left coordinate.

meta

Get the array metadata.

nbands

Get the number of array bands.

ncols

Get the number of array columns.

ndims

Get the number of array dimensions.

nodataval

Get the ‘no data’ value from the attributes.

nrows

Get the number of array rows.

ntime

Get the number of time dimensions.

offsetval

Get the offset value.

pydatetime

Get Python datetime objects from the time dimension.

right

Get the array bounding box right coordinate.

row_chunks

Get the row chunk size.

scaleval

Get the scale factor value.

sensor_names

Get sensor full names.

time_chunks

Get the time chunk size.

top

Get the array bounding box top coordinate.

transform

Get the data transform (cell x, 0, left, 0, cell y, top)

unary_union

Get a representation of the union of the image bounds.

wavelengths

Get a dictionary of sensor wavelengths.

Methods

apply(filename, user_func[, n_jobs])

Applies a user function to an Xarray Dataset or DataArray and writes to file.

assign_nodata_attrs(nodata)

Assigns 'no data' attributes.

avi([nodata, mask, sensor, scale_factor])

Calculates the advanced vegetation index

band_mask(valid_bands[, src_nodata, ...])

Creates a mask from band nonzeros.

bounds_overlay(bounds[, how])

Checks whether the bounds overlay the image bounds.

calc_area(values[, op, units, row_chunks, ...])

Calculates the area of data values.

check_chunksize(chunksize, array_size)

Asserts that the chunk size fits within intervals of 16 and is smaller than the array.

clip(df[, query, mask_data, expand_by])

Clips a DataArray by vector polygon geometry.

clip_by_polygon(df[, query, mask_data, ...])

Clips a DataArray by vector polygon geometry.

compare(op, b[, return_binary])

Comparison operation.

compute(**kwargs)

Computes data.

evi([nodata, mask, sensor, scale_factor])

Calculates the enhanced vegetation index

evi2([nodata, mask, sensor, scale_factor])

Calculates the two-band modified enhanced vegetation index

extract(aoi[, bands, time_names, ...])

Extracts data within an area or points of interest.

gcvi([nodata, mask, sensor, scale_factor])

Calculates the green chlorophyll vegetation index

imshow([mask, nodata, flip, text_color, rot])

Shows an image on a plot.

kndvi([nodata, mask, sensor, scale_factor])

Calculates the kernel normalized difference vegetation index

mask(df[, query, keep])

Masks a DataArray.

mask_nodata()

Masks 'no data' values with nans.

match_data(data, band_names)

Coerces the xarray.DataArray to match another xarray.DataArray.

moving([stat, perc, w, nodata, weights])

Applies a moving window function to the DataArray.

n_windows([row_chunks, col_chunks])

Calculates the number of windows in a row/column iteration.

nbr([nodata, mask, sensor, scale_factor])

Calculates the normalized burn ratio

ndvi([nodata, mask, sensor, scale_factor])

Calculates the normalized difference vegetation index

norm_brdf(solar_za, solar_az, sensor_za, ...)

Applies Bidirectional Reflectance Distribution Function (BRDF) normalization.

norm_diff(b1, b2[, nodata, mask, sensor, ...])

Calculates the normalized difference band ratio.

read(band, **kwargs)

Reads data for a band or bands.

recode(polygon, to_replace[, num_workers])

Recodes a DataArray with polygon mappings.

replace(to_replace)

Replace values given in to_replace with value.

sample([method, band, n, strata, spacing, ...])

Generates samples from a raster.

save(filename[, mode, nodata, overwrite, ...])

Saves a DataArray to raster using rasterio/dask.

set_nodata([src_nodata, dst_nodata, ...])

Sets 'no data' values and applies scaling to an xarray.DataArray.

subset([left, top, right, bottom, rows, ...])

Subsets a DataArray.

tasseled_cap([nodata, sensor, scale_factor])

Applies a tasseled cap transformation

to_netcdf(filename, *args, **kwargs)

Writes an Xarray DataArray to a NetCDF file.

to_polygon([mask, connectivity])

Converts a dask array to a GeoDataFrame

to_raster(filename[, readxsize, readysize, ...])

Writes an Xarray DataArray to a raster file.

to_vector(filename[, mask, connectivity])

Writes an Xarray DataArray to a vector file.

to_vrt(filename[, overwrite, resampling, ...])

Writes a file to a VRT file.

transform_crs([dst_crs, dst_res, dst_width, ...])

Transforms an xarray.DataArray to a new coordinate reference system.

wi([nodata, mask, sensor, scale_factor])

Calculates the woody vegetation index

windows([row_chunks, col_chunks, ...])

Generates windows for a row/column iteration.

apply(filename, user_func, n_jobs=1, **kwargs)[source]#

Applies a user function to an Xarray Dataset or DataArray and writes to file.

Parameters:
  • filename (str | Path) – The output file name to write to.

  • user_func (func) – The user function to apply.

  • n_jobs (Optional[int]) – The number of parallel jobs for the cluster.

  • kwargs (Optional[dict]) – Keyword arguments passed to to_raster().

Example

>>> import geowombat as gw
>>>
>>> def user_func(ds_):
>>>     return ds_.max(axis=0)
>>>
>>> with gw.open('image.tif', chunks=512) as ds:
>>>     ds.gw.apply(
>>>         'output.tif',
>>>         user_func,
>>>         n_jobs=8,
>>>         overwrite=True,
>>>         blockxsize=512,
>>>         blockysize=512
>>>     )
assign_nodata_attrs(nodata)[source]#

Assigns ‘no data’ attributes.

Parameters:

nodata (float | int) – The ‘no data’ value to assign.

Return type:

DataArray

Returns:

xarray.DataArray
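
Example

A minimal sketch; the file name and ‘no data’ value are placeholders:

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as src:
>>>     # Attach a 'no data' value of 0 to the DataArray attributes
>>>     src = src.gw.assign_nodata_attrs(0)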

avi(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#

Calculates the advanced vegetation index

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor.

  • scale_factor (Optional[float]) – A scale factor to apply to the data.

Return type:

DataArray

Equation:

\[AVI = {(NIR \times (1.0 - red) \times (NIR - red))}^{0.3334}\]
Returns:

Data range: 0 to 1

Return type:

xarray.DataArray

Parameters:
  • nodata (float | int | None) –

  • mask (bool) –

  • sensor (str | None) –

  • scale_factor (float | None) –

band_mask(valid_bands, src_nodata=None, dst_clear_val=0, dst_mask_val=1)[source]#

Creates a mask from band nonzeros.

Parameters:
  • valid_bands (list) – The bands considered valid.

  • src_nodata (Optional[float | int]) – The source ‘no data’ value.

  • dst_clear_val (Optional[int]) – The destination clear value.

  • dst_mask_val (Optional[int]) – The destination mask value.

Return type:

DataArray

Returns:

xarray.DataArray
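
Example

A minimal sketch; the band names are placeholders and should match the opened bands:

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif', band_names=['blue', 'green', 'red']) as src:
>>>     # Build a mask from the valid (nonzero) bands, treating 0 as the source 'no data' value
>>>     mask = src.gw.band_mask(['blue', 'green', 'red'], src_nodata=0)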

bounds_overlay(bounds, how='intersects')[source]#

Checks whether the bounds overlay the image bounds.

Parameters:
  • bounds (tuple | rasterio.coords.BoundingBox | shapely.geometry) – The bounds to check. If given as a tuple, the order should be (left, bottom, right, top).

  • how (Optional[str]) – Choices are any shapely.geometry binary predicates.

Return type:

bool

Returns:

bool

Example

>>> import geowombat as gw
>>>
>>> bounds = (left, bottom, right, top)
>>>
>>> with gw.open('image.tif') as src:
>>>     intersects = src.gw.bounds_overlay(bounds)
>>>
>>> from rasterio.coords import BoundingBox
>>>
>>> bounds = BoundingBox(left, bottom, right, top)
>>>
>>> with gw.open('image.tif') as src:
>>>     contains = src.gw.bounds_overlay(bounds, how='contains')
calc_area(values, op='eq', units='km2', row_chunks=None, col_chunks=None, n_workers=1, n_threads=1, scheduler='threads', n_chunks=100)[source]#

Calculates the area of data values.

Parameters:
  • values (list) – A list of values.

  • op (Optional[str]) – The value sign. Choices are [‘gt’, ‘ge’, ‘lt’, ‘le’, ‘eq’].

  • units (Optional[str]) – The units to return. Choices are [‘km2’, ‘ha’].

  • row_chunks (Optional[int]) – The row chunk size to process in parallel.

  • col_chunks (Optional[int]) – The column chunk size to process in parallel.

  • n_workers (Optional[int]) – The number of parallel workers for scheduler.

  • n_threads (Optional[int]) – The number of parallel threads for dask.compute().

  • scheduler (Optional[str]) –

    The parallel task scheduler to use. Choices are [‘processes’, ‘threads’, ‘mpool’].

    • mpool: process pool of workers using multiprocessing.Pool

    • processes: process pool of workers using concurrent.futures

    • threads: thread pool of workers using concurrent.futures

  • n_chunks (Optional[int]) – The chunk size of windows. If not given, equal to n_workers x 50.

Return type:

DataFrame

Returns:

pandas.DataFrame

Example

>>> import geowombat as gw
>>>
>>> # Read a land cover image with 512x512 chunks
>>> with gw.open('land_cover.tif', chunks=512) as src:
>>>
>>>     df = src.gw.calc_area(
>>>         [1, 2, 5],        # calculate the area of classes 1, 2, and 5
>>>         units='km2',      # return area in kilometers squared
>>>         n_workers=4,
>>>         row_chunks=1024,  # iterate over larger chunks to use 512 chunks in parallel
>>>         col_chunks=1024
>>>     )
check_chunksize(chunksize, array_size)[source]#

Asserts that the chunk size fits within intervals of 16 and is smaller than the array.

Parameters:
  • chunksize (int) – The chunk size to check.

  • array_size (int) – The array dimension size to check against.

Return type:

int

Returns:

int
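
Example

A minimal sketch; the requested chunk size of 300 is illustrative:

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif', chunks=512) as src:
>>>     # Adjust a requested chunk size against the column dimension
>>>     chunksize = src.gw.check_chunksize(300, src.gw.ncols)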

clip(df, query=None, mask_data=False, expand_by=0)[source]#

Clips a DataArray by vector polygon geometry.

Deprecated since version 2.1.7: Use xarray.DataArray.gw.clip_by_polygon().

Parameters:
  • df (GeoDataFrame) – The geopandas.GeoDataFrame to clip to.

  • query (Optional[str]) – A query to apply to df.

  • mask_data (Optional[bool]) – Whether to mask values outside of the df geometry envelope.

  • expand_by (Optional[int]) – Expand the clip array bounds by expand_by pixels on each side.

Returns:

xarray.DataArray

clip_by_polygon(df, query=None, mask_data=False, expand_by=0)[source]#

Clips a DataArray by vector polygon geometry.

Parameters:
  • df (GeoDataFrame) – The geopandas.GeoDataFrame to clip to.

  • query (Optional[str]) – A query to apply to df.

  • mask_data (Optional[bool]) – Whether to mask values outside of the df geometry envelope.

  • expand_by (Optional[int]) – Expand the clip array bounds by expand_by pixels on each side.

Return type:

DataArray

Returns:

xarray.DataArray

Examples

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as ds:
>>>     ds = ds.gw.clip_by_polygon(df, query="Id == 1")
compare(op, b, return_binary=False)[source]#

Comparison operation.

Parameters:
  • op (str) – The comparison operation.

  • b (float | int) – The value to compare to.

  • return_binary (Optional[bool]) – Whether to return a binary (1 or 0) array.

Returns:

Valid data where op meets criteria b, otherwise nans.

Return type:

xarray.DataArray

Example

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as src:
>>>     # Keep values less than 10; values >= 10 become nans
>>>     thresh = src.gw.compare(op='lt', b=10)
compute(**kwargs)[source]#

Computes data.

Parameters:

kwargs (Optional[dict]) – Keyword arguments to pass to dask.compute().

Return type:

ndarray

Returns:

numpy.ndarray
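
Example

A minimal sketch; the file name is a placeholder:

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif', chunks=512) as src:
>>>     # Evaluate the lazy Dask-backed array and return a NumPy array
>>>     values = src.gw.compute(num_workers=4)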

property data_are_separate: bool#

Checks whether the data are loaded separately.

Returns:

bool

property data_are_stacked: bool#

Checks whether the data are stacked.

Returns:

bool

evi(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#

Calculates the enhanced vegetation index

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor.

  • scale_factor (Optional[float]) – A scale factor to apply to the data.

Return type:

DataArray

Equation:

\[EVI = 2.5 \times \frac{NIR - red}{NIR + 6 \times red - 7.5 \times blue + 1}\]
Returns:

Data range: 0 to 1

Return type:

xarray.DataArray

Parameters:
  • nodata (float | int | None) –

  • mask (bool) –

  • sensor (str | None) –

  • scale_factor (float | None) –

evi2(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#

Calculates the two-band modified enhanced vegetation index

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor.

  • scale_factor (Optional[float]) – A scale factor to apply to the data.

Return type:

DataArray

Equation:

\[EVI2 = 2.5 \times \frac{NIR - red}{NIR + 1 + 2.4 \times red}\]
Returns:

Data range: 0 to 1

Return type:

xarray.DataArray

Parameters:
  • nodata (float | int | None) –

  • mask (bool) –

  • sensor (str | None) –

  • scale_factor (float | None) –

extract(aoi, bands=None, time_names=None, band_names=None, frac=1.0, min_frac_area=None, all_touched=False, id_column='id', time_format='%Y%m%d', mask=None, n_jobs=8, verbose=0, n_workers=1, n_threads=-1, use_client=False, address=None, total_memory=24, processes=False, pool_kwargs=None, **kwargs)[source]#

Extracts data within an area or points of interest. Projections do not need to match, as they are handled ‘on-the-fly’.

Parameters:
  • aoi (str or GeoDataFrame) – A file or geopandas.GeoDataFrame to extract data from.

  • bands (Optional[int or 1d array-like]) – A band or list of bands to extract. If not given, all bands are used. Bands should be GDAL-indexed (i.e., the first band is 1, not 0).

  • band_names (Optional[list]) – A list of band names. Length should be the same as bands.

  • time_names (Optional[list]) – A list of time names.

  • frac (Optional[float]) – A fractional subset of points to extract in each polygon feature.

  • min_frac_area (Optional[int | float]) – A minimum polygon area to use frac. Otherwise, use all samples within a polygon.

  • all_touched (Optional[bool]) – The all_touched argument is passed to rasterio.features.rasterize().

  • id_column (Optional[str]) – The id column name.

  • time_format (Optional[str]) – The datetime conversion format if time_names are datetime objects.

  • mask (Optional[GeoDataFrame or Shapely Polygon]) – A shapely.geometry.Polygon mask to subset to.

  • n_jobs (Optional[int]) – The number of features to rasterize in parallel.

  • verbose (Optional[int]) – The verbosity level.

  • n_workers (Optional[int]) – The number of process workers. Only applies when use_client = True.

  • n_threads (Optional[int]) – The number of thread workers. Only applies when use_client = True.

  • use_client (Optional[bool]) – Whether to use a dask client.

  • address (Optional[str]) – A cluster address to pass to client. Only used when use_client = True.

  • total_memory (Optional[int]) – The total memory (in GB) required when use_client = True.

  • processes (Optional[bool]) – Whether to use process workers with the dask.distributed client. Only applies when use_client = True.

  • pool_kwargs (Optional[dict]) – Keyword arguments passed to multiprocessing.Pool().imap().

  • kwargs (Optional[dict]) – Keyword arguments passed to dask.compute().

Return type:

GeoDataFrame

Returns:

geopandas.GeoDataFrame

Examples

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as src:
>>>     df = src.gw.extract('poly.gpkg')
>>>
>>> # On a cluster
>>> # Use a local cluster
>>> with gw.open('image.tif') as src:
>>>     df = src.gw.extract('poly.gpkg', use_client=True, n_threads=16)
>>>
>>> # Specify the client address with a local cluster
>>> with LocalCluster(
>>>     n_workers=1,
>>>     threads_per_worker=8,
>>>     scheduler_port=0,
>>>     processes=False,
>>>     memory_limit='4GB'
>>> ) as cluster:
>>>
>>>     with gw.open('image.tif') as src:
>>>         df = src.gw.extract(
>>>             'poly.gpkg',
>>>             use_client=True,
>>>             address=cluster
>>>         )
property filenames: Sequence[str | Path]#

Gets the data filenames.

Returns:

list

gcvi(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#

Calculates the green chlorophyll vegetation index

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor.

  • scale_factor (Optional[float]) – A scale factor to apply to the data.

Return type:

DataArray

Equation:

\[GCVI = \frac{NIR}{green} - 1\]
Returns:

Data range: -1 to 1

Return type:

xarray.DataArray

Parameters:
  • nodata (float | int | None) –

  • mask (bool) –

  • sensor (str | None) –

  • scale_factor (float | None) –

imshow(mask=False, nodata=0, flip=False, text_color='black', rot=30, **kwargs)[source]#

Shows an image on a plot.

Parameters:
  • mask (Optional[bool]) – Whether to mask ‘no data’ values (given by nodata).

  • nodata (Optional[int or float]) – The ‘no data’ value.

  • flip (Optional[bool]) – Whether to flip an RGB array’s band order.

  • text_color (Optional[str]) – The text color.

  • rot (Optional[int]) – The degree rotation for the x-axis tick labels.

  • kwargs (Optional[dict]) – Keyword arguments passed to xarray.plot.imshow.

Return type:

None

Returns:

None

Examples

>>> with gw.open('image.tif') as ds:
>>>     ds.gw.imshow(band_names=['red', 'green', 'blue'], mask=True, vmin=0.1, vmax=0.9, robust=True)
kndvi(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#

Calculates the kernel normalized difference vegetation index

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor.

  • scale_factor (Optional[float]) – A scale factor to apply to the data.

Return type:

DataArray

Equation:

\[kNDVI = tanh({NDVI}^2)\]
Returns:

Data range: -1 to 1

Return type:

xarray.DataArray

Parameters:
  • nodata (float | int | None) –

  • mask (bool) –

  • sensor (str | None) –

  • scale_factor (float | None) –

mask(df, query=None, keep='in')[source]#

Masks a DataArray.

Parameters:
  • df (GeoDataFrame or str) – The geopandas.GeoDataFrame or filename to use for masking.

  • query (Optional[str]) – A query to apply to df.

  • keep (Optional[str]) – If keep = ‘in’, mask values outside of the geometry (keep inside). Otherwise, if keep = ‘out’, mask values inside (keep outside).

Return type:

DataArray

Returns:

xarray.DataArray
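
Example

A minimal sketch; the polygon file name is a placeholder:

>>> import geowombat as gw
>>> import geopandas as gpd
>>>
>>> df = gpd.read_file('poly.gpkg')
>>>
>>> with gw.open('image.tif') as src:
>>>     # Keep values inside the polygon geometry and mask everything else
>>>     masked = src.gw.mask(df, keep='in')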

mask_nodata()[source]#

Masks ‘no data’ values with nans.

Return type:

DataArray

Returns:

xarray.DataArray
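
Example

A minimal sketch; the ‘no data’ value of 0 is a placeholder:

>>> import geowombat as gw
>>>
>>> with gw.config.update(nodata=0):
>>>     with gw.open('image.tif') as src:
>>>         # Replace 'no data' values with nans
>>>         src = src.gw.mask_nodata()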

match_data(data, band_names)[source]#

Coerces the xarray.DataArray to match another xarray.DataArray.

Parameters:
  • data (DataArray) – The xarray.DataArray to match to.

  • band_names (1d array-like) – The output band names.

Return type:

DataArray

Returns:

xarray.DataArray

Example

>>> import geowombat as gw
>>> import xarray as xr
>>>
>>> other_array = xr.DataArray()
>>>
>>> with gw.open('image.tif') as src:
>>>     new_array = other_array.gw.match_data(src, ['bd1'])
moving(stat='mean', perc=50, w=3, nodata=None, weights=False)[source]#

Applies a moving window function to the DataArray.

Parameters:
  • stat (Optional[str]) – The statistic to compute. Choices are [‘mean’, ‘std’, ‘var’, ‘min’, ‘max’, ‘perc’].

  • perc (Optional[int]) – The percentile to return if stat = ‘perc’.

  • w (Optional[int]) – The moving window size (in pixels).

  • nodata (Optional[int or float]) – A ‘no data’ value to ignore.

  • weights (Optional[bool]) – Whether to weight values by distance from window center.

Return type:

DataArray

Returns:

xarray.DataArray

Examples

>>> import geowombat as gw
>>>
>>> # Calculate the mean within a 5x5 window
>>> with gw.open('image.tif') as src:
>>>     res = src.gw.moving(stat='mean', w=5, nodata=32767.0)
>>>
>>> # Calculate the 90th percentile within a 15x15 window
>>> with gw.open('image.tif') as src:
>>>     res = src.gw.moving(stat='perc', w=15, perc=90, nodata=32767.0)
>>>     res.data.compute(num_workers=4)
n_windows(row_chunks=None, col_chunks=None)[source]#

Calculates the number of windows in a row/column iteration.

Parameters:
  • row_chunks (Optional[int]) – The row chunk size. If not given, defaults to opened DataArray chunks.

  • col_chunks (Optional[int]) – The column chunk size. If not given, defaults to opened DataArray chunks.

Return type:

int

Returns:

int
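
Example

A minimal sketch; the file name is a placeholder:

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif', chunks=512) as src:
>>>     # The number of 512x512 windows needed to cover the array
>>>     n = src.gw.n_windows(row_chunks=512, col_chunks=512)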

nbr(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#

Calculates the normalized burn ratio

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor.

  • scale_factor (Optional[float]) – A scale factor to apply to the data.

Return type:

DataArray

Equation:

\[NBR = \frac{NIR - SWIR1}{NIR + SWIR1}\]
Returns:

Data range: -1 to 1

Return type:

xarray.DataArray

Parameters:
  • nodata (float | int | None) –

  • mask (bool) –

  • sensor (str | None) –

  • scale_factor (float | None) –

ndvi(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#

Calculates the normalized difference vegetation index

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor.

  • scale_factor (Optional[float]) – A scale factor to apply to the data.

Return type:

DataArray

Equation:

\[NDVI = \frac{NIR - red}{NIR + red}\]
Returns:

Data range: -1 to 1

Return type:

xarray.DataArray

Parameters:
  • nodata (float | int | None) –

  • mask (bool) –

  • sensor (str | None) –

  • scale_factor (float | None) –

norm_brdf(solar_za, solar_az, sensor_za, sensor_az, sensor=None, wavelengths=None, nodata=None, mask=None, scale_factor=1.0, scale_angles=True)[source]#

Applies Bidirectional Reflectance Distribution Function (BRDF) normalization.

Parameters:
  • solar_za (2d DataArray) – The solar zenith angles (degrees).

  • solar_az (2d DataArray) – The solar azimuth angles (degrees).

  • sensor_za (2d DataArray) – The sensor zenith angles (degrees).

  • sensor_az (2d DataArray) – The sensor azimuth angles (degrees).

  • sensor (Optional[str]) – The satellite sensor.

  • wavelengths (str list) – The wavelength(s) to normalize.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.

  • mask (Optional[DataArray]) – A data mask, where clear values are 0.

  • scale_factor (Optional[float]) – A scale factor to apply to the input data.

  • scale_angles (Optional[bool]) – Whether to scale the pixel angle arrays.

Returns:

xarray.DataArray

Examples

>>> import geowombat as gw
>>>
>>> # Example where pixel angles are stored in separate GeoTiff files
>>> with gw.config.update(sensor='l7', scale_factor=0.0001, nodata=0):
>>>
>>>     with gw.open('solarz.tif') as solarz, \
>>>         gw.open('solara.tif') as solara, \
>>>             gw.open('sensorz.tif') as sensorz, \
>>>                 gw.open('sensora.tif') as sensora:
>>>
>>>         with gw.open('landsat.tif') as ds:
>>>             ds_brdf = ds.gw.norm_brdf(solarz, solara, sensorz, sensora)
norm_diff(b1, b2, nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#

Calculates the normalized difference band ratio.

Parameters:
  • b1 (str) – The band name of the first band.

  • b2 (str) – The band name of the second band.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor.

  • scale_factor (Optional[float]) – A scale factor to apply to the data.

Return type:

DataArray

Equation:

\[{norm}_{diff} = \frac{b2 - b1}{b2 + b1}\]
Returns:

Data range: -1 to 1

Return type:

xarray.DataArray

Parameters:
  • b1 (Any) –

  • b2 (Any) –

  • nodata (float | int | None) –

  • mask (bool) –

  • sensor (str | None) –

  • scale_factor (float | None) –

read(band, **kwargs)[source]#

Reads data for a band or bands.

Parameters:

band (int | list) – A band or list of bands to read.

Return type:

ndarray

Returns:

xarray.DataArray
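
Example

A minimal sketch; the file name is a placeholder:

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as src:
>>>     # Read the first band into memory
>>>     band1 = src.gw.read(1)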

recode(polygon, to_replace, num_workers=1)[source]#

Recodes a DataArray with polygon mappings.

Parameters:
  • polygon (GeoDataFrame | str) – The geopandas.DataFrame or file with polygon geometry.

  • to_replace (dict) –

    How to find the values to replace. Dictionary mappings should be given as {from: to} pairs. If a mapping is from an integer to a string, the to string should be ‘mode’.

    {1: 5}:

    recode values of 1 to 5

    {1: ‘mode’}:

    recode values of 1 to the polygon mode

  • num_workers (Optional[int]) – The number of parallel Dask workers (only used if to_replace has a ‘mode’ mapping).

Return type:

DataArray

Returns:

xarray.DataArray

Example

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif', chunks=512) as ds:
>>>     # Recode 1 with 5 within a polygon
>>>     res = ds.gw.recode('poly.gpkg', {1: 5})
replace(to_replace)[source]#

Replaces values given in to_replace with their mapped values.

Parameters:

to_replace (dict) –

How to find the values to replace. Dictionary mappings should be given as {from: to} pairs. If a mapping is from an integer to a string, the to string should be ‘mode’.

{1: 5}:

recode values of 1 to 5

{1: ‘mode’}:

recode values of 1 to the polygon mode

Return type:

DataArray

Returns:

xarray.DataArray

Example

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif', chunks=512) as ds:
>>>     # Replace 1 with 5
>>>     res = ds.gw.replace({1: 5})
sample(method='random', band=None, n=None, strata=None, spacing=None, min_dist=None, max_attempts=10, num_workers=1, verbose=1, **kwargs)[source]#

Generates samples from a raster.

Parameters:
  • data (DataArray) – The xarray.DataArray to extract data from.

  • method (Optional[str]) – The sampling method. Choices are [‘random’, ‘systematic’].

  • band (Optional[int or str]) – The band name to extract from. Only required if method = ‘random’ and strata is given.

  • n (Optional[int]) – The total number of samples. Only required if method = ‘random’.

  • strata (Optional[dict]) –

    The strata to sample within. The dictionary key->value pairs should be given as {‘conditional,value’: proportion}.

    E.g.,

    strata = {‘==,1’: 0.5, ‘>=,2’: 0.5} … would sample 50% of total samples within class 1 and 50% of total samples in class >= 2.

    strata = {‘==,1’: 10, ‘>=,2’: 20} … would sample 10 samples within class 1 and 20 samples in class >= 2.

  • spacing (Optional[float]) – The spacing (in map projection units) when method = ‘systematic’.

  • min_dist (Optional[float or int]) – A minimum distance allowed between samples. Only applies when method = ‘random’.

  • max_attempts (Optional[int]) – The maximum number of attempts to sample points > min_dist from each other.

  • num_workers (Optional[int]) – The number of parallel workers for dask.compute().

  • verbose (Optional[int]) – The verbosity level.

  • kwargs (Optional[dict]) – Keyword arguments passed to geowombat.extract.

Return type:

GeoDataFrame

Returns:

geopandas.GeoDataFrame

Examples

>>> import geowombat as gw
>>>
>>> # Sample 100 points randomly across the image
>>> with gw.open('image.tif') as ds:
>>>     df = ds.gw.sample(n=100)
>>>
>>> # Sample points systematically (with 10km spacing) across the image
>>> with gw.open('image.tif') as ds:
>>>     df = ds.gw.sample(method='systematic', spacing=10000.0)
>>>
>>> # Sample 50% of 100 in class 1 and 50% in classes >= 2
>>> strata = {'==,1': 0.5, '>=,2': 0.5}
>>> with gw.open('image.tif') as ds:
>>>     df = ds.gw.sample(band=1, n=100, strata=strata)
>>>
>>> # Specify a per-stratum minimum allowed point distance of 1,000 meters
>>> with gw.open('image.tif') as ds:
>>>     df = ds.gw.sample(band=1, n=100, min_dist=1000, strata=strata)
save(filename, mode='w', nodata=None, overwrite=False, client=None, compute=True, tags=None, compress='none', compression=None, num_workers=1, log_progress=True, tqdm_kwargs=None, bigtiff=None)[source]#

Saves a DataArray to raster using rasterio/dask.

Parameters:
  • filename (str | Path) – The output file name to write to.

  • mode (Optional[str]) – The file storage mode. Choices are [‘w’, ‘r+’].

  • nodata (Optional[float | int]) – The ‘no data’ value. If None (default), the ‘no data’ value is taken from the DataArray metadata.

  • overwrite (Optional[bool]) – Whether to overwrite an existing file. Default is False.

  • client (Optional[Client object]) – A dask.distributed.Client client object to persist data. Default is None.

  • compute (Optional[bool]) – Whether to compute and write to filename. If True, compute and write to filename. If False, return the dask task graph. Default is True.

  • tags (Optional[dict]) – Metadata tags to write to file. Default is None.

  • compress (Optional[str]) – The file compression type. Default is ‘none’, or no compression.

  • compression (Optional[str]) –

    The file compression type. Default is ‘none’, or no compression.

    Deprecated since version 2.1.4: Use ‘compress’ – ‘compression’ will be removed in >=2.2.0.

  • num_workers (Optional[int]) – The number of dask workers (i.e., chunks) to write concurrently. Default is 1.

  • log_progress (Optional[bool]) – Whether to log the progress bar during writing. Default is True.

  • tqdm_kwargs (Optional[dict]) – Keyword arguments to pass to tqdm.

  • bigtiff (Optional[str]) – A GDAL BIGTIFF flag. Choices are [“YES”, “NO”, “IF_NEEDED”, “IF_SAFER”].

Return type:

None

Returns:

None, writes to filename

Example

>>> import geowombat as gw
>>>
>>> with gw.open('file.tif') as src:
>>>     result = ...
>>>     result.gw.save('output.tif', compress='lzw', num_workers=8)
set_nodata(src_nodata=None, dst_nodata=None, out_range=None, dtype=None, scale_factor=None, offset=None)[source]#

Sets ‘no data’ values and applies scaling to an xarray.DataArray.

Parameters:
  • src_nodata (int | float) – The ‘no data’ values to replace. Default is None.

  • dst_nodata (int | float) – The ‘no data’ value to set. Default is nan.

  • out_range (Optional[tuple]) – The output clip range. Default is None.

  • dtype (Optional[str]) – The output data type. Default is None.

  • scale_factor (Optional[float | int]) – A scale factor to apply. Default is None.

  • offset (Optional[float | int]) – An offset to apply. Default is None.

Return type:

DataArray

Returns:

xarray.DataArray

Example

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as src:
>>>     src = src.gw.set_nodata(0, 65535, out_range=(0, 10000), dtype='uint16')
subset(left=None, top=None, right=None, bottom=None, rows=None, cols=None, center=False, mask_corners=False)[source]#

Subsets a DataArray.

Parameters:
  • left (Optional[float]) – The left coordinate.

  • top (Optional[float]) – The top coordinate.

  • right (Optional[float]) – The right coordinate.

  • bottom (Optional[float]) – The bottom coordinate.

  • rows (Optional[int]) – The number of output rows.

  • cols (Optional[int]) – The number of output columns.

  • center (Optional[bool]) – Whether to center the subset on left and top.

  • mask_corners (Optional[bool]) – Whether to mask corners (requires pymorph).

  • chunksize (Optional[tuple]) – A new chunk size for the output.

Return type:

DataArray

Returns:

xarray.DataArray

Example

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif', chunks=512) as ds:
>>>     ds_sub = ds.gw.subset(
>>>         left=-263529.884,
>>>         top=953985.314,
>>>         rows=2048,
>>>         cols=2048
>>>     )
tasseled_cap(nodata=None, sensor=None, scale_factor=1.0)[source]#

Applies a tasseled cap transformation

Parameters:
  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.

  • sensor (Optional[str]) – The data’s sensor.

  • scale_factor (Optional[float]) – A scale factor to apply to the data.

Return type:

DataArray

Returns:

xarray.DataArray

Examples

>>> import geowombat as gw
>>>
>>> with gw.config.update(sensor='qb', scale_factor=0.0001):
>>>     with gw.open(
>>>         'image.tif', band_names=['blue', 'green', 'red', 'nir']
>>>     ) as ds:
>>>         tcap = ds.gw.tasseled_cap()
to_netcdf(filename, *args, **kwargs)[source]#

Writes an Xarray DataArray to a NetCDF file.

Parameters:
  • filename (Path | str) – The output file name to write to.

  • args (DataArray) – Additional DataArrays to stack.

  • kwargs (dict) – Encoding arguments.

Return type:

None

Examples

>>> import geowombat as gw
>>> import xarray as xr
>>>
>>> # Write a single DataArray to a .nc file
>>> with gw.config.update(sensor='l7'):
>>>     with gw.open('LC08_L1TP_225078_20200219_20200225_01_T1.tif') as src:
>>>         src.gw.to_netcdf('filename.nc', zlib=True, complevel=5)
>>>
>>> # Add extra layers
>>> with gw.config.update(sensor='l7'):
>>>     with gw.open(
>>>         'LC08_L1TP_225078_20200219_20200225_01_T1.tif'
>>>     ) as src, gw.open(
>>>         'LC08_L1TP_225078_20200219_20200225_01_T1_angles.tif',
>>>         band_names=['zenith', 'azimuth']
>>>     ) as ang:
>>>         src = (
>>>             xr.where(
>>>                 src == 0, -32768, src
>>>             )
>>>             .astype('int16')
>>>             .assign_attrs(**src.attrs)
>>>         )
>>>
>>>         src.gw.to_netcdf(
>>>             'filename.nc',
>>>             ang.astype('int16'),
>>>             zlib=True,
>>>             complevel=5,
>>>             _FillValue=-32768
>>>         )
>>>
>>> # Open the data and convert to a DataArray
>>> with xr.open_dataset(
>>>     'filename.nc', engine='h5netcdf', chunks=256
>>> ) as ds:
>>>     src = ds.to_array(dim='band')
to_polygon(mask=None, connectivity=4)[source]#

Converts a dask array to a GeoDataFrame

Parameters:
  • mask (Optional[numpy ndarray or rasterio Band object]) – Must evaluate to bool (rasterio.bool_ or rasterio.uint8). Values of False or 0 will be excluded from feature generation. Note well that this is the inverse sense from Numpy’s, where a mask value of True indicates invalid data in an array. If source is a Numpy masked array and mask is None, the source’s mask will be inverted and used in place of mask.

  • connectivity (Optional[int]) – Use 4 or 8 pixel connectivity for grouping pixels into features.

Return type:

GeoDataFrame

Returns:

geopandas.GeoDataFrame

Example

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as src:
>>>
>>>     # Convert the input image to a GeoDataFrame
>>>     df = src.gw.to_polygon(mask='source', num_workers=8)
to_raster(filename, readxsize=None, readysize=None, separate=False, out_block_type='gtiff', keep_blocks=False, verbose=0, overwrite=False, gdal_cache=512, scheduler='processes', n_jobs=1, n_workers=None, n_threads=None, n_chunks=None, overviews=False, resampling='nearest', driver='GTiff', nodata=None, blockxsize=512, blockysize=512, tags=None, **kwargs)[source]#

Writes an Xarray DataArray to a raster file.

Note

We advise using save() in place of this method.

Parameters:
  • filename (str) – The output file name to write to.

  • readxsize (Optional[int]) – The size of column chunks to read. If not given, readxsize defaults to Dask chunk size.

  • readysize (Optional[int]) – The size of row chunks to read. If not given, readysize defaults to Dask chunk size.

  • separate (Optional[bool]) – Whether to write blocks as separate files. Otherwise, write to a single file.

  • out_block_type (Optional[str]) – The output block type. Choices are [‘gtiff’, ‘zarr’]. Only used if separate = True.

  • keep_blocks (Optional[bool]) – Whether to keep the blocks stored on disk. Only used if separate = True.

  • verbose (Optional[int]) – The verbosity level.

  • overwrite (Optional[bool]) – Whether to overwrite an existing file.

  • gdal_cache (Optional[int]) – The GDAL cache size (in MB).

  • scheduler (Optional[str]) – The concurrent.futures scheduler to use. Choices are [‘processes’, ‘threads’].

  • n_jobs (Optional[int]) – The total number of parallel jobs.

  • n_workers (Optional[int]) – The number of processes.

  • n_threads (Optional[int]) – The number of threads.

  • n_chunks (Optional[int]) – The chunk size of windows. If not given, equal to n_workers x 3.

  • overviews (Optional[bool or list]) – Whether to build overview layers.

  • resampling (Optional[str]) – The resampling method for overviews when overviews is True or a list. Choices are [‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘nearest’].

  • driver (Optional[str]) – The raster driver.

  • nodata (Optional[int]) – A ‘no data’ value.

  • blockxsize (Optional[int]) – The output x block size. Ignored if separate = True.

  • blockysize (Optional[int]) – The output y block size. Ignored if separate = True.

  • tags (Optional[dict]) – Image tags to write to file.

  • kwargs (Optional[dict]) – Additional keyword arguments to pass to rasterio.write.

Return type:

None

Returns:

None

Examples

>>> import geowombat as gw
>>>
>>> # Use dask.compute()
>>> with gw.open('input.tif') as ds:
>>>     ds.gw.to_raster('output.tif', n_jobs=8)
>>>
>>> # Use a dask client
>>> with gw.open('input.tif') as ds:
>>>     ds.gw.to_raster('output.tif', use_client=True, n_workers=8, n_threads=4)
>>>
>>> # Compress the output
>>> with gw.open('input.tif') as ds:
>>>     ds.gw.to_raster('output.tif', n_jobs=8, compress='lzw')
to_vector(filename, mask=None, connectivity=4)[source]#

Writes an Xarray DataArray to a vector file.

Parameters:
  • filename (str) – The output file name to write to.

  • mask (numpy ndarray or rasterio Band object, optional) – Must evaluate to bool (rasterio.bool_ or rasterio.uint8). Values of False or 0 will be excluded from feature generation. Note well that this is the inverse sense from Numpy’s, where a mask value of True indicates invalid data in an array. If source is a Numpy masked array and mask is None, the source’s mask will be inverted and used in place of mask.

  • connectivity (Optional[int]) – Use 4 or 8 pixel connectivity for grouping pixels into features.

Return type:

None

Returns:

None
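Example

A minimal sketch (the file names and the GeoPackage output format are illustrative):

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as src:
>>>     # Polygonize the raster and write the features to a vector file
>>>     src.gw.to_vector('output.gpkg')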

to_vrt(filename, overwrite=False, resampling=None, nodata=None, init_dest_nodata=True, warp_mem_limit=128)[source]#

Writes a file to a VRT file.

Parameters:
  • filename (str | Path) – The output file name to write to.

  • overwrite (Optional[bool]) – Whether to overwrite an existing VRT file.

  • resampling (Optional[object]) – The resampling algorithm for rasterio.vrt.WarpedVRT.

  • nodata (Optional[float or int]) – The ‘no data’ value for rasterio.vrt.WarpedVRT.

  • init_dest_nodata (Optional[bool]) – Whether or not to initialize output to nodata for rasterio.vrt.WarpedVRT.

  • warp_mem_limit (Optional[int]) – The GDAL memory limit for rasterio.vrt.WarpedVRT.

Return type:

None

Examples

>>> import geowombat as gw
>>> from rasterio.enums import Resampling
>>>
>>> # Transform a CRS and save to VRT
>>> with gw.config.update(ref_crs=102033):
>>>     with gw.open('image.tif') as src:
>>>         src.gw.to_vrt(
>>>             'output.vrt',
>>>             resampling=Resampling.cubic,
>>>             warp_mem_limit=256
>>>         )
>>>
>>> # Load multiple files set to a common geographic extent
>>> bounds = (left, bottom, right, top)
>>> with gw.config.update(ref_bounds=bounds):
>>>     with gw.open(
>>>         ['image1.tif', 'image2.tif'], mosaic=True
>>>     ) as src:
>>>         src.gw.to_vrt('output.vrt')
transform_crs(dst_crs=None, dst_res=None, dst_width=None, dst_height=None, dst_bounds=None, src_nodata=None, dst_nodata=None, coords_only=False, resampling='nearest', warp_mem_limit=512, num_threads=1)[source]#

Transforms an xarray.DataArray to a new coordinate reference system.

Parameters:
  • dst_crs (Optional[CRS | int | dict | str]) – The destination CRS.

  • dst_res (Optional[tuple]) – The destination resolution.

  • dst_width (Optional[int]) – The destination width. Cannot be used with dst_res.

  • dst_height (Optional[int]) – The destination height. Cannot be used with dst_res.

  • dst_bounds (Optional[BoundingBox | tuple]) – The destination bounds, as a rasterio.coords.BoundingBox or as a tuple of (left, bottom, right, top).

  • src_nodata (Optional[int | float]) – The source nodata value. Pixels with this value will not be used for interpolation. If not set, it will default to the nodata value of the source image if a masked ndarray or rasterio band, if available.

  • dst_nodata (Optional[int | float]) – The nodata value used to initialize the destination; it will remain in all areas not covered by the reprojected source. Defaults to the nodata value of the destination image (if set), the value of src_nodata, or 0 (GDAL default).

  • coords_only (Optional[bool]) – Whether to return transformed coordinates. If coords_only = True then the array is not warped and the size is unchanged. It also avoids in-memory computations.

  • resampling (Optional[str]) – The resampling method if filename is a list. Choices are [‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘nearest’].

  • warp_mem_limit (Optional[int]) – The warp memory limit.

  • num_threads (Optional[int]) – The number of parallel threads.

Return type:

DataArray

Returns:

xarray.DataArray

Example

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as src:
>>>     dst = src.gw.transform_crs(4326)
wi(nodata=None, mask=False, sensor=None, scale_factor=1.0)[source]#

Calculates the woody vegetation index

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor.

  • scale_factor (Optional[float]) – A scale factor to apply to the data.

Return type:

DataArray

Equation:

\[WI = \begin{cases} 0, & \text{if } red + SWIR1 \ge 0.5 \\ 1 - \frac{red + SWIR1}{0.5}, & \text{otherwise} \end{cases}\]
Returns:

Data range: 0 to 1

Return type:

xarray.DataArray

Parameters:
  • nodata (float | int | None) –

  • mask (bool) –

  • sensor (str | None) –

  • scale_factor (float | None) –
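Example

A minimal sketch (the file name is hypothetical; a sensor with red and SWIR1 bands is assumed):

>>> import geowombat as gw
>>>
>>> with gw.config.update(sensor='l7', scale_factor=0.0001):
>>>     with gw.open('image.tif') as ds:
>>>         wi_index = ds.gw.wi()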

windows(row_chunks=None, col_chunks=None, return_type='window', ndim=2)[source]#

Generates windows for a row/column iteration.

Parameters:
  • row_chunks (Optional[int]) – The row chunk size. If not given, defaults to opened DataArray chunks.

  • col_chunks (Optional[int]) – The column chunk size. If not given, defaults to opened DataArray chunks.

  • return_type (Optional[str]) – The data to return. Choices are [‘data’, ‘slice’, ‘window’].

  • ndim (Optional[int]) – The number of required dimensions if return_type = ‘data’ or ‘slice’.

Returns:

yields xarray.DataArray, tuple, or rasterio.windows.Window
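Example

A minimal sketch (the file name is hypothetical):

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif', chunks=512) as src:
>>>     # Iterate over rasterio.windows.Window objects following the opened chunk sizes
>>>     for w in src.gw.windows(return_type='window'):
>>>         print(w)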

geowombat.core.io module#

geowombat.core.io.apply(infile, outfile, block_func, args=None, count=1, scheduler='processes', gdal_cache=512, n_jobs=4, overwrite=False, tags=None, **kwargs)[source]#

Applies a function and writes results to file.

Parameters:
  • infile (str) – The input file to process.

  • outfile (str) – The output file.

  • block_func (func) – The user function to apply to each block. The function should always return the window, the data, and at least one argument. The block data inside the function will be a 2d array if the input image has 1 band, otherwise a 3d array.

  • args (Optional[tuple]) – Additional arguments to pass to block_func.

  • count (Optional[int]) – The band count for the output file.

  • scheduler (Optional[str]) – The concurrent.futures scheduler to use. Choices are [‘threads’, ‘processes’].

  • gdal_cache (Optional[int]) – The GDAL cache size (in MB).

  • n_jobs (Optional[int]) – The number of blocks to process in parallel.

  • overwrite (Optional[bool]) – Whether to overwrite an existing output file.

  • tags (Optional[dict]) – Image tags to write to file.

  • kwargs (Optional[dict]) – Additional keyword arguments to pass to rasterio.open.

Returns:

None, writes to outfile

Examples

>>> import geowombat as gw
>>>
>>> # Here is a function with no arguments
>>> def my_func0(w, block, arg):
>>>     return w, block
>>>
>>> gw.apply('input.tif',
>>>          'output.tif',
>>>           my_func0,
>>>           n_jobs=8)
>>>
>>> # Here is a function with 1 argument
>>> def my_func1(w, block, arg):
>>>     return w, block * arg
>>>
>>> gw.apply('input.tif',
>>>          'output.tif',
>>>           my_func1,
>>>           args=(10.0,),
>>>           n_jobs=8)
geowombat.core.io.compress_raster(infile, outfile, n_jobs=1, gdal_cache=512, compress='lzw', tags=None)[source]#

Compresses a raster file.

Parameters:
  • infile (str) – The file to compress.

  • outfile (str) – The output file.

  • n_jobs (Optional[int]) – The number of concurrent blocks to write.

  • gdal_cache (Optional[int]) – The GDAL cache size (in MB).

  • compress (Optional[str]) – The compression method.

  • tags (Optional[dict]) – Image tags to write to file.

Returns:

None
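Example

A minimal sketch (the file names are hypothetical):

>>> from geowombat.core.io import compress_raster
>>>
>>> # Rewrite a raster with LZW compression, writing 4 blocks concurrently
>>> compress_raster('input.tif', 'output.tif', n_jobs=4, compress='lzw')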

geowombat.core.io.get_norm_indices(n_bands, window_slice, indexes_multi)[source]#
geowombat.core.io.save(data, filename, mode='w', nodata=None, overwrite=False, client=None, compute=True, tags=None, compress='none', compression=None, num_workers=1, log_progress=True, tqdm_kwargs=None, bigtiff=None)[source]#

Saves a DataArray to raster using rasterio/dask.

Parameters:
  • filename (str | Path) – The output file name to write to.

  • overwrite (Optional[bool]) – Whether to overwrite an existing file. Default is False.

  • mode (Optional[str]) – The file storage mode. Choices are [‘w’, ‘r+’].

  • nodata (Optional[float | int]) – The ‘no data’ value. If None (default), the ‘no data’ value is taken from the DataArray metadata.

  • client (Optional[Client object]) – A dask.distributed.Client client object to persist data. Default is None.

  • compute (Optional[bool]) – Whether to compute and write to filename. If True, compute and write to filename. If False, return the dask task graph. Default is True.

  • tags (Optional[dict]) – Metadata tags to write to file. Default is None.

  • compress (Optional[str]) – The file compression type. Default is ‘none’, or no compression.

  • compression (Optional[str]) –

    The file compression type. Default is ‘none’, or no compression.

    Deprecated since version 2.1.4: Use ‘compress’ – ‘compression’ will be removed in >=2.2.0.

  • num_workers (Optional[int]) – The number of dask workers (i.e., chunks) to write concurrently. Default is 1.

  • log_progress (Optional[bool]) – Whether to log the progress bar during writing. Default is True.

  • tqdm_kwargs (Optional[dict]) – Keyword arguments to pass to tqdm.

  • bigtiff (Optional[str]) – A GDAL BIGTIFF flag. Choices are [“YES”, “NO”, “IF_NEEDED”, “IF_SAFER”].

  • data (DataArray) –

Returns:

None, writes to filename

Example

>>> import geowombat as gw
>>>
>>> with gw.open('file.tif') as src:
>>>     result = ...
>>>     gw.save(result, 'output.tif', compress='lzw', num_workers=8)
>>>
>>> # Create delayed write tasks and compute later
>>> tasks = [gw.save(array, 'output.tif', compute=False) for array in array_list]
>>> # Write and close files
>>> dask.compute(tasks, num_workers=8)
geowombat.core.io.to_netcdf(data, filename, overwrite=False, compute=True, *args, **kwargs)[source]#

Writes an Xarray DataArray to a NetCDF file.

Parameters:
  • data (DataArray) – The xarray.DataArray to write.

  • filename (str) – The output file name to write to.

  • overwrite (Optional[bool]) – Whether to overwrite an existing file. Default is False.

  • compute (Optional[bool]) – Whether to compute and write to filename. If False, return the dask task graph instead. Default is True.

  • args (DataArray) – Additional DataArrays to stack.

  • kwargs (dict) – Encoding arguments.

Returns:

None, writes to filename

Examples

>>> import geowombat as gw
>>> import xarray as xr
>>>
>>> # Write a single DataArray to a .nc file
>>> with gw.config.update(sensor='l7'):
>>>     with gw.open('LC08_L1TP_225078_20200219_20200225_01_T1.tif') as src:
>>>         gw.to_netcdf(src, 'filename.nc', zlib=True, complevel=5)
>>>
>>> # Add extra layers
>>> with gw.config.update(sensor='l7'):
>>>     with gw.open(
>>>         'LC08_L1TP_225078_20200219_20200225_01_T1.tif'
>>>     ) as src, gw.open(
>>>         'LC08_L1TP_225078_20200219_20200225_01_T1_angles.tif',
>>>         band_names=['zenith', 'azimuth']
>>>     ) as ang:
>>>         src = (
>>>             xr.where(
>>>                 src == 0, -32768, src
>>>             )
>>>             .astype('int16')
>>>             .assign_attrs(**src.attrs)
>>>         )
>>>
>>>         gw.to_netcdf(
>>>             src,
>>>             'filename.nc',
>>>             ang.astype('int16'),
>>>             zlib=True,
>>>             complevel=5,
>>>             _FillValue=-32768
>>>         )
>>>
>>> # Open the data and convert to a DataArray
>>> with xr.open_dataset(
>>>     'filename.nc', engine='h5netcdf', chunks=256
>>> ) as ds:
>>>     src = ds.to_array(dim='band')
geowombat.core.io.to_raster(data, filename, readxsize=None, readysize=None, separate=False, out_block_type='gtiff', keep_blocks=False, verbose=0, overwrite=False, gdal_cache=512, scheduler='mpool', n_jobs=1, n_workers=None, n_threads=None, n_chunks=None, padding=None, tags=None, tqdm_kwargs=None, **kwargs)[source]#

Writes a dask array to a raster file.

Note

We advise using save() in place of this method.

Parameters:
  • data (DataArray) – The xarray.DataArray to write.

  • filename (str) – The output file name to write to.

  • readxsize (Optional[int]) – The size of column chunks to read. If not given, readxsize defaults to Dask chunk size.

  • readysize (Optional[int]) – The size of row chunks to read. If not given, readysize defaults to Dask chunk size.

  • separate (Optional[bool]) – Whether to write blocks as separate files. Otherwise, write to a single file.

  • out_block_type (Optional[str]) – The output block type. Choices are [‘gtiff’, ‘zarr’]. Only used if separate = True.

  • keep_blocks (Optional[bool]) – Whether to keep the blocks stored on disk. Only used if separate = True.

  • verbose (Optional[int]) – The verbosity level.

  • overwrite (Optional[bool]) – Whether to overwrite an existing file.

  • gdal_cache (Optional[int]) – The GDAL cache size (in MB).

  • scheduler (Optional[str]) –

    The parallel task scheduler to use. Choices are [‘processes’, ‘threads’, ‘mpool’].

    mpool: process pool of workers using multiprocessing.Pool

    processes: process pool of workers using concurrent.futures

    threads: thread pool of workers using concurrent.futures

  • n_jobs (Optional[int]) – The total number of parallel jobs.

  • n_workers (Optional[int]) – The number of process workers.

  • n_threads (Optional[int]) – The number of thread workers.

  • n_chunks (Optional[int]) – The chunk size of windows. If not given, equal to n_workers x 50.

  • overviews (Optional[bool or list]) – Whether to build overview layers.

  • resampling (Optional[str]) – The resampling method for overviews when overviews is True or a list. Choices are [‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘nearest’].

  • padding (Optional[tuple]) – Padding for each window. padding should be given as a tuple of (left pad, bottom pad, right pad, top pad). If padding is given, the returned list will contain a tuple of rasterio.windows.Window objects as (w1, w2), where w1 contains the normal window offsets and w2 contains the padded window offsets.

  • tags (Optional[dict]) – Image tags to write to file.

  • tqdm_kwargs (Optional[dict]) – Additional keyword arguments to pass to tqdm.

  • kwargs (Optional[dict]) – Additional keyword arguments to pass to rasterio.write.

Returns:

None, writes to filename

Examples

>>> import geowombat as gw
>>>
>>> # Use 8 parallel workers
>>> with gw.open('input.tif') as ds:
>>>     gw.to_raster(ds, 'output.tif', n_jobs=8)
>>>
>>> # Use 4 process workers and 2 thread workers
>>> with gw.open('input.tif') as ds:
>>>     gw.to_raster(ds, 'output.tif', n_workers=4, n_threads=2)
>>>
>>> # Control the window chunks passed to concurrent.futures
>>> with gw.open('input.tif') as ds:
>>>     gw.to_raster(ds, 'output.tif', n_workers=4, n_threads=2, n_chunks=16)
>>>
>>> # Compress the output and build overviews
>>> with gw.open('input.tif') as ds:
>>>     gw.to_raster(ds, 'output.tif', n_jobs=8, overviews=True, compress='lzw')
geowombat.core.io.to_vrt(data, filename, overwrite=False, resampling=None, nodata=None, init_dest_nodata=True, warp_mem_limit=128)[source]#

Writes a file to a VRT file.

Parameters:
  • data (DataArray) – The xarray.DataArray to write.

  • filename (str) – The output file name to write to.

  • overwrite (Optional[bool]) – Whether to overwrite an existing VRT file.

  • resampling (Optional[object]) – The resampling algorithm for rasterio.vrt.WarpedVRT. Default is ‘nearest’.

  • nodata (Optional[float or int]) – The ‘no data’ value for rasterio.vrt.WarpedVRT.

  • init_dest_nodata (Optional[bool]) – Whether or not to initialize output to nodata for rasterio.vrt.WarpedVRT.

  • warp_mem_limit (Optional[int]) – The GDAL memory limit for rasterio.vrt.WarpedVRT.

Returns:

None, writes to filename

Examples

>>> import geowombat as gw
>>> from rasterio.enums import Resampling
>>>
>>> # Transform a CRS and save to VRT
>>> with gw.config.update(ref_crs=102033):
>>>     with gw.open('image.tif') as src:
>>>         gw.to_vrt(
>>>             src,
>>>             'output.vrt',
>>>             resampling=Resampling.cubic,
>>>             warp_mem_limit=256
>>>         )
>>>
>>> # Load multiple files set to a common geographic extent
>>> bounds = (left, bottom, right, top)
>>> with gw.config.update(ref_bounds=bounds):
>>>     with gw.open(
>>>         ['image1.tif', 'image2.tif'], mosaic=True
>>>     ) as src:
>>>         gw.to_vrt(src, 'output.vrt')

geowombat.core.parallel module#

class geowombat.core.parallel.ParallelTask(data=None, chunks=None, row_chunks=None, col_chunks=None, padding=None, scheduler='threads', get_ray=False, n_workers=1, n_chunks=None)[source]#

Bases: object

A class for parallel tasks over a xarray.DataArray with returned results for each chunk.

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • row_chunks (Optional[int]) – The row chunk size to process in parallel.

  • col_chunks (Optional[int]) – The column chunk size to process in parallel.

  • padding (Optional[tuple]) – Padding for each window. padding should be given as a tuple of (left pad, bottom pad, right pad, top pad). If padding is given, the returned list will contain a tuple of rasterio.windows.Window objects as (w1, w2), where w1 contains the normal window offsets and w2 contains the padded window offsets.

  • scheduler (Optional[str]) –

    The parallel task scheduler to use. Choices are [‘processes’, ‘threads’, ‘mpool’].

    mpool: process pool of workers using multiprocessing.Pool

    ray: process pool of workers using ray.remote

    processes: process pool of workers using concurrent.futures

    threads: thread pool of workers using concurrent.futures

  • get_ray (Optional[bool]) – Whether to get results from ray futures.

  • n_workers (Optional[int]) – The number of parallel workers for scheduler.

  • n_chunks (Optional[int]) – The chunk size of windows. If not given, equal to n_workers x 50.

Examples

>>> import geowombat as gw
>>> from geowombat.core.parallel import ParallelTask
>>>
>>> ########################
>>> # Use concurrent threads
>>> ########################
>>>
>>> def user_func_threads(*args):
>>>     data, window_id, num_workers = list(itertools.chain(*args))
>>>     return data.data.sum().compute(scheduler='threads', num_workers=num_workers)
>>>
>>> # Process 8 windows in parallel using threads
>>> # Process 4 dask chunks in parallel using threads
>>> # 32 total workers are needed
>>> with gw.open('image.tif') as src:
>>>     pt = ParallelTask(src, scheduler='threads', n_workers=8)
>>>     res = pt.map(user_func_threads, 4)
>>>
>>> #########
>>> # Use Ray
>>> #########
>>>
>>> import ray
>>>
>>> @ray.remote
>>> def user_func_ray(data_block_id, data_slice, window_id, num_workers):
>>>     return (
>>>         data_block_id[data_slice].data.sum()
>>>         .compute(scheduler='threads', num_workers=num_workers)
>>>     )
>>>
>>> ray.init(num_cpus=8)
>>>
>>> with gw.open('image.tif', chunks=512) as src:
>>>     pt = ParallelTask(
>>>         src, row_chunks=1024, col_chunks=1024, scheduler='ray', n_workers=8
>>>     )
>>>     res = ray.get(pt.map(user_func_ray, 4))
>>>
>>> ray.shutdown()
>>>
>>> #####################################
>>> # Use with a dask.distributed cluster
>>> #####################################
>>>
>>> from dask.distributed import LocalCluster
>>>
>>> with LocalCluster(
>>>     n_workers=4,
>>>     threads_per_worker=2,
>>>     scheduler_port=0,
>>>     processes=False,
>>>     memory_limit='4GB'
>>> ) as cluster:
>>>
>>>     with gw.open('image.tif') as src:
>>>         pt = ParallelTask(src, scheduler='threads', n_workers=4, n_chunks=50)
>>>         res = pt.map(user_func_threads, 2)
>>>
>>> # Map over multiple rasters
>>> for pt in ParallelTask(
>>>     ['image1.tif', 'image2.tif'], scheduler='threads', n_workers=4, n_chunks=500
>>> ):
>>>     res = pt.map(user_func_threads, 2)

Methods

map(func, *args, **kwargs)

Maps a function over a DataArray.

map(func, *args, **kwargs)[source]#

Maps a function over a DataArray.

Parameters:
  • func (func) –

    The function to apply to the data chunks.

    When using any scheduler other than ‘ray’ (i.e., ‘mpool’, ‘threads’, ‘processes’), the function should always be defined with *args. With these schedulers, the function always receives the DataArray window and the window id as its first two arguments. If no user arguments are passed to map, the function will look like:

    def my_func(*args):
        data, window_id = list(itertools.chain(*args))
        # do something
        return results

    If user arguments are passed, e.g., map(my_func, arg1, arg2), the function will look like:

    def my_func(*args):
        data, window_id, arg1, arg2 = list(itertools.chain(*args))
        # do something
        return results

    When scheduler = ‘ray’, the user function requires an additional slice argument and looks like:

    @ray.remote
    def my_ray_func(data_block_id, data_slice, window_id):
        # do something
        return results

    Note the addition of the @ray.remote decorator, as well as the explicit arguments in the function signature. Extra user arguments would look like:

    @ray.remote
    def my_ray_func(data_block_id, data_slice, window_id, arg1, arg2):
        # do something
        return results

    Other ray classes can also be used in place of a function.

  • args (items) – Function arguments.

  • kwargs (Optional[dict]) – Keyword arguments passed to multiprocessing.Pool().imap.

Returns:

Results for each data chunk.

Return type:

list

Examples

>>> import geowombat as gw
>>> from geowombat.core.parallel import ParallelTask
>>> from geowombat.data import l8_224078_20200518_points, l8_224078_20200518
>>> import geopandas as gpd
>>> import rasterio as rio
>>> import ray
>>> from ray.util import ActorPool
>>>
>>> @ray.remote
>>> class Actor(object):
>>>
>>>     def __init__(self, aoi_id=None, id_column=None, band_names=None):
>>>
>>>         self.aoi_id = aoi_id
>>>         self.id_column = id_column
>>>         self.band_names = band_names
>>>
>>>     # While the names can differ, these three arguments are required.
>>>     # For ``ParallelTask``, the callable function within an ``Actor`` must be named exec_task.
>>>     def exec_task(self, data_block_id, data_slice, window_id):
>>>
>>>         data_block = data_block_id[data_slice]
>>>         left, bottom, right, top = data_block.gw.bounds
>>>         aoi_sub = self.aoi_id.cx[left:right, bottom:top]
>>>
>>>         if aoi_sub.empty:
>>>             return aoi_sub
>>>
>>>         # Return a GeoDataFrame for each actor
>>>         return gw.extract(data_block,
>>>                           aoi_sub,
>>>                           id_column=self.id_column,
>>>                           band_names=self.band_names)
>>>
>>> ray.init(num_cpus=8)
>>>
>>> band_names = [1, 2, 3]
>>> df_id = ray.put(gpd.read_file(l8_224078_20200518_points))
>>>
>>> with rio.Env(GDAL_CACHEMAX=256*1e6) as env:
>>>
>>>     # Since we are iterating over the image block by block, we do not need to load
>>>     # a lazy dask array (i.e., chunked).
>>>     with gw.open(l8_224078_20200518, band_names=band_names, chunks=None) as src:
>>>
>>>         # Setup the pool of actors, one for each resource available to ``ray``.
>>>         actor_pool = ActorPool([Actor.remote(aoi_id=df_id, id_column='id', band_names=band_names)
>>>                                 for n in range(0, int(ray.cluster_resources()['CPU']))])
>>>
>>>         # Setup the task object
>>>         pt = ParallelTask(src, row_chunks=4096, col_chunks=4096, scheduler='ray', n_chunks=1000)
>>>         results = pt.map(actor_pool)
>>>
>>> del df_id, actor_pool
>>>
>>> ray.shutdown()

geowombat.core.properties module#

class geowombat.core.properties.DataProperties[source]#

Bases: object

Attributes:
affine

Get the affine transform object.

altitude

Get satellite altitudes (in km)

array_is_dask

Get whether the array is a Dask array.

avail_sensors

Get supported sensors.

band_chunks

Get the band chunk size.

bottom

Get the array bounding box bottom coordinate.

bounds

Get the array bounding box (left, bottom, right, top)

bounds_as_namedtuple

Get the array bounding box as a rasterio.coords.BoundingBox

cellx

Get the cell size in the x direction.

cellxh

Get the half width of the cell size in the x direction.

celly

Get the cell size in the y direction.

cellyh

Get the half width of the cell size in the y direction.

central_um

Get a dictionary of central wavelengths (in micrometers)

chunk_grid

Get the image chunk grid.

col_chunks

Get the column chunk size.

crs_to_pyproj

Get the CRS as a pyproj.CRS object.

dtype

Get the data type of the DataArray.

footprint_grid

Get the image footprint grid.

geodataframe

Get a geopandas.GeoDataFrame of the array bounds.

geometry

Get the polygon geometry of the array bounding box.

has_band

Check whether the DataArray has a band attribute.

has_band_coord

Check whether the DataArray has a band coordinate.

has_band_dim

Check whether the DataArray has a band dimension.

has_time

Check whether the DataArray has a time attribute.

has_time_coord

Check whether the DataArray has a time coordinate.

has_time_dim

Check whether the DataArray has a time dimension.

left

Get the array bounding box left coordinate.

meta

Get the array metadata.

nbands

Get the number of array bands.

ncols

Get the number of array columns.

ndims

Get the number of array dimensions.

nodataval

Get the ‘no data’ value from the attributes.

nrows

Get the number of array rows.

ntime

Get the number of time dimensions.

offsetval

Get the offset value.

pydatetime

Get Python datetime objects from the time dimension.

right

Get the array bounding box right coordinate.

row_chunks

Get the row chunk size.

scaleval

Get the scale factor value.

sensor_names

Get sensor full names.

time_chunks

Get the time chunk size.

top

Get the array bounding box top coordinate.

transform

Get the data transform (cell x, 0, left, 0, cell y, top)

unary_union

Get a representation of the union of the image bounds.

wavelengths

Get a dictionary of sensor wavelengths.

property affine: Affine#

Get the affine transform object.

property altitude#

Get satellite altitudes (in km)

property array_is_dask: bool#

Get whether the array is a Dask array.

property avail_sensors: list#

Get supported sensors.

property band_chunks: int#

Get the band chunk size.

property bottom: float#

Get the array bounding box bottom coordinate.

property bounds: Tuple[float, float, float, float]#

Get the array bounding box (left, bottom, right, top)

property bounds_as_namedtuple: BoundingBox#

Get the array bounding box as a rasterio.coords.BoundingBox

property cellx: float#

Get the cell size in the x direction.

property cellxh: float#

Get the half width of the cell size in the x direction.

property celly: float#

Get the cell size in the y direction.

property cellyh: float#

Get the half width of the cell size in the y direction.

property central_um#

Get a dictionary of central wavelengths (in micrometers)

property chunk_grid: GeoDataFrame#

Get the image chunk grid.

property col_chunks: int#

Get the column chunk size.

property crs_to_pyproj: CRS#

Get the CRS as a pyproj.CRS object.

property dtype: str#

Get the data type of the DataArray.

property footprint_grid: GeoDataFrame#

Get the image footprint grid.

property geodataframe: GeoDataFrame#

Get a geopandas.GeoDataFrame of the array bounds.

property geometry: box#

Get the polygon geometry of the array bounding box.

property has_band: bool#

Check whether the DataArray has a band attribute.

property has_band_coord: bool#

Check whether the DataArray has a band coordinate.

property has_band_dim: bool#

Check whether the DataArray has a band dimension.

property has_time: bool#

Check whether the DataArray has a time attribute.

property has_time_coord: bool#

Check whether the DataArray has a time coordinate.

property has_time_dim: bool#

Check whether the DataArray has a time dimension.

property left: float#

Get the array bounding box left coordinate.

Pixel shift reference:

pydata/xarray rasterio_.py, http://web.archive.org/web/20160326194152/http://remotesensing.org/geotiff/spec/geotiff2.5.html#2.5.2

property meta: Metadata#

Get the array metadata.

property nbands: int#

Get the number of array bands.

property ncols: int#

Get the number of array columns.

property ndims: int#

Get the number of array dimensions.

property nodataval: float | int | None#

Get the ‘no data’ value from the attributes.

property nrows: int#

Get the number of array rows.

property ntime: int#

Get the number of time dimensions.

property offsetval: float | int#

Get the offset value.

property pydatetime#

Get Python datetime objects from the time dimension.

property right: float#

Get the array bounding box right coordinate.

property row_chunks: int#

Get the row chunk size.

property scaleval: float | int#

Get the scale factor value.

property sensor_names#

Get sensor full names.

property time_chunks: int#

Get the time chunk size.

property top: float#

Get the array bounding box top coordinate.

property transform: Tuple[float, float, float, float, float, float]#

Get the data transform (cell x, 0, left, 0, cell y, top)

property unary_union#

Get a representation of the union of the image bounds.

property wavelengths#

Get a dictionary of sensor wavelengths.
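Example

A minimal sketch of accessing a few of these properties through the .gw accessor (the file name is hypothetical):

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as src:
>>>     print(src.gw.nrows, src.gw.ncols, src.gw.nbands)
>>>     print(src.gw.bounds)
>>>     print(src.gw.cellx, src.gw.celly)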

class geowombat.core.properties.Metadata(left, bottom, right, top, bounds, affine, geometry)[source]#

Bases: object

Parameters:
  • left (float) –

  • bottom (float) –

  • right (float) –

  • top (float) –

  • bounds (Tuple[float, float, float, float]) –

  • affine (Affine) –

  • geometry (box) –

affine: Affine#
bottom: float#
bounds: Tuple[float, float, float, float]#
geometry: box#
left: float#
right: float#
top: float#
class geowombat.core.properties.WavelengthsBGR(blue, green, red)#

Bases: tuple

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

blue#

Alias for field number 0

green#

Alias for field number 1

red#

Alias for field number 2
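Example

A minimal sketch (the wavelength values are illustrative):

>>> from geowombat.core.properties import WavelengthsBGR
>>>
>>> # Fields follow the constructor order: blue=0, green=1, red=2
>>> bgr = WavelengthsBGR(blue=0.48, green=0.56, red=0.65)
>>> red_center = bgr.red
>>> red_position = bgr.index(0.65)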

class geowombat.core.properties.WavelengthsBGRN(blue, green, red, nir)#

Bases: tuple

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

blue#

Alias for field number 0

green#

Alias for field number 1

nir#

Alias for field number 3

red#

Alias for field number 2

class geowombat.core.properties.WavelengthsL57(blue, green, red, nir, swir1, swir2)#

Bases: tuple

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

blue#

Alias for field number 0

green#

Alias for field number 1

nir#

Alias for field number 3

red#

Alias for field number 2

swir1#

Alias for field number 4

swir2#

Alias for field number 5

class geowombat.core.properties.WavelengthsL57Pan(blue, green, red, nir, swir1, swir2, pan)#

Bases: tuple

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

blue#

Alias for field number 0

green#

Alias for field number 1

nir#

Alias for field number 3

pan#

Alias for field number 6

red#

Alias for field number 2

swir1#

Alias for field number 4

swir2#

Alias for field number 5

class geowombat.core.properties.WavelengthsL57Thermal(blue, green, red, nir, swir1, thermal, swir2)#

Bases: tuple

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

blue#

Alias for field number 0

green#

Alias for field number 1

nir#

Alias for field number 3

red#

Alias for field number 2

swir1#

Alias for field number 4

swir2#

Alias for field number 6

thermal#

Alias for field number 5

class geowombat.core.properties.WavelengthsL8(coastal, blue, green, red, nir, swir1, swir2, cirrus)#

Bases: tuple

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

blue#

Alias for field number 1

cirrus#

Alias for field number 7

coastal#

Alias for field number 0

green#

Alias for field number 2

nir#

Alias for field number 4

red#

Alias for field number 3

swir1#

Alias for field number 5

swir2#

Alias for field number 6

class geowombat.core.properties.WavelengthsL8Thermal(coastal, blue, green, red, nir, swir1, swir2, cirrus, tirs1, tirs2)#

Bases: tuple

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

blue#

Alias for field number 1

cirrus#

Alias for field number 7

coastal#

Alias for field number 0

green#

Alias for field number 2

nir#

Alias for field number 4

red#

Alias for field number 3

swir1#

Alias for field number 5

swir2#

Alias for field number 6

tirs1#

Alias for field number 8

tirs2#

Alias for field number 9

class geowombat.core.properties.WavelengthsL9(coastal, blue, green, red, nir, swir1, swir2, cirrus)#

Bases: tuple

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

blue#

Alias for field number 1

cirrus#

Alias for field number 7

coastal#

Alias for field number 0

green#

Alias for field number 2

nir#

Alias for field number 4

red#

Alias for field number 3

swir1#

Alias for field number 5

swir2#

Alias for field number 6

class geowombat.core.properties.WavelengthsL9Thermal(coastal, blue, green, red, nir, swir1, swir2, cirrus, tirs1, tirs2)#

Bases: tuple

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

blue#

Alias for field number 1

cirrus#

Alias for field number 7

coastal#

Alias for field number 0

green#

Alias for field number 2

nir#

Alias for field number 4

red#

Alias for field number 3

swir1#

Alias for field number 5

swir2#

Alias for field number 6

tirs1#

Alias for field number 8

tirs2#

Alias for field number 9

class geowombat.core.properties.WavelengthsMODSR(red, nir, blue, green, nir2, swir1, swir2)#

Bases: tuple

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

blue#

Alias for field number 2

green#

Alias for field number 3

nir#

Alias for field number 1

nir2#

Alias for field number 4

red#

Alias for field number 0

swir1#

Alias for field number 5

swir2#

Alias for field number 6

class geowombat.core.properties.WavelengthsPan(pan)#

Bases: tuple

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

pan#

Alias for field number 0

class geowombat.core.properties.WavelengthsRGB(red, green, blue)#

Bases: tuple

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

blue#

Alias for field number 2

green#

Alias for field number 1

red#

Alias for field number 0

class geowombat.core.properties.WavelengthsRGBN(red, green, blue, nir)#

Bases: tuple

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

blue#

Alias for field number 2

green#

Alias for field number 1

nir#

Alias for field number 3

red#

Alias for field number 0

class geowombat.core.properties.WavelengthsS2(blue, green, red, nir1, nir2, nir3, nir, rededge, swir1, swir2)#

Bases: tuple

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

blue#

Alias for field number 0

green#

Alias for field number 1

nir#

Alias for field number 6

nir1#

Alias for field number 3

nir2#

Alias for field number 4

nir3#

Alias for field number 5

red#

Alias for field number 2

rededge#

Alias for field number 7

swir1#

Alias for field number 8

swir2#

Alias for field number 9

class geowombat.core.properties.WavelengthsS220(nir1, nir2, nir3, rededge, swir1, swir2)#

Bases: tuple

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

nir1#

Alias for field number 0

nir2#

Alias for field number 1

nir3#

Alias for field number 2

rededge#

Alias for field number 3

swir1#

Alias for field number 4

swir2#

Alias for field number 5

class geowombat.core.properties.WavelengthsS2Cloudless(coastal, blue, red, nir1, nir, rededge, water, cirrus, swir1, swir2)#

Bases: tuple

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

blue#

Alias for field number 1

cirrus#

Alias for field number 7

coastal#

Alias for field number 0

nir#

Alias for field number 4

nir1#

Alias for field number 3

red#

Alias for field number 2

rededge#

Alias for field number 5

swir1#

Alias for field number 8

swir2#

Alias for field number 9

water#

Alias for field number 6

geowombat.core.properties.WavelengthsS2Full#

alias of WavelengthsS2

geowombat.core.properties.get_sensor_info(key=None, sensor=None)[source]#

geowombat.core.series module#

class geowombat.core.series.BaseSeries[source]#

Bases: _SeriesProps, _Warp

Attributes:
band_dict
blockxsize
blockysize
count
crs
height
nchunks
nodata
transform
width

Methods

group_dates(data, image_dates, band_names)

Groups data by dates.

ndarray_to_darray

open

warp

group_dates(data, image_dates, band_names)[source]#

Groups data by dates.

Return type:

Tuple[ndarray, List[datetime]]

Parameters:
  • data (ndarray) –

  • image_dates (List[datetime]) –

  • band_names (List[str]) –

static ndarray_to_darray(data, image_dates, band_names, y, x, attrs=None)[source]#
Return type:

DataArray

Parameters:
  • data (ndarray) –

  • image_dates (List[datetime]) –

  • band_names (List[str]) –

  • y (ndarray) –

  • x (ndarray) –

  • attrs (Dict | None) –
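Example

A minimal sketch, assuming an in-memory array shaped (time x bands x rows x columns) and matching coordinates (all values here are illustrative):

>>> import datetime
>>> import numpy as np
>>> from geowombat.core.series import BaseSeries
>>>
>>> data = np.random.random((2, 1, 64, 64))
>>> dates = [datetime.datetime(2021, 1, 1), datetime.datetime(2021, 2, 1)]
>>> y = np.linspace(100.0, 36.0, 64)
>>> x = np.linspace(0.0, 64.0, 64)
>>>
>>> da = BaseSeries.ndarray_to_darray(data, dates, ['blue'], y, x)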

open(filenames)[source]#
class geowombat.core.series.SeriesStats(time_stats)[source]#

Bases: TimeModule

Methods

__call__(w, array, band_dict)

abs_slope_q1(data)

Calculates the absolute slope of the first quarter.

abs_slope_q2(data)

Calculates the absolute slope of the second quarter.

abs_slope_q3(data)

Calculates the absolute slope of the third quarter.

abs_slope_q4(data)

Calculates the absolute slope of the fourth quarter.

amp(array)

Calculates the amplitude.

calculate(array)

cv(array)

Calculates the coefficient of variation.

max(array)

Calculates the max.

mean(array)

Calculates the mean.

mean_abs_diff(array)

Calculates the mean absolute difference.

median(array)

Calculates the median.

min(array)

Calculates the min.

norm_abs_energy(array)

Calculates the normalized absolute energy.

percentile(array, p)

Calculates the nth percentile.

abs_slope_q1(data)[source]#

Calculates the absolute slope of the first quarter.

abs_slope_q2(data)[source]#

Calculates the absolute slope of the second quarter.

abs_slope_q3(data)[source]#

Calculates the absolute slope of the third quarter.

abs_slope_q4(data)[source]#

Calculates the absolute slope of the fourth quarter.

static amp(array)[source]#

Calculates the amplitude.

calculate(array)[source]#
static cv(array)[source]#

Calculates the coefficient of variation.

static max(array)[source]#

Calculates the max.

static mean(array)[source]#

Calculates the mean.

mean_abs_diff(array)[source]#

Calculates the mean absolute difference.

static median(array)[source]#

Calculates the median.

static min(array)[source]#

Calculates the min.

static norm_abs_energy(array)[source]#

Calculates the normalized absolute energy.

static percentile(array, p)[source]#

Calculates the nth percentile.

class geowombat.core.series.TimeModule[source]#

Bases: object

Methods

__call__(w, array, band_dict)

calculate(data)

Calculates the user function.

abstract calculate(data)[source]#

Calculates the user function.

Parameters:
  • data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

  • data (Any) –

Return type:

Any

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)
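Example

A minimal sketch of how a user-defined module might implement calculate() (the class name and the mean reduction are illustrative, not part of the library):

>>> from geowombat.core.series import TimeModule
>>>
>>> class TimeMean(TimeModule):
>>>     def calculate(self, data):
>>>         # Reduce the time dimension; input is (time x bands x rows x columns)
>>>         return data.mean(axis=0)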

class geowombat.core.series.TimeModulePipeline(module_list)[source]#

Bases: object

Methods

__call__(w, array, band_dict)

Parameters:

module_list (List[TimeModule]) –
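Example

A minimal sketch, chaining two user-defined TimeModule subclasses (both classes are illustrative, not part of the library):

>>> from geowombat.core.series import TimeModule, TimeModulePipeline
>>>
>>> class TimeMean(TimeModule):
>>>     def calculate(self, data):
>>>         return data.mean(axis=0)
>>>
>>> class TimeMax(TimeModule):
>>>     def calculate(self, data):
>>>         return data.max(axis=0)
>>>
>>> # Combine the two modules into a single pipeline
>>> pipeline = TimeModulePipeline([TimeMean(), TimeMax()])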

class geowombat.core.series.TransferLib(transfer_lib)[source]#

Bases: object

Device transfers.

Parameters:

transfer_lib (str) –

The device library to transfer to. Choices are [‘jax’, ‘keras’, ‘numpy’, ‘pytorch’, ‘tensorflow’].

‘jax’ -> GPU

‘keras’ -> GPU

‘numpy’ -> CPU

‘pytorch’ -> GPU

‘tensorflow’ -> GPU

Methods

__call__(array)

jax

keras

numpy

pytorch

tensorflow

static jax(array)[source]#
static keras(array)[source]#
static numpy(array)[source]#
static pytorch(array)[source]#
static tensorflow(array)[source]#
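Example

A minimal sketch (the array and the choice of 'numpy' are illustrative):

>>> import numpy as np
>>> from geowombat.core.series import TransferLib
>>>
>>> # Transfer an array to the requested device library ('numpy' keeps it on the CPU)
>>> transfer = TransferLib('numpy')
>>> arr = transfer(np.random.random((2, 3, 64, 64)))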

geowombat.core.sops module#

class geowombat.core.sops.SpatialOperations[source]#

Bases: PropertyMixin

Methods

calc_area(data, values[, op, units, ...])

Calculates the area of data values.

check_sensor(data[, sensor, return_error])

Checks if a sensor name is provided.

check_sensor_band_names(data, sensor, band_names)

Checks if band names can be collected from a sensor's wavelength names.

clip(data, df[, query, mask_data, expand_by])

Clips a DataArray by vector polygon geometry.

clip_by_polygon(data, df[, query, ...])

Clips a DataArray by vector polygon geometry.

coregister(target, reference[, ...])

Co-registers an image, or images, using AROSICS.

extract(data, aoi[, bands, time_names, ...])

Extracts data within an area or points of interest.

mask(data, dataframe[, query, keep])

Masks a DataArray by vector polygon geometry.

recode(data, polygon, to_replace[, num_workers])

Recodes a DataArray with polygon mappings.

replace(data, to_replace)

Replace values given in to_replace with value.

sample(data[, method, band, n, strata, ...])

Generates samples from a raster.

subset(data[, left, top, right, bottom, ...])

Subsets a DataArray.

static calc_area(data, values, op='eq', units='km2', row_chunks=None, col_chunks=None, n_workers=1, n_threads=1, scheduler='threads', n_chunks=100)[source]#

Calculates the area of data values.

Parameters:
  • data (DataArray) – The xarray.DataArray to calculate area.

  • values (list) – A list of values.

  • op (Optional[str]) – The value sign. Choices are [‘gt’, ‘ge’, ‘lt’, ‘le’, ‘eq’].

  • units (Optional[str]) – The units to return. Choices are [‘km2’, ‘ha’].

  • row_chunks (Optional[int]) – The row chunk size to process in parallel.

  • col_chunks (Optional[int]) – The column chunk size to process in parallel.

  • n_workers (Optional[int]) – The number of parallel workers for scheduler.

  • n_threads (Optional[int]) – The number of parallel threads for dask.compute().

  • scheduler (Optional[str]) –

    The parallel task scheduler to use. Choices are [‘processes’, ‘threads’, ‘mpool’].

    mpool: process pool of workers using multiprocessing.Pool

    processes: process pool of workers using concurrent.futures

    threads: thread pool of workers using concurrent.futures

  • n_chunks (Optional[int]) – The chunk size of windows. If not given, equal to n_workers x 50.

Return type:

DataFrame

Returns:

pandas.DataFrame

Example

>>> import geowombat as gw
>>>
>>> # Read a land cover image with 512x512 chunks
>>> with gw.open('land_cover.tif', chunks=512) as src:
>>>
>>>     df = gw.calc_area(
>>>         src,
>>>         [1, 2, 5],        # calculate the area of classes 1, 2, and 5
>>>         units='km2',      # return area in kilometers squared
>>>         n_workers=4,
>>>         row_chunks=1024,  # iterate over larger chunks to use 512 chunks in parallel
>>>         col_chunks=1024
>>>     )
clip(data, df, query=None, mask_data=False, expand_by=0)[source]#

Clips a DataArray by vector polygon geometry.

Deprecated since version 2.1.7: Use geowombat.clip_by_polygon().

Parameters:
  • data (DataArray) – The xarray.DataArray to subset.

  • df (GeoDataFrame or str) – The geopandas.GeoDataFrame or filename to clip to.

  • query (Optional[str]) – A query to apply to df.

  • mask_data (Optional[bool]) – Whether to mask values outside of the df geometry envelope.

  • expand_by (Optional[int]) – Expand the clip array bounds by expand_by pixels on each side.

Return type:

DataArray

Returns:

xarray.DataArray

Examples

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as ds:
>>>     ds = gw.clip(ds, df, query="Id == 1")
>>>
>>> # or
>>>
>>> with gw.open('image.tif') as ds:
>>>     ds = ds.gw.clip(df, query="Id == 1")
clip_by_polygon(data, df, query=None, mask_data=False, expand_by=0)[source]#

Clips a DataArray by vector polygon geometry.

Parameters:
  • data (DataArray) – The xarray.DataArray to subset.

  • df (GeoDataFrame or str) – The geopandas.GeoDataFrame or filename to clip to.

  • query (Optional[str]) – A query to apply to df.

  • mask_data (Optional[bool]) – Whether to mask values outside of the df geometry envelope.

  • expand_by (Optional[int]) – Expand the clip array bounds by expand_by pixels on each side.

Return type:

DataArray

Returns:

xarray.DataArray

Examples

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as ds:
>>>     ds = gw.clip_by_polygon(ds, df, query="Id == 1")
static coregister(target, reference, band_names_reference=None, band_names_target=None, wkt_version=None, **kwargs)[source]#

Co-registers an image, or images, using AROSICS.

While the required inputs are DataArrays, the intermediate results are stored as NumPy arrays. Therefore, memory usage is constrained to the size of the input data. Dask is not used for any of the computation in this function.

Parameters:
  • target (DataArray, str, or Path) – The target xarray.DataArray or file name to co-register to reference.

  • reference (DataArray, str, or Path) – The reference xarray.DataArray or file name used to co-register target.

  • band_names_reference (Optional[list | tuple]) – Band names to open for the reference data.

  • band_names_target (Optional[list | tuple]) – Band names to open for the target data.

  • wkt_version (Optional[str]) – The WKT version to use with to_wkt().

  • kwargs (Optional[dict]) – Keyword arguments passed to arosics.

Return type:

DataArray

Reference:

https://pypi.org/project/arosics

Returns:

xarray.DataArray

Example

>>> import geowombat as gw
>>>
>>> # Co-register a single image to a reference image
>>> with gw.open('target.tif') as tar, gw.open('reference.tif') as ref:
>>>     results = gw.coregister(
>>>         tar, ref, q=True, ws=(512, 512), max_shift=3, CPUs=4
>>>     )
>>>
>>> # or
>>>
>>> results = gw.coregister(
>>>     'target.tif',
>>>     'reference.tif',
>>>     q=True,
>>>     ws=(512, 512),
>>>     max_shift=3,
>>>     CPUs=4
>>> )
extract(data, aoi, bands=None, time_names=None, band_names=None, frac=1.0, min_frac_area=None, all_touched=False, id_column='id', time_format='%Y%m%d', mask=None, n_jobs=8, verbose=0, n_workers=1, n_threads=-1, use_client=False, address=None, total_memory=24, processes=False, pool_kwargs=None, **kwargs)[source]#

Extracts data within an area or points of interest. Projections do not need to match, as they are handled ‘on-the-fly’.

Parameters:
  • data (DataArray) – The xarray.DataArray to extract data from.

  • aoi (str or GeoDataFrame) – A file or geopandas.GeoDataFrame to extract data from.

  • bands (Optional[int or 1d array-like]) – A band or list of bands to extract. If not given, all bands are used. Bands should be GDAL-indexed (i.e., the first band is 1, not 0).

  • band_names (Optional[list]) – A list of band names. Length should be the same as bands.

  • time_names (Optional[list]) – A list of time names.

  • frac (Optional[float]) – A fractional subset of points to extract in each polygon feature.

  • min_frac_area (Optional[int | float]) – A minimum polygon area to use frac. Otherwise, use all samples within a polygon.

  • all_touched (Optional[bool]) – The all_touched argument is passed to rasterio.features.rasterize().

  • id_column (Optional[str]) – The id column name.

  • time_format (Optional[str]) – The datetime conversion format if time_names are datetime objects.

  • mask (Optional[GeoDataFrame or Shapely Polygon]) – A shapely.geometry.Polygon mask to subset to.

  • n_jobs (Optional[int]) – The number of features to rasterize in parallel.

  • verbose (Optional[int]) – The verbosity level.

  • n_workers (Optional[int]) – The number of process workers. Only applies when use_client = True.

  • n_threads (Optional[int]) – The number of thread workers. Only applies when use_client = True.

  • use_client (Optional[bool]) – Whether to use a dask client.

  • address (Optional[str]) – A cluster address to pass to client. Only used when use_client = True.

  • total_memory (Optional[int]) – The total memory (in GB) required when use_client = True.

  • processes (Optional[bool]) – Whether to use process workers with the dask.distributed client. Only applies when use_client = True.

  • pool_kwargs (Optional[dict]) – Keyword arguments passed to multiprocessing.Pool().imap().

  • kwargs (Optional[dict]) – Keyword arguments passed to dask.compute().

Return type:

GeoDataFrame

Returns:

geopandas.GeoDataFrame

Examples

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as src:
>>>     df = gw.extract(src, 'poly.gpkg')
>>>
>>> # On a cluster
>>> # Use a local cluster
>>> with gw.open('image.tif') as src:
>>>     df = gw.extract(src, 'poly.gpkg', use_client=True, n_threads=16)
>>>
>>> # Specify the client address with a local cluster
>>> from dask.distributed import LocalCluster
>>>
>>> with LocalCluster(
>>>     n_workers=1,
>>>     threads_per_worker=8,
>>>     scheduler_port=0,
>>>     processes=False,
>>>     memory_limit='4GB'
>>> ) as cluster:
>>>
>>>     with gw.open('image.tif') as src:
>>>         df = gw.extract(
>>>             src,
>>>             'poly.gpkg',
>>>             use_client=True,
>>>             address=cluster
>>>         )
static mask(data, dataframe, query=None, keep='in')[source]#

Masks a DataArray by vector polygon geometry.

Parameters:
  • data (DataArray) – The xarray.DataArray to mask.

  • dataframe (GeoDataFrame or str) – The geopandas.GeoDataFrame or filename to use for masking.

  • query (Optional[str]) – A query to apply to dataframe.

  • keep (Optional[str]) – If keep = ‘in’, mask values outside of the geometry (keep inside). Otherwise, if keep = ‘out’, mask values inside (keep outside).

Return type:

DataArray

Returns:

xarray.DataArray

Examples

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as ds:
>>>     ds = ds.gw.mask(df)
recode(data, polygon, to_replace, num_workers=1)[source]#

Recodes a DataArray with polygon mappings.

Parameters:
  • data (DataArray) – The xarray.DataArray to recode.

  • polygon (GeoDataFrame | str) – The geopandas.GeoDataFrame or file with polygon geometry.

  • to_replace (dict) –

    How to find the values to replace. Dictionary mappings should be given as {from: to} pairs. If to_replace is an integer/string mapping, the to string should be ‘mode’.

    {1: 5}:

    recode values of 1 to 5

    {1: ‘mode’}:

    recode values of 1 to the polygon mode

  • num_workers (Optional[int]) – The number of parallel Dask workers (only used if to_replace has a ‘mode’ mapping).

Return type:

DataArray

Returns:

xarray.DataArray

Example

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif', chunks=512) as ds:
>>>     # Recode 1 with 5 within a polygon
>>>     res = gw.recode(ds, 'poly.gpkg', {1: 5})
static replace(data, to_replace)[source]#

Replace values given in to_replace with value.

Parameters:
  • data (DataArray) – The xarray.DataArray to recode.

  • to_replace (dict) –

    How to find the values to replace. Dictionary mappings should be given as {from: to} pairs. If to_replace is an integer/string mapping, the to string should be ‘mode’.

    {1: 5}:

    recode values of 1 to 5

    {1: ‘mode’}:

    recode values of 1 to the polygon mode

Return type:

DataArray

Returns:

xarray.DataArray

Example

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif', chunks=512) as ds:
>>>     # Replace 1 with 5
>>>     res = gw.replace(ds, {1: 5})
sample(data, method='random', band=None, n=None, strata=None, spacing=None, min_dist=None, max_attempts=10, num_workers=1, verbose=1, **kwargs)[source]#

Generates samples from a raster.

Parameters:
  • data (DataArray) – The xarray.DataArray to extract data from.

  • method (Optional[str]) – The sampling method. Choices are [‘random’, ‘systematic’].

  • band (Optional[int or str]) – The band name to extract from. Only required if method = ‘random’ and strata is given.

  • n (Optional[int]) – The total number of samples. Only required if method = ‘random’.

  • strata (Optional[dict]) –

    The strata to sample within. The dictionary key -> value pairs should be given as {‘conditional,value’: sample size}. E.g.,

    strata = {‘==,1’: 0.5, ‘>=,2’: 0.5} would sample 50% of the total samples within class 1 and 50% of the total samples within classes >= 2.

    strata = {‘==,1’: 10, ‘>=,2’: 20} would sample 10 samples within class 1 and 20 samples within classes >= 2.

  • spacing (Optional[float]) – The spacing (in map projection units) when method = ‘systematic’.

  • min_dist (Optional[float or int]) – A minimum distance allowed between samples. Only applies when method = ‘random’.

  • max_attempts (Optional[int]) – The maximum number of attempts to sample points > min_dist from each other.

  • num_workers (Optional[int]) – The number of parallel workers for dask.compute().

  • verbose (Optional[int]) – The verbosity level.

  • kwargs (Optional[dict]) – Keyword arguments passed to geowombat.extract.

Return type:

GeoDataFrame

Returns:

geopandas.GeoDataFrame

Examples

>>> import geowombat as gw
>>>
>>> # Sample 100 points randomly across the image
>>> with gw.open('image.tif') as src:
>>>     df = gw.sample(src, n=100)
>>>
>>> # Sample points systematically (with 10km spacing) across the image
>>> with gw.open('image.tif') as src:
>>>     df = gw.sample(src, method='systematic', spacing=10000.0)
>>>
>>> # Sample 50% of 100 in class 1 and 50% in classes >= 2
>>> strata = {'==,1': 0.5, '>=,2': 0.5}
>>> with gw.open('image.tif') as src:
>>>     df = gw.sample(src, band=1, n=100, strata=strata)
>>>
>>> # Specify a per-stratum minimum allowed point distance of 1,000 meters
>>> with gw.open('image.tif') as src:
>>>     df = gw.sample(src, band=1, n=100, min_dist=1000, strata=strata)
static subset(data, left=None, top=None, right=None, bottom=None, rows=None, cols=None, center=False, mask_corners=False)[source]#

Subsets a DataArray.

Parameters:
  • data (DataArray) – The xarray.DataArray to subset.

  • left (Optional[float]) – The left coordinate.

  • top (Optional[float]) – The top coordinate.

  • right (Optional[float]) – The right coordinate.

  • bottom (Optional[float]) – The bottom coordinate.

  • rows (Optional[int]) – The number of output rows.

  • cols (Optional[int]) – The number of output columns.

  • center (Optional[bool]) – Whether to center the subset on left and top.

  • mask_corners (Optional[bool]) – Whether to mask corners (requires pymorph).

Return type:

DataArray

Returns:

xarray.DataArray

Example

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif', chunks=512) as ds:
>>>     ds_sub = gw.subset(
>>>         ds,
>>>         left=-263529.884,
>>>         top=953985.314,
>>>         rows=2048,
>>>         cols=2048
>>>     )

geowombat.core.stac module#

class geowombat.core.stac.STACCollections(value)[source]#

Bases: Enum

An enumeration.

cop_dem_glo_30 = 'cop_dem_glo_30'#
io_lulc = 'io_lulc'#
landsat_c2_l1 = 'landsat_c2_l1'#
landsat_c2_l2 = 'landsat_c2_l2'#
landsat_l8_c2_l2 = 'landsat_l8_c2_l2'#
sentinel_3_lst = 'sentinel_3_lst'#
sentinel_s1_l1c = 'sentinel_s1_l1c'#
sentinel_s2_l1c = 'sentinel_s2_l1c'#
sentinel_s2_l2a = 'sentinel_s2_l2a'#
sentinel_s2_l2a_cogs = 'sentinel_s2_l2a_cogs'#
usda_cdl = 'usda_cdl'#
class geowombat.core.stac.STACNames(value)[source]#

Bases: Enum

STAC names.

element84_v0 = 'element84_v0'#
element84_v1 = 'element84_v1'#
microsoft_v1 = 'microsoft_v1'#
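
Example

These enumerations mirror the plain strings accepted by open_stac() below; a minimal, illustrative check:

>>> from geowombat.core.stac import STACCollections, STACNames
>>>
>>> STACNames.microsoft_v1.value
'microsoft_v1'
>>> STACCollections.landsat_c2_l2.value
'landsat_c2_l2'
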
geowombat.core.stac.merge_stac(data, *other)[source]#

Merges DataArrays by time.

Parameters:
  • data (DataArray) – The DataArray to merge to.

  • other (list of DataArrays) – The DataArrays to merge.

Return type:

DataArray

Returns:

xarray.DataArray

geowombat.core.stac.open_stac(stac_catalog='microsoft_v1', collection=None, bounds=None, proj_bounds=None, start_date=None, end_date=None, cloud_cover_perc=None, bands=None, chunksize=256, mask_items=None, bounds_query=None, mask_data=False, epsg=None, resolution=None, resampling=Resampling.nearest, nodata_fill=None, view_asset_keys=False, extra_assets=None, out_path='.', max_items=100, max_extra_workers=1)[source]#

Opens a collection from a spatio-temporal asset catalog (STAC).

Parameters:
  • stac_catalog (str) – Choices are [‘element84_v0’, ‘element84_v1’, ‘google’, ‘microsoft_v1’].

  • collection (str) –

    The STAC collection to open. Catalog options:

    element84_v0: sentinel_s2_l2a_cogs

    element84_v1: cop_dem_glo_30, landsat_c2_l2, sentinel_s2_l2a, sentinel_s2_l1c, sentinel_s1_l1c

    microsoft_v1: cop_dem_glo_30, landsat_c2_l1, landsat_c2_l2, landsat_l8_c2_l2, sentinel_s2_l2a, sentinel_s1_l1c, sentinel_3_lst, io_lulc, usda_cdl

  • bounds (sequence | str | Path | GeoDataFrame) – The search bounding box. This can also be given with the configuration manager (e.g., gw.config.update(ref_bounds=bounds)). The bounds CRS must be given in WGS/84 lat/lon (i.e., EPSG=4326).

  • proj_bounds (sequence) – The projected bounds to return data. If None (default), the returned bounds are the union of all collection scenes. See bounds in gjoseph92/stackstac for details.

  • start_date (str) – The start search date (yyyy-mm-dd).

  • end_date (str) – The end search date (yyyy-mm-dd).

  • cloud_cover_perc (float | int) – The maximum percentage cloud cover.

  • bands (sequence) – The bands to open.

  • chunksize (int) – The dask chunk size.

  • mask_items (sequence) – The items to mask.

  • bounds_query (Optional[str]) – A query to select bounds from the geopandas.GeoDataFrame.

  • mask_data (Optional[bool]) – Whether to mask the data. Only relevant if mask_items is given.

  • epsg (Optional[int]) – An EPSG code to warp to.

  • resolution (Optional[float | int]) – The cell resolution to resample to.

  • resampling (Optional[rasterio.enums.Resampling]) – The resampling method.

  • nodata_fill (Optional[float | int]) – A fill value to replace ‘no data’ NaNs.

  • view_asset_keys (Optional[bool]) – Whether to view asset ids.

  • extra_assets (Optional[list]) – Extra assets (non-image assets) to download.

  • out_path (Optional[str | Path]) – The output path to save files to.

  • max_items (Optional[int]) – The maximum number of items to return from the search, even if there are more matching results, passed to pystac_client.ItemSearch. See https://pystac-client.readthedocs.io/en/latest/api.html#pystac_client.ItemSearch for details.

  • max_extra_workers (Optional[int]) – The maximum number of extra assets to download concurrently.

Return type:

DataArray

Returns:

xarray.DataArray

Examples

>>> from geowombat.core.stac import open_stac, merge_stac
>>>
>>> data_l, df_l = open_stac(
>>>     stac_catalog='microsoft_v1',
>>>     collection='landsat_c2_l2',
>>>     start_date='2020-01-01',
>>>     end_date='2021-01-01',
>>>     bounds='map.geojson',
>>>     bands=['red', 'green', 'blue', 'qa_pixel'],
>>>     mask_data=True,
>>>     extra_assets=['ang', 'mtl.txt', 'mtl.xml']
>>> )
>>>
>>> from rasterio.enums import Resampling
>>>
>>> data_s2, df_s2 = open_stac(
>>>     stac_catalog='element84_v1',
>>>     collection='sentinel_s2_l2a',
>>>     start_date='2020-01-01',
>>>     end_date='2021-01-01',
>>>     bounds='map.geojson',
>>>     bands=['blue', 'green', 'red'],
>>>     resampling=Resampling.cubic,
>>>     epsg=int(data_l.epsg.values),
>>>     extra_assets=['granule_metadata']
>>> )
>>>
>>> # Merge two temporal stacks
>>> stack = (
>>>     merge_stac(data_l, data_s2)
>>>     .sel(band='red')
>>>     .mean(dim='time')
>>> )

geowombat.core.util module#

class geowombat.core.util.Chunks[source]#

Bases: object

Methods

check_chunksize

check_chunktype

get_chunk_dim

static check_chunksize(chunksize, output='3d')[source]#
Return type:

tuple

Parameters:
  • chunksize (int) –

  • output (str) –

check_chunktype(chunksize, output='3d')[source]#
Parameters:
  • chunksize (int) –

  • output (str) –

static get_chunk_dim(chunksize)[source]#
class geowombat.core.util.MapProcesses[source]#

Bases: object

Methods

moving(data[, stat, perc, w, nodata, weights])

Applies a moving window function over Dask array blocks.

static moving(data, stat='mean', perc=50, w=3, nodata=None, weights=False)[source]#

Applies a moving window function over Dask array blocks.

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • stat (Optional[str]) – The statistic to compute. Choices are [‘mean’, ‘std’, ‘var’, ‘min’, ‘max’, ‘perc’].

  • perc (Optional[int]) – The percentile to return if stat = ‘perc’.

  • w (Optional[int]) – The moving window size (in pixels).

  • nodata (Optional[int or float]) – A ‘no data’ value to ignore.

  • weights (Optional[bool]) – Whether to weight values by distance from window center.

Return type:

DataArray

Returns:

xarray.DataArray

Examples

>>> import geowombat as gw
>>>
>>> # Calculate the mean within a 5x5 window
>>> with gw.open('image.tif') as src:
>>>     res = gw.moving(src, stat='mean', w=5, nodata=32767.0)
>>>
>>> # Calculate the 90th percentile within a 15x15 window
>>> with gw.open('image.tif') as src:
>>>     res = gw.moving(src, stat='perc', w=15, perc=90, nodata=32767.0)
>>>     res.data.compute(num_workers=4)
geowombat.core.util.estimate_array_mem(ntime, nbands, nrows, ncols, dtype)[source]#

Estimates the size of an array in-memory.

Parameters:
  • ntime (int) – The number of time dimensions.

  • nbands (int) – The number of band dimensions.

  • nrows (int) – The number of row dimensions.

  • ncols (int) – The number of column dimensions.

  • dtype (str) – The data type.

Returns:

int in MB
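
Example

The docstring does not state the rounding convention, so the sketch below only illustrates the kind of arithmetic such an estimate performs (assuming MB = bytes / 1e6); call estimate_array_mem() itself for the library's own value:

>>> import numpy as np
>>> from geowombat.core.util import estimate_array_mem
>>>
>>> ntime, nbands, nrows, ncols, dtype = 1, 3, 10000, 10000, 'float64'
>>> mem_mb = estimate_array_mem(ntime, nbands, nrows, ncols, dtype)
>>>
>>> # Back-of-the-envelope check of the same quantity
>>> ntime * nbands * nrows * ncols * np.dtype(dtype).itemsize / 1e6  # ~2400 MB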

geowombat.core.util.get_file_extension(filename)[source]#

Gets file and directory name information.

Parameters:

filename (str) – The file name.

Return type:

namedtuple

Returns:

Name information as namedtuple.

geowombat.core.util.get_geometry_info(geometry, res)[source]#

Gets information from a Shapely geometry object.

Parameters:
  • geometry (object) – A shapely.geometry object.

  • res (tuple) – The cell resolution for the affine transform.

Return type:

namedtuple

Returns:

Geometry information as namedtuple.

geowombat.core.util.lazy_wombat(func)[source]#
geowombat.core.util.n_rows_cols(pixel_index, block_size, rows_cols)[source]#

Adjusts block size for the end of image rows and columns.

Parameters:
  • pixel_index (int) – The current pixel row or column index.

  • block_size (int) – The image block size.

  • rows_cols (int) – The total number of rows or columns in the image.

Return type:

int

Returns:

Adjusted block size as int.
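
Example

A hypothetical call near the bottom edge of an image, assuming the adjusted size is the number of rows remaining past the block's starting index:

>>> from geowombat.core.util import n_rows_cols
>>>
>>> # A 512-row block starting at row 960 of a 1,024-row image
>>> # would be trimmed to the 64 remaining rows
>>> block_size = n_rows_cols(960, 512, 1024)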

geowombat.core.util.parse_filename_dates(filenames)[source]#

Parses dates from file names.

Parameters:

filenames (list) – A list of files to parse.

Return type:

Sequence

Returns:

list

geowombat.core.util.parse_wildcard(string)[source]#

Parses a search wildcard from a string.

Parameters:

string (str) – The string to parse.

Return type:

Sequence

Returns:

list

geowombat.core.util.project_coords(x, y, src_crs, dst_crs, return_as='1d', **kwargs)[source]#

Projects coordinates to a new CRS.

Parameters:
  • x (1d array-like) – The x coordinates.

  • y (1d array-like) – The y coordinates.

  • src_crs (str, dict, object) – The source CRS.

  • dst_crs (str, dict, object) – The destination CRS.

  • return_as (Optional[str]) – How to return the coordinates. Choices are [‘1d’, ‘2d’].

  • kwargs (Optional[dict]) – Keyword arguments passed to rasterio.warp.reproject.

Returns:

numpy.array, numpy.array or xr.DataArray
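
Example

A hypothetical reprojection of two points from WGS84 lat/lon to a UTM zone (the EPSG codes are illustrative), assuming the default return_as='1d' yields the projected x and y arrays:

>>> from geowombat.core.util import project_coords
>>>
>>> lons, lats = [-55.57, -55.56], [-25.46, -25.45]
>>> x_proj, y_proj = project_coords(lons, lats, 'epsg:4326', 'epsg:32721')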

geowombat.core.util.sample_feature(df_row, id_column, df_columns, crs, res, all_touched, meta, frac, min_frac_area, feature_array=None)[source]#

Samples polygon features.

Parameters:
  • df_row (pandas.Series) –

  • id_column (str) –

  • df_columns (list) –

  • crs (object) –

  • res (tuple) –

  • all_touched (bool) –

  • meta (namedtuple) –

  • frac (float) –

  • min_frac_area (int | float) –

  • feature_array (Optional[ndarray]) –

Returns:

geopandas.GeoDataFrame

geowombat.core.util.sort_images_by_date(image_path, image_wildcard, date_pos=0, date_start=0, date_end=8, split_by='_', date_format='%Y%m%d', file_list=None, prepend_str=None)[source]#

Sorts images by date.

Parameters:
  • image_path (Path) – The image directory.

  • image_wildcard (str) – The image search wildcard.

  • date_pos (int) – The date starting position in the file name.

  • date_start (int) – The date starting position in the split.

  • date_end (int) – The date ending position in the split.

  • split_by (Optional[str]) – How to split the file name.

  • date_format (Optional[str]) – The date format for datetime.datetime.strptime().

  • file_list (Optional[list of Paths]) – A file list of names to sort. Overrides image_path.

  • prepend_str (Optional[str]) – A string to prepend to each filename.

Return type:

OrderedDict

Returns:

collections.OrderedDict

Example

>>> from pathlib import Path
>>> from geowombat.core import sort_images_by_date
>>>
>>> # image example: LC08_L1TP_176038_20190108_20190130_01_T1.tif
>>> image_path = Path('/path/to/images')
>>>
>>> image_dict = sort_images_by_date(image_path, '*.tif', 3, 0, 8)
>>> image_names = list(image_dict.keys())
>>> time_names = list(image_dict.values())

geowombat.core.vi module#

class geowombat.core.vi.BandMath[source]#

Bases: object

Methods

avi_math(data, sensor, wavelengths[, ...])

Advanced vegetation index.

evi2_math(data, sensor, wavelengths[, ...])

Two-band enhanced vegetation index.

evi_math(data, sensor, wavelengths[, ...])

Enhanced vegetation index.

gcvi_math(data, sensor, wavelengths[, ...])

Green chlorophyll vegetation index.

kndvi_math(data, sensor, wavelengths[, ...])

Kernel normalized difference vegetation index.

mask_and_assign(data, result, band_variable, ...)

Masks an xarray.DataArray.

nbr_math(data, sensor, wavelengths[, ...])

Normalized burn ratio.

ndvi_math(data, sensor, wavelengths[, ...])

Normalized difference vegetation index.

norm_diff_math(data, b1, b2, name, sensor[, ...])

Normalized difference index --> (b2 - b1) / (b2 + b1)

wi_math(data, sensor, wavelengths[, nodata, ...])

Woody index.

scale_and_assign

avi_math(data, sensor, wavelengths, nodata=None, mask=False, scale_factor=None)[source]#

Advanced vegetation index.

Returns:

xarray.DataArray

evi2_math(data, sensor, wavelengths, nodata=None, mask=False, scale_factor=None)[source]#

Two-band enhanced vegetation index.

Returns:

xarray.DataArray

evi_math(data, sensor, wavelengths, nodata=None, mask=False, scale_factor=None)[source]#

Enhanced vegetation index.

Returns:

xarray.DataArray

gcvi_math(data, sensor, wavelengths, nodata=None, mask=False, scale_factor=None)[source]#

Green chlorophyll vegetation index.

Returns:

xarray.DataArray

kndvi_math(data, sensor, wavelengths, nodata=None, mask=False, scale_factor=None)[source]#

Kernel normalized difference vegetation index.

Returns:

xarray.DataArray

static mask_and_assign(data, result, band_variable, band_name, nodata, new_name, mask, clip_min, clip_max, scale_factor, sensor, norm=False)[source]#

Masks an xarray.DataArray.

Parameters:
  • data (DataArray or Dataset) –

  • result (DataArray) –

  • band_variable (str) –

  • band_name (str) –

  • nodata (int or float) –

  • new_name (str) –

  • mask (bool) –

  • clip_min (float | int) –

  • clip_max (float | int) –

  • scale_factor (float) –

  • sensor (str) –

  • norm (bool) –

Return type:

DataArray

Returns:

xarray.DataArray

nbr_math(data, sensor, wavelengths, nodata=None, mask=False, scale_factor=None)[source]#

Normalized burn ratio.

Returns:

xarray.DataArray

ndvi_math(data, sensor, wavelengths, nodata=None, mask=False, scale_factor=None)[source]#

Normalized difference vegetation index.

Returns:

xarray.DataArray

norm_diff_math(data, b1, b2, name, sensor, nodata=None, mask=False, scale_factor=None)[source]#

Normalized difference index --> (b2 - b1) / (b2 + b1)

Parameters:
  • data (DataArray) –

  • b1 (DataArray) – Band 1

  • b2 (DataArray) – Band 2

  • name (str) –

  • sensor (str) –

  • nodata (Optional[int]) –

  • mask (Optional[bool]) –

  • scale_factor (Optional[float]) –

Return type:

DataArray

Returns:

xarray.DataArray

static scale_and_assign(data, nodata, band_variable, scale_factor, names, new_names)[source]#
Return type:

DataArray

Parameters:
  • data (DataArray) –

  • nodata (float | int) –

  • band_variable (str) –

  • scale_factor (float) –

  • names (Sequence[Any]) –

  • new_names (Sequence[Any]) –

wi_math(data, sensor, wavelengths, nodata=None, mask=False, scale_factor=None)[source]#

Woody index.

Returns:

xarray.DataArray

class geowombat.core.vi.TasseledCap[source]#

Bases: PropertyMixin, TasseledCapLookup

Methods

check_sensor(data[, sensor, return_error])

Checks if a sensor name is provided.

check_sensor_band_names(data, sensor, band_names)

Checks if band names can be collected from a sensor's wavelength names.

tasseled_cap(data[, nodata, sensor, ...])

Applies a tasseled cap transformation

get_coefficients

tasseled_cap(data, nodata=None, sensor=None, scale_factor=None)[source]#

Applies a tasseled cap transformation

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If None, the ‘no data’ value is taken from the xarray.DataArray attributes.

  • sensor (Optional[str]) – The data’s sensor. If None, the band names should reflect the index being calculated.

  • scale_factor (Optional[float]) – A scale factor to apply to the data. If None, the scale value is taken from the xarray.DataArray attributes.

Return type:

DataArray

Examples

>>> import geowombat as gw
>>>
>>> with gw.config.update(sensor='qb', scale_factor=0.0001):
>>>     with gw.open('image.tif', band_names=['blue', 'green', 'red', 'nir']) as ds:
>>>         tcap = gw.tasseled_cap(ds)
Returns:

xarray.DataArray

References

ASTER:

See [WS05]

CBERS2:

See [SHT11]

IKONOS:

See [Per06]

Landsat ETM+:

See [HWY+02]

Landsat OLI:

See [BZST14]

MODIS:

See [LC07]

Quickbird:

See [YEK05]

RapidEye:

See [ACG+14]

class geowombat.core.vi.TasseledCapLookup[source]#

Bases: object

Methods

get_coefficients

static get_coefficients(wavelengths, sensor)[source]#
class geowombat.core.vi.VegetationIndices[source]#

Bases: PropertyMixin, BandMath

Methods

avi(data[, nodata, mask, sensor, scale_factor])

Calculates the advanced vegetation index

avi_math(data, sensor, wavelengths[, ...])

Advanced vegetation index.

check_sensor(data[, sensor, return_error])

Checks if a sensor name is provided.

check_sensor_band_names(data, sensor, band_names)

Checks if band names can be collected from a sensor's wavelength names.

evi(data[, nodata, mask, sensor, scale_factor])

Calculates the enhanced vegetation index

evi2(data[, nodata, mask, sensor, scale_factor])

Calculates the two-band modified enhanced vegetation index

evi2_math(data, sensor, wavelengths[, ...])

Two-band enhanced vegetation index.

evi_math(data, sensor, wavelengths[, ...])

Enhanced vegetation index.

gcvi(data[, nodata, mask, sensor, scale_factor])

Calculates the green chlorophyll vegetation index

gcvi_math(data, sensor, wavelengths[, ...])

Green chlorophyll vegetation index.

kndvi(data[, nodata, mask, sensor, scale_factor])

Calculates the kernel normalized difference vegetation index

kndvi_math(data, sensor, wavelengths[, ...])

Kernel normalized difference vegetation index.

mask_and_assign(data, result, band_variable, ...)

Masks an xarray.DataArray.

nbr(data[, nodata, mask, sensor, scale_factor])

Calculates the normalized burn ratio

nbr_math(data, sensor, wavelengths[, ...])

Normalized burn ratio.

ndvi(data[, nodata, mask, sensor, scale_factor])

Calculates the normalized difference vegetation index

ndvi_math(data, sensor, wavelengths[, ...])

Normalized difference vegetation index.

norm_diff(data, b1, b2[, sensor, nodata, ...])

Calculates the normalized difference band ratio

norm_diff_math(data, b1, b2, name, sensor[, ...])

Normalized difference index --> (b2 - b1) / (b2 + b1)

wi(data[, nodata, mask, sensor, scale_factor])

Calculates the woody vegetation index

wi_math(data, sensor, wavelengths[, nodata, ...])

Woody index.

scale_and_assign

avi(data, nodata=None, mask=False, sensor=None, scale_factor=None)[source]#

Calculates the advanced vegetation index

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If None, the ‘no data’ value is taken from the xarray.DataArray attributes.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor. If None, the band names should reflect the index being calculated.

  • scale_factor (Optional[float]) – A scale factor to apply to the data. If None, the scale value is taken from the xarray.DataArray attributes.

Equation:

\[AVI = {(NIR \times (1.0 - red) \times (NIR - red))}^{0.3334}\]
Returns:

Data range: 0 to 1

Return type:

xarray.DataArray

evi(data, nodata=None, mask=False, sensor=None, scale_factor=None)[source]#

Calculates the enhanced vegetation index

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If None, the ‘no data’ value is taken from the xarray.DataArray attributes.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor. If None, the band names should reflect the index being calculated.

  • scale_factor (Optional[float]) – A scale factor to apply to the data. If None, the scale value is taken from the xarray.DataArray attributes.

Equation:

\[EVI = 2.5 \times \frac{NIR - red}{NIR + 6 \times red - 7.5 \times blue + 1}\]
Returns:

Data range: 0 to 1

Return type:

xarray.DataArray

evi2(data, nodata=None, mask=False, sensor=None, scale_factor=None)[source]#

Calculates the two-band modified enhanced vegetation index

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If None, the ‘no data’ value is taken from the xarray.DataArray attributes.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor. If None, the band names should reflect the index being calculated.

  • scale_factor (Optional[float]) – A scale factor to apply to the data. If None, the scale value is taken from the xarray.DataArray attributes.

Equation:

\[EVI2 = 2.5 \times \frac{NIR - red}{NIR + 1 + 2.4 \times red}\]
Reference:

See [JHDM08]

Returns:

Data range: 0 to 1

Return type:

xarray.DataArray

gcvi(data, nodata=None, mask=False, sensor=None, scale_factor=None)[source]#

Calculates the green chlorophyll vegetation index

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If None, the ‘no data’ value is taken from the xarray.DataArray attributes.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor. If None, the band names should reflect the index being calculated.

  • scale_factor (Optional[float]) – A scale factor to apply to the data. If None, the scale value is taken from the xarray.DataArray attributes.

Equation:

\[GCVI = \frac{NIR}{green} - 1\]
Returns:

Data range: 0 to 1

Return type:

xarray.DataArray

kndvi(data, nodata=None, mask=False, sensor=None, scale_factor=None)[source]#

Calculates the kernel normalized difference vegetation index

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If None, the ‘no data’ value is taken from the xarray.DataArray attributes.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor. If None, the band names should reflect the index being calculated.

  • scale_factor (Optional[float]) – A scale factor to apply to the data. If None, the scale value is taken from the xarray.DataArray attributes.

Equation:

\[kNDVI = tanh({NDVI}^2)\]
Reference:

See [CVCTMMartinez+21]

Returns:

Data range: -1 to 1

Return type:

xarray.DataArray

nbr(data, nodata=None, mask=False, sensor=None, scale_factor=None)[source]#

Calculates the normalized burn ratio

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If None, the ‘no data’ value is taken from the xarray.DataArray attributes.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor. If None, the band names should reflect the index being calculated.

  • scale_factor (Optional[float]) – A scale factor to apply to the data. If None, the scale value is taken from the xarray.DataArray attributes.

Equation:

\[NBR = \frac{NIR - SWIR2}{NIR + SWIR2}\]
Returns:

Data range: -1 to 1

Return type:

xarray.DataArray

ndvi(data, nodata=None, mask=False, sensor=None, scale_factor=None)[source]#

Calculates the normalized difference vegetation index

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If None, the ‘no data’ value is taken from the xarray.DataArray attributes.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor. If None, the band names should reflect the index being calculated.

  • scale_factor (Optional[float]) – A scale factor to apply to the data. If None, the scale value is taken from the xarray.DataArray attributes.

Equation:

\[NDVI = \frac{NIR - red}{NIR + red}\]
Returns:

Data range: -1 to 1

Return type:

xarray.DataArray

norm_diff(data, b1, b2, sensor=None, nodata=None, mask=False, scale_factor=None)[source]#

Calculates the normalized difference band ratio

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • b1 (str) – The band name of the first band.

  • b2 (str) – The band name of the second band.

  • sensor (Optional[str]) – The data’s sensor. If None, the band names should reflect the index being calculated.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If None, the ‘no data’ value is taken from the xarray.DataArray attributes.

  • mask (Optional[bool]) – Whether to mask the results.

  • scale_factor (Optional[float]) – A scale factor to apply to the data. If None, the scale value is taken from the xarray.DataArray attributes.

Equation:

\[{norm}_{diff} = \frac{b2 - b1}{b2 + b1}\]
Returns:

Data range: -1 to 1

Return type:

xarray.DataArray

wi(data, nodata=None, mask=False, sensor=None, scale_factor=None)[source]#

Calculates the woody vegetation index

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If None, the ‘no data’ value is taken from the xarray.DataArray attributes.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor. If None, the band names should reflect the index being calculated.

  • scale_factor (Optional[float]) – A scale factor to apply to the data. If None, the scale value is taken from the xarray.DataArray attributes.

Equation:

\[WI = \begin{cases} 0, & \text{if } red + SWIR1 \ge 0.5 \\ 1 - \frac{red + SWIR1}{0.5}, & \text{otherwise} \end{cases}\]
Reference:

See [LWC+13]

Returns:

Data range: 0 to 1

Return type:

xarray.DataArray

geowombat.core.vi.linear_transform(data, bands, scale, offset)[source]#

Linearly scales bands using a scale and an offset

Parameters:
  • data (DataArray) – The xarray.DataArray to transform.

  • bands (1d array-like) – The list of bands to transform.

  • scale (1d array) – The scale coefficients.

  • offset (1d array) – The offset coefficients.

Return type:

DataArray

Equation:

\[y = scale \times band + offset\]
Returns:

xarray.DataArray
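
Example

A hypothetical two-band rescaling; the band names, scale factors, and offsets below are illustrative:

>>> import geowombat as gw
>>> from geowombat.core.vi import linear_transform
>>>
>>> with gw.open('image.tif', band_names=['red', 'nir']) as src:
>>>     res = linear_transform(src, ['red', 'nir'], [0.0001, 0.0001], [0.0, 0.0])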

geowombat.core.windows module#

geowombat.core.windows.get_window_offsets(n_rows, n_cols, row_chunks, col_chunks, return_as='list', padding=None)[source]#

Gets window offset indices from image dimensions and chunk sizes.

Parameters:
  • n_rows (int) – The number of rows to iterate over.

  • n_cols (int) – The number of columns to iterate over.

  • row_chunks (int) – The row chunk size.

  • col_chunks (int) – The column chunk size.

  • return_as (Optional[str]) – How to return the window information. Choices are [‘dict’, ‘list’].

  • padding (Optional[tuple]) – Padding for each window. padding should be given as a tuple of (left pad, bottom pad, right pad, top pad). If padding is given, the returned list will contain a tuple of rasterio.windows.Window objects as (w1, w2), where w1 contains the normal window offsets and w2 contains the padded window offsets.

Return type:

Union[List[Window], List[Tuple[Window, Window]], Dict[str, Window]]

Returns:

Window information as list or dict.
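
Example

A minimal sketch, assuming a 1,024 x 1,024 image split into 512 x 512 chunks (four full windows):

>>> from geowombat.core.windows import get_window_offsets
>>>
>>> # A list of rasterio.windows.Window objects
>>> windows = get_window_offsets(1024, 1024, 512, 512, return_as='list')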

Module contents#

geowombat.core.apply(infile, outfile, block_func, args=None, count=1, scheduler='processes', gdal_cache=512, n_jobs=4, overwrite=False, tags=None, **kwargs)[source]#

Applies a function and writes results to file.

Parameters:
  • infile (str) – The input file to process.

  • outfile (str) – The output file.

  • block_func (func) – The user function to apply to each block. The function should accept the window, the block data, and at least one additional argument, and return the window along with the processed data. The block data inside the function will be a 2d array if the input image has 1 band, otherwise a 3d array.

  • args (Optional[tuple]) – Additional arguments to pass to block_func.

  • count (Optional[int]) – The band count for the output file.

  • scheduler (Optional[str]) – The concurrent.futures scheduler to use. Choices are [‘threads’, ‘processes’].

  • gdal_cache (Optional[int]) – The GDAL cache size (in MB).

  • n_jobs (Optional[int]) – The number of blocks to process in parallel.

  • overwrite (Optional[bool]) – Whether to overwrite an existing output file.

  • tags (Optional[dict]) – Image tags to write to file.

  • kwargs (Optional[dict]) – Additional keyword arguments to pass to rasterio.open.

Returns:

None, writes to outfile

Examples

>>> import geowombat as gw
>>>
>>> # Here is a function that uses no additional arguments
>>> def my_func0(w, block, arg):
>>>     return w, block
>>>
>>> gw.apply('input.tif',
>>>          'output.tif',
>>>           my_func0,
>>>           n_jobs=8)
>>>
>>> # Here is a function with one additional argument
>>> def my_func1(w, block, arg):
>>>     return w, block * arg
>>>
>>> gw.apply('input.tif',
>>>          'output.tif',
>>>           my_func1,
>>>           args=(10.0,),
>>>           n_jobs=8)
geowombat.core.array_to_polygon(data, mask=None, connectivity=4, num_workers=1)#

Converts an xarray.DataArray to a geopandas.GeoDataFrame.

Parameters:
  • data (DataArray) – The xarray.DataArray to convert.

  • mask (Optional[str, numpy ndarray, or rasterio Band object]) – Must evaluate to bool (rasterio.bool_ or rasterio.uint8). Values of False or 0 will be excluded from feature generation. Note well that this is the inverse sense from Numpy’s, where a mask value of True indicates invalid data in an array. If source is a Numpy masked array and mask is None, the source’s mask will be inverted and used in place of mask. If mask is equal to ‘source’, then data is used as the mask.

  • connectivity (Optional[int]) – Use 4 or 8 pixel connectivity for grouping pixels into features.

  • num_workers (Optional[int]) – The number of parallel workers to send to dask.compute().

Return type:

GeoDataFrame

Returns:

geopandas.GeoDataFrame

Example

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as src:
>>>
>>>     # Convert the input image to a GeoDataFrame
>>>     df = gw.array_to_polygon(
>>>         src,
>>>         mask='source',
>>>         num_workers=8
>>>     )
geowombat.core.avi(data, nodata=None, mask=False, sensor=None, scale_factor=None)#

Calculates the advanced vegetation index

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If None, the ‘no data’ value is taken from the xarray.DataArray attributes.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor. If None, the band names should reflect the index being calculated.

  • scale_factor (Optional[float]) – A scale factor to apply to the data. If None, the scale value is taken from the xarray.DataArray attributes.

Equation:

\[AVI = {(NIR \times (1.0 - red) \times (NIR - red))}^{0.3334}\]
Returns:

Data range: 0 to 1

Return type:

xarray.DataArray

geowombat.core.bounds_to_coords(bounds, dst_crs)#

Converts bounds from longitude and latitude to native map coordinates.

Parameters:
  • bounds (tuple | rasterio.coords.BoundingBox) – The lat/lon bounds to transform.

  • dst_crs (str, object, or DataArray) – The CRS to transform to. It can be provided as a string, a CRS instance (e.g., pyproj.crs.CRS), or a geowombat.DataArray.

Returns:

(left, bottom, right, top)

Return type:

tuple
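
Example

A hypothetical transform of lat/lon bounds to a projected CRS (the EPSG code is illustrative):

>>> from geowombat.core import bounds_to_coords
>>>
>>> latlon_bounds = (-55.6, -25.5, -55.5, -25.4)  # (left, bottom, right, top)
>>> left, bottom, right, top = bounds_to_coords(latlon_bounds, 'epsg:32721')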

geowombat.core.calc_area(data, values, op='eq', units='km2', row_chunks=None, col_chunks=None, n_workers=1, n_threads=1, scheduler='threads', n_chunks=100)#

Calculates the area of data values.

Parameters:
  • data (DataArray) – The xarray.DataArray to calculate area from.

  • values (list) – A list of data values to calculate area for.

  • op (Optional[str]) – The value sign. Choices are [‘gt’, ‘ge’, ‘lt’, ‘le’, ‘eq’].

  • units (Optional[str]) – The units to return. Choices are [‘km2’, ‘ha’].

  • row_chunks (Optional[int]) – The row chunk size to process in parallel.

  • col_chunks (Optional[int]) – The column chunk size to process in parallel.

  • n_workers (Optional[int]) – The number of parallel workers for scheduler.

  • n_threads (Optional[int]) – The number of parallel threads for dask.compute().

  • scheduler (Optional[str]) –

    The parallel task scheduler to use. Choices are [‘processes’, ‘threads’, ‘mpool’].

    mpool: process pool of workers using multiprocessing.Pool

    processes: process pool of workers using concurrent.futures

    threads: thread pool of workers using concurrent.futures

  • n_chunks (Optional[int]) – The chunk size of windows. If not given, equal to n_workers x 50.

Return type:

DataFrame

Returns:

pandas.DataFrame

Example

>>> import geowombat as gw
>>>
>>> # Read a land cover image with 512x512 chunks
>>> with gw.open('land_cover.tif', chunks=512) as src:
>>>
>>>     df = gw.calc_area(
>>>         src,
>>>         [1, 2, 5],        # calculate the area of classes 1, 2, and 5
>>>         units='km2',      # return area in kilometers squared
>>>         n_workers=4,
>>>         row_chunks=1024,  # iterate over larger chunks to use 512 chunks in parallel
>>>         col_chunks=1024
>>>     )
geowombat.core.clip(data, df, query=None, mask_data=False, expand_by=0)#

Clips a DataArray by vector polygon geometry.

Deprecated since version 2.1.7: Use geowombat.clip_by_polygon().

Parameters:
  • data (DataArray) – The xarray.DataArray to subset.

  • df (GeoDataFrame or str) – The geopandas.GeoDataFrame or filename to clip to.

  • query (Optional[str]) – A query to apply to df.

  • mask_data (Optional[bool]) – Whether to mask values outside of the df geometry envelope.

  • expand_by (Optional[int]) – Expand the clip array bounds by expand_by pixels on each side.

Return type:

DataArray

Returns:

xarray.DataArray

Examples

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as ds:
>>>     ds = gw.clip(ds, df, query="Id == 1")
>>>
>>> # or
>>>
>>> with gw.open('image.tif') as ds:
>>>     ds = ds.gw.clip(df, query="Id == 1")
geowombat.core.clip_by_polygon(data, df, query=None, mask_data=False, expand_by=0)#

Clips a DataArray by vector polygon geometry.

Parameters:
  • data (DataArray) – The xarray.DataArray to subset.

  • df (GeoDataFrame or str) – The geopandas.GeoDataFrame or filename to clip to.

  • query (Optional[str]) – A query to apply to df.

  • mask_data (Optional[bool]) – Whether to mask values outside of the df geometry envelope.

  • expand_by (Optional[int]) – Expand the clip array bounds by expand_by pixels on each side.

Return type:

DataArray

Returns:

xarray.DataArray

Examples

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as ds:
>>>     ds = gw.clip_by_polygon(ds, df, query="Id == 1")
geowombat.core.coords_to_indices(x, y, transform)#

Converts map coordinates to array indices.

Parameters:
  • x (float or 1d array) – The x coordinates.

  • y (float or 1d array) – The y coordinates.

  • transform (object) – The affine transform.

Returns:

(col_index, row_index)

Return type:

tuple

Example

>>> import geowombat as gw
>>> from geowombat.core import coords_to_indices
>>>
>>> with gw.open('image.tif') as src:
>>>     j, i = coords_to_indices(x, y, src)
geowombat.core.coregister(target, reference, band_names_reference=None, band_names_target=None, wkt_version=None, **kwargs)#

Co-registers an image, or images, using AROSICS.

While the required inputs are DataArrays, the intermediate results are stored as NumPy arrays. Therefore, memory usage is constrained to the size of the input data. Dask is not used for any of the computation in this function.

Parameters:
  • target (DataArray, str, or Path) – The target xarray.DataArray or file name to co-register to reference.

  • reference (DataArray, str, or Path) – The reference xarray.DataArray or file name used to co-register target.

  • band_names_reference (Optional[list | tuple]) – Band names to open for the reference data.

  • band_names_target (Optional[list | tuple]) – Band names to open for the target data.

  • wkt_version (Optional[str]) – The WKT version to use with to_wkt().

  • kwargs (Optional[dict]) – Keyword arguments passed to arosics.

Return type:

DataArray

Reference:

https://pypi.org/project/arosics

Returns:

xarray.DataArray

Example

>>> import geowombat as gw
>>>
>>> # Co-register a single image to a reference image
>>> with gw.open('target.tif') as tar, gw.open('reference.tif') as ref:
>>>     results = gw.coregister(
>>>         tar, ref, q=True, ws=(512, 512), max_shift=3, CPUs=4
>>>     )
>>>
>>> # or
>>>
>>> results = gw.coregister(
>>>     'target.tif',
>>>     'reference.tif',
>>>     q=True,
>>>     ws=(512, 512),
>>>     max_shift=3,
>>>     CPUs=4
>>> )
geowombat.core.dask_to_xarray(data, dask_data, band_names)#

Converts a Dask array to an Xarray DataArray.

Parameters:
  • data (DataArray) – The DataArray with attribute information.

  • dask_data (Dask Array) – The Dask array to convert.

  • band_names (1d array-like) – The output band names.

Return type:

DataArray

Returns:

xarray.DataArray
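
Example

A hypothetical round trip that scales the underlying Dask array and wraps the result back into a DataArray carrying src's attributes:

>>> import geowombat as gw
>>> from geowombat.core import dask_to_xarray
>>>
>>> with gw.open('image.tif', chunks=512) as src:
>>>     # Assumes src has a 'band' coordinate, as with gw.open()
>>>     scaled = dask_to_xarray(src, src.data * 0.0001, list(src.band.values))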

geowombat.core.evi(data, nodata=None, mask=False, sensor=None, scale_factor=None)#

Calculates the enhanced vegetation index

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If None, the ‘no data’ value is taken from the xarray.DataArray attributes.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor. If None, the band names should reflect the index being calculated.

  • scale_factor (Optional[float]) – A scale factor to apply to the data. If None, the scale value is taken from the xarray.DataArray attributes.

Equation:

\[EVI = 2.5 \times \frac{NIR - red}{NIR + 6 \times red - 7.5 \times blue + 1}\]
Returns:

Data range: 0 to 1

Return type:

xarray.DataArray

geowombat.core.evi2(data, nodata=None, mask=False, sensor=None, scale_factor=None)#

Calculates the two-band modified enhanced vegetation index

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If None, the ‘no data’ value is taken from the xarray.DataArray attributes.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor. If None, the band names should reflect the index being calculated.

  • scale_factor (Optional[float]) – A scale factor to apply to the data. If None, the scale value is taken from the xarray.DataArray attributes.

Equation:

\[EVI2 = 2.5 \times \frac{NIR - red}{NIR + 1 + 2.4 \times red}\]
Reference:

See [JHDM08]

Returns:

Data range: 0 to 1

Return type:

xarray.DataArray

geowombat.core.extract(data, aoi, bands=None, time_names=None, band_names=None, frac=1.0, min_frac_area=None, all_touched=False, id_column='id', time_format='%Y%m%d', mask=None, n_jobs=8, verbose=0, n_workers=1, n_threads=-1, use_client=False, address=None, total_memory=24, processes=False, pool_kwargs=None, **kwargs)#

Extracts data within an area or points of interest. Projections do not need to match, as they are handled ‘on-the-fly’.

Parameters:
  • data (DataArray) – The xarray.DataArray to extract data from.

  • aoi (str or GeoDataFrame) – A file or geopandas.GeoDataFrame to extract data from.

  • bands (Optional[int or 1d array-like]) – A band or list of bands to extract. If not given, all bands are used. Bands should be GDAL-indexed (i.e., the first band is 1, not 0).

  • band_names (Optional[list]) – A list of band names. Length should be the same as bands.

  • time_names (Optional[list]) – A list of time names.

  • frac (Optional[float]) – A fractional subset of points to extract in each polygon feature.

  • min_frac_area (Optional[int | float]) – A minimum polygon area to use frac. Otherwise, use all samples within a polygon.

  • all_touched (Optional[bool]) – The all_touched argument is passed to rasterio.features.rasterize().

  • id_column (Optional[str]) – The id column name.

  • time_format (Optional[str]) – The datetime conversion format if time_names are datetime objects.

  • mask (Optional[GeoDataFrame or Shapely Polygon]) – A shapely.geometry.Polygon mask to subset to.

  • n_jobs (Optional[int]) – The number of features to rasterize in parallel.

  • verbose (Optional[int]) – The verbosity level.

  • n_workers (Optional[int]) – The number of process workers. Only applies when use_client = True.

  • n_threads (Optional[int]) – The number of thread workers. Only applies when use_client = True.

  • use_client (Optional[bool]) – Whether to use a dask client.

  • address (Optional[str]) – A cluster address to pass to client. Only used when use_client = True.

  • total_memory (Optional[int]) – The total memory (in GB) required when use_client = True.

  • processes (Optional[bool]) – Whether to use process workers with the dask.distributed client. Only applies when use_client = True.

  • pool_kwargs (Optional[dict]) – Keyword arguments passed to multiprocessing.Pool().imap().

  • kwargs (Optional[dict]) – Keyword arguments passed to dask.compute().

Return type:

GeoDataFrame

Returns:

geopandas.GeoDataFrame

Examples

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as src:
>>>     df = gw.extract(src, 'poly.gpkg')
>>>
>>> # On a cluster
>>> # Use a local cluster
>>> with gw.open('image.tif') as src:
>>>     df = gw.extract(src, 'poly.gpkg', use_client=True, n_threads=16)
>>>
>>> # Specify the client address with a local cluster
>>> from dask.distributed import LocalCluster
>>>
>>> with LocalCluster(
>>>     n_workers=1,
>>>     threads_per_worker=8,
>>>     scheduler_port=0,
>>>     processes=False,
>>>     memory_limit='4GB'
>>> ) as cluster:
>>>
>>>     with gw.open('image.tif') as src:
>>>         df = gw.extract(
>>>             src,
>>>             'poly.gpkg',
>>>             use_client=True,
>>>             address=cluster
>>>         )
geowombat.core.gcvi(data, nodata=None, mask=False, sensor=None, scale_factor=None)#

Calculates the green chlorophyll vegetation index

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If None, the ‘no data’ value is taken from the xarray.DataArray attributes.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor. If None, the band names should reflect the index being calculated.

  • scale_factor (Optional[float]) – A scale factor to apply to the data. If None, the scale value is taken from the xarray.DataArray attributes.

Equation:

\[GCVI = \frac{NIR}{green} - 1\]
Returns:

Data range: 0 to 1

Return type:

xarray.DataArray

geowombat.core.indices_to_coords(col_index, row_index, transform)#

Converts array indices to map coordinates.

Parameters:
  • col_index (float or 1d array) – The column index.

  • row_index (float or 1d array) – The row index.

  • transform (Affine, DataArray, or tuple) – The affine transform.

Returns:

(x, y)

Return type:

tuple

Example

>>> import geowombat as gw
>>> from geowombat.core import indices_to_coords
>>>
>>> with gw.open('image.tif') as src:
>>>     x, y = indices_to_coords(j, i, src)
geowombat.core.kndvi(data, nodata=None, mask=False, sensor=None, scale_factor=None)#

Calculates the kernel normalized difference vegetation index

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If None, the ‘no data’ value is taken from the xarray.DataArray attributes.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor. If None, the band names should reflect the index being calculated.

  • scale_factor (Optional[float]) – A scale factor to apply to the data. If None, the scale value is taken from the xarray.DataArray attributes.

Equation:

\[kNDVI = tanh({NDVI}^2)\]
Reference:

See [CVCTMMartinez+21]

Returns:

Data range: -1 to 1

Return type:

xarray.DataArray

geowombat.core.lonlat_to_xy(lon, lat, dst_crs)#

Converts from longitude and latitude to native map coordinates.

Parameters:
  • lon (float) – The longitude to convert.

  • lat (float) – The latitude to convert.

  • dst_crs (str, object, or DataArray) – The CRS to transform to. It can be provided as a string, a CRS instance (e.g., pyproj.crs.CRS), or a geowombat.DataArray.

Returns:

(x, y)

Return type:

tuple

Example

>>> import geowombat as gw
>>> from geowombat.core import lonlat_to_xy
>>>
>>> lon, lat = -55.56822206, -25.46214220
>>>
>>> with gw.open('image.tif') as src:
>>>     x, y = lonlat_to_xy(lon, lat, src)
geowombat.core.mask(data, dataframe, query=None, keep='in')#

Masks a DataArray by vector polygon geometry.

Parameters:
  • data (DataArray) – The xarray.DataArray to mask.

  • dataframe (GeoDataFrame or str) – The geopandas.GeoDataFrame or filename to use for masking.

  • query (Optional[str]) – A query to apply to dataframe.

  • keep (Optional[str]) – If keep = ‘in’, mask values outside of the geometry (keep inside). Otherwise, if keep = ‘out’, mask values inside (keep outside).

Return type:

DataArray

Returns:

xarray.DataArray

Examples

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif') as ds:
>>>     ds = ds.gw.mask(df)
geowombat.core.moving(data, stat='mean', perc=50, w=3, nodata=None, weights=False)#

Applies a moving window function over Dask array blocks.

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • stat (Optional[str]) – The statistic to compute. Choices are [‘mean’, ‘std’, ‘var’, ‘min’, ‘max’, ‘perc’].

  • perc (Optional[int]) – The percentile to return if stat = ‘perc’.

  • w (Optional[int]) – The moving window size (in pixels).

  • nodata (Optional[int or float]) – A ‘no data’ value to ignore.

  • weights (Optional[bool]) – Whether to weight values by distance from window center.

Return type:

DataArray

Returns:

xarray.DataArray

Examples

>>> import geowombat as gw
>>>
>>> # Calculate the mean within a 5x5 window
>>> with gw.open('image.tif') as src:
>>>     res = gw.moving(src, stat='mean', w=5, nodata=32767.0)
>>>
>>> # Calculate the 90th percentile within a 15x15 window
>>> with gw.open('image.tif') as src:
>>>     res = gw.moving(src, stat='perc', w=15, perc=90, nodata=32767.0)
>>>     res.data.compute(num_workers=4)
geowombat.core.nbr(data, nodata=None, mask=False, sensor=None, scale_factor=None)#

Calculates the normalized burn ratio

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If None, the ‘no data’ value is taken from the xarray.DataArray attributes.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor. If None, the band names should reflect the index being calculated.

  • scale_factor (Optional[float]) – A scale factor to apply to the data. If None, the scale value is taken from the xarray.DataArray attributes.

Equation:

\[NBR = \frac{NIR - SWIR2}{NIR + SWIR2}\]
Returns:

Data range: -1 to 1

Return type:

xarray.DataArray

geowombat.core.ndarray_to_xarray(data, numpy_data, band_names, row_chunks=None, col_chunks=None, y=None, x=None, attrs=None)#

Converts a NumPy array to an Xarray DataArray.

Parameters:
  • data (DataArray) – The DataArray with attribute information.

  • numpy_data (ndarray) – The ndarray to convert.

  • band_names (1d array-like) – The output band names.

  • row_chunks (Optional[int]) – Overrides data.gw.row_chunks.

  • col_chunks (Optional[int]) – Overrides data.gw.col_chunks.

  • y (Optional[1d array]) – Overrides data.y.

  • x (Optional[1d array]) – Overrides data.x.

  • attrs (Optional[dict]) – Overrides data.attrs.

Return type:

DataArray

Returns:

xarray.DataArray
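
Example

A minimal sketch; 'image.tif' is a placeholder, and the NumPy array is assumed to match the (bands x rows x columns) shape of the reference DataArray:

>>> import geowombat as gw
>>> from geowombat.core import ndarray_to_xarray
>>>
>>> with gw.open('image.tif') as src:
>>>     # Bring the data into memory as a NumPy array
>>>     values = src.data.compute(num_workers=4)
>>>     # Wrap the array back into a DataArray, reusing the attributes of ``src``
>>>     result = ndarray_to_xarray(src, values, band_names=src.band.values.tolist())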

geowombat.core.ndvi(data, nodata=None, mask=False, sensor=None, scale_factor=None)#

Calculates the normalized difference vegetation index

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If None, the ‘no data’ value is taken from the xarray.DataArray attributes.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor. If None, the band names should reflect the index being calculated.

  • scale_factor (Optional[float]) – A scale factor to apply to the data. If None, the scale value is taken from the xarray.DataArray attributes.

Equation:

\[NDVI = \frac{NIR - red}{NIR + red}\]
Returns:

Data range: -1 to 1

Return type:

xarray.DataArray
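
Example

A minimal sketch; the file name and sensor are placeholders, and the sensor must provide 'red' and 'nir' bands:

>>> import geowombat as gw
>>>
>>> with gw.config.update(sensor='l7', scale_factor=0.0001):
>>>     with gw.open('image.tif') as src:
>>>         result = gw.ndvi(src)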

geowombat.core.norm_diff(data, b1, b2, sensor=None, nodata=None, mask=False, scale_factor=None)#

Calculates the normalized difference band ratio

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • b1 (str) – The band name of the first band.

  • b2 (str) – The band name of the second band.

  • sensor (Optional[str]) – The data’s sensor. If None, the band names should reflect the index being calculated.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If None, the ‘no data’ value is taken from the xarray.DataArray attributes.

  • mask (Optional[bool]) – Whether to mask the results.

  • scale_factor (Optional[float]) – A scale factor to apply to the data. If None, the scale value is taken from the xarray.DataArray attributes.

Equation:

\[{norm}_{diff} = \frac{b2 - b1}{b2 + b1}\]
Returns:

Data range: -1 to 1

Return type:

xarray.DataArray
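
Example

A minimal sketch; the file name and sensor are placeholders. With b1='red' and b2='nir', the result is equivalent to the NDVI:

>>> import geowombat as gw
>>>
>>> with gw.config.update(sensor='l7', scale_factor=0.0001):
>>>     with gw.open('image.tif') as src:
>>>         result = gw.norm_diff(src, 'red', 'nir')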

geowombat.core.polygon_to_array(polygon, col=None, data=None, cellx=None, celly=None, band_name=None, row_chunks=512, col_chunks=512, src_res=None, fill=0, default_value=1, all_touched=True, dtype='uint8', sindex=None, tap=False, bounds_by='intersection')#

Converts a polygon geometry to an xarray.DataArray.

Parameters:
  • polygon (GeoDataFrame | str) – The geopandas.GeoDataFrame or file with polygon geometry.

  • col (Optional[str]) – The column in polygon you want to assign values from. If not set, creates a binary raster.

  • data (Optional[DataArray]) – An xarray.DataArray to use as a reference for rasterizing.

  • cellx (Optional[float]) – The output cell x size.

  • celly (Optional[float]) – The output cell y size.

  • band_name (Optional[list]) – The xarray.DataArray band name.

  • row_chunks (Optional[int]) – The dask row chunk size.

  • col_chunks (Optional[int]) – The dask column chunk size.

  • src_res (Optional[tuple]) – A source resolution to align to.

  • fill (Optional[int]) – Used as fill value for all areas not covered by input geometries to rasterio.features.rasterize.

  • default_value (Optional[int]) – Used as value for all geometries, if not provided in shapes to rasterio.features.rasterize.

  • all_touched (Optional[bool]) – If True, all pixels touched by geometries will be burned in. If False, only pixels whose center is within the polygon or that are selected by Bresenham’s line algorithm will be burned in. The all_touched value for rasterio.features.rasterize().

  • dtype (Optional[str | numpy data type]) – The output data type for rasterio.features.rasterize().

  • sindex (Optional[object]) – An instance of geopandas.GeoDataFrame.sindex.

  • tap (Optional[bool]) – Whether to target align pixels.

  • bounds_by (Optional[str]) –

    How to concatenate the output extent. Choices are [‘intersection’, ‘union’, ‘reference’].

    • reference: Use the bounds of the reference image

    • intersection: Use the intersection (i.e., minimum extent) of all the image bounds

    • union: Use the union (i.e., maximum extent) of all the image bounds

Return type:

DataArray

Returns:

xarray.DataArray

Example

>>> import geowombat as gw
>>> import geopandas as gpd
>>>
>>> df = gpd.read_file('polygons.gpkg')
>>>
>>> # 100x100 cell size
>>> data = gw.polygon_to_array(df, cellx=100.0, celly=100.0)
>>>
>>> # Align to an existing image
>>> with gw.open('image.tif') as src:
>>>     data = gw.polygon_to_array(df, data=src)
geowombat.core.polygons_to_points(data, df, frac=1.0, min_frac_area=None, all_touched=False, id_column='id', n_jobs=1, **kwargs)#

Converts polygons to points.

Parameters:
  • data (DataArray or Dataset) – The xarray.DataArray or xarray.Dataset.

  • df (GeoDataFrame) – The geopandas.GeoDataFrame containing the geometry to rasterize.

  • frac (Optional[float]) – A fractional subset of points to extract in each feature.

  • min_frac_area (Optional[int | float]) – A minimum polygon area to use frac. Otherwise, use all samples within a polygon.

  • all_touched (Optional[bool]) – The all_touched argument is passed to rasterio.features.rasterize.

  • id_column (Optional[str]) – The ‘id’ column.

  • n_jobs (Optional[int]) – The number of features to rasterize in parallel.

  • kwargs (Optional[dict]) – Keyword arguments passed to multiprocessing.Pool().imap.

Returns:

geopandas.GeoDataFrame
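
Example

A minimal sketch; 'image.tif' and 'polygons.gpkg' are placeholder files:

>>> import geowombat as gw
>>> import geopandas as gpd
>>> from geowombat.core import polygons_to_points
>>>
>>> df = gpd.read_file('polygons.gpkg')
>>>
>>> with gw.open('image.tif') as src:
>>>     # Sample 10% of the pixel centers within each polygon
>>>     points = polygons_to_points(src, df, frac=0.1, n_jobs=4)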

geowombat.core.recode(data, polygon, to_replace, num_workers=1)#

Recodes a DataArray with polygon mappings.

Parameters:
  • data (DataArray) – The xarray.DataArray to recode.

  • polygon (GeoDataFrame | str) – The geopandas.GeoDataFrame or file with polygon geometry.

  • to_replace (dict) –

    How to find the values to replace. Dictionary mappings should be given as {from: to} pairs. If to_replace is an integer/string mapping, the to string should be ‘mode’.

    {1: 5}:

    recode values of 1 to 5

    {1: ‘mode’}:

    recode values of 1 to the polygon mode

  • num_workers (Optional[int]) – The number of parallel Dask workers (only used if to_replace has a ‘mode’ mapping).

Return type:

DataArray

Returns:

xarray.DataArray

Example

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif', chunks=512) as ds:
>>>     # Recode 1 with 5 within a polygon
>>>     res = gw.recode(ds, 'poly.gpkg', {1: 5})
geowombat.core.replace(data, to_replace)#

Replaces values in a DataArray according to the to_replace mappings.

Parameters:
  • data (DataArray) – The xarray.DataArray to recode.

  • to_replace (dict) –

    How to find the values to replace. Dictionary mappings should be given as {from: to} pairs. If to_replace is an integer/string mapping, the to string should be ‘mode’.

    {1: 5}:

    recode values of 1 to 5

    {1: ‘mode’}:

    recode values of 1 to the polygon mode

Return type:

DataArray

Returns:

xarray.DataArray

Example

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif', chunks=512) as ds:
>>>     # Replace 1 with 5
>>>     res = gw.replace(ds, {1: 5})
geowombat.core.sample(data, method='random', band=None, n=None, strata=None, spacing=None, min_dist=None, max_attempts=10, num_workers=1, verbose=1, **kwargs)#

Generates samples from a raster.

Parameters:
  • data (DataArray) – The xarray.DataArray to extract data from.

  • method (Optional[str]) – The sampling method. Choices are [‘random’, ‘systematic’].

  • band (Optional[int or str]) – The band name to extract from. Only required if method = ‘random’ and strata is given.

  • n (Optional[int]) – The total number of samples. Only required if method = ‘random’.

  • strata (Optional[dict]) –

    The strata to sample within. The dictionary key/value pairs should be given as {‘conditional,value’: sample size}.

    E.g.,

    strata = {‘==,1’: 0.5, ‘>=,2’: 0.5} … would sample 50% of total samples within class 1 and 50% of total samples in class >= 2.

    strata = {‘==,1’: 10, ‘>=,2’: 20} … would sample 10 samples within class 1 and 20 samples in class >= 2.

  • spacing (Optional[float]) – The spacing (in map projection units) when method = ‘systematic’.

  • min_dist (Optional[float or int]) – A minimum distance allowed between samples. Only applies when method = ‘random’.

  • max_attempts (Optional[int]) – The maximum number of attempts to sample points > min_dist from each other.

  • num_workers (Optional[int]) – The number of parallel workers for dask.compute().

  • verbose (Optional[int]) – The verbosity level.

  • kwargs (Optional[dict]) – Keyword arguments passed to geowombat.extract.

Return type:

GeoDataFrame

Returns:

geopandas.GeoDataFrame

Examples

>>> import geowombat as gw
>>>
>>> # Sample 100 points randomly across the image
>>> with gw.open('image.tif') as src:
>>>     df = gw.sample(src, n=100)
>>>
>>> # Sample points systematically (with 10km spacing) across the image
>>> with gw.open('image.tif') as src:
>>>     df = gw.sample(src, method='systematic', spacing=10000.0)
>>>
>>> # Sample 50% of 100 in class 1 and 50% in classes >= 2
>>> strata = {'==,1': 0.5, '>=,2': 0.5}
>>> with gw.open('image.tif') as src:
>>>     df = gw.sample(src, band=1, n=100, strata=strata)
>>>
>>> # Specify a per-stratum minimum allowed point distance of 1,000 meters
>>> with gw.open('image.tif') as src:
>>>     df = gw.sample(src, band=1, n=100, min_dist=1000, strata=strata)
geowombat.core.save(data, filename, mode='w', nodata=None, overwrite=False, client=None, compute=True, tags=None, compress='none', compression=None, num_workers=1, log_progress=True, tqdm_kwargs=None, bigtiff=None)[source]#

Saves a DataArray to raster using rasterio/dask.

Parameters:
  • filename (str | Path) – The output file name to write to.

  • overwrite (Optional[bool]) – Whether to overwrite an existing file. Default is False.

  • mode (Optional[str]) – The file storage mode. Choices are [‘w’, ‘r+’].

  • nodata (Optional[float | int]) – The ‘no data’ value. If None (default), the ‘no data’ value is taken from the DataArray metadata.

  • client (Optional[Client object]) – A dask.distributed.Client client object to persist data. Default is None.

  • compute (Optional[bool]) – Whether to compute and write to filename. If False, return the dask task graph instead of writing. Default is True.

  • tags (Optional[dict]) – Metadata tags to write to file. Default is None.

  • compress (Optional[str]) – The file compression type. Default is ‘none’, or no compression.

  • compression (Optional[str]) –

    The file compression type. Default is ‘none’, or no compression.

    Deprecated since version 2.1.4: Use ‘compress’ – ‘compression’ will be removed in >=2.2.0.

  • num_workers (Optional[int]) – The number of dask workers (i.e., chunks) to write concurrently. Default is 1.

  • log_progress (Optional[bool]) – Whether to log the progress bar during writing. Default is True.

  • tqdm_kwargs (Optional[dict]) – Keyword arguments to pass to tqdm.

  • bigtiff (Optional[str]) – A GDAL BIGTIFF flag. Choices are [“YES”, “NO”, “IF_NEEDED”, “IF_SAFER”].

  • data (DataArray) – The xarray.DataArray to write.

Returns:

None, writes to filename

Example

>>> import geowombat as gw
>>>
>>> with gw.open('file.tif') as src:
>>>     result = ...
>>>     gw.save(result, 'output.tif', compress='lzw', num_workers=8)
>>>
>>> # Create delayed write tasks and compute later
>>> tasks = [gw.save(array, 'output.tif', compute=False) for array in array_list]
>>> # Write and close files
>>> dask.compute(tasks, num_workers=8)
geowombat.core.sort_images_by_date(image_path, image_wildcard, date_pos=0, date_start=0, date_end=8, split_by='_', date_format='%Y%m%d', file_list=None, prepend_str=None)[source]#

Sorts images by date.

Parameters:
  • image_path (Path) – The image directory.

  • image_wildcard (str) – The image search wildcard.

  • date_pos (int) – The date starting position in the file name.

  • date_start (int) – The date starting position in the split.

  • date_end (int) – The date ending position in the split.

  • split_by (Optional[str]) – How to split the file name.

  • date_format (Optional[str]) – The date format for datetime.datetime.strptime().

  • file_list (Optional[list of Paths]) – A file list of names to sort. Overrides image_path.

  • prepend_str (Optional[str]) – A string to prepend to each filename.

Return type:

OrderedDict

Returns:

collections.OrderedDict

Example

>>> from pathlib import Path
>>> from geowombat.core import sort_images_by_date
>>>
>>> # image example: LC08_L1TP_176038_20190108_20190130_01_T1.tif
>>> image_path = Path('/path/to/images')
>>>
>>> image_dict = sort_images_by_date(image_path, '*.tif', 3, 0, 8)
>>> image_names = list(image_dict.keys())
>>> time_names = list(image_dict.values())
geowombat.core.subset(data, left=None, top=None, right=None, bottom=None, rows=None, cols=None, center=False, mask_corners=False)#

Subsets a DataArray.

Parameters:
  • data (DataArray) – The xarray.DataArray to subset.

  • left (Optional[float]) – The left coordinate.

  • top (Optional[float]) – The top coordinate.

  • right (Optional[float]) – The right coordinate.

  • bottom (Optional[float]) – The bottom coordinate.

  • rows (Optional[int]) – The number of output rows.

  • cols (Optional[int]) – The number of output columns.

  • center (Optional[bool]) – Whether to center the subset on left and top.

  • mask_corners (Optional[bool]) – Whether to mask corners (*requires pymorph).

Return type:

DataArray

Returns:

xarray.DataArray

Example

>>> import geowombat as gw
>>>
>>> with gw.open('image.tif', chunks=512) as ds:
>>>     ds_sub = gw.subset(
>>>         ds,
>>>         left=-263529.884,
>>>         top=953985.314,
>>>         rows=2048,
>>>         cols=2048
>>>     )
geowombat.core.tasseled_cap(data, nodata=None, sensor=None, scale_factor=None)#

Applies a tasseled cap transformation

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If None, the ‘no data’ value is taken from the xarray.DataArray attributes.

  • sensor (Optional[str]) – The data’s sensor. If None, the band names should reflect the index being calculated.

  • scale_factor (Optional[float]) – A scale factor to apply to the data. If None, the scale value is taken from the xarray.DataArray attributes.

Return type:

DataArray

Returns:

xarray.DataArray

Examples

>>> import geowombat as gw
>>>
>>> with gw.config.update(sensor='qb', scale_factor=0.0001):
>>>     with gw.open('image.tif', band_names=['blue', 'green', 'red', 'nir']) as ds:
>>>         tcap = gw.tasseled_cap(ds)

References

ASTER:

See [WS05]

CBERS2:

See [SHT11]

IKONOS:

See [Per06]

Landsat ETM+:

See [HWY+02]

Landsat OLI:

See [BZST14]

MODIS:

See [LC07]

Quickbird:

See [YEK05]

RapidEye:

See [ACG+14]

geowombat.core.to_netcdf(data, filename, overwrite=False, compute=True, *args, **kwargs)[source]#

Writes an Xarray DataArray to a NetCDF file.

Parameters:
  • data (DataArray) – The xarray.DataArray to write.

  • filename (str) – The output file name to write to.

  • overwrite (Optional[bool]) – Whether to overwrite an existing file. Default is False.

  • compute (Optional[bool]) – Whether to compute and write to filename. If False, return the dask task graph instead of writing. Default is True.

  • args (DataArray) – Additional DataArrays to stack.

  • kwargs (dict) – Encoding arguments.

Returns:

None, writes to filename

Examples

>>> import geowombat as gw
>>> import xarray as xr
>>>
>>> # Write a single DataArray to a .nc file
>>> with gw.config.update(sensor='l7'):
>>>     with gw.open('LC08_L1TP_225078_20200219_20200225_01_T1.tif') as src:
>>>         gw.to_netcdf(src, 'filename.nc', zlib=True, complevel=5)
>>>
>>> # Add extra layers
>>> with gw.config.update(sensor='l7'):
>>>     with gw.open(
>>>         'LC08_L1TP_225078_20200219_20200225_01_T1.tif'
>>>     ) as src, gw.open(
>>>         'LC08_L1TP_225078_20200219_20200225_01_T1_angles.tif',
>>>         band_names=['zenith', 'azimuth']
>>>     ) as ang:
>>>         src = (
>>>             xr.where(
>>>                 src == 0, -32768, src
>>>             )
>>>             .astype('int16')
>>>             .assign_attrs(**src.attrs)
>>>         )
>>>
>>>         gw.to_netcdf(
>>>             src,
>>>             'filename.nc',
>>>             ang.astype('int16'),
>>>             zlib=True,
>>>             complevel=5,
>>>             _FillValue=-32768
>>>         )
>>>
>>> # Open the data and convert to a DataArray
>>> with xr.open_dataset(
>>>     'filename.nc', engine='h5netcdf', chunks=256
>>> ) as ds:
>>>     src = ds.to_array(dim='band')
geowombat.core.to_raster(data, filename, readxsize=None, readysize=None, separate=False, out_block_type='gtiff', keep_blocks=False, verbose=0, overwrite=False, gdal_cache=512, scheduler='mpool', n_jobs=1, n_workers=None, n_threads=None, n_chunks=None, padding=None, tags=None, tqdm_kwargs=None, **kwargs)[source]#

Writes a dask array to a raster file.

Note

We advise using save() in place of this method.

Parameters:
  • data (DataArray) – The xarray.DataArray to write.

  • filename (str) – The output file name to write to.

  • readxsize (Optional[int]) – The size of column chunks to read. If not given, readxsize defaults to Dask chunk size.

  • readysize (Optional[int]) – The size of row chunks to read. If not given, readysize defaults to Dask chunk size.

  • separate (Optional[bool]) – Whether to write blocks as separate files. Otherwise, write to a single file.

  • out_block_type (Optional[str]) – The output block type. Choices are [‘gtiff’, ‘zarr’]. Only used if separate = True.

  • keep_blocks (Optional[bool]) – Whether to keep the blocks stored on disk. Only used if separate = True.

  • verbose (Optional[int]) – The verbosity level.

  • overwrite (Optional[bool]) – Whether to overwrite an existing file.

  • gdal_cache (Optional[int]) – The GDAL cache size (in MB).

  • scheduler (Optional[str]) –

    The parallel task scheduler to use. Choices are [‘processes’, ‘threads’, ‘mpool’].

    • mpool: process pool of workers using multiprocessing.Pool

    • processes: process pool of workers using concurrent.futures

    • threads: thread pool of workers using concurrent.futures

  • n_jobs (Optional[int]) – The total number of parallel jobs.

  • n_workers (Optional[int]) – The number of process workers.

  • n_threads (Optional[int]) – The number of thread workers.

  • n_chunks (Optional[int]) – The chunk size of windows. If not given, equal to n_workers x 50.

  • overviews (Optional[bool or list]) – Whether to build overview layers.

  • resampling (Optional[str]) – The resampling method for overviews when overviews is True or a list. Choices are [‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘nearest’].

  • padding (Optional[tuple]) – Padding for each window. padding should be given as a tuple of (left pad, bottom pad, right pad, top pad). If padding is given, the returned list will contain a tuple of rasterio.windows.Window objects as (w1, w2), where w1 contains the normal window offsets and w2 contains the padded window offsets.

  • tags (Optional[dict]) – Image tags to write to file.

  • tqdm_kwargs (Optional[dict]) – Additional keyword arguments to pass to tqdm.

  • kwargs (Optional[dict]) – Additional keyword arguments to pass to rasterio.write.

Returns:

None, writes to filename

Examples

>>> import geowombat as gw
>>>
>>> # Use 8 parallel workers
>>> with gw.open('input.tif') as ds:
>>>     gw.to_raster(ds, 'output.tif', n_jobs=8)
>>>
>>> # Use 4 process workers and 2 thread workers
>>> with gw.open('input.tif') as ds:
>>>     gw.to_raster(ds, 'output.tif', n_workers=4, n_threads=2)
>>>
>>> # Control the window chunks passed to concurrent.futures
>>> with gw.open('input.tif') as ds:
>>>     gw.to_raster(ds, 'output.tif', n_workers=4, n_threads=2, n_chunks=16)
>>>
>>> # Compress the output and build overviews
>>> with gw.open('input.tif') as ds:
>>>     gw.to_raster(ds, 'output.tif', n_jobs=8, overviews=True, compress='lzw')
geowombat.core.to_vrt(data, filename, overwrite=False, resampling=None, nodata=None, init_dest_nodata=True, warp_mem_limit=128)[source]#

Writes a file to a VRT file.

Parameters:
  • data (DataArray) – The xarray.DataArray to write.

  • filename (str) – The output file name to write to.

  • overwrite (Optional[bool]) – Whether to overwrite an existing VRT file.

  • resampling (Optional[object]) – The resampling algorithm for rasterio.vrt.WarpedVRT. Default is ‘nearest’.

  • nodata (Optional[float or int]) – The ‘no data’ value for rasterio.vrt.WarpedVRT.

  • init_dest_nodata (Optional[bool]) – Whether or not to initialize output to nodata for rasterio.vrt.WarpedVRT.

  • warp_mem_limit (Optional[int]) – The GDAL memory limit for rasterio.vrt.WarpedVRT.

Returns:

None, writes to filename

Examples

>>> import geowombat as gw
>>> from rasterio.enums import Resampling
>>>
>>> # Transform a CRS and save to VRT
>>> with gw.config.update(ref_crs=102033):
>>>     with gw.open('image.tif') as src:
>>>         gw.to_vrt(
>>>             src,
>>>             'output.vrt',
>>>             resampling=Resampling.cubic,
>>>             warp_mem_limit=256
>>>         )
>>>
>>> # Load multiple files set to a common geographic extent
>>> bounds = (left, bottom, right, top)
>>> with gw.config.update(ref_bounds=bounds):
>>>     with gw.open(
>>>         ['image1.tif', 'image2.tif'], mosaic=True
>>>     ) as src:
>>>         gw.to_vrt(src, 'output.vrt')
geowombat.core.transform_crs(data_src, dst_crs=None, dst_res=None, dst_width=None, dst_height=None, dst_bounds=None, src_nodata=None, dst_nodata=None, coords_only=False, resampling='nearest', warp_mem_limit=512, num_threads=1)[source]#

Transforms a DataArray to a new coordinate reference system.

Parameters:
  • data_src (DataArray) – The data to transform.

  • dst_crs (Optional[CRS | int | dict | str]) – The destination CRS.

  • dst_res (Optional[tuple]) – The destination resolution.

  • dst_width (Optional[int]) – The destination width. Cannot be used with dst_res.

  • dst_height (Optional[int]) – The destination height. Cannot be used with dst_res.

  • dst_bounds (Optional[BoundingBox | tuple]) – The destination bounds, as a rasterio.coords.BoundingBox or as a tuple of (left, bottom, right, top).

  • src_nodata (Optional[int | float]) – The source nodata value. Pixels with this value will not be used for interpolation. If not set, it defaults to the nodata value of the source image, if available.

  • dst_nodata (Optional[int | float]) – The nodata value used to initialize the destination; it will remain in all areas not covered by the reprojected source. Defaults to the nodata value of the destination image (if set), the value of src_nodata, or 0 (GDAL default).

  • coords_only (Optional[bool]) – Whether to return transformed coordinates. If coords_only = True then the array is not warped and the size is unchanged. It also avoids in-memory computations.

  • resampling (Optional[str]) – The resampling method if filename is a list. Choices are [‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘nearest’].

  • warp_mem_limit (Optional[int]) – The warp memory limit.

  • num_threads (Optional[int]) – The number of parallel threads.

Returns:

xarray.DataArray
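
Example

A minimal sketch; 'image.tif' is a placeholder and EPSG:31972 is an illustrative target CRS:

>>> import geowombat as gw
>>> from geowombat.core import transform_crs
>>>
>>> with gw.open('image.tif', chunks=512) as src:
>>>     # Reproject to the target CRS at a 30 m resolution
>>>     dst = transform_crs(src, dst_crs=31972, dst_res=(30.0, 30.0), resampling='nearest')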

geowombat.core.wi(data, nodata=None, mask=False, sensor=None, scale_factor=None)#

Calculates the woody vegetation index

Parameters:
  • data (DataArray) – The xarray.DataArray to process.

  • nodata (Optional[int or float]) – A ‘no data’ value to fill NAs with. If None, the ‘no data’ value is taken from the xarray.DataArray attributes.

  • mask (Optional[bool]) – Whether to mask the results.

  • sensor (Optional[str]) – The data’s sensor. If None, the band names should reflect the index being calculated.

  • scale_factor (Optional[float]) – A scale factor to apply to the data. If None, the scale value is taken from the xarray.DataArray attributes.

Equation:

\[WI = \begin{cases} 0, & \text{if } red + SWIR1 \ge 0.5 \\ 1 - \frac{red + SWIR1}{0.5}, & \text{otherwise} \end{cases}\]
Reference:

See [LWC+13]

Returns:

Data range: 0 to 1

Return type:

xarray.DataArray
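
Example

A minimal sketch; the file name and sensor are placeholders, and the sensor must provide 'red' and 'swir1' bands:

>>> import geowombat as gw
>>>
>>> with gw.config.update(sensor='l7', scale_factor=0.0001):
>>>     with gw.open('image.tif') as src:
>>>         result = gw.wi(src)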

geowombat.core.xy_to_lonlat(x, y, dst_crs)#

Converts from native map coordinates to longitude and latitude.

Parameters:
  • x (float) – The x coordinate to convert.

  • y (float) – The y coordinate to convert.

  • dst_crs (str, object, or DataArray) – The CRS to transform to. It can be provided as a string, a CRS instance (e.g., pyproj.crs.CRS), or a geowombat.DataArray.

Returns:

(longitude, latitude)

Return type:

tuple

Example

>>> import geowombat as gw
>>> from geowombat.core import xy_to_lonlat
>>>
>>> x, y = 643944.6956113526, 7183104.984484519
>>>
>>> with gw.open('image.tif') as src:
>>>     lon, lat = xy_to_lonlat(x, y, src)