User functions#
User apply#
With functions that release the GIL (e.g., many NumPy functions, Cython), one can use rasterio
to write concurrently.
The example below applies a custom function concurrently over an image, where each block of data is multiplied by arg.
Note
GeoWombat will not handle image alignment with the geowombat.apply() function.
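import geowombat as gw

# The function receives the window (w), the block of data, and the user argument.
# It returns the window along with the processed block.
def my_func(w, block, arg):
    return w, block * arg

gw.apply('input.tif', 'output.tif', my_func, args=(10.0,), n_jobs=4)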
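To make the calling convention concrete, the following is a minimal NumPy-only sketch of applying a function block by block in threads. It is not GeoWombat's implementation. The windows() helper, the in-memory image array, and the representation of a window as a tuple of slices are assumptions made for illustration, whereas geowombat.apply() reads from and writes to raster files.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def my_func(w, block, arg):
    # Same signature as above: return the window with the processed block
    return w, block * arg


def windows(shape, size):
    """Yield hypothetical (row slice, column slice) windows covering a 2d shape."""
    rows, cols = shape
    for r in range(0, rows, size):
        for c in range(0, cols, size):
            yield (slice(r, min(r + size, rows)), slice(c, min(c + size, cols)))


image = np.arange(36, dtype='float64').reshape(6, 6)
output = np.empty_like(image)

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [
        executor.submit(my_func, w, image[w], 10.0)
        for w in windows(image.shape, 4)
    ]
    for future in futures:
        w, result = future.result()
        # Each result is written back to the window it was read from
        output[w] = result
```

Returning the window alongside the block lets the writer place each result without relying on the order in which the threads finish.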
User functions as DataArray attributes#
User functions that do not use a dask task graph can be passed as attributes. Unlike the example above, the example below has guaranteed image alignment. Functions and arguments can be passed as Xarray attributes.
Here is an example that uses one user argument.
import geowombat as gw

# Function with one argument
def user_func(block, n):
    return block * n

with gw.open('input.tif') as ds:
    # Functions are given as 'apply'
    ds.attrs['apply'] = user_func
    # Function arguments (n) are given as 'apply_args'
    ds.attrs['apply_args'] = [10.0]
    ds.gw.save(
        'output.tif',
        num_workers=2,
        overwrite=True,
        compress='lzw'
    )
In this example, a keyword argument is also used.
import geowombat as gw

# Function with one argument and one keyword argument
def user_func(block, n, divider=1.0):
    return (block * n) / divider

with gw.open('input.tif') as ds:
    # Functions are given as 'apply'
    ds.attrs['apply'] = user_func
    # Function arguments are given as 'apply_args'
    # *Note that arguments should always be a list
    ds.attrs['apply_args'] = [10.0]
    # Function keyword arguments are given as 'apply_kwargs'
    # *Note that keyword arguments should always be a dictionary
    ds.attrs['apply_kwargs'] = {'divider': 2.3}
    ds.gw.save(
        'output.tif',
        num_workers=2,
        overwrite=True,
        compress='lzw'
    )
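The attribute convention can be pictured with a plain dictionary. The sketch below assumes that the list in 'apply_args' is unpacked as positional arguments after the block and that the dictionary in 'apply_kwargs' is unpacked as keyword arguments. It is not GeoWombat's internal code, and call_user_func() is a hypothetical helper.

```python
import numpy as np


def user_func(block, n, divider=1.0):
    return (block * n) / divider


def call_user_func(block, attrs):
    """Hypothetical helper: call attrs['apply'] on a single block."""
    func = attrs['apply']
    # A list supplies the positional arguments that follow the block
    args = attrs.get('apply_args', [])
    # A dictionary supplies the keyword arguments
    kwargs = attrs.get('apply_kwargs', {})
    return func(block, *args, **kwargs)


attrs = {
    'apply': user_func,
    'apply_args': [10.0],
    'apply_kwargs': {'divider': 2.0},
}

block = np.array([[1.0, 2.0], [3.0, 4.0]])
result = call_user_func(block, attrs)
```

Under this assumption, the call is equivalent to user_func(block, 10.0, divider=2.0), which is why the arguments are given as a list and the keyword arguments as a dictionary.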
Applying in-memory GeoWombat functions lazily#
Several geowombat functions execute in-memory, and are therefore not optimized for large datasets. However, these functions can be applied at the block level for dask-like out-of-memory processing using the user function framework. In the example below, geowombat.polygon_to_array() is applied at the raster block level.
import geowombat as gw
import geopandas as gpd

# Confirm that the GeoWombat function is supported for block-level lazy processing
print(hasattr(gw.polygon_to_array, 'wombat_func_'))

with gw.open('input.tif') as src:
    # We can load the geometry spatial index once and pass it to the block level.
    # However, be sure that the CRS matches the raster CRS.
    df = gpd.read_file('vector.gpkg').to_crs(src.crs)
    sindex = df.sindex
    src.attrs['apply'] = gw.polygon_to_array
    # All arguments must be passed as keyword arguments
    src.attrs['apply_kwargs'] = {
        'polygon': df,
        'sindex': sindex,
        'all_touched': False
    }
    src.gw.save(
        'output.tif',
        num_workers=2,
        compress='lzw'
    )
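The hasattr() check above works for any function that carries a marker attribute. How GeoWombat attaches wombat_func_ is not shown here, so the sketch below only illustrates one common pattern, a decorator that tags the wrapped function. The names mark_supported, supported, and unsupported are hypothetical.

```python
import functools


def mark_supported(func):
    """Hypothetical decorator that tags a function with a marker attribute."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)

    # The marker is what a later hasattr() check looks for
    wrapper.wombat_func_ = True
    return wrapper


@mark_supported
def supported(data=None):
    return data


def unsupported(data=None):
    return data


is_supported = hasattr(supported, 'wombat_func_')
is_unsupported = hasattr(unsupported, 'wombat_func_')
```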
By default, user functions expect a NumPy array as the first argument. It might be desirable to combine a user function with a geowombat function that operates on an xarray.DataArray. To achieve this, we can decorate the function.
import geowombat as gw
import geopandas as gpd
from geowombat.core.util import lazy_wombat

@lazy_wombat
def user_func(data=None, polygon=None, sindex=None, all_touched=None):
    """Converts a polygon to an array and then masks the array"""
    mask = gw.polygon_to_array(polygon=polygon, data=data, sindex=sindex, all_touched=all_touched)
    return (mask * data).astype('float64')

with gw.open('input.tif') as src:
    df = gpd.read_file('vector.gpkg').to_crs(src.crs)
    sindex = df.sindex
    src.attrs['apply'] = user_func
    # All arguments must be passed as keyword arguments
    src.attrs['apply_kwargs'] = {
        'polygon': df,
        'sindex': sindex,
        'all_touched': False
    }
    src.gw.save(
        'output.tif',
        num_workers=2,
        compress='lzw'
    )
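The masking step in user_func can be illustrated with plain NumPy arrays. The sketch below assumes that the rasterized polygon holds 1 inside the polygons and 0 elsewhere. The small arrays are made up for illustration and stand in for one raster block.

```python
import numpy as np

# Hypothetical block of raster values
data = np.array([[3, 5, 7], [2, 4, 6]], dtype='uint8')
# Hypothetical rasterized polygon: 1 inside, 0 outside
mask = np.array([[1, 1, 0], [0, 1, 0]], dtype='uint8')

# Pixels outside the polygon become 0 and the result is cast to float64
masked = (mask * data).astype('float64')
```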
The above example is similar to the following with the geowombat.mask() function.
import geowombat as gw
import geopandas as gpd

with gw.open('input.tif') as src:
    df = gpd.read_file('vector.gpkg').to_crs(src.crs)
    src.attrs['apply'] = gw.mask
    # All arguments must be passed as keyword arguments
    src.attrs['apply_kwargs'] = {
        'dataframe': df,
        'keep': 'in'
    }
    src.gw.save(
        'output.tif',
        num_workers=2,
        compress='lzw'
    )