Editing rasters

Setting ‘no data’ values

By default, geowombat (using rasterio and xarray) will load the ‘no data’ value from the file metadata, if it is available. For example:

In [1]: import geowombat as gw

In [2]: from geowombat.data import l8_224078_20200518

In [3]: with gw.open(l8_224078_20200518) as src:
   ...:     print(src)
   ...: 
<xarray.DataArray (band: 3, y: 1860, x: 2041)> Size: 23MB
dask.array<open_rasterio-5fde4e5bd81dff166b908415f12aeea6<this-array>, shape=(3, 1860, 2041), dtype=uint16, chunksize=(3, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * band     (band) int64 24B 1 2 3
  * x        (x) float64 16kB 7.174e+05 7.174e+05 ... 7.785e+05 7.786e+05
  * y        (y) float64 15kB -2.777e+06 -2.777e+06 ... -2.833e+06 -2.833e+06
Attributes: (12/13)
    transform:           (30.0, 0.0, 717345.0, 0.0, -30.0, -2776995.0)
    crs:                 32621
    res:                 (30.0, 30.0)
    is_tiled:            1
    nodatavals:          (nan, nan, nan)
    _FillValue:          nan
    ...                  ...
    offsets:             (0.0, 0.0, 0.0)
    AREA_OR_POINT:       Area
    filename:            /home/docs/checkouts/readthedocs.org/user_builds/geo...
    resampling:          nearest
    _data_are_separate:  0
    _data_are_stacked:   0

Note the xarray.DataArray attributes nodatavals and _FillValue. The former, nodatavals, follows the convention of geowombat.backends.xarray_rasterio_.open_rasterio() (originally from xarray.open_rasterio()). It is a tuple of length DataArray.gw.nbands that gives the ‘no data’ value for each band. Satellite imagery typically has the same ‘no data’ value across all bands. The other attribute, _FillValue, is used by xarray.open_dataset() to flag ‘no data’ values and is a single int or float. GeoWombat stores both attributes when opening data.

The attributes of the opened image report the ‘no data’ value as nan (i.e., nodatavals = (nan, nan, nan) and _FillValue = nan).

In [4]: import geowombat as gw

In [5]: from geowombat.data import l8_224078_20200518

In [6]: with gw.open(l8_224078_20200518) as src:
   ...:     print('dtype =', src.dtype)
   ...:     print(src.squeeze().values[0])
   ...: 
dtype = uint16
[[   0    0    0 ...    0    0    0]
 [   0    0    0 ...    0    0    0]
 [   0    0    0 ...    0    0    0]
 ...
 [7692 7518 7513 ... 7440 7432 7415]
 [7586 7590 7610 ... 7440 7411 7425]
 [7576 7743 7770 ... 7464 7443 7406]]

However, reporting nan as the ‘no data’ value is misleading here, because this particular raster file does not contain any ‘no data’ information. When that information is missing, rasterio sets ‘no data’ to nan. The image itself cannot contain nans: its dtype is ‘uint16’, and nan can only be stored in floating-point data.
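The dtype argument can be checked directly with NumPy. The following is a minimal sketch on a small made-up array (not the Landsat file used above): an unsigned integer array has no way to represent nan, so the values must be cast to floating point before nan can be stored.

```python
import numpy as np

# Hypothetical 2 x 3 band standing in for the uint16 raster values
band = np.array([[0, 0, 0], [7692, 7518, 7513]], dtype=np.uint16)

# Integer dtypes cannot represent nan
print(np.issubdtype(band.dtype, np.floating))  # False

# Casting to a floating-point dtype makes room for nan
band_float = band.astype(np.float64)
band_float[band == 0] = np.nan
print(band_float)
```

The cast doubles as a reminder that masking with nan changes the dtype, and therefore the memory footprint, of the array.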

Let’s save a temporary file below and specify the ‘no data’ value as 0. Then, when we open the temporary file the ‘no data’ attributes should be set as 0.

In [7]: import tempfile

In [8]: from pathlib import Path

In [9]: import geowombat as gw

In [10]: from geowombat.data import l8_224078_20200518

In [11]: with gw.open(l8_224078_20200518) as src:
   ....:     with tempfile.TemporaryDirectory() as tmp:
   ....:         tmp_file = Path(tmp) / 'tmp_raster.tif'
   ....:         src.gw.save(tmp_file, nodata=0)
   ....:         with gw.open(tmp_file) as src_nodata:
   ....:             print(src_nodata)
   ....:             print(src_nodata.squeeze().values[0])
   ....: 
<xarray.DataArray (band: 3, y: 1860, x: 2041)> Size: 23MB
dask.array<open_rasterio-b151ecddf5fb945fcab2c4abbeb7420b<this-array>, shape=(3, 1860, 2041), dtype=uint16, chunksize=(3, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * band     (band) int64 24B 1 2 3
  * x        (x) float64 16kB 7.174e+05 7.174e+05 ... 7.785e+05 7.786e+05
  * y        (y) float64 15kB -2.777e+06 -2.777e+06 ... -2.833e+06 -2.833e+06
Attributes: (12/13)
    transform:           (30.0, 0.0, 717345.0, 0.0, -30.0, -2776995.0)
    crs:                 32621
    res:                 (30.0, 30.0)
    is_tiled:            1
    nodatavals:          (0.0, 0.0, 0.0)
    _FillValue:          0.0
    ...                  ...
    offsets:             (0.0, 0.0, 0.0)
    AREA_OR_POINT:       Area
    filename:            /tmp/tmpvpx5vfwa/tmp_raster.tif
    resampling:          nearest
    _data_are_separate:  0
    _data_are_stacked:   0
[[   0    0    0 ...    0    0    0]
 [   0    0    0 ...    0    0    0]
 [   0    0    0 ...    0    0    0]
 ...
 [7692 7518 7513 ... 7440 7432 7415]
 [7586 7590 7610 ... 7440 7411 7425]
 [7576 7743 7770 ... 7464 7443 7406]]

Note

We are not modifying any data; we are only updating the xarray.DataArray metadata. Thus, the printout above reflects the change in the xarray.DataArray ‘no data’ attributes, but the printed array values remain unchanged.

But what if we want to modify the ‘no data’ value when opening the file (instead of re-saving)? We can pass nodata to the opener as shown below.

In [12]: import geowombat as gw

In [13]: from geowombat.data import l8_224078_20200518

In [14]: with gw.open(l8_224078_20200518, nodata=0) as src:
   ....:     print(src)
   ....:     print(src.squeeze().values[0])
   ....: 
<xarray.DataArray (band: 3, y: 1860, x: 2041)> Size: 23MB
dask.array<open_rasterio-5fde4e5bd81dff166b908415f12aeea6<this-array>, shape=(3, 1860, 2041), dtype=uint16, chunksize=(3, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * band     (band) int64 24B 1 2 3
  * x        (x) float64 16kB 7.174e+05 7.174e+05 ... 7.785e+05 7.786e+05
  * y        (y) float64 15kB -2.777e+06 -2.777e+06 ... -2.833e+06 -2.833e+06
Attributes: (12/13)
    transform:           (30.0, 0.0, 717345.0, 0.0, -30.0, -2776995.0)
    crs:                 32621
    res:                 (30.0, 30.0)
    is_tiled:            1
    nodatavals:          (0, 0, 0)
    _FillValue:          0
    ...                  ...
    offsets:             (0.0, 0.0, 0.0)
    AREA_OR_POINT:       Area
    filename:            /home/docs/checkouts/readthedocs.org/user_builds/geo...
    resampling:          nearest
    _data_are_separate:  0
    _data_are_stacked:   0
[[   0    0    0 ...    0    0    0]
 [   0    0    0 ...    0    0    0]
 [   0    0    0 ...    0    0    0]
 ...
 [7692 7518 7513 ... 7440 7432 7415]
 [7586 7590 7610 ... 7440 7411 7425]
 [7576 7743 7770 ... 7464 7443 7406]]

We can also set ‘no data’ using the configuration manager, as follows:

In [15]: import geowombat as gw

In [16]: from geowombat.data import l8_224078_20200518

In [17]: with gw.config.update(nodata=0):
   ....:     with gw.open(l8_224078_20200518) as src:
   ....:         print(src)
   ....:         print(src.squeeze().values[0])
   ....: 
<xarray.DataArray (band: 3, y: 1860, x: 2041)> Size: 23MB
dask.array<open_rasterio-5fde4e5bd81dff166b908415f12aeea6<this-array>, shape=(3, 1860, 2041), dtype=uint16, chunksize=(3, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * band     (band) int64 24B 1 2 3
  * x        (x) float64 16kB 7.174e+05 7.174e+05 ... 7.785e+05 7.786e+05
  * y        (y) float64 15kB -2.777e+06 -2.777e+06 ... -2.833e+06 -2.833e+06
Attributes: (12/13)
    transform:           (30.0, 0.0, 717345.0, 0.0, -30.0, -2776995.0)
    crs:                 32621
    res:                 (30.0, 30.0)
    is_tiled:            1
    nodatavals:          (0, 0, 0)
    _FillValue:          0
    ...                  ...
    offsets:             (0.0, 0.0, 0.0)
    AREA_OR_POINT:       Area
    filename:            /home/docs/checkouts/readthedocs.org/user_builds/geo...
    resampling:          nearest
    _data_are_separate:  0
    _data_are_stacked:   0
[[   0    0    0 ...    0    0    0]
 [   0    0    0 ...    0    0    0]
 [   0    0    0 ...    0    0    0]
 ...
 [7692 7518 7513 ... 7440 7432 7415]
 [7586 7590 7610 ... 7440 7411 7425]
 [7576 7743 7770 ... 7464 7443 7406]]

Masking ‘no data’ values

As mentioned above, the array data are not automatically modified by the ‘no data’ value. If we want to mask our ‘no data’ values (i.e., exclude them from any calculations), we need to convert those array values to nans. GeoWombat provides a method for this, xarray.DataArray.gw.mask_nodata(), which reads the ‘no data’ value from the metadata.

In [18]: import geowombat as gw

In [19]: from geowombat.data import l8_224078_20200518

In [20]: with gw.open(l8_224078_20200518, nodata=0) as src:
   ....:     print('No masking:')
   ....:     print(src.sel(band=1).values)
   ....:     print("\n'No data' values masked:")
   ....:     print(src.gw.mask_nodata().sel(band=1).values)
   ....: 
No masking:
[[   0    0    0 ...    0    0    0]
 [   0    0    0 ...    0    0    0]
 [   0    0    0 ...    0    0    0]
 ...
 [7692 7518 7513 ... 7440 7432 7415]
 [7586 7590 7610 ... 7440 7411 7425]
 [7576 7743 7770 ... 7464 7443 7406]]

'No data' values masked:
[[  nan   nan   nan ...   nan   nan   nan]
 [  nan   nan   nan ...   nan   nan   nan]
 [  nan   nan   nan ...   nan   nan   nan]
 ...
 [7692. 7518. 7513. ... 7440. 7432. 7415.]
 [7586. 7590. 7610. ... 7440. 7411. 7425.]
 [7576. 7743. 7770. ... 7464. 7443. 7406.]]
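To see why masking matters for calculations, here is a small NumPy-only sketch on made-up values (it does not call geowombat). With the zeros left in place they pull the mean down; once they are nan, a nan-aware reduction such as numpy.nanmean() skips them.

```python
import numpy as np

# Hypothetical band where 0 marks 'no data'
band = np.array([[0, 0], [7000, 8000]], dtype=np.uint16)

# Zeros are counted as real observations
unmasked_mean = band.mean()

# Replace zeros with nan, then use a nan-aware reduction
masked = np.where(band == 0, np.nan, band.astype(np.float64))
masked_mean = np.nanmean(masked)

print(unmasked_mean, masked_mean)
```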

The xarray.DataArray.gw.mask_nodata() function uses xarray.DataArray.where() logic, as demonstrated by the example below.

import geowombat as gw
from geowombat.data import l8_224078_20200518

# Zeros are replaced with nans
with gw.open(l8_224078_20200518) as src:
    data = src.where(src != 0)

Setting ‘no data’ values with scaling

The xarray.DataArray.gw.set_nodata() function combines xarray.DataArray.where() with optional scaling. In this example, we set zeros to nan and scale all other values from a [0, 10000] range to [0, 1] (i.e., multiplying by 1e-4).

In [21]: import geowombat as gw

In [22]: from geowombat.data import l8_224078_20200518

In [23]: import numpy as np

# Set the 'no data' value and scale all other values
In [24]: with gw.open(l8_224078_20200518, dtype='float64') as src:
   ....:     print(src.sel(band=1).values)
   ....:     data = src.gw.set_nodata(
   ....:         src_nodata=0, dst_nodata=np.nan, dtype='float64', scale_factor=1e-4
   ....:     )
   ....:     print(data.sel(band=1).values)
   ....: 
[[   0.    0.    0. ...    0.    0.    0.]
 [   0.    0.    0. ...    0.    0.    0.]
 [   0.    0.    0. ...    0.    0.    0.]
 ...
 [7692. 7518. 7513. ... 7440. 7432. 7415.]
 [7586. 7590. 7610. ... 7440. 7411. 7425.]
 [7576. 7743. 7770. ... 7464. 7443. 7406.]]
[[   nan    nan    nan ...    nan    nan    nan]
 [   nan    nan    nan ...    nan    nan    nan]
 [   nan    nan    nan ...    nan    nan    nan]
 ...
 [0.7692 0.7518 0.7513 ... 0.744  0.7432 0.7415]
 [0.7586 0.759  0.761  ... 0.744  0.7411 0.7425]
 [0.7576 0.7743 0.777  ... 0.7464 0.7443 0.7406]]
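The combined masking and scaling step can be approximated in plain NumPy. This sketch only mirrors the behavior visible in the output above and is not geowombat's implementation; the helper name mask_and_scale is hypothetical.

```python
import numpy as np


def mask_and_scale(values, src_nodata, scale_factor):
    """Hypothetical helper: set src_nodata cells to nan and scale the rest."""
    values = np.asarray(values, dtype=np.float64)
    return np.where(values == src_nodata, np.nan, values * scale_factor)


# Made-up values standing in for one band of the raster
band = np.array([[0, 0], [7692, 7518]], dtype=np.uint16)
scaled = mask_and_scale(band, src_nodata=0, scale_factor=1e-4)
print(scaled)
```

Testing for the ‘no data’ value before scaling keeps the comparison exact, since it is made against the original values rather than scaled floats.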

Replace values

The xarray.DataArray.gw.replace() function mimics pandas.DataFrame.replace().

import geowombat as gw
from geowombat.data import l8_224078_20200518

# Replace 1 with 10
with gw.open(l8_224078_20200518) as src:
    data = src.gw.replace({1: 10})

Note

The xarray.DataArray.gw.replace() function is typically used with thematic data.
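As an illustration of dictionary-based replacement on thematic data, here is a NumPy-only sketch with made-up class codes. The helper replace_values is hypothetical and is not how geowombat implements replace().

```python
import numpy as np


def replace_values(values, to_replace):
    """Hypothetical helper: map old values to new ones using a dict."""
    out = values.copy()
    for old, new in to_replace.items():
        # Compare against the original array so mappings do not cascade
        out[values == old] = new
    return out


# Made-up land cover classes
classes = np.array([[1, 2, 2], [3, 1, 1]], dtype=np.uint8)
recoded = replace_values(classes, {1: 10, 3: 30})
print(recoded)
```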