Editing rasters#
Setting ‘no data’ values#
By default, geowombat
(using rasterio
and xarray
) will load the ‘no data’ value from the
file metadata, if it is available. For example:
In [1]: import geowombat as gw
In [2]: from geowombat.data import l8_224078_20200518
In [3]: with gw.open(l8_224078_20200518) as src:
...: print(src)
...:
<xarray.DataArray (band: 3, y: 1860, x: 2041)> Size: 23MB
dask.array<open_rasterio-5fde4e5bd81dff166b908415f12aeea6<this-array>, shape=(3, 1860, 2041), dtype=uint16, chunksize=(3, 256, 256), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 24B 1 2 3
* x (x) float64 16kB 7.174e+05 7.174e+05 ... 7.785e+05 7.786e+05
* y (y) float64 15kB -2.777e+06 -2.777e+06 ... -2.833e+06 -2.833e+06
Attributes: (12/13)
transform: (30.0, 0.0, 717345.0, 0.0, -30.0, -2776995.0)
crs: 32621
res: (30.0, 30.0)
is_tiled: 1
nodatavals: (nan, nan, nan)
_FillValue: nan
... ...
offsets: (0.0, 0.0, 0.0)
AREA_OR_POINT: Area
filename: /home/docs/checkouts/readthedocs.org/user_builds/geo...
resampling: nearest
_data_are_separate: 0
_data_are_stacked: 0
Note the xarray.DataArray
attributes nodatavals and _FillValue. The former, nodatavals, is
geowombat.backends.xarray_rasterio_.open_rasterio()
(originally from xarray.open_rasterio()
)
convention. This attribute is a tuple
of length DataArray.gw.nbands
, describing the ‘no data’ value
for each band. Typically, satellite imagery will have the same ‘no data’ value across all bands. The other
‘no data’ attribute, _FillValue, is an attribute used by xarray.open_dataset()
to flag ‘no data’ values.
This attribute is an int
or float
. We store both attributes when opening data.
We can see in the opened image that the ‘no data’ value is nan
(i.e., ‘nodatavals’ = (nan
, nan
, nan
)
and _FillValue = nan
).
In [4]: import geowombat as gw
In [5]: from geowombat.data import l8_224078_20200518
In [6]: with gw.open(l8_224078_20200518) as src:
...: print('dtype =', src.dtype)
...: print(src.squeeze().values[0])
...:
dtype = uint16
[[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]
...
[7692 7518 7513 ... 7440 7432 7415]
[7586 7590 7610 ... 7440 7411 7425]
[7576 7743 7770 ... 7464 7443 7406]]
However, nan
being set as the ‘no data’ is actually an error because this particular raster file does not
contain information about ‘no data’ values. If there is no existing ‘no data’ information, rasterio
will
set ‘no data’ as nan
. In this image, nans
do not exist, and we can see that because the dtype is
‘uint16’, whereas nans
require data as floating point numbers.
Let’s save a temporary file below and specify the ‘no data’ value as 0. Then, when we open the temporary file the ‘no data’ attributes should be set as 0.
In [7]: import tempfile
In [8]: from pathlib import Path
In [9]: import geowombat as gw
In [10]: from geowombat.data import l8_224078_20200518
In [11]: with gw.open(l8_224078_20200518) as src:
....: with tempfile.TemporaryDirectory() as tmp:
....: tmp_file = Path(tmp) / 'tmp_raster.tif'
....: src.gw.save(tmp_file, nodata=0)
....: with gw.open(tmp_file) as src_nodata:
....: print(src_nodata)
....: print(src_nodata.squeeze().values[0])
....:
<xarray.DataArray (band: 3, y: 1860, x: 2041)> Size: 23MB
dask.array<open_rasterio-b151ecddf5fb945fcab2c4abbeb7420b<this-array>, shape=(3, 1860, 2041), dtype=uint16, chunksize=(3, 256, 256), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 24B 1 2 3
* x (x) float64 16kB 7.174e+05 7.174e+05 ... 7.785e+05 7.786e+05
* y (y) float64 15kB -2.777e+06 -2.777e+06 ... -2.833e+06 -2.833e+06
Attributes: (12/13)
transform: (30.0, 0.0, 717345.0, 0.0, -30.0, -2776995.0)
crs: 32621
res: (30.0, 30.0)
is_tiled: 1
nodatavals: (0.0, 0.0, 0.0)
_FillValue: 0.0
... ...
offsets: (0.0, 0.0, 0.0)
AREA_OR_POINT: Area
filename: /tmp/tmpvpx5vfwa/tmp_raster.tif
resampling: nearest
_data_are_separate: 0
_data_are_stacked: 0
[[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]
...
[7692 7518 7513 ... 7440 7432 7415]
[7586 7590 7610 ... 7440 7411 7425]
[7576 7743 7770 ... 7464 7443 7406]]
Note
We are not modifying any data – we are only updating the xarray.DataArray
metadata. Thus, the printout of
the data above reflect changes in the xarray.DataArray
‘no data’ attributes but the printed array values
remained unchanged.
But what if we want to modify the ‘no data’ value when opening the file (instead of re-saving)? We can pass nodata to the opener as shown below.
In [12]: import geowombat as gw
In [13]: from geowombat.data import l8_224078_20200518
In [14]: with gw.open(l8_224078_20200518, nodata=0) as src:
....: print(src)
....: print(src.squeeze().values[0])
....:
<xarray.DataArray (band: 3, y: 1860, x: 2041)> Size: 23MB
dask.array<open_rasterio-5fde4e5bd81dff166b908415f12aeea6<this-array>, shape=(3, 1860, 2041), dtype=uint16, chunksize=(3, 256, 256), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 24B 1 2 3
* x (x) float64 16kB 7.174e+05 7.174e+05 ... 7.785e+05 7.786e+05
* y (y) float64 15kB -2.777e+06 -2.777e+06 ... -2.833e+06 -2.833e+06
Attributes: (12/13)
transform: (30.0, 0.0, 717345.0, 0.0, -30.0, -2776995.0)
crs: 32621
res: (30.0, 30.0)
is_tiled: 1
nodatavals: (0, 0, 0)
_FillValue: 0
... ...
offsets: (0.0, 0.0, 0.0)
AREA_OR_POINT: Area
filename: /home/docs/checkouts/readthedocs.org/user_builds/geo...
resampling: nearest
_data_are_separate: 0
_data_are_stacked: 0
[[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]
...
[7692 7518 7513 ... 7440 7432 7415]
[7586 7590 7610 ... 7440 7411 7425]
[7576 7743 7770 ... 7464 7443 7406]]
We can also set ‘no data’ using the configuration manager like:
In [15]: import geowombat as gw
In [16]: from geowombat.data import l8_224078_20200518
In [17]: with gw.config.update(nodata=0):
....: with gw.open(l8_224078_20200518) as src:
....: print(src)
....: print(src.squeeze().values[0])
....:
<xarray.DataArray (band: 3, y: 1860, x: 2041)> Size: 23MB
dask.array<open_rasterio-5fde4e5bd81dff166b908415f12aeea6<this-array>, shape=(3, 1860, 2041), dtype=uint16, chunksize=(3, 256, 256), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 24B 1 2 3
* x (x) float64 16kB 7.174e+05 7.174e+05 ... 7.785e+05 7.786e+05
* y (y) float64 15kB -2.777e+06 -2.777e+06 ... -2.833e+06 -2.833e+06
Attributes: (12/13)
transform: (30.0, 0.0, 717345.0, 0.0, -30.0, -2776995.0)
crs: 32621
res: (30.0, 30.0)
is_tiled: 1
nodatavals: (0, 0, 0)
_FillValue: 0
... ...
offsets: (0.0, 0.0, 0.0)
AREA_OR_POINT: Area
filename: /home/docs/checkouts/readthedocs.org/user_builds/geo...
resampling: nearest
_data_are_separate: 0
_data_are_stacked: 0
[[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]
...
[7692 7518 7513 ... 7440 7432 7415]
[7586 7590 7610 ... 7440 7411 7425]
[7576 7743 7770 ... 7464 7443 7406]]
Masking ‘no data’ values#
As mentioned above, the array data are not automatically modified by the ‘no data’ value. If we want to
mask our ‘no data’ values (i.e., exclude them from any calculations), we simply need to convert the
array values to nans
. GeoWombat provides a method called xarray.DataArray.gw.mask_nodata()
to do this
that uses the metadata.
In [18]: import geowombat as gw
In [19]: from geowombat.data import l8_224078_20200518
In [20]: with gw.open(l8_224078_20200518, nodata=0) as src:
....: print('No masking:')
....: print(src.sel(band=1).values)
....: print("\n'No data' values masked:")
....: print(src.gw.mask_nodata().sel(band=1).values)
....:
No masking:
[[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]
...
[7692 7518 7513 ... 7440 7432 7415]
[7586 7590 7610 ... 7440 7411 7425]
[7576 7743 7770 ... 7464 7443 7406]]
'No data' values masked:
[[ nan nan nan ... nan nan nan]
[ nan nan nan ... nan nan nan]
[ nan nan nan ... nan nan nan]
...
[7692. 7518. 7513. ... 7440. 7432. 7415.]
[7586. 7590. 7610. ... 7440. 7411. 7425.]
[7576. 7743. 7770. ... 7464. 7443. 7406.]]
The xarray.DataArray.gw.mask_nodata()
function uses xarray.DataArray.where()
logic, as
demonstrated by the example below.
import geowombat as gw
from geowombat.data import l8_224078_20200518
# Zeros are replaced with nans
with gw.open(l8_224078_20200518) as src:
data = src.where(src != 0)
Setting ‘no data’ values with scaling#
In geowombat
, we use xarray.DataArray.where()
along with optional
scaling in the xarray.DataArray.gw.set_nodata()
function. In this example, we set zeros as
nan
and scale all other values from a [0,10000] range to [0,1] (i.e., x 1e-4).
In [21]: import geowombat as gw
In [22]: from geowombat.data import l8_224078_20200518
In [23]: import numpy as np
# Set the 'no data' value and scale all other values
In [24]: with gw.open(l8_224078_20200518, dtype='float64') as src:
....: print(src.sel(band=1).values)
....: data = src.gw.set_nodata(
....: src_nodata=0, dst_nodata=np.nan, dtype='float64', scale_factor=1e-4
....: )
....: print(data.sel(band=1).values)
....:
[[ 0. 0. 0. ... 0. 0. 0.]
[ 0. 0. 0. ... 0. 0. 0.]
[ 0. 0. 0. ... 0. 0. 0.]
...
[7692. 7518. 7513. ... 7440. 7432. 7415.]
[7586. 7590. 7610. ... 7440. 7411. 7425.]
[7576. 7743. 7770. ... 7464. 7443. 7406.]]
[[ nan nan nan ... nan nan nan]
[ nan nan nan ... nan nan nan]
[ nan nan nan ... nan nan nan]
...
[0.7692 0.7518 0.7513 ... 0.744 0.7432 0.7415]
[0.7586 0.759 0.761 ... 0.744 0.7411 0.7425]
[0.7576 0.7743 0.777 ... 0.7464 0.7443 0.7406]]
Replace values#
The xarray.DataArray.gw.replace()
function mimics pandas.DataFrame.replace()
.
import geowombat as gw
from geowombat.data import l8_224078_20200518
# Replace 1 with 10
with gw.open(l8_224078_20200518) as src:
data = src.gw.replace({1: 10})
Note
The xarray.DataArray.gw.replace()
function is typically used with thematic data.