Raster I/O#

File opening with geowombat uses the geowombat.open() function to open raster files.

# Import GeoWombat
In [1]: import geowombat as gw

# Load image names
In [2]: from geowombat.data import l8_224077_20200518_B2, l8_224077_20200518_B3, l8_224077_20200518_B4

In [3]: from geowombat.data import l8_224078_20200518_B2, l8_224078_20200518_B3, l8_224078_20200518_B4, l8_224078_20200518

In [4]: from pathlib import Path

In [5]: import matplotlib.pyplot as plt

In [6]: import matplotlib.patheffects as pe

To open individual images, geowombat uses xarray.open_rasterio() and rasterio.vrt.WarpedVRT.

In [7]: fig, ax = plt.subplots(dpi=200)

In [8]: with gw.open(l8_224078_20200518) as src:
   ...:     src.where(src != 0).sel(band=[3, 2, 1]).gw.imshow(robust=True, ax=ax)
   ...: 

In [9]: plt.tight_layout(pad=1)
_images/rgb_plot.png

Open a raster as an xarray.DataArray.

In [10]: with gw.open(l8_224078_20200518) as src:
   ....:     print(src)
   ....: 
<xarray.DataArray (band: 3, y: 1860, x: 2041)> Size: 23MB
dask.array<open_rasterio-5fde4e5bd81dff166b908415f12aeea6<this-array>, shape=(3, 1860, 2041), dtype=uint16, chunksize=(3, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * band     (band) int64 24B 1 2 3
  * x        (x) float64 16kB 7.174e+05 7.174e+05 ... 7.785e+05 7.786e+05
  * y        (y) float64 15kB -2.777e+06 -2.777e+06 ... -2.833e+06 -2.833e+06
Attributes: (12/13)
    transform:           (30.0, 0.0, 717345.0, 0.0, -30.0, -2776995.0)
    crs:                 32621
    res:                 (30.0, 30.0)
    is_tiled:            1
    nodatavals:          (nan, nan, nan)
    _FillValue:          nan
    ...                  ...
    offsets:             (0.0, 0.0, 0.0)
    AREA_OR_POINT:       Area
    filename:            /home/docs/checkouts/readthedocs.org/user_builds/geo...
    resampling:          nearest
    _data_are_separate:  0
    _data_are_stacked:   0

Force the output data type.

In [11]: with gw.open(l8_224078_20200518, dtype='float64') as src:
   ....:     print(src.dtype)
   ....: 
float64

Specify band names.

In [12]: with gw.open(l8_224078_20200518, band_names=['blue', 'green', 'red']) as src:
   ....:     print(src.band)
   ....: 
<xarray.DataArray 'band' (band: 3)> Size: 60B
array(['blue', 'green', 'red'], dtype='<U5')
Coordinates:
  * band     (band) <U5 60B 'blue' 'green' 'red'

Use the sensor name to set band names.

In [13]: with gw.config.update(sensor='bgr'):
   ....:     with gw.open(l8_224078_20200518) as src:
   ....:         print(src.band)
   ....: 
<xarray.DataArray 'band' (band: 3)> Size: 60B
array(['blue', 'green', 'red'], dtype='<U5')
Coordinates:
  * band     (band) <U5 60B 'blue' 'green' 'red'

To open multiple images stacked by bands, use a list of files with stack_dim='band'.

Open a list of files as an xarray.DataArray, with all bands stacked.

In [14]: with gw.open([l8_224078_20200518_B2, l8_224078_20200518_B3, l8_224078_20200518_B4],
   ....:              band_names=['b', 'g', 'r'],
   ....:              stack_dim='band') as src:
   ....:     print(src)
   ....: 
<xarray.DataArray (band: 3, y: 1860, x: 2041)> Size: 23MB
dask.array<concatenate, shape=(3, 1860, 2041), dtype=uint16, chunksize=(1, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) float64 16kB 7.174e+05 7.174e+05 ... 7.785e+05 7.786e+05
  * y        (y) float64 15kB -2.777e+06 -2.777e+06 ... -2.833e+06 -2.833e+06
  * band     (band) <U1 12B 'b' 'g' 'r'
Attributes: (12/13)
    transform:           (30.0, 0.0, 717345.0, 0.0, -30.0, -2776995.0)
    crs:                 32621
    res:                 (30.0, 30.0)
    is_tiled:            1
    nodatavals:          (nan,)
    _FillValue:          nan
    ...                  ...
    offsets:             (0.0,)
    AREA_OR_POINT:       Point
    filename:            ['LC08_L1TP_224078_20200518_20200518_01_RT_B2.TIF', ...
    resampling:          nearest
    _data_are_separate:  1
    _data_are_stacked:   1

To open multiple images as a time stack, change the input to a list of files.

Open a list of files as an xarray.DataArray.

In [15]: with gw.open([l8_224078_20200518, l8_224078_20200518],
   ....:              band_names=['blue', 'green', 'red'],
   ....:              time_names=['t1', 't2']) as src:
   ....:     print(src)
   ....: 
<xarray.DataArray (time: 2, band: 3, y: 1860, x: 2041)> Size: 46MB
dask.array<concatenate, shape=(2, 3, 1860, 2041), dtype=uint16, chunksize=(1, 3, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) float64 16kB 7.174e+05 7.174e+05 ... 7.785e+05 7.786e+05
  * y        (y) float64 15kB -2.777e+06 -2.777e+06 ... -2.833e+06 -2.833e+06
  * time     (time) <U2 16B 't1' 't2'
  * band     (band) <U5 60B 'blue' 'green' 'red'
Attributes: (12/13)
    transform:           (30.0, 0.0, 717345.0, 0.0, -30.0, -2776995.0)
    crs:                 32621
    res:                 (30.0, 30.0)
    is_tiled:            1
    nodatavals:          (nan, nan, nan)
    _FillValue:          nan
    ...                  ...
    offsets:             (0.0, 0.0, 0.0)
    AREA_OR_POINT:       Area
    filename:            ['LC08_L1TP_224078_20200518_20200518_01_RT.TIF', 'LC...
    resampling:          nearest
    _data_are_separate:  1
    _data_are_stacked:   1

Note

Xarray will handle alignment of images of varying sizes as long as the the resolutions are “target aligned”. If images are not target aligned, Xarray might not concatenate a stack of images. With GeoWombat, we can use a context manager and a reference image to handle image alignment.

In the example below, we specify a reference image using the geowombat configuration manager.

Note

The two images in this example are identical. The point here is just to illustrate the use of the configuration manager.

# Use an image as a reference for grid alignment and CRS-handling
#
# Within the configuration context, every image
# in concat_list will conform to the reference grid.
In [16]: filenames = [l8_224078_20200518, l8_224078_20200518]

In [17]: with gw.config.update(ref_image=l8_224077_20200518_B2):
   ....:     with gw.open(
   ....:         filenames,
   ....:         band_names=['blue', 'green', 'red'],
   ....:         time_names=['t1', 't2']
   ....:     ) as src:
   ....:         print(src)
   ....: 
<xarray.DataArray (time: 2, band: 3, y: 1515, x: 2006)> Size: 36MB
dask.array<concatenate, shape=(2, 3, 1515, 2006), dtype=uint16, chunksize=(1, 3, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) float64 16kB 6.94e+05 6.94e+05 ... 7.541e+05 7.542e+05
  * y        (y) float64 12kB -2.767e+06 -2.767e+06 ... -2.812e+06 -2.812e+06
  * time     (time) <U2 16B 't1' 't2'
  * band     (band) <U5 60B 'blue' 'green' 'red'
Attributes: (12/13)
    transform:           (30.0, 0.0, 694005.0, 0.0, -30.0, -2766615.0)
    crs:                 32621
    res:                 (30.0, 30.0)
    is_tiled:            1
    nodatavals:          (nan, nan, nan)
    _FillValue:          nan
    ...                  ...
    offsets:             (0.0, 0.0, 0.0)
    filename:            ['LC08_L1TP_224078_20200518_20200518_01_RT.TIF', 'LC...
    resampling:          nearest
    AREA_OR_POINT:       Area
    _data_are_separate:  1
    _data_are_stacked:   1

In [18]: with gw.config.update(ref_image=l8_224078_20200518_B2):
   ....:     with gw.open(
   ....:         filenames,
   ....:         band_names=['blue', 'green', 'red'],
   ....:         time_names=['t1', 't2']
   ....:     ) as src:
   ....:         print(src)
   ....: 
<xarray.DataArray (time: 2, band: 3, y: 1860, x: 2041)> Size: 46MB
dask.array<concatenate, shape=(2, 3, 1860, 2041), dtype=uint16, chunksize=(1, 3, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) float64 16kB 7.174e+05 7.174e+05 ... 7.785e+05 7.786e+05
  * y        (y) float64 15kB -2.777e+06 -2.777e+06 ... -2.833e+06 -2.833e+06
  * time     (time) <U2 16B 't1' 't2'
  * band     (band) <U5 60B 'blue' 'green' 'red'
Attributes: (12/13)
    transform:           (30.0, 0.0, 717345.0, 0.0, -30.0, -2776995.0)
    crs:                 32621
    res:                 (30.0, 30.0)
    is_tiled:            1
    nodatavals:          (nan, nan, nan)
    _FillValue:          nan
    ...                  ...
    offsets:             (0.0, 0.0, 0.0)
    AREA_OR_POINT:       Area
    filename:            ['LC08_L1TP_224078_20200518_20200518_01_RT.TIF', 'LC...
    resampling:          nearest
    _data_are_separate:  1
    _data_are_stacked:   1

Stack the intersection of all images.

In [19]: fig, ax = plt.subplots(dpi=200)

In [20]: filenames = [l8_224077_20200518_B2, l8_224078_20200518_B2]

In [21]: with gw.open(
   ....:     filenames,
   ....:     band_names=['blue'],
   ....:     mosaic=True,
   ....:     bounds_by='intersection'
   ....: ) as src:
   ....:     src.where(src != 0).sel(band='blue').gw.imshow(robust=True, ax=ax)
   ....: 

In [22]: plt.tight_layout(pad=1)
_images/blue_intersection_plot.png

Stack the union of all images.

In [23]: fig, ax = plt.subplots(dpi=200)

In [24]: filenames = [l8_224077_20200518_B2, l8_224078_20200518_B2]

In [25]: with gw.open(
   ....:     filenames,
   ....:     band_names=['blue'],
   ....:     mosaic=True,
   ....:     bounds_by='union'
   ....: ) as src:
   ....:     src.where(src != 0).sel(band='blue').gw.imshow(robust=True, ax=ax)
   ....: 

In [26]: plt.tight_layout(pad=1)
_images/blue_union_plot.png

When multiple images have matching dates, the arrays are merged into one layer.

In [27]: filenames = [l8_224077_20200518_B2, l8_224078_20200518_B2]

In [28]: band_names = ['blue']

In [29]: time_names = ['t1', 't1']

In [30]: with gw.open(filenames, band_names=band_names, time_names=time_names) as src:
   ....:     print(src)
   ....: 
<xarray.DataArray (time: 1, band: 1, y: 1515, x: 2006)> Size: 24MB
dask.array<broadcast_to, shape=(1, 1, 1515, 2006), dtype=float64, chunksize=(1, 1, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) float64 16kB 6.94e+05 6.94e+05 ... 7.541e+05 7.542e+05
  * y        (y) float64 12kB -2.767e+06 -2.767e+06 ... -2.812e+06 -2.812e+06
  * time     (time) <U2 8B 't1'
  * band     (band) <U4 16B 'blue'
Attributes: (12/14)
    scales:              (1.0,)
    offsets:             (0.0,)
    nodatavals:          (nan,)
    _FillValue:          nan
    transform:           (30.0, 0.0, 694005.0, 0.0, -30.0, -2766615.0)
    crs:                 32621
    ...                  ...
    AREA_OR_POINT:       Point
    resampling:          nearest
    geometries:          [<POLYGON ((754185 -2812065, 754185 -2766615, 694005...
    filename:            ['LC08_L1TP_224077_20200518_20200518_01_RT_B2.TIF', ...
    _data_are_separate:  1
    _data_are_stacked:   1

Image mosaicking#

Mosaic the two subsets into a single xarray.DataArray. If the images in the mosaic list have the same CRS, no configuration is needed.

In [31]: with gw.open(
   ....:     [l8_224077_20200518_B2, l8_224078_20200518_B2],
   ....:     band_names=['b'],
   ....:     mosaic=True
   ....: ) as src:
   ....:     print(src)
   ....: 
<xarray.DataArray (band: 1, y: 1515, x: 2006)> Size: 24MB
dask.array<where, shape=(1, 1515, 2006), dtype=float64, chunksize=(1, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) float64 16kB 6.94e+05 6.94e+05 ... 7.541e+05 7.542e+05
  * y        (y) float64 12kB -2.767e+06 -2.767e+06 ... -2.812e+06 -2.812e+06
  * band     (band) <U1 4B 'b'
Attributes: (12/13)
    scales:              (1.0,)
    offsets:             (0.0,)
    nodatavals:          (nan,)
    _FillValue:          nan
    transform:           (30.0, 0.0, 694005.0, 0.0, -30.0, -2766615.0)
    crs:                 32621
    ...                  ...
    is_tiled:            1
    AREA_OR_POINT:       Point
    resampling:          nearest
    geometries:          [<POLYGON ((754185 -2812065, 754185 -2766615, 694005...
    _data_are_separate:  1
    _data_are_stacked:   0

If the images in the mosaic list have different CRSs, use a context manager to warp to a common grid.

Note

The two images in this example have the same CRS. The point here is just to illustrate the use of the configuration manager.

# Use a reference CRS
In [32]: with gw.config.update(ref_image=l8_224077_20200518_B2):
   ....:     with gw.open(
   ....:         [l8_224077_20200518_B2, l8_224078_20200518_B2],
   ....:         band_names=['b'],
   ....:         mosaic=True
   ....:     ) as src:
   ....:         print(src)
   ....: 
<xarray.DataArray (band: 1, y: 1515, x: 2006)> Size: 24MB
dask.array<where, shape=(1, 1515, 2006), dtype=float64, chunksize=(1, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) float64 16kB 6.94e+05 6.94e+05 ... 7.541e+05 7.542e+05
  * y        (y) float64 12kB -2.767e+06 -2.767e+06 ... -2.812e+06 -2.812e+06
  * band     (band) <U1 4B 'b'
Attributes: (12/13)
    scales:              (1.0,)
    offsets:             (0.0,)
    nodatavals:          (nan,)
    _FillValue:          nan
    transform:           (30.0, 0.0, 694005.0, 0.0, -30.0, -2766615.0)
    crs:                 32621
    ...                  ...
    is_tiled:            1
    AREA_OR_POINT:       Point
    resampling:          nearest
    geometries:          [<POLYGON ((754185 -2812065, 754185 -2766615, 694005...
    _data_are_separate:  1
    _data_are_stacked:   0

Setup a plot function#

In [33]: def plot(bounds_by, ref_image=None, cmap='viridis'):
   ....:     fig, ax = plt.subplots(figsize=(10, 7), dpi=200)
   ....:     with gw.config.update(ref_image=ref_image):
   ....:         with gw.open(
   ....:             [l8_224077_20200518_B4, l8_224078_20200518_B4],
   ....:             band_names=['nir'],
   ....:             chunks=256,
   ....:             mosaic=True,
   ....:             bounds_by=bounds_by,
   ....:             persist_filenames=True
   ....:         ) as srca:
   ....:             srca.where(srca != 0).sel(band='nir').gw.imshow(robust=True, cbar_kwargs={'label': 'DN'}, ax=ax)
   ....:             srca.gw.chunk_grid.plot(color='none', edgecolor='k', ls='-', lw=0.5, ax=ax)
   ....:             srca.gw.footprint_grid.plot(color='none', edgecolor='orange', lw=2, ax=ax)
   ....:             for row in srca.gw.footprint_grid.itertuples(index=False):
   ....:                 ax.scatter(
   ....:                     row.geometry.centroid.x,
   ....:                     row.geometry.centroid.y,
   ....:                     s=50, color='red', edgecolor='white', lw=1
   ....:                 )
   ....:                 ax.annotate(
   ....:                     row.footprint.replace('.TIF', ''),
   ....:                     (row.geometry.centroid.x, row.geometry.centroid.y),
   ....:                     color='black',
   ....:                     size=8,
   ....:                     ha='center',
   ....:                     va='center',
   ....:                     path_effects=[pe.withStroke(linewidth=1, foreground='white')]
   ....:                 )
   ....:             ax.set_ylim(
   ....:                 srca.gw.footprint_grid.total_bounds[1]-10,
   ....:                 srca.gw.footprint_grid.total_bounds[3]+10
   ....:             )
   ....:             ax.set_xlim(
   ....:                 srca.gw.footprint_grid.total_bounds[0]-10,
   ....:                 srca.gw.footprint_grid.total_bounds[2]+10
   ....:             )
   ....:     title = f'Image {bounds_by}' if bounds_by else str(Path(ref_image).name.split('.')[0]) + ' as reference'
   ....:     size = 12 if bounds_by else 8
   ....:     ax.set_title(title, size=size)
   ....: 

Mosaic by the union of images#

The two plots below illustrate how two images can be mosaicked. The orange grids highlight the image footprints while the black grids illustrate the xarray.DataArray chunks.

In [34]: plot('union')
_images/union_example.png
In [35]: plot('intersection')
_images/intersection_example.png
In [36]: plot(None, l8_224077_20200518_B4)
_images/ref1_example.png
In [37]: plot(None, l8_224078_20200518_B4)
_images/ref2_example.png

Writing DataArrays to file#

GeoWombat’s file writing can be accessed through the geowombat to_vrt(), to_raster(), and save() functions. These functions use rasterio.write() and Dask.array.store() as I/O backends. In the examples below, src is an xarray.DataArray with the necessary transform and coordinate reference system (CRS) information to write to an image file.

Note

Should I use geowombat.to_raster() or geowombat.save() when writing a raster file? First, a bit of background.

In the early days of geowombat development, direct computation calls using dask (more on that with geowombat.save()) were tested on large raster files (i.e., width and height on the order of tens of thousands). It was determined that the overhead of generating the dask task graph was too large and outweighed the actual computation. To address this, the geowombat.to_raster() method was designed to iterate over raster chunks/blocks using concurrent.futures, reading and computing each block when requested. This removed any large overhead but also negated the efficiency of dask as the underlying ‘delayed’ array. The geowombat.to_raster() can be used on data of any size, but comes with its own overhead. For example, when working with arrays that fit into memory, such as a standard satellite scene, dask (i.e., geowombat.save()) works quite well. To give an example, instead of slicing an xarray.DataArray chunk and writing/computing that chunk (i.e., geowombat.to_raster() approach), we can also compute the entire xarray.DataArray using dask (i.e., geowombat.save()) and let dask handle the concurrency. This is where geowombat.save() comes in to play. The geowombat.save() method (or also xarray.DataArray.gw.save()) submits the data to Dask.array.store() and each chunk is written to file using rasterio.

The recommended method to use for saving raster files is geowombat.save(). We welcome feedback for both methods, particularly if geowombat.save() is determined to be more efficient than geowombat.to_raster(), regardless of the data size.

Write to a raster file using Dask and geowombat.save()#

import geowombat as gw

with gw.open(l8_224077_20200518_B4, chunks=1024) as src:
    # Xarray drops attributes
    attrs = src.attrs.copy()
    # Apply operations on the DataArray
    src = (src * 10.0).assign_attrs(**attrs)
    # Write the data to a GeoTiff
    src.gw.save(
        'output.tif',
        num_workers=4  # these workers are used as Dask.compute(num_workers=num_workers)
    )

Write to a raster file using concurrent.futures and geowombat.to_raster()#

import geowombat as gw

with gw.open(l8_224077_20200518_B4, chunks=1024) as src:
    # Xarray drops attributes
    attrs = src.attrs.copy()
    # Apply operations on the DataArray
    src = (src * 10.0).assign_attrs(**attrs)
    # Write the data to a GeoTiff
    src.gw.to_raster(
        'output.tif',
        verbose=1,
        n_workers=4,    # number of process workers sent to ``concurrent.futures``
        n_threads=2,    # number of thread workers sent to ``dask.compute``
        n_chunks=200    # number of window chunks to send as concurrent futures
    )

Write to a VRT file#

The GDAL VRT file format is a nice way to save data to file as a lightweight pointer to data on disk. A VRT file is an XML file that contains information about the image file, or files, needed to transform and display data (e.g., in a GIS).

Note

GeoWombat saves data to a VRT by re-opening the DataArray file or files (using rasterio.vrt.WarpedVRT) and borrowing the DataArray attributes needed to correctly save the data. Therefore, because we cannot currently pass a DataArray directly to the rasterio VRT warper, any warping already applied using geowombat (e.g., with geowombat.config.update) would be duplicated when writing the data to a VRT.

import geowombat as gw

# Transform the data to lat/lon
with gw.config.update(ref_crs=4326):
    with gw.open(l8_224077_20200518_B4, chunks=1024) as src:
        # Write the data to a VRT
        src.gw.to_vrt('lat_lon_file.vrt')

See Distributed processing for more examples describing concurrent file writing with GeoWombat.