Opening rasters#

GeoWombat’s file opening is meant to mimic Xarray and Rasterio. That is, rasters are typically opened with a context manager using the function geowombat.open(). GeoWombat uses xarray.open_rasterio to load data into an xarray.DataArray. In GeoWombat, the data are always chunked, meaning the data are always loaded as Dask arrays. As with xarray.open_rasterio, the opened DataArrays always have at least 1 band.

Opening a single image#

Opening an image with default settings looks similar to xarray.open_rasterio and rasterio.open. geowombat.open() expects a file name (str or pathlib.Path).

In [1]: import geowombat as gw

In [2]: from geowombat.data import l8_224078_20200518

In [3]: with gw.open(l8_224078_20200518) as src:
   ...:     print(src)
   ...: 
<xarray.DataArray (band: 3, y: 1860, x: 2041)> Size: 23MB
dask.array<open_rasterio-60be97b66c7cb4ca025fbefed0ddb65f<this-array>, shape=(3, 1860, 2041), dtype=uint16, chunksize=(3, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * band     (band) int64 24B 1 2 3
  * x        (x) float64 16kB 7.174e+05 7.174e+05 ... 7.785e+05 7.786e+05
  * y        (y) float64 15kB -2.777e+06 -2.777e+06 ... -2.833e+06 -2.833e+06
Attributes: (12/13)
    transform:           (30.0, 0.0, 717345.0, 0.0, -30.0, -2776995.0)
    crs:                 32621
    res:                 (30.0, 30.0)
    is_tiled:            1
    nodatavals:          (nan, nan, nan)
    _FillValue:          nan
    ...                  ...
    offsets:             (0.0, 0.0, 0.0)
    filename:            /home/docs/checkouts/readthedocs.org/user_builds/geo...
    resampling:          nearest
    AREA_OR_POINT:       Area
    _data_are_separate:  0
    _data_are_stacked:   0

In the example above, src is an xarray.DataArray. Thus, printing the object will display the underlying dask.array.Array dimensions and chunks, the xarray.DataArray named coordinates, and the xarray.DataArray attributes.

Opening multiple bands as a stack#

Often, satellite bands will be stored in separate raster files. To open the files as one xarray.DataArray, specify a list instead of a file name.

In [4]: from geowombat.data import l8_224078_20200518_B2, l8_224078_20200518_B3, l8_224078_20200518_B4

In [5]: with gw.open([l8_224078_20200518_B2, l8_224078_20200518_B3, l8_224078_20200518_B4]) as src:
   ...:     print(src)
   ...: 
<xarray.DataArray (time: 3, band: 1, y: 1860, x: 2041)> Size: 23MB
dask.array<concatenate, shape=(3, 1, 1860, 2041), dtype=uint16, chunksize=(1, 1, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * band     (band) int64 8B 1
  * x        (x) float64 16kB 7.174e+05 7.174e+05 ... 7.785e+05 7.786e+05
  * y        (y) float64 15kB -2.777e+06 -2.777e+06 ... -2.833e+06 -2.833e+06
  * time     (time) int64 24B 1 2 3
Attributes: (12/13)
    transform:           (30.0, 0.0, 717345.0, 0.0, -30.0, -2776995.0)
    crs:                 32621
    res:                 (30.0, 30.0)
    is_tiled:            1
    nodatavals:          (nan,)
    _FillValue:          nan
    ...                  ...
    offsets:             (0.0,)
    filename:            ['LC08_L1TP_224078_20200518_20200518_01_RT_B2.TIF', ...
    resampling:          nearest
    AREA_OR_POINT:       Point
    _data_are_separate:  1
    _data_are_stacked:   1

By default, geowombat will stack multiple files by time. So, to stack multiple bands with the same timestamp, change the stack_dim keyword.

In [6]: from geowombat.data import l8_224078_20200518_B2, l8_224078_20200518_B3, l8_224078_20200518_B4

In [7]: with gw.open(
   ...:     [
   ...:         l8_224078_20200518_B2, l8_224078_20200518_B3, l8_224078_20200518_B4
   ...:     ],
   ...:     stack_dim='band'
   ...: ) as src:
   ...:     print(src)
   ...: 
<xarray.DataArray (band: 3, y: 1860, x: 2041)> Size: 23MB
dask.array<concatenate, shape=(3, 1860, 2041), dtype=uint16, chunksize=(1, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * band     (band) int64 24B 1 1 1
  * x        (x) float64 16kB 7.174e+05 7.174e+05 ... 7.785e+05 7.786e+05
  * y        (y) float64 15kB -2.777e+06 -2.777e+06 ... -2.833e+06 -2.833e+06
Attributes: (12/13)
    transform:           (30.0, 0.0, 717345.0, 0.0, -30.0, -2776995.0)
    crs:                 32621
    res:                 (30.0, 30.0)
    is_tiled:            1
    nodatavals:          (nan,)
    _FillValue:          nan
    ...                  ...
    offsets:             (0.0,)
    filename:            ['LC08_L1TP_224078_20200518_20200518_01_RT_B2.TIF', ...
    resampling:          nearest
    AREA_OR_POINT:       Point
    _data_are_separate:  1
    _data_are_stacked:   1

Note

If time names are not specified with stack_dim = ‘time’, geowombat will attempt to parse dates from the file names. This could incur some overhead when the file list is long. Therefore, it is good practice to specify the time names.

Overhead required to parse file names

with gw.open(long_file_list, stack_dim='time') as src:
    ...

No file parsing overhead

with gw.open(long_file_list, time_names=my_time_names, stack_dim='time') as src:
    ...

Opening multiple bands as a mosaic#

When a list of files are given, geowombat will stack the data by default. To mosaic multiple files into the same band coordinate, use the mosaic keyword.

In [8]: from geowombat.data import l8_224077_20200518_B2, l8_224078_20200518_B2

In [9]: with gw.open(
   ...:     [
   ...:         l8_224077_20200518_B2, l8_224078_20200518_B2
   ...:     ],
   ...:     mosaic=True
   ...: ) as src:
   ...:     print(src)
   ...: 
<xarray.DataArray (band: 1, y: 1515, x: 2006)> Size: 24MB
dask.array<where, shape=(1, 1515, 2006), dtype=float64, chunksize=(1, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * band     (band) int64 8B 1
  * x        (x) float64 16kB 6.94e+05 6.94e+05 ... 7.541e+05 7.542e+05
  * y        (y) float64 12kB -2.767e+06 -2.767e+06 ... -2.812e+06 -2.812e+06
Attributes: (12/13)
    transform:           (30.0, 0.0, 694005.0, 0.0, -30.0, -2766615.0)
    crs:                 32621
    res:                 (30.0, 30.0)
    is_tiled:            1
    nodatavals:          (nan,)
    _FillValue:          nan
    ...                  ...
    offsets:             (0.0,)
    AREA_OR_POINT:       Point
    resampling:          nearest
    geometries:          [<POLYGON ((754185 -2812065, 754185 -2766615, 694005...
    _data_are_separate:  1
    _data_are_stacked:   0

See Raster I/O for more examples illustrating file opening.