Data extraction#

import geowombat as gw
from geowombat.data import rgbn

Subsetting rasters#

Either a rasterio.window.Window object or tuple can be used with geowombat.open().

Slice a subset using a rasterio.window.Window.

from rasterio.windows import Window
w = Window(row_off=0, col_off=0, height=100, width=100)

bounds = (793475.76, 2049033.03, 794222.03, 2049527.24)

with gw.open(
    rgbn,
    band_names=['blue', 'green', 'red'],
    num_workers=8,
    indexes=[1, 2, 3],
    window=w,
    out_dtype='float32'
) as src:
    print(src)

Slice a subset using a tuple of bounded coordinates.

with gw.open(
    rgbn,
    band_names=['green', 'red', 'nir'],
    num_workers=8,
    indexes=[2, 3, 4],
    bounds=bounds,
    out_dtype='float32'
) as src:
    print(src)

The configuration manager provides an alternative method to subset rasters. See Configuration manager for more details.

with gw.config.update(ref_bounds=bounds):

    with gw.open(rgbn) as src:
        print(src)

By default, the subset will be returned by the upper left coordinates of the bounds, potentially shifting cell alignment with the reference raster. To subset a raster and align it to the same grid, use the ref_tar keyword.

with gw.config.update(ref_bounds=bounds, ref_tar=rgbn):

    with gw.open(rgbn) as src:
        print(src)

Clipping to bounds#

GeoWombat’s geowombat.clip_by_polygon() is an alternative method to geowombat.config.update. The geowombat.clip_by_polygon() method limits the bounds of the image to match a polygon, where the polygon can be a geopandas.GeoDataFrame, or a path to a file readable with geopandas.read_file. You can augment the clip by using the argument query on the polygon attributes, and if multiple polygons are present you can use mask_data to fill nans where polygons are not present, or expand the clip array bounds by setting expand_by=<n pixels> on each side.

import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
import geopandas as gpd

polys = gpd.read_file(l8_224078_20200518_polygons)

with gw.open(l8_224078_20200518) as src:
    print(src)
    clipped = src.gw.clip_by_polygon(
        df,
        query="name == water",
        mask_data=True,
        expand_by=1
    )
    print(clipped)

Extracting data with coordinates#

To extract values at a coordinate pair, translate the coordinates into array indices.

In [1]: import geowombat as gw

In [2]: from geowombat.data import l8_224078_20200518

# Coordinates in map projection units
In [3]: y, x = -2823031.15, 761592.60

In [4]: with gw.open(l8_224078_20200518) as src:
   ...:     j, i = gw.coords_to_indices(x, y, src)
   ...:     data = src[:, i, j].data.compute()
   ...: 

In [5]: print(data.flatten())
[7448 6882 6090]

A latitude/longitude pair can be extracted after converting to the map projection.

In [6]: import geowombat as gw

In [7]: from geowombat.data import l8_224078_20200518

# Coordinates in latitude/longitude
In [8]: lat, lon = -25.50142964, -54.39756038

In [9]: with gw.open(l8_224078_20200518) as src:
   ...:     x, y = gw.lonlat_to_xy(lon, lat, src)
   ...:     j, i = gw.coords_to_indices(x, y, src)
   ...:     data = src[:, i, j].data.compute()
   ...: 

In [10]: print(data.flatten())
[7448 6882 6090]

Extracting data with point geometry#

In the example below, l8_224078_20200518_points is a GeoPackage of point locations, and the output df is a geopandas.GeoDataFrame. To extract the raster values at the point locations, use geowombat.extract().

In [11]: import geowombat as gw

In [12]: from geowombat.data import l8_224078_20200518, l8_224078_20200518_points

In [13]: with gw.open(l8_224078_20200518) as src:
   ....:     df = src.gw.extract(l8_224078_20200518_points)
   ....: 

In [14]: print(df)
        name                         geometry  id     1     2     3
0      water  POINT (741522.314 -2811204.698)   0  7966  7326  6254
1       crop  POINT (736140.845 -2806478.364)   1  8030  7490  8080
2       tree  POINT (745919.508 -2805168.579)   2  7561  6874  6106
3  developed  POINT (739056.735 -2811710.662)   3  8302  8202  8111
4      water  POINT (737802.183 -2818016.412)   4  8277  7982  7341
5       tree  POINT (759209.443 -2828566.230)   5  7398  6711  6007

Note

The line df = src.gw.extract(l8_224078_20200518_points) could also have been written as df = gw.extract(src, l8_224078_20200518_points).

In the previous example, the point vector had a CRS that matched the raster (i.e., EPSG=32621, or UTM zone 21N). If the CRS had not matched, the geowombat.extract() function would have transformed the CRS on-the-fly.

In [15]: import geowombat as gw

In [16]: from geowombat.data import l8_224078_20200518, l8_224078_20200518_points

In [17]: import geopandas as gpd

In [18]: point_df = gpd.read_file(l8_224078_20200518_points)

In [19]: print(point_df.crs)
EPSG:32621

# Transform the CRS to WGS84 lat/lon
In [20]: point_df = point_df.to_crs('epsg:4326')

In [21]: print(point_df.crs)
epsg:4326

In [22]: with gw.open(l8_224078_20200518) as src:
   ....:     df = src.gw.extract(point_df)
   ....: 

In [23]: print(df)
        name                         geometry  id     1     2     3
0      water  POINT (741522.314 -2811204.698)   0  7966  7326  6254
1       crop  POINT (736140.845 -2806478.364)   1  8030  7490  8080
2       tree  POINT (745919.508 -2805168.579)   2  7561  6874  6106
3  developed  POINT (739056.735 -2811710.662)   3  8302  8202  8111
4      water  POINT (737802.183 -2818016.412)   4  8277  7982  7341
5       tree  POINT (759209.443 -2828566.230)   5  7398  6711  6007

Set the data band names.

In [24]: import geowombat as gw

In [25]: from geowombat.data import l8_224078_20200518, l8_224078_20200518_points

In [26]: with gw.config.update(sensor='bgr'):
   ....:     with gw.open(l8_224078_20200518) as src:
   ....:         df = src.gw.extract(
   ....:             l8_224078_20200518_points,
   ....:             band_names=src.band.values.tolist()
   ....:         )
   ....: 

In [27]: print(df)
        name                         geometry  id  blue  green   red
0      water  POINT (741522.314 -2811204.698)   0  7966   7326  6254
1       crop  POINT (736140.845 -2806478.364)   1  8030   7490  8080
2       tree  POINT (745919.508 -2805168.579)   2  7561   6874  6106
3  developed  POINT (739056.735 -2811710.662)   3  8302   8202  8111
4      water  POINT (737802.183 -2818016.412)   4  8277   7982  7341
5       tree  POINT (759209.443 -2828566.230)   5  7398   6711  6007

Extracting data with polygon geometry#

To extract values within polygons, use the same geowombat.extract() function.

In [28]: from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons

In [29]: with gw.config.update(sensor='bgr'):
   ....:     with gw.open(l8_224078_20200518) as src:
   ....:         df = src.gw.extract(
   ....:             l8_224078_20200518_polygons,
   ....:             band_names=src.band.values.tolist()
   ....:         )
   ....: 

In [30]: print(df)
     id  point                         geometry       name   blue  green    red
0     0      0  POINT (737559.502 -2795247.772)      water   7994   7423   6272
1     0      1  POINT (737589.502 -2795247.772)      water   8017   7428   6292
2     0      2  POINT (737619.502 -2795247.772)      water   8008   7446   6292
3     0      3  POINT (737649.502 -2795247.772)      water   8008   7412   6291
4     0      4  POINT (737679.502 -2795247.772)      water   8018   7398   6250
..   ..    ...                              ...        ...    ...    ...    ...
667   3    667  POINT (739038.667 -2811819.677)  developed   8567   8564   8447
668   3    668  POINT (739068.667 -2811819.677)  developed   8099   7676   7332
669   3    669  POINT (739098.667 -2811819.677)  developed  10151   9651  10153
670   3    670  POINT (739128.667 -2811819.677)  developed   8065   7735   7501
671   3    671  POINT (739158.667 -2811819.677)  developed   9343   8987   9247

[672 rows x 7 columns]