Data extraction#
import geowombat as gw
from geowombat.data import rgbn
Subsetting rasters#
Either a rasterio.window.Window
object or tuple can be used with geowombat.open()
.
Slice a subset using a rasterio.window.Window.
from rasterio.windows import Window
w = Window(row_off=0, col_off=0, height=100, width=100)
bounds = (793475.76, 2049033.03, 794222.03, 2049527.24)
with gw.open(
rgbn,
band_names=['blue', 'green', 'red'],
num_workers=8,
indexes=[1, 2, 3],
window=w,
out_dtype='float32'
) as src:
print(src)
Slice a subset using a tuple of bounded coordinates.
with gw.open(
rgbn,
band_names=['green', 'red', 'nir'],
num_workers=8,
indexes=[2, 3, 4],
bounds=bounds,
out_dtype='float32'
) as src:
print(src)
The configuration manager provides an alternative method to subset rasters. See Configuration manager for more details.
with gw.config.update(ref_bounds=bounds):
with gw.open(rgbn) as src:
print(src)
By default, the subset will be returned by the upper left coordinates of the bounds, potentially shifting cell alignment with the reference raster. To subset a raster and align it to the same grid, use the ref_tar keyword.
with gw.config.update(ref_bounds=bounds, ref_tar=rgbn):
with gw.open(rgbn) as src:
print(src)
Clipping to bounds#
GeoWombat’s geowombat.clip_by_polygon()
is an alternative method to geowombat.config.update
. The
geowombat.clip_by_polygon()
method limits the bounds of the image to match a polygon, where the polygon
can be a geopandas.GeoDataFrame,
or a path to a file readable with geopandas.read_file.
You can augment the clip by using the argument query
on the polygon attributes, and if multiple polygons
are present you can use mask_data
to fill nans
where polygons are not present, or expand the clip
array bounds by setting expand_by=<n pixels>
on each side.
import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
import geopandas as gpd
polys = gpd.read_file(l8_224078_20200518_polygons)
with gw.open(l8_224078_20200518) as src:
print(src)
clipped = src.gw.clip_by_polygon(
df,
query="name == water",
mask_data=True,
expand_by=1
)
print(clipped)
Extracting data with coordinates#
To extract values at a coordinate pair, translate the coordinates into array indices.
In [1]: import geowombat as gw
In [2]: from geowombat.data import l8_224078_20200518
# Coordinates in map projection units
In [3]: y, x = -2823031.15, 761592.60
In [4]: with gw.open(l8_224078_20200518) as src:
...: j, i = gw.coords_to_indices(x, y, src)
...: data = src[:, i, j].data.compute()
...:
In [5]: print(data.flatten())
[7448 6882 6090]
A latitude/longitude pair can be extracted after converting to the map projection.
In [6]: import geowombat as gw
In [7]: from geowombat.data import l8_224078_20200518
# Coordinates in latitude/longitude
In [8]: lat, lon = -25.50142964, -54.39756038
In [9]: with gw.open(l8_224078_20200518) as src:
...: x, y = gw.lonlat_to_xy(lon, lat, src)
...: j, i = gw.coords_to_indices(x, y, src)
...: data = src[:, i, j].data.compute()
...:
In [10]: print(data.flatten())
[7448 6882 6090]
Extracting data with point geometry#
In the example below, l8_224078_20200518_points
is a GeoPackage of point
locations, and the output df
is a geopandas.GeoDataFrame.
To extract the raster values at the point locations, use geowombat.extract()
.
In [11]: import geowombat as gw
In [12]: from geowombat.data import l8_224078_20200518, l8_224078_20200518_points
In [13]: with gw.open(l8_224078_20200518) as src:
....: df = src.gw.extract(l8_224078_20200518_points)
....:
In [14]: print(df)
name geometry id 1 2 3
0 water POINT (741522.314 -2811204.698) 0 7966 7326 6254
1 crop POINT (736140.845 -2806478.364) 1 8030 7490 8080
2 tree POINT (745919.508 -2805168.579) 2 7561 6874 6106
3 developed POINT (739056.735 -2811710.662) 3 8302 8202 8111
4 water POINT (737802.183 -2818016.412) 4 8277 7982 7341
5 tree POINT (759209.443 -2828566.230) 5 7398 6711 6007
Note
The line df = src.gw.extract(l8_224078_20200518_points)
could also have been written as
df = gw.extract(src, l8_224078_20200518_points)
.
In the previous example, the point vector had a CRS that matched the raster (i.e., EPSG=32621, or UTM zone 21N).
If the CRS had not matched, the geowombat.extract()
function would have transformed the CRS on-the-fly.
In [15]: import geowombat as gw
In [16]: from geowombat.data import l8_224078_20200518, l8_224078_20200518_points
In [17]: import geopandas as gpd
In [18]: point_df = gpd.read_file(l8_224078_20200518_points)
In [19]: print(point_df.crs)
EPSG:32621
# Transform the CRS to WGS84 lat/lon
In [20]: point_df = point_df.to_crs('epsg:4326')
In [21]: print(point_df.crs)
epsg:4326
In [22]: with gw.open(l8_224078_20200518) as src:
....: df = src.gw.extract(point_df)
....:
In [23]: print(df)
name geometry id 1 2 3
0 water POINT (741522.314 -2811204.698) 0 7966 7326 6254
1 crop POINT (736140.845 -2806478.364) 1 8030 7490 8080
2 tree POINT (745919.508 -2805168.579) 2 7561 6874 6106
3 developed POINT (739056.735 -2811710.662) 3 8302 8202 8111
4 water POINT (737802.183 -2818016.412) 4 8277 7982 7341
5 tree POINT (759209.443 -2828566.230) 5 7398 6711 6007
Set the data band names.
In [24]: import geowombat as gw
In [25]: from geowombat.data import l8_224078_20200518, l8_224078_20200518_points
In [26]: with gw.config.update(sensor='bgr'):
....: with gw.open(l8_224078_20200518) as src:
....: df = src.gw.extract(
....: l8_224078_20200518_points,
....: band_names=src.band.values.tolist()
....: )
....:
In [27]: print(df)
name geometry id blue green red
0 water POINT (741522.314 -2811204.698) 0 7966 7326 6254
1 crop POINT (736140.845 -2806478.364) 1 8030 7490 8080
2 tree POINT (745919.508 -2805168.579) 2 7561 6874 6106
3 developed POINT (739056.735 -2811710.662) 3 8302 8202 8111
4 water POINT (737802.183 -2818016.412) 4 8277 7982 7341
5 tree POINT (759209.443 -2828566.230) 5 7398 6711 6007
Extracting data with polygon geometry#
To extract values within polygons, use the same geowombat.extract()
function.
In [28]: from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
In [29]: with gw.config.update(sensor='bgr'):
....: with gw.open(l8_224078_20200518) as src:
....: df = src.gw.extract(
....: l8_224078_20200518_polygons,
....: band_names=src.band.values.tolist()
....: )
....:
In [30]: print(df)
id point geometry name blue green red
0 0 0 POINT (737559.502 -2795247.772) water 7994 7423 6272
1 0 1 POINT (737589.502 -2795247.772) water 8017 7428 6292
2 0 2 POINT (737619.502 -2795247.772) water 8008 7446 6292
3 0 3 POINT (737649.502 -2795247.772) water 8008 7412 6291
4 0 4 POINT (737679.502 -2795247.772) water 8018 7398 6250
.. .. ... ... ... ... ... ...
667 3 667 POINT (739038.667 -2811819.677) developed 8567 8564 8447
668 3 668 POINT (739068.667 -2811819.677) developed 8099 7676 7332
669 3 669 POINT (739098.667 -2811819.677) developed 10151 9651 10153
670 3 670 POINT (739128.667 -2811819.677) developed 8065 7735 7501
671 3 671 POINT (739158.667 -2811819.677) developed 9343 8987 9247
[672 rows x 7 columns]