Object detection#
GeoWombat ships object detectors that operate on georeferenced rasters
and return GeoDataFrame outputs in the source CRS. Everything stays
inside the familiar with gw.open(...) as src: / src.gw.<method>
pattern, with module-level wrappers in gw.detect that mirror
fit / predict / fit_predict from gw.ml for classification.
See also
Live, executed companion notebook — Object Detection with geowombat walks the full real-world workflow on NAIP aerial imagery with OpenStreetMap building footprints: dataset construction, pretrained inference, fine-tuning, accuracy comparison, the QGIS review export, and SAM-based polygon refinement. Outputs (plots, metrics, training logs) are baked into the notebook so you can scan the full pipeline without running anything yourself.
Three detectors are included:
| Detector | Backend | Notes |
|---|---|---|
| | Ultralytics YOLO | Axis-aligned + oriented boxes. DOTA-v1 OBB weights recommended for aerial / satellite imagery. License: AGPL-3.0. |
| | TorchVision Faster R-CNN / RetinaNet | Optional TorchGeo pretrained weights (e.g. xView). |
| | Segment Anything | Refines bounding boxes to polygon masks. |
Recommended setup for aerial / satellite imagery
Pair DOTA-v1 pretrained weights (Ultralytics yolov8*-obb.pt)
with oriented bounding boxes (oriented=True). DOTA-v1 is the
standard aerial OBB benchmark — 15 classes (planes, ships, vehicles,
storage tanks, sports fields, …) — and its rotated boxes capture
objects at arbitrary heading, which is the norm in overhead imagery.
See YOLO weight families for weight choices and Convert polygon
labels to boxes below for how to generate matching training labels.
The default COCO weights (yolov8n.pt) used in the introductory
snippets are there to exercise the plumbing on the bundled Landsat
scene — they are intentionally not the right model for real
aerial work.
Setup#
Install the detection extras:
pip install "geowombat[detect]"
For SAM refinement:
pip install "geowombat[sam]"
The detector classes load their model state lazily inside __init__,
so importing geowombat.detect itself stays light — you only pay
for torch / ultralytics when you actually instantiate a detector.
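The deferred-import pattern described above can be sketched as follows. This is only an illustration of the idea, not geowombat's code: `LazyDetector` and its `backend_module` argument are hypothetical, and the standard-library `json` module stands in for a heavy dependency such as torch.

```python
import importlib
import sys


class LazyDetector:
    """Hypothetical detector that imports its backend only when instantiated."""

    def __init__(self, backend_module: str):
        # The import happens here, not at module import time, so merely
        # importing the module that defines this class stays cheap.
        self.backend = importlib.import_module(backend_module)


# `json` is a stand-in; a real detector would name torch or ultralytics.
det = LazyDetector("json")
print(det.backend.__name__)
print("json" in sys.modules)
```

The cost of the backend import is paid once, at the first instantiation, and later instances reuse the already-loaded module.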
Public API at a glance#
``.gw`` accessor (use these from inside ``with gw.open(…) as src:``)
src.gw.detect(detector, ...): tiled, georeferenced inference.
src.gw.to_yolo_dataset(labels, class_col=..., out_dir=...): write a YOLO training corpus on disk.
Module-level (in ``geowombat.detect``)
predict(src, detector, **kwargs): functional form of the accessor.
fit(detector, dataset_yaml, **kwargs): fine-tune a detector on a YOLO dataset.
fit_predict(src, detector, labels, class_col, out_dir, ...): end-to-end pipeline (build dataset → fine-tune → predict).
build_dataset(...): function form of src.gw.to_yolo_dataset (alias for build_yolo_dataset).
boxes_from_polygons(gdf, oriented=False): polygon labels to axis-aligned or oriented bounding boxes.
detection_accuracy(predictions, truth, class_col, iou_thresholds): per-class precision / recall / F1 / AP plus a review-ready GeoDataFrame.
export_for_review / recompute_from_review: QGIS review round-trip via GeoPackage.
plot_detections(src, predictions, truth, ax=...): matplotlib rendering colored by TP / FP / FN.
Examples below use the bundled Landsat 8 test scene and label polygons so they are self-contained — no downloads required:
import warnings
warnings.filterwarnings('ignore')
import geopandas as gpd
import matplotlib.pyplot as plt
import geowombat as gw
from geowombat.data import (
l8_224078_20200518,
l8_224078_20200518_polygons,
)
from geowombat.detect import (
YOLODetector,
boxes_from_polygons,
build_dataset,
detection_accuracy,
fit_predict,
plot_detections,
predict,
)
# The bundled polygon set has a `name` column with land-cover classes.
# Detection treats it as a generic class column; we rename for clarity.
labels = gpd.read_file(l8_224078_20200518_polygons)
labels['class_name'] = labels['name']
print(sorted(labels['class_name'].unique()))
# ['crop', 'developed', 'tree', 'water']
Run a pretrained detector#
The simplest workflow: open the raster, hand a detector instance to
src.gw.detect. The result is a GeoDataFrame in the raster’s CRS.
# Load the model once — weights file is auto-downloaded on first use.
det = YOLODetector(weights='yolov8n.pt')
with gw.config.update(sensor='bgr', ref_res=300):
    with gw.open(l8_224078_20200518, nodata=0) as src:
        preds = src.gw.detect(
            det,
            tile_size=320,       # tile size for inference
            overlap=0.0,         # overlap between tiles (0–0.9)
            conf=0.05,           # keep detections above this confidence
            scale=(0, 10000),    # rescale pixel values to 0–255 (skip for 8-bit input)
        )
print(f'{len(preds)} detections')
print(preds.columns.tolist())
# ['geometry', 'class_id', 'class_name', 'score', 'tile_id']
Note
The bundled scene is Landsat 8 surface-reflectance, not aerial imagery, and the YOLO weights here are pretrained on COCO. Expect few — or no — meaningful detections; this example is exercising the plumbing, not producing useful labels. The fine-tuning section below shows the realistic flow.
The three call shapes below are equivalent — pick whichever reads best in your code:
# 1. Accessor (recommended inside `with gw.open(...) as src:`)
preds = src.gw.detect(det, conf=0.05, scale=(0, 10000))
# 2. Module-level function (parallels gw.ml.predict)
preds = predict(src, det, conf=0.05, scale=(0, 10000))
# 3. Calling the detector directly
preds = det.predict(src, conf=0.05, scale=(0, 10000))
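The scale=(0, 10000) argument in these calls rescales pixel values to 0–255. A minimal sketch of such a linear stretch follows, assuming that values outside the range are clipped and the result is rounded to the nearest integer. Both behaviors are assumptions, and `stretch_to_uint8` is a hypothetical helper rather than a geowombat function.

```python
def stretch_to_uint8(value: float, lo: float, hi: float) -> int:
    """Linearly map [lo, hi] onto [0, 255], clipping values outside the range."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clipped = min(max(value, lo), hi)
    return int(round((clipped - lo) / (hi - lo) * 255))


# Surface-reflectance style values, stretched as with scale=(0, 10000).
for dn in (0, 2500, 10000, 12000):
    print(dn, stretch_to_uint8(dn, 0, 10000))
```

Choosing a narrower range than the data actually spans saturates bright pixels, while a much wider range compresses the image into a few dark grey levels.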
Sensor config drives band indices#
YOLO and TorchGeo detectors consume an RGB image per tile. When
gw.config.update(sensor=...) is active, src.gw.detect reads
src.band.values and picks the R / G / B triplet automatically — no
need to pass band_indices per call. Explicit band_indices=[...]
still wins.
with gw.config.update(sensor='bgr', ref_res=300):
    with gw.open(l8_224078_20200518, nodata=0) as src:
        print(src.band.values.tolist())  # ['blue', 'green', 'red']
        preds = src.gw.detect(det, conf=0.05, scale=(0, 10000))
        # band_indices automatically resolved to [2, 1, 0]
If your raster has unnamed bands, src.gw.detect falls back to
[0, 1, 2] for 3+-band rasters or broadcasts band 0 across RGB for
single-band rasters.
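The resolution rules above can be sketched in plain Python. `resolve_rgb_indices` is a hypothetical helper written for illustration, not geowombat's implementation. The text does not say what happens with two-band rasters, so raising an error in that case is an assumption.

```python
from typing import List, Sequence


def resolve_rgb_indices(band_names: Sequence) -> List[int]:
    """Hypothetical helper mirroring the fallback rules described above."""
    names = [str(name).lower() for name in band_names]
    if all(color in names for color in ("red", "green", "blue")):
        # Named bands: look up R, G, B positions regardless of storage order.
        return [names.index("red"), names.index("green"), names.index("blue")]
    if len(names) >= 3:
        # Unnamed bands: assume the first three are R, G, B.
        return [0, 1, 2]
    if len(names) == 1:
        # Single band: broadcast it across all three channels.
        return [0, 0, 0]
    # Not covered by the documentation; raising here is an assumption.
    raise ValueError("cannot derive an RGB triplet from this band layout")


print(resolve_rgb_indices(["blue", "green", "red"]))  # [2, 1, 0]
print(resolve_rgb_indices([1, 2, 3, 4]))              # [0, 1, 2]
print(resolve_rgb_indices([1]))                       # [0, 0, 0]
```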
Convert polygon labels to boxes#
Detection works on bounding boxes. boxes_from_polygons replaces
polygon geometries with either axis-aligned envelopes (AABB) or
minimum rotated rectangles (OBB):
AABB (oriented=False, default): sides parallel to the image axes. Use for objects that line up with the grid: buildings in nadir aerial imagery, parking-lot cars, parcels.
OBB (oriented=True): rotated rectangles. Recommended for most aerial / satellite work, because real-world objects appear at arbitrary heading. Pair OBB labels with DOTA-v1 pretrained weights (yolov8*-obb.pt); see YOLO weight families.
aabb = boxes_from_polygons(labels, oriented=False)
obb = boxes_from_polygons(labels, oriented=True)
print(aabb['_box_kind'].unique(), obb['_box_kind'].unique())
# ['aabb'] ['obb']
You don’t have to call this yourself — build_dataset and
src.gw.to_yolo_dataset do it internally when you pass
oriented=True.
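To see why oriented boxes matter for rotated objects, the following sketch compares the area of an axis-aligned envelope with the area of the object itself. It uses only geometry, not geowombat, and the 10 x 2 footprint is an invented example.

```python
import math


def aabb_area_of_rotated_rect(width: float, height: float, angle_deg: float) -> float:
    """Area of the axis-aligned envelope around a rectangle rotated by angle_deg."""
    theta = math.radians(angle_deg)
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    env_w = width * c + height * s   # horizontal extent after rotation
    env_h = width * s + height * c   # vertical extent after rotation
    return env_w * env_h


# A 10 x 2 object (for example an elongated, ship-like footprint) has area 20,
# which is also the area of its tight oriented box at any heading.
obb_area = 10 * 2
for angle in (0, 15, 45):
    aabb = aabb_area_of_rotated_rect(10, 2, angle)
    print(f"{angle:>2} deg: AABB area {aabb:6.2f} vs OBB area {obb_area}")
```

At a 45 degree heading the axis-aligned envelope covers 72 square units, 3.6 times the object, so most of the box is background. An oriented box stays at 20 regardless of heading.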
Digitizing polygons for high-quality OBB labels#
oriented=True uses shapely.minimum_rotated_rectangle under the
hood, which finds the smallest rotated rectangle enclosing the
polygon’s extreme points. The OBB is therefore only as good as the
polygon you feed it:
Trace the object tightly, especially along its long axis. A rectangle traced around a ship’s hull yields a clean OBB; a loose blob around the same ship yields a rectangle rotated by the blob’s noise, not the ship’s heading.
Prefer 4–8 vertex polygons that follow the object’s outline. Extra vertices off the silhouette pull the minimum rotated rectangle off-axis.
In QGIS, enable snapping and use the “Add Polygon Feature” tool on top of an orthophoto basemap; or digitize a rectangle directly with the “Rectangles from Center and a Point” / “Rectangles from 3 Points” tools when objects are near-rectangular.
For large label sets, generate loose polygons cheaply (manual or scripted) and refine them with SAMRefiner before passing to boxes_from_polygons(oriented=True). SAM masks hug the object outline, so the resulting OBB tracks the true heading.
Build a YOLO training dataset#
src.gw.to_yolo_dataset tiles the raster + label GeoDataFrame into
an Ultralytics-layout directory on disk:
out_dir/
    data.yaml
    images/{train,val}/tile_r####_c####.jpg
    labels/{train,val}/tile_r####_c####.txt
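Each file under labels/ holds one line per box. In the Ultralytics text format, an axis-aligned box is written as a class index followed by a normalized center, width, and height, and an oriented box as a class index followed by four normalized corner points. The sketch below formats such lines from pixel coordinates within a tile. The helper names and the six-decimal precision are assumptions for illustration, not geowombat's writer.

```python
from typing import Sequence, Tuple


def aabb_label(class_id: int, xmin: float, ymin: float,
               xmax: float, ymax: float, tile_size: int) -> str:
    """Format one axis-aligned box as 'class cx cy w h', normalized to 0..1."""
    cx = (xmin + xmax) / 2 / tile_size
    cy = (ymin + ymax) / 2 / tile_size
    w = (xmax - xmin) / tile_size
    h = (ymax - ymin) / tile_size
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def obb_label(class_id: int, corners: Sequence[Tuple[float, float]],
              tile_size: int) -> str:
    """Format one oriented box as 'class x1 y1 x2 y2 x3 y3 x4 y4', normalized."""
    if len(corners) != 4:
        raise ValueError("an oriented box needs exactly four corners")
    coords = " ".join(f"{x / tile_size:.6f} {y / tile_size:.6f}" for x, y in corners)
    return f"{class_id} {coords}"


print(aabb_label(1, 32, 48, 96, 80, tile_size=128))
print(obb_label(0, [(0, 0), (64, 0), (64, 32), (0, 32)], tile_size=128))
```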
from pathlib import Path
import tempfile
with tempfile.TemporaryDirectory() as td:
    out_dir = Path(td) / 'yolo_lc'
    with gw.config.update(sensor='bgr', ref_res=300):
        with gw.open(l8_224078_20200518, nodata=0) as src:
            info = src.gw.to_yolo_dataset(
                labels,                   # vector labels
                class_col='class_name',   # column with the class name
                out_dir=out_dir,          # where to write the dataset
                tile_size=128,            # tile size in pixels
                overlap=0.0,              # overlap between tiles (0–0.9)
                val_split=0.25,           # fraction of tiles used for validation
                min_box_pixels=2,         # drop boxes smaller than this
                scale=(0, 10000),         # rescale pixel values to 0–255
                background_ratio=0.0,     # keep some empty tiles as negatives (0 = none)
            )
    print(info)
    # {'out_dir': '...', 'classes': ['crop','developed','tree','water'],
    #  'n_train': 3, 'n_val': 1, 'n_boxes': 4, ...}
Equivalent module-level form:
with gw.open(l8_224078_20200518, nodata=0) as src:
    info = build_dataset(
        src, labels, class_col='class_name',
        out_dir=out_dir, tile_size=128,
        scale=(0, 10000), min_box_pixels=2,
    )
Key parameters:
tile_size: square tile edge in pixels. Match this to your detector’s training image size.
overlap: fractional overlap between adjacent tiles (0..0.9). Useful for detection because objects on tile seams otherwise get cut.
min_box_pixels: drop boxes smaller than this after tile clipping.
background_ratio: fraction of empty tiles (0..1) to keep as hard negatives. 0 drops them all.
scale=(lo, hi): linear stretch applied before writing 8-bit imagery. Required for non-uint8 rasters (Landsat / Sentinel DN).
oriented=True: write OBB labels (8 corner coords per box).
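How tile_size and overlap interact can be sketched with a simple sliding-window grid. This is an illustration under stated assumptions, not geowombat's tiling code: `tile_origins` is a hypothetical helper, the stride is taken as tile_size * (1 - overlap), and partial tiles at the raster edge are ignored for brevity.

```python
from typing import List, Tuple


def tile_origins(height: int, width: int, tile_size: int,
                 overlap: float = 0.0) -> List[Tuple[int, int]]:
    """Return (row, col) upper-left corners for a sliding-window tile grid."""
    if not 0.0 <= overlap <= 0.9:
        raise ValueError("overlap must be within 0..0.9")
    # Adjacent tiles share overlap * tile_size pixels, so the window
    # advances by the remainder each step (at least one pixel).
    stride = max(1, int(round(tile_size * (1.0 - overlap))))
    rows = range(0, max(height - tile_size, 0) + 1, stride)
    cols = range(0, max(width - tile_size, 0) + 1, stride)
    return [(r, c) for r in rows for c in cols]


# A 256 x 256 raster with 128-pixel tiles.
print(len(tile_origins(256, 256, 128, overlap=0.0)))  # 4 tiles
print(len(tile_origins(256, 256, 128, overlap=0.5)))  # 9 tiles
```

With no overlap, an object straddling the seam at row 128 is split between two tiles. With 50% overlap, the extra tile starting at row 64 contains it whole, at the cost of more tiles to write and train on.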
End-to-end: build, fine-tune, predict#
fit_predict does all three in one call, matching the
fit_predict shape from the classification API. Useful for
notebook-style exploration:
with tempfile.TemporaryDirectory() as td:
    det = YOLODetector(weights='yolov8n.pt')
    with gw.config.update(sensor='bgr', ref_res=300):
        with gw.open(l8_224078_20200518, nodata=0) as src:
            preds, summary = fit_predict(
                src,
                det,
                labels,
                class_col='class_name',
                out_dir=Path(td) / 'ds',         # where the training dataset goes
                tile_size=128,                   # tile size (training + inference)
                overlap=0.0,                     # tile overlap
                epochs=1,                        # training epochs (1 here for a quick demo)
                min_box_pixels=2,                # drop boxes smaller than this
                scale=(0, 10000),                # rescale pixel values to 0–255
                val_split=0.5,                   # bundled set is tiny, so keep half for validation
                seed=42,                         # reproducible split
                predict_kwargs={'conf': 0.05},   # passed through to inference
            )
print(summary['classes'], summary['n_boxes'], 'training boxes')
print(len(preds), 'predictions')
Note
The bundled label set has only 4 polygons — far too few for a real
training run. Use this snippet to confirm the pipeline works, then
point at a larger dataset. With ~100s of training boxes, epochs=50
and yolov8s.pt/yolov8m.pt are reasonable starting points.
For finer-grained control — for example, to inspect or save the
fine-tuned weights between training and inference — call the steps
separately. Note fit writes Ultralytics runs/ output under
the current working directory.
from geowombat.detect import fit
with tempfile.TemporaryDirectory() as td:
    ds_dir = Path(td) / 'ds'
    # 1. Build the YOLO dataset
    with gw.config.update(sensor='bgr', ref_res=300):
        with gw.open(l8_224078_20200518, nodata=0) as src:
            summary = src.gw.to_yolo_dataset(
                labels, class_col='class_name', out_dir=ds_dir,
                tile_size=128, scale=(0, 10000), min_box_pixels=2,
                val_split=0.5, seed=42,
            )
    # 2. Fine-tune.
    det = YOLODetector(weights='yolov8n.pt')
    fit(
        det,
        dataset_yaml=ds_dir / 'data.yaml',   # built in step 1
        epochs=1,                            # training epochs
        imgsz=128,                           # training image size
    )
    # 3. Predict: use the same tile size + scaling as in training.
    with gw.config.update(sensor='bgr', ref_res=300):
        with gw.open(l8_224078_20200518, nodata=0) as src:
            preds = src.gw.detect(
                det, conf=0.05, tile_size=128, scale=(0, 10000),
            )
Accuracy assessment#
detection_accuracy computes per-class precision / recall / F1 / AP
at one or more IoU thresholds and returns:
metrics: a multi-index DataFrame indexed by (iou_threshold, class) with columns ap, precision, recall, f1, tp, fp, fn, support.
summary: a dict with mAP@{iou} keys.
matched: a review-ready GeoDataFrame in which every truth and every prediction is tagged with a status (TP, FP, FP_class, FN).
det = YOLODetector(weights='yolov8n.pt')
with gw.config.update(sensor='bgr', ref_res=300):
    with gw.open(l8_224078_20200518, nodata=0) as src:
        preds = src.gw.detect(det, tile_size=320, overlap=0.0,
                              conf=0.05, scale=(0, 10000))
# COCO classes ≠ our land-cover classes; re-tag to compare spatially.
preds = preds.copy()
preds['class_name'] = 'developed'
results = detection_accuracy(
predictions=preds,
truth=labels[['class_name', 'geometry']],
class_col='class_name',
iou_thresholds=(0.3, 0.5),
)
print(results['metrics'])
print(results['summary']) # e.g. {'mAP@0.3': ..., 'mAP@0.5': ...}
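The status tags in matched come from comparing predictions with truth by IoU. The sketch below shows one common way to do this: greedy matching, in which higher-scoring predictions claim truth boxes first. It is an illustration on axis-aligned boxes only. The matching order, the handling of class mismatches, and the helper names are assumptions and may differ from what detection_accuracy does internally.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def tag_matches(preds: List[Dict], truths: List[Dict], threshold: float) -> List[str]:
    """Greedy matching: highest-scoring predictions claim truth boxes first."""
    claimed = set()
    statuses = []
    for pred in sorted(preds, key=lambda p: p["score"], reverse=True):
        best_idx, best_iou = None, 0.0
        for idx, truth in enumerate(truths):
            overlap = iou(pred["box"], truth["box"])
            if idx not in claimed and overlap > best_iou:
                best_idx, best_iou = idx, overlap
        if best_idx is None or best_iou < threshold:
            statuses.append("FP")
        elif pred["class_name"] != truths[best_idx]["class_name"]:
            # Assumption: a wrong-class hit does not claim the truth box.
            statuses.append("FP_class")
        else:
            claimed.add(best_idx)
            statuses.append("TP")
    # Every truth box left unclaimed counts as a miss.
    statuses += ["FN"] * (len(truths) - len(claimed))
    return statuses


truths = [
    {"box": (0, 0, 10, 10), "class_name": "water"},
    {"box": (20, 20, 30, 30), "class_name": "tree"},
    {"box": (50, 50, 60, 60), "class_name": "crop"},
]
preds = [
    {"box": (1, 1, 10, 10), "class_name": "water", "score": 0.9},
    {"box": (20, 20, 30, 31), "class_name": "crop", "score": 0.8},
    {"box": (80, 80, 90, 90), "class_name": "water", "score": 0.7},
]
print(tag_matches(preds, truths, threshold=0.5))
# ['TP', 'FP_class', 'FP', 'FN', 'FN']
```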
Glossary for the columns:
TP: prediction overlaps truth with IoU ≥ threshold and has the right class.
FP: prediction with no matching truth at the IoU threshold.
FP_class: prediction overlaps a truth box but the class is wrong.
FN: truth that no prediction matched.
precision = TP / (TP + FP); recall = TP / (TP + FN); F1 is their harmonic mean.
AP: integrated precision-recall curve (per class). The summary’s mAP@{iou} is the mean of those AP values.
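The precision, recall, and F1 formulas above can be checked with a short worked example. The counts are invented, and returning 0.0 when a denominator is zero is a convention chosen for this sketch rather than documented geowombat behavior.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall and F1 from match counts, guarding zero division."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# 8 correct detections, 2 spurious ones, 4 missed objects.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# precision=0.800 recall=0.667 f1=0.727
```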
Visualize TP / FP / FN#
plot_detections overlays the raster with predictions and truth,
color-coded by status. Pass it the matched GeoDataFrame from
detection_accuracy to keep colors consistent with the metrics.
fig, ax = plt.subplots(figsize=(10, 10))
with gw.config.update(sensor='bgr', ref_res=300):
    with gw.open(l8_224078_20200518, nodata=0) as src:
        plot_detections(
            src,
            predictions=results['matched'],
            truth=labels,
            ax=ax,
            scale=(0, 10000),
        )
plt.show()
QGIS review round-trip#
export_for_review writes a GeoPackage you can step through
feature-by-feature in QGIS (e.g. with the GoToNextFeature3+ plugin).
After a human fills in the reviewer_label field,
recompute_from_review re-derives metrics from that human-corrected
file:
from geowombat.detect import export_for_review, recompute_from_review
export_for_review(results['matched'], './review.gpkg')
# ... open in QGIS, edit reviewer_label, save ...
final_metrics = recompute_from_review('./review.gpkg')
Refine boxes to polygons with SAM#
SAMRefiner uses each predicted bounding box as a prompt to Segment
Anything and replaces the box with a polygon mask in the same CRS. The
SAM checkpoint must be downloaded once from
Meta’s SAM page.
from geowombat.detect import SAMRefiner
refiner = SAMRefiner(checkpoint='sam_vit_b.pth', model_type='vit_b')
with gw.open(l8_224078_20200518, nodata=0) as src:
    polygons = refiner.refine(src, preds, scale=(0, 10000))
polygons.to_file('refined_polygons.gpkg')
The returned GeoDataFrame has the same columns as the detector output
but geometry is now a polygon rather than a rectangle.
Choosing a backend#
YOLO is the right default for aerial / drone imagery and any case where AGPL-3.0 is acceptable. See YOLO weight families below for the trained-on options — picking the right weights matters far more than picking the right model size.
TorchGeo (TorchGeoDetector) wraps Faster R-CNN / RetinaNet and accepts TorchGeo pretrained weights such as FASTERRCNN_RESNET50_FPN_XVIEW (60 aerial-imagery classes). Pick this when xView’s class set fits better than DOTA’s, or when the AGPL license on Ultralytics is a non-starter.
SAM (SAMRefiner) is not a detector; it polishes detector outputs into precise vector polygons. Pair it with either of the above.
YOLO weight families#
Ultralytics ships several pretrained YOLO weight families. The right
choice depends on what you’re detecting, not just how fast you
need it to run. Default yolov8n.pt is trained on COCO (person,
car, dog, …) and is the wrong tool for overhead imagery — most
COCO classes never appear from above.
| Weight family | Trained on | Classes | When to use |
|---|---|---|---|
| | COCO | 80 everyday classes (person, car, dog, …) | Ground-level photos. Wrong tool for overhead imagery. |
| | COCO | Same 80 classes | Newer architecture, same training set. Same caveat. |
| | DOTA-v1 (aerial) | 15 aerial classes: plane, ship, storage tank, baseball diamond, tennis court, basketball court, ground track field, harbor, bridge, large vehicle, small vehicle, helicopter, roundabout, soccer ball field, swimming pool | Best out-of-the-box choice for satellite/aerial imagery. Produces oriented (rotated) boxes. |
| | LVIS + grounding | Open-vocabulary: pass text prompts at inference time | When DOTA doesn’t cover what you need. Use |
| Custom-trained checkpoint | Your dataset | Whatever you trained on | After fine-tuning; see End-to-end: build, fine-tune, predict above. |
Size suffix (n < s < m < l < x) trades inference
speed for accuracy. Default to n or s for prototyping, and m or
larger for production.
For OBB weights, geowombat auto-detects orientation from the filename
ending in -obb.pt — you don’t need to pass oriented=True
explicitly:
from geowombat.detect import YOLODetector
# Predictions come back as rotated polygons instead of axis-aligned
# boxes. `oriented=True` is inferred from the `-obb.pt` filename.
det = YOLODetector(weights='yolov8n-obb.pt')
with gw.open('aerial.tif') as src:
    preds = src.gw.detect(
        det,
        conf=0.25,  # this model is confident, so a higher threshold works well
    )
print(preds.geometry.iloc[0]) # rotated 4-corner Polygon
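The filename rule described above amounts to a suffix check. The helper below is a hypothetical illustration of that rule, not the code geowombat runs.

```python
from pathlib import Path


def infer_oriented(weights: str) -> bool:
    """Hypothetical check: treat weights whose filename ends in -obb.pt as OBB."""
    return Path(weights).name.lower().endswith("-obb.pt")


print(infer_oriented("yolov8n-obb.pt"))                 # True
print(infer_oriented("yolov8n.pt"))                     # False
print(infer_oriented("runs/obb/train/weights/best.pt"))  # False
```

Under a pure filename rule like this one, a fine-tuned OBB checkpoint saved under a generic name such as best.pt would not be recognized, so passing oriented=True explicitly is the safer choice in that situation.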
To fine-tune a DOTA-v1 OBB model on your own data, generate OBB labels
following Digitizing polygons for high-quality OBB labels and pass oriented=True to
build_dataset / src.gw.to_yolo_dataset.
See also the notebook#
notebooks/object_detection.ipynb runs the full real-world flow:
NAIP aerial imagery from Microsoft Planetary Computer, OpenStreetMap
building footprints, dataset construction, fine-tuning, before/after
metric comparison, and the QGIS review export.