Object Detection with geowombat

This notebook walks through the geowombat object-detection module:

  1. Quickstart: run pretrained YOLO models (COCO and DOTA weights) on a sample image from a public satellite detection benchmark (NWPU VHR-10 via TorchGeo) and compare the detections.

  2. Geowombat-native workflow — fetch a NAIP aerial image from Microsoft Planetary Computer plus OpenStreetMap building footprints, build a YOLO training dataset, run inference, score it, and export a GeoPackage you can review feature-by-feature in QGIS (e.g. with the GoToNextFeature3+ plugin).

  3. Optional fine-tuning — fine-tune YOLO on the dataset built in step 2 so detections are actually useful for buildings.

  4. Refine boxes to polygons with SAM — turn the fine-tuned bounding-box detections into precise vector footprints using Meta’s Segment Anything Model.

Required installs:

pip install geowombat[dl,detect,sam,stac]
pip install osmnx torchgeo

Setup

[1]:
import warnings
from pathlib import Path

import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np

import geowombat as gw
from geowombat.detect import (
    YOLODetector,
    boxes_from_polygons,
    build_dataset,
    detection_accuracy,
    export_for_review,
    plot_detections,
    predict,
)

warnings.filterwarnings('ignore')
WORK_DIR = Path('object_detection_demo')
WORK_DIR.mkdir(exist_ok=True)

1. Quickstart: pretrained YOLO on NWPU VHR-10

NWPU VHR-10 is a small public satellite-imagery detection benchmark: 800 images, 650 of which contain annotated objects from 10 classes. TorchGeo downloads it automatically. We’ll run pretrained Ultralytics YOLO models on a sample image and inspect the detections.

Caveat: YOLO’s default weights are trained on COCO, which overlaps with VHR-10 on only a few classes (airplane directly; ship and vehicle only loosely, via COCO’s boat, car, and truck). Expect poor detections: this section is about exercising the API, not about state-of-the-art accuracy. The geowombat-native section below shows the full train-and-score loop.

[2]:
from torchgeo.datasets import VHR10

vhr_dir = WORK_DIR / 'vhr10'
ds = VHR10(root=str(vhr_dir), split='positive', download=True)
print(f'Dataset size: {len(ds)} positive images')
print('Classes:', ds.categories[:5], '...')
Files already downloaded and verified
loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
Dataset size: 650 positive images
Classes: ('background', 'airplane', 'ships', 'storage tank', 'baseball diamond') ...
[3]:
# Inspect a single sample. TorchGeo 0.9 keys: image, label, bbox_xyxy, mask
sample = ds[0]
img = sample['image'].permute(1, 2, 0).numpy().astype(np.uint8)
boxes = sample['bbox_xyxy'].numpy()
labels = sample['label'].numpy()

fig, ax = plt.subplots(figsize=(8, 8))
ax.imshow(img)
for (x1, y1, x2, y2), lbl in zip(boxes, labels):
    ax.add_patch(plt.Rectangle((x1, y1), x2-x1, y2-y1,
                               fill=False, edgecolor='yellow', lw=2))
    ax.text(x1, y1-3, ds.categories[lbl], color='yellow', fontsize=8)
ax.set_title('NWPU VHR-10 sample with ground-truth boxes')
ax.axis('off')
plt.show()
[Figure: NWPU VHR-10 sample with ground-truth boxes]

Picking a YOLO variant

Ultralytics ships several pretrained YOLO weight families. The right choice depends on what you’re detecting, not just how fast you need it to run.

| Weight family | Trained on | Classes | When to use |
|---|---|---|---|
| yolov8{n,s,m,l,x}.pt | COCO | 80 everyday classes (person, car, dog, …) | Ground-level photos. Wrong tool for overhead imagery: most COCO classes never appear from above. |
| yolo11{n,s,m,l,x}.pt | COCO | Same 80 classes | Newer architecture, same training set as YOLOv8 COCO. Same caveat for aerial data. |
| yolov8{n,s,m,l,x}-obb.pt | DOTA-v1 (aerial) | 15 aerial classes: plane, ship, storage tank, baseball diamond, tennis court, basketball court, ground track field, harbor, bridge, large vehicle, small vehicle, helicopter, roundabout, soccer ball field, swimming pool | Best out-of-the-box choice for satellite/aerial imagery. Predicts rotated boxes (OBB). |
| yolov8{s,m,l,x}-worldv2.pt | Objects365 + grounding data (GoldG) | Open-vocabulary: you pass text prompts at inference time | When DOTA classes don’t cover what you need. Call model.set_classes(['airplane', 'truck', ...]) before predicting. |
| Custom-trained checkpoint | Your dataset | Whatever you trained on | After fine-tuning (see Section 3 below). |

Size suffix (n < s < m < l < x) trades inference speed for accuracy. Default to n or s for prototyping, m+ for production.

Non-YOLO option: geowombat.detect.TorchGeoDetector wraps torchvision Faster R-CNN / RetinaNet and can load TorchGeo’s xView pretrained weights — a richer 60-class aerial label set. Use it when the AGPL license on Ultralytics is a problem or when xView’s class set is a better fit than DOTA’s.

[4]:
# 1. Default `yolov8n.pt` — trained on COCO, the wrong tool for overhead
# imagery. We expect garbage detections (a 'train' box covering the
# entire chip, etc.) because COCO classes don't generalize.
from ultralytics import YOLO

# Load the smallest YOLOv8 weights (auto-downloaded on first run).
yolo_coco = YOLO('yolov8n.pt')
result = yolo_coco.predict(
    source=img,        # the image to run on
    conf=0.1,          # keep detections above this confidence
    verbose=False,
)[0]                   # one image in, one result out

fig, ax = plt.subplots(figsize=(8, 8))
ax.imshow(img)
if result.boxes is not None and len(result.boxes) > 0:
    for b, c, s in zip(result.boxes.xyxy.cpu().numpy(),
                       result.boxes.cls.cpu().numpy().astype(int),
                       result.boxes.conf.cpu().numpy()):
        x1, y1, x2, y2 = b
        ax.add_patch(plt.Rectangle((x1, y1), x2-x1, y2-y1,
                                    fill=False, edgecolor='red', lw=2))
        ax.text(x1, y1-3, f'{yolo_coco.names[c]} {s:.2f}',
                color='red', fontsize=8)
ax.set_title('YOLOv8n (COCO weights) on VHR-10 — note the misfires')
ax.axis('off')
plt.show()
[Figure: YOLOv8n (COCO weights) on VHR-10, note the misfires]

Improved: DOTA-pretrained YOLO-OBB

Same model architecture, different training data. yolov8n-obb.pt was trained on DOTA-v1, a satellite/aerial benchmark with the 15 classes listed above. It produces oriented bounding boxes (four corner points each) instead of axis-aligned rectangles, which suits elongated objects seen at arbitrary angles, such as planes on a runway or ships in a harbor.

On the same VHR-10 chip you should now see clean plane detections with reasonable confidence — no fine-tuning required.
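Why oriented boxes help can be seen with a few lines of NumPy. This is a minimal geometric sketch, not Ultralytics’ implementation. It assumes a (cx, cy, w, h, theta) parameterization, similar in spirit to the xywhr attribute Ultralytics exposes alongside xyxyxyxy, but the rotation sign convention here is an assumption. The helpers obb_to_corners and corners_to_aabb are hypothetical names.

```python
import numpy as np


def obb_to_corners(cx, cy, w, h, theta):
    """Return the (4, 2) corner array of a w x h box centred on (cx, cy)
    and rotated by theta radians (sign convention is an assumption)."""
    # Corners of the unrotated box, relative to its centre
    half = np.array([[-w, -h], [w, -h], [w, h], [-w, h]], dtype=float) / 2.0
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    # Rotate each corner about the centre, then translate into place
    return half @ rot.T + np.array([cx, cy], dtype=float)


def corners_to_aabb(quad):
    """Axis-aligned envelope (x1, y1, x2, y2) of a (4, 2) corner array."""
    x1, y1 = quad.min(axis=0)
    x2, y2 = quad.max(axis=0)
    return float(x1), float(y1), float(x2), float(y2)


# A 40 x 10 px elongated object (a plane or ship) rotated 45 degrees
quad = obb_to_corners(100, 100, 40, 10, np.pi / 4)
x1, y1, x2, y2 = corners_to_aabb(quad)
obb_area = 40 * 10
aabb_area = (x2 - x1) * (y2 - y1)
print(f'OBB area: {obb_area}, axis-aligned envelope area: {aabb_area:.0f}')
```

For this object the axis-aligned envelope covers 1250 px² against 400 px² for the object itself. That is roughly three times the area, so neighbouring planes or ships in a dense scene would overlap heavily if drawn as axis-aligned boxes.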

[5]:
# 2. yolov8n-obb.pt — DOTA-trained. Same call, different weights.
import matplotlib.patheffects as pe

# `-obb.pt` weights predict rotated boxes instead of axis-aligned ones.
yolo_obb = YOLO('yolov8n-obb.pt')
result_obb = yolo_obb.predict(
    source=img,
    conf=0.25,     # this model is more confident, so we can raise the threshold
    verbose=False,
)[0]

fig, ax = plt.subplots(figsize=(8, 8))
ax.imshow(img)
if result_obb.obb is not None and len(result_obb.obb) > 0:
    quads = result_obb.obb.xyxyxyxy.cpu().numpy()  # (N, 4, 2)
    clses = result_obb.obb.cls.cpu().numpy().astype(int)
    confs = result_obb.obb.conf.cpu().numpy()
    for quad, cls, conf in zip(quads, clses, confs):
        polygon = plt.Polygon(quad, fill=False, edgecolor='lime', lw=2)
        ax.add_patch(polygon)
        cx, cy = quad.mean(axis=0)
        ax.text(cx, cy, f'{yolo_obb.names[cls]} {conf:.2f}',
                color='lime', fontsize=10, ha='center',
                path_effects=[pe.withStroke(linewidth=2,
                                            foreground='black')])
ax.set_title('YOLOv8n-OBB (DOTA weights) on VHR-10 — proper aerial classes')
ax.axis('off')
plt.show()
[Figure: YOLOv8n-OBB (DOTA weights) on VHR-10, proper aerial classes]

The geowombat YOLODetector wrapper uses the same interface. Point weights= at the OBB checkpoint; oriented=True is inferred from the -obb.pt filename suffix, so you don’t need to set it yourself. The detector then plugs into src.gw.detect exactly like the COCO variant, but the resulting GeoDataFrame carries rotated polygon geometries instead of axis-aligned boxes:

import geowombat as gw
from geowombat.detect import YOLODetector

det = YOLODetector(weights='yolov8n-obb.pt')   # oriented=True inferred
with gw.open('aerial.tif') as src:              # placeholder path: use your own image
    preds = src.gw.detect(det, conf=0.25)

2. Geowombat-native workflow: NAIP + OpenStreetMap buildings

Now the full pipeline using a georeferenced raster and georeferenced vector labels:

  1. Pull a NAIP scene from Microsoft Planetary Computer for a small AOI.

  2. Pull OSM building footprints for the same AOI with osmnx.

  3. Build a YOLO-format training dataset with the .gw.to_yolo_dataset accessor.

  4. Run inference; observe that COCO-pretrained YOLO won’t see buildings.

  5. Score with detection_accuracy and export a review GeoPackage.

  6. Optional: fine-tune YOLO on the dataset from step 3.

[6]:
# AOI: roughly 1 km x 1.8 km over Capitol Hill, Washington DC. Row-house
# residential gives the fine-tune cell good signal (dense, regular,
# visually distinctive buildings). Adjust the bounds for any other
# city: wider bounds give more training data but a slower download.
from shapely.geometry import box as shapely_box

AOI_BOUNDS = (-77.012, 38.882, -77.000, 38.898)  # west, south, east, north
aoi = gpd.GeoDataFrame(
    {'name': ['demo']},
    geometry=[shapely_box(*AOI_BOUNDS)],
    crs='EPSG:4326',
)
aoi.plot()
plt.title('Demo AOI: Capitol Hill, Washington DC')
plt.show()
[Figure: Demo AOI: Capitol Hill, Washington DC]
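As a quick check of the AOI size quoted in the cell comment, the arithmetic below converts the degree bounds to approximate ground distances. It uses a simple spherical approximation (about 111.32 km per degree of latitude, with longitude scaled by the cosine of the latitude). bbox_size_km is a hypothetical helper, not a geowombat function.

```python
import math


def bbox_size_km(west, south, east, north, km_per_deg=111.32):
    """Approximate ground width and height (km) of a lon/lat box,
    treating the Earth as a sphere (km_per_deg is per degree of latitude)."""
    mid_lat = math.radians((south + north) / 2.0)
    # Meridians converge toward the poles, so longitude degrees shrink
    width_km = (east - west) * km_per_deg * math.cos(mid_lat)
    height_km = (north - south) * km_per_deg
    return width_km, height_km


w_km, h_km = bbox_size_km(-77.012, 38.882, -77.000, 38.898)  # AOI_BOUNDS
print(f'{w_km:.2f} km x {h_km:.2f} km')
```

This gives roughly 1.04 km east-west by 1.78 km north-south. The saved raster reported later (1799 columns x 2997 rows at 0.6 m, about 1.08 km x 1.80 km) is slightly larger. That is most likely because the lon/lat box is slightly rotated relative to the UTM 18N grid, so its projected envelope grows a little.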
[7]:
# Fetch a recent NAIP scene via geowombat's STAC interface. This
# searches Planetary Computer, signs the URL, clips to the AOI, and
# returns both the DataArray and a GeoDataFrame of the matching
# STAC items (with each item's true scene polygon as its geometry).
from geowombat.core.stac import open_stac

naip_local = WORK_DIR / 'naip_aoi.tif'

data, naip_items = open_stac(
    stac_catalog='microsoft_v1',
    collection='naip',
    bounds=AOI_BOUNDS,            # lat/lon (left, bottom, right, top)
    start_date='2021-01-01',
    end_date='2023-12-31',
    max_items=1,                  # take the most recent scene
    epsg=26918,                   # NAD83 / UTM 18N for the DC area
    resolution=0.6,               # NAIP native ground sample distance
    compute=True,                 # eagerly download into memory
)

# Materialize to disk so later cells can re-open with chunked I/O.
data.gw.save(naip_local, num_workers=4, overwrite=True)
naip_crs = data.gw.crs_to_pyproj
print(f'NAIP scene: {naip_items["id"].iloc[0]}')
print(f'Wrote local AOI clip: {naip_local}')

# The STAC item carries the actual scene polygon (not just a bbox).
# Intersect with the AOI to get the *valid* coverage area — buildings
# outside this polygon would sit on no-data pixels.
from shapely.geometry import box as shapely_box
aoi_poly_4326 = shapely_box(*AOI_BOUNDS)
valid_polygon_4326 = naip_items.geometry.iloc[0].intersection(aoi_poly_4326)
Searching microsoft_v1 for naip...
Found 1 items.
NAIP scene: va_m_3807708_se_18_060_20231113_20240103
Wrote local AOI clip: object_detection_demo/naip_aoi.tif
[8]:
# Fetch OSM buildings within the NAIP coverage polygon (not the raw
# AOI bbox), so we don't ingest labels in areas with no imagery.
import osmnx as ox

tags = {'building': True}
buildings = ox.features_from_polygon(valid_polygon_4326, tags=tags)
buildings = buildings[buildings.geometry.type.isin(['Polygon', 'MultiPolygon'])].copy()
buildings['class_name'] = 'building'
print(f'Fetched {len(buildings)} building footprints inside NAIP coverage')
buildings.head(3)
Fetched 2061 building footprints inside NAIP coverage
[8]:
| element | id | geometry | addr:state | building | name | source | wikidata | wikipedia | … | dcgis:ssl | dcgis:update_date | name:br | year_of_construction | class_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| relation | 286501 | POLYGON ((-77.00392 38.89077, -77.00375 38.890... | DC | office | Supreme Court of the United States | NaN | Q11201 | en:Supreme Court of the United States | … | 0728 0843 | Mon Mar 21 00:00:00 UTC 2005 | Lez-veur ar Stadoù-Unanet | 1935 | building |
| relation | 286503 | POLYGON ((-77.00614 38.88635, -77.00614 38.886... | DC | government | Cannon House Office Building | NaN | Q1033452 | en:Cannon House Office Building | … | 0690 0800 | Mon Mar 21 00:00:00 UTC 2005 | NaN | NaN | building |
| relation | 554408 | POLYGON ((-77.00188 38.89778, -77.0019 38.8977... | NaN | apartments | NaN | dcgis | NaN | NaN | … | NaN | NaN | NaN | NaN | building |

3 rows × 150 columns

[9]:
# Reproject buildings into the raster CRS and double-check they fall
# inside the NAIP scene polygon.
buildings_proj = buildings.to_crs(naip_crs)
valid_polygon_naip_crs = (
    gpd.GeoSeries([valid_polygon_4326], crs='EPSG:4326')
      .to_crs(naip_crs).iloc[0]
)
buildings_proj = buildings_proj[
    buildings_proj.intersects(valid_polygon_naip_crs)
].copy()

# sensor='naip' tells geowombat NAIP's bands are red/green/blue/nir
# (4-band RGBN). Inside this context we can select bands by name,
# and detection helpers also resolve `band_indices` themselves — no
# need to pass `band_indices=[0, 1, 2]` per call.
with gw.config.update(sensor='naip'):
    with gw.open(naip_local, chunks=512) as src:
        print(f'NAIP scene: {src.gw.nrows} x {src.gw.ncols}, '
              f'{src.gw.nbands} bands, {src.gw.cellx:.2f} m')
        fig, ax = plt.subplots(figsize=(10, 10))
        # `.gw.imshow` handles the (band, y, x) -> (y, x, band) reorder
        # and uses the raster's geographic extent automatically.
        (src.sel(band=['red', 'green', 'blue'])
            .gw.imshow(ax=ax, robust=True))
        buildings_proj.boundary.plot(ax=ax, color='yellow', lw=1)
        ax.set_title('NAIP + OSM buildings (truth labels)')
        plt.show()
NAIP scene: 2997 x 1799, 4 bands, 0.60 m
[Figure: NAIP + OSM buildings (truth labels)]
[10]:
# Build a YOLO-format training dataset from the NAIP raster + buildings.
# NAIP bands 0,1,2 are R,G,B already 8-bit — no scaling needed. We use
# the `.gw.to_yolo_dataset(...)` accessor so the call stays inside the
# `with gw.open(...) as src:` block, like every other geowombat workflow.
yolo_dir = WORK_DIR / 'naip_buildings_yolo'

with gw.config.update(sensor='naip'):
    with gw.open(naip_local, chunks=512) as src:
        info = src.gw.to_yolo_dataset(
            labels=buildings_proj,    # vector labels to use
            class_col='class_name',   # column with the class name
            out_dir=yolo_dir,         # where to write the dataset
            tile_size=640,            # tile size in pixels
            overlap=0.1,              # how much tiles overlap (0–0.9)
            val_split=0.2,            # fraction of tiles used for validation
            min_box_pixels=10,        # drop boxes smaller than this
            background_ratio=0.1,     # keep some empty tiles as negatives
            scale=None,               # NAIP is already 8-bit, no rescaling
        )
print(info)
{'out_dir': 'object_detection_demo/naip_buildings_yolo', 'classes': ['building'], 'n_train': 17, 'n_val': 7, 'n_boxes': 2296, 'empty_kept': 0, 'empty_skipped': 0}
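Each tile in the dataset gets a label file in the standard Ultralytics detection format. Each object gets one line: class_id x_center y_center width height, with the four coordinates normalized by the tile width and height. The sketch below shows that conversion for one pixel-space box. It is a minimal illustration that assumes boxes are clipped at the tile edge. The helper name to_yolo_line is hypothetical, and geowombat’s writer may handle edge cases (such as min_box_pixels) differently.

```python
def to_yolo_line(class_id, x1, y1, x2, y2, img_w, img_h):
    """Format a pixel-space box (x1, y1, x2, y2) as a YOLO label line
    'class cx cy w h', with coordinates normalised to the 0-1 range.
    Returns None if the box falls entirely outside the tile."""
    # Clip to the tile so boxes cut by a tile edge stay inside the image
    x1, x2 = max(0.0, x1), min(float(img_w), x2)
    y1, y2 = max(0.0, y1), min(float(img_h), y2)
    if x2 <= x1 or y2 <= y1:
        return None
    cx = (x1 + x2) / 2.0 / img_w
    cy = (y1 + y2) / 2.0 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f'{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}'


print(to_yolo_line(0, 100, 200, 164, 232, 640, 640))
```

The printed line is 0 0.206250 0.337500 0.100000 0.050000. Note that info reports n_boxes=2296 for 2061 buildings. With overlap=0.1, a building in the overlap strip between tiles is likely written once per tile, so box counts can exceed the number of footprints.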

Inference with pretrained YOLO

We run YOLODetector directly on the NAIP scene. Pretrained COCO weights have no ‘building’ class, so we expect very few or no useful detections — but this verifies the inference pipeline (tile windowing, cross-tile NMS, pixel→CRS box conversion) end-to-end.
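The three steps named above (tile windowing, pixel to CRS conversion, cross-tile NMS) can be sketched with NumPy alone. This is a minimal sketch of one standard approach, not geowombat’s implementation. It assumes three things:

- tiles advance by tile_size * (1 - overlap), with the last tile snapped flush to the raster edge;
- the raster is north-up with square pixels;
- duplicates from neighbouring tiles are removed with greedy NMS on map-coordinate boxes.

The function names and the example origin (LEFT, TOP) are hypothetical.

```python
import numpy as np


def tile_origins(length, tile_size, overlap):
    """Tile start offsets along one axis (one common scheme, assumed here)."""
    if length <= tile_size:
        return [0]
    step = max(1, int(round(tile_size * (1 - overlap))))
    starts = list(range(0, length - tile_size, step))
    starts.append(length - tile_size)  # last tile flush with the raster edge
    return starts


def pixel_box_to_crs(box, col_off, row_off, left, top, res):
    """Tile-local pixel box (x1, y1, x2, y2) -> map box (minx, miny, maxx, maxy)
    for a north-up raster whose upper-left corner is (left, top)."""
    x1, y1, x2, y2 = box
    minx = left + (col_off + x1) * res
    maxx = left + (col_off + x2) * res
    maxy = top - (row_off + y1) * res  # pixel rows grow downward
    miny = top - (row_off + y2) * res
    return minx, miny, maxx, maxy


def greedy_nms(boxes, scores, iou_thr=0.5):
    """Greedy non-maximum suppression on (N, 4) boxes; returns kept indices."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float))
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        # Intersection of the current best box with every remaining box
        w = np.clip(np.minimum(boxes[i, 2], boxes[rest, 2])
                    - np.maximum(boxes[i, 0], boxes[rest, 0]), 0, None)
        h = np.clip(np.minimum(boxes[i, 3], boxes[rest, 3])
                    - np.maximum(boxes[i, 1], boxes[rest, 1]), 0, None)
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou < iou_thr]  # drop boxes that overlap it too much
    return keep


rows = tile_origins(2997, 640, 0.2)   # NAIP clip height in pixels
cols = tile_origins(1799, 640, 0.2)   # NAIP clip width in pixels
print(f'{len(rows)} x {len(cols)} = {len(rows) * len(cols)} tiles')

# Hypothetical raster origin (UTM metres) and the clip's 0.6 m pixels
LEFT, TOP, RES = 325500.0, 4307000.0, 0.6
# The same building seen by two overlapping tiles (col offsets 512 and 576)
a = pixel_box_to_crs((80, 20, 100, 40), 512, 0, LEFT, TOP, RES)
b = pixel_box_to_crs((17, 21, 37, 41), 576, 0, LEFT, TOP, RES)
c = pixel_box_to_crs((300, 300, 320, 320), 576, 0, LEFT, TOP, RES)
print(greedy_nms([a, b, c], [0.9, 0.7, 0.6]))  # duplicate b is suppressed
```

Under this assumed scheme the NAIP clip (2997 rows x 1799 columns) splits into 6 x 4 = 24 tiles at both overlap=0.2 and overlap=0.1. That happens to match the 24 tiles in the progress bar below, which is consistent with geowombat using the same layout but does not prove it.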

[11]:
# `YOLODetector` holds the loaded model. `src.gw.detect(detector, ...)`
# is the geowombat-native call site — same shape as `src.gw.predict()`
# for classification.
# Load the model once and reuse it.
det = YOLODetector(weights='yolov8n.pt')

with gw.config.update(sensor='naip'):
    with gw.open(naip_local, chunks=512) as src:
        preds = src.gw.detect(
            det,
            tile_size=640,            # tile size for inference
            overlap=0.2,              # overlap so objects on tile edges aren't missed
            conf=0.10,                # low threshold: COCO weights don't know buildings
            scale=None,               # NAIP is already 8-bit, no rescaling
            nms_iou=0.5,              # how much overlap before duplicates are merged
            progress=True,
        )
print(f'{len(preds)} detections from pretrained YOLO')
preds.head()
YOLODetector: 100%|██████████| 24/24 [00:01<00:00, 21.15it/s]
32 detections from pretrained YOLO

[11]:
| | geometry | class_id | class_name | score | tile_id |
|---|---|---|---|---|---|
| 0 | POLYGON ((326168.456 4306310.848, 326168.456 4... | 74 | clock | 0.621648 | 13 |
| 1 | POLYGON ((325846.515 4305680.666, 325846.515 4... | 74 | clock | 0.437652 | 20 |
| 2 | POLYGON ((326326.227 4305785.563, 326326.227 4... | 9 | traffic light | 0.375044 | 22 |
| 3 | POLYGON ((326341.092 4306383.421, 326341.092 4... | 7 | truck | 0.351716 | 10 |
| 4 | POLYGON ((325744.14 4306727.808, 325744.14 430... | 2 | car | 0.338790 | 4 |

Reading the accuracy output

Before we compute metrics, a quick glossary — these terms appear in the tables below and in the per-feature status column of the review GeoPackage.

Per-detection / per-truth labels

  • TP — True Positive. A predicted box that overlaps a ground-truth box with IoU ≥ the threshold and has the correct class. A hit.

  • FP — False Positive. A predicted box with no matching truth at the chosen IoU. A hallucination.

  • FP_class — Wrong-class False Positive. Predicted box landed in about the right place (IoU ≥ threshold against some truth) but the class label was wrong. Useful to separate ‘model can’t see it’ from ‘model sees it but mislabels it’.

  • FN — False Negative. A ground-truth box no detection matched. A miss.

IoU — Intersection over Union. The area of overlap divided by the area of union between two boxes. 1.0 = perfect overlap, 0.0 = no overlap. The IoU threshold (0.3, 0.5, …) controls how strict the spatial agreement needs to be to count as a match. 0.5 is the standard PASCAL VOC threshold; 0.3 is more lenient and useful for noisy or very small targets.
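The following is a minimal sketch of how IoU and the TP/FP/FN buckets fit together for a single class. It assumes the common greedy scheme, in which the highest-scoring prediction claims the best-overlapping unmatched truth. detection_accuracy may match differently, and it also reports FP_class, which this single-class sketch omits. box_iou and match_single_class are hypothetical names.

```python
def box_iou(a, b):
    """Intersection over Union of two axis-aligned (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_single_class(preds, truths, iou_thr=0.5):
    """Greedy one-to-one matching for one class.

    preds: list of (box, score); truths: list of boxes.
    Returns a 'TP'/'FP' label per prediction and the number of FN truths.
    """
    taken = set()
    labels = ['FP'] * len(preds)
    # Highest-confidence predictions get first pick of the truth boxes
    for k in sorted(range(len(preds)), key=lambda i: -preds[i][1]):
        box = preds[k][0]
        best_j, best_iou = None, iou_thr
        for j, truth in enumerate(truths):
            if j in taken:
                continue
            iou = box_iou(box, truth)
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j is not None:
            taken.add(best_j)
            labels[k] = 'TP'
    return labels, len(truths) - len(taken)


truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((1, 1, 11, 11), 0.9),    # overlaps truth 0 (IoU ~ 0.68)
         ((50, 50, 60, 60), 0.8),  # overlaps nothing
         ((0, 0, 10, 10), 0.3)]    # sits on truth 0, which is already claimed
print(match_single_class(preds, truths))
```

The third prediction lies exactly on a truth box, yet it still counts as an FP because the higher-scoring first prediction already claimed that truth. Duplicate detections are penalized this way, which is one reason cross-tile NMS matters.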

Aggregate metrics

  • precision = TP / (TP + FP) — of what the model predicted, what fraction was right? Penalizes false alarms.

  • recall = TP / (TP + FN) — of all the real objects, what fraction did the model find? Penalizes misses.

  • F1 = 2 · (precision · recall) / (precision + recall) — harmonic mean of the two; one number that drops if either side is poor.

  • AP — Average Precision. Sweeps the confidence threshold from high to low and integrates the precision-recall curve. A single per-class number that captures the precision/recall tradeoff over all confidence levels.

  • mAP@0.5 — the mean of per-class AP at IoU ≥ 0.5. The standard single-number summary for a detector.

  • mAP@[.5:.95] — average of mAP at IoU thresholds 0.5, 0.55, …, 0.95. COCO’s stricter measure; rewards tightly localized boxes.

  • support — number of ground-truth boxes for that class (the denominator for recall). Small support = noisy estimate.

Higher is better for every aggregate metric. Among the raw counts, only TP should be high. FP, FP_class, and FN should be low, and support is just the number of truth boxes.
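The aggregate formulas translate directly into code. The sketch below is a minimal illustration: prf1 and average_precision are hypothetical helpers, and the AP uses all-point interpolation in the PASCAL VOC 2010+ style. COCO-style evaluation instead samples 101 recall points, and detection_accuracy may use either convention (or another), so AP values need not match exactly.

```python
import numpy as np


def prf1(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 where undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(scores, is_tp, n_truth):
    """All-point interpolated AP for one class (PASCAL VOC 2010+ style)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind='stable')
    hits = np.asarray(is_tp, dtype=float)[order]
    tp_cum = np.cumsum(hits)
    fp_cum = np.cumsum(1.0 - hits)
    recall = tp_cum / n_truth
    precision = tp_cum / (tp_cum + fp_cum)
    # Pad the curve, then make precision non-increasing from right to left
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    # Sum precision over each step where recall increases
    steps = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[steps + 1] - mrec[steps]) * mpre[steps + 1]))


# Counts from the pretrained-YOLO table below (tp=3, fp=29, fn=2058)
print(prf1(3, 29, 2058))
# Toy ranking: 3 detections (hit, miss, hit) against 4 truth boxes
print(average_precision([0.9, 0.8, 0.7], [True, False, True], n_truth=4))
# IoU thresholds averaged by mAP@[.5:.95]
print(np.round(np.linspace(0.5, 0.95, 10), 2))
```

The first call reproduces the precision 0.09375, recall ≈ 0.001456 and F1 ≈ 0.002867 reported for pretrained YOLO. The toy ranking gives AP = 5/12 ≈ 0.417.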

[12]:
# Accuracy assessment vs. OSM truth. We re-tag predictions as 'building'
# so the comparison is meaningful — pretrained YOLO labels things
# 'car', 'truck', etc., but we just want to see what it *boxed*.
preds_as_building = preds.copy()
preds_as_building['class_name'] = 'building'

results_pretrained = detection_accuracy(
    predictions=preds_as_building,
    truth=buildings_proj[['class_name', 'geometry']],
    class_col='class_name',
    iou_thresholds=(0.3, 0.5),
)
print('Per-class metrics (pretrained YOLO, no fine-tuning):')
print(results_pretrained['metrics'])
print()
print('Summary:', results_pretrained['summary'])
results = results_pretrained  # alias kept for the cells below
Per-class metrics (pretrained YOLO, no fine-tuning):
                              ap  precision    recall        f1  tp  fp    fn  \
iou_threshold class
0.3           building  0.022727    0.09375  0.001456  0.002867   3  29  2058
0.5           building  0.022727    0.09375  0.001456  0.002867   3  29  2058

                        support
iou_threshold class
0.3           building     2061
0.5           building     2061

Summary: {'mAP@0.3': 0.022727272727272728, 'mAP@0.5': 0.022727272727272728}
[13]:
# Visualize TP / FP / FN
with gw.config.update(sensor='naip'):
    with gw.open(naip_local, chunks=512) as src:
        fig, ax = plt.subplots(figsize=(12, 12))
        plot_detections(
            src,
            predictions=results['matched'],
            truth=buildings_proj,
            ax=ax,
            scale=None,
        )
        plt.show()
[Figure: TP / FP / FN visualization over the NAIP scene]
[14]:
# Export the review GeoPackage. Open this in QGIS, use the attribute
# form or GoToNextFeature3+ to step through each detection, and fill in
# `reviewer_label` (TP/FP/FN/unclear). Then call recompute_from_review.
review_path = WORK_DIR / 'review.gpkg'
export_for_review(results['matched'], review_path)
print(f'Review file: {review_path.resolve()}')
print('  In QGIS: Open layer → switch to Form view → use',
      '"GoToNextFeature3+" or built-in next-feature shortcut.')
Review file: /home/mmann1123/Documents/github/geowombat/notebooks/object_detection_demo/review.gpkg
  In QGIS: Open layer → switch to Form view → use "GoToNextFeature3+" or built-in next-feature shortcut.
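Once reviewer_label is filled in, the corrected metrics follow from simple counts. The sketch below is a hypothetical stand-in, not geowombat’s recompute_from_review. It assumes labels are exactly TP, FP, FN or unclear, and it drops unclear (or blank) rows, which is one reasonable policy among several.

```python
from collections import Counter


def metrics_from_review(reviewer_labels):
    """Recompute counts and precision/recall/F1 from reviewer labels.
    Rows labelled 'unclear' (or left blank) are excluded from all counts."""
    counts = Counter(lbl for lbl in reviewer_labels if lbl in {'TP', 'FP', 'FN'})
    tp, fp, fn = counts['TP'], counts['FP'], counts['FN']
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {'tp': tp, 'fp': fp, 'fn': fn,
            'precision': precision, 'recall': recall, 'f1': f1}


# In practice the labels would come from the edited GeoPackage, e.g.
# gpd.read_file(review_path)['reviewer_label'].tolist()
labels = ['TP', 'TP', 'FP', 'unclear', 'FN', 'TP', None]
print(metrics_from_review(labels))
```

Dropping unclear rows keeps them from inflating either precision or recall. Counting them as FP instead would give a more conservative precision.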

3. Fine-tune YOLO on the building dataset

Pretrained COCO YOLO can’t recognize buildings. Fine-tuning teaches it to. We train yolov8n (the smallest variant, ~3M parameters) on the dataset we just built. Settings chosen for CPU-friendliness: small imgsz, small batch, modest epochs. On a typical laptop CPU expect a few minutes total; on GPU, well under one minute.

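Ultralytics writes its runs/ output under the current working directory (see save_dir in the log below). The cell therefore reads the actual save directory back from the trainer rather than assuming a path.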
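yolo_train.train() reads the dataset through data.yaml. The sketch below writes a minimal file in the standard Ultralytics dataset format (path, train, val, names). It assumes the images/{train,val} layout that Ultralytics pairs with the labels/train and labels/val caches visible in the training log. write_data_yaml is a hypothetical helper, and the file produced by to_yolo_dataset may contain additional keys.

```python
import math
import tempfile
from pathlib import Path


def write_data_yaml(root, class_names):
    """Write a minimal Ultralytics-style dataset YAML without PyYAML.
    Assumes images/{train,val} with matching labels/{train,val} folders."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    lines = [
        f'path: {root.resolve()}',  # dataset root; train/val are relative to it
        'train: images/train',
        'val: images/val',
        'names:',
    ]
    lines += [f'  {i}: {name}' for i, name in enumerate(class_names)]
    out = root / 'data.yaml'
    out.write_text('\n'.join(lines) + '\n')
    return out


yaml_path = write_data_yaml(Path(tempfile.mkdtemp()) / 'demo_yolo', ['building'])
print(yaml_path.read_text())

# 17 training tiles at batch=4 -> iterations per epoch
steps_per_epoch = math.ceil(17 / 4)
print(steps_per_epoch)
```

The last line gives 5 iterations per epoch, matching the 5/5 progress counter in the training log.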

[15]:
from ultralytics import YOLO

# Start from COCO weights and fine-tune on our buildings dataset.
yolo_train = YOLO('yolov8n.pt')
_ = yolo_train.train(
    data=str(yolo_dir.resolve() / 'data.yaml'),  # the dataset we just built
    epochs=15,           # how many passes over the data
    imgsz=416,           # training image size
    batch=4,             # batch size (small for CPU)
    name='gw_buildings', # folder name for this run
    exist_ok=True,       # overwrite previous run if present
    verbose=False,
    plots=False,
)

# Ultralytics writes runs/ relative to its own cwd regardless of the
# project= kwarg, so we read the actual save_dir from the trainer.
best_weights = Path(yolo_train.trainer.save_dir) / 'weights' / 'best.pt'
print(f'Best weights: {best_weights}')
print(f'Exists: {best_weights.exists()}')
Ultralytics 8.4.51 🚀 Python-3.11.14 torch-2.10.0+cu128 CUDA:0 (NVIDIA GeForce RTX 3080 Ti, 11911MiB)
engine/trainer: agnostic_nms=False, amp=True, angle=1.0, augment=False, auto_augment=randaugment, batch=4, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, cls_pw=0.0, compile=False, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=/home/mmann1123/Documents/github/geowombat/notebooks/object_detection_demo/naip_buildings_yolo/data.yaml, degrees=0.0, deterministic=True, device=0, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, end2end=None, epochs=15, erasing=0.4, exist_ok=True, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=416, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8n.pt, momentum=0.937, mosaic=1.0, multi_scale=0.0, name=gw_buildings, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=True, patience=100, perspective=0.0, plots=False, pose=12.0, pretrained=True, profile=False, project=None, rect=False, resume=False, retina_masks=False, rle=1.0, save=True, save_conf=False, save_crop=False, save_dir=/home/mmann1123/Documents/github/geowombat/notebooks/runs/detect/gw_buildings, save_frames=False, save_json=False, save_period=-1, save_txt=False, scale=0.5, seed=0, shear=0.0, show=False, show_boxes=True, show_conf=True, show_labels=True, simplify=True, single_cls=False, source=None, split=val, stream_buffer=False, task=detect, time=None, tracker=botsort.yaml, translate=0.1, val=True, verbose=False, vid_stride=1, visualize=False, warmup_bias_lr=0.1, warmup_epochs=3.0, warmup_momentum=0.8, weight_decay=0.0005, workers=8, workspace=None
Overriding model.yaml nc=80 with nc=1

                   from  n    params  module                                       arguments
  0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]
  1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]
  2                  -1  1      7360  ultralytics.nn.modules.block.C2f             [32, 32, 1, True]
  3                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]
  4                  -1  2     49664  ultralytics.nn.modules.block.C2f             [64, 64, 2, True]
  5                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]
  6                  -1  2    197632  ultralytics.nn.modules.block.C2f             [128, 128, 2, True]
  7                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
  8                  -1  1    460288  ultralytics.nn.modules.block.C2f             [256, 256, 1, True]
  9                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 11             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 12                  -1  1    148224  ultralytics.nn.modules.block.C2f             [384, 128, 1]
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 14             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 15                  -1  1     37248  ultralytics.nn.modules.block.C2f             [192, 64, 1]
 16                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]
 17            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 18                  -1  1    123648  ultralytics.nn.modules.block.C2f             [192, 128, 1]
 19                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
 20             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 21                  -1  1    493056  ultralytics.nn.modules.block.C2f             [384, 256, 1]
 22        [15, 18, 21]  1    751507  ultralytics.nn.modules.head.Detect           [1, 16, None, [64, 128, 256]]
Model summary: 130 layers, 3,011,043 parameters, 3,011,027 gradients, 8.2 GFLOPs

Transferred 319/355 items from pretrained weights
Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks...
AMP: checks passed ✅
train: Fast image access ✅ (ping: 0.0±0.0 ms, read: 4462.3±2704.2 MB/s, size: 107.7 KB)
train: Scanning /home/mmann1123/Documents/github/geowombat/notebooks/object_detection_demo/naip_buildings_yolo/labels/train.cache... 17 images, 0 backgrounds, 0 corrupt: 100% ━━━━━━━━━━━━ 17/17 6.5Mit/s 0.0s
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 4503.8±2277.7 MB/s, size: 104.8 KB)
val: Scanning /home/mmann1123/Documents/github/geowombat/notebooks/object_detection_demo/naip_buildings_yolo/labels/val.cache... 7 images, 0 backgrounds, 0 corrupt: 100% ━━━━━━━━━━━━ 7/7 699.1Kit/s 0.0s
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.002, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
Image sizes 416 train, 416 val
Using 8 dataloader workers
Logging results to /home/mmann1123/Documents/github/geowombat/notebooks/runs/detect/gw_buildings
Starting training for 15 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       1/15     0.586G      3.617      3.546      2.075        122        416: 100% ━━━━━━━━━━━━ 5/5 5.2it/s 1.0s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 13.7it/s 0.1s
                   all          7        513       0.01     0.0409    0.00541    0.00165

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       2/15     0.789G      3.402      3.535      1.934         80        416: 100% ━━━━━━━━━━━━ 5/5 35.8it/s 0.1s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 47.8it/s 0.0s
                   all          7        513     0.0148     0.0604    0.00527    0.00191

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       3/15     0.789G      3.555      3.409      1.876        184        416: 100% ━━━━━━━━━━━━ 5/5 41.5it/s 0.1s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 46.4it/s 0.0s
                   all          7        513     0.0157     0.0643      0.007    0.00328

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       4/15     0.789G      3.327      3.314       1.64        141        416: 100% ━━━━━━━━━━━━ 5/5 35.5it/s 0.1s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 48.6it/s 0.0s
                   all          7        513     0.0219     0.0897     0.0211     0.0108

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       5/15     0.789G      3.456      3.183      1.571        239        416: 100% ━━━━━━━━━━━━ 5/5 40.8it/s 0.1s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 48.1it/s 0.0s
                   all          7        513     0.0214     0.0877     0.0237     0.0115
Closing dataloader mosaic

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       6/15     0.789G        3.1      2.863      1.485         16        416: 100% ━━━━━━━━━━━━ 5/5 41.2it/s 0.1s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 41.7it/s 0.0s
                   all          7        513      0.021     0.0858     0.0286     0.0129

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       7/15     0.789G      3.471      2.822      1.402        180        416: 100% ━━━━━━━━━━━━ 5/5 42.3it/s 0.1s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 24.4it/s 0.0s
                   all          7        513     0.0238     0.0975     0.0245      0.012

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       8/15     0.789G      3.232      2.558      1.404         16        416: 100% ━━━━━━━━━━━━ 5/5 30.0it/s 0.2s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 41.2it/s 0.0s
                   all          7        513     0.0238     0.0975     0.0245      0.012

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       9/15     0.789G       3.38      2.591      1.383        172        416: 100% ━━━━━━━━━━━━ 5/5 38.4it/s 0.1s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 41.0it/s 0.0s
                   all          7        513     0.0271      0.111     0.0265     0.0113

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      10/15     0.789G      3.297      2.386      1.309         96        416: 100% ━━━━━━━━━━━━ 5/5 43.3it/s 0.1s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 42.9it/s 0.0s
                   all          7        513     0.0367       0.15     0.0326    0.00991

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      11/15     0.789G      3.157      2.331      1.327         53        416: 100% ━━━━━━━━━━━━ 5/5 48.0it/s 0.1s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 42.5it/s 0.0s
                   all          7        513     0.0367       0.15     0.0326    0.00991

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      12/15     0.789G      3.354      2.286      1.309        172        416: 100% ━━━━━━━━━━━━ 5/5 42.8it/s 0.1s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 42.3it/s 0.0s
                   all          7        513     0.0652      0.267     0.0723     0.0203

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      13/15     0.789G       3.11      2.249      1.361         11        416: 100% ━━━━━━━━━━━━ 5/5 44.1it/s 0.1s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 44.0it/s 0.0s
                   all          7        513     0.0652      0.267     0.0723     0.0203

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      14/15     0.789G      3.139       2.37      1.408         96        416: 100% ━━━━━━━━━━━━ 5/5 45.9it/s 0.1s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 44.2it/s 0.0s
                   all          7        513     0.0652      0.267     0.0723     0.0203

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      15/15     0.789G      3.321      2.229      1.319         96        416: 100% ━━━━━━━━━━━━ 5/5 44.1it/s 0.1s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 43.7it/s 0.0s
                   all          7        513      0.079      0.324      0.107     0.0297

15 epochs completed in 0.002 hours.
Optimizer stripped from /home/mmann1123/Documents/github/geowombat/notebooks/runs/detect/gw_buildings/weights/last.pt, 6.2MB
Optimizer stripped from /home/mmann1123/Documents/github/geowombat/notebooks/runs/detect/gw_buildings/weights/best.pt, 6.2MB

Validating /home/mmann1123/Documents/github/geowombat/notebooks/runs/detect/gw_buildings/weights/best.pt...
Ultralytics 8.4.51 🚀 Python-3.11.14 torch-2.10.0+cu128 CUDA:0 (NVIDIA GeForce RTX 3080 Ti, 11911MiB)
Model summary (fused): 73 layers, 3,005,843 parameters, 0 gradients, 8.1 GFLOPs
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 66.6it/s 0.0s
                   all          7        513     0.0786      0.322      0.106     0.0298
Speed: 0.1ms preprocess, 0.4ms inference, 0.0ms loss, 0.8ms postprocess per image
Best weights: /home/mmann1123/Documents/github/geowombat/notebooks/runs/detect/gw_buildings/weights/best.pt
Exists: True

Re-run inference with the fine-tuned model

Same YOLODetector API, just point it at our new weights. We use a low confidence threshold (0.05) because, with only ~17 training tiles and 15 epochs, the model is heavily under-trained and tends to assign low scores even to real buildings; well-trained production weights would justify a much higher threshold such as 0.25.

The point of this section is to show that the fine-tuning pipeline works end-to-end. The metrics themselves should improve substantially with more data (a wider AOI), more epochs (50–100), and a larger backbone (yolov8s.pt or yolov8m.pt).

[16]:
det_ft = YOLODetector(
    weights=str(best_weights),   # our fine-tuned weights
    classes=['building'],        # class names for the output
)

with gw.config.update(sensor='naip'):
    with gw.open(naip_local, chunks=512) as src:
        preds_ft = src.gw.detect(
            det_ft,
            tile_size=416,           # match the training image size
            overlap=0.2,
            conf=0.05,               # short training run, so use a low threshold
            scale=None,
            nms_iou=0.5,
            progress=True,
        )
print(f'{len(preds_ft)} detections from fine-tuned YOLO',
      f'(was {len(preds)} pretrained)')
YOLODetector: 100%|██████████| 54/54 [00:01<00:00, 31.32it/s]
119 detections from fine-tuned YOLO (was 32 pretrained)
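The detect call above walks a 416-px window across the scene with 20% overlap. A minimal sketch of how such windows could be laid out, assuming a stride of tile_size × (1 − overlap) and a last window snapped flush to the raster edge. tile_windows is a hypothetical helper, and geowombat's own windowing may differ in detail.

```python
def tile_windows(height, width, tile_size=416, overlap=0.2):
    """Return (row_off, col_off, n_rows, n_cols) windows covering a raster.

    Sketch only: stride = tile_size * (1 - overlap), and the last window in
    each direction is shifted back so it ends flush with the raster edge.
    """
    stride = max(1, int(round(tile_size * (1 - overlap))))

    def offsets(length):
        if length <= tile_size:
            return [0]  # a single (possibly smaller) tile covers this axis
        offs = list(range(0, length - tile_size + 1, stride))
        if offs[-1] + tile_size < length:
            offs.append(length - tile_size)  # snap final tile to the edge
        return offs

    return [
        (r, c, min(tile_size, height - r), min(tile_size, width - c))
        for r in offsets(height)
        for c in offsets(width)
    ]


windows = tile_windows(1000, 1000)
print(len(windows), windows[-1])  # 9 (584, 584, 416, 416)
```

With these settings neighbouring windows share 83 px, so any building no wider than that appears whole in at least one window; the duplicate boxes this produces are what the cross-tile NMS step (nms_iou) cleans up.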
[17]:
# Score the fine-tuned predictions
results_ft = detection_accuracy(
    predictions=preds_ft,
    truth=buildings_proj[['class_name', 'geometry']],
    class_col='class_name',
    iou_thresholds=(0.3, 0.5),
)
print('Per-class metrics (fine-tuned YOLO):')
print(results_ft['metrics'])
print()
print('Summary:', results_ft['summary'])
Per-class metrics (fine-tuned YOLO):
                              ap  precision    recall        f1  tp  fp    fn  support
iou_threshold class
0.3           building  0.090909   0.596639  0.034449  0.065138  71  48  1990     2061
0.5           building  0.090909   0.252101  0.014556  0.027523  30  89  2031     2061

Summary: {'mAP@0.3': 0.09090909090909091, 'mAP@0.5': 0.09090909090909091}
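Both rows report AP = 0.090909, which is exactly 1/11. That is the value PASCAL VOC-style 11-point interpolated AP produces when recall never reaches 0.1 and the top-scoring detection is a true positive: only the recall = 0 level contributes, with interpolated precision 1.0, which would also explain why AP is identical at IoU 0.3 and 0.5. We have not confirmed that detection_accuracy uses this scheme; the sketch below only illustrates the 11-point formula and is not geowombat's implementation.

```python
import numpy as np


def ap_11_point(scores, is_tp, n_truth):
    """PASCAL VOC-style 11-point interpolated average precision.

    scores: detection confidences; is_tp: True where a detection matched a
    reference object; n_truth: number of reference objects.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))  # best first
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / n_truth
    precision = cum_tp / (cum_tp + cum_fp)
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):  # recall levels 0.0, 0.1, ..., 1.0
        # Interpolated precision: best precision at any recall >= r
        above = precision[recall >= r]
        ap += float(above.max()) if above.size else 0.0
    return ap / 11.0


# Top detection is a TP, but recall never reaches 0.1:
print(ap_11_point([0.9, 0.8, 0.7], [True, False, True], n_truth=100))
# 0.0909... (= 1/11)
```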

Before vs. after

Side-by-side comparison of the pretrained and fine-tuned runs at IoU ≥ 0.3. Recall improves as TP rises and FN falls; precision depends on how TP grows relative to FP. The f1 row, the harmonic mean of precision and recall, is the single-number summary.
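To make the relationship between the counts and the summary metrics concrete, here is the arithmetic for the fine-tuned run at IoU ≥ 0.3 (TP = 71, FP = 48, FN = 1990, from the metrics table above). prf1 is a small illustrative helper, not part of geowombat.

```python
def prf1(tp, fp, fn):
    """Precision, recall and F1 from match counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # tp + fn is the support
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


p, r, f1 = prf1(tp=71, fp=48, fn=1990)  # fine-tuned run, IoU >= 0.3
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.597 0.034 0.065
```

Equivalently, F1 = 2·TP / (2·TP + FP + FN) = 142 / 2180. With recall this low, F1 stays small no matter how high precision climbs.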

[18]:
import pandas as pd

iou = 0.3  # use the more lenient threshold for the comparison
row_pre = results_pretrained['metrics'].loc[(iou, 'building')]
row_ft = results_ft['metrics'].loc[(iou, 'building')]
compare = pd.DataFrame(
    {'pretrained': row_pre, 'fine-tuned': row_ft},
)
compare['delta'] = compare['fine-tuned'] - compare['pretrained']
print(f'Comparison at IoU >= {iou}:')
print(compare.round(3))
Comparison at IoU >= 0.3:
           pretrained  fine-tuned   delta
ap              0.023       0.091   0.068
precision       0.094       0.597   0.503
recall          0.001       0.034   0.033
f1              0.003       0.065   0.062
tp              3.000      71.000  68.000
fp             29.000      48.000  19.000
fn           2058.000    1990.000 -68.000
support      2061.000    2061.000   0.000
[19]:
# Visualize TP / FP / FN from the fine-tuned run
with gw.config.update(sensor='naip'):
    with gw.open(naip_local, chunks=512) as src:
        fig, ax = plt.subplots(figsize=(12, 12))
        plot_detections(
            src,
            predictions=results_ft['matched'],
            truth=buildings_proj,
            ax=ax,
            scale=None,
        )
        ax.set_title('Fine-tuned YOLO: TP (lime) / FP (red) / FN (magenta)')
        plt.show()
[Figure: Fine-tuned YOLO: TP (lime) / FP (red) / FN (magenta)]
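The TP/FP/FN colours above come from matching detections to reference footprints. A common scheme, sketched here for axis-aligned boxes, processes detections in descending score order: a detection is a TP if it reaches the IoU threshold against a still-unmatched reference box, otherwise an FP, and any reference box left over is an FN. detection_accuracy may differ in details such as tie-breaking or working on polygons; match and iou are illustrative helpers.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def match(preds, truth, iou_thr=0.3):
    """Greedy, score-ordered one-to-one matching.

    preds: list of (score, box); truth: list of boxes.
    Returns TP/FP labels in descending-score order and the FN count.
    """
    unmatched = set(range(len(truth)))
    labels = []
    for _, box in sorted(preds, key=lambda p: -p[0]):
        best_j, best_iou = None, iou_thr
        for j in unmatched:
            overlap = iou(box, truth[j])
            if overlap >= best_iou:  # must reach the threshold
                best_j, best_iou = j, overlap
        if best_j is None:
            labels.append('FP')  # no free reference box overlaps enough
        else:
            unmatched.remove(best_j)  # each reference box matches once
            labels.append('TP')
    return labels, len(unmatched)  # leftover reference boxes are FNs


truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 11, 11)), (0.7, (50, 50, 60, 60))]
print(match(preds, truth))  # (['TP', 'FP', 'FP'], 1)
```

The second detection duplicates a building that is already matched, so it counts as an FP; duplicate boxes that survive suppression therefore cost precision directly.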

4. Refine boxes to polygons with SAM

Detector output is bounding boxes — fine for counting and georeferencing, but downstream GIS work usually wants the actual footprint: a polygon that hugs the roof outline, the ship hull, the storage-tank circle. SAMRefiner takes each detected box as a prompt to Meta’s Segment Anything Model, returns a tight mask, and polygonizes it back into a vector geometry in the source CRS.
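The final step, turning a pixel mask back into CRS coordinates, is an affine transform. A minimal sketch, assuming a rasterio-style north-up transform (a, b, c, d, e, f) for the crop window and a hypothetical 0.6 m pixel. For brevity it traces only the mask's rectangular pixel extent, whereas a real polygonizer (e.g. rasterio.features.shapes) traces the full outline; mask_extent_to_map is an illustrative helper, not SAMRefiner's code.

```python
import numpy as np


def mask_extent_to_map(mask, transform):
    """Closed ring of CRS coordinates around a binary mask's pixel extent.

    transform holds rasterio/affine-style coefficients (a, b, c, d, e, f):
        x = a * col + b * row + c
        y = d * col + e * row + f
    and must be the transform of the crop window SAM saw (offset included).
    """
    rows, cols = np.nonzero(mask)
    r0, r1 = rows.min(), rows.max() + 1  # +1: outer edge of the last pixel
    c0, c1 = cols.min(), cols.max() + 1
    a, b, c, d, e, f = transform
    corners = [(r0, c0), (r0, c1), (r1, c1), (r1, c0), (r0, c0)]
    return [(float(a * cc + b * rr + c), float(d * cc + e * rr + f))
            for rr, cc in corners]


mask = np.zeros((10, 10), dtype=bool)
mask[2:4, 5:8] = True  # a 2 x 3 pixel "roof"
# Hypothetical north-up 0.6 m grid (NAIP-like resolution)
ring = mask_extent_to_map(mask, (0.6, 0.0, 326000.0, 0.0, -0.6, 4306000.0))
print(ring[0])  # top-left corner, approximately (326003.0, 4305998.8)
```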

Requirements:

pip install geowombat[sam]

Plus a one-time checkpoint download. We use sam_vit_b (~375 MB) for speed in this notebook; sam_vit_l and sam_vit_h are slower but produce slightly cleaner masks on tricky targets.

[20]:
# One-time download of the SAM ViT-B checkpoint (~375 MB).
# Cached under WORK_DIR so subsequent runs skip the download.
import urllib.request

SAM_URL = ('https://dl.fbaipublicfiles.com/segment_anything/'
           'sam_vit_b_01ec64.pth')
sam_ckpt = WORK_DIR / 'sam_vit_b.pth'
if not sam_ckpt.exists():
    print(f'Downloading SAM checkpoint to {sam_ckpt} (~375 MB) ...')
    urllib.request.urlretrieve(SAM_URL, sam_ckpt)
    print('Done.')
else:
    print(f'Using cached checkpoint at {sam_ckpt}')
Using cached checkpoint at object_detection_demo/sam_vit_b.pth
[21]:
# Refine the fine-tuned YOLO boxes to building polygons with SAM.
# We feed the top-confidence detections so SAM has clean prompts —
# noisy boxes give noisy masks.
from geowombat.detect import SAMRefiner

refiner = SAMRefiner(checkpoint=sam_ckpt, model_type='vit_b')

prompts = (preds_ft
           .sort_values('score', ascending=False)
           .head(30)
           .reset_index(drop=True))

with gw.config.update(sensor='naip'):
    with gw.open(naip_local, chunks=512) as src:
        polygons = refiner.refine(
            src,
            prompts,
            pad_pixels=12,           # context around each box for SAM
            simplify_tolerance=0.3,  # polygon smoothing, in CRS units (meters)
        )

print(f'{len(polygons)} refined polygons '
      f'(from {len(prompts)} input boxes)')
polygons.head(3)
30 refined polygons (from 30 input boxes)
[21]:
   geometry                                           class_id  class_name  score     tile_id
0  POLYGON ((326038.425 4305873.359, 326038.425 4...  0         building    0.389653  44
1  POLYGON ((325657.425 4307063.159, 325657.425 4...  0         building    0.292006  6
2  POLYGON ((326541.825 4305962.759, 326541.825 4...  0         building    0.289485  41
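pad_pixels=12 in the refine call widens the crop SAM sees around each box. A minimal sketch of that padding, assuming (not confirmed from geowombat's source) that the box is expanded symmetrically in pixel space and clipped to the raster bounds. padded_window is a hypothetical helper using half-open (row0, col0, row1, col1) boxes.

```python
def padded_window(box, pad, height, width):
    """Grow a pixel box (row0, col0, row1, col1) by `pad` on every side.

    Hypothetical helper: boxes are half-open, and the result is clipped
    so the crop never extends past the raster of shape (height, width).
    """
    r0, c0, r1, c1 = box
    return (max(0, r0 - pad), max(0, c0 - pad),
            min(height, r1 + pad), min(width, c1 + pad))


print(padded_window((5, 100, 40, 140), pad=12, height=500, width=500))
# (0, 88, 52, 152): the top edge is clipped at row 0
```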
[22]:
# Side-by-side: YOLO rectangle vs. SAM polygon for the same set of
# detections. Look for polygons that track the roof outline instead
# of the bounding rectangle.
with gw.config.update(sensor='naip'):
    with gw.open(naip_local, chunks=512) as src:
        fig, axes = plt.subplots(1, 2, figsize=(20, 10))
        for ax, gdf, title in [
            (axes[0], prompts, 'YOLO boxes (input prompts)'),
            (axes[1], polygons, 'SAM polygons (refined footprints)'),
        ]:
            (src.sel(band=['red', 'green', 'blue'])
                .gw.imshow(ax=ax, robust=True))
            gdf.boundary.plot(ax=ax, color='lime', lw=1.5)
            ax.set_title(title)
            ax.set_axis_off()
        plt.tight_layout()
        plt.show()
[Figure: YOLO boxes (input prompts) vs. SAM polygons (refined footprints)]
[23]:
# Save the refined polygons for GIS downstream (QGIS, ArcGIS, etc.).
refined_path = WORK_DIR / 'naip_buildings_sam.gpkg'
polygons.to_file(refined_path, driver='GPKG')
print(f'Wrote {refined_path}')
Wrote object_detection_demo/naip_buildings_sam.gpkg

Tips for getting clean SAM masks

  • Prompt quality matters. SAM uses the box as a prompt, so loose, oversized YOLO boxes will produce loose masks. Filter by score (as above) or use the tighter-fit oriented=True OBB output from yolov8*-obb.pt weights for better prompts.

  • pad_pixels controls how much context SAM sees around each box. Too small and SAM may snap to a sub-part of the object; too large and it may grab a neighbor. 8–16 px is a good range.

  • simplify_tolerance is in CRS units (meters here). Raise it for cleaner-looking outlines; lower it (or set it to 0) to preserve the exact mask edges.

  • Backbone size: swap model_type='vit_l' or 'vit_h' (and the matching checkpoint URL) for finer masks at the cost of speed and ~3–7× more memory.
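simplify_tolerance sets how far (in CRS units) a vertex may sit from the simplified outline before it has to be kept. Shapely's simplify, which we assume (without having checked SAMRefiner's source) does the smoothing, is based on the Douglas-Peucker algorithm. A minimal pure-Python Douglas-Peucker sketch for an open line, to show what the tolerance controls:

```python
import math


def _dist_to_line(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:  # degenerate segment: fall back to point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(points, tolerance):
    """Simplify an open polyline, keeping vertices farther than `tolerance`."""
    if len(points) < 3:
        return list(points)
    # Find the interior vertex farthest from the chord between the endpoints
    dists = [_dist_to_line(p, points[0], points[-1]) for p in points[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__)
    if dists[i] <= tolerance:
        return [points[0], points[-1]]  # everything between is within tolerance
    split = i + 1  # index of the farthest vertex in `points`
    left = douglas_peucker(points[:split + 1], tolerance)
    right = douglas_peucker(points[split:], tolerance)
    return left[:-1] + right  # drop the duplicated split vertex


jagged = [(0, 0), (1, 0.1), (2, -0.1), (3, 0)]
print(douglas_peucker(jagged, 0.3))   # [(0, 0), (3, 0)]
print(douglas_peucker(jagged, 0.05))  # all four vertices kept
```

A 0.1 m wobble disappears at a 0.3 m tolerance but survives at 0.05 m, which is the trade-off the tip above describes.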

How to make the fine-tune actually good

The numbers above are deliberately weak — we trained for 15 epochs on a tiny AOI just to keep the demo fast. Here’s what to change when you want results that you’d ship, in rough order of payoff:

  1. Get more training data. This is by far the biggest lever. YOLO needs to see hundreds to thousands of examples of the object to learn what it looks like under different lighting, angles, and surroundings. Widen the AOI, pull more cities, or merge multiple build_dataset runs into one folder. A 5× larger label set will usually beat any hyperparameter tweak.

  2. Clean up the labels. A small set of accurate boxes is worth more than a large set of sloppy ones. OpenStreetMap is crowdsourced and noisy — buildings get demolished, traced imprecisely, or sit a few meters off where they actually are. A short pass through QGIS removing or fixing the worst examples pays off more than another 50 epochs.

  3. Train longer. 15 epochs is a sketch. Real fine-tunes run 50–200 epochs. The training process saves the best checkpoint along the way, so longer training rarely hurts — it just costs time. Watch the validation mAP curve flatten and stop there.

  4. Use a bigger model. We trained yolov8n (nano, ~3M parameters) because it’s fast. Swap to yolov8s (small), yolov8m (medium), or yolov8l (large) for steady accuracy gains. Each step roughly doubles training time. On a GPU, yolov8s or yolov8m is the sweet spot for aerial imagery.

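  5. Start from satellite-pretrained weights. We started from yolov8n.pt (trained on COCO, i.e. everyday photos). For overhead imagery, yolov8n-obb.pt is a much better starting point: it’s pretrained on DOTA, an aerial benchmark with planes, ships, small and large vehicles, storage tanks, harbors, bridges, etc. DOTA has no building class, but the model already knows what bird’s-eye-view objects look like, so you’re just teaching it your specific class. Note that OBB weights belong to Ultralytics’ oriented-bounding-box task, which trains on oriented-box labels (four corner points per object) rather than axis-aligned boxes.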

  6. Tune the tile size. If your objects are tiny inside the tile (e.g. cars in a 1024-px tile), YOLO will struggle to see them. As a rule of thumb, an object should be at least ~32 pixels across at the resolution the model actually sees, i.e. after the tile is resized to the model’s input size. For dense small objects, shrink the tile (256–416 px) so each object fills more of the model input, and raise the overlap so each object appears un-cropped in at least one tile.

  7. Sweep the confidence threshold after training. Don’t pick conf=0.05 blindly — once the model is trained, re-run inference at several thresholds (0.1, 0.25, 0.5) and look at the precision/recall trade-off via detection_accuracy. Higher thresholds give fewer, more reliable detections; lower ones catch more but include more false positives. Pick the spot that matches what your downstream use case cares about.

  8. Add hard negatives. If the model keeps flagging the same non-building things (parking lots, dark roofs), give it some empty tiles of those features in training. The build_dataset(background_ratio=0.2, ...) argument keeps a fraction of empty tiles in the dataset for exactly this reason.

  9. Use the QGIS review loop. After inference, export_for_review writes a GeoPackage you open in QGIS to tag each detection as TP / FP / FN by hand. Feed the corrected file back into recompute_from_review for honest numbers — and use the FP/FN polygons as additional training data in the next round. Two or three cycles of this usually matter more than any single training tweak.
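For the threshold sweep in item 7: when TP/FP labels come from greedy, score-ordered matching, raising the threshold only drops the lowest-scoring detections and leaves the label of every surviving detection unchanged. Under that assumption the sweep can be sketched from a single low-threshold run's scored, labelled predictions. sweep_thresholds is a hypothetical helper, not a geowombat function.

```python
def sweep_thresholds(scores, is_tp, n_truth,
                     thresholds=(0.05, 0.1, 0.25, 0.5)):
    """Precision/recall at each confidence threshold from one scored run.

    scores: detection confidences; is_tp: matching labels (True = TP)
    computed once at the lowest threshold; n_truth: reference object count.
    Returns (threshold, n_kept, precision, recall) tuples.
    """
    rows = []
    for t in thresholds:
        kept = [tp for s, tp in zip(scores, is_tp) if s >= t]
        n_tp = sum(kept)
        precision = n_tp / len(kept) if kept else 0.0
        recall = n_tp / n_truth if n_truth else 0.0
        rows.append((t, len(kept), precision, recall))
    return rows


for t, n, p, r in sweep_thresholds([0.9, 0.6, 0.3, 0.1],
                                   [True, True, False, True], n_truth=5):
    print(f'conf>={t:<5} kept={n}  precision={p:.2f}  recall={r:.2f}')
```

Higher thresholds keep fewer, more reliable detections (precision up) while recall can only stay flat or fall.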

If you do (1) and (5) and nothing else, expect a large jump. Everything else past that is fine-grained tuning.

Summary: the geowombat-native idiom

Everything in this notebook fits the same shape as the rest of geowombat. Inside a with gw.open(...) as src: block:

  • src.gw.to_yolo_dataset(labels, class_col=..., out_dir=...) tiles the raster + labels into a YOLO training corpus on disk.

  • src.gw.detect(detector, ...) runs tiled, windowed inference with cross-tile NMS and returns a georeferenced GeoDataFrame.
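Cross-tile NMS is what collapses the duplicate boxes that overlapping windows produce for the same building. A minimal greedy NMS sketch, assuming boxes already in map coordinates and nms_iou acting as the suppression threshold; geowombat's implementation may differ (for example, it may suppress per class).

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thr=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])  # best first
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much
        if all(iou(boxes[i], boxes[k]) < iou_thr for k in keep):
            keep.append(i)
    return keep


# Two near-identical boxes from adjacent tiles plus one separate building
print(nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)], [0.8, 0.9, 0.5]))
# [1, 2]
```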

Outside the block, in gw.detect:

  • YOLODetector (Ultralytics) and TorchGeoDetector (Faster R-CNN / RetinaNet) are the detector classes you pass into src.gw.detect.

  • gw.detect.predict, gw.detect.fit, gw.detect.fit_predict are module-level wrappers that mirror gw.ml.fit / predict / fit_predict for classification.

  • build_dataset is the function form of src.gw.to_yolo_dataset.

  • SAMRefiner (requires geowombat[sam]) refines boxes to polygon masks using each box as a SAM prompt — see Section 4 for the end-to-end example on the fine-tuned building detections.

  • detection_accuracy computes per-class precision/recall/F1/AP at one or more IoU thresholds and returns a dict with a per-class 'metrics' table, a mAP 'summary', and a review-ready 'matched' GeoDataFrame.

  • export_for_review writes a GeoPackage for QGIS attribute-form review; recompute_from_review recomputes metrics after a human has labeled the review file.

When gw.config.update(sensor='rgb') (or 'bgr') is active, the accessor methods derive the RGB band indices from the active config — you don’t need to pass band_indices=[...] per call.
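The idea of deriving RGB band indices from the active config can be sketched as a lookup from sensor name to band order. The SENSOR_BANDS table and rgb_band_indices helper below are hypothetical illustrations, not geowombat's actual config structures; the NAIP entry reflects NAIP's red, green, blue, near-infrared band order.

```python
# Hypothetical sensor -> band-order table, for illustration only; it is
# not geowombat's actual config structure.
SENSOR_BANDS = {
    'rgb': ['red', 'green', 'blue'],
    'bgr': ['blue', 'green', 'red'],
    'naip': ['red', 'green', 'blue', 'nir'],  # NAIP 4-band order
}


def rgb_band_indices(sensor):
    """0-based positions of the red, green and blue bands for a sensor."""
    bands = SENSOR_BANDS[sensor]
    return [bands.index(name) for name in ('red', 'green', 'blue')]


print(rgb_band_indices('bgr'))   # [2, 1, 0]
print(rgb_band_indices('naip'))  # [0, 1, 2]
```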