build_dataset

geowombat.detect.build_dataset(src, labels, class_col, out_dir, tile_size=640, overlap=0.1, val_split=0.2, min_box_pixels=8, background_ratio=0.0, band_indices=None, scale=None, oriented=False, image_format='jpg', seed=42, class_names=None)

Write a YOLO-format training dataset from a raster and a GeoDataFrame of labels.

Parameters:
src : xarray.DataArray

Raster opened with gw.open().

labels : geopandas.GeoDataFrame, str, or Path

Vector labels. Polygons are converted to bounding boxes; existing box geometries are used as-is.

class_col : str

Column in labels holding the class name or ID.

out_dir : str or Path

Output directory, created if missing. The Ultralytics layout (images/{train,val} and labels/{train,val}) is written, plus a data.yaml at the root.

tile_size : int

Square tile edge in pixels. Default 640.

overlap : float

Fractional overlap between adjacent tiles (0..0.9). Default 0.1.
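How tile_size and overlap interact can be sketched along one axis. This is only an illustration, not geowombat's implementation: it assumes the stride between tiles is tile_size * (1 - overlap) and that a final tile is shifted back to end at the raster edge. The helper name tile_origins is hypothetical.

```python
def tile_origins(length, tile_size=640, overlap=0.1):
    """Return the start offsets of tiles along one raster axis.

    Assumptions (not taken from geowombat): the stride is
    tile_size * (1 - overlap), and the last tile is shifted back so that
    it ends exactly at the raster edge.
    """
    if not 0 <= overlap <= 0.9:
        raise ValueError("overlap must be in [0, 0.9]")
    stride = max(1, int(round(tile_size * (1 - overlap))))
    if length <= tile_size:
        return [0]
    origins = list(range(0, length - tile_size + 1, stride))
    # Cover any remainder at the far edge with one extra, shifted tile.
    if origins[-1] + tile_size < length:
        origins.append(length - tile_size)
    return origins
```

Under these assumptions, a 2000-pixel axis with the default settings yields a stride of 576 pixels, so adjacent tiles share 64 pixels.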

val_split : float

Fraction of tiles assigned to the validation split. Default 0.2.

min_box_pixels : int

Minimum width or height (in pixels) for a box to be kept after tile clipping. Default 8.
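The clipping filter can be sketched as follows. The helper clip_box_to_tile is hypothetical, and the sketch assumes a box is dropped when either its clipped width or its clipped height falls below min_box_pixels; the library may apply the threshold differently.

```python
def clip_box_to_tile(box, tile, min_box_pixels=8):
    """Clip a pixel-space box to a tile; return None if it becomes too small.

    box and tile are (xmin, ymin, xmax, ymax) tuples in raster pixels.
    Assumption: the box is dropped when either clipped dimension is
    below min_box_pixels.
    """
    xmin = max(box[0], tile[0])
    ymin = max(box[1], tile[1])
    xmax = min(box[2], tile[2])
    ymax = min(box[3], tile[3])
    width, height = xmax - xmin, ymax - ymin
    # Non-intersecting boxes give negative sizes and are dropped as well.
    if width < min_box_pixels or height < min_box_pixels:
        return None
    return (xmin, ymin, xmax, ymax)
```

A box that straddles a tile edge is kept only if the part inside the tile is still large enough to be a useful training target.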

background_ratio : float

Fraction (0..1) of empty tiles to retain. Default 0 (drop all).
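One plausible reading of background_ratio is a random subsample of the tiles that contain no boxes. The sketch below assumes exactly that, with a seeded RNG for repeatability; sample_background is a hypothetical helper, not part of geowombat.

```python
import random


def sample_background(empty_tiles, background_ratio=0.0, seed=42):
    """Keep a random fraction of label-free tiles.

    Assumption: background_ratio is the share of empty tiles to keep,
    so 0.0 drops them all and 1.0 keeps them all.
    """
    if not 0.0 <= background_ratio <= 1.0:
        raise ValueError("background_ratio must be in [0, 1]")
    tiles = sorted(empty_tiles)
    n_keep = int(round(len(tiles) * background_ratio))
    return sorted(random.Random(seed).sample(tiles, n_keep))
```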

band_indices : list of int, optional

Three band indices (0-based) for the R, G, B channels. Required unless the source is already a 3-band uint8 raster.

scale : tuple of (lo, hi), optional

Linear stretch applied before writing. If None and the dtype is uint8, no stretch is applied; otherwise a per-tile 2nd to 98th percentile stretch is used.
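A linear stretch of this kind can be sketched with NumPy. The function stretch_to_uint8 is hypothetical, and it assumes the percentiles are computed jointly over all bands of the tile, which the documentation does not specify.

```python
import numpy as np


def stretch_to_uint8(tile, scale=None):
    """Linearly rescale a tile to the 0..255 uint8 range.

    scale=(lo, hi) fixes the input range. With scale=None, the tile's own
    2nd and 98th percentiles are used (assumption: computed over all
    bands jointly).
    """
    tile = np.asarray(tile, dtype=np.float64)
    if scale is None:
        lo, hi = np.percentile(tile, [2, 98])
    else:
        lo, hi = scale
    if hi <= lo:
        # Constant tile: avoid dividing by zero.
        return np.zeros(tile.shape, dtype=np.uint8)
    scaled = (tile - lo) / (hi - lo)
    # Values outside [lo, hi] saturate at 0 or 255.
    return (np.clip(scaled, 0.0, 1.0) * 255).round().astype(np.uint8)
```

A fixed scale keeps brightness consistent across tiles, while the per-tile percentile stretch adapts to local contrast at the cost of that consistency.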

oriented : bool

If True, write oriented bounding box (OBB) labels with four corner points (eight coordinates) per box. Default False.
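The two label layouts differ only in what follows the class index. The sketch below formats one label line in each style, following the Ultralytics conventions (normalized center, width, and height for axis-aligned boxes; four normalized corner points for OBB). The helper names are hypothetical, the six-decimal precision is an arbitrary choice, and square tiles are assumed.

```python
def yolo_box_line(class_id, box, tile_size=640):
    """Format an axis-aligned box (xmin, ymin, xmax, ymax) in tile pixels."""
    xmin, ymin, xmax, ymax = box
    cx = (xmin + xmax) / 2 / tile_size
    cy = (ymin + ymax) / 2 / tile_size
    w = (xmax - xmin) / tile_size
    h = (ymax - ymin) / tile_size
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def yolo_obb_line(class_id, corners, tile_size=640):
    """Format four (x, y) corner points in tile pixels as an OBB label."""
    if len(corners) != 4:
        raise ValueError("an oriented box needs exactly four corners")
    coords = " ".join(
        f"{x / tile_size:.6f} {y / tile_size:.6f}" for x, y in corners
    )
    return f"{class_id} {coords}"
```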

image_format : {'jpg', 'png'}

Tile image format. Default 'jpg'.

seed : int

RNG seed for train/val split. Default 42.
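A seeded split makes the train/val assignment reproducible across runs. The following sketch shows the general idea with the standard library; split_tiles is a hypothetical helper, and the actual shuffling and rounding used by geowombat may differ.

```python
import random


def split_tiles(tile_ids, val_split=0.2, seed=42):
    """Shuffle tile ids with a seeded RNG and return (train, val) lists."""
    ids = sorted(tile_ids)  # fixed starting order, so the seed alone decides
    random.Random(seed).shuffle(ids)
    n_val = int(round(len(ids) * val_split))
    return ids[n_val:], ids[:n_val]
```

Calling the function twice with the same seed returns the same partition, which keeps validation metrics comparable between dataset rebuilds.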

class_names : list of str, optional

Override class ordering. If None, classes are taken from labels[class_col] sorted alphabetically.
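Class ordering matters because YOLO labels store an integer index, not a name. A minimal sketch of the mapping, assuming string class labels and a hypothetical helper class_index:

```python
def class_index(labels, class_names=None):
    """Map class names to the integer ids written in label files.

    With class_names=None the unique labels are sorted alphabetically;
    otherwise the given order is used as-is.
    """
    if class_names is not None:
        names = list(class_names)
    else:
        names = sorted(set(labels))
    return {name: i for i, name in enumerate(names)}
```

Passing class_names explicitly is useful when several datasets must share the same index for each class, even if some classes are missing from one of them.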

Returns:
dict

Summary with keys out_dir, classes, n_train, n_val, n_boxes.
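A call might look like the sketch below. The wrapper make_detection_dataset and the class column name "class" are hypothetical; the raster and label paths are supplied by the caller. Only the signature and summary keys documented above are relied on.

```python
def make_detection_dataset(raster_path, labels_path, out_dir):
    """Tile a raster and its labels into a YOLO dataset; return the summary."""
    # Imported here so this sketch can be loaded without geowombat installed.
    import geowombat as gw
    from geowombat.detect import build_dataset

    with gw.open(raster_path) as src:
        summary = build_dataset(
            src,
            labels_path,        # a GeoDataFrame or a path to a vector file
            class_col="class",  # hypothetical column name in the labels
            out_dir=out_dir,
            tile_size=640,
            overlap=0.1,
            val_split=0.2,
        )

    # Keys documented in the Returns section.
    print(summary["classes"], summary["n_train"], summary["n_val"], summary["n_boxes"])
    return summary
```

The data.yaml written at the root of out_dir is the file an Ultralytics training run would then be pointed at.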