build_yolo_dataset
- geowombat.detect.build_yolo_dataset(src, labels, class_col, out_dir, tile_size=640, overlap=0.1, val_split=0.2, min_box_pixels=8, background_ratio=0.0, band_indices=None, scale=None, oriented=False, image_format='jpg', seed=42, class_names=None)
Write a YOLO-format training dataset from a raster and a GeoDataFrame of vector labels.
- Parameters:
- src : xarray.DataArray
Raster opened with gw.open().
- labels : geopandas.GeoDataFrame, str, or Path
Vector labels. Polygons are converted to bounding boxes; existing box geometries are used as-is.
- class_col : str
Column in labels holding class name/id.
- out_dir : str or Path
Output directory. Will be created if missing. The Ultralytics layout images/{train,val} + labels/{train,val} is written, plus a data.yaml at the root.
- tile_size : int
Square tile edge in pixels. Default 640.
- overlap : float
Fractional overlap between adjacent tiles (0..0.9). Default 0.1.
- val_split : float
Fraction of tiles assigned to the validation split. Default 0.2.
- min_box_pixels : int
Minimum width or height (in pixels) for a box to be kept after tile clipping. Default 8.
- background_ratio : float
Fraction (0..1) of empty tiles to retain. Default 0 (drop all).
- band_indices : list of int, optional
Three 0-based band indices for the R, G, and B channels. Required unless the source is already a 3-band uint8 raster.
- scale : tuple of (lo, hi), optional
Linear stretch range applied before writing. If None, uint8 data is written without a stretch and any other dtype gets a per-tile 2-98 percentile stretch.
- oriented : bool
If True, write oriented bounding box (OBB) labels (four corners, eight coordinates). Default False.
- image_format : {'jpg', 'png'}
Tile image format. Default 'jpg'.
- seed : int
RNG seed for train/val split. Default 42.
- class_names : list of str, optional
Override class ordering. If None, classes are taken from labels[class_col], sorted alphabetically.
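
The interaction of tile_size, overlap, and min_box_pixels can be illustrated with a small standalone sketch. It is not geowombat's implementation: the helper names tile_origins and yolo_line are hypothetical, the extra tile placed flush with the far edge is an assumption, and so is the reading that a clipped box is dropped when either side falls below min_box_pixels. The output line follows the standard axis-aligned YOLO label layout (class id, then box centre and size normalised by the tile size).

```python
def tile_origins(size, tile_size=640, overlap=0.1):
    """Top-left offsets of tiles along one axis that is `size` pixels long."""
    # Adjacent tiles share `overlap * tile_size` pixels, so the step is shorter.
    stride = max(1, int(round(tile_size * (1.0 - overlap))))
    if size <= tile_size:
        return [0]
    origins = list(range(0, size - tile_size + 1, stride))
    # Assumption: add a last tile flush with the far edge so no pixels are skipped.
    if origins[-1] != size - tile_size:
        origins.append(size - tile_size)
    return origins


def yolo_line(class_id, box, tile_x, tile_y, tile_size=640, min_box_pixels=8):
    """Clip a pixel box (xmin, ymin, xmax, ymax) to one tile and format it.

    Returns None when the clipped box is too small (or misses the tile).
    """
    xmin, ymin, xmax, ymax = box
    # Clip to the tile, then shift into tile-local pixel coordinates.
    x0 = max(xmin, tile_x) - tile_x
    y0 = max(ymin, tile_y) - tile_y
    x1 = min(xmax, tile_x + tile_size) - tile_x
    y1 = min(ymax, tile_y + tile_size) - tile_y
    w, h = x1 - x0, y1 - y0
    # Assumption: both sides must reach min_box_pixels for the box to be kept.
    if w < min_box_pixels or h < min_box_pixels:
        return None
    cx = (x0 + w / 2) / tile_size
    cy = (y0 + h / 2) / tile_size
    return f"{class_id} {cx:.6f} {cy:.6f} {w / tile_size:.6f} {h / tile_size:.6f}"
```

With the defaults, a 1500-pixel axis yields tiles starting at 0, 576, and 860. A box that straddles a tile boundary is clipped separately in each tile it touches, which is why a sliver narrower than min_box_pixels is discarded instead of being written as a label.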
- Returns:
- dict
Summary with keys out_dir, classes, n_train, n_val, n_boxes.
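
A call and a quick check of the returned summary might look like the sketch below. The function signature, the summary keys, and the output layout come from the description above; everything else is illustrative. The helper names build, check_summary, and expected_paths are hypothetical, any file paths passed in are the caller's own, and the sketch assumes that n_train and n_val are integer tile counts.

```python
from pathlib import Path

EXPECTED_KEYS = {"out_dir", "classes", "n_train", "n_val", "n_boxes"}


def check_summary(summary, val_split=0.2, tol=0.1):
    """Check the summary keys and compare the realised split to val_split."""
    missing = EXPECTED_KEYS - set(summary)
    if missing:
        raise KeyError(f"summary is missing keys: {sorted(missing)}")
    # Assumption: n_train and n_val count tiles, so their sum is the tile total.
    n_tiles = summary["n_train"] + summary["n_val"]
    val_frac = summary["n_val"] / n_tiles if n_tiles else 0.0
    return {
        "n_tiles": n_tiles,
        "val_frac": val_frac,
        "split_ok": abs(val_frac - val_split) <= tol,
    }


def expected_paths(out_dir):
    """Paths of the documented layout: data.yaml plus images/labels x train/val."""
    root = Path(out_dir)
    dirs = [root / kind / split
            for kind in ("images", "labels")
            for split in ("train", "val")]
    return [root / "data.yaml"] + dirs


def build(raster_path, labels_path, class_col, out_dir):
    """Open the raster and write the dataset with the documented defaults."""
    # Imported lazily so the helpers above work without geowombat installed.
    import geowombat as gw
    from geowombat.detect import build_yolo_dataset

    with gw.open(raster_path) as src:
        return build_yolo_dataset(
            src, labels_path, class_col, out_dir,
            tile_size=640, overlap=0.1, val_split=0.2,
        )
```

After build returns, passing the result to check_summary and testing each entry of expected_paths(summary["out_dir"]) with Path.exists() gives a cheap confirmation that the dataset was written before training starts.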