geowombat.ml package#

Submodules#

geowombat.ml.classifiers module#

class geowombat.ml.classifiers.Classifiers[source]#

Bases: ClassifiersMixin

Attributes:
classes_

Methods

add_categorical(data, labels, col[, ...])

Adds numeric categorical data to array based on polygon col values.

fit(data, clf[, labels, col, targ_name, ...])

Fits a classifier given class labels.

fit_predict(data, clf[, labels, col, ...])

Fits a classifier given class labels and predicts on a DataArray.

predict(data, X, clf[, targ_name, ...])

Predicts on a DataArray using a fitted classifier.

fit(data, clf, labels=None, col=None, targ_name='targ', targ_dim_name='sample')[source]#

Fits a classifier given class labels.

Parameters:
  • data (DataArray) – The data to fit the classifier on.

  • clf (object) – The classifier or classification pipeline.

  • labels (Optional[str | Path | GeoDataFrame]) – Class labels as polygon geometry.

  • col (Optional[str]) – The column in labels you want to assign values from. If None, creates a binary raster.

  • targ_name (Optional[str]) – The target name.

  • targ_dim_name (Optional[str]) – The target coordinate name.

Returns:
  • X (xarray.DataArray) – Original DataArray augmented to accept the prediction dimension.

  • Xna (tuple(xarray.DataArray, sklearn_xarray.Target)) – The reshaped data as an (X, y) pair. For an unsupervised classifier, X is the reshaped feature data without NAs removed and y is None. For a supervised classifier, X is the reshaped feature data with NAs removed and y is the array holding the target data.

  • clf (sklearn pipeline) – The fitted pipeline object.

Example

>>> import geowombat as gw
>>> from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
>>> from geowombat.ml import fit
>>>
>>> import geopandas as gpd
>>> from sklearn_xarray.preprocessing import Featurizer
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import StandardScaler, LabelEncoder
>>> from sklearn.decomposition import PCA
>>> from sklearn.naive_bayes import GaussianNB
>>>
>>> le = LabelEncoder()
>>>
>>> labels = gpd.read_file(l8_224078_20200518_polygons)
>>> labels['lc'] = le.fit(labels.name).transform(labels.name)
>>>
>>> # Use supervised classification pipeline
>>> pl = Pipeline([('scaler', StandardScaler()),
>>>                ('pca', PCA()),
>>>                ('clf', GaussianNB())])
>>>
>>> with gw.open(l8_224078_20200518) as src:
>>>     X, Xy, clf = fit(src, pl, labels, col='lc')
>>>
>>> # Fit an unsupervised classifier
>>> from sklearn.cluster import KMeans
>>>
>>> cl = Pipeline([('pca', PCA()),
>>>                ('cst', KMeans())])
>>>
>>> with gw.open(l8_224078_20200518) as src:
>>>     X, Xy, clf = fit(src, cl)
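
The example above converts class names to integer codes with LabelEncoder before fitting. As a rough illustration of what that encoding produces, here is a pure-Python sketch. The helper `encode_labels` and the class names are hypothetical stand-ins, not part of geowombat or scikit-learn; the only behaviour assumed from LabelEncoder is that codes follow the sorted order of the unique labels.

```python
def encode_labels(names):
    """Map each class name to an integer code, ordered by sorted unique name."""
    classes = sorted(set(names))
    lookup = {name: code for code, name in enumerate(classes)}
    return classes, [lookup[name] for name in names]


# Hypothetical class names standing in for the `name` column of the polygons
names = ["water", "crop", "tree", "crop", "developed"]
classes, codes = encode_labels(names)

print(classes)  # ['crop', 'developed', 'tree', 'water']
print(codes)    # [3, 0, 2, 0, 1]
```

The integer column built this way is what gets passed as `col='lc'` in the supervised example.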
fit_predict(data, clf, labels=None, col=None, targ_name='targ', targ_dim_name='sample', mask_nodataval=True)[source]#

Fits a classifier given class labels and predicts on a DataArray.

Parameters:
  • data (DataArray) – The data to predict on.

  • clf (object) – The classifier or classification pipeline.

  • labels (Optional[str | Path | GeoDataFrame]) – Class labels as polygon geometry.

  • col (Optional[str]) – The column in labels you want to assign values from. If None, creates a binary raster.

  • targ_name (Optional[str]) – The target name.

  • targ_dim_name (Optional[str]) – The target coordinate name.

  • mask_nodataval (Optional[bool]) – If True, values equal to data.attrs["nodatavals"][0] are replaced with np.nan and the array is returned as type float.

Returns:

Predictions shaped (‘time’ x ‘band’ x ‘y’ x ‘x’)

Return type:

xarray.DataArray

Example

>>> import geowombat as gw
>>> from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
>>> from geowombat.ml import fit_predict
>>>
>>> import geopandas as gpd
>>> from sklearn_xarray.preprocessing import Featurizer
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import StandardScaler, LabelEncoder
>>> from sklearn.decomposition import PCA
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.cluster import KMeans
>>>
>>> le = LabelEncoder()
>>>
>>> labels = gpd.read_file(l8_224078_20200518_polygons)
>>> labels['lc'] = le.fit(labels.name).transform(labels.name)
>>>
>>> # Use a supervised classification pipeline
>>> pl = Pipeline([('scaler', StandardScaler()),
>>>                ('pca', PCA()),
>>>                ('clf', GaussianNB())])
>>>
>>> with gw.open(l8_224078_20200518, nodata=0) as src:
>>>     y = fit_predict(src, pl, labels, col='lc')
>>>     y.isel(time=0).sel(band='targ').gw.imshow()
>>>
>>> with gw.open([l8_224078_20200518,l8_224078_20200518], nodata=0) as src:
>>>     y = fit_predict(src, pl, labels, col='lc')
>>>     y.isel(time=1).sel(band='targ').gw.imshow()
>>>
>>> # Use an unsupervised classification pipeline
>>> cl = Pipeline([('pca', PCA()),
>>>                ('cst', KMeans())])
>>> with gw.open(l8_224078_20200518, nodata=0) as src:
>>>     y2 = fit_predict(src, cl)
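
The mask_nodataval option replaces nodata cells with NaN, which is why the result comes back as float. The sketch below shows that idea on a toy nested list instead of a DataArray. The helper `mask_nodata` is hypothetical and is not the geowombat implementation, which works on arrays rather than Python lists.

```python
import math


def mask_nodata(values, nodata):
    """Return a float copy of a 2D grid with nodata cells replaced by NaN."""
    return [[math.nan if v == nodata else float(v) for v in row] for row in values]


# Toy single-band image where 0 marks missing pixels
band = [[0, 12, 7],
        [5, 0, 9]]
masked = mask_nodata(band, nodata=0)

print(masked)  # [[nan, 12.0, 7.0], [5.0, nan, 9.0]]
```

Integer types have no NaN value, so every remaining cell is converted to float as part of the masking.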
predict(data, X, clf, targ_name='targ', targ_dim_name='sample', mask_nodataval=True)[source]#

Predicts on a DataArray using a fitted classifier.

Parameters:
  • data (DataArray) – The data to predict on.

  • X (str | Path | DataArray) – Data array generated by geowombat.ml.fit

  • clf (object) – The classifier or classification pipeline.

  • targ_name (Optional[str]) – The target name.

  • targ_dim_name (Optional[str]) – The target coordinate name.

  • mask_nodataval (Optional[bool]) – If True, values equal to data.attrs["nodatavals"][0] are replaced with np.nan and the array is returned as type float.

Returns:

Predictions shaped (‘time’ x ‘band’ x ‘y’ x ‘x’)

Return type:

xarray.DataArray

Example

>>> import geowombat as gw
>>> from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
>>> from geowombat.ml import fit, predict
>>> import geopandas as gpd
>>> from sklearn_xarray.preprocessing import Featurizer
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import LabelEncoder, StandardScaler
>>> from sklearn.decomposition import PCA
>>> from sklearn.naive_bayes import GaussianNB
>>> le = LabelEncoder()
>>> labels = gpd.read_file(l8_224078_20200518_polygons)
>>> labels["lc"] = le.fit(labels.name).transform(labels.name)
>>> # Use a data pipeline
>>> pl = Pipeline([('scaler', StandardScaler()),
>>>                ('pca', PCA()),
>>>                ('clf', GaussianNB())])
>>> # Fit and predict the classifier
>>> with gw.config.update(ref_res=100):
>>>     with gw.open(l8_224078_20200518, nodata=0) as src:
>>>         X, Xy, clf = fit(src, pl, labels, col="lc")
>>>         y = predict(src, X, clf)
>>>         print(y)
>>>
>>> # Fit and predict an unsupervised classifier
>>> from sklearn.cluster import KMeans
>>>
>>> cl = Pipeline([('pca', PCA()),
>>>                ('cst', KMeans())])
>>>
>>> with gw.open(l8_224078_20200518) as src:
>>>     X, Xy, clf = fit(src, cl)
>>>     y1 = predict(src, X, clf)
class geowombat.ml.classifiers.ClassifiersMixin[source]#

Bases: object

Attributes:
classes_

Methods

add_categorical(data, labels, col[, ...])

Adds numeric categorical data to array based on polygon col values.

static add_categorical(data, labels, col, variable_name='cat1')[source]#

Adds numeric categorical data to array based on polygon col values. For multiple time periods, multiple copies are made, one for each time period.

Parameters:
  • data (xarray.DataArray) – The DataArray to add the categorical data to.

  • labels (Path or GeoDataFrame) – The labels with categorical data.

  • col (Optional[str]) – The column in labels you want to assign values from. If None, creates a binary raster.

  • variable_name (Optional[str]) – The name assigned to the categorical data.

Example

>>> import geowombat as gw
>>> from geowombat.ml.classifiers import Classifiers
>>>
>>> gwclf = Classifiers()
>>>
>>> climatecluster = './ClusterEco15_Y5.shp'
>>>
>>> # vrts is assumed to be a list of raster file paths, one per time period
>>> time_names = [str(x) for x in range(len(vrts))]
>>>
>>> with gw.open(vrts, time_names=time_names) as src:
>>>     src.attrs['filename'] = vrts
>>>     cats = gwclf.add_categorical(src, climatecluster, col='ClusterN_2', variable_name='clim_clust')
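
The description of add_categorical says that, for multiple time periods, one copy of the categorical layer is made per period. The sketch below illustrates only that copying step on nested lists. The helper `add_categorical_band` and the toy values are hypothetical; rasterizing the polygons and the actual DataArray handling in geowombat are skipped.

```python
def add_categorical_band(cube, band_names, categories, variable_name="cat1"):
    """Append one copy of a 2D categorical grid to every time step of a cube.

    cube is nested as cube[time][band][row][col]; categories is [row][col].
    Returns the new cube and the extended list of band names.
    """
    new_cube = [bands + [[row[:] for row in categories]] for bands in cube]
    return new_cube, band_names + [variable_name]


# Two time steps, one band, 2 x 2 pixels (toy values)
cube = [
    [[[0.1, 0.2], [0.3, 0.4]]],
    [[[0.5, 0.6], [0.7, 0.8]]],
]
clusters = [[1, 1], [2, 3]]

out, names = add_categorical_band(cube, ["blue"], clusters, variable_name="clim_clust")

print(names)                  # ['blue', 'clim_clust']
print(len(out), len(out[0]))  # 2 2
print(out[1][1])              # [[1, 1], [2, 3]]
```

Each time step receives its own copy of the grid, so the categorical band lines up with the other bands along the time dimension.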
property classes_#
le = LabelEncoder()#
geowombat.ml.classifiers.wrapped_cls(cls)[source]#

geowombat.ml.transformers module#

Created on Mon Aug 10 13:41:40 2020. Adapted from sklearn-xarray/preprocessing. @author: mmann1123

class geowombat.ml.transformers.BaseTransformer[source]#

Bases: BaseEstimator, TransformerMixin

Base class for transformers.

Methods

fit(X[, y])

Fit estimator to data.

fit_transform(X[, y])

Fit to data, then transform it.

get_params([deep])

Get parameters for this estimator.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform input data.

fit(X, y=None, **fit_params)[source]#

Fit estimator to data.

Parameters:
X : xarray DataArray or Dataset

Training set.

y : xarray DataArray or Dataset

Target values.

Returns:
self

The estimator itself.

fit_transform(X, y=None, **fit_params)#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns:
X_new : ndarray array of shape (n_samples, n_features_new)

Transformed array.

transform(X)[source]#

Transform input data.

Parameters:
X : xarray DataArray or Dataset

The input data.

Returns:
Xt : xarray DataArray or Dataset

The transformed data.

class geowombat.ml.transformers.Featurizer_GW(sample_dim='sample', feature_dim='feature', var_name='Features', order=None, return_array=False, groupby=None, group_dim='sample')[source]#

Bases: BaseTransformer

Stack all dimensions and variables except for sample dimension.

Parameters:
sample_dim : str, list, tuple

Name of the dimension used to define how the data is sampled. For instance, an individual's activity recorded over time would be sampled based on the dimension time.

If your sample dim has multiple dimensions, for instance x, y, time, these can be passed as a list or tuple. Before stacking, a new multiindex z will be created for these dimensions.

feature_dim : str

Name of the feature dimension created to store the stacked data.

var_name : str

Name of the new variable (for Datasets).

order : list or tuple

Order of dimension stacking.

return_array : bool

Whether to return a DataArray when a Dataset was passed.

groupby : str or list, optional

Name of coordinate or list of coordinates by which the groups are determined.

group_dim : str, optional

Name of dimension along which the groups are indexed.

Methods

fit(X[, y])

Fit estimator to data.

fit_transform(X[, y])

Fit to data, then transform it.

get_params([deep])

Get parameters for this estimator.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform input data.

fit_transform(X, y=None, **fit_params)#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns:
X_new : ndarray array of shape (n_samples, n_features_new)

Transformed array.

transform(X)#

Transform input data.

Parameters:
X : xarray DataArray or Dataset

The input data.

Returns:
Xt : xarray DataArray or Dataset

The transformed data.
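
To make "stack all dimensions and variables except for sample dimension" concrete, the sketch below flattens every non-sample dimension into a single feature axis. It is a conceptual illustration in plain Python, with a hypothetical helper `featurize` and a dictionary keyed by coordinate tuples standing in for a DataArray; it is not how Featurizer_GW is implemented.

```python
from itertools import product


def featurize(data, dims, sample_dim="sample"):
    """Flatten every dimension except sample_dim into one feature axis.

    data maps a coordinate tuple (ordered as dims) to a value.
    Returns (features, rows), with one row of feature values per sample.
    """
    s = dims.index(sample_dim)
    samples = sorted({key[s] for key in data})
    # Sorted coordinates of each non-sample dimension, in dims order
    other = [sorted({key[i] for key in data}) for i in range(len(dims)) if i != s]
    features = list(product(*other))
    rows = []
    for smp in samples:
        row = []
        for feat in features:
            key = list(feat)
            key.insert(s, smp)  # put the sample coordinate back in place
            row.append(data[tuple(key)])
        rows.append(row)
    return features, rows


# Toy data: 2 samples x 2 bands x 2 time steps
dims = ("sample", "band", "time")
data = {
    (s, b, t): 100 * s + 10 * bi + ti
    for s in range(2)
    for bi, b in enumerate(["nir", "red"])
    for ti, t in enumerate(["t1", "t2"])
}

features, rows = featurize(data, dims)
print(features)  # [('nir', 't1'), ('nir', 't2'), ('red', 't1'), ('red', 't2')]
print(rows)      # [[0, 1, 10, 11], [100, 101, 110, 111]]
```

The result has the (n_samples, n_features) layout that scikit-learn estimators expect.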

class geowombat.ml.transformers.Stackerizer(stack_dims=None, direction='stack', sample_dim='sample', transposed=True, groupby=None)[source]#

Bases: BaseTransformer

Transformer to handle higher dimensional data, for instance data sampled in time and location ('x', 'y', 'time'), that must be stacked before running Featurizer and unstacked after prediction.

Parameters:
stack_dims : list, tuple

List (tuple) of the dimensions used to define how the data is sampled.

If your sample dim has multiple dimensions, for instance x, y, time, these can be passed as a list or tuple. Before stacking, a new multiindex 'sample' will be created for these dimensions.

direction : str, optional

"stack" or "unstack" defines the direction of transformation. Default is "stack".

sample_dim : str

Name of the multiindex used to stack sample dims. Defaults to "sample".

transposed : bool

Whether the output should be transposed after stacking. Default is True.

Returns:
Xt : xarray DataArray or Dataset

The transformed data.

Methods

fit(X[, y])

Fit estimator to data.

fit_transform(X[, y])

Fit to data, then transform it.

get_params([deep])

Get parameters for this estimator.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform input data.

fit_transform(X, y=None, **fit_params)#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns:
X_new : ndarray array of shape (n_samples, n_features_new)

Transformed array.

transform(X)#

Transform input data.

Parameters:
X : xarray DataArray or Dataset

The input data.

Returns:
Xt : xarray DataArray or Dataset

The transformed data.
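
The stack and unstack directions can be pictured as a round trip: several dimensions are flattened into one sample index before modelling, and per-sample results are reshaped back afterwards. The sketch below shows that round trip on a nested list. The helpers `stack` and `unstack` are hypothetical and operate on plain Python lists, not on the xarray objects that Stackerizer handles.

```python
from itertools import product


def stack(grid, dims):
    """Flatten a nested list with the given dims into (index, values, shape).

    index holds one coordinate tuple per sample, playing the role of the
    'sample' multi-index; values holds the matching cell values.
    """
    shape = []
    level = grid
    for _ in dims:
        shape.append(len(level))
        level = level[0]
    index = list(product(*(range(n) for n in shape)))
    values = []
    for pos in index:
        cell = grid
        for i in pos:
            cell = cell[i]
        values.append(cell)
    return index, values, shape


def unstack(index, values, shape):
    """Rebuild the nested list from a sample index and per-sample values."""
    def empty(dims_left):
        if len(dims_left) == 1:
            return [None] * dims_left[0]
        return [empty(dims_left[1:]) for _ in range(dims_left[0])]

    grid = empty(shape)
    for pos, value in zip(index, values):
        cell = grid
        for i in pos[:-1]:
            cell = cell[i]
        cell[pos[-1]] = value
    return grid


# Toy cube with dims (time, y, x) and shape 2 x 2 x 2
grid = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
index, values, shape = stack(grid, ("time", "y", "x"))
print(values)     # [1, 2, 3, 4, 5, 6, 7, 8]
print(index[:3])  # [(0, 0, 0), (0, 0, 1), (0, 1, 0)]

# Stand-in for per-sample predictions, reshaped back to (time, y, x)
labels = [v % 2 for v in values]
print(unstack(index, labels, shape))  # [[[1, 0], [1, 0]], [[1, 0], [1, 0]]]
```

Keeping the index alongside the values is what makes the reshaping after prediction possible.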

geowombat.ml.transformers.featurize_gw(X, return_estimator=False, **fit_params)[source]#

Stacks all dimensions and variables except for sample dimension.

Parameters:
X : xarray DataArray or Dataset

The input data.

return_estimator : bool

Whether to return the fitted estimator along with the transformed data.

Returns:
Xt : xarray DataArray or Dataset

The transformed data.

geowombat.ml.transformers.is_dataarray(X, require_attrs=None)[source]#

Check whether an object is a DataArray.

Parameters:
X : anything

The object to be checked.

require_attrs : list of str, optional

The attributes the object has to have in order to pass as a DataArray.

Returns:
bool

Whether the object is a DataArray or not.
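
The require_attrs parameter suggests a duck-typed check, where an object passes if it exposes the listed attributes. The sketch below shows one plausible reading of that behaviour. The function name `looks_like_dataarray`, the default attribute list, and the `FakeArray` class are all assumptions for illustration; the real is_dataarray may check a different set of attributes.

```python
def looks_like_dataarray(obj, require_attrs=None):
    """Duck-typed check: True if obj exposes every attribute in require_attrs."""
    if require_attrs is None:
        # Assumed defaults; the real function may check a different set
        require_attrs = ["values", "coords", "dims"]
    return all(hasattr(obj, name) for name in require_attrs)


class FakeArray:
    """Minimal stand-in exposing DataArray-like attributes."""
    values = [1, 2, 3]
    coords = {"x": [0, 1, 2]}
    dims = ("x",)


print(looks_like_dataarray(FakeArray()))  # True
print(looks_like_dataarray([1, 2, 3]))    # False
print(looks_like_dataarray(FakeArray(), require_attrs=["values", "to_dataset"]))  # False
```

A check of this kind avoids importing xarray just to call isinstance, at the cost of accepting any object that happens to carry the same attribute names.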

geowombat.ml.transformers.is_dataset(X, require_attrs=None)[source]#

Check whether an object is a Dataset.

Parameters:
X : anything

The object to be checked.

require_attrs : list of str, optional

The attributes the object has to have in order to pass as a Dataset.

Returns:
bool

Whether the object is a Dataset or not.

geowombat.ml.transformers.stackerizer(X, return_estimator=False, **fit_params)[source]#

Stacks all dimensions and variables except for sample dimension.

Parameters:
X : xarray DataArray or Dataset

The input data.

return_estimator : bool

Whether to return the fitted estimator along with the transformed data.

Returns:
Xt : xarray DataArray or Dataset

The transformed data.

Module contents#

geowombat.ml.fit(data, clf, labels=None, col=None, targ_name='targ', targ_dim_name='sample')#

Fits a classifier given class labels.

Parameters:
  • data (DataArray) – The data to fit the classifier on.

  • clf (object) – The classifier or classification pipeline.

  • labels (Optional[str | Path | GeoDataFrame]) – Class labels as polygon geometry.

  • col (Optional[str]) – The column in labels you want to assign values from. If None, creates a binary raster.

  • targ_name (Optional[str]) – The target name.

  • targ_dim_name (Optional[str]) – The target coordinate name.

Returns:
  • X (xarray.DataArray) – Original DataArray augmented to accept the prediction dimension.

  • Xna (tuple(xarray.DataArray, sklearn_xarray.Target)) – The reshaped data as an (X, y) pair. For an unsupervised classifier, X is the reshaped feature data without NAs removed and y is None. For a supervised classifier, X is the reshaped feature data with NAs removed and y is the array holding the target data.

  • clf (sklearn pipeline) – The fitted pipeline object.

Example

>>> import geowombat as gw
>>> from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
>>> from geowombat.ml import fit
>>>
>>> import geopandas as gpd
>>> from sklearn_xarray.preprocessing import Featurizer
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import StandardScaler, LabelEncoder
>>> from sklearn.decomposition import PCA
>>> from sklearn.naive_bayes import GaussianNB
>>>
>>> le = LabelEncoder()
>>>
>>> labels = gpd.read_file(l8_224078_20200518_polygons)
>>> labels['lc'] = le.fit(labels.name).transform(labels.name)
>>>
>>> # Use supervised classification pipeline
>>> pl = Pipeline([('scaler', StandardScaler()),
>>>                ('pca', PCA()),
>>>                ('clf', GaussianNB())])
>>>
>>> with gw.open(l8_224078_20200518) as src:
>>>     X, Xy, clf = fit(src, pl, labels, col='lc')
>>>
>>> # Fit an unsupervised classifier
>>> from sklearn.cluster import KMeans
>>>
>>> cl = Pipeline([('pca', PCA()),
>>>                ('cst', KMeans())])
>>>
>>> with gw.open(l8_224078_20200518) as src:
>>>     X, Xy, clf = fit(src, cl)
geowombat.ml.fit_predict(data, clf, labels=None, col=None, targ_name='targ', targ_dim_name='sample', mask_nodataval=True)#

Fits a classifier given class labels and predicts on a DataArray.

Parameters:
  • data (DataArray) – The data to predict on.

  • clf (object) – The classifier or classification pipeline.

  • labels (Optional[str | Path | GeoDataFrame]) – Class labels as polygon geometry.

  • col (Optional[str]) – The column in labels you want to assign values from. If None, creates a binary raster.

  • targ_name (Optional[str]) – The target name.

  • targ_dim_name (Optional[str]) – The target coordinate name.

  • mask_nodataval (Optional[bool]) – If True, values equal to data.attrs["nodatavals"][0] are replaced with np.nan and the array is returned as type float.

Returns:

Predictions shaped (‘time’ x ‘band’ x ‘y’ x ‘x’)

Return type:

xarray.DataArray

Example

>>> import geowombat as gw
>>> from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
>>> from geowombat.ml import fit_predict
>>>
>>> import geopandas as gpd
>>> from sklearn_xarray.preprocessing import Featurizer
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import StandardScaler, LabelEncoder
>>> from sklearn.decomposition import PCA
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.cluster import KMeans
>>>
>>> le = LabelEncoder()
>>>
>>> labels = gpd.read_file(l8_224078_20200518_polygons)
>>> labels['lc'] = le.fit(labels.name).transform(labels.name)
>>>
>>> # Use a supervised classification pipeline
>>> pl = Pipeline([('scaler', StandardScaler()),
>>>                ('pca', PCA()),
>>>                ('clf', GaussianNB())])
>>>
>>> with gw.open(l8_224078_20200518, nodata=0) as src:
>>>     y = fit_predict(src, pl, labels, col='lc')
>>>     y.isel(time=0).sel(band='targ').gw.imshow()
>>>
>>> with gw.open([l8_224078_20200518,l8_224078_20200518], nodata=0) as src:
>>>     y = fit_predict(src, pl, labels, col='lc')
>>>     y.isel(time=1).sel(band='targ').gw.imshow()
>>>
>>> # Use an unsupervised classification pipeline
>>> cl = Pipeline([('pca', PCA()),
>>>                ('cst', KMeans())])
>>> with gw.open(l8_224078_20200518, nodata=0) as src:
>>>     y2 = fit_predict(src, cl)
geowombat.ml.predict(data, X, clf, targ_name='targ', targ_dim_name='sample', mask_nodataval=True)#

Predicts on a DataArray using a fitted classifier.

Parameters:
  • data (DataArray) – The data to predict on.

  • X (str | Path | DataArray) – Data array generated by geowombat.ml.fit

  • clf (object) – The classifier or classification pipeline.

  • targ_name (Optional[str]) – The target name.

  • targ_dim_name (Optional[str]) – The target coordinate name.

  • mask_nodataval (Optional[bool]) – If True, values equal to data.attrs["nodatavals"][0] are replaced with np.nan and the array is returned as type float.

Returns:

Predictions shaped (‘time’ x ‘band’ x ‘y’ x ‘x’)

Return type:

xarray.DataArray

Example

>>> import geowombat as gw
>>> from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
>>> from geowombat.ml import fit, predict
>>> import geopandas as gpd
>>> from sklearn_xarray.preprocessing import Featurizer
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import LabelEncoder, StandardScaler
>>> from sklearn.decomposition import PCA
>>> from sklearn.naive_bayes import GaussianNB
>>> le = LabelEncoder()
>>> labels = gpd.read_file(l8_224078_20200518_polygons)
>>> labels["lc"] = le.fit(labels.name).transform(labels.name)
>>> # Use a data pipeline
>>> pl = Pipeline([('scaler', StandardScaler()),
>>>                ('pca', PCA()),
>>>                ('clf', GaussianNB())])
>>> # Fit and predict the classifier
>>> with gw.config.update(ref_res=100):
>>>     with gw.open(l8_224078_20200518, nodata=0) as src:
>>>         X, Xy, clf = fit(src, pl, labels, col="lc")
>>>         y = predict(src, X, clf)
>>>         print(y)
>>>
>>> # Fit and predict an unsupervised classifier
>>> from sklearn.cluster import KMeans
>>>
>>> cl = Pipeline([('pca', PCA()),
>>>                ('cst', KMeans())])
>>>
>>> with gw.open(l8_224078_20200518) as src:
>>>     X, Xy, clf = fit(src, cl)
>>>     y1 = predict(src, X, clf)