geowombat.ml package
Submodules
geowombat.ml.classifiers module
- class geowombat.ml.classifiers.Classifiers
Bases: ClassifiersMixin
- Attributes:
- classes_
Methods
add_categorical(data, labels, col[, ...]) – Adds numeric categorical data to an array based on polygon col values.
fit(data, clf[, labels, col, targ_name, ...]) – Fits a classifier given class labels.
fit_predict(data, clf[, labels, col, ...]) – Fits a classifier given class labels and predicts on a DataArray.
predict(data, X, clf[, targ_name, ...]) – Predicts on a DataArray using a fitted classifier.
- fit(data, clf, labels=None, col=None, targ_name='targ', targ_dim_name='sample')
Fits a classifier given class labels.
- Parameters:
data (DataArray) – The data to fit the classifier on.
clf (object) – The classifier or classification pipeline.
labels (Optional[str | Path | GeoDataFrame]) – Class labels as polygon geometry.
col (Optional[str]) – The column in labels you want to assign values from. If None, creates a binary raster.
targ_name (Optional[str]) – The target name.
targ_dim_name (Optional[str]) – The target coordinate name.
- Returns:
X (xarray.DataArray) – The original DataArray, augmented to accept the prediction dimension.
Xna (tuple(xarray.DataArray, sklearn_xarray.Target)) – For an unsupervised classifier, X is the reshaped feature data with NAs not removed and y is None. For a supervised classifier, X is the reshaped feature data with NAs removed and y is an array holding the target data.
clf (sklearn pipeline) – The fitted pipeline object.
- Return type:
tuple
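The difference between the supervised and unsupervised feature data can be pictured with plain NumPy. This is a minimal sketch, not geowombat's implementation: it assumes a single-date image shaped (band, y, x), flattens it to a (sample, feature) matrix with one row per pixel, and, for the supervised case, keeps only pixels that carry a label and have no missing band values. The helpers `to_samples` and `drop_missing` and the toy arrays are hypothetical.

```python
import numpy as np


def to_samples(image):
    """Reshape a (band, y, x) array into a (sample, feature) matrix.

    Each pixel becomes one sample (row); each band becomes one feature (column).
    """
    n_bands = image.shape[0]
    return image.reshape(n_bands, -1).T


def drop_missing(features, labels):
    """Keep only rows with no NaN features and a non-NaN label (supervised case)."""
    keep = ~np.isnan(features).any(axis=1) & ~np.isnan(labels)
    return features[keep], labels[keep]


# Hypothetical 2-band, 2 x 3 image with one missing value.
image = np.array(
    [
        [[1.0, 2.0, 3.0], [4.0, np.nan, 6.0]],
        [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]],
    ]
)
# Hypothetical rasterized labels; NaN marks pixels outside every polygon.
labels = np.array([[0.0, np.nan, 1.0], [1.0, 0.0, np.nan]])

X = to_samples(image)  # all pixels, NaNs kept (unsupervised case)
X_sup, y_sup = drop_missing(X, labels.ravel())  # labeled, complete pixels only
print(X.shape, X_sup.shape, y_sup)
```

Only three of the six pixels survive in the supervised case: one is dropped for a missing band value and two for lacking a label.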
Example
>>> import geowombat as gw
>>> from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
>>> from geowombat.ml import fit
>>>
>>> import geopandas as gpd
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import StandardScaler, LabelEncoder
>>> from sklearn.decomposition import PCA
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.cluster import KMeans
>>>
>>> le = LabelEncoder()
>>>
>>> labels = gpd.read_file(l8_224078_20200518_polygons)
>>> labels['lc'] = le.fit(labels.name).transform(labels.name)
>>>
>>> # Use a supervised classification pipeline
>>> pl = Pipeline([('scaler', StandardScaler()),
...                ('pca', PCA()),
...                ('clf', GaussianNB())])
>>>
>>> with gw.open(l8_224078_20200518) as src:
...     X, Xy, clf = fit(src, pl, labels, col='lc')
>>>
>>> # Fit an unsupervised classifier
>>> cl = Pipeline([('pca', PCA()),
...                ('cst', KMeans())])
>>> with gw.open(l8_224078_20200518) as src:
...     X, Xy, clf = fit(src, cl)
- fit_predict(data, clf, labels=None, col=None, targ_name='targ', targ_dim_name='sample', mask_nodataval=True)
Fits a classifier given class labels and predicts on a DataArray.
- Parameters:
data (DataArray) – The data to predict on.
clf (object) – The classifier or classification pipeline.
labels (Optional[str | Path | GeoDataFrame]) – Class labels as polygon geometry.
col (Optional[str]) – The column in labels you want to assign values from. If None, creates a binary raster.
targ_name (Optional[str]) – The target name.
targ_dim_name (Optional[str]) – The target coordinate name.
mask_nodataval (Optional[bool]) – If True, values equal to data.attrs["nodatavals"][0] are replaced with np.nan and the array is returned as type float.
- Returns:
Predictions shaped ('time' x 'band' x 'y' x 'x').
- Return type:
xarray.DataArray
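The effect of mask_nodataval and the layout of the returned predictions can be sketched in NumPy. This is an illustration under stated assumptions, not geowombat's code: it assumes a (time, band, y, x) input whose first nodata value is 0 (as with the nodata=0 used in this function's example), and it replaces the fitted pipeline with a hypothetical per-pixel rule. The helper `mask_nodata` is also hypothetical.

```python
import numpy as np


def mask_nodata(data, nodataval):
    """Replace nodata values with NaN; the result is always floating point."""
    out = data.astype("float64")
    out[data == nodataval] = np.nan
    return out


# Hypothetical input shaped (time=1, band=2, y=2, x=2) with nodata value 0.
data = np.array([[[[0, 5], [6, 7]], [[0, 2], [9, 1]]]], dtype="uint16")
masked = mask_nodata(data, nodataval=0)

t, b, ny, nx = masked.shape
# Flatten to (sample, feature): one row per (time, y, x) pixel, one column per band.
samples = masked.transpose(0, 2, 3, 1).reshape(-1, b)

# Hypothetical stand-in for clf.predict: class 1 if band 0 exceeds band 1, else 0.
pred = np.where(samples[:, 0] > samples[:, 1], 1.0, 0.0)
# Keep NaN for any pixel that had missing input.
pred[np.isnan(samples).any(axis=1)] = np.nan

# Reshape predictions back to ('time' x 'band' x 'y' x 'x') with a single target band.
y = pred.reshape(t, ny, nx)[:, np.newaxis, :, :]
print(y.shape)
```

Masking to NaN is what forces the float dtype: integer arrays cannot hold NaN, so the output must be floating point whenever masking is enabled.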
Example
>>> import geowombat as gw
>>> from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
>>> from geowombat.ml import fit_predict
>>>
>>> import geopandas as gpd
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import StandardScaler, LabelEncoder
>>> from sklearn.decomposition import PCA
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.cluster import KMeans
>>>
>>> le = LabelEncoder()
>>>
>>> labels = gpd.read_file(l8_224078_20200518_polygons)
>>> labels['lc'] = le.fit(labels.name).transform(labels.name)
>>>
>>> # Use a supervised classification pipeline
>>> pl = Pipeline([('scaler', StandardScaler()),
...                ('pca', PCA()),
...                ('clf', GaussianNB())])
>>>
>>> with gw.open(l8_224078_20200518, nodata=0) as src:
...     y = fit_predict(src, pl, labels, col='lc')
...     y.isel(time=0).sel(band='targ').gw.imshow()
>>>
>>> with gw.open([l8_224078_20200518, l8_224078_20200518], nodata=0) as src:
...     y = fit_predict(src, pl, labels, col='lc')
...     y.isel(time=1).sel(band='targ').gw.imshow()
>>>
>>> # Use an unsupervised classification pipeline
>>> cl = Pipeline([('pca', PCA()),
...                ('cst', KMeans())])
>>> with gw.open(l8_224078_20200518, nodata=0) as src:
...     y2 = fit_predict(src, cl)
- predict(data, X, clf, targ_name='targ', targ_dim_name='sample', mask_nodataval=True)
Predicts on a DataArray using a classifier fitted with geowombat.ml.fit.
- Parameters:
data (DataArray) – The data to predict on.
X (str | Path | DataArray) – Data array generated by geowombat.ml.fit.
clf (object) – The fitted classifier or classification pipeline (for example, the clf returned by geowombat.ml.fit).
targ_name (Optional[str]) – The target name.
targ_dim_name (Optional[str]) – The target coordinate name.
mask_nodataval (Optional[bool]) – If True, values equal to data.attrs["nodatavals"][0] are replaced with np.nan and the array is returned as type float.
- Returns:
Predictions shaped ('time' x 'band' x 'y' x 'x').
- Return type:
xarray.DataArray
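Keeping fit and predict separate means one fitted classifier can be reused on every pixel (or on other images). The sketch below illustrates that split with a hypothetical nearest-centroid classifier written in NumPy; it stands in for GaussianNB or any real pipeline and is not a geowombat class. It is fitted on labeled samples only and then predicts on all pixels of a (band, y, x) image.

```python
import numpy as np


class NearestCentroid:
    """Hypothetical nearest-centroid classifier with an sklearn-like fit/predict API."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One centroid per class: the mean feature vector of its samples.
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Euclidean distance from every sample to every centroid.
        diff = X[:, np.newaxis, :] - self.centroids_[np.newaxis, :, :]
        dist = np.linalg.norm(diff, axis=2)
        return self.classes_[dist.argmin(axis=1)]


# Fit once on labeled training samples (rows = pixels, columns = bands).
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [9.0, 9.0], [10.0, 10.0]])
y_train = np.array([0, 0, 1, 1])
clf = NearestCentroid().fit(X_train, y_train)

# Reuse the fitted model on all pixels of a (band, y, x) image.
image = np.array([[[0.5, 9.5]], [[0.5, 9.5]]])  # shape (2, 1, 2)
pixels = image.reshape(image.shape[0], -1).T
pred = clf.predict(pixels).reshape(image.shape[1:])
print(pred)
```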
Example
>>> import geowombat as gw
>>> from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
>>> from geowombat.ml import fit, predict
>>> import geopandas as gpd
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import LabelEncoder, StandardScaler
>>> from sklearn.decomposition import PCA
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.cluster import KMeans
>>>
>>> le = LabelEncoder()
>>> labels = gpd.read_file(l8_224078_20200518_polygons)
>>> labels["lc"] = le.fit(labels.name).transform(labels.name)
>>>
>>> # Use a data pipeline
>>> pl = Pipeline([('scaler', StandardScaler()),
...                ('pca', PCA()),
...                ('clf', GaussianNB())])
>>>
>>> # Fit and predict the classifier
>>> with gw.config.update(ref_res=100):
...     with gw.open(l8_224078_20200518, nodata=0) as src:
...         X, Xy, clf = fit(src, pl, labels, col="lc")
...         y = predict(src, X, clf)
...         print(y)
>>>
>>> # Fit and predict an unsupervised classifier
>>> cl = Pipeline([('pca', PCA()),
...                ('cst', KMeans())])
>>> with gw.open(l8_224078_20200518) as src:
...     X, Xy, clf = fit(src, cl)
...     y1 = predict(src, X, clf)
- class geowombat.ml.classifiers.ClassifiersMixin
Bases: object
- Attributes:
- classes_
Methods
add_categorical(data, labels, col[, ...]) – Adds numeric categorical data to an array based on polygon col values.
- static add_categorical(data, labels, col, variable_name='cat1')
Adds numeric categorical data to an array based on polygon col values. For multiple time periods, multiple copies are made, one for each time period.
- Parameters:
data (xarray.DataArray) – The data to add the categorical data to.
labels (Path or GeoDataFrame) – The labels with categorical data.
col (Optional[str]) – The column in labels you want to assign values from. If None, creates a binary raster.
variable_name (Optional[str]) – The name assigned to the categorical data.
Example
>>> import geowombat as gw
>>> from geowombat.ml.classifiers import Classifiers
>>>
>>> gwclf = Classifiers()
>>>
>>> climatecluster = './ClusterEco15_Y5.shp'
>>>
>>> # vrts: a user-supplied list of image (e.g., VRT) paths, one per time period (not defined here)
>>> time_names = [str(x) for x in range(len(vrts))]
>>>
>>> with gw.open(vrts, time_names=time_names) as src:
...     src.attrs['filename'] = vrts
...     cats = gwclf.add_categorical(src, climatecluster, col='ClusterN_2', variable_name='clim_clust')
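The note that one copy of the categorical data is made per time period can be pictured as broadcasting a single 2-D category layer along the time axis and appending it as an extra band. A NumPy sketch under that assumption: it does not rasterize polygons (the category raster is given directly), and `append_categorical` and the toy arrays are hypothetical.

```python
import numpy as np


def append_categorical(stack, categories):
    """Append a (y, x) categorical layer as a new band to a (time, band, y, x) stack.

    The same layer is copied once for every time period.
    """
    n_time = stack.shape[0]
    # (y, x) -> (time, 1, y, x): one identical copy per time period.
    cat_band = np.broadcast_to(categories, (n_time, 1) + categories.shape)
    return np.concatenate([stack, cat_band.astype(stack.dtype)], axis=1)


stack = np.zeros((3, 2, 2, 2))  # hypothetical: 3 dates, 2 bands, 2 x 2 pixels
clusters = np.array([[1, 1], [2, 3]])  # hypothetical rasterized 'ClusterN_2' values
out = append_categorical(stack, clusters)
print(out.shape)
```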
- property classes_
- le = LabelEncoder()
geowombat.ml.transformers module
Created on Mon Aug 10 13:41:40 2020. Adapted from sklearn-xarray/preprocessing. @author: mmann1123
- class geowombat.ml.transformers.BaseTransformer
Bases: BaseEstimator, TransformerMixin
Base class for transformers.
Methods
fit(X[, y]) – Fit estimator to data.
fit_transform(X[, y]) – Fit to data, then transform it.
get_params([deep]) – Get parameters for this estimator.
set_output(*[, transform]) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Transform input data.
- fit(X, y=None, **fit_params)
Fit estimator to data.
- Parameters:
- X : xarray DataArray or Dataset
Training set.
- y : xarray DataArray or Dataset
Target values.
- Returns:
- self
The estimator itself.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
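The fit/transform contract above follows scikit-learn's convention: fit learns state and returns self, so fit_transform(X) gives the same result as fit(X).transform(X). A dependency-free sketch of that pattern, using a hypothetical mean-centering transformer on NumPy arrays (geowombat's transformers operate on xarray objects and inherit fit_transform from scikit-learn's TransformerMixin rather than defining it like this):

```python
import numpy as np


class MeanCenterer:
    """Hypothetical transformer illustrating the fit / transform / fit_transform contract."""

    def fit(self, X, y=None, **fit_params):
        # Learn per-feature means from the training set.
        self.mean_ = X.mean(axis=0)
        # Returning self allows chaining, e.g. MeanCenterer().fit(X).transform(X).
        return self

    def transform(self, X):
        return X - self.mean_

    def fit_transform(self, X, y=None, **fit_params):
        # Same result as calling fit and then transform on the same data.
        return self.fit(X, y, **fit_params).transform(X)


X = np.array([[1.0, 10.0], [3.0, 30.0]])
Xt = MeanCenterer().fit_transform(X)
print(Xt)
```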
- class geowombat.ml.transformers.Featurizer_GW(sample_dim='sample', feature_dim='feature', var_name='Features', order=None, return_array=False, groupby=None, group_dim='sample')
Bases: BaseTransformer
Stack all dimensions and variables except for the sample dimension.
- Parameters:
- sample_dim : str, list, tuple
Name of the dimension used to define how the data is sampled. For instance, an individual’s activity recorded over time would be sampled based on the dimension time.
If your sample dim has multiple dimensions, for instance x, y, time, these can be passed as a list or tuple. Before stacking, a new multiindex z will be created for these dimensions.
- feature_dim : str
Name of the feature dimension created to store the stacked data.
- var_name : str
Name of the new variable (for Datasets).
- order : list or tuple
Order of dimension stacking.
- return_array : bool
Whether to return a DataArray when a Dataset was passed.
- groupby : str or list, optional
Name of coordinate or list of coordinates by which the groups are determined.
- group_dim : str, optional
Name of dimension along which the groups are indexed.
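For a single-date image, stacking all dimensions except the sample dimension amounts to moving the sample axes to the front and flattening the remaining axes into one feature axis. The NumPy sketch below assumes dimensions ('band', 'y', 'x') with ('y', 'x') as the sample dimensions; Featurizer_GW itself works on named xarray dimensions rather than axis positions, and `featurize` is a hypothetical helper.

```python
import numpy as np


def featurize(arr, dims, sample_dims):
    """Reshape arr to (n_samples, n_features), keeping sample_dims as rows.

    dims names every axis of arr; all axes not in sample_dims become features.
    """
    sample_axes = [dims.index(d) for d in sample_dims]
    feature_axes = [i for i in range(arr.ndim) if i not in sample_axes]
    moved = np.transpose(arr, sample_axes + feature_axes)
    n_samples = int(np.prod([arr.shape[i] for i in sample_axes]))
    return moved.reshape(n_samples, -1)


arr = np.arange(12).reshape(3, 2, 2)  # hypothetical ('band', 'y', 'x') array
features = featurize(arr, dims=("band", "y", "x"), sample_dims=("y", "x"))
print(features.shape)  # one row per pixel, one column per band
print(features[0])  # the three band values of the top-left pixel
```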
Methods
fit(X[, y]) – Fit estimator to data.
fit_transform(X[, y]) – Fit to data, then transform it.
get_params([deep]) – Get parameters for this estimator.
set_output(*[, transform]) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Transform input data.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
- class geowombat.ml.transformers.Stackerizer(stack_dims=None, direction='stack', sample_dim='sample', transposed=True, groupby=None)
Bases: BaseTransformer
Transformer to handle higher-dimensional data, for instance data sampled in time and location ('x', 'y', 'time'), that must be stacked before running Featurizer and unstacked after prediction.
- Parameters:
- stack_dims : list, tuple
List (tuple) of the dimensions used to define how the data is sampled.
If your sample dim has multiple dimensions, for instance x, y, time, these can be passed as a list or tuple. Before stacking, a new multiindex 'sample' will be created for these dimensions.
- direction : str, optional
"stack" or "unstack" defines the direction of transformation. Default is "stack".
- sample_dim : str
Name of the multiindex used to stack sample dims. Defaults to "sample".
- transposed : bool
Whether the output is transposed after stacking. Default is True.
- Returns:
- Xt : xarray DataArray or Dataset
The transformed data.
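Stacking several dimensions into one sample index before featurizing, and unstacking after prediction, is a reversible reshape. A NumPy sketch, assuming an array ordered ('time', 'y', 'x', 'band') with ('time', 'y', 'x') as the stacked dimensions and a hypothetical per-sample "prediction"; Stackerizer does this with a named xarray multiindex, which this sketch does not use.

```python
import numpy as np

# Hypothetical data ordered ('time', 'y', 'x', 'band').
data = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
n_time, ny, nx, n_band = data.shape

# "stack": merge time, y and x into a single sample axis -> (sample, band).
stacked = data.reshape(n_time * ny * nx, n_band)

# Hypothetical per-sample prediction (here, just the band sum).
pred = stacked.sum(axis=1)

# "unstack": restore the original sample dimensions -> ('time', 'y', 'x').
unstacked = pred.reshape(n_time, ny, nx)
print(stacked.shape, unstacked.shape)
```

Because both reshapes use the same (C) ordering, every prediction lands back on the time step and pixel it came from.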
Methods
fit(X[, y]) – Fit estimator to data.
fit_transform(X[, y]) – Fit to data, then transform it.
get_params([deep]) – Get parameters for this estimator.
set_output(*[, transform]) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Transform input data.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
- geowombat.ml.transformers.featurize_gw(X, return_estimator=False, **fit_params)
Stacks all dimensions and variables except for the sample dimension.
- Parameters:
- X : xarray DataArray or Dataset
The input data.
- return_estimator : bool
Whether to return the fitted estimator along with the transformed data.
- Returns:
- Xt : xarray DataArray or Dataset
The transformed data.
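The return_estimator flag suggests a functional shortcut around a transformer class: build the estimator, fit_transform the data, and optionally hand back the fitted estimator too. The sketch below illustrates that calling pattern with a hypothetical Flatten transformer and a hypothetical apply_transformer helper; it is not geowombat's code. Forwarding the keyword arguments to the transformer's constructor, and returning (Xt, estimator) in that order, are assumptions of this sketch.

```python
import numpy as np


class Flatten:
    """Hypothetical transformer: flattens all axes after the first into features."""

    def fit(self, X, y=None):
        self.n_features_ = int(np.prod(X.shape[1:]))
        return self

    def transform(self, X):
        return X.reshape(X.shape[0], self.n_features_)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


def apply_transformer(X, return_estimator=False, **fit_params):
    """Fit a fresh transformer and return Xt, or (Xt, estimator) if requested."""
    # Assumption: keyword arguments configure the transformer (Flatten takes none).
    estimator = Flatten(**fit_params)
    Xt = estimator.fit_transform(X)
    if return_estimator:
        return Xt, estimator
    return Xt


X = np.zeros((4, 2, 3))
Xt = apply_transformer(X)
Xt2, est = apply_transformer(X, return_estimator=True)
print(Xt.shape, est.n_features_)
```

Returning the fitted estimator matters when the same transformation (for example, the inverse reshape) has to be applied again after prediction.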
- geowombat.ml.transformers.is_dataarray(X, require_attrs=None)
Check whether an object is a DataArray.
- Parameters:
- X : anything
The object to be checked.
- require_attrs : list of str, optional
The attributes the object has to have in order to pass as a DataArray.
- Returns:
- bool
Whether the object is a DataArray or not.
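The require_attrs parameter points to a duck-typing check: an object passes if it has every required attribute, whatever its actual class. A sketch assuming that approach; the default attribute lists below are assumptions for illustration (this reference does not list the defaults), and `looks_like` is a hypothetical name.

```python
from types import SimpleNamespace


def looks_like(obj, require_attrs):
    """Return True if obj has every attribute named in require_attrs (duck typing)."""
    return all(hasattr(obj, name) for name in require_attrs)


# Hypothetical default attribute lists, assumed for illustration only.
DATAARRAY_ATTRS = ["values", "coords", "dims", "to_dataset"]
DATASET_ATTRS = ["data_vars", "coords", "dims", "to_array"]

fake_array = SimpleNamespace(values=[1, 2], coords={}, dims=("x",), to_dataset=None)
print(looks_like(fake_array, DATAARRAY_ATTRS))  # has all four attributes
print(looks_like(fake_array, DATASET_ATTRS))  # lacks data_vars and to_array
```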
- geowombat.ml.transformers.is_dataset(X, require_attrs=None)
Check whether an object is a Dataset.
- Parameters:
- X : anything
The object to be checked.
- require_attrs : list of str, optional
The attributes the object has to have in order to pass as a Dataset.
- Returns:
- bool
Whether the object is a Dataset or not.
- geowombat.ml.transformers.stackerizer(X, return_estimator=False, **fit_params)
Stacks all dimensions and variables except for the sample dimension.
- Parameters:
- X : xarray DataArray or Dataset
The input data.
- return_estimator : bool
Whether to return the fitted estimator along with the transformed data.
- Returns:
- Xt : xarray DataArray or Dataset
The transformed data.
Module contents
- geowombat.ml.fit(data, clf, labels=None, col=None, targ_name='targ', targ_dim_name='sample')
Fits a classifier given class labels.
- Parameters:
data (DataArray) – The data to fit the classifier on.
clf (object) – The classifier or classification pipeline.
labels (Optional[str | Path | GeoDataFrame]) – Class labels as polygon geometry.
col (Optional[str]) – The column in labels you want to assign values from. If None, creates a binary raster.
targ_name (Optional[str]) – The target name.
targ_dim_name (Optional[str]) – The target coordinate name.
- Returns:
X (xarray.DataArray) – The original DataArray, augmented to accept the prediction dimension.
Xna (tuple(xarray.DataArray, sklearn_xarray.Target)) – For an unsupervised classifier, X is the reshaped feature data with NAs not removed and y is None. For a supervised classifier, X is the reshaped feature data with NAs removed and y is an array holding the target data.
clf (sklearn pipeline) – The fitted pipeline object.
- Return type:
tuple
Example
>>> import geowombat as gw
>>> from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
>>> from geowombat.ml import fit
>>>
>>> import geopandas as gpd
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import StandardScaler, LabelEncoder
>>> from sklearn.decomposition import PCA
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.cluster import KMeans
>>>
>>> le = LabelEncoder()
>>>
>>> labels = gpd.read_file(l8_224078_20200518_polygons)
>>> labels['lc'] = le.fit(labels.name).transform(labels.name)
>>>
>>> # Use a supervised classification pipeline
>>> pl = Pipeline([('scaler', StandardScaler()),
...                ('pca', PCA()),
...                ('clf', GaussianNB())])
>>>
>>> with gw.open(l8_224078_20200518) as src:
...     X, Xy, clf = fit(src, pl, labels, col='lc')
>>>
>>> # Fit an unsupervised classifier
>>> cl = Pipeline([('pca', PCA()),
...                ('cst', KMeans())])
>>> with gw.open(l8_224078_20200518) as src:
...     X, Xy, clf = fit(src, cl)
- geowombat.ml.fit_predict(data, clf, labels=None, col=None, targ_name='targ', targ_dim_name='sample', mask_nodataval=True)
Fits a classifier given class labels and predicts on a DataArray.
- Parameters:
data (DataArray) – The data to predict on.
clf (object) – The classifier or classification pipeline.
labels (Optional[str | Path | GeoDataFrame]) – Class labels as polygon geometry.
col (Optional[str]) – The column in labels you want to assign values from. If None, creates a binary raster.
targ_name (Optional[str]) – The target name.
targ_dim_name (Optional[str]) – The target coordinate name.
mask_nodataval (Optional[bool]) – If True, values equal to data.attrs["nodatavals"][0] are replaced with np.nan and the array is returned as type float.
- Returns:
Predictions shaped ('time' x 'band' x 'y' x 'x').
- Return type:
xarray.DataArray
Example
>>> import geowombat as gw
>>> from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
>>> from geowombat.ml import fit_predict
>>>
>>> import geopandas as gpd
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import StandardScaler, LabelEncoder
>>> from sklearn.decomposition import PCA
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.cluster import KMeans
>>>
>>> le = LabelEncoder()
>>>
>>> labels = gpd.read_file(l8_224078_20200518_polygons)
>>> labels['lc'] = le.fit(labels.name).transform(labels.name)
>>>
>>> # Use a supervised classification pipeline
>>> pl = Pipeline([('scaler', StandardScaler()),
...                ('pca', PCA()),
...                ('clf', GaussianNB())])
>>>
>>> with gw.open(l8_224078_20200518, nodata=0) as src:
...     y = fit_predict(src, pl, labels, col='lc')
...     y.isel(time=0).sel(band='targ').gw.imshow()
>>>
>>> with gw.open([l8_224078_20200518, l8_224078_20200518], nodata=0) as src:
...     y = fit_predict(src, pl, labels, col='lc')
...     y.isel(time=1).sel(band='targ').gw.imshow()
>>>
>>> # Use an unsupervised classification pipeline
>>> cl = Pipeline([('pca', PCA()),
...                ('cst', KMeans())])
>>> with gw.open(l8_224078_20200518, nodata=0) as src:
...     y2 = fit_predict(src, cl)
- geowombat.ml.predict(data, X, clf, targ_name='targ', targ_dim_name='sample', mask_nodataval=True)
Predicts on a DataArray using a classifier fitted with geowombat.ml.fit.
- Parameters:
data (DataArray) – The data to predict on.
X (str | Path | DataArray) – Data array generated by geowombat.ml.fit.
clf (object) – The fitted classifier or classification pipeline (for example, the clf returned by geowombat.ml.fit).
targ_name (Optional[str]) – The target name.
targ_dim_name (Optional[str]) – The target coordinate name.
mask_nodataval (Optional[bool]) – If True, values equal to data.attrs["nodatavals"][0] are replaced with np.nan and the array is returned as type float.
- Returns:
Predictions shaped ('time' x 'band' x 'y' x 'x').
- Return type:
xarray.DataArray
Example
>>> import geowombat as gw
>>> from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
>>> from geowombat.ml import fit, predict
>>> import geopandas as gpd
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import LabelEncoder, StandardScaler
>>> from sklearn.decomposition import PCA
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.cluster import KMeans
>>>
>>> le = LabelEncoder()
>>> labels = gpd.read_file(l8_224078_20200518_polygons)
>>> labels["lc"] = le.fit(labels.name).transform(labels.name)
>>>
>>> # Use a data pipeline
>>> pl = Pipeline([('scaler', StandardScaler()),
...                ('pca', PCA()),
...                ('clf', GaussianNB())])
>>>
>>> # Fit and predict the classifier
>>> with gw.config.update(ref_res=100):
...     with gw.open(l8_224078_20200518, nodata=0) as src:
...         X, Xy, clf = fit(src, pl, labels, col="lc")
...         y = predict(src, X, clf)
...         print(y)
>>>
>>> # Fit and predict an unsupervised classifier
>>> cl = Pipeline([('pca', PCA()),
...                ('cst', KMeans())])
>>> with gw.open(l8_224078_20200518) as src:
...     X, Xy, clf = fit(src, cl)
...     y1 = predict(src, X, clf)