SelectFromModel

class sklearn.feature_selection.SelectFromModel(estimator, *, threshold=None, prefit=False, norm_order=1, max_features=None, importance_getter='auto')

Meta-transformer for selecting features based on importance weights.

Added in version 0.17.

Read more in the User Guide.

Parameters:

estimator : object

The base estimator from which the transformer is built. This can be either a fitted estimator (if prefit is set to True) or a non-fitted one. The estimator should have a feature_importances_ or coef_ attribute after fitting. Otherwise, the importance_getter parameter should be used.

threshold : str or float, default=None

The threshold value to use for feature selection. Features whose absolute importance value is greater than or equal to the threshold are kept, while the others are discarded. If "median" (resp. "mean"), then the threshold value is the median (resp. the mean) of the feature importances. A scaling factor (e.g., "1.25*mean") may also be used. If None and the estimator has a parameter penalty set to l1, either explicitly or implicitly (e.g., Lasso), the threshold used is 1e-5. Otherwise, "mean" is used by default.
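As a rough illustration of the threshold options described above, the following sketch fits a LinearRegression on synthetic data and records threshold_ for several settings. The dataset (make_regression), its sizes, and the seed are assumptions chosen for this sketch, not part of the original documentation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LinearRegression

# Synthetic data; sizes and seed are arbitrary choices for this sketch.
X, y = make_regression(n_samples=50, n_features=6, n_informative=3, random_state=0)

thresholds, masks, importances = {}, {}, {}
for spec in ["mean", "median", "1.25*mean", 0.5]:
    selector = SelectFromModel(LinearRegression(), threshold=spec).fit(X, y)
    # For a linear model, the absolute coefficients act as importances.
    importances[spec] = np.abs(selector.estimator_.coef_)
    thresholds[spec] = selector.threshold_
    masks[spec] = selector.get_support()

print(thresholds)
```

Each mask should keep exactly the features whose absolute coefficient is at least the corresponding threshold.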

prefit : bool, default=False

Whether a prefit model is expected to be passed into the constructor directly or not. If True, estimator must be a fitted estimator. If False, estimator is fitted and updated by calling fit and partial_fit, respectively.

norm_order : non-zero int, inf, -inf, default=1

Order of the norm used to filter the vectors of coefficients below threshold in the case where the coef_ attribute of the estimator is of dimension 2.
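The role of norm_order can be sketched with a multiclass classifier, whose coef_ has one row per class. The example below assumes the iris dataset and LogisticRegression purely for illustration; it reduces the 2D coefficient matrix to one score per feature with np.linalg.norm along axis 0 and compares the result with the selector's mask.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# threshold is set explicitly so the sketch does not depend on the default rule.
selector = SelectFromModel(
    LogisticRegression(max_iter=1000), threshold="mean", norm_order=2
).fit(X, y)

coef = selector.estimator_.coef_  # one row per class, one column per feature
# Per-feature score: the norm of each column of coef_, using ord=norm_order.
scores = np.linalg.norm(coef, axis=0, ord=2)
mask = selector.get_support()
print(scores, selector.threshold_, mask)
```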

max_features : int, callable, default=None

The maximum number of features to select.

If an integer, then it specifies the maximum number of features to allow.
If a callable, then it specifies how to calculate the maximum number of features allowed by using the output of max_features(X).
If None, then all features are kept.

To only select based on max_features, set threshold=-np.inf.

Added in version 0.20.

Changed in version 1.1: max_features accepts a callable.

importance_getter : str or callable, default='auto'

If 'auto', uses the feature importance either through a coef_ attribute or a feature_importances_ attribute of the estimator.

Also accepts a string that specifies an attribute name/path for extracting feature importance (implemented with attrgetter). For example, give regressor_.coef_ in case of TransformedTargetRegressor or named_steps.clf.feature_importances_ in case of Pipeline with its last step named clf.

If callable, overrides the default feature importance getter. The callable is passed the fitted estimator and should return the importance of each feature.

Added in version 0.24.
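The two non-default forms of importance_getter can be sketched with a Pipeline as the base estimator. The pipeline layout, the step names scale and clf, and the helper names make_pipe and clf_coef are hypothetical choices for this sketch; the attribute path and the callable are expected to yield the same selection.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=100, n_features=5, n_informative=2, n_redundant=0, random_state=0
)

def make_pipe():
    # Hypothetical pipeline whose last step is named "clf".
    return Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# String form: an attribute path resolved on the fitted pipeline.
by_path = SelectFromModel(make_pipe(), importance_getter="named_steps.clf.coef_").fit(X, y)

def clf_coef(fitted_pipeline):
    # Callable form: receives the fitted estimator, returns the importances.
    return fitted_pipeline.named_steps["clf"].coef_

by_callable = SelectFromModel(make_pipe(), importance_getter=clf_coef).fit(X, y)
print(by_path.get_support(), by_callable.get_support())
```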

Attributes:

estimator_ : estimator

The base estimator from which the transformer is built. This attribute exists only when fit has been called.

If prefit=True, it is a deep copy of estimator.
If prefit=False, it is a clone of estimator, fitted on the data passed to fit or partial_fit.

n_features_in_ : int

Number of features seen during fit.

max_features_ : int

Maximum number of features calculated during fit. Only defined if max_features is not None.

If max_features is an int, then max_features_ = max_features.
If max_features is a callable, then max_features_ = max_features(X).

Added in version 1.1.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

Added in version 1.0.

threshold_ : float

Threshold value used for feature selection.

See also

RFE: Recursive feature elimination based on importance weights.
RFECV: Recursive feature elimination with built-in cross-validated selection of the best number of features.
SequentialFeatureSelector: Sequential cross-validation based feature selection. Does not rely on importance weights.

Notes

Allows NaN/Inf in the input if the underlying estimator does as well.

Examples

>>> from sklearn.feature_selection import SelectFromModel
>>> from sklearn.linear_model import LogisticRegression
>>> X = [[ 0.87, -1.34,  0.31 ],
...      [-2.79, -0.02, -0.85 ],
...      [-1.34, -0.48, -2.55 ],
...      [ 1.92,  1.48,  0.65 ]]
>>> y = [0, 1, 0, 1]
>>> selector = SelectFromModel(estimator=LogisticRegression()).fit(X, y)
>>> selector.estimator_.coef_
array([[-0.3252...,  0.8345...,  0.4976...]])
>>> selector.threshold_
0.55249...
>>> selector.get_support()
array([False,  True, False])
>>> selector.transform(X)
array([[-1.34],
       [-0.02],
       [-0.48],
       [ 1.48]])

Using a callable to create a selector that can use no more than half of the input features.

>>> def half_callable(X):
...     return round(len(X[0]) / 2)
>>> half_selector = SelectFromModel(estimator=LogisticRegression(),
...                                 max_features=half_callable)
>>> _ = half_selector.fit(X, y)
>>> half_selector.max_features_
2

fit(X, y=None, **fit_params)

Fit the SelectFromModel meta-transformer.

Parameters:

X : array-like of shape (n_samples, n_features)

The training input samples.

y : array-like of shape (n_samples,), default=None

The target values (integers that correspond to classes in classification, real numbers in regression).

**fit_params : dict

If enable_metadata_routing=False (default):

Parameters directly passed to the fit method of the sub-estimator. They are ignored if prefit=True.

If enable_metadata_routing=True:

Parameters safely routed to the fit method of the sub-estimator. They are ignored if prefit=True.

Changed in version 1.4: See Metadata Routing User Guide for more details.
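With metadata routing left at its default (disabled), extra keyword arguments to fit are forwarded to the sub-estimator. The sketch below assumes LogisticRegression and a made-up sample_weight vector, and checks that the selector's fitted estimator matches one fitted directly with the same weights.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = make_classification(
    n_samples=100, n_features=5, n_informative=2, n_redundant=0, random_state=0
)
# Illustrative weights: samples of class 1 count three times as much.
weights = np.where(y == 1, 3.0, 1.0)

# sample_weight is passed through to LogisticRegression.fit.
selector = SelectFromModel(LogisticRegression()).fit(X, y, sample_weight=weights)

# Reference fit with the same weights, outside the selector.
reference = LogisticRegression().fit(X, y, sample_weight=weights)
print(selector.estimator_.coef_, reference.coef_)
```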

Returns:

self : object

Fitted estimator.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns:

X_new : ndarray of shape (n_samples, n_features_new)

Transformed array.

get_feature_names_out(input_features=None)

Mask feature names according to selected features.

Parameters:

input_features : array-like of str or None, default=None

Input features.

If input_features is None, then feature_names_in_ is used as the input feature names. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

Returns:

feature_names_out : ndarray of str objects

Transformed feature names.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide for how the routing mechanism works.

Added in version 1.4.

Returns:

routing : MetadataRouter

A MetadataRouter encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict

Parameter names mapped to their values.

get_support(indices=False)

Get a mask, or integer index, of the features selected.

Parameters:

indices : bool, default=False

If True, the return value will be an array of integers, rather than a boolean mask.

Returns:

support : array

An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
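The two return forms carry the same information, which the following sketch checks on the small dataset from the Examples section (the relation to np.flatnonzero is an illustration added here, not part of the original text).

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X = [[0.87, -1.34, 0.31],
     [-2.79, -0.02, -0.85],
     [-1.34, -0.48, -2.55],
     [1.92, 1.48, 0.65]]
y = [0, 1, 0, 1]

selector = SelectFromModel(estimator=LogisticRegression()).fit(X, y)

mask = selector.get_support()                # boolean, one entry per input feature
indices = selector.get_support(indices=True) # integer positions of kept features
print(mask, indices)
```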

inverse_transform(X)

Reverse the transformation operation.

Parameters:

X : array of shape [n_samples, n_selected_features]

The input samples.

Returns:

X_r : array of shape [n_samples, n_original_features]

X with columns of zeros inserted where features would have been removed by transform.
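A minimal round trip, assuming synthetic regression data and a LinearRegression estimator for illustration: the selected columns come back unchanged, and the dropped columns are filled with zeros.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=20, n_features=5, n_informative=2, random_state=0)
selector = SelectFromModel(LinearRegression(), threshold=-np.inf, max_features=2).fit(X, y)

X_reduced = selector.transform(X)               # shape (20, 2)
X_back = selector.inverse_transform(X_reduced)  # shape (20, 5), zeros where dropped
mask = selector.get_support()
print(X_reduced.shape, X_back.shape)
```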

property n_features_in_

Number of features seen during fit.

partial_fit(X, y=None, **partial_fit_params)

Fit the SelectFromModel meta-transformer only once.

Parameters:

X : array-like of shape (n_samples, n_features)

The training input samples.

y : array-like of shape (n_samples,), default=None

The target values (integers that correspond to classes in classification, real numbers in regression).

**partial_fit_params : dict

If enable_metadata_routing=False (default):

Parameters directly passed to the partial_fit method of the sub-estimator.

If enable_metadata_routing=True:

Parameters passed to the partial_fit method of the sub-estimator. They are ignored if prefit=True.

Changed in version 1.4: **partial_fit_params are routed to the sub-estimator, if enable_metadata_routing=True is set via set_config, which allows for aliasing.

See Metadata Routing User Guide for more details.

Returns:

self : object

Fitted estimator.
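Incremental fitting can be sketched with an estimator that implements partial_fit. SGDClassifier, the two-batch split, and the synthetic data are assumptions made for this sketch; with metadata routing disabled (the default), the classes argument is forwarded to the sub-estimator's partial_fit.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import SGDClassifier

X, y = make_classification(
    n_samples=200, n_features=5, n_informative=2, n_redundant=0, random_state=0
)
classes = np.unique(y)

selector = SelectFromModel(SGDClassifier(random_state=0))

# First batch: the sub-estimator is cloned once and needs the full class list.
selector.partial_fit(X[:100], y[:100], classes=classes)
first_estimator = selector.estimator_

# Second batch: the same sub-estimator keeps learning.
selector.partial_fit(X[100:], y[100:])
print(selector.get_support())
```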

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform : {"default", "pandas", "polars"}, default=None

Configure output of transform and fit_transform.

"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged

Added in version 1.4: "polars" option was added.

Returns:

self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params : dict

Estimator parameters.

Returns:

self : estimator instance

Estimator instance.
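For SelectFromModel, the nested component is the estimator parameter, so its settings are reached with the estimator__ prefix. A small sketch, assuming LogisticRegression as the wrapped estimator:

```python
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

selector = SelectFromModel(LogisticRegression(C=1.0))

# Update a parameter of the wrapped estimator and one of the selector itself.
selector.set_params(estimator__C=0.1, threshold="median")

params = selector.get_params()
print(params["estimator__C"], params["threshold"])
```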

property threshold_

Threshold value used for feature selection.

transform(X)

Reduce X to the selected features.

Parameters:

X : array of shape [n_samples, n_features]

The input samples.

Returns:

X_r : array of shape [n_samples, n_selected_features]

The input samples with only the selected features.

Gallery examples

Model-based and sequential feature selection