MissingIndicator
- class sklearn.impute.MissingIndicator(*, missing_values=nan, features='missing-only', sparse='auto', error_on_new=True)
- Binary indicators for missing values.
- Note that this component typically should not be used in a vanilla Pipeline consisting of transformers and a classifier, but rather could be added using a FeatureUnion or ColumnTransformer.
- Read more in the User Guide.
- Added in version 0.20.
- Parameters:
- missing_values : int, float, str, np.nan or None, default=np.nan
- The placeholder for the missing values. All occurrences of missing_values will be marked as missing in the indicator mask. For pandas' dataframes with nullable integer dtypes with missing values, missing_values should be set to np.nan, since pd.NA will be converted to np.nan.
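As a minimal sketch of a non-default placeholder, the snippet below assumes a hypothetical integer dataset in which -1 encodes a missing entry. The data and variable names are illustrative only.

```python
import numpy as np
from sklearn.impute import MissingIndicator

# Hypothetical data: -1 is assumed to mark a missing entry.
X = np.array([[-1, 2],
              [3, 4],
              [5, -1]])

# Tell the indicator which value counts as "missing".
indicator = MissingIndicator(missing_values=-1)
mask = indicator.fit_transform(X)
print(mask)
```

Each `True` in `mask` marks a position where `X` holds the placeholder value.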
- features : {'missing-only', 'all'}, default='missing-only'
- Whether the imputer mask should represent all or a subset of features.
- If 'missing-only' (default), the imputer mask will only represent features containing missing values during fit time.
- If 'all', the imputer mask will represent all features.
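The difference between the two `features` settings can be sketched by comparing output shapes on the same input. The array below is illustrative; only its middle column is fully observed.

```python
import numpy as np
from sklearn.impute import MissingIndicator

# Columns 0 and 2 contain NaN; column 1 is fully observed.
X = np.array([[np.nan, 1.0, 3.0],
              [4.0, 0.0, np.nan],
              [8.0, 1.0, 0.0]])

# 'missing-only' keeps one indicator column per feature that had NaN at fit time.
only_missing = MissingIndicator(features="missing-only").fit_transform(X)

# 'all' keeps one indicator column per input feature.
all_features = MissingIndicator(features="all").fit_transform(X)

print(only_missing.shape)  # (3, 2)
print(all_features.shape)  # (3, 3)
```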
 
- sparse : bool or 'auto', default='auto'
- Whether the imputer mask format should be sparse or dense.
- If 'auto' (default), the imputer mask will be of same type as input.
- If True, the imputer mask will be a sparse matrix.
- If False, the imputer mask will be a numpy array.
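A short sketch of the three `sparse` settings follows. It assumes SciPy is available and uses a small illustrative array; the variable names are not part of the API.

```python
import numpy as np
from scipy import sparse
from sklearn.impute import MissingIndicator

X = np.array([[np.nan, 1.0],
              [2.0, np.nan],
              [3.0, 4.0]])

# 'auto': dense input yields a dense mask.
auto_mask = MissingIndicator(sparse="auto").fit_transform(X)

# True: the mask is returned as a SciPy sparse matrix, even for dense input.
sparse_mask = MissingIndicator(sparse=True).fit_transform(X)

# False: the mask is returned as a NumPy array, even for sparse input.
dense_mask = MissingIndicator(sparse=False).fit_transform(sparse.csc_matrix(X))

print(type(auto_mask).__name__, sparse.issparse(sparse_mask), type(dense_mask).__name__)
```

All three masks carry the same boolean content; only the container differs.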
 
- error_on_new : bool, default=True
- If True, transform will raise an error when there are features with missing values that have no missing values in fit. This is applicable only when features='missing-only'.
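The behaviour of `error_on_new` can be sketched with two illustrative arrays: the data used for `fit` has missing values only in column 0, while the data passed to `transform` has a missing value in column 1.

```python
import numpy as np
from sklearn.impute import MissingIndicator

X_fit = np.array([[np.nan, 1.0],
                  [2.0, 3.0]])   # only column 0 has a missing value
X_new = np.array([[4.0, np.nan],
                  [5.0, 6.0]])   # column 1 now has a missing value

# Default: a feature that is newly missing at transform time is an error.
strict = MissingIndicator(error_on_new=True).fit(X_fit)
try:
    strict.transform(X_new)
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# With error_on_new=False the new missing feature is ignored; only the
# features selected during fit appear in the output.
lenient = MissingIndicator(error_on_new=False).fit(X_fit)
mask = lenient.transform(X_new)
print(mask.shape)  # (2, 1)
```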
 
- Attributes:
- features_ : ndarray of shape (n_missing_features,) or (n_features,)
- The features indices which will be returned when calling transform. They are computed during fit. If features='all', features_ is equal to range(n_features).
- n_features_in_ : int
- Number of features seen during fit.
- Added in version 0.24.
- feature_names_in_ : ndarray of shape (n_features_in_,)
- Names of features seen during fit. Defined only when X has feature names that are all strings.
- Added in version 1.0.
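The fitted attributes can be inspected as in the sketch below, which uses an illustrative array where columns 0 and 2 contain missing values.

```python
import numpy as np
from sklearn.impute import MissingIndicator

X = np.array([[np.nan, 1.0, 3.0],
              [4.0, 0.0, np.nan],
              [8.0, 1.0, 0.0]])

indicator = MissingIndicator().fit(X)
print(indicator.features_)       # indices of the features with missing values
print(indicator.n_features_in_)  # number of input features seen during fit

# With features='all', every input feature index is kept.
all_indicator = MissingIndicator(features="all").fit(X)
print(all_indicator.features_)
```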
 
- See also
- SimpleImputer
- Univariate imputation of missing values. 
- IterativeImputer
- Multivariate imputation of missing values. 
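The class note suggests combining the indicator with other transformers through a FeatureUnion rather than chaining it in a plain Pipeline. One possible arrangement, sketched here with `make_union` and a mean-based `SimpleImputer` (both choices are assumptions made for illustration), appends the indicator columns to the imputed data.

```python
import numpy as np
from sklearn.impute import MissingIndicator, SimpleImputer
from sklearn.pipeline import make_union

X = np.array([[np.nan, 1.0, 3.0],
              [4.0, 0.0, np.nan],
              [8.0, 1.0, 0.0]])

# Each transformer sees the original X; outputs are stacked column-wise.
union = make_union(SimpleImputer(strategy="mean"), MissingIndicator())
Xt = union.fit_transform(X)

# 3 imputed columns followed by 2 indicator columns.
print(Xt.shape)  # (3, 5)
```

A downstream estimator then receives both the filled-in values and the information about which entries were originally missing.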
- Examples
    >>> import numpy as np
    >>> from sklearn.impute import MissingIndicator
    >>> X1 = np.array([[np.nan, 1, 3],
    ...                [4, 0, np.nan],
    ...                [8, 1, 0]])
    >>> X2 = np.array([[5, 1, np.nan],
    ...                [np.nan, 2, 3],
    ...                [2, 4, 0]])
    >>> indicator = MissingIndicator()
    >>> indicator.fit(X1)
    MissingIndicator()
    >>> X2_tr = indicator.transform(X2)
    >>> X2_tr
    array([[False,  True],
           [ True, False],
           [False, False]])
- fit(X, y=None)
- Fit the transformer on X.
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
- Input data, where n_samples is the number of samples and n_features is the number of features.
- y : Ignored
- Not used, present for API consistency by convention. 
 
- Returns:
- self : object
- Fitted estimator. 
 
 
- fit_transform(X, y=None)
- Generate missing values indicator for X.
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
- The input data to complete.
- y : Ignored
- Not used, present for API consistency by convention. 
 
- Returns:
- Xt : {ndarray, sparse matrix} of shape (n_samples, n_features) or (n_samples, n_features_with_missing)
- The missing indicator for input data. The data type of Xt will be boolean.
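As a quick check of the return value described above, the sketch below compares `fit_transform` with a separate `fit` followed by `transform` on the same illustrative array.

```python
import numpy as np
from sklearn.impute import MissingIndicator

X = np.array([[np.nan, 1.0, 3.0],
              [4.0, 0.0, np.nan],
              [8.0, 1.0, 0.0]])

one_step = MissingIndicator().fit_transform(X)
two_step = MissingIndicator().fit(X).transform(X)

# Both routes give the same boolean mask on the data used for fitting.
same = np.array_equal(one_step, two_step)
print(same, one_step.dtype, one_step.shape)
```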
 
 
- get_feature_names_out(input_features=None)
- Get output feature names for transformation.
- Parameters:
- input_features : array-like of str or None, default=None
- Input features.
- If input_features is None, then feature_names_in_ is used as the input feature names. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
- If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.
 
 
- Returns:
- feature_names_out : ndarray of str objects
- Transformed feature names. 
 
 
- get_metadata_routing()
- Get metadata routing of this object.
- Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
- A MetadataRequest encapsulating routing information.
 
 
- get_params(deep=True)
- Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
- If True, will return the parameters for this estimator and contained subobjects that are estimators. 
 
- Returns:
- params : dict
- Parameter names mapped to their values. 
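For illustration, the returned dictionary can be inspected as follows; the keys correspond to the constructor arguments in the class signature.

```python
from sklearn.impute import MissingIndicator

params = MissingIndicator(features="all").get_params()

# One entry per constructor argument.
print(sorted(params))
print(params["features"])
```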
 
 
- set_output(*, transform=None)
- Set output container.
- See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {"default", "pandas", "polars"}, default=None
- Configure output of transform and fit_transform.
- "default": Default output format of a transformer
- "pandas": DataFrame output
- "polars": Polars output
- None: Transform configuration is unchanged
- Added in version 1.4: "polars" option was added.
 
- Returns:
- self : estimator instance
- Estimator instance. 
 
 
- set_params(**params)
- Set the parameters of this estimator.
- The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
- Estimator parameters. 
 
- Returns:
- self : estimator instance
- Estimator instance. 
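Both forms of parameter update can be sketched as follows. The nested object here is a FeatureUnion built with `make_union`, chosen for illustration; the component names `simpleimputer` and `missingindicator` are the lowercase class names that `make_union` assigns automatically.

```python
from sklearn.impute import MissingIndicator, SimpleImputer
from sklearn.pipeline import make_union

# Simple estimator: parameters are set by name, and the estimator is returned.
indicator = MissingIndicator()
same_object = indicator.set_params(error_on_new=False)

# Nested object: parameters use the <component>__<parameter> form.
union = make_union(SimpleImputer(), MissingIndicator())
union.set_params(missingindicator__features="all",
                 simpleimputer__strategy="median")

print(indicator.error_on_new)
print(union.get_params()["missingindicator__features"])
```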
 
 
- transform(X)
- Generate missing values indicator for X.
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
- The input data to complete. 
 
- Returns:
- Xt : {ndarray, sparse matrix} of shape (n_samples, n_features) or (n_samples, n_features_with_missing)
- The missing indicator for input data. The data type of Xt will be boolean.
 
 
 
