CategoricalNB#
- class sklearn.naive_bayes.CategoricalNB(*, alpha=1.0, force_alpha=True, fit_prior=True, class_prior=None, min_categories=None)[source]#
- Naive Bayes classifier for categorical features.
- The categorical Naive Bayes classifier is suitable for classification with discrete features that are categorically distributed. The categories of each feature are drawn from a categorical distribution.
- Read more in the User Guide.
- Parameters:
- alpha : float, default=1.0
- Additive (Laplace/Lidstone) smoothing parameter (set alpha=0 and force_alpha=True for no smoothing).
- force_alpha : bool, default=True
- If False and alpha is less than 1e-10, it will set alpha to 1e-10. If True, alpha will remain unchanged. This may cause numerical errors if alpha is too close to 0.
- Added in version 1.2.
- Changed in version 1.4: The default value of force_alpha changed to True.
- fit_prior : bool, default=True
- Whether to learn class prior probabilities or not. If False, a uniform prior will be used.
- class_prior : array-like of shape (n_classes,), default=None
- Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.
- min_categories : int or array-like of shape (n_features,), default=None
- Minimum number of categories per feature (see the sketch after this parameter list):
- integer: Sets the minimum number of categories per feature to n_categories for each feature.
- array-like: shape (n_features,) where n_categories[i] holds the minimum number of categories for the ith column of the input.
- None (default): Determines the number of categories automatically from the training data.
- Added in version 0.24.
 
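- For illustration, a minimal hedged sketch of min_categories (the toy data and category count below are made up for the example): the training batch only contains the category values 0 and 1, but we declare that each feature can take 3 categories in total, so smoothing also reserves probability mass for the unseen category.

>>> import numpy as np
>>> from sklearn.naive_bayes import CategoricalNB
>>> X = np.array([[0, 1], [1, 0], [0, 0]])   # only categories 0 and 1 appear
>>> y = np.array([0, 1, 0])
>>> clf = CategoricalNB(min_categories=3).fit(X, y)
>>> clf.n_categories_                        # expected to report 3 per feature
array([3, 3])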
- Attributes:
- category_count_ : list of arrays of shape (n_features,)
- Holds arrays of shape (n_classes, n_categories of respective feature) for each feature. Each array provides the number of samples encountered for each class and category of the specific feature.
- class_count_ : ndarray of shape (n_classes,)
- Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.
- class_log_prior_ : ndarray of shape (n_classes,)
- Smoothed empirical log probability for each class.
- classes_ : ndarray of shape (n_classes,)
- Class labels known to the classifier.
- feature_log_prob_ : list of arrays of shape (n_features,)
- Holds arrays of shape (n_classes, n_categories of respective feature) for each feature. Each array provides the empirical log probability of categories given the respective feature and class, P(x_i|y).
- n_features_in_ : int
- Number of features seen during fit.
- Added in version 0.24.
- feature_names_in_ : ndarray of shape (n_features_in_,)
- Names of features seen during fit. Defined only when X has feature names that are all strings.
- Added in version 1.0.
- n_categories_ : ndarray of shape (n_features,), dtype=np.int64
- Number of categories for each feature. This value is inferred from the data or set by the minimum number of categories.
- Added in version 0.24.
 
- See also
- BernoulliNB : Naive Bayes classifier for multivariate Bernoulli models.
- ComplementNB : Complement Naive Bayes classifier.
- GaussianNB : Gaussian Naive Bayes.
- MultinomialNB : Naive Bayes classifier for multinomial models.
- Examples

>>> import numpy as np
>>> rng = np.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100))
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import CategoricalNB
>>> clf = CategoricalNB()
>>> clf.fit(X, y)
CategoricalNB()
>>> print(clf.predict(X[2:3]))
[3]

- fit(X, y, sample_weight=None)[source]#
- Fit Naive Bayes classifier according to X, y.
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
- Training vectors, where n_samples is the number of samples and n_features is the number of features. Here, each feature of X is assumed to be from a different categorical distribution. It is further assumed that all categories of each feature are represented by the numbers 0, …, n - 1, where n refers to the total number of categories for the given feature. This can, for instance, be achieved with the help of OrdinalEncoder (see the encoding sketch after this method's Returns block).
- y : array-like of shape (n_samples,)
- Target values.
- sample_weight : array-like of shape (n_samples,), default=None
- Weights applied to individual samples (1. for unweighted).
 
- Returns:
- self : object
- Returns the instance itself. 
 
 
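- As mentioned for X above, string categories can first be mapped to the integers 0, …, n - 1 with OrdinalEncoder. A minimal sketch on illustrative toy data (the printed outputs are what one would expect for this data):

>>> import numpy as np
>>> from sklearn.preprocessing import OrdinalEncoder
>>> from sklearn.naive_bayes import CategoricalNB
>>> X_raw = np.array([["red", "small"], ["blue", "large"],
...                   ["red", "large"], ["blue", "small"]])
>>> y = np.array([0, 1, 1, 0])
>>> enc = OrdinalEncoder()
>>> X = enc.fit_transform(X_raw)       # each feature encoded as 0, ..., n - 1
>>> clf = CategoricalNB().fit(X, y)
>>> clf.predict(enc.transform([["red", "large"]]))
array([1])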
 - get_metadata_routing()[source]#
- Get metadata routing of this object.
- Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
- A MetadataRequest encapsulating routing information.
 
 
 - get_params(deep=True)[source]#
- Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
- If True, will return the parameters for this estimator and contained subobjects that are estimators. 
 
- Returns:
- params : dict
- Parameter names mapped to their values. 
 
 
 - partial_fit(X, y, classes=None, sample_weight=None)[source]#
- Incremental fit on a batch of samples.
- This method is expected to be called several times consecutively on different chunks of a dataset so as to implement out-of-core or online learning.
- This is especially useful when the whole dataset is too big to fit in memory at once.
- This method has some performance overhead, hence it is better to call partial_fit on chunks of data that are as large as possible (as long as they fit in the memory budget) to hide the overhead. A usage sketch follows this method's Returns block.
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
- Training vectors, where n_samples is the number of samples and n_features is the number of features. Here, each feature of X is assumed to be from a different categorical distribution. It is further assumed that all categories of each feature are represented by the numbers 0, …, n - 1, where n refers to the total number of categories for the given feature. This can, for instance, be achieved with the help of OrdinalEncoder.
- y : array-like of shape (n_samples,)
- Target values.
- classes : array-like of shape (n_classes,), default=None
- List of all the classes that can possibly appear in the y vector.
- Must be provided at the first call to partial_fit; it can be omitted in subsequent calls.
- sample_weight : array-like of shape (n_samples,), default=None
- Weights applied to individual samples (1. for unweighted).
 
- Returns:
- self : object
- Returns the instance itself. 
 
 
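- A minimal sketch of out-of-core use on two illustrative chunks of a toy dataset; classes is only passed on the first call:

>>> import numpy as np
>>> from sklearn.naive_bayes import CategoricalNB
>>> rng = np.random.RandomState(0)
>>> X = rng.randint(4, size=(8, 3))          # 8 samples, 3 categorical features
>>> y = np.array([0, 1, 0, 1, 0, 1, 0, 1])
>>> clf = CategoricalNB()
>>> clf.partial_fit(X[:4], y[:4], classes=np.array([0, 1]))  # first chunk
CategoricalNB()
>>> clf.partial_fit(X[4:], y[4:])                            # later chunks
CategoricalNB()
>>> clf.class_count_
array([4., 4.])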
 - predict(X)[source]#
- Perform classification on an array of test vectors X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
- The input samples. 
 
- Returns:
- C : ndarray of shape (n_samples,)
- Predicted target values for X. 
 
 
 - predict_joint_log_proba(X)[source]#
- Return joint log probability estimates for the test vector X.
- For each row x of X and class y, the joint log probability is given by log P(x, y) = log P(y) + log P(x|y), where log P(y) is the class prior probability and log P(x|y) is the class-conditional probability.
- Parameters:
- X : array-like of shape (n_samples, n_features)
- The input samples. 
 
- Returns:
- C : ndarray of shape (n_samples, n_classes)
- Returns the joint log-probability of the samples for each class in the model. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. 
 
 
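- To make the formula concrete, a minimal sketch on illustrative toy data, checking that normalizing the joint log-probabilities over the classes recovers the posterior returned by predict_proba:

>>> import numpy as np
>>> from scipy.special import logsumexp
>>> from sklearn.naive_bayes import CategoricalNB
>>> X = np.array([[0, 1], [1, 0], [2, 1], [1, 2]])
>>> y = np.array([0, 0, 1, 1])
>>> clf = CategoricalNB().fit(X, y)
>>> joint = clf.predict_joint_log_proba(X)   # log P(x, y) for each class
>>> posterior = np.exp(joint - logsumexp(joint, axis=1, keepdims=True))
>>> np.allclose(posterior, clf.predict_proba(X))
True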
 - predict_log_proba(X)[source]#
- Return log-probability estimates for the test vector X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
- The input samples. 
 
- Returns:
- C : array-like of shape (n_samples, n_classes)
- Returns the log-probability of the samples for each class in the model. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. 
 
 
 - predict_proba(X)[source]#
- Return probability estimates for the test vector X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
- The input samples. 
 
- Returns:
- C : array-like of shape (n_samples, n_classes)
- Returns the probability of the samples for each class in the model. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. 
 
 
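- A minimal sketch on illustrative toy data, showing that each row of the returned probabilities sums to 1 and that the most probable class agrees with predict:

>>> import numpy as np
>>> from sklearn.naive_bayes import CategoricalNB
>>> X = np.array([[0, 1], [1, 0], [2, 1], [1, 2]])
>>> y = np.array([0, 0, 1, 1])
>>> clf = CategoricalNB().fit(X, y)
>>> proba = clf.predict_proba(X)
>>> proba.shape
(4, 2)
>>> np.allclose(proba.sum(axis=1), 1.0)
True
>>> np.array_equal(clf.classes_[proba.argmax(axis=1)], clf.predict(X))
True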
 - score(X, y, sample_weight=None)[source]#
- Return the mean accuracy on the given test data and labels.
- In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that each label set be correctly predicted for every sample.
- Parameters:
- X : array-like of shape (n_samples, n_features)
- Test samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
- True labels for X.
- sample_weight : array-like of shape (n_samples,), default=None
- Sample weights.
- Sample weights. 
 
- Returns:
- score : float
- Mean accuracy of self.predict(X) w.r.t. y.
 
 
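- A minimal sketch on illustrative toy data; on this tiny training set the classifier separates the two classes, so the mean accuracy is 1.0:

>>> import numpy as np
>>> from sklearn.naive_bayes import CategoricalNB
>>> X = np.array([[0, 1], [1, 0], [2, 1], [1, 2]])
>>> y = np.array([0, 0, 1, 1])
>>> clf = CategoricalNB().fit(X, y)
>>> clf.score(X, y)                 # mean accuracy of clf.predict(X) w.r.t. y
1.0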
 - set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') CategoricalNB[source]#
- Request metadata passed to the fit method.
- Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see User Guide on how the routing mechanism works.
- The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
- The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
- Added in version 1.3.
- Note
- This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
- Metadata routing for sample_weight parameter in fit.
 
- Returns:
- self : object
- The updated object. 
 
 
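- A hedged sketch of how this might be used, assuming scikit-learn >= 1.4 with metadata routing enabled and illustrative toy data; the Pipeline forwards sample_weight to CategoricalNB.fit because the request is set to True:

>>> import numpy as np
>>> import sklearn
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.naive_bayes import CategoricalNB
>>> sklearn.set_config(enable_metadata_routing=True)
>>> X = np.array([[0, 1], [1, 0], [2, 1], [1, 2]])
>>> y = np.array([0, 0, 1, 1])
>>> sw = np.array([1.0, 1.0, 2.0, 2.0])
>>> clf = CategoricalNB().set_fit_request(sample_weight=True)
>>> pipe = Pipeline([("nb", clf)]).fit(X, y, sample_weight=sw)  # routed to fit
>>> sklearn.set_config(enable_metadata_routing=False)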
 - set_params(**params)[source]#
- Set the parameters of this estimator.
- The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
- **params : dict
- Estimator parameters. 
 
- Returns:
- self : estimator instance
- Estimator instance. 
 
 
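- A minimal sketch using this estimator's own hyperparameters; the repr shows the non-default parameters that were set:

>>> from sklearn.naive_bayes import CategoricalNB
>>> clf = CategoricalNB()
>>> clf.set_params(alpha=0.5, fit_prior=False)
CategoricalNB(alpha=0.5, fit_prior=False)
>>> clf.get_params()["alpha"]
0.5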
 - set_partial_fit_request(*, classes: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') CategoricalNB[source]#
- Request metadata passed to the partial_fit method.
- Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see User Guide on how the routing mechanism works.
- The options for each parameter are:
- True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
- The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
- Added in version 1.3.
- Note
- This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- classes : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
- Metadata routing for classes parameter in partial_fit.
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
- Metadata routing for sample_weight parameter in partial_fit.
 
- Returns:
- self : object
- The updated object. 
 
 
 - set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') CategoricalNB[source]#
- Request metadata passed to the score method.
- Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see User Guide on how the routing mechanism works.
- The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
- The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
- Added in version 1.3.
- Note
- This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
- Metadata routing for sample_weight parameter in score.
 
- Returns:
- self : object
- The updated object. 
 
 
 
