fetch_kddcup99
- sklearn.datasets.fetch_kddcup99(*, subset=None, data_home=None, shuffle=False, random_state=None, percent10=True, download_if_missing=True, return_X_y=False, as_frame=False, n_retries=3, delay=1.0)
- Load the kddcup99 dataset (classification).
- Download it if necessary.
- Classes: 23
- Samples total: 4898431
- Dimensionality: 41
- Features: discrete (int) or continuous (float)
- Read more in the User Guide.
- Added in version 0.18.
- Parameters:
- subset: {‘SA’, ‘SF’, ‘http’, ‘smtp’}, default=None
- Which classical subset of kddcup 99 to return. If None, return the entire kddcup 99 dataset.
- data_home: str or path-like, default=None
- Specify another download and cache folder for the datasets. By default all scikit-learn data is stored in ‘~/scikit_learn_data’ subfolders.
- Added in version 0.19.
- shuffle: bool, default=False
- Whether to shuffle the dataset.
- random_state: int, RandomState instance or None, default=None
- Determines random number generation for dataset shuffling and for selection of abnormal samples if subset='SA'. Pass an int for reproducible output across multiple function calls. See Glossary.
- percent10: bool, default=True
- Whether to load only 10 percent of the data. 
- download_if_missing: bool, default=True
- If False, raise an OSError if the data is not locally available instead of trying to download the data from the source site. 
- return_X_y: bool, default=False
- If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target objects.
- Added in version 0.20.
- as_frame: bool, default=False
- If True, the data and target attributes of the returned Bunch are pandas objects (a DataFrame and a Series, respectively), and the Bunch also has a frame member.
- Added in version 0.24.
- n_retries: int, default=3
- Number of retries when HTTP errors are encountered.
- Added in version 1.5.
- delay: float, default=1.0
- Number of seconds between retries.
- Added in version 1.5.
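As a minimal sketch of how these parameters combine, the helper below requests the ‘SA’ subset of the 10 percent sample, shuffled reproducibly, and reads only from the local cache. It assumes scikit-learn is installed. The helper name load_cached_kddcup and the seed value are hypothetical, chosen for this example; with download_if_missing=False the documented OSError is caught, so nothing is fetched over the network.

```python
from sklearn.datasets import fetch_kddcup99


def load_cached_kddcup(data_home=None, subset="SA", seed=0):
    """Return (X, y) from the local cache, or None if the data is not cached.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    try:
        return fetch_kddcup99(
            subset=subset,              # 'SA', 'SF', 'http', 'smtp', or None
            data_home=data_home,        # None means the default data folder
            percent10=True,             # load only the 10 percent sample
            shuffle=True,
            random_state=seed,          # an int gives reproducible output
            download_if_missing=False,  # never hit the network in this sketch
            return_X_y=True,
        )
    except OSError:
        # Documented behaviour when the data is not available locally.
        return None


result = load_cached_kddcup()
if result is None:
    print("kddcup99 is not cached; pass download_if_missing=True to fetch it.")
else:
    X, y = result
    print(X.shape, y.shape)
```

To actually download the data, call fetch_kddcup99 directly with download_if_missing=True (the default); n_retries and delay then control how HTTP errors are retried.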
 
- Returns:
- data: Bunch
- Dictionary-like object, with the following attributes.
- data: {ndarray, dataframe} of shape (494021, 41)
- The data matrix to learn. If as_frame=True, data will be a pandas DataFrame.
- target: {ndarray, series} of shape (494021,)
- The classification target for each sample. If as_frame=True, target will be a pandas Series.
- frame: dataframe of shape (494021, 42)
- Only present when as_frame=True. Contains data and target.
- DESCR: str
- The full description of the dataset.
- feature_names: list
- The names of the dataset columns.
- target_names: list
- The names of the target columns.
 
- (data, target): tuple if return_X_y is True
- A tuple of two ndarrays. The first is a 2D array of shape (n_samples, n_features), with each row representing one sample and each column representing a feature. The second is an ndarray of shape (n_samples,) containing the target for each sample.
- Added in version 0.20.
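As an illustration of working with the (data, target) pair, the sketch below counts the class labels and derives a binary attack indicator. To stay runnable offline it uses a small synthetic stand-in for the real arrays. The byte-string labels such as b"normal." are meant to mirror how the kddcup99 targets are stored, but the toy values and the helper name summarize_targets are assumptions made for this example.

```python
import numpy as np


def summarize_targets(y):
    """Return (label_counts, is_attack) for a 1D array of kddcup99-style labels.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    labels, counts = np.unique(y, return_counts=True)
    label_counts = {label: int(n) for label, n in zip(labels, counts)}
    # Every label other than b"normal." is treated as an attack.
    is_attack = (y != b"normal.").astype(int)
    return label_counts, is_attack


# Synthetic stand-in for `X, y = fetch_kddcup99(return_X_y=True)`.
X = np.zeros((5, 41))
y = np.array(
    [b"normal.", b"smurf.", b"normal.", b"neptune.", b"smurf."], dtype=object
)

label_counts, is_attack = summarize_targets(y)
print(label_counts)
print(is_attack)
```

With the real dataset, the same helper can be applied to the target returned by fetch_kddcup99(return_X_y=True).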
 
