Feature Selection: SelectKBest

阿新 • Published: 2021-01-12

from sklearn.feature_selection import SelectKBest

http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html#sklearn.feature_selection.SelectKBest.set_params

 
class SelectKBest(_BaseFilter):
    """Select features according to the k highest scores.

    Read more in the :ref:`User Guide <univariate_feature_selection>`.

    Parameters
    ----------
    score_func : callable
        Function taking two arrays X and y, and returning a pair of arrays
        (scores, pvalues) or a single array with scores.
        Default is f_classif (see below "See also"). The default function only
        works with classification tasks.

    k : int or "all", optional, default=10
        Number of top features to select.
        The "all" option bypasses selection, for use in a parameter search.

    Attributes
    ----------
    scores_ : array-like, shape=(n_features,)
        Scores of features.

    pvalues_ : array-like, shape=(n_features,)
        p-values of feature scores, None if `score_func` returned only scores.

    Notes
    -----
    Ties between features with equal scores will be broken in an unspecified
    way.

    See also
    --------
    f_classif: ANOVA F-value between label/feature for classification tasks.
    mutual_info_classif: Mutual information for a discrete target.
    chi2: Chi-squared stats of non-negative features for classification tasks.
    f_regression: F-value between label/feature for regression tasks.
    mutual_info_regression: Mutual information for a continuous target.
    SelectPercentile: Select features based on percentile of the highest scores.
    SelectFpr: Select features based on a false positive rate test.
    SelectFdr: Select features based on an estimated false discovery rate.
    SelectFwe: Select features based on family-wise error rate.
    GenericUnivariateSelect: Univariate feature selector with configurable mode.
    """

An example from the official documentation (you must supply the scoring function and the value of k yourself)
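A minimal sketch along the lines of the SelectKBest example in the scikit-learn documentation, using the digits data with chi2 as the scoring function and k=20. The variable names are my own choice.

```python
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, chi2

# Handwritten digits: 1797 samples, 64 non-negative pixel features.
X, y = load_digits(return_X_y=True)

# Both the scoring function and k are supplied explicitly.
selector = SelectKBest(score_func=chi2, k=20)
X_new = selector.fit_transform(X, y)

print(X.shape)      # (1797, 64)
print(X_new.shape)  # (1797, 20)
```

chi2 is usable here because pixel intensities are non-negative; for data with negative values a different scoring function would be needed.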

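Parameters

1. score_func : callable. A function that takes two arrays X and y and returns either a pair of arrays (scores, pvalues) or a single array of scores. The default is f_classif, which only works for classification tasks.
2. k : int or "all", optional, default=10. The number of top-scoring features to select. The "all" option bypasses selection and is intended for use in a parameter search.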
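To illustrate why k accepts "all", here is a sketch of a parameter search in which "keep every feature" is one of the candidates. The pipeline step names ("select", "clf"), the choice of LogisticRegression and the iris data are assumptions made for the example, not something the original post used.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

# Feature selection followed by a classifier, tuned as one estimator.
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# k="all" lets the search compare "no selection" against k=1 and k=2.
param_grid = {"select__k": [1, 2, "all"]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)

# With k="all" the selector passes every column through unchanged.
X_all = SelectKBest(f_classif, k="all").fit_transform(X, y)
print(X_all.shape)
```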

Attributes

1. scores_ : array-like, shape=(n_features,). The score of each feature.
2. pvalues_ : array-like, shape=(n_features,). The p-value of each feature score; None if score_func returned only scores.
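The difference between the two attributes can be sketched as follows. f_classif returns (scores, pvalues), so both attributes are filled in. The function variance_score is a hypothetical scoring function written only for this example; it returns a single array, so pvalues_ ends up as None.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# f_classif returns a (scores, pvalues) pair.
sel_f = SelectKBest(score_func=f_classif, k=2).fit(X, y)
print(sel_f.scores_.shape, sel_f.pvalues_.shape)


def variance_score(X, y):
    """Hypothetical scorer: rank features by variance, ignoring y."""
    return X.var(axis=0)


# A scorer that returns only scores leaves pvalues_ as None.
sel_var = SelectKBest(score_func=variance_score, k=2).fit(X, y)
print(sel_var.scores_.shape, sel_var.pvalues_)
```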

Scoring functions available for score_func
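The docstring's "See also" section names f_classif, mutual_info_classif and chi2 for classification, and f_regression and mutual_info_regression for regression. As a rough sketch, the three classification scorers can be swapped in on the same data to compare which features each one keeps. The iris data and the fixed random_state passed through functools.partial are assumptions made so the example is reproducible.

```python
from functools import partial

from sklearn.datasets import load_iris
from sklearn.feature_selection import (
    SelectKBest,
    chi2,
    f_classif,
    mutual_info_classif,
)

X, y = load_iris(return_X_y=True)  # all features are non-negative, so chi2 applies

score_funcs = {
    "f_classif": f_classif,
    "chi2": chi2,
    # mutual_info_classif is randomized; fix the seed for repeatability.
    "mutual_info_classif": partial(mutual_info_classif, random_state=0),
}

selected = {}
for name, func in score_funcs.items():
    selector = SelectKBest(score_func=func, k=2).fit(X, y)
    selected[name] = selector.get_support(indices=True)
    print(name, selected[name])
```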

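Methods

1. fit(X, y): run the scoring function on (X, y) and determine which features to keep.
2. fit_transform(X[, y]): fit to the data, then transform it.
3. get_params([deep]): get the parameters of this estimator.
4. get_support([indices]): get a mask, or the integer indices, of the selected features.
5. inverse_transform(X): reverse the transformation.
6. set_params(**params): set the parameters of this estimator.
7. transform(X): reduce X to the selected features.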
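The methods listed above can be exercised together in a short sketch. The iris data and k=2 are assumptions chosen to keep the example small.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

selector = SelectKBest(score_func=f_classif, k=2)
selector.fit(X, y)                       # score every feature against y

X_reduced = selector.transform(X)        # keep only the selected columns
mask = selector.get_support()            # Boolean mask over all features
idx = selector.get_support(indices=True) # integer indices of kept features

# inverse_transform restores the original width; dropped columns become zeros.
X_back = selector.inverse_transform(X_reduced)

print(X_reduced.shape, X_back.shape)
print(mask, idx)

# Parameters can be read and changed; a changed k takes effect on the next fit.
selector.set_params(k=3)
print(selector.get_params()["k"])
```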

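How do you get the names or indices of the selected features? The method was already mentioned above: get_support().

The digits data used earlier has no feature names, so I switched to the Boston housing data, which does. Because it is a regression dataset, the scoring function changes accordingly, to f_regression. You have to call fit before you can use get_support(). If its indices parameter is set to True, the return value is the indices of the selected features. The feature names cannot be returned directly from this call, but mapping the indices back to the names in the dataset is easy. Here I set k to 5 to select the five highest-scoring features, which are attributes 2, 5, 9, 10 and 12.
If indices is set to False, the return value is a Boolean mask indicating whether each feature was selected.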
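The following sketch maps the output of get_support() back to feature names. Note that load_boston was removed from scikit-learn in version 1.2, so this example substitutes the diabetes regression dataset (load_diabetes), which also carries feature names; the selected indices will therefore differ from the ones quoted for the Boston data.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectKBest, f_regression

# Stand-in for the Boston data: a regression dataset with named features.
data = load_diabetes()
X, y = data.data, data.target
feature_names = np.array(data.feature_names)

# Regression target, so f_regression replaces the default f_classif.
selector = SelectKBest(score_func=f_regression, k=5)
selector.fit(X, y)  # get_support() is only available after fitting

idx = selector.get_support(indices=True)  # integer indices of the kept features
mask = selector.get_support()             # indices=False: Boolean mask

# Look the names up in the dataset using either form.
print(idx, feature_names[idx])
print(mask)
```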

Link: https://www.jianshu.com/p/586ba8c96a3d
