Keras: handling classification on imbalanced (highly skewed) data
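When training on an imbalanced dataset, you can weight the data so that the under-represented classes contribute more to the loss. The weights do not change how often a sample is drawn; they scale that sample's term in the loss. The relevant arguments of fit are class_weight and sample_weight: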
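fit(self, x, y, batch_size=32, nb_epoch=10, verbose=1, callbacks=[], validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None)
(This is the Keras 1.x signature. From Keras 2 onward nb_epoch is named epochs, which is the name used in the call further down.)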
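class_weight: a dictionary mapping each class index to a weight. It is used to scale the loss function during training (training only).
sample_weight: a NumPy array of weights, one per training sample.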
A concrete example follows.
Set a weight for each class; here class 0 gets weight 1 and class 1 gets weight 50:
cw = {0: 1, 1: 50}
Train the model:
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, callbacks=cbks, validation_data=(x_test, y_test), shuffle=True, class_weight=cw)
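The 1:50 ratio above is picked by hand. One common heuristic, not something the post itself prescribes, is to weight each class by inverse frequency: n_samples / (n_classes * class_count). The sketch below assumes integer class labels in a plain Python list; the helper name balanced_class_weights and the example labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} using n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 98 samples of class 0 and 2 samples of class 1.
y_example = [0] * 98 + [1] * 2
cw_example = balanced_class_weights(y_example)

# The rare class ends up weighted 49 times more heavily than the common one.
print(cw_example)
```

The resulting dictionary has the same shape as cw and could be passed as class_weight. For a 1-D integer NumPy label array, calling .tolist() on it first gives plain Python integers as keys.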
If the only problem is class imbalance, use class_weight. sample_weight is for cases where samples within the same class should also be weighted differently.
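Conceptually, a class weight is a per-sample weight that happens to be the same for every sample of a class. The sketch below expands class weights into a per-sample list and optionally multiplies in a within-class factor. The helper expand_class_weights, the labels, and the within-class factors are all hypothetical and only illustrate the relationship.

```python
def expand_class_weights(labels, class_weight, within_class=None):
    """Build one weight per sample: class weight times an optional per-sample factor."""
    if within_class is None:
        within_class = [1.0] * len(labels)
    return [class_weight[y] * w for y, w in zip(labels, within_class)]


labels = [0, 0, 1, 1]
class_weight = {0: 1, 1: 50}

# Same effect as class weights alone: every sample of a class gets the same weight.
uniform = expand_class_weights(labels, class_weight)

# Extra control inside each class (hypothetical factors).
refined = expand_class_weights(labels, class_weight, [1.0, 0.5, 1.0, 2.0])

print(uniform)
print(refined)
```

A list like refined would be converted to a NumPy array and passed through sample_weight instead of class_weight.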
class_weight affects the relative weight of each class in the calculation of the objective function. sample_weight, as the name suggests, allows further control of the relative weight of samples that belong to the same class.
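To make "relative weight in the objective function" concrete, here is a minimal sketch in plain Python rather than Keras. It computes the binary cross-entropy of each sample and scales it by the weight of the sample's true class. How the weighted terms are then averaged over a batch depends on the Keras version, so only the per-sample terms are shown. The function name and the example predictions are assumptions made for illustration.

```python
import math


def weighted_bce_terms(y_true, p_pred, class_weight):
    """Per-sample binary cross-entropy, each term scaled by its class weight."""
    terms = []
    for y, p in zip(y_true, p_pred):
        bce = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        terms.append(class_weight[y] * bce)
    return terms


class_weight = {0: 1, 1: 50}
y_true = [0, 1]
p_pred = [0.5, 0.5]  # the model is equally unsure about both samples

terms = weighted_bce_terms(y_true, p_pred, class_weight)

# Both samples have the same unweighted loss, but the class 1 sample
# contributes 50 times as much to the objective.
print(terms)
```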
Class weights are useful when training on highly skewed data sets; for example, a classifier to detect fraudulent transactions.
Sample weights are useful when you don't have equal confidence in the samples in your batch. A common example is performing regression on measurements with variable uncertainty.
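For the regression case with variable measurement uncertainty, one possible choice of sample weights is inverse-variance weighting, normalized so the weights average to 1. This is an assumption about how the weights might be built, not something the quoted text specifies; the helper name and the sigma values are hypothetical.

```python
def inverse_variance_weights(sigmas):
    """Weight each sample by 1 / sigma^2, rescaled so the mean weight is 1."""
    raw = [1.0 / (s * s) for s in sigmas]
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]


# Hypothetical per-measurement standard deviations.
sigmas = [0.1, 0.2, 0.4]
sw = inverse_variance_weights(sigmas)

# More certain measurements (smaller sigma) receive larger weights.
print(sw)

# With Keras, the list would be converted to a NumPy array first, e.g.
# model.fit(x, y, sample_weight=numpy.asarray(sw)), where model, x and y
# are placeholders for your own model and data.
```

Normalizing to a mean of 1 keeps the overall loss scale close to the unweighted case, so an existing learning rate remains a sensible starting point.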