
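Using the mRMR feature selection algorithm (feature_selection)

Source program download link. A Java environment must be installed on your machine; for the installation steps, search Baidu or Google.

A Python program that uses mRMR to select features from a feature set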

#import necessary packages

import csv#used to write the csv file
import pandas as pd
import numpy as np
import re
import os#used to call system programs

#change the default working directory
os.chdir("XXX")
#input path name
datapath ="XXX"

#output path name
outputpath="XXX"
"""
    mrmr and svm
"""
#read csv data from path
train_data = pd.read_csv(datapath, header=None, index_col=None)
X = np.array(train_data)
#label the first half of the rows 1 and the second half 0
Y = list(map(lambda x: 1, range(len(train_data) // 2)))
Y2 = list(map(lambda x: 0, range(len(train_data) // 2)))
Y.extend(Y2)
Y = np.array(Y)
Y = Y.reshape(-1, 1)  #one column; 2260 rows for this data set
#concatenate class and data
full_csv_with_class = np.concatenate([Y, X], axis=1)
print(full_csv_with_class)
#print the results of original csv data and final full data
print("the shape of data:" + str(X.shape))
print("the shape of data and class:" + str(full_csv_with_class.shape))
#generating virtual headers
columns = ["class"]
columns_numbers = np.arange(full_csv_with_class.shape[1] - 1)
columns.extend(columns_numbers)
# Write data into files
csvFile2 = open(outputpath, 'w')
writer = csv.writer(csvFile2)
m = len(full_csv_with_class)
writer.writerow(columns)
for i in range(m):
    writer.writerow(full_csv_with_class[i])
csvFile2.close()
[[ 1.     1.     1.    ...,  0.     1.     0.075]
 [ 1.     0.     0.    ...,  1.     1.     0.1  ]
 [ 1.     1.     0.    ...,  1.     0.     0.175]
 ..., 
 [ 0.     0.     0.    ...,  1.     1.     0.075]
 [ 0.     0.     0.    ...,  0.     1.     0.025]
 [ 0.     0.     0.    ...,  0.     1.     0.05 ]]
the shape of data:(2260, 200)
the shape of data and class:(2260, 201)
os.system("./mRMR/mrmr -i "+outputpath+" -n 200 >mRMR/output.mrmrout")
print("complete ")
complete 
#read the mRMR output file

fn=open("mRMR/output.mrmrout",'r')
location_mark=0
final_set=[]
for line in fn.readlines():
    if line.strip() =="":
        location_mark=0
    if location_mark==1 and line.split()[1]!="Fea":
        final_set.append(int(line.split()[1]))
    if re.findall(r"mRMR",line) and re.findall(r"feature",line):
        location_mark=1
print(final_set)
[133, 135, 140, 130, 145, 110, 115, 105, 120, 125, 150, 102, 185, 190, 180, 195, 100, 160, 165, 155, 170, 175, 101, 5, 85, 95, 98, 90, 99, 200, 177, 33, 50, 14, 8, 149, 109, 94, 121, 134, 113, 84, 21, 156, 71, 31, 6, 59, 189, 158, 122, 176, 58, 46, 64, 188, 10, 1, 38, 184, 19, 138, 2, 159, 81, 181, 44, 199, 26, 63, 82, 45, 148, 114, 172, 183, 32, 7, 48, 131, 146, 163, 83, 39, 49, 171, 80, 132, 197, 77, 88, 56, 9, 157, 198, 75, 164, 147, 70, 76, 196, 27, 182, 25, 96, 127, 13, 57, 126, 65, 107, 34, 108, 60, 139, 69, 55, 89, 30, 35, 40, 106, 20, 15, 104, 97, 111, 18, 103, 41, 78, 116, 61, 192, 3, 43, 67, 23, 118, 191, 4, 11, 194, 119, 66, 17, 87, 137, 136, 167, 141, 53, 117, 154, 28, 86, 42, 151, 52, 74, 68, 193, 51, 22, 179, 153, 62, 186, 152, 169, 12, 161, 129, 112, 166, 93, 47, 79, 162, 128, 29, 16, 143, 36, 187, 168, 144, 73, 124, 91, 54, 174, 178, 24, 173, 37, 142, 72, 123, 92]
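The parsing loop above can also be wrapped in a function so it is easy to try on a small piece of text. This is only a sketch: `parse_mrmr_ranking` is a name made up here, and the sample text is invented to match what the loop expects (a title line containing "mRMR" and "feature", a header row whose second field is "Fea", data rows, then a blank line). It is not real mrmr output.

```python
import re

def parse_mrmr_ranking(text):
    """Return the second column of the 'mRMR feature' table as a list of ints."""
    ranking = []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            # A blank line ends the current table.
            in_table = False
        elif in_table and len(fields) > 1 and fields[1] != "Fea":
            # Data row: the second field is the feature index.
            ranking.append(int(fields[1]))
        elif re.search(r"mRMR", line) and re.search(r"feature", line):
            # Title line of the table we want; rows follow from the next line.
            in_table = True
    return ranking

# Invented sample, shaped like what the loop expects (not real output).
sample = """*** MaxRel features ***
Order  Fea  Name  Score
1  7  7  0.300

*** mRMR features ***
Order  Fea  Name  Score
1  7  7  0.300
2  3  3  0.120

"""
print(parse_mrmr_ranking(sample))
```

Only the rows under the "mRMR features" title are collected; the earlier table is skipped because its title does not contain "mRMR".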
precision_copy=0
recall_copy=0
SN_copy=0
SP_copy=0
GM_copy=0
TP_copy=0
TN_copy=0
FP_copy=0
FN_copy=0
ACC_copy=0
F1_Score_copy=0
F_measure_copy=0
MCC_copy=0
pos_copy=0
neg_copy=0
y_pred_prob_copy=[]
y_pred_copy=[]

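Key statement:
os.system("./mRMR/mrmr -i "+outputpath+" -n 200 >mRMR/output.mrmrout")
- ./mRMR/mrmr runs the program, i.e. the one downloaded from GitHub mentioned at the top
- -i outputpath is the path of the csv written by the script above, i.e. the original feature set that mrmr reads as input (its format is explained below)
- -n 200 selects 200 dimensions, listed in order of their score
- >mRMR/output.mrmrout redirects the output to this file (its contents are shown below)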
output.mrmrout
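The same call can be made with `subprocess` instead of building a shell string, which avoids quoting problems when the csv path contains spaces. A minimal sketch, assuming the binary sits at ./mRMR/mrmr as above; `build_mrmr_command` and `run_mrmr` are hypothetical helper names, "features_with_class.csv" is a made-up file name, and only the -i and -n options used in this post are passed.

```python
import subprocess

def build_mrmr_command(csv_path, n_features, binary="./mRMR/mrmr"):
    """Assemble the argument list for the call shown above (-i and -n only)."""
    return [binary, "-i", csv_path, "-n", str(n_features)]

def run_mrmr(csv_path, n_features, result_path="mRMR/output.mrmrout"):
    """Run mrmr and write its standard output to result_path."""
    with open(result_path, "w") as out:
        # check_call raises CalledProcessError if mrmr exits with an error.
        subprocess.check_call(build_mrmr_command(csv_path, n_features), stdout=out)

# Only print the command here; call run_mrmr(...) once the binary is in place.
print(build_mrmr_command("features_with_class.csv", 200))
```

Unlike os.system, this raises an exception when the program fails, so a missing binary or a malformed csv does not pass silently.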

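The csv format needs special mention: the class label must be in the first column, and the file must have a header row of column labels (the "class" label is required).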
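To make the layout concrete, here is a small sketch that writes two made-up samples in that shape, using only the standard library. `write_mrmr_csv` is a hypothetical helper, and the header follows the script above: "class" followed by 0, 1, 2, ... for the features.

```python
import csv
import io

def write_mrmr_csv(handle, labels, rows):
    """Write a header row, then one line per sample with the class label first."""
    n_features = len(rows[0])
    writer = csv.writer(handle, lineterminator="\n")
    # Header: "class" followed by one numeric label per feature column.
    writer.writerow(["class"] + list(range(n_features)))
    for label, row in zip(labels, rows):
        writer.writerow([label] + list(row))

# Two invented samples with three features each, written to an in-memory buffer.
buffer = io.StringIO()
write_mrmr_csv(buffer, [1, 0], [[1, 0, 0.075], [0, 1, 0.05]])
print(buffer.getvalue())
```

Passing an open file instead of the buffer writes the same text to disk.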

    [133, 135, 140, 130, 145, 110, 115, 105, 120, 125, 150, 102, 185, 190, 180, 195, 100, 160, 165, 155, 170, 175, 101, 5, 85, 95, 98, 90, 99, 200, 177, 33, 50, 14, 8, 149, 109, 94, 121, 134, 113, 84, 21, 156, 71, 31, 6, 59, 189, 158, 122, 176, 58, 46, 64, 188, 10, 1, 38, 184, 19, 138, 2, 159, 81, 181, 44, 199, 26, 63, 82, 45, 148, 114, 172, 183, 32, 7, 48, 131, 146, 163, 83, 39, 49, 171, 80, 132, 197, 77, 88, 56, 9, 157, 198, 75, 164, 147, 70, 76, 196, 27, 182, 25, 96, 127, 13, 57, 126, 65, 107, 34, 108, 60, 139, 69, 55, 89, 30, 35, 40, 106, 20, 15, 104, 97, 111, 18, 103, 41, 78, 116, 61, 192, 3, 43, 67, 23, 118, 191, 4, 11, 194, 119, 66, 17, 87, 137, 136, 167, 141, 53, 117, 154, 28, 86, 42, 151, 52, 74, 68, 193, 51, 22, 179, 153, 62, 186, 152, 169, 12, 161, 129, 112, 166, 93, 47, 79, 162, 128, 29, 16, 143, 36, 187, 168, 144, 73, 124, 91, 54, 174, 178, 24, 173, 37, 142, 72, 123, 92]

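These numbers are the ranking of the feature dimensions extracted from mRMR/output.mrmrout.
Readers can take dimensions step by step in this order to look for the best set of dimensions.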
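One way to do that search is to score the top 1, top 2, top 3, ... ranked dimensions and keep the best prefix. The sketch below is only an illustration: `ranked_prefixes`, `best_prefix` and `toy_score` are names invented here, and the toy scorer stands in for a real evaluation such as cross-validated classifier accuracy on the selected columns. The printed indices look 1-based (the list contains 200 for 200 features), so subtracting 1 before indexing into X is probably needed, but that is an assumption.

```python
def ranked_prefixes(ranking, step=1):
    """Yield the top-k feature lists for k = step, 2*step, ..."""
    for k in range(step, len(ranking) + 1, step):
        yield ranking[:k]

def best_prefix(ranking, score_fn, step=1):
    """Return (best_score, best_features) over all ranked prefixes."""
    best_score, best_features = None, []
    for features in ranked_prefixes(ranking, step):
        score = score_fn(features)
        # Keep the first prefix that reaches the highest score.
        if best_score is None or score > best_score:
            best_score, best_features = score, features
    return best_score, best_features

def toy_score(features):
    """Toy scorer: pretends 133 and 135 help and every other feature costs a little."""
    useful = {133, 135}
    return sum(1.0 if f in useful else -0.1 for f in features)

print(best_prefix([133, 135, 140, 130], toy_score))
```

With 200 dimensions a larger `step` (for example 10) keeps the number of evaluations manageable.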

Once again, the links to the mrmr program and the feature extraction program