
Wrapper-Filter Feature Selection Algorithm Using a Memetic Framework

#Citation

##BibTeX

@ARTICLE{4067093, author={Z. Zhu and Y. S. Ong and M. Dash}, journal={IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)}, title={Wrapper--Filter Feature Selection Algorithm Using a Memetic Framework}, year={2007}, volume={37}, number={1}, pages={70-76}, keywords={biology computing;genetic algorithms;learning (artificial intelligence);pattern classification;search problems;classification problem;feature selection algorithm;genetic algorithm;local search;memetic framework;microarray data set;wrapper filter;Acceleration;Classification algorithms;Computational efficiency;Filters;Genetic algorithms;Machine learning;Machine learning algorithms;Pattern recognition;Pervasive computing;Spatial databases;Chi-square;feature selection;filter;gain ratio;genetic algorithm (GA);hybrid GA (HGA);memetic algorithm (MA);relief;wrapper;Algorithms;Artificial Intelligence;Biomimetics;Computer Simulation;Models, Theoretical;Pattern Recognition, Automated;Software;Systems Theory}, doi={10.1109/TSMCB.2006.883267}, ISSN={1083-4419}, month={Feb},}

##Plain text

Z. Zhu, Y. S. Ong and M. Dash, “Wrapper–Filter Feature Selection Algorithm Using a Memetic Framework,” in IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 37, no. 1, pp. 70-76, Feb. 2007. doi: 10.1109/TSMCB.2006.883267 keywords: {biology computing;genetic algorithms;learning (artificial intelligence);pattern classification;search problems;classification problem;feature selection algorithm;genetic algorithm;local search;memetic framework;microarray data set;wrapper filter;Acceleration;Classification algorithms;Computational efficiency;Filters;Genetic algorithms;Machine learning;Machine learning algorithms;Pattern recognition;Pervasive computing;Spatial databases;Chi-square;feature selection;filter;gain ratio;genetic algorithm (GA);hybrid GA (HGA);memetic algorithm (MA);relief;wrapper;Algorithms;Artificial Intelligence;Biomimetics;Computer Simulation;Models, Theoretical;Pattern Recognition, Automated;Software;Systems Theory}, URL:

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4067093&isnumber=4067063

#Abstract

The paper presents a novel hybrid wrapper and filter feature selection algorithm for classification problems, built on a memetic framework.

It incorporates a filter ranking method into the traditional genetic algorithm: features are added to or deleted from a candidate subset based on univariate feature ranking information.

Experiments use data sets from the University of California, Irvine (UCI) repository and microarray data sets.

Methods are compared on classification accuracy, number of selected features, and computational efficiency.

The paper also studies how a memetic algorithm (MA) should balance local search and genetic search to maximize search quality and efficiency.

#Main content

  1. filter methods
  2. wrapper methods

##wrapper–filter feature selection algorithm (WFFSA) using a memetic framework


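WFFSA is a GA combined with a local search (LS) step, in the spirit of Lamarckian learning: a chromosome that LS improves replaces the original in the population.

In each generation, local improvement is applied first, followed by the genetic operators.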
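The overall loop can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: `local_improve`, `memetic_search`, `toy_fitness`, and `TARGET` are hypothetical names, the single-bit-flip local step stands in for the filter-guided LS, and tournament selection with uniform crossover is an assumed choice of genetic operators.

```python
import random

def local_improve(chrom, fitness, rng):
    """Flip one random bit; keep the change only if fitness improves."""
    candidate = chrom[:]
    i = rng.randrange(len(candidate))
    candidate[i] ^= 1
    return candidate if fitness(candidate) > fitness(chrom) else chrom

def memetic_search(fitness, n_features, pop_size=10, generations=20, w=3, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        # Local improvement on the w fittest chromosomes. Lamarckian learning:
        # the improved chromosome is written back into the population.
        pop.sort(key=fitness, reverse=True)
        for i in range(w):
            pop[i] = local_improve(pop[i], fitness, rng)
        # Genetic operators (assumed here): elitism, binary tournament
        # selection, uniform crossover, bit-flip mutation.
        pop.sort(key=fitness, reverse=True)
        children = [pop[0][:]]
        while len(children) < pop_size:
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            child = [b ^ 1 if rng.random() < 1 / n_features else b for b in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# Toy fitness for illustration only: number of bits matching a hidden target.
TARGET = [1, 0, 1, 0, 0, 1]

def toy_fitness(chrom):
    return sum(a == b for a, b in zip(chrom, TARGET))

best = memetic_search(toy_fitness, n_features=len(TARGET))
print(best, toy_fitness(best))
```

Because the best chromosome is copied unchanged into each new generation, the best fitness in this sketch never decreases from one generation to the next.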

###A Representation and initialization


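Each chromosome is a binary string whose length equals the total number of features. A bit of 1 means the corresponding feature is selected, and a bit of 0 means it is excluded.

The population is randomly initialized.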
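A minimal sketch of this encoding, assuming bit `i` set to 1 means feature `i` is selected. The function names are illustrative and not taken from the paper.

```python
import random

def random_chromosome(n_features, rng):
    """Return a random bit list with one bit per feature."""
    return [rng.randint(0, 1) for _ in range(n_features)]

def init_population(pop_size, n_features, seed=0):
    """Build a randomly initialized population of binary chromosomes."""
    rng = random.Random(seed)
    return [random_chromosome(n_features, rng) for _ in range(pop_size)]

def selected_features(chromosome):
    """Decode a chromosome into the indices of the selected features."""
    return [i for i, bit in enumerate(chromosome) if bit == 1]

population = init_population(pop_size=4, n_features=8)
for c in population:
    print(c, "->", selected_features(c))
```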

###B Objective function

The objective function is the classification accuracy:

$$\mathrm{Fitness}(c) = J(S_c)$$

where $S_c$ is the selected feature subset encoded in chromosome $c$, and $J(S_c)$ is the criterion function evaluated on that subset.
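The relation between a chromosome $c$, its subset $S_c$, and the criterion $J(S_c)$ can be sketched as follows, assuming the fitness is simply the criterion value of the decoded subset. `toy_criterion` is a hypothetical stand-in for a real classifier accuracy.

```python
def fitness(chromosome, criterion):
    """Decode the chromosome into S_c, then return J(S_c)."""
    subset = tuple(i for i, bit in enumerate(chromosome) if bit)
    return criterion(subset)

# Hypothetical criterion for illustration only: features 0 and 3 are "useful",
# and the score is the fraction of useful features that the subset contains.
def toy_criterion(subset):
    useful = {0, 3}
    return len(useful & set(subset)) / len(useful)

print(fitness([1, 0, 0, 1, 0], toy_criterion))  # 1.0
print(fitness([0, 1, 1, 0, 0], toy_criterion))  # 0.0
```

Passing the criterion in as a function keeps the search code independent of the classifier used to score subsets.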

###C LS improvement procedure

LS relies on domain knowledge and heuristics.

Here, filter ranking methods serve as the memes, i.e. the LS heuristics.

three different filter ranking methods, namely:

  1. ReliefF;
  2. gain ratio;
  3. chi-square.

These are based on different criteria, respectively:

  1. Euclidean distance,
  2. information entropy,
  3. chi-square statistics
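As an illustration of a univariate filter ranking, here is a minimal sketch of a chi-square score between one discrete feature and the class labels, followed by ranking features by that score. It assumes discrete feature values. ReliefF or gain ratio would plug in a different scoring function, and the function names are illustrative.

```python
from collections import Counter

def chi_square(feature_values, labels):
    """Chi-square statistic of the feature-value by class contingency table."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    value_totals = Counter(feature_values)
    label_totals = Counter(labels)
    score = 0.0
    for v, v_count in value_totals.items():
        for y, y_count in label_totals.items():
            expected = v_count * y_count / n
            score += (observed[(v, y)] - expected) ** 2 / expected
    return score

def rank_features(columns, labels):
    """Return feature indices sorted from highest to lowest score."""
    scores = [chi_square(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)

labels = [0, 0, 1, 1]
columns = [
    [0, 0, 1, 1],  # perfectly aligned with the labels
    [0, 1, 0, 1],  # independent of the labels
]
print(rank_features(columns, labels))  # [0, 1]
```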

Basic LS operators, where X and Y denote the sets of selected and excluded features encoded in a chromosome:

  1. “Add”: select a feature from Y using the linear ranking selection and move it to X.
  2. “Del”: select a feature from X using the linear ranking selection and move it to Y.
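The two operators can be sketched as below. Several details are assumptions made for illustration: X and Y are the selected and excluded feature sets, `score` holds a filter score per feature, Add favors high-scoring features, Del favors low-scoring ones, and the selection weights fall linearly with rank.

```python
import random

def linear_rank_pick(candidates, rng):
    """Pick one item; candidates[0] is most likely, weights fall linearly."""
    k = len(candidates)
    weights = [k - j for j in range(k)]
    return rng.choices(candidates, weights=weights, k=1)[0]

def add_op(X, Y, score, rng):
    """Add: move one feature from Y (excluded) to X (selected)."""
    if not Y:
        return
    ranked = sorted(Y, key=lambda f: score[f], reverse=True)  # best first
    f = linear_rank_pick(ranked, rng)
    Y.remove(f)
    X.add(f)

def del_op(X, Y, score, rng):
    """Del: move one feature from X (selected) to Y (excluded)."""
    if not X:
        return
    ranked = sorted(X, key=lambda f: score[f])  # worst first
    f = linear_rank_pick(ranked, rng)
    X.remove(f)
    Y.add(f)

score = [0.9, 0.1, 0.5, 0.7]  # hypothetical filter scores
X, Y = {1}, {0, 2, 3}
rng = random.Random(0)
add_op(X, Y, score, rng)
print(sorted(X), sorted(Y))
```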


The intensity of LS is controlled by the LS length $l$ and the interval $w$:

  1. LS length $l$: the maximum number of Del and Add operations in each LS, which gives $l^2$ possible combinations of Add and Del operations.
  2. Interval $w$: LS is applied to the $w$ elite chromosomes in the population.

The LS runs until a local optimum or an improvement is reached.
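A minimal sketch of how $l$ and $w$ bound the LS effort. Assumptions for illustration: the $l^2$ combinations are pairs $(a, d)$ with $0 \le a, d < l$, the Add and Del moves pick bits uniformly rather than by filter ranking, and the pass stops at the first strict improvement. All function names are hypothetical.

```python
import itertools
import random

def apply_ops(chrom, a, d, rng):
    """Return a copy of chrom after a Add moves and d Del moves."""
    c = list(chrom)
    for _ in range(a):
        zeros = [i for i, b in enumerate(c) if b == 0]
        if zeros:
            c[rng.choice(zeros)] = 1
    for _ in range(d):
        ones = [i for i, b in enumerate(c) if b == 1]
        if ones:
            c[rng.choice(ones)] = 0
    return c

def ls_pass(chrom, fitness, l, rng):
    """Try the l*l (a, d) pairs in random order; stop at the first improvement."""
    base = fitness(chrom)
    pairs = list(itertools.product(range(l), repeat=2))  # assumed 0 <= a, d < l
    rng.shuffle(pairs)
    for a, d in pairs:
        candidate = apply_ops(chrom, a, d, rng)
        if fitness(candidate) > base:
            return candidate
    return chrom  # no improving candidate found

def improve_elites(population, fitness, l, w, rng):
    """Apply LS to the w fittest chromosomes and write the results back."""
    order = sorted(range(len(population)),
                   key=lambda i: fitness(population[i]), reverse=True)
    for i in order[:w]:
        population[i] = ls_pass(population[i], fitness, l, rng)

target = [1, 0, 1, 0]  # toy target, for illustration only

def match(chrom):
    return sum(x == y for x, y in zip(chrom, target))

pop = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 0, 0, 0]]
before = [match(c) for c in pop]
improve_elites(pop, match, l=2, w=2, rng=random.Random(1))
print(before, [match(c) for c in pop])
```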

  1. Improvement First Strategy: tries a random choice from the $l^2$ combinations and stops once an improvement is obtained, either in classification accuracy or as a reduction in the number of selected features without a deterioration in accuracy greater than $\varepsilon$.
  2. Greedy Strategy: carries out all $l^2$ possible combinations and keeps the best improved solution.
  3. Sequential Strategy: the Add operation searches for the most significant feature $y$ in $Y$ in a sequential manner; the Del operation searches for the least significant feature $x$ in $X$ in a sequential manner.
###D Evolutionary operators

[Figures: evolutionary operators]

###E Computational complexity

The ranking of features by the filter methods has linear time complexity. It is a one-time offline cost and is treated as negligible.

The computational cost of a single fitness evaluation is taken as the basic unit of computational cost. With population size $p$ and $g$ search generations:

  1. GA: $O(pg)$.
  2. GA with the improvement first strategy: $O(pg + l^2wg/2)$.
  3. GA with the greedy strategy ($l^2w/p$): $O(pg + l^2wg)$.
  4. GA with the sequential strategy: the Add and Del operations take $(2|Y| - l)l/2$ and $(2|X| + l)l/2$ evaluations, i.e. $nlw$ over the $w$ elite chromosomes with $n = |X| + |Y|$, giving $O(pg + nlwg)$.

Since $n \gg l$, we have $nlw \gg l^2w > l^2w/2$, so the sequential LS strategy requires significantly more computations.
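To get a feel for these orders of magnitude, the evaluation counts can be computed directly from the expressions above. The parameter values below are hypothetical and chosen only for illustration, and the function name is illustrative.

```python
def evaluations(p, g, l, w, n, strategy):
    """Approximate number of fitness evaluations for each search variant."""
    ga = p * g
    if strategy == "ga":
        return ga
    if strategy == "improvement_first":
        return ga + l * l * w * g // 2
    if strategy == "greedy":
        return ga + l * l * w * g
    if strategy == "sequential":
        return ga + n * l * w * g
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical settings: 50 chromosomes, 100 generations, l = 4, w = 10,
# and n = 2000 features.
for name in ("ga", "improvement_first", "greedy", "sequential"):
    print(name, evaluations(p=50, g=100, l=4, w=10, n=2000, strategy=name))
```

With these numbers the sequential strategy needs about 8 million evaluations, against 13 000 for the improvement first strategy, which matches the conclusion that $nlw \gg l^2w$ when $n \gg l$.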

##Experiments

Data sets: UCI data sets, and the microarray data sets ALL/AML, Colon, NCI60, and SRBCT.

Population size: 30, and 50 or 100 for the microarray data sets.

Fitness function calls: 6000, and 10000 or 20000 for the microarray data sets.


Classifier: the one nearest neighbor (1NN) classifier, with leave-one-out cross validation (LOOCV).
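A minimal sketch of a 1NN classifier scored with LOOCV on a given feature subset, using squared Euclidean distance. The data are made up for illustration, and the function name is not taken from the paper.

```python
def loocv_1nn_accuracy(rows, labels, subset):
    """LOOCV accuracy of a 1NN classifier restricted to the features in subset."""
    if not subset:
        return 0.0
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_dist = None, float("inf")
        for j, other in enumerate(rows):
            if j == i:
                continue  # leave sample i out of its own neighbour search
            dist = sum((row[f] - other[f]) ** 2 for f in subset)
            if dist < best_dist:
                best_j, best_dist = j, dist
        correct += labels[best_j] == labels[i]
    return correct / len(rows)

# Toy data: feature 0 separates the classes, feature 1 is misleading.
rows = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [1.1, 0.9]]
labels = [0, 0, 1, 1]
print(loocv_1nn_accuracy(rows, labels, [0]))     # 1.0
print(loocv_1nn_accuracy(rows, labels, [1]))     # 0.0
print(loocv_1nn_accuracy(rows, labels, [0, 1]))  # 0.0
```

In this toy example, adding the misleading feature drops the accuracy from 1.0 to 0.0, which is the kind of effect a wrapper criterion is meant to detect.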
