
常用的機器學習&資料探勘翻譯(轉)

Basis(基礎):

MSE(Mean Square Error 均方誤差),

LMS(Least Mean Square 最小均方),

LSM(Least Square Methods 最小二乘法),

MLE(Maximum Likelihood Estimation 最大似然估計),

QP(Quadratic Programming 二次規劃),

CP(Conditional Probability條件概率),

JP(Joint Probability 聯合概率),

MP(Marginal Probability邊緣概率),

Bayesian Formula(貝葉斯公式),

L1/L2 Regularization(L1/L2 正則,以及更多的、現在比較火的 L2.5 正則等),

GD(Gradient Descent 梯度下降; see the sketch at the end of this section),

SGD(Stochastic Gradient Descent 隨機梯度下降),

Eigenvalue(特徵值),

Eigenvector(特徵向量),

QR-decomposition(QR分解),

Quantile (分位數),

Covariance Matrix(協方差矩陣)。
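
Several of the Basis entries above come together in one small example: ordinary least-squares regression fit by gradient descent, with MSE as the loss. A minimal sketch, assuming NumPy is available and using invented data:

```python
import numpy as np

# Toy data: y = 2x + 1 plus Gaussian noise (hypothetical example).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 0.1, size=100)

# Design matrix with a bias column; w = [slope, intercept].
Xb = np.hstack([X, np.ones((X.shape[0], 1))])
w = np.zeros(2)
lr = 0.1  # learning rate for GD

for _ in range(500):
    pred = Xb @ w
    grad = 2.0 / len(y) * Xb.T @ (pred - y)  # gradient of the MSE
    w -= lr * grad

print("weights:", w)                       # should be close to [2, 1]
print("MSE:", np.mean((Xb @ w - y) ** 2))
```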

Common Distribution(常見分佈):

Discrete Distribution(離散型分佈):

Bernoulli Distribution/Binomial Distribution(貝努利分佈/二項分佈),

Negative Binomial Distribution(負二項分佈),

Multinomial Distribution(多項分佈),

Geometric Distribution(幾何分佈),

Hypergeometric Distribution(超幾何分佈),

Poisson Distribution (泊松分佈)。

Continuous Distribution (連續型分佈):

Uniform Distribution(均勻分佈),

Normal Distribution/Gaussian Distribution(正態分佈/高斯分佈),

Exponential Distribution(指數分佈),

Lognormal Distribution(對數正態分佈),

Gamma Distribution(Gamma分佈),

Beta Distribution(Beta分佈),

Dirichlet Distribution(狄利克雷分佈),

Rayleigh Distribution(瑞利分佈),

Cauchy Distribution(柯西分佈),

Weibull Distribution (韋伯分佈)。

Three Sampling Distributions(三大抽樣分佈; see the sketch after this list):

Chi-square Distribution(卡方分佈),

t-Distribution(t分佈),

F-distribution(F-分佈)。
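
For quick reference, most of the distributions listed above are exposed in `scipy.stats`. A brief sketch, assuming SciPy is installed; the parameter values are arbitrary:

```python
from scipy import stats

# Discrete: Binomial(n=10, p=0.3) and Poisson(mu=4).
print(stats.binom(n=10, p=0.3).pmf(3))   # P(X = 3)
print(stats.poisson(mu=4).mean())        # expectation = 4

# Continuous: Normal, Gamma, Beta.
print(stats.norm(loc=0, scale=1).pdf(0.5))
print(stats.gamma(a=2.0, scale=1.5).pdf(1.0))
print(stats.beta(a=2, b=5).rvs(size=3, random_state=0))  # random draws

# The three sampling distributions: chi-square, t, F quantiles.
print(stats.chi2(df=3).ppf(0.95))
print(stats.t(df=10).ppf(0.975))
print(stats.f(dfn=3, dfd=20).ppf(0.95))
```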

Data Pre-processing(資料預處理):

Missing Value Imputation(缺失值填充),

Discretization(離散化),Mapping(對映),

Normalization(歸一化/標準化)。

Sampling(取樣):

Simple Random Sampling(簡單隨機取樣),

Offline Sampling(離線等可能K取樣),

Online Sampling(線上等可能K取樣),

Ratio-based Sampling(等比例隨機取樣),

Acceptance-Rejection Sampling(接受-拒絕取樣),

Importance Sampling(重要性取樣),

MCMC(Markov Chain Monte Carlo 馬爾科夫鏈蒙特卡羅取樣演算法:Metropolis-Hastings & Gibbs; see the sketch below)。
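
To illustrate the MCMC entry above, a minimal random-walk Metropolis-Hastings sketch that draws samples from an unnormalized target density; the target, proposal step size, and burn-in length are arbitrary choices for the example:

```python
import numpy as np

def target(x):
    # Unnormalized density: a mixture of two Gaussians (arbitrary example).
    return np.exp(-0.5 * (x - 2) ** 2) + 0.5 * np.exp(-0.5 * (x + 2) ** 2)

rng = np.random.default_rng(0)
x = 0.0
samples = []
for _ in range(10_000):
    proposal = x + rng.normal(0, 1.0)                 # symmetric random walk
    accept_prob = min(1.0, target(proposal) / target(x))
    if rng.uniform() < accept_prob:                   # accept/reject step
        x = proposal
    samples.append(x)

print("sample mean:", np.mean(samples[1000:]))        # discard burn-in
```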

Clustering(聚類):

K-Means(see the sketch at the end of this section),

K-Medoids,

Bisecting K-Means(二分K-Means),

FK-Means,

Canopy,

Spectral K-Means(譜聚類),

GMM-EM(Gaussian Mixture Model 高斯混合模型,以期望最大化演算法求解),

K-Prototypes,CLARANS(基於劃分),

BIRCH(基於層次),

CURE(基於層次),

DBSCAN(基於密度),

CLIQUE(基於密度和基於網格)。
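
As a concrete reference for the clustering list above, a compact K-Means (Lloyd's algorithm) sketch in plain NumPy on invented two-cluster data; in practice a library implementation such as scikit-learn's `KMeans` would normally be used, and empty-cluster handling is omitted here:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init from data points
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: move each centroid to the mean of its cluster.
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])
centers, labels = kmeans(X, k=2)
print(centers)
```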

Classification & Regression(分類&迴歸):

LR(Linear Regression 線性迴歸),

LR(Logistic Regression 邏輯迴歸; see the sketch at the end of this section),

SR(Softmax Regression 多分類邏輯迴歸),

GLM(Generalized Linear Model 廣義線性模型),

RR(Ridge Regression 嶺迴歸/L2正則最小二乘迴歸),

LASSO(Least Absolute Shrinkage and Selection Operator L1正則最小二乘迴歸),

RF(Random Forest 隨機森林),

DT(Decision Tree 決策樹),

GBDT(Gradient Boosting Decision Tree 梯度提升決策樹),

CART(Classification And Regression Tree 分類迴歸樹),

KNN(K-Nearest Neighbor K近鄰),

SVM(Support Vector Machine 支援向量機),

KF(Kernel Function 核函式:Polynomial Kernel Function 多項式核函式、

Gaussian Kernel Function 高斯核函式/Radial Basis Function RBF 徑向基函式、

String Kernel Function 字串核函式),

NB(Naive Bayes 樸素貝葉斯),BN(Bayesian Network/Bayesian Belief Network/ Belief Network 貝葉斯網路/貝葉斯信度網路/信念網路),

LDA(Linear Discriminant Analysis/Fisher Linear Discriminant 線性判別分析/Fisher線性判別),

EL(Ensemble Learning 整合學習:Boosting、Bagging、Stacking),

AdaBoost(Adaptive Boosting 自適應增強),

MEM(Maximum Entropy Model 最大熵模型)。
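
Several of the models above (LR, RF, RR) are available directly in scikit-learn, one of the tools listed at the end of this glossary. A short sketch on a synthetic dataset, assuming scikit-learn is installed:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# LR (Logistic Regression) and RF (Random Forest) from the list above.
for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=100, random_state=0)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, "accuracy:", model.score(X_te, y_te))

# RR (Ridge Regression): L2-regularized least squares on the same features.
ridge = Ridge(alpha=1.0).fit(X_tr, y_tr.astype(float))
print("Ridge R^2:", ridge.score(X_te, y_te.astype(float)))
```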

Effectiveness Evaluation(分類效果評估):

Confusion Matrix(混淆矩陣),

Precision(精確度),Recall(召回率),

Accuracy(準確率),F-score(F得分),

ROC Curve(ROC曲線),AUC(Area Under Curve 曲線下面積; see the sketch after this list),

Lift Curve(Lift曲線),KS Curve(KS曲線)。
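
The evaluation quantities above all derive from comparing predictions with true labels. A short sketch with made-up labels and scores, using scikit-learn's metrics functions:

```python
from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             accuracy_score, f1_score, roc_auc_score)

y_true  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # made-up ground truth
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.7, 0.6, 0.1, 0.95, 0.35]  # scores for AUC

print(confusion_matrix(y_true, y_pred))     # [[TN, FP], [FN, TP]]
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("accuracy: ", accuracy_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))
print("AUC:      ", roc_auc_score(y_true, y_score))
```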

PGM(Probabilistic Graphical Models概率圖模型):

BN(Bayesian Network/Bayesian Belief Network/Belief Network 貝葉斯網路/貝葉斯信度網路/信念網路),

MC(Markov Chain 馬爾科夫鏈),

HMM(Hidden Markov Model 隱馬爾科夫模型),

MEMM(Maximum Entropy Markov Model 最大熵馬爾科夫模型),

CRF(Conditional Random Field 條件隨機場),

MRF(Markov Random Field 馬爾科夫隨機場)。

NN(Neural Network神經網路):

ANN(Artificial Neural Network 人工神經網路),

BP(Error Backpropagation 誤差反向傳播)。


Deep Learning(深度學習):

Auto-encoder(自動編碼器),

SAE(Stacked Auto-encoders 堆疊自動編碼器、

Sparse Auto-encoders稀疏自動編碼器、

Denoising Auto-encoders去噪自動編碼器、

Contractive Auto-encoders 收縮自動編碼器),

RBM(Restricted Boltzmann Machine 受限玻爾茲曼機),

DBN(Deep Belief Network 深度信念網路),

CNN(Convolutional Neural Network 卷積神經網路),

Word2Vec(詞向量學習模型)。

Dimensionality Reduction(降維; see the PCA/SVD sketch at the end of this section):

LDA(Linear Discriminant Analysis/Fisher Linear Discriminant 線性判別分析/Fisher線性判別),

PCA(Principal Component Analysis 主成分分析),

ICA(Independent Component Analysis 獨立成分分析),

SVD(Singular Value Decomposition 奇異值分解),

FA(Factor Analysis 因子分析法)。
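
PCA and SVD in the list above are closely related: the principal components of centered data are its right singular vectors. A minimal NumPy sketch on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # toy data: 200 samples x 5 features
Xc = X - X.mean(axis=0)                  # center each feature

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
components = Vt[:k]                      # top-k principal directions
scores = Xc @ components.T               # projection onto those directions
explained_var = (S ** 2) / (len(X) - 1)  # eigenvalues of the covariance matrix
print("explained variance of top-2:", explained_var[:k])
print("projected shape:", scores.shape)
```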

Text Mining(文字挖掘):

VSM(Vector Space Model向量空間模型),

Word2Vec(詞向量學習模型),

TF(Term Frequency詞頻),

TF-IDF(Term Frequency-Inverse Document Frequency 詞頻-逆向文件頻率; see the sketch at the end of this section),

MI(Mutual Information 互資訊),

ECE(Expected Cross Entropy 期望交叉熵),

QEMI(二次資訊熵),

IG(Information Gain 資訊增益),

IGR(Information Gain Ratio 資訊增益率),

Gini(基尼係數),

χ² Statistic(卡方統計量),

TEW(Text Evidence Weight 文字證據權),

OR(Odds Ratio 優勢率),

N-Gram Model(N元語法模型),

LSA(Latent Semantic Analysis 潛在語義分析),

PLSA(Probabilistic Latent Semantic Analysis 基於概率的潛在語義分析),

LDA(Latent Dirichlet Allocation 潛在狄利克雷模型)。
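
To make the TF and TF-IDF entries above concrete, a small hand-rolled sketch on an invented three-document corpus; the `log(N/df)` IDF used here is one common variant (scikit-learn's `TfidfVectorizer` uses a smoothed formula):

```python
import math
from collections import Counter

docs = [
    "data mining finds patterns in data",
    "machine learning learns models from data",
    "text mining applies mining to text",
]
tokenized = [d.split() for d in docs]
N = len(docs)

# Document frequency: number of documents containing each term.
df = Counter(t for doc in tokenized for t in set(doc))

def tf_idf(doc):
    tf = Counter(doc)
    # TF = count / doc length, IDF = log(N / df); terms in every doc get weight 0.
    return {t: (c / len(doc)) * math.log(N / df[t]) for t, c in tf.items()}

print(tf_idf(tokenized[0]))
```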

Association Mining(關聯挖掘):

Apriori,

FP-Growth(Frequent Pattern Tree Growth 頻繁模式樹生長演算法),

AprioriAll,

SPADE。

Recommendation Engine(推薦引擎):

DBR(Demographic-based Recommendation 基於人口統計學的推薦),

CBR(Content-based Recommendation 基於內容的推薦),

CF(Collaborative Filtering協同過濾),

UCF(User-based Collaborative Filtering Recommendation 基於使用者的協同過濾推薦),

ICF(Item-based Collaborative Filtering Recommendation 基於專案的協同過濾推薦)。

Similarity Measure & Distance Measure(相似性與距離度量; see the sketch at the end of this section):

Euclidean Distance(歐式距離),

Manhattan Distance(曼哈頓距離),

Chebyshev Distance(切比雪夫距離),

Minkowski Distance(閔可夫斯基距離),

Standardized Euclidean Distance(標準化歐氏距離),

Mahalanobis Distance(馬氏距離),

Cosine Similarity(餘弦相似度),

Hamming Distance/Edit Distance(漢明距離/編輯距離),

Jaccard Distance(傑卡德距離),

Correlation Coefficient Distance(相關係數距離),

Information Entropy(資訊熵),

KL(Kullback-Leibler Divergence KL散度/Relative Entropy 相對熵)。
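
A few of the distance and similarity measures above, written out with NumPy on two arbitrary vectors (plus small set and string examples for Jaccard and Hamming):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 5.0])

euclidean  = np.linalg.norm(a - b)                      # L2
manhattan  = np.abs(a - b).sum()                        # L1
chebyshev  = np.abs(a - b).max()                        # L-infinity
minkowski3 = (np.abs(a - b) ** 3).sum() ** (1 / 3)      # Minkowski with p = 3
cosine_sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

s1, s2 = {1, 2, 3}, {2, 3, 4}
jaccard = 1 - len(s1 & s2) / len(s1 | s2)               # Jaccard distance on sets
hamming = sum(c1 != c2 for c1, c2 in zip("karolin", "kathrin"))  # equal-length strings

print(euclidean, manhattan, chebyshev, minkowski3, cosine_sim, jaccard, hamming)
```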

Optimization(最優化):

Unconstrained Optimization(無約束優化):

Cyclic Variable Methods(變數輪換法),

Pattern Search Methods(模式搜尋法),

Variable Simplex Methods(可變單純形法),

Gradient Descent Methods(梯度下降法),

Newton Methods(牛頓法),

Quasi-Newton Methods(擬牛頓法),

Conjugate Gradient Methods(共軛梯度法)。

Constrained Optimization(有約束優化):

Approximation Programming Methods(近似規劃法),

Feasible Direction Methods(可行方向法),

Penalty Function Methods(罰函式法),

Multiplier Methods(乘子法)。

Heuristic Algorithms(啟發式演算法):

SA(Simulated Annealing 模擬退火演算法; see the sketch below),

GA(Genetic Algorithm 遺傳演算法)。
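
To illustrate the SA entry above, a minimal simulated-annealing sketch for a one-dimensional minimization; the objective, cooling schedule, and step size are arbitrary choices for the example:

```python
import math
import random

def objective(x):
    # Non-convex test function with several local minima (arbitrary example).
    return x * x + 10 * math.sin(x)

random.seed(0)
x = 5.0                        # initial solution
best_x, best_f = x, objective(x)
T = 10.0                       # initial temperature

for _ in range(5000):
    candidate = x + random.gauss(0, 0.5)               # random neighbor
    delta = objective(candidate) - objective(x)
    # Always accept downhill moves; accept uphill moves with prob exp(-delta/T).
    if delta < 0 or random.random() < math.exp(-delta / T):
        x = candidate
        if objective(x) < best_f:
            best_x, best_f = x, objective(x)
    T *= 0.999                                         # geometric cooling

print("best x:", best_x, "objective:", best_f)
```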

Feature Selection(特徵選擇演算法):

Mutual Information(互資訊),

Document Frequency(文件頻率),

Information Gain(資訊增益),

Chi-squared Test(卡方檢驗),

Gini(基尼係數)。

Outlier Detection(異常點檢測演算法):

Statistic-based(基於統計),

Distance-based(基於距離),

Density-based(基於密度),

Clustering-based(基於聚類)。

Learning to Rank(基於學習的排序):

Pointwise:McRank;

Pairwise:Ranking SVM,RankNet,FRank,RankBoost;

Listwise:AdaRank,SoftRank,LambdaMART。

Tool(工具):

MPI,Hadoop生態圈,Spark,BSP,Weka,Mahout,Scikit-learn,PyBrain…

以及一些具體的業務場景與case等。