[Repost] A Python implementation of the distance correlation coefficient
阿新 • Published: 2021-11-06
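If you find this useful, you are welcome to discuss it and learn together~
Reposted from: https://blog.csdn.net/jiaoaodechunlv/article/details/80655592
I have recently been doing feature selection and needed to assess how strongly several features are related to each other. I went looking for a description of this method and found it surprisingly hard to find on the web. What follows is what I pieced together:
[11] Wang Liming, Wu Xianghua, Zhao Tianliang, et al. A rolling statistical forecasting scheme for PM_(2.5) concentration based on the distance correlation coefficient and support vector regression [J]. Acta Scientiae Circumstantiae, 2017, 37(4): 1268-1276. (This is the paper I took the method from. Wikipedia has a more detailed treatment, but unfortunately I could not get through it.)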
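For readers who want the formula next to the code, here is a short summary of the sample distance correlation as it is usually defined (after Székely et al.). This summary is an addition and is not copied from the paper cited above. The symbols a, b, A, B are chosen to match the variable names in the program, and n is the number of samples.

```latex
% Pairwise Euclidean distances within each sample
a_{jk} = \lVert X_j - X_k \rVert, \qquad b_{jk} = \lVert Y_j - Y_k \rVert, \qquad j,k = 1,\dots,n

% Double centering: subtract row and column means, add back the grand mean
A_{jk} = a_{jk} - \bar{a}_{j\cdot} - \bar{a}_{\cdot k} + \bar{a}_{\cdot\cdot}, \qquad
B_{jk} = b_{jk} - \bar{b}_{j\cdot} - \bar{b}_{\cdot k} + \bar{b}_{\cdot\cdot}

% Squared sample distance covariance and distance variance
\operatorname{dCov}_n^2(X,Y) = \frac{1}{n^2} \sum_{j=1}^{n} \sum_{k=1}^{n} A_{jk} B_{jk}, \qquad
\operatorname{dVar}_n^2(X) = \operatorname{dCov}_n^2(X,X)

% Distance correlation (taken as 0 when the denominator is 0)
\operatorname{dCor}_n(X,Y) = \frac{\operatorname{dCov}_n(X,Y)}{\sqrt{\operatorname{dVar}_n(X)\,\operatorname{dVar}_n(Y)}}
```

The value lies between 0 and 1, and in the population version it is 0 only when X and Y are independent, which is why it is attractive for feature selection.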
The Python program is below:
Original: https://gist.github.com/satra/aa3d19a12b74e9ab7941
from scipy.spatial.distance import pdist, squareform
import numpy as np


def distcorr(X, Y):
    """ Compute the distance correlation function

    >>> a = [1,2,3,4,5]
    >>> b = np.array([1,2,9,4,4])
    >>> distcorr(a, b)
    0.762676242417
    """
    X = np.atleast_1d(X)
    Y = np.atleast_1d(Y)
    # Treat 1-D input as n samples of a single feature (column vector).
    if np.prod(X.shape) == len(X):
        X = X[:, None]
    if np.prod(Y.shape) == len(Y):
        Y = Y[:, None]
    X = np.atleast_2d(X)
    Y = np.atleast_2d(Y)
    n = X.shape[0]
    if Y.shape[0] != X.shape[0]:
        raise ValueError('Number of samples must match')
    # Pairwise Euclidean distance matrices (n x n).
    a = squareform(pdist(X))
    b = squareform(pdist(Y))
    # Double centering: subtract column and row means, add the grand mean.
    A = a - a.mean(axis=0)[None, :] - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0)[None, :] - b.mean(axis=1)[:, None] + b.mean()

    # Squared sample distance covariance and distance variances.
    dcov2_xy = (A * B).sum()/float(n * n)
    dcov2_xx = (A * A).sum()/float(n * n)
    dcov2_yy = (B * B).sum()/float(n * n)
    # dCor = dCov(X, Y) / sqrt(dVar(X) * dVar(Y))
    dcor = np.sqrt(dcov2_xy)/np.sqrt(np.sqrt(dcov2_xx) * np.sqrt(dcov2_yy))
    return dcor
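To see why this measure is handy for feature selection, here is a small sketch comparing it with the Pearson coefficient on a purely nonlinear relationship. The names `_centered_distances` and `dcor_sketch` are hypothetical helpers written for this example (a compact restatement of the same computation as `distcorr`, not part of the original gist), and the test data are made up.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def _centered_distances(v):
    """Double-centered pairwise distance matrix (hypothetical helper)."""
    v = np.asarray(v, dtype=float)
    if v.ndim == 1:
        v = v[:, None]  # n samples of a single feature
    d = squareform(pdist(v))
    return d - d.mean(axis=0)[None, :] - d.mean(axis=1)[:, None] + d.mean()


def dcor_sketch(x, y):
    """Sample distance correlation; same computation as distcorr, restated."""
    A = _centered_distances(x)
    B = _centered_distances(y)
    n = A.shape[0]
    # Clip at 0 to guard against tiny negative values from rounding.
    dcov2_xy = max((A * B).sum() / n**2, 0.0)
    dcov2_xx = (A * A).sum() / n**2
    dcov2_yy = (B * B).sum() / n**2
    return np.sqrt(dcov2_xy) / np.sqrt(np.sqrt(dcov2_xx) * np.sqrt(dcov2_yy))


# Made-up data: a symmetric grid, one linear and one quadratic response.
x = np.linspace(-1.0, 1.0, 201)
y_linear = 2.0 * x + 1.0
y_quadratic = x ** 2

pearson_quadratic = np.corrcoef(x, y_quadratic)[0, 1]
dcor_quadratic = dcor_sketch(x, y_quadratic)
dcor_linear = dcor_sketch(x, y_linear)

print("Pearson, y = x^2:", pearson_quadratic)
print("dCor,    y = x^2:", dcor_quadratic)
print("dCor,    y = 2x+1:", dcor_linear)
```

On this data the Pearson coefficient for the quadratic case is essentially zero even though y is fully determined by x, while the distance correlation is clearly positive. For the exact linear case the distance correlation equals 1.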