Measuring Diversity with the Gini–Simpson Index
阿新 • Published: 2021-02-09
Tags: data analysis
Simpson index
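The measure $\lambda = \sum_{i=1}^{R} p_i^2$ equals the probability that two entities taken at random from the dataset (with replacement) represent the same type, where $p_i$ is the proportion of entities belonging to type $i$ and $R$ is the total number of types in the dataset.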
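As a small illustration, the probability described above can be computed as the sum of squared type proportions. The sketch below is a minimal example, not a reference implementation: the function name `simpson_index` and the sample labels are made up for this illustration, and the input is assumed to be non-empty.

```python
from collections import Counter


def simpson_index(labels):
    """Probability that two draws (with replacement) have the same type."""
    counts = Counter(labels)           # number of entities per type
    total = sum(counts.values())       # total number of entities
    return sum((n / total) ** 2 for n in counts.values())


# Hypothetical sample: 5 entities of type "a", 3 of type "b", 2 of type "c"
sample = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
lam = simpson_index(sample)            # 0.5**2 + 0.3**2 + 0.2**2, about 0.38
print(round(lam, 4))
```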
Gini–Simpson index
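The transformation $1 - \lambda = 1 - \sum_{i=1}^{R} p_i^2$, with $p_i$ the proportion of entities of type $i$ among $R$ types, equals the probability that the two entities represent different types.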
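The more evenly the entities are spread across types, the higher the index; the more concentrated the distribution, the lower the index.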
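A quick numerical check of this behavior, as a minimal sketch: the helper `gini_simpson` and the count lists are hypothetical examples. For a fixed number of types $R$, a perfectly even distribution gives the largest value, $1 - 1/R$, and a dataset containing a single type gives 0.

```python
def gini_simpson(counts):
    """1 minus the sum of squared proportions for a list of per-type counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical counts for four types
even = gini_simpson([25, 25, 25, 25])        # perfectly even: 1 - 1/4 = 0.75
concentrated = gini_simpson([97, 1, 1, 1])   # one dominant type: about 0.0588
single = gini_simpson([100])                 # only one type: 0.0

print(even, round(concentrated, 4), single)
```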
Code
import pandas as pd


def gini_calc(df2):
    """Return the Gini-Simpson index of the counts in df2['cnt']."""
    # Proportion of each type within the group (does not modify df2)
    props = df2['cnt'] / df2['cnt'].sum()
    return 1 - (props ** 2).sum()


# Load the raw data and total the counts per (population, subpopulation, type)
df = pd.read_excel('gini.xlsx')
df = df.groupby(['population', 'subpopulation', 'type'], as_index=False)['cnt'].sum()

# Compute the index for every (population, subpopulation) pair
rows = []
for (population, subpopulation), group in df.groupby(['population', 'subpopulation']):
    rows.append({
        'population': population,
        'subpopulation': subpopulation,
        'gini_simpson_index': gini_calc(group),
    })

# Write one row per (population, subpopulation) to CSV
data = pd.DataFrame(rows)
data.to_csv('gini_result.csv')
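To try the per-group calculation without the Excel file, the following sketch builds a small in-memory table and uses `groupby(...).apply(...)` in place of the explicit loop. The sample rows and the helper name `gini_simpson` are assumptions made for illustration; only the column names (`population`, `subpopulation`, `type`, `cnt`) come from the code above.

```python
import pandas as pd


def gini_simpson(counts: pd.Series) -> float:
    """Gini-Simpson index for a Series of per-type counts."""
    props = counts / counts.sum()
    return 1.0 - float((props ** 2).sum())


# Hypothetical data standing in for gini.xlsx
df = pd.DataFrame({
    "population":    ["A", "A", "A", "A", "B", "B"],
    "subpopulation": ["a1", "a1", "a2", "a2", "b1", "b1"],
    "type":          ["x", "y", "x", "y", "x", "y"],
    "cnt":           [50, 50, 90, 10, 1, 3],
})

# One index value per (population, subpopulation) pair
result = (
    df.groupby(["population", "subpopulation"])["cnt"]
      .apply(gini_simpson)
      .reset_index(name="gini_simpson_index")
)
print(result)
```

In this sample, the evenly split group `a1` scores higher than the skewed group `a2`, which matches the interpretation of the index given earlier.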