
Measuring Diversity with the Gini–Simpson Index

Tags: data analysis

Simpson index

\lambda =\sum_{i=1}^{R}p_{i}^{2}

The measure equals the probability that two entities taken at random from the dataset (with replacement) represent the same type, where R is the total number of types in the dataset and p_{i} is the proportion of entities belonging to type i.
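
As a minimal sketch of the formula, the following computes \lambda from raw per-type counts. The function name simpson_index and the sample counts are illustrative assumptions, not part of the original code.

```python
def simpson_index(counts):
    """Return lambda, the sum of squared proportions, for per-type counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must contain at least one entity")
    # p_i = count_i / total; lambda = sum of p_i squared
    return sum((c / total) ** 2 for c in counts)

# Hypothetical dataset: 3 types with 5, 3 and 2 entities.
# p = (0.5, 0.3, 0.2), so lambda is about 0.25 + 0.09 + 0.04 = 0.38
print(simpson_index([5, 3, 2]))
```

A dataset with a single type gives \lambda = 1, since any two draws are certain to match.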

Gini–Simpson index

The transformation 1-\lambda equals the probability that the two entities represent different types.

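The more evenly the entities are spread across types, the higher the index; the more concentrated they are in a few types, the lower the index.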
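
To make this behaviour concrete, here is a small sketch comparing an even distribution with a concentrated one. The helper gini_simpson and both count lists are hypothetical examples chosen for illustration. With R types the index ranges from 0 (everything in one type) to 1-1/R (all types equally common).

```python
def gini_simpson(counts):
    """Return 1 - lambda for a list of per-type counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# Four types, 100 entities in each case (made-up numbers).
even = gini_simpson([25, 25, 25, 25])       # reaches the maximum 1 - 1/4
concentrated = gini_simpson([97, 1, 1, 1])  # close to 0

print(even, concentrated)
```

The even case reaches the upper bound for four types, while the concentrated case stays near zero because two random draws almost always land on the dominant type.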

Code

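import pandas as pd

def gini_calc(df2):
    # Gini-Simpson index (1 - sum of squared proportions) for one group.
    # Expects a 'cnt' column holding the count of each type.
    total = df2['cnt'].sum()
    # Proportion of each type, kept in a local Series so the
    # caller's DataFrame is not modified.
    props = df2['cnt'] / total
    return 1 - (props ** 2).sum()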


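# Load the raw counts and total them per (population, subpopulation, type).
# gini.xlsx is expected to contain the columns population, subpopulation, type and cnt.
df = pd.read_excel('gini.xlsx')
df = df.groupby(['population', 'subpopulation', 'type'], as_index=False)['cnt'].sum()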


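# Compute the index for every (population, subpopulation) group.
rows = []
for (population, subpopulation), group in df.groupby(['population', 'subpopulation']):
    rows.append({
        "population": population,
        "subpopulation": subpopulation,
        "gini_simpson_index": gini_calc(group),
    })

data = pd.DataFrame(rows, columns=["population", "subpopulation", "gini_simpson_index"])
# to_csv returns None when given a path, so there is nothing to assign.
# index=False leaves the meaningless integer row index out of the file.
data.to_csv('gini_result.csv', index=False)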
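
For a quick check without an Excel file, the same per-group calculation can be sketched on a small in-memory DataFrame. The sample rows and the helper name gini_simpson are assumptions made up for this example; only the column names follow the code above. This variant uses groupby(...).apply in place of the explicit loop.

```python
import pandas as pd

# Hypothetical sample data standing in for gini.xlsx.
df = pd.DataFrame({
    "population":    ["A", "A", "A", "A", "B", "B"],
    "subpopulation": ["a1", "a1", "a2", "a2", "b1", "b1"],
    "type":          ["x", "y", "x", "y", "x", "y"],
    "cnt":           [50, 50, 90, 10, 1, 0],
})

def gini_simpson(cnt):
    """Gini-Simpson index for a Series of per-type counts."""
    props = cnt / cnt.sum()
    return 1 - (props ** 2).sum()

# One index value per (population, subpopulation) pair.
result = (
    df.groupby(["population", "subpopulation"])["cnt"]
      .apply(gini_simpson)
      .reset_index(name="gini_simpson_index")
)
print(result)
```

In this made-up data, subpopulation a1 is split evenly between two types and gets the highest index, a2 is dominated by one type and scores lower, and b1 contains a single observed type and scores 0.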