
10 Hands-On Project: User Consumption Behavior Analysis

import numpy as np
import pandas as pd
from pandas import DataFrame,Series
import matplotlib.pyplot as plt
#CDNOW_master.txt

Part 1: Data Type Handling

  • Load the data
    • Column meanings:
      • user_id: user ID
      • order_dt: purchase date
      • order_product: quantity of products purchased
      • order_amount: amount spent
  • Inspect the data
    • Check the dtype of each column
    • Check whether the data contains missing values
    • Convert order_dt to a datetime type
    • Look at the summary statistics
      • Compute the average quantity of products purchased per order
      • Compute the average amount spent per order
    • Add a month column to the source data: astype('datetime64[M]')
#Load the data
df = pd.read_csv('./data/CDNOW_master.txt',header=None,sep='\s+',names=['user_id','order_dt','order_product','order_amount'])
df
user_id order_dt order_product order_amount
0 1 19970101 1 11.77
1 2 19970112 1 12.00
2 2 19970112 5 77.00
3 3 19970102 2 20.76
4 3 19970330 2 20.76
5 3 19970402 2 19.54
6 3 19971115 5 57.45
7 3 19971125 4 20.96
8 3 19980528 1 16.99
9 4 19970101 2 29.33
10 4 19970118 2 29.73
11 4 19970802 1 14.96
12 4 19971212 2 26.48
13 5 19970101 2 29.33
14 5 19970114 1 13.97
15 5 19970204 3 38.90
16 5 19970411 3 45.55
17 5 19970531 3 38.71
18 5 19970616 2 26.14
19 5 19970722 2 28.14
20 5 19970915 3 40.47
21 5 19971208 4 46.46
22 5 19971212 3 40.47
23 5 19980103 3 37.47
24 6 19970101 1 20.99
25 7 19970101 2 28.74
26 7 19971011 7 97.43
27 7 19980322 9 138.50
28 8 19970101 1 9.77
29 8 19970213 1 13.97
... ... ... ... ...
69629 23556 19970927 3 31.47
69630 23556 19980103 2 28.98
69631 23556 19980607 2 28.98
69632 23557 19970325 1 14.37
69633 23558 19970325 2 28.13
69634 23558 19970518 3 45.51
69635 23558 19970624 2 23.74
69636 23558 19980225 4 48.22
69637 23559 19970325 2 23.54
69638 23559 19970518 3 35.31
69639 23559 19970627 3 52.80
69640 23560 19970325 1 18.36
69641 23561 19970325 2 30.92
69642 23561 19980128 1 15.49
69643 23561 19980529 3 37.05
69644 23562 19970325 2 29.33
69645 23563 19970325 1 10.77
69646 23563 19971004 2 47.98
69647 23564 19970325 1 11.77
69648 23564 19970521 1 11.77
69649 23564 19971130 3 46.47
69650 23565 19970325 1 11.77
69651 23566 19970325 2 36.00
69652 23567 19970325 1 20.97
69653 23568 19970325 1 22.97
69654 23568 19970405 4 83.74
69655 23568 19970422 1 14.99
69656 23569 19970325 2 25.74
69657 23570 19970325 3 51.12
69658 23570 19970326 2 42.96

69659 rows × 4 columns

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 69659 entries, 0 to 69658
Data columns (total 4 columns):
user_id          69659 non-null int64
order_dt         69659 non-null int64
order_product    69659 non-null int64
order_amount     69659 non-null float64
dtypes: float64(1), int64(3)
memory usage: 2.1 MB
#Convert order_dt to a datetime type
df['order_dt'] = pd.to_datetime(df['order_dt'],format='%Y%m%d')
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 69659 entries, 0 to 69658
Data columns (total 4 columns):
user_id          69659 non-null int64
order_dt         69659 non-null datetime64[ns]
order_product    69659 non-null int64
order_amount     69659 non-null float64
dtypes: datetime64[ns](1), float64(1), int64(2)
memory usage: 2.1 MB
#Summary statistics of the data
df.describe()
user_id order_product order_amount
count 69659.000000 69659.000000 69659.000000
mean 11470.854592 2.410040 35.893648
std 6819.904848 2.333924 36.281942
min 1.000000 1.000000 0.000000
25% 5506.000000 1.000000 14.490000
50% 11410.000000 2.000000 25.980000
75% 17273.000000 3.000000 43.700000
max 23570.000000 99.000000 1286.010000
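The two averages called out in the outline can be read straight off the describe() output (mean order_product ≈ 2.41, mean order_amount ≈ 35.89) or computed directly. A minimal sketch, using a tiny synthetic frame as a stand-in for the CDNOW data:

```python
import pandas as pd

# Tiny synthetic stand-in for the CDNOW frame (same columns, made-up rows).
df = pd.DataFrame({
    'user_id': [1, 2, 2, 3],
    'order_product': [1, 1, 5, 2],
    'order_amount': [11.77, 12.00, 77.00, 20.76],
})

avg_qty = df['order_product'].mean()   # average quantity per purchase
avg_amt = df['order_amount'].mean()    # average spend per purchase
print(avg_qty, avg_amt)
```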
#Extract the month from order_dt
df['order_dt'].astype('datetime64[M]')
0       1997-01-01
1       1997-01-01
2       1997-01-01
3       1997-01-01
4       1997-03-01
5       1997-04-01
6       1997-11-01
7       1997-11-01
8       1998-05-01
9       1997-01-01
10      1997-01-01
11      1997-08-01
12      1997-12-01
13      1997-01-01
14      1997-01-01
15      1997-02-01
16      1997-04-01
17      1997-05-01
18      1997-06-01
19      1997-07-01
20      1997-09-01
21      1997-12-01
22      1997-12-01
23      1998-01-01
24      1997-01-01
25      1997-01-01
26      1997-10-01
27      1998-03-01
28      1997-01-01
29      1997-02-01
           ...    
69629   1997-09-01
69630   1998-01-01
69631   1998-06-01
69632   1997-03-01
69633   1997-03-01
69634   1997-05-01
69635   1997-06-01
69636   1998-02-01
69637   1997-03-01
69638   1997-05-01
69639   1997-06-01
69640   1997-03-01
69641   1997-03-01
69642   1998-01-01
69643   1998-05-01
69644   1997-03-01
69645   1997-03-01
69646   1997-10-01
69647   1997-03-01
69648   1997-05-01
69649   1997-11-01
69650   1997-03-01
69651   1997-03-01
69652   1997-03-01
69653   1997-03-01
69654   1997-04-01
69655   1997-04-01
69656   1997-03-01
69657   1997-03-01
69658   1997-03-01
Name: order_dt, Length: 69659, dtype: datetime64[ns]
#Add a month column to the source data: astype('datetime64[M]')
df['month'] = df['order_dt'].astype('datetime64[M]')
df.head()
user_id order_dt order_product order_amount month
0 1 1997-01-01 1 11.77 1997-01-01
1 2 1997-01-12 1 12.00 1997-01-01
2 2 1997-01-12 5 77.00 1997-01-01
3 3 1997-01-02 2 20.76 1997-01-01
4 3 1997-03-30 2 20.76 1997-03-01
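A caveat for newer pandas: `astype('datetime64[M]')` relies on behavior removed in pandas 2.x. If the line above fails, `dt.to_period('M').dt.to_timestamp()` yields the same month-start timestamps; a sketch on synthetic dates:

```python
import pandas as pd

# Equivalent of astype('datetime64[M]') on pandas 2.x: snap each date
# to the first day of its month.
s = pd.to_datetime(pd.Series(['1997-01-12', '1997-03-30']))
month = s.dt.to_period('M').dt.to_timestamp()
print(month.tolist())
```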

Part 2: Monthly Analysis

  • Total amount spent per month
    • Plot as a line chart
  • Total quantity of products purchased per month (all users)
  • Total number of purchases per month (all users)
  • Number of distinct consumers per month
#Total amount spent per month
df.groupby(by='month')['order_amount'].sum()
month
1997-01-01    299060.17
1997-02-01    379590.03
1997-03-01    393155.27
1997-04-01    142824.49
1997-05-01    107933.30
1997-06-01    108395.87
1997-07-01    122078.88
1997-08-01     88367.69
1997-09-01     81948.80
1997-10-01     89780.77
1997-11-01    115448.64
1997-12-01     95577.35
1998-01-01     76756.78
1998-02-01     77096.96
1998-03-01    108970.15
1998-04-01     66231.52
1998-05-01     70989.66
1998-06-01     76109.30
Name: order_amount, dtype: float64
# plt.plot(df.groupby(by='month')['order_amount'].sum())
df.groupby(by='month')['order_amount'].sum().plot()
(line chart: total amount spent per month)
#Total quantity of products purchased per month (all users)
df.groupby(by='month')['order_product'].sum().plot()
(line chart: total products purchased per month)
#Total number of purchases per month (each row of the raw data is one purchase record)
df.groupby(by='month')['user_id'].count()
month
1997-01-01     8928
1997-02-01    11272
1997-03-01    11598
1997-04-01     3781
1997-05-01     2895
1997-06-01     3054
1997-07-01     2942
1997-08-01     2320
1997-09-01     2296
1997-10-01     2562
1997-11-01     2750
1997-12-01     2504
1998-01-01     2032
1998-02-01     2026
1998-03-01     2793
1998-04-01     1878
1998-05-01     1985
1998-06-01     2043
Name: user_id, dtype: int64
#Number of distinct consumers per month (one user may purchase several times); nunique counts distinct values
df.groupby(by='month')['user_id'].nunique()
month
1997-01-01    7846
1997-02-01    9633
1997-03-01    9524
1997-04-01    2822
1997-05-01    2214
1997-06-01    2339
1997-07-01    2180
1997-08-01    1772
1997-09-01    1739
1997-10-01    1839
1997-11-01    2028
1997-12-01    1864
1998-01-01    1537
1998-02-01    1551
1998-03-01    2060
1998-04-01    1437
1998-05-01    1488
1998-06-01    1506
Name: user_id, dtype: int64
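The four monthly metrics above can also be computed in one pass with named aggregations; a sketch on a small synthetic frame with the same schema:

```python
import pandas as pd

# Small synthetic stand-in for the preprocessed CDNOW frame.
df = pd.DataFrame({
    'user_id': [1, 2, 2, 1],
    'order_product': [1, 2, 3, 1],
    'order_amount': [10.0, 20.0, 30.0, 5.0],
    'month': pd.to_datetime(['1997-01-01', '1997-01-01',
                             '1997-02-01', '1997-02-01']),
})

monthly = df.groupby('month').agg(
    total_amount=('order_amount', 'sum'),     # total spend
    total_products=('order_product', 'sum'),  # total quantity
    order_count=('user_id', 'count'),         # number of purchases
    user_count=('user_id', 'nunique'),        # distinct consumers
)
print(monthly)
```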

Part 3: Per-User Consumption Analysis

  • Summary statistics of each user's total spend and total purchase count
  • Scatter plot of per-user spend vs. quantity of products purchased
  • Histogram of per-user total spend (for spend within 1000)
  • Histogram of per-user total quantity purchased (for quantities within 100)
#Summary statistics of each user's total spend and total purchase count
df.groupby(by='user_id')['order_amount'].sum() #total amount spent by each user
user_id
1         11.77
2         89.00
3        156.46
4        100.50
5        385.61
6         20.99
7        264.67
8        197.66
9         95.85
10        39.31
11        58.55
12        57.06
13        72.94
14        29.92
15        52.87
16        79.87
17        73.22
18        14.96
19       175.12
20       653.01
21        75.11
22        14.37
23        24.74
24        57.77
25       137.53
26       102.69
27       135.87
28        90.99
29       435.81
30        28.34
          ...  
23541     57.34
23542     77.43
23543     50.76
23544    134.63
23545     24.99
23546     13.97
23547     23.54
23548     23.54
23549     27.13
23550     25.28
23551    264.63
23552     49.38
23553     98.58
23554     36.37
23555    189.18
23556    203.00
23557     14.37
23558    145.60
23559    111.65
23560     18.36
23561     83.46
23562     29.33
23563     58.75
23564     70.01
23565     11.77
23566     36.00
23567     20.97
23568    121.70
23569     25.74
23570     94.08
Name: order_amount, Length: 23570, dtype: float64
#total number of purchases by each user
df.groupby(by='user_id').count()['order_dt']
user_id
1         1
2         2
3         6
4         4
5        11
6         1
7         3
8         8
9         3
10        1
11        4
12        1
13        1
14        1
15        1
16        4
17        1
18        1
19        2
20        2
21        2
22        1
23        1
24        2
25        8
26        2
27        2
28        3
29       12
30        2
         ..
23541     2
23542     1
23543     1
23544     3
23545     1
23546     1
23547     2
23548     1
23549     1
23550     1
23551     6
23552     2
23553     2
23554     2
23555     5
23556     7
23557     1
23558     4
23559     3
23560     1
23561     3
23562     1
23563     2
23564     3
23565     1
23566     1
23567     1
23568     3
23569     1
23570     2
Name: order_dt, Length: 23570, dtype: int64
#Scatter plot of per-user spend vs. quantity of products purchased
user_amount_sum = df.groupby(by='user_id')['order_amount'].sum()
user_product_sum = df.groupby(by='user_id')['order_product'].sum()
plt.scatter(user_product_sum,user_amount_sum)
(scatter plot: per-user total quantity purchased vs. total amount spent)
#Histogram of per-user total spend (for spend within 1000)
#select the numeric columns explicitly so sum() does not try to add the datetime columns
user_sum = df.groupby(by='user_id')[['order_amount','order_product']].sum()
user_sum.query('order_amount <= 1000')['order_amount'].hist()
(histogram: per-user total spend, restricted to spend <= 1000)
#Histogram of per-user total quantity purchased (for quantities within 100)
df.groupby(by='user_id')[['order_amount','order_product']].sum().query('order_product <= 100')['order_product'].hist()
(histogram: per-user total quantity purchased, restricted to quantity <= 100)
Part 4: User Behavior Analysis

  • Distribution of each user's first purchase month, with counts
    • Plot as a line chart
  • Distribution of each user's last purchase month, with counts
    • Plot as a line chart
  • Ratio of new to returning customers
    • A user with exactly one purchase is a new customer
    • A user with multiple purchases is a returning customer
      • Find each user's first and last purchase dates
        • agg(['func1','func2']): apply several aggregations to the grouped result
      • Compute the new/returning ratio
  • User segmentation
    • Build a table rfm holding each user's total quantity purchased, total spend, and most recent purchase date
    • RFM model design
      • R: days since the customer's most recent purchase.
        • /np.timedelta64(1,'D'): strips the "days" unit
      • F: total quantity the customer purchased; the larger F is, the more frequently the customer buys, and vice versa.
      • M: total amount the customer spent; the larger M is, the more valuable the customer, and vice versa.
      • Apply R, F, M to the rfm table
    • Segment users by value into:
      • important value customers (重要價值客戶)
      • important retention customers (重要保持客戶)
      • important win-back customers (重要挽留客戶)
      • important development customers (重要發展客戶)
      • general value customers (一般價值客戶)
      • general retention customers (一般保持客戶)
      • general win-back customers (一般挽留客戶)
      • general development customers (一般發展客戶)
        • use the existing segmentation function rfm_func
#Distribution of each user's first purchase month, with counts
#first purchase month: the minimum purchase month per user is that user's first purchase month
df.groupby(by='user_id')['month'].min()
user_id
1       1997-01-01
2       1997-01-01
3       1997-01-01
4       1997-01-01
5       1997-01-01
6       1997-01-01
7       1997-01-01
8       1997-01-01
9       1997-01-01
10      1997-01-01
11      1997-01-01
12      1997-01-01
13      1997-01-01
14      1997-01-01
15      1997-01-01
16      1997-01-01
17      1997-01-01
18      1997-01-01
19      1997-01-01
20      1997-01-01
21      1997-01-01
22      1997-01-01
23      1997-01-01
24      1997-01-01
25      1997-01-01
26      1997-01-01
27      1997-01-01
28      1997-01-01
29      1997-01-01
30      1997-01-01
           ...    
23541   1997-03-01
23542   1997-03-01
23543   1997-03-01
23544   1997-03-01
23545   1997-03-01
23546   1997-03-01
23547   1997-03-01
23548   1997-03-01
23549   1997-03-01
23550   1997-03-01
23551   1997-03-01
23552   1997-03-01
23553   1997-03-01
23554   1997-03-01
23555   1997-03-01
23556   1997-03-01
23557   1997-03-01
23558   1997-03-01
23559   1997-03-01
23560   1997-03-01
23561   1997-03-01
23562   1997-03-01
23563   1997-03-01
23564   1997-03-01
23565   1997-03-01
23566   1997-03-01
23567   1997-03-01
23568   1997-03-01
23569   1997-03-01
23570   1997-03-01
Name: month, Length: 23570, dtype: datetime64[ns]
df.groupby(by='user_id')['month'].min().value_counts() #count users per first-purchase month
df.groupby(by='user_id')['month'].min().value_counts().plot()
(line chart: user counts by first-purchase month)
#Distribution of each user's last purchase month, with counts
#the maximum purchase month per user is that user's last purchase month
df.groupby(by='user_id')['month'].max().value_counts().plot()
(line chart: user counts by last-purchase month)
#Ratio of new to returning customers
#one purchase = new customer, multiple purchases = returning customer
#how do we tell whether a user purchased only once? compare the purchase dates:
    #if a user's first and last purchase dates are equal, the user purchased exactly once (new); otherwise returning
new_old_user_df = df.groupby(by='user_id')['order_dt'].agg(['min','max'])#agg applies several aggregations to the grouped result
new_old_user_df['min'] == new_old_user_df['max'] #True = new customer, False = returning customer
#count the True and False values
(new_old_user_df['min'] == new_old_user_df['max']).value_counts()
True     12054
False    11516
dtype: int64
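With `normalize=True`, value_counts returns proportions instead of raw counts; a sketch on synthetic orders (users 1 and 2 bought once, user 3 twice):

```python
import pandas as pd

# Synthetic orders: two one-time buyers, one repeat buyer.
df = pd.DataFrame({
    'user_id': [1, 2, 3, 3],
    'order_dt': pd.to_datetime(['1997-01-01', '1997-01-05',
                                '1997-01-02', '1997-03-01']),
})

first_last = df.groupby('user_id')['order_dt'].agg(['min', 'max'])
# True = purchased exactly once (new), False = returning.
ratio = (first_last['min'] == first_last['max']).value_counts(normalize=True)
print(ratio)
```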
#Build the rfm table: each user's total quantity purchased, total spend, and most recent purchase date
rfm = df.pivot_table(index='user_id',aggfunc={'order_product':'sum','order_amount':'sum','order_dt':"max"})
rfm
order_amount order_dt order_product
user_id
1 11.77 1997-01-01 1
2 89.00 1997-01-12 6
3 156.46 1998-05-28 16
4 100.50 1997-12-12 7
5 385.61 1998-01-03 29
6 20.99 1997-01-01 1
7 264.67 1998-03-22 18
8 197.66 1998-03-29 18
9 95.85 1998-06-08 6
10 39.31 1997-01-21 3
11 58.55 1998-02-20 4
12 57.06 1997-01-01 4
13 72.94 1997-01-01 4
14 29.92 1997-01-01 2
15 52.87 1997-01-01 4
16 79.87 1997-09-10 8
17 73.22 1997-01-01 5
18 14.96 1997-01-04 1
19 175.12 1997-06-10 11
20 653.01 1997-01-18 46
21 75.11 1997-01-13 4
22 14.37 1997-01-01 1
23 24.74 1997-01-01 2
24 57.77 1998-01-20 4
25 137.53 1998-06-08 12
26 102.69 1997-01-26 6
27 135.87 1997-01-12 10
28 90.99 1997-03-08 7
29 435.81 1998-04-26 28
30 28.34 1997-02-14 2
... ... ... ...
23541 57.34 1997-04-02 2
23542 77.43 1997-03-25 5
23543 50.76 1997-03-25 2
23544 134.63 1998-01-24 12
23545 24.99 1997-03-25 1
23546 13.97 1997-03-25 1
23547 23.54 1997-04-07 2
23548 23.54 1997-03-25 2
23549 27.13 1997-03-25 2
23550 25.28 1997-03-25 2
23551 264.63 1997-09-11 12
23552 49.38 1997-04-03 4
23553 98.58 1997-03-28 8
23554 36.37 1998-02-01 3
23555 189.18 1998-06-10 14
23556 203.00 1998-06-07 15
23557 14.37 1997-03-25 1
23558 145.60 1998-02-25 11
23559 111.65 1997-06-27 8
23560 18.36 1997-03-25 1
23561 83.46 1998-05-29 6
23562 29.33 1997-03-25 2
23563 58.75 1997-10-04 3
23564 70.01 1997-11-30 5
23565 11.77 1997-03-25 1
23566 36.00 1997-03-25 2
23567 20.97 1997-03-25 1
23568 121.70 1997-04-22 6
23569 25.74 1997-03-25 2
23570 94.08 1997-03-26 5

23570 rows × 3 columns

#R: days since the customer's most recent purchase
max_dt = df['order_dt'].max() #treat the latest date in the data as "today"
#each user's most recent purchase date
-(df.groupby(by='user_id')['order_dt'].max() - max_dt)
rfm['R'] = -(df.groupby(by='user_id')['order_dt'].max() - max_dt)/np.timedelta64(1,'D') #dividing by np.timedelta64(1,'D') strips the "days" unit
rfm.drop(labels='order_dt',axis=1,inplace=True)
rfm.columns = ['M','F','R'] #rename the columns
rfm.head()
M F R
user_id
1 11.77 1 545.0
2 89.00 6 534.0
3 156.46 16 33.0
4 100.50 7 200.0
5 385.61 29 178.0
def rfm_func(x):
    #level holds three characters, '0' or '1', one per metric ('1' if the centered value is >= 0, i.e. above the mean)
    level = x.map(lambda x :'1' if x >= 0 else '0')
    label = level.R + level.F + level.M
    d = {
        '111':'重要價值客戶',  # important value customer
        '011':'重要保持客戶',  # important retention customer
        '101':'重要挽留客戶',  # important win-back customer
        '001':'重要發展客戶',  # important development customer
        '110':'一般價值客戶',  # general value customer
        '010':'一般保持客戶',  # general retention customer
        '100':'一般挽留客戶',  # general win-back customer
        '000':'一般發展客戶'   # general development customer
    }
    result = d[label]
    return result
#df.apply(func): applies func to each row or column of the df
#center each column on its mean, then classify each row
rfm['label'] = rfm.apply(lambda x : x - x.mean()).apply(rfm_func,axis = 1)
rfm.head()
M F R label
user_id
1 11.77 1 545.0 一般挽留客戶
2 89.00 6 534.0 一般挽留客戶
3 156.46 16 33.0 重要保持客戶
4 100.50 7 200.0 一般發展客戶
5 385.61 29 178.0 重要保持客戶
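To see how the user base splits across the eight layers, value_counts on the new label column is enough; a sketch on a synthetic rfm-style frame (the label values here are shortened placeholders, not the actual strings from rfm_func):

```python
import pandas as pd

# Synthetic frame with an already-assigned 'label' column (hypothetical values).
rfm = pd.DataFrame({'label': ['important value', 'general win-back',
                              'general win-back', 'important retention']})
layer_counts = rfm['label'].value_counts()  # users per segment
print(layer_counts)
```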

Part 5: User Lifecycle

  • Split users into active users and other groups
    • Count each user's purchases per month
    • Flag whether each user purchased in each month: 1 if they purchased, else 0
      • Background: the difference between DataFrame's apply and applymap
        • applymap: receives each individual element, returns a DataFrame
        • applies a function to every element of the DataFrame
        • apply: returns a Series
        • apply() applies a function to each row or column of the DataFrame
    • Classify each user in each month as:
      • unreg: prospective user (e.g. a user who first buys in month 3 is unreg in months 1 and 2)
      • unactive: after the first purchase, the user is unactive in any later month with no purchase
      • new: a user making their first-ever purchase is new in that month
      • active: a user who buys in consecutive months is active in those months
      • return: a user who buys again after a gap of n months is a returning customer in the first month back
#Count each user's purchases per month
user_month_count_df = df.pivot_table(index='user_id',values='order_dt',aggfunc='count',columns='month').fillna(0)
user_month_count_df.head()
month 1997-01-01 00:00:00 1997-02-01 00:00:00 1997-03-01 00:00:00 1997-04-01 00:00:00 1997-05-01 00:00:00 1997-06-01 00:00:00 1997-07-01 00:00:00 1997-08-01 00:00:00 1997-09-01 00:00:00 1997-10-01 00:00:00 1997-11-01 00:00:00 1997-12-01 00:00:00 1998-01-01 00:00:00 1998-02-01 00:00:00 1998-03-01 00:00:00 1998-04-01 00:00:00 1998-05-01 00:00:00 1998-06-01 00:00:00
user_id
1 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
4 2.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
5 2.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 2.0 1.0 0.0 0.0 0.0 0.0 0.0
#Flag whether each user purchased in each month: 1 if they purchased, else 0
df_purchase = user_month_count_df.applymap(lambda x:1 if x >= 1 else 0)

month 1997-01-01 00:00:00 1997-02-01 00:00:00 1997-03-01 00:00:00 1997-04-01 00:00:00 1997-05-01 00:00:00 1997-06-01 00:00:00 1997-07-01 00:00:00 1997-08-01 00:00:00 1997-09-01 00:00:00 1997-10-01 00:00:00 1997-11-01 00:00:00 1997-12-01 00:00:00 1998-01-01 00:00:00 1998-02-01 00:00:00 1998-03-01 00:00:00 1998-04-01 00:00:00 1998-05-01 00:00:00 1998-06-01 00:00:00
user_id
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0
4 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
5 1 1 0 1 1 1 1 0 1 0 0 1 1 0 0 0 0 0
6 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0
8 1 1 0 0 0 1 1 0 0 0 1 1 0 0 1 0 0 0
9 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1
10 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
12 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0
17 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
19 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
20 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
22 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
25 1 0 0 0 0 0 1 1 0 1 0 0 0 0 0 1 1 1
26 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
27 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
28 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
29 1 1 1 1 1 0 1 0 1 0 1 0 0 0 0 1 0 0
30 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
23541 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23542 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23543 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23544 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0
23545 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23546 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23547 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23548 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23549 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23550 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23551 0 0 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0
23552 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23553 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23554 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
23555 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 1 1
23556 0 0 1 0 0 1 1 0 1 0 0 0 1 0 0 0 0 1
23557 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23558 0 0 1 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0
23559 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
23560 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23561 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0
23562 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23563 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
23564 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0
23565 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23566 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23567 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23568 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23569 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23570 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

23570 rows × 18 columns

df_purchase.head()
month 1997-01-01 00:00:00 1997-02-01 00:00:00 1997-03-01 00:00:00 1997-04-01 00:00:00 1997-05-01 00:00:00 1997-06-01 00:00:00 1997-07-01 00:00:00 1997-08-01 00:00:00 1997-09-01 00:00:00 1997-10-01 00:00:00 1997-11-01 00:00:00 1997-12-01 00:00:00 1998-01-01 00:00:00 1998-02-01 00:00:00 1998-03-01 00:00:00 1998-04-01 00:00:00 1998-05-01 00:00:00 1998-06-01 00:00:00
user_id
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0
4 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
5 1 1 0 1 1 1 1 0 1 0 0 1 1 0 0 0 0 0
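Note that `applymap` is deprecated in pandas 2.1+ (renamed `DataFrame.map`), and the element-wise pass is also unnecessary here: a comparison plus a cast does the same flagging in one vectorized step. A sketch on a toy count table:

```python
import pandas as pd

# Toy per-user, per-month purchase counts (stand-in for user_month_count_df).
counts = pd.DataFrame({'1997-01': [1.0, 2.0, 0.0],
                       '1997-02': [0.0, 1.0, 0.0]})

# 1 if the user bought at least once that month, else 0 - no applymap needed.
flags = (counts > 0).astype(int)
print(flags)
```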
#Replace the 0/1 flags in df_purchase with new, unactive, ...; the result is a new df called df_purchase_new
#fixed algorithm
def active_status(data):
    status = []#this user's status for each month
    for i in range(18):

        #no purchase this month
        if data.iloc[i] == 0:
            if len(status) > 0:
                if status[i-1] == 'unreg':
                    status.append('unreg')
                else:
                    status.append('unactive')
            else:
                status.append('unreg')

        #purchased this month
        else:
            if len(status) == 0:
                status.append('new')
            else:
                if status[i-1] == 'unactive':
                    status.append('return')
                elif status[i-1] == 'unreg':
                    status.append('new')
                else:
                    status.append('active')
    return status
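To make the state transitions concrete, here is the same logic run on a single toy row of monthly flags (restated compactly; `len(data)` replaces the hard-coded 18 so the toy row can be shorter):

```python
import pandas as pd

# Compact restatement of the active_status state machine for one row of
# 0/1 monthly purchase flags.
def status_of(data):
    status = []
    for i in range(len(data)):
        if data.iloc[i] == 0:                  # no purchase this month
            if not status or status[i - 1] == 'unreg':
                status.append('unreg')         # never bought yet
            else:
                status.append('unactive')      # bought before, not this month
        else:                                  # purchased this month
            if not status or status[i - 1] == 'unreg':
                status.append('new')           # first-ever purchase
            elif status[i - 1] == 'unactive':
                status.append('return')        # back after a gap
            else:
                status.append('active')        # bought last month too
    return status

print(status_of(pd.Series([0, 1, 1, 0, 1])))
# ['unreg', 'new', 'active', 'unactive', 'return']
```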

pivoted_status = df_purchase.apply(active_status,axis = 1) 
pivoted_status.head()
user_id
1    [new, unactive, unactive, unactive, unactive, ...
2    [new, unactive, unactive, unactive, unactive, ...
3    [new, unactive, return, active, unactive, unac...
4    [new, unactive, unactive, unactive, unactive, ...
5    [new, active, unactive, return, active, active...
dtype: object
df_purchase_new = DataFrame(data=pivoted_status.values.tolist(),index=df_purchase.index,columns=df_purchase.columns)
df_purchase_new
month 1997-01-01 00:00:00 1997-02-01 00:00:00 1997-03-01 00:00:00 1997-04-01 00:00:00 1997-05-01 00:00:00 1997-06-01 00:00:00 1997-07-01 00:00:00 1997-08-01 00:00:00 1997-09-01 00:00:00 1997-10-01 00:00:00 1997-11-01 00:00:00 1997-12-01 00:00:00 1998-01-01 00:00:00 1998-02-01 00:00:00 1998-03-01 00:00:00 1998-04-01 00:00:00 1998-05-01 00:00:00 1998-06-01 00:00:00
user_id
1 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
2 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
3 new unactive return active unactive unactive unactive unactive unactive unactive return unactive unactive unactive unactive unactive return unactive
4 new unactive unactive unactive unactive unactive unactive return unactive unactive unactive return unactive unactive unactive unactive unactive unactive
5 new active unactive return active active active unactive return unactive unactive return active unactive unactive unactive unactive unactive
6 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
7 new unactive unactive unactive unactive unactive unactive unactive unactive return unactive unactive unactive unactive return unactive unactive unactive
8 new active unactive unactive unactive return active unactive unactive unactive return active unactive unactive return unactive unactive unactive
9 new unactive unactive unactive return unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive return
10 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
11 new unactive return unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive return unactive unactive unactive unactive
12 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
13 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
14 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
15 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
16 new unactive unactive unactive unactive unactive return unactive return unactive unactive unactive unactive unactive unactive unactive unactive unactive
17 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
18 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
19 new unactive unactive unactive unactive return unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
20 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
21 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
22 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
24 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive return unactive unactive unactive unactive unactive
25 new unactive unactive unactive unactive unactive return active unactive return unactive unactive unactive unactive unactive return active active
26 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
27 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
28 new unactive return unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
29 new active active active active unactive return unactive return unactive return unactive unactive unactive unactive return unactive unactive
30 new active unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
23541 unreg unreg new active unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23542 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23543 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23544 unreg unreg new unactive return unactive unactive unactive unactive unactive unactive unactive return unactive unactive unactive unactive unactive
23545 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23546 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23547 unreg unreg new active unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23548 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23549 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23550 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23551 unreg unreg new unactive unactive return unactive return active unactive unactive unactive unactive unactive unactive unactive unactive unactive
23552 unreg unreg new active unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23553 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23554 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive return unactive unactive unactive unactive
23555 unreg unreg new unactive unactive unactive unactive unactive unactive return unactive return unactive unactive unactive unactive return active
23556 unreg unreg new unactive unactive return active unactive return unactive unactive unactive return unactive unactive unactive unactive return
23557 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23558 unreg unreg new unactive return active unactive unactive unactive unactive unactive unactive unactive return unactive unactive unactive unactive
23559 unreg unreg new unactive return active unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23560 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23561 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive return unactive unactive unactive return unactive
23562 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23563 unreg unreg new unactive unactive unactive unactive unactive unactive return unactive unactive unactive unactive unactive unactive unactive unactive
23564 unreg unreg new unactive return unactive unactive unactive unactive unactive return unactive unactive unactive unactive unactive unactive unactive
23565 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23566 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23567 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23568 unreg unreg new active unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23569 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23570 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive

23570 rows × 18 columns

  • Count the users in each status per month
    • purchase_status_ct = df_purchase_new.apply(lambda x : x.value_counts()).fillna(0)
    • transpose to view the final result
purchase_status_ct = df_purchase_new.apply(lambda x : x.value_counts()).fillna(0) #per-column counts of each status
purchase_status_ct
month 1997-01-01 00:00:00 1997-02-01 00:00:00 1997-03-01 00:00:00 1997-04-01 00:00:00 1997-05-01 00:00:00 1997-06-01 00:00:00 1997-07-01 00:00:00 1997-08-01 00:00:00 1997-09-01 00:00:00 1997-10-01 00:00:00 1997-11-01 00:00:00 1997-12-01 00:00:00 1998-01-01 00:00:00 1998-02-01 00:00:00 1998-03-01 00:00:00 1998-04-01 00:00:00 1998-05-01 00:00:00 1998-06-01 00:00:00
active 0.0 1157.0 1681.0 1773.0 852.0 747.0 746.0 604.0 528.0 532.0 624.0 632.0 512.0 472.0 571.0 518.0 459.0 446.0
new 7846.0 8476.0 7248.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
return 0.0 0.0 595.0 1049.0 1362.0 1592.0 1434.0 1168.0 1211.0 1307.0 1404.0 1232.0 1025.0 1079.0 1489.0 919.0 1029.0 1060.0
unactive 0.0 6689.0 14046.0 20748.0 21356.0 21231.0 21390.0 21798.0 21831.0 21731.0 21542.0 21706.0 22033.0 22019.0 21510.0 22133.0 22082.0 22064.0
unreg 15724.0 7248.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
purchase_status_ct.T
active new return unactive unreg
month
1997-01-01 0.0 7846.0 0.0 0.0 15724.0
1997-02-01 1157.0 8476.0 0.0 6689.0 7248.0
1997-03-01 1681.0 7248.0 595.0 14046.0 0.0
1997-04-01 1773.0 0.0 1049.0 20748.0 0.0
1997-05-01 852.0 0.0 1362.0 21356.0 0.0
1997-06-01 747.0 0.0 1592.0 21231.0 0.0
1997-07-01 746.0 0.0 1434.0 21390.0 0.0
1997-08-01 604.0 0.0 1168.0 21798.0 0.0
1997-09-01 528.0 0.0 1211.0 21831.0 0.0
1997-10-01 532.0 0.0 1307.0 21731.0 0.0
1997-11-01 624.0 0.0 1404.0 21542.0 0.0
1997-12-01 632.0 0.0 1232.0 21706.0 0.0
1998-01-01 512.0 0.0 1025.0 22033.0 0.0
1998-02-01 472.0 0.0 1079.0 22019.0 0.0
1998-03-01 571.0 0.0 1489.0 21510.0 0.0
1998-04-01 518.0 0.0 919.0 22133.0 0.0
1998-05-01 459.0 0.0 1029.0 22082.0 0.0
1998-06-01 446.0 0.0 1060.0 22064.0 0.0
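Every column of purchase_status_ct sums to the full user base (23570), so dividing each column by its sum turns the counts into monthly proportions of each status; a sketch on toy counts:

```python
import pandas as pd

# Toy status counts per month (stand-in for purchase_status_ct).
counts = pd.DataFrame({
    '1997-01': {'new': 3, 'unreg': 1, 'active': 0, 'return': 0, 'unactive': 0},
    '1997-02': {'new': 1, 'unreg': 0, 'active': 2, 'return': 0, 'unactive': 1},
})

# Divide each column by its total to get each status's share of all users.
share = counts.div(counts.sum(axis=0), axis=1)
print(share)
```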