10 Comprehensive Practice: User Consumption Behavior Analysis
阿新 • Published: 2021-06-22
import numpy as np
import pandas as pd
from pandas import DataFrame,Series
import matplotlib.pyplot as plt
#CDNOW_master.txt
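Part 1: Data type handling
- Load the data
- Field meanings:
- user_id: user ID
- order_dt: purchase date
- order_product: number of products purchased
- order_amount: purchase amount
- Inspect the data
- Check the data type of each column
- Check whether the data contains missing values
- Convert order_dt to a datetime type
- View the summary statistics of the data
- Compute the average number of products purchased across all users
- Compute the average amount spent across all users
- Add a column to the source data that represents the month: astype('datetime64[M]')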
# Load the data (raw string for the regex separator avoids an invalid-escape warning)
df = pd.read_csv('./data/CDNOW_master.txt',header=None,sep=r'\s+',names=['user_id','order_dt','order_product','order_amount'])
df
 | user_id | order_dt | order_product | order_amount |
---|---|---|---|---|
0 | 1 | 19970101 | 1 | 11.77 |
1 | 2 | 19970112 | 1 | 12.00 |
2 | 2 | 19970112 | 5 | 77.00 |
3 | 3 | 19970102 | 2 | 20.76 |
4 | 3 | 19970330 | 2 | 20.76 |
5 | 3 | 19970402 | 2 | 19.54 |
6 | 3 | 19971115 | 5 | 57.45 |
7 | 3 | 19971125 | 4 | 20.96 |
8 | 3 | 19980528 | 1 | 16.99 |
9 | 4 | 19970101 | 2 | 29.33 |
10 | 4 | 19970118 | 2 | 29.73 |
11 | 4 | 19970802 | 1 | 14.96 |
12 | 4 | 19971212 | 2 | 26.48 |
13 | 5 | 19970101 | 2 | 29.33 |
14 | 5 | 19970114 | 1 | 13.97 |
15 | 5 | 19970204 | 3 | 38.90 |
16 | 5 | 19970411 | 3 | 45.55 |
17 | 5 | 19970531 | 3 | 38.71 |
18 | 5 | 19970616 | 2 | 26.14 |
19 | 5 | 19970722 | 2 | 28.14 |
20 | 5 | 19970915 | 3 | 40.47 |
21 | 5 | 19971208 | 4 | 46.46 |
22 | 5 | 19971212 | 3 | 40.47 |
23 | 5 | 19980103 | 3 | 37.47 |
24 | 6 | 19970101 | 1 | 20.99 |
25 | 7 | 19970101 | 2 | 28.74 |
26 | 7 | 19971011 | 7 | 97.43 |
27 | 7 | 19980322 | 9 | 138.50 |
28 | 8 | 19970101 | 1 | 9.77 |
29 | 8 | 19970213 | 1 | 13.97 |
... | ... | ... | ... | ... |
69629 | 23556 | 19970927 | 3 | 31.47 |
69630 | 23556 | 19980103 | 2 | 28.98 |
69631 | 23556 | 19980607 | 2 | 28.98 |
69632 | 23557 | 19970325 | 1 | 14.37 |
69633 | 23558 | 19970325 | 2 | 28.13 |
69634 | 23558 | 19970518 | 3 | 45.51 |
69635 | 23558 | 19970624 | 2 | 23.74 |
69636 | 23558 | 19980225 | 4 | 48.22 |
69637 | 23559 | 19970325 | 2 | 23.54 |
69638 | 23559 | 19970518 | 3 | 35.31 |
69639 | 23559 | 19970627 | 3 | 52.80 |
69640 | 23560 | 19970325 | 1 | 18.36 |
69641 | 23561 | 19970325 | 2 | 30.92 |
69642 | 23561 | 19980128 | 1 | 15.49 |
69643 | 23561 | 19980529 | 3 | 37.05 |
69644 | 23562 | 19970325 | 2 | 29.33 |
69645 | 23563 | 19970325 | 1 | 10.77 |
69646 | 23563 | 19971004 | 2 | 47.98 |
69647 | 23564 | 19970325 | 1 | 11.77 |
69648 | 23564 | 19970521 | 1 | 11.77 |
69649 | 23564 | 19971130 | 3 | 46.47 |
69650 | 23565 | 19970325 | 1 | 11.77 |
69651 | 23566 | 19970325 | 2 | 36.00 |
69652 | 23567 | 19970325 | 1 | 20.97 |
69653 | 23568 | 19970325 | 1 | 22.97 |
69654 | 23568 | 19970405 | 4 | 83.74 |
69655 | 23568 | 19970422 | 1 | 14.99 |
69656 | 23569 | 19970325 | 2 | 25.74 |
69657 | 23570 | 19970325 | 3 | 51.12 |
69658 | 23570 | 19970326 | 2 | 42.96 |
69659 rows × 4 columns
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 69659 entries, 0 to 69658
Data columns (total 4 columns):
user_id 69659 non-null int64
order_dt 69659 non-null int64
order_product 69659 non-null int64
order_amount 69659 non-null float64
dtypes: float64(1), int64(3)
memory usage: 2.1 MB
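The outline asks whether the data contains missing values. The `info()` output above answers this indirectly, since every column reports 69659 non-null entries. A more direct check could look like the sketch below. It runs on a small toy frame (the `toy` name and its three rows are illustrative, copied from the first rows shown above) rather than on the full file.

```python
import pandas as pd

# Toy frame with the same columns as the CDNOW data (illustrative rows only).
toy = pd.DataFrame({
    'user_id': [1, 2, 2],
    'order_dt': [19970101, 19970112, 19970112],
    'order_product': [1, 1, 5],
    'order_amount': [11.77, 12.00, 77.00],
})

# Number of missing values per column; all zeros means nothing is missing.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# A single True/False answer for the whole frame.
has_missing = bool(toy.isnull().values.any())
print(has_missing)
```

The same two expressions can be applied to `df` directly.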
# Convert order_dt to a datetime type
df['order_dt'] = pd.to_datetime(df['order_dt'],format='%Y%m%d')
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 69659 entries, 0 to 69658
Data columns (total 4 columns):
user_id 69659 non-null int64
order_dt 69659 non-null datetime64[ns]
order_product 69659 non-null int64
order_amount 69659 non-null float64
dtypes: datetime64[ns](1), float64(1), int64(2)
memory usage: 2.1 MB
# View the summary statistics of the data
df.describe()
 | user_id | order_product | order_amount |
---|---|---|---|
count | 69659.000000 | 69659.000000 | 69659.000000 |
mean | 11470.854592 | 2.410040 | 35.893648 |
std | 6819.904848 | 2.333924 | 36.281942 |
min | 1.000000 | 1.000000 | 0.000000 |
25% | 5506.000000 | 1.000000 | 14.490000 |
50% | 11410.000000 | 2.000000 | 25.980000 |
75% | 17273.000000 | 3.000000 | 43.700000 |
max | 23570.000000 | 99.000000 | 1286.010000 |
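The `mean` row of `describe()` above (about 2.41 products and 35.89 in amount) is an average per order record. If the outline's "average across all users" is read as an average per user, the totals have to be computed per user first. The sketch below shows both readings on a toy frame. The `toy` data and the variable names are illustrative assumptions.

```python
import pandas as pd

toy = pd.DataFrame({
    'user_id': [1, 2, 2, 3],
    'order_product': [1, 1, 5, 2],
    'order_amount': [11.77, 12.00, 77.00, 20.76],
})

# Average per order record (what describe() reports as "mean").
avg_qty_per_order = toy['order_product'].mean()
avg_amount_per_order = toy['order_amount'].mean()

# Average per user: total each user's orders first, then average across users.
per_user = toy.groupby('user_id')[['order_product', 'order_amount']].sum()
avg_qty_per_user = per_user['order_product'].mean()
avg_amount_per_user = per_user['order_amount'].mean()

print(avg_qty_per_order, avg_qty_per_user)
print(avg_amount_per_order, avg_amount_per_user)
```

The two numbers differ whenever some users place more than one order, so it is worth stating which one a report refers to.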
# Extract the month from order_dt
df['order_dt'].astype('datetime64[M]')
0 1997-01-01
1 1997-01-01
2 1997-01-01
3 1997-01-01
4 1997-03-01
5 1997-04-01
6 1997-11-01
7 1997-11-01
8 1998-05-01
9 1997-01-01
10 1997-01-01
11 1997-08-01
12 1997-12-01
13 1997-01-01
14 1997-01-01
15 1997-02-01
16 1997-04-01
17 1997-05-01
18 1997-06-01
19 1997-07-01
20 1997-09-01
21 1997-12-01
22 1997-12-01
23 1998-01-01
24 1997-01-01
25 1997-01-01
26 1997-10-01
27 1998-03-01
28 1997-01-01
29 1997-02-01
...
69629 1997-09-01
69630 1998-01-01
69631 1998-06-01
69632 1997-03-01
69633 1997-03-01
69634 1997-05-01
69635 1997-06-01
69636 1998-02-01
69637 1997-03-01
69638 1997-05-01
69639 1997-06-01
69640 1997-03-01
69641 1997-03-01
69642 1998-01-01
69643 1998-05-01
69644 1997-03-01
69645 1997-03-01
69646 1997-10-01
69647 1997-03-01
69648 1997-05-01
69649 1997-11-01
69650 1997-03-01
69651 1997-03-01
69652 1997-03-01
69653 1997-03-01
69654 1997-04-01
69655 1997-04-01
69656 1997-03-01
69657 1997-03-01
69658 1997-03-01
Name: order_dt, Length: 69659, dtype: datetime64[ns]
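The output above comes from an older pandas release, where `astype('datetime64[M]')` truncated each date to the first day of its month and kept the `datetime64[ns]` dtype. Newer pandas releases (2.x, to the best of my knowledge) reject casts to month resolution, so the same line may raise there. A sketch of an alternative that should give the same month-start timestamps is shown below. The `order_dt` and `month` names mirror the notebook, and the three dates are taken from the table above.

```python
import pandas as pd

# Parse integer-like dates the same way the notebook does.
order_dt = pd.to_datetime(
    pd.Series([19970101, 19970112, 19970330]).astype(str),
    format='%Y%m%d',
)

# Convert to a monthly period, then back to a timestamp at the start of the month.
month = order_dt.dt.to_period('M').dt.to_timestamp()
print(month)
```

In the notebook this would correspond to `df['month'] = df['order_dt'].dt.to_period('M').dt.to_timestamp()`.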
# Add a column to the source data that represents the month: astype('datetime64[M]')
df['month'] = df['order_dt'].astype('datetime64[M]')
df.head()
 | user_id | order_dt | order_product | order_amount | month |
---|---|---|---|---|---|
0 | 1 | 1997-01-01 | 1 | 11.77 | 1997-01-01 |
1 | 2 | 1997-01-12 | 1 | 12.00 | 1997-01-01 |
2 | 2 | 1997-01-12 | 5 | 77.00 | 1997-01-01 |
3 | 3 | 1997-01-02 | 2 | 20.76 | 1997-01-01 |
4 | 3 | 1997-03-30 | 2 | 20.76 | 1997-03-01 |
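Part 2: Monthly data analysis
- Total amount spent by users each month
- Plot it as a line chart
- Number of products purchased by all users each month
- Total number of purchases made by all users each month
- Number of distinct buyers each month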
# Total amount spent by users each month
df.groupby(by='month')['order_amount'].sum()
month
1997-01-01 299060.17
1997-02-01 379590.03
1997-03-01 393155.27
1997-04-01 142824.49
1997-05-01 107933.30
1997-06-01 108395.87
1997-07-01 122078.88
1997-08-01 88367.69
1997-09-01 81948.80
1997-10-01 89780.77
1997-11-01 115448.64
1997-12-01 95577.35
1998-01-01 76756.78
1998-02-01 77096.96
1998-03-01 108970.15
1998-04-01 66231.52
1998-05-01 70989.66
1998-06-01 76109.30
Name: order_amount, dtype: float64
# plt.plot(df.groupby(by='month')['order_amount'].sum())
df.groupby(by='month')['order_amount'].sum().plot()
<matplotlib.axes._subplots.AxesSubplot at 0x111536c50>
# Number of products purchased by all users each month
df.groupby(by='month')['order_product'].sum().plot()
<matplotlib.axes._subplots.AxesSubplot at 0x1115d2978>
# Total number of purchases by all users each month (each row of the raw data is one purchase record)
df.groupby(by='month')['user_id'].count()
month
1997-01-01 8928
1997-02-01 11272
1997-03-01 11598
1997-04-01 3781
1997-05-01 2895
1997-06-01 3054
1997-07-01 2942
1997-08-01 2320
1997-09-01 2296
1997-10-01 2562
1997-11-01 2750
1997-12-01 2504
1998-01-01 2032
1998-02-01 2026
1998-03-01 2793
1998-04-01 1878
1998-05-01 1985
1998-06-01 2043
Name: user_id, dtype: int64
# Number of distinct buyers each month (one user may purchase several times in a month); nunique counts the values after de-duplication
df.groupby(by='month')['user_id'].nunique()
month
1997-01-01 7846
1997-02-01 9633
1997-03-01 9524
1997-04-01 2822
1997-05-01 2214
1997-06-01 2339
1997-07-01 2180
1997-08-01 1772
1997-09-01 1739
1997-10-01 1839
1997-11-01 2028
1997-12-01 1864
1998-01-01 1537
1998-02-01 1551
1998-03-01 2060
1998-04-01 1437
1998-05-01 1488
1998-06-01 1506
Name: user_id, dtype: int64
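The four monthly figures above (total amount, total products, number of purchases, number of distinct buyers) each use a separate `groupby`. As a sketch, they can also be collected into one table with named aggregation. The `toy` frame and the output column names (`total_amount`, `total_products`, `order_count`, `buyer_count`) are illustrative assumptions, not names from the original notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    'user_id': [1, 2, 2, 3, 3],
    'month': pd.to_datetime(['1997-01-01', '1997-01-01', '1997-01-01',
                             '1997-01-01', '1997-03-01']),
    'order_product': [1, 1, 5, 2, 2],
    'order_amount': [11.77, 12.00, 77.00, 20.76, 20.76],
})

# One row per month, one column per metric.
monthly = toy.groupby('month').agg(
    total_amount=('order_amount', 'sum'),
    total_products=('order_product', 'sum'),
    order_count=('user_id', 'count'),      # every row is one purchase
    buyer_count=('user_id', 'nunique'),    # distinct users in the month
)
print(monthly)
```

Comparing `order_count` with `buyer_count` in the same row shows at a glance how many repeat purchases happen within a month.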
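Part 3: Analysis of individual user purchases
- Summary statistics of each user's total amount spent and total number of purchases
- Scatter plot of each user's amount spent against number of products purchased
- Histogram of each user's total amount spent (for totals up to 1000)
- Histogram of each user's total quantity purchased (for totals up to 100 items)
# Summary statistics of each user's total amount spent and total number of purchases
df.groupby(by='user_id')['order_amount'].sum() # total amount spent by each user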
user_id
1 11.77
2 89.00
3 156.46
4 100.50
5 385.61
6 20.99
7 264.67
8 197.66
9 95.85
10 39.31
11 58.55
12 57.06
13 72.94
14 29.92
15 52.87
16 79.87
17 73.22
18 14.96
19 175.12
20 653.01
21 75.11
22 14.37
23 24.74
24 57.77
25 137.53
26 102.69
27 135.87
28 90.99
29 435.81
30 28.34
...
23541 57.34
23542 77.43
23543 50.76
23544 134.63
23545 24.99
23546 13.97
23547 23.54
23548 23.54
23549 27.13
23550 25.28
23551 264.63
23552 49.38
23553 98.58
23554 36.37
23555 189.18
23556 203.00
23557 14.37
23558 145.60
23559 111.65
23560 18.36
23561 83.46
23562 29.33
23563 58.75
23564 70.01
23565 11.77
23566 36.00
23567 20.97
23568 121.70
23569 25.74
23570 94.08
Name: order_amount, Length: 23570, dtype: float64
# Total number of purchases made by each user
df.groupby(by='user_id').count()['order_dt']
user_id
1 1
2 2
3 6
4 4
5 11
6 1
7 3
8 8
9 3
10 1
11 4
12 1
13 1
14 1
15 1
16 4
17 1
18 1
19 2
20 2
21 2
22 1
23 1
24 2
25 8
26 2
27 2
28 3
29 12
30 2
..
23541 2
23542 1
23543 1
23544 3
23545 1
23546 1
23547 2
23548 1
23549 1
23550 1
23551 6
23552 2
23553 2
23554 2
23555 5
23556 7
23557 1
23558 4
23559 3
23560 1
23561 3
23562 1
23563 2
23564 3
23565 1
23566 1
23567 1
23568 3
23569 1
23570 2
Name: order_dt, Length: 23570, dtype: int64
# Scatter plot of each user's amount spent against number of products purchased
user_amount_sum = df.groupby(by='user_id')['order_amount'].sum()
user_product_sum = df.groupby(by='user_id')['order_product'].sum()
plt.scatter(user_product_sum,user_amount_sum)
<matplotlib.collections.PathCollection at 0x112253588>
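# Histogram of each user's total amount spent (for totals up to 1000)
# Select the column before summing so that the datetime columns are not summed
df.groupby(by='user_id')[['order_amount']].sum().query('order_amount <= 1000')['order_amount']
df.groupby(by='user_id')[['order_amount']].sum().query('order_amount <= 1000')['order_amount'].hist()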
<matplotlib.axes._subplots.AxesSubplot at 0x1122f1d30>
# Histogram of each user's total quantity purchased (for totals up to 100 items)
df.groupby(by='user_id')[['order_product']].sum().query('order_product <= 100')['order_product'].hist()
<matplotlib.axes._subplots.AxesSubplot at 0x11491f828>
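Part 4: User purchase behavior analysis
- Distribution of the month of each user's first purchase, with user counts
- Plot a line chart
- Distribution of the time of each user's last purchase, with user counts
- Plot a line chart
- Ratio of new to returning customers
- A user who purchased once is a new user
- A user who purchased more than once is a returning user
- Find each user's first and last purchase time
- agg(['func1','func2']): applies the specified aggregations to the grouped result
- Work out the ratio of new to returning customers
- User segmentation
- Build the table rfm holding each user's total purchase quantity, total amount spent, and most recent purchase time
- RFM model design
- R is the time elapsed since the customer's most recent transaction.
- /np.timedelta64(1,'D'): divides out the days unit and leaves a plain number
- F is the total quantity of products the customer purchased. The larger F is, the more frequently the customer transacts; a small F means the customer is not very active.
- M is the customer's transaction amount. The larger M is, the higher the customer's value; a small M means lower value.
- Apply R, F and M to the rfm table
- Based on value, segment users into:
- important value customer
- important retention customer
- important win-back customer
- important development customer
- general value customer
- general retention customer
- general win-back customer
- general development customer
- The existing segmentation function rfm_func can be used as is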
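# Distribution of the month of each user's first purchase, with user counts
# First-purchase month: the minimum of a user's purchase months is the month of that user's first purchase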
df.groupby(by='user_id')['month'].min()
user_id
1 1997-01-01
2 1997-01-01
3 1997-01-01
4 1997-01-01
5 1997-01-01
6 1997-01-01
7 1997-01-01
8 1997-01-01
9 1997-01-01
10 1997-01-01
11 1997-01-01
12 1997-01-01
13 1997-01-01
14 1997-01-01
15 1997-01-01
16 1997-01-01
17 1997-01-01
18 1997-01-01
19 1997-01-01
20 1997-01-01
21 1997-01-01
22 1997-01-01
23 1997-01-01
24 1997-01-01
25 1997-01-01
26 1997-01-01
27 1997-01-01
28 1997-01-01
29 1997-01-01
30 1997-01-01
...
23541 1997-03-01
23542 1997-03-01
23543 1997-03-01
23544 1997-03-01
23545 1997-03-01
23546 1997-03-01
23547 1997-03-01
23548 1997-03-01
23549 1997-03-01
23550 1997-03-01
23551 1997-03-01
23552 1997-03-01
23553 1997-03-01
23554 1997-03-01
23555 1997-03-01
23556 1997-03-01
23557 1997-03-01
23558 1997-03-01
23559 1997-03-01
23560 1997-03-01
23561 1997-03-01
23562 1997-03-01
23563 1997-03-01
23564 1997-03-01
23565 1997-03-01
23566 1997-03-01
23567 1997-03-01
23568 1997-03-01
23569 1997-03-01
23570 1997-03-01
Name: month, Length: 23570, dtype: datetime64[ns]
df.groupby(by='user_id')['month'].min().value_counts() # number of users per first-purchase month
df.groupby(by='user_id')['month'].min().value_counts().plot()
<matplotlib.axes._subplots.AxesSubplot at 0x11dddba90>
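# Distribution of the time of each user's last purchase, with user counts
# The maximum of a user's purchase months is the month of that user's last purchase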
df.groupby(by='user_id')['month'].max().value_counts().plot()
<matplotlib.axes._subplots.AxesSubplot at 0x11e35ba58>
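# Ratio of new to returning customers
# A user who purchased once is a new user; a user who purchased more than once is a returning user
# How can we tell whether a user purchased only once? We can decide from the user's purchase dates.
# If the first and last purchase dates are the same, the user is treated as having purchased only once (new user); otherwise the user is a returning user.
# Note: a user whose orders all fall on the same day is also counted as new under this rule.
new_old_user_df = df.groupby(by='user_id')['order_dt'].agg(['min','max']) # agg applies several specified aggregations to the grouped result
new_old_user_df['min'] == new_old_user_df['max'] # True: new user, False: returning user
# Count the True and False values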
(new_old_user_df['min'] == new_old_user_df['max']).value_counts()
True 12054
False 11516
dtype: int64
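From the counts above, 12054 of 23570 users (about 51%) are classified as new. The rule compares the first and last purchase dates, so a user with two orders on the same day, such as user 2 in the raw data, is still counted as new. The sketch below shows how the share can be computed directly and how the result changes if order records are counted instead. The `toy` frame and the variable names are illustrative assumptions.

```python
import pandas as pd

toy = pd.DataFrame({
    'user_id': [1, 2, 2, 3, 3],
    'order_dt': pd.to_datetime(['1997-01-01', '1997-01-12', '1997-01-12',
                                '1997-01-02', '1997-03-30']),
})

# Rule used in the notebook: first purchase date equals last purchase date.
first_last = toy.groupby('user_id')['order_dt'].agg(['min', 'max'])
is_new_by_date = first_last['min'] == first_last['max']

# Share of new users: the mean of a boolean Series is the fraction of True values.
new_share = is_new_by_date.mean()

# Alternative rule: exactly one order record.
order_counts = toy.groupby('user_id')['order_dt'].count()
is_new_by_count = order_counts == 1

print(is_new_by_date.tolist(), new_share)
print(is_new_by_count.tolist())
```

Which rule is appropriate depends on whether several orders on one day should count as one visit or as repeat purchases.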
# Build the table rfm holding each user's total purchase quantity, total amount spent, and most recent purchase time
rfm = df.pivot_table(index='user_id',aggfunc={'order_product':'sum','order_amount':'sum','order_dt':"max"})
rfm
user_id | order_amount | order_dt | order_product |
---|---|---|---|
1 | 11.77 | 1997-01-01 | 1 |
2 | 89.00 | 1997-01-12 | 6 |
3 | 156.46 | 1998-05-28 | 16 |
4 | 100.50 | 1997-12-12 | 7 |
5 | 385.61 | 1998-01-03 | 29 |
6 | 20.99 | 1997-01-01 | 1 |
7 | 264.67 | 1998-03-22 | 18 |
8 | 197.66 | 1998-03-29 | 18 |
9 | 95.85 | 1998-06-08 | 6 |
10 | 39.31 | 1997-01-21 | 3 |
11 | 58.55 | 1998-02-20 | 4 |
12 | 57.06 | 1997-01-01 | 4 |
13 | 72.94 | 1997-01-01 | 4 |
14 | 29.92 | 1997-01-01 | 2 |
15 | 52.87 | 1997-01-01 | 4 |
16 | 79.87 | 1997-09-10 | 8 |
17 | 73.22 | 1997-01-01 | 5 |
18 | 14.96 | 1997-01-04 | 1 |
19 | 175.12 | 1997-06-10 | 11 |
20 | 653.01 | 1997-01-18 | 46 |
21 | 75.11 | 1997-01-13 | 4 |
22 | 14.37 | 1997-01-01 | 1 |
23 | 24.74 | 1997-01-01 | 2 |
24 | 57.77 | 1998-01-20 | 4 |
25 | 137.53 | 1998-06-08 | 12 |
26 | 102.69 | 1997-01-26 | 6 |
27 | 135.87 | 1997-01-12 | 10 |
28 | 90.99 | 1997-03-08 | 7 |
29 | 435.81 | 1998-04-26 | 28 |
30 | 28.34 | 1997-02-14 | 2 |
... | ... | ... | ... |
23541 | 57.34 | 1997-04-02 | 2 |
23542 | 77.43 | 1997-03-25 | 5 |
23543 | 50.76 | 1997-03-25 | 2 |
23544 | 134.63 | 1998-01-24 | 12 |
23545 | 24.99 | 1997-03-25 | 1 |
23546 | 13.97 | 1997-03-25 | 1 |
23547 | 23.54 | 1997-04-07 | 2 |
23548 | 23.54 | 1997-03-25 | 2 |
23549 | 27.13 | 1997-03-25 | 2 |
23550 | 25.28 | 1997-03-25 | 2 |
23551 | 264.63 | 1997-09-11 | 12 |
23552 | 49.38 | 1997-04-03 | 4 |
23553 | 98.58 | 1997-03-28 | 8 |
23554 | 36.37 | 1998-02-01 | 3 |
23555 | 189.18 | 1998-06-10 | 14 |
23556 | 203.00 | 1998-06-07 | 15 |
23557 | 14.37 | 1997-03-25 | 1 |
23558 | 145.60 | 1998-02-25 | 11 |
23559 | 111.65 | 1997-06-27 | 8 |
23560 | 18.36 | 1997-03-25 | 1 |
23561 | 83.46 | 1998-05-29 | 6 |
23562 | 29.33 | 1997-03-25 | 2 |
23563 | 58.75 | 1997-10-04 | 3 |
23564 | 70.01 | 1997-11-30 | 5 |
23565 | 11.77 | 1997-03-25 | 1 |
23566 | 36.00 | 1997-03-25 | 2 |
23567 | 20.97 | 1997-03-25 | 1 |
23568 | 121.70 | 1997-04-22 | 6 |
23569 | 25.74 | 1997-03-25 | 2 |
23570 | 94.08 | 1997-03-26 | 5 |
23570 rows × 3 columns
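# R is the time elapsed since the customer's most recent transaction
max_dt = df['order_dt'].max() # treat the latest order date in the data as "today"
# Time elapsed since each user's last transaction
-(df.groupby(by='user_id')['order_dt'].max() - max_dt)
rfm['R'] = -(df.groupby(by='user_id')['order_dt'].max() - max_dt)/np.timedelta64(1,'D') # dividing by np.timedelta64(1,'D') drops the "days" unit and leaves a float
rfm.drop(labels='order_dt',axis=1,inplace=True)
rfm.columns = ['M','F','R'] # rename the columns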
rfm.head()
user_id | M | F | R |
---|---|---|---|
1 | 11.77 | 1 | 545.0 |
2 | 89.00 | 6 | 534.0 |
3 | 156.46 | 16 | 33.0 |
4 | 100.50 | 7 | 200.0 |
5 | 385.61 | 29 | 178.0 |
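The R column above is a plain float because the timedelta was divided by `np.timedelta64(1,'D')`. The sketch below walks through that conversion on three last-purchase dates. Users 1 and 3 use the dates from the rfm table, while user 99 and its date are hypothetical and only serve to fix "today" at 1998-06-30. The `.dt.days` accessor is shown as an integer-valued alternative.

```python
import numpy as np
import pandas as pd

last_order = pd.Series(
    pd.to_datetime(['1997-01-01', '1998-05-28', '1998-06-30']),
    index=pd.Index([1, 3, 99], name='user_id'),  # user 99 is hypothetical
)
max_dt = last_order.max()  # latest date in the data, treated as "today"

gap = max_dt - last_order                # timedelta values such as "545 days"
r_float = gap / np.timedelta64(1, 'D')   # plain floats: days as a number
r_int = gap.dt.days                      # whole days as integers

print(r_float.tolist())
print(r_int.tolist())
```

Writing `max_dt - last_order` also avoids the leading minus sign used in the notebook while giving the same values.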
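def rfm_func(x):
    # level holds three '0'/'1' strings, one each for R, F and M
    level = x.map(lambda v: '1' if v >= 0 else '0')
    label = level.R + level.F + level.M
    d = {
        '111':'important value customer',
        '011':'important retention customer',
        '101':'important win-back customer',
        '001':'important development customer',
        '110':'general value customer',
        '010':'general retention customer',
        '100':'general win-back customer',
        '000':'general development customer'
    }
    result = d[label]
    return result
# df.apply(func): applies func to each row or column of df
# Each column is centered on its mean first, so rfm_func sees values >= 0 for "at or above average"
rfm['label'] = rfm.apply(lambda x : x - x.mean()).apply(rfm_func,axis = 1)
rfm.head()
user_id | M | F | R | label |
---|---|---|---|---|
1 | 11.77 | 1 | 545.0 | general win-back customer |
2 | 89.00 | 6 | 534.0 | general win-back customer |
3 | 156.46 | 16 | 33.0 | important retention customer |
4 | 100.50 | 7 | 200.0 | general development customer |
5 | 385.61 | 29 | 178.0 | important retention customer |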
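The labeling step compares each of R, F and M with its column mean and joins the three 0/1 flags into a code such as '011'. A minimal vectorized sketch of the same idea is shown below, without a row-wise `apply`. The `rfm_toy` frame holds three rows taken from the table above, so its means differ from those of the full data and the resulting codes are not expected to match the notebook's labels. The English segment names are a translation of the eight categories.

```python
import pandas as pd

rfm_toy = pd.DataFrame(
    {'M': [11.77, 156.46, 385.61], 'F': [1, 16, 29], 'R': [545.0, 33.0, 178.0]},
    index=pd.Index([1, 3, 5], name='user_id'),
)

# '1' where a value is at or above its column mean, otherwise '0'.
flags = (rfm_toy >= rfm_toy.mean()).astype(int).astype(str)

# Join the flags in R, F, M order to form the three-character code.
code = flags['R'] + flags['F'] + flags['M']

segment_names = {
    '111': 'important value customer',
    '011': 'important retention customer',
    '101': 'important win-back customer',
    '001': 'important development customer',
    '110': 'general value customer',
    '010': 'general retention customer',
    '100': 'general win-back customer',
    '000': 'general development customer',
}
labels = code.map(segment_names)
print(pd.DataFrame({'code': code, 'label': labels}))
```

Note that in this scheme an R flag of 1 means the last purchase is further in the past than average, which is worth keeping in mind when reading the segment names.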
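Part 5: User life cycle
- Divide users into active users and other users
- Count each user's purchases in each month
- Record whether each user purchased in each month: 1 if they did, 0 otherwise
- Key point: the difference between DataFrame apply and applymap
- applymap: receives each individual element and returns a DataFrame
- It applies a function to every element of the DataFrame
- apply: works on a whole row or column at a time (each is passed to the function as a Series)
- apply() applies a function to each row or column of the DataFrame
- For each month, classify users as:
- unreg: prospective user (no purchase in the first two months and a first purchase in the third month means the user is a prospective user in the first two months)
- unactive: after the first purchase, the user is inactive in every later month without a purchase
- new: a user who makes their first purchase in the current month is a new user in that month
- active: a user who purchases in consecutive months is an active user in those months
- return: the first month in which a user purchases again after a gap of n months marks them as a returning customer for that month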
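The difference between `applymap` and `apply` described above can be seen on a tiny count table. In the sketch below, the `counts` frame and its column labels are illustrative. Newer pandas releases name the element-wise method `DataFrame.map` and deprecate `applymap`, so the code picks whichever is available.

```python
import pandas as pd

# Rows are users, columns are months, values are purchase counts (toy data).
counts = pd.DataFrame({'1997-01': [2.0, 0.0], '1997-02': [0.0, 1.0]}, index=[1, 2])

# Element-wise: the function receives one cell at a time and a DataFrame comes back.
elementwise = counts.map if hasattr(counts, 'map') else counts.applymap
flags = elementwise(lambda x: 1 if x >= 1 else 0)

# Row-wise or column-wise: the function receives a whole Series at a time.
per_user_total = counts.apply(lambda row: row.sum(), axis=1)   # one value per user
per_month_total = counts.apply(lambda col: col.sum(), axis=0)  # one value per month

print(flags)
print(per_user_total.tolist(), per_month_total.tolist())
```

This is why the 0/1 purchase flags are built element-wise, while the per-user status labels need the whole row.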
# Count each user's purchases in each month
user_month_count_df = df.pivot_table(index='user_id',values='order_dt',aggfunc='count',columns='month').fillna(0)
user_month_count_df.head()
month | 1997-01-01 00:00:00 | 1997-02-01 00:00:00 | 1997-03-01 00:00:00 | 1997-04-01 00:00:00 | 1997-05-01 00:00:00 | 1997-06-01 00:00:00 | 1997-07-01 00:00:00 | 1997-08-01 00:00:00 | 1997-09-01 00:00:00 | 1997-10-01 00:00:00 | 1997-11-01 00:00:00 | 1997-12-01 00:00:00 | 1998-01-01 00:00:00 | 1998-02-01 00:00:00 | 1998-03-01 00:00:00 | 1998-04-01 00:00:00 | 1998-05-01 00:00:00 | 1998-06-01 00:00:00 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
user_id | ||||||||||||||||||
1 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
4 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
5 | 2.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 2.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
# Record whether each user purchased in each month: 1 if they did, 0 otherwise
df_purchase = user_month_count_df.applymap(lambda x:1 if x >= 1 else 0)
month | 1997-01-01 00:00:00 | 1997-02-01 00:00:00 | 1997-03-01 00:00:00 | 1997-04-01 00:00:00 | 1997-05-01 00:00:00 | 1997-06-01 00:00:00 | 1997-07-01 00:00:00 | 1997-08-01 00:00:00 | 1997-09-01 00:00:00 | 1997-10-01 00:00:00 | 1997-11-01 00:00:00 | 1997-12-01 00:00:00 | 1998-01-01 00:00:00 | 1998-02-01 00:00:00 | 1998-03-01 00:00:00 | 1998-04-01 00:00:00 | 1998-05-01 00:00:00 | 1998-06-01 00:00:00 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
user_id | ||||||||||||||||||
1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
5 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
6 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
7 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
8 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
9 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
10 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
11 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
12 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
13 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
14 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
15 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
16 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
17 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
18 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
19 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
20 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
21 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
22 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
24 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
25 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
26 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
27 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
28 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
29 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
30 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
23541 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23542 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23543 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23544 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
23545 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23546 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23547 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23548 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23549 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23550 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23551 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23552 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23553 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23554 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
23555 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
23556 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
23557 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23558 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
23559 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23560 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23561 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
23562 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23563 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23564 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23565 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23566 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23567 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23568 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23569 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23570 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23570 rows × 18 columns
df_purchase.head()
month | 1997-01-01 00:00:00 | 1997-02-01 00:00:00 | 1997-03-01 00:00:00 | 1997-04-01 00:00:00 | 1997-05-01 00:00:00 | 1997-06-01 00:00:00 | 1997-07-01 00:00:00 | 1997-08-01 00:00:00 | 1997-09-01 00:00:00 | 1997-10-01 00:00:00 | 1997-11-01 00:00:00 | 1997-12-01 00:00:00 | 1998-01-01 00:00:00 | 1998-02-01 00:00:00 | 1998-03-01 00:00:00 | 1998-04-01 00:00:00 | 1998-05-01 00:00:00 | 1998-06-01 00:00:00 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
user_id | ||||||||||||||||||
1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
5 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
# Replace the 0/1 values in df_purchase with new, unactive, ... and return a new df called df_purchase_new
# Standard labeling routine
def active_status(data):
    status = []  # this user's status for each month
    for i in range(len(data)):  # 18 monthly columns in this data set
        # No purchase this month (iloc: access by position, not by column label)
        if data.iloc[i] == 0:
            if len(status) > 0:
                if status[i-1] == 'unreg':
                    status.append('unreg')
                else:
                    status.append('unactive')
            else:
                status.append('unreg')
        # Purchased this month
        else:
            if len(status) == 0:
                status.append('new')
            else:
                if status[i-1] == 'unactive':
                    status.append('return')
                elif status[i-1] == 'unreg':
                    status.append('new')
                else:
                    status.append('active')
    return status
pivoted_status = df_purchase.apply(active_status,axis = 1)
pivoted_status.head()
user_id
1 [new, unactive, unactive, unactive, unactive, ...
2 [new, unactive, unactive, unactive, unactive, ...
3 [new, unactive, return, active, unactive, unac...
4 [new, unactive, unactive, unactive, unactive, ...
5 [new, active, unactive, return, active, active...
dtype: object
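To trace the labeling rules by hand on a single row, the sketch below restates the same logic for a plain Python list of 0/1 flags. `label_months` is a hypothetical helper written for this illustration and is not part of the original notebook. The six-month input is made up.

```python
def label_months(flags):
    """Label each month of one user's 0/1 purchase flags."""
    status = []
    for flag in flags:
        previous = status[-1] if status else None
        if flag == 0:
            # Before the first purchase the user stays 'unreg'; afterwards 'unactive'.
            if previous is None or previous == 'unreg':
                status.append('unreg')
            else:
                status.append('unactive')
        else:
            if previous is None or previous == 'unreg':
                status.append('new')        # first purchase ever
            elif previous == 'unactive':
                status.append('return')     # purchase after a gap
            else:
                status.append('active')     # purchase in consecutive months
    return status


example = label_months([0, 0, 1, 1, 0, 1])
print(example)
```

Reading the result from left to right: two months before the first purchase, the first purchase, a consecutive purchase, one idle month, and a comeback.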
df_purchase_new = DataFrame(data=pivoted_status.values.tolist(),index=df_purchase.index,columns=df_purchase.columns)
df_purchase_new
month | 1997-01-01 00:00:00 | 1997-02-01 00:00:00 | 1997-03-01 00:00:00 | 1997-04-01 00:00:00 | 1997-05-01 00:00:00 | 1997-06-01 00:00:00 | 1997-07-01 00:00:00 | 1997-08-01 00:00:00 | 1997-09-01 00:00:00 | 1997-10-01 00:00:00 | 1997-11-01 00:00:00 | 1997-12-01 00:00:00 | 1998-01-01 00:00:00 | 1998-02-01 00:00:00 | 1998-03-01 00:00:00 | 1998-04-01 00:00:00 | 1998-05-01 00:00:00 | 1998-06-01 00:00:00 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
user_id | ||||||||||||||||||
1 | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
2 | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
3 | new | unactive | return | active | unactive | unactive | unactive | unactive | unactive | unactive | return | unactive | unactive | unactive | unactive | unactive | return | unactive |
4 | new | unactive | unactive | unactive | unactive | unactive | unactive | return | unactive | unactive | unactive | return | unactive | unactive | unactive | unactive | unactive | unactive |
5 | new | active | unactive | return | active | active | active | unactive | return | unactive | unactive | return | active | unactive | unactive | unactive | unactive | unactive |
6 | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
7 | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | return | unactive | unactive | unactive | unactive | return | unactive | unactive | unactive |
8 | new | active | unactive | unactive | unactive | return | active | unactive | unactive | unactive | return | active | unactive | unactive | return | unactive | unactive | unactive |
9 | new | unactive | unactive | unactive | return | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | return |
10 | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
11 | new | unactive | return | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | return | unactive | unactive | unactive | unactive |
12 | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
13 | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
14 | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
15 | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
16 | new | unactive | unactive | unactive | unactive | unactive | return | unactive | return | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
17 | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
18 | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
19 | new | unactive | unactive | unactive | unactive | return | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
20 | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
21 | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
22 | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23 | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
24 | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | return | unactive | unactive | unactive | unactive | unactive |
25 | new | unactive | unactive | unactive | unactive | unactive | return | active | unactive | return | unactive | unactive | unactive | unactive | unactive | return | active | active |
26 | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
27 | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
28 | new | unactive | return | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
29 | new | active | active | active | active | unactive | return | unactive | return | unactive | return | unactive | unactive | unactive | unactive | return | unactive | unactive |
30 | new | active | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
23541 | unreg | unreg | new | active | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23542 | unreg | unreg | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23543 | unreg | unreg | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23544 | unreg | unreg | new | unactive | return | unactive | unactive | unactive | unactive | unactive | unactive | unactive | return | unactive | unactive | unactive | unactive | unactive |
23545 | unreg | unreg | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23546 | unreg | unreg | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23547 | unreg | unreg | new | active | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23548 | unreg | unreg | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23549 | unreg | unreg | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23550 | unreg | unreg | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23551 | unreg | unreg | new | unactive | unactive | return | unactive | return | active | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23552 | unreg | unreg | new | active | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23553 | unreg | unreg | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23554 | unreg | unreg | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | return | unactive | unactive | unactive | unactive |
23555 | unreg | unreg | new | unactive | unactive | unactive | unactive | unactive | unactive | return | unactive | return | unactive | unactive | unactive | unactive | return | active |
23556 | unreg | unreg | new | unactive | unactive | return | active | unactive | return | unactive | unactive | unactive | return | unactive | unactive | unactive | unactive | return |
23557 | unreg | unreg | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23558 | unreg | unreg | new | unactive | return | active | unactive | unactive | unactive | unactive | unactive | unactive | unactive | return | unactive | unactive | unactive | unactive |
23559 | unreg | unreg | new | unactive | return | active | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23560 | unreg | unreg | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23561 | unreg | unreg | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | return | unactive | unactive | unactive | return | unactive |
23562 | unreg | unreg | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23563 | unreg | unreg | new | unactive | unactive | unactive | unactive | unactive | unactive | return | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23564 | unreg | unreg | new | unactive | return | unactive | unactive | unactive | unactive | unactive | return | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23565 | unreg | unreg | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23566 | unreg | unreg | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23567 | unreg | unreg | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23568 | unreg | unreg | new | active | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23569 | unreg | unreg | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23570 | unreg | unreg | new | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive | unactive |
23570 rows × 18 columns
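- Count the users in each activity status for every month
- purchase_status_ct = df_purchase_new.apply(lambda x: x.value_counts()).fillna(0)
- Transpose the result for the final view
#Count each status per month column; a status that never occurs in a month comes back as NaN, so fill it with 0
#x.value_counts() is used in place of the top-level pd.value_counts, which is deprecated in recent pandas releases
purchase_status_ct = df_purchase_new.apply(lambda x: x.value_counts()).fillna(0)
purchase_status_ct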
month | 1997-01-01 00:00:00 | 1997-02-01 00:00:00 | 1997-03-01 00:00:00 | 1997-04-01 00:00:00 | 1997-05-01 00:00:00 | 1997-06-01 00:00:00 | 1997-07-01 00:00:00 | 1997-08-01 00:00:00 | 1997-09-01 00:00:00 | 1997-10-01 00:00:00 | 1997-11-01 00:00:00 | 1997-12-01 00:00:00 | 1998-01-01 00:00:00 | 1998-02-01 00:00:00 | 1998-03-01 00:00:00 | 1998-04-01 00:00:00 | 1998-05-01 00:00:00 | 1998-06-01 00:00:00 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
active | 0.0 | 1157.0 | 1681.0 | 1773.0 | 852.0 | 747.0 | 746.0 | 604.0 | 528.0 | 532.0 | 624.0 | 632.0 | 512.0 | 472.0 | 571.0 | 518.0 | 459.0 | 446.0 |
new | 7846.0 | 8476.0 | 7248.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
return | 0.0 | 0.0 | 595.0 | 1049.0 | 1362.0 | 1592.0 | 1434.0 | 1168.0 | 1211.0 | 1307.0 | 1404.0 | 1232.0 | 1025.0 | 1079.0 | 1489.0 | 919.0 | 1029.0 | 1060.0 |
unactive | 0.0 | 6689.0 | 14046.0 | 20748.0 | 21356.0 | 21231.0 | 21390.0 | 21798.0 | 21831.0 | 21731.0 | 21542.0 | 21706.0 | 22033.0 | 22019.0 | 21510.0 | 22133.0 | 22082.0 | 22064.0 |
unreg | 15724.0 | 7248.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
purchase_status_ct.T
month | active | new | return | unactive | unreg |
---|---|---|---|---|---|
1997-01-01 | 0.0 | 7846.0 | 0.0 | 0.0 | 15724.0 |
1997-02-01 | 1157.0 | 8476.0 | 0.0 | 6689.0 | 7248.0 |
1997-03-01 | 1681.0 | 7248.0 | 595.0 | 14046.0 | 0.0 |
1997-04-01 | 1773.0 | 0.0 | 1049.0 | 20748.0 | 0.0 |
1997-05-01 | 852.0 | 0.0 | 1362.0 | 21356.0 | 0.0 |
1997-06-01 | 747.0 | 0.0 | 1592.0 | 21231.0 | 0.0 |
1997-07-01 | 746.0 | 0.0 | 1434.0 | 21390.0 | 0.0 |
1997-08-01 | 604.0 | 0.0 | 1168.0 | 21798.0 | 0.0 |
1997-09-01 | 528.0 | 0.0 | 1211.0 | 21831.0 | 0.0 |
1997-10-01 | 532.0 | 0.0 | 1307.0 | 21731.0 | 0.0 |
1997-11-01 | 624.0 | 0.0 | 1404.0 | 21542.0 | 0.0 |
1997-12-01 | 632.0 | 0.0 | 1232.0 | 21706.0 | 0.0 |
1998-01-01 | 512.0 | 0.0 | 1025.0 | 22033.0 | 0.0 |
1998-02-01 | 472.0 | 0.0 | 1079.0 | 22019.0 | 0.0 |
1998-03-01 | 571.0 | 0.0 | 1489.0 | 21510.0 | 0.0 |
1998-04-01 | 518.0 | 0.0 | 919.0 | 22133.0 | 0.0 |
1998-05-01 | 459.0 | 0.0 | 1029.0 | 22082.0 | 0.0 |
1998-06-01 | 446.0 | 0.0 | 1060.0 | 22064.0 | 0.0 |
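The per-month status count above can be tried out on a small table. The following is a minimal sketch, not the original notebook's code: `toy_status` and its three users are made-up stand-ins for `df_purchase_new`, and the status values in it are arbitrary. It applies `value_counts` to each month column, fills missing statuses with 0, and transposes the result.

```python
import pandas as pd

# Hypothetical stand-in for df_purchase_new: rows are users, columns are months.
toy_status = pd.DataFrame(
    {
        "1997-01": ["new", "new", "unreg"],
        "1997-02": ["active", "unactive", "new"],
        "1997-03": ["unactive", "return", "active"],
    },
    index=[1, 2, 3],
)

# Count each status within every month column.
# A status that does not appear in a month yields NaN, which fillna(0) replaces.
toy_status_ct = toy_status.apply(lambda col: col.value_counts()).fillna(0)

# Transpose so that each row is a month and each column is a status.
toy_status_ct_T = toy_status_ct.T

# Every user has exactly one status per month,
# so each month's counts should add up to the number of users.
row_totals = toy_status_ct_T.sum(axis=1)

print(toy_status_ct_T)
print(row_totals)
```

The same check applies to the real table: each row of `purchase_status_ct.T` should sum to the 23570 users, for example 7846 + 15724 = 23570 for 1997-01.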