Analyzing SMS Data with Python
阿新 • Published: 2019-02-09
A sample of the raw data:
來電,2017/1/5 上午11:55,95599,【中國農業銀行】您尾號9672的農行賬戶於01月05日11時54分完成一筆支付寶交易,金額為-18.00,餘額3905.35。,
來電,2017/1/5 下午12:10,95599,【中國農業銀行】您尾號9672的農行賬戶於01月05日12時10分完成一筆現支交易,金額為-200.00,餘額3705.35。,
來電,2017/1/5 下午12:35,95599,【中國農業銀行】您尾號9672的農行賬戶於01月05日12時35分完成一筆支付寶交易,金額為-50.00,餘額3650.35。,
來電,2017/1/5 下午1:47,95599,【中國農業銀行】您尾號9672的農行賬戶於01月05日13時47分完成一筆支付寶浙交易,金額為-199.00,餘額3451.35。,
來電,2017/1/5 下午2:45,95599,【中國農業銀行】您尾號9672的農行賬戶於01月05日14時45分完成一筆消費交易,金額為-199.00,餘額3252.35。,
來電,2017/1/5 下午4:21,95599,【中國農業銀行】您尾號9672的農行賬戶於01月05日16時21分完成一筆支付寶浙交易,金額為-329.00,餘額2923.35。,
來電,2017/1/5 下午5:56,95599,【中國農業銀行】您尾號9672的農行賬戶於01月05日17時56分完成一筆支付寶交易,金額為-20.00,餘額2903.35。,
來電,2017/1/9 上午10:33,106906615500,【京東】還剩最後兩天!PLUS會員新年特權,開通立送2000京豆,獨享全品類神券,確定要錯過? dc.jd.com/auVjQQ 回TD退訂,
來電,2017/1/10 下午1:10,106980005618000055,【京東】我是京東配送員:韓富韓,您的訂單正在配送途中,請準備收貨,聯絡電話:15005125027。,
來電,2017/1/10 下午3:13,106906615500,【京東】等著放假,忘了您的PLUS賬戶中還有超過2000待返京豆?現在開通PLUS正式使用者即可到賬,還可享受高於普通使用者10倍的購物回饋,隨時京豆拿到手軟。另有全年360元運費補貼、專享商品、專屬客服等權益。戳 dc.jd.com/XhuKQQ 開通。回TD退訂,
(Data source: SMS messages exported from a phone in CSV format)
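As a rough illustration of the export format above, the sketch below reads a few of the sample rows with the standard csv module, keeps the ones sent by 95599, and converts the 12-hour 上午/下午 timestamp into a datetime. The names SAMPLE, parse_timestamp and bank_rows are made up for this example and are not part of the original program, and the sketch assumes every row has the same column order as the sample.

```python
import csv
import io
from datetime import datetime

# Three rows copied from the sample above (two bank alerts, one unrelated message).
SAMPLE = (
    "來電,2017/1/5 上午11:55,95599,【中國農業銀行】您尾號9672的農行賬戶於01月05日11時54分完成一筆支付寶交易,金額為-18.00,餘額3905.35。,\n"
    "來電,2017/1/5 下午12:10,95599,【中國農業銀行】您尾號9672的農行賬戶於01月05日12時10分完成一筆現支交易,金額為-200.00,餘額3705.35。,\n"
    "來電,2017/1/10 下午1:10,106980005618000055,【京東】我是京東配送員:韓富韓,您的訂單正在配送途中,請準備收貨,聯絡電話:15005125027。,\n"
)


def parse_timestamp(text):
    """Convert a string such as '2017/1/5 下午1:47' into a datetime."""
    date_part, clock = text.split()
    meridiem, clock = clock[:2], clock[2:]  # "上午" (AM) or "下午" (PM), then H:MM
    hour, minute = (int(part) for part in clock.split(":"))
    hour %= 12  # 12 o'clock becomes 0 before the PM offset is added
    if meridiem == "下午":
        hour += 12
    year, month, day = (int(part) for part in date_part.split("/"))
    return datetime(year, month, day, hour, minute)


def bank_rows(csv_text, sender="95599"):
    """Yield (datetime, message text) for every row sent by `sender`."""
    for row in csv.reader(io.StringIO(csv_text)):
        # Assumed column order: direction, timestamp, sender, message.
        if len(row) >= 4 and row[2] == sender:
            yield parse_timestamp(row[1]), row[3]


for when, message in bank_rows(SAMPLE):
    print(when.isoformat(), message[:8])
```

Working with datetime objects rather than strings means the rows can later be sorted or placed on a time axis directly.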
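Goals
Stage one: using the Agricultural Bank of China SMS alerts, plot a chart of the account balance over time.
Stage two: find a way to show what each expense was for and how much was spent.
Stage three: make the text processing more flexible, so that it relies on understanding the natural-language meaning of each message rather than on simple string slicing.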
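One possible first step toward stages two and three is to replace the fixed-offset slicing with a regular expression that pulls out the transaction type, the amount and the balance. The sketch below is only an assumption about how that might look: the pattern matches the wording of the sample rows shown earlier, it will not recognize other message layouts, and the names PATTERN, parse_message and spending_by_kind are invented for this example. This is pattern matching, not natural-language understanding.

```python
import re
from collections import defaultdict
from decimal import Decimal

# Message texts copied from the sample data.
MESSAGES = [
    "【中國農業銀行】您尾號9672的農行賬戶於01月05日11時54分完成一筆支付寶交易,金額為-18.00,餘額3905.35。",
    "【中國農業銀行】您尾號9672的農行賬戶於01月05日12時10分完成一筆現支交易,金額為-200.00,餘額3705.35。",
    "【中國農業銀行】您尾號9672的農行賬戶於01月05日12時35分完成一筆支付寶交易,金額為-50.00,餘額3650.35。",
]

# Accept both the full-width and the ASCII comma between the fields.
PATTERN = re.compile(
    r"完成一筆(?P<kind>.+?)交易[,,]"
    r"金額為(?P<amount>-?\d+(?:\.\d+)?)[,,]"
    r"餘額(?P<balance>-?\d+(?:\.\d+)?)"
)


def parse_message(text):
    """Return (kind, amount, balance), or None if the text does not match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match["kind"], Decimal(match["amount"]), Decimal(match["balance"])


def spending_by_kind(messages):
    """Sum the money spent (negative amounts) per transaction kind."""
    totals = defaultdict(Decimal)
    for text in messages:
        parsed = parse_message(text)
        if parsed is None:
            continue  # not a bank alert in the expected layout
        kind, amount, _balance = parsed
        if amount < 0:
            totals[kind] += -amount
    return dict(totals)


for kind, total in spending_by_kind(MESSAGES).items():
    print(kind, total)
```

Decimal is used instead of float here so that sums of currency amounts stay exact.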
Below is the code for stage one. Questions and suggestions are welcome!
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sun Jul 22 22:13:20 2018
@author: mrzhang
"""
import csv
import os
import matplotlib.pyplot as plt
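

class DealMessage:
    """Parse bank SMS alerts from message.csv and plot the account balance."""

    def __init__(self):
        self.home_path = os.getcwd()  # current working directory (absolute path)
        self.filename = os.path.join(self.home_path, "message.csv")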
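
    def get_cvs_list(self):
        """Read every row of the CSV file into a list of lists."""
        # Assumption: the export is UTF-8; change the encoding (e.g. "gbk") if it is not.
        with open(self.filename, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            list_read = list(reader)
        return list_read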
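
    def get_yinghang_message_list(self):
        """Keep only the bank (95599) rows, each as [timestamp, field, field, ...]."""
        total_list = self.get_cvs_list()
        money_list = []
        for each_line in total_list:
            # Skip blank or short rows instead of failing with an IndexError.
            if len(each_line) > 3 and each_line[2] == '95599':
                timestamp, text = each_line[1], each_line[3]
                # Drop the fixed 37-character preamble, then split the rest into fields.
                # The sample rows use the full-width comma ",", so normalize it first.
                each_line_list = text[37:].replace(',', ',').split(',')
                each_line_list.insert(0, timestamp)
                money_list.append(each_line_list)  # add to the new list
        return money_list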

    def get_type_by_parameter(self, num):
        """Return the bank rows that have exactly `num` fields.

        The alerts come in two layouts, which are told apart by field count.
        """
        money_list = self.get_yinghang_message_list()
        first_list = []
        for each in money_list:
            if len(each) == num:
                first_list.append(each)
        return first_list

    def deal_time_form(self, messages):
        """Prepend a 24-hour timestamp such as 2017/1/5/13/47 to each message."""
        for each in messages:
            date, clock = each[0].split()
            meridiem, clock = clock[:2], clock[2:]  # "上午" (AM) or "下午" (PM), then H:MM
            shi, feng = clock.split(":")
            shi = int(shi) % 12  # 12 o'clock becomes 0 before the PM offset is added
            if meridiem == "下午":
                shi += 12
            final_time = date + "/" + str(shi) + "/" + feng
            each.insert(0, final_time)

    def choose_message_by_time(self, is_before_0223):
        """Normalize one message layout: prepend a 24-hour timestamp and the balance.

        The two layouts differ in their field count and in how many characters
        precede the balance figure in the last field.
        """
        if is_before_0223:
            num = 4
            remove_num = 2
        else:
            num = 3
            remove_num = 5
        messages = self.get_type_by_parameter(num)
        for each in messages:
            # Convert the 12-hour timestamp to a form such as 2017/1/5/13/47.
            date, clock = each[0].split()
            meridiem, clock = clock[:2], clock[2:]  # "上午" (AM) or "下午" (PM), then H:MM
            shi, feng = clock.split(":")
            shi = int(shi) % 12  # 12 o'clock becomes 0 before the PM offset is added
            if meridiem == "下午":
                shi += 12
            final_time = date + "/" + str(shi) + "/" + feng
            each.insert(0, final_time)
            # Extract the balance: skip the label prefix and the trailing "。".
            money = each[-1][remove_num:][0:-1]
            each.insert(1, money)
        return messages

    def get_x_y(self):
        """Return the timestamps (x) and the balances (y)."""
        messages = self.choose_message_by_time(True) + self.choose_message_by_time(False)
        time_list = []
        money_list = []
        for each in messages:
            time_list.append(each[0])
            money_list.append(float(each[1]))
        # x keeps every third timestamp from index 35 on; draw_picture does not use it.
        return time_list[35::3], money_list

    def draw_picture(self):
        """Draw a line chart of how the balance changes."""
        x, y = self.get_x_y()
        plt.figure(figsize=(16, 4))  # create the figure
        plt.plot(y, 'r')  # red line: balance against message index
        plt.xlabel("Time")
        plt.ylabel("Money")
        plt.title("money")
        plt.grid(True)
        plt.savefig("line.jpg")  # save before show(); saving afterwards can write a blank image
        plt.show()  # display the chart


m = DealMessage()  # create the message handler
m.draw_picture()  # draw and save the chart
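The program above plots the balance against the message index and leaves the timestamps unused. A minimal sketch of one way to put real times on the x axis follows, assuming the timestamps have already been converted to datetime objects. The names POINTS, build_series and plot_balance are hypothetical, the three data points are taken from the sample rows, and matplotlib is imported inside plot_balance so that the data helper can run without it.

```python
from datetime import datetime

# (time, balance) pairs from the sample rows, deliberately out of order.
POINTS = [
    (datetime(2017, 1, 5, 12, 35), 3650.35),
    (datetime(2017, 1, 5, 11, 55), 3905.35),
    (datetime(2017, 1, 5, 12, 10), 3705.35),
]


def build_series(points):
    """Sort (datetime, balance) pairs chronologically and split them into two lists."""
    ordered = sorted(points, key=lambda point: point[0])
    times = [when for when, _ in ordered]
    balances = [balance for _, balance in ordered]
    return times, balances


def plot_balance(times, balances, path="line.png"):
    """Plot the balance against time and save the chart to `path`."""
    import matplotlib.pyplot as plt  # imported here so build_series works without it

    fig, ax = plt.subplots(figsize=(16, 4))
    ax.plot(times, balances, "r")  # matplotlib accepts datetime objects on the x axis
    ax.set_xlabel("Time")
    ax.set_ylabel("Money")
    ax.set_title("money")
    ax.grid(True)
    fig.autofmt_xdate()  # rotate the date labels so they do not overlap
    fig.savefig(path)
    plt.close(fig)


times, balances = build_series(POINTS)
print(times[0], balances[0])
```

Calling plot_balance(times, balances) would then write line.png to the working directory.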
Program output (a line chart of the account balance; the image is not available here):
Feel free to repost; discussion is welcome!