
Analyzing SMS Data with Python

A sample of the raw data:

來電,2017/1/5 上午11:55,95599,【中國農業銀行】您尾號9672的農行賬戶於01051154分完成一筆支付寶交易,金額為-18.00,餘額3905.35。,
來電,2017/1/5 下午12:10,95599,【中國農業銀行】您尾號9672的農行賬戶於01051210分完成一筆現支交易,金額為-200.00,餘額3705.35。,
來電,2017/1/5 下午12:35,95599,【中國農業銀行】您尾號9672的農行賬戶於01051235分完成一筆支付寶交易,金額為-50.00,餘額3650.35。,
來電,2017/1/5 下午1:47,95599,【中國農業銀行】您尾號9672的農行賬戶於01051347分完成一筆支付寶浙交易,金額為-199.00,餘額3451.35。,
來電,2017/1/5 下午2:45,95599,【中國農業銀行】您尾號9672的農行賬戶於01051445分完成一筆消費交易,金額為-199.00,餘額3252.35。,
來電,2017/1/5 下午4:21,95599,【中國農業銀行】您尾號9672的農行賬戶於01051621分完成一筆支付寶浙交易,金額為-329.00,餘額2923.35。,
來電,2017/1/5 下午5:56,95599,【中國農業銀行】您尾號9672的農行賬戶於01051756分完成一筆支付寶交易,金額為-20.00,餘額2903.35。,
來電,2017/1/9 上午10:33,106906615500,【京東】還剩最後兩天!PLUS會員新年特權,開通立送2000京豆,獨享全品類神券,確定要錯過? dc.jd.com/auVjQQ 回TD退訂,
來電,2017/1/10 下午1:10,106980005618000055,【京東】我是京東配送員:韓富韓,您的訂單正在配送途中,請準備收貨,聯絡電話:15005125027。,
來電,2017/1/10 下午3:13,106906615500,【京東】等著放假,忘了您的PLUS賬戶中還有超過2000待返京豆?現在開通PLUS正式使用者即可到賬,還可享受高於普通使用者10倍的購物回饋,隨時京豆拿到手軟。另有全年360元運費補貼、專享商品、專屬客服等權益。戳 dc.jd.com/XhuKQQ 開通。回TD退訂,

(Data source: SMS messages exported from a phone in CSV format)

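Goals

Stage 1: using the Agricultural Bank of China SMS alerts, plot the account balance against time.
Stage 2: find a way to show what each payment was for and how much was spent.
Stage 3: make the text handling more flexible, replacing plain string slicing with natural-language processing that understands what each message says.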
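As a possible starting point for stage 2, the sketch below pulls the transaction type, amount, and balance out of a message body with a regular expression instead of fixed slice offsets. The pattern is inferred from the sample messages shown earlier, and parse_transaction is a hypothetical helper name; messages with a different layout would need their own pattern.

```python
import re

# Pattern inferred from the sample messages (an assumption, not a bank spec).
# Note the full-width commas inside the message text.
PATTERN = re.compile(
    r"完成一筆(?P<kind>.+?)交易,金額為(?P<amount>-?\d+(?:\.\d+)?)"
    r",餘額(?P<balance>-?\d+(?:\.\d+)?)"
)


def parse_transaction(body):
    """Return (kind, amount, balance), or None if the body does not match."""
    match = PATTERN.search(body)
    if match is None:
        return None
    return match["kind"], float(match["amount"]), float(match["balance"])


body = "【中國農業銀行】您尾號9672的農行賬戶於01051154分完成一筆支付寶交易,金額為-18.00,餘額3905.35。"
result = parse_transaction(body)
print(result)
```

Because the pattern anchors on the wording rather than on character positions, it keeps working when the account number or the transaction type changes length.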

Below is the code for stage 1. Questions and suggestions are welcome!

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sun Jul 22 22:13:20 2018

@author: mrzhang
"""

import csv
import os
import matplotlib.pyplot as plt


class DealMessage:

    def __init__(self):
        self.home_path = os.getcwd()  # current working directory
        self.filename = os.path.join(self.home_path, "message.csv")

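    def get_cvs_list(self):
        ''' read every row of the CSV file into a list '''
        # utf-8 is an assumption about how the phone exported the file
        with open(self.filename, encoding="utf-8", newline="") as f:  # open file
            reader = csv.reader(f)
            list_read = list(reader)
        return list_read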

    def get_yinghang_message_list(self):
        ''' keep only the bank's messages and drop the unused columns '''
        total_list = self.get_cvs_list()
        money_list = []
        for each_line in total_list:
            # skip short or malformed rows; 95599 is the bank's sender number
            if len(each_line) > 3 and each_line[2] == '95599':
                timestamp, body = each_line[1], each_line[3]
                # drop the fixed-length prefix, then split on the full-width comma
                each_line_list = body[37:].split(',')
                each_line_list.insert(0, timestamp)
                money_list.append(each_line_list)  # add to a new list
        return money_list

    def get_type_by_parameter(self, num):
        ''' there are 2 types of data, use len of data to distinguish it '''
        money_list = self.get_yinghang_message_list()
        first_list = []
        for each in money_list:
            if len(each) == num:
                first_list.append(each)
        return first_list

    def deal_time_form(self, messages):
        ''' transform time form like 1995/02/07/02/23 '''
        for each in messages:
            correct_time = each[0].split()
            date = correct_time[0]
            time = correct_time[1]
            period = time[0:2]  # "上午" (AM) or "下午" (PM); read it before stripping
            shi, feng = time[2:].split(":")
            shi = int(shi) % 12  # 12 o'clock becomes 0 before the PM offset
            if period == "下午":  # transform into 24h form
                shi += 12
            final_time = date + "/" + str(shi) + "/" + feng
            each.insert(0, final_time)

    def choose_message_by_time(self, is_before_0223):
        ''' normalize the two message layouts, handling time and money at the same time '''
        if is_before_0223:
            num = 4
            remove_num = 2
        else:
            num = 3
            remove_num = 5
        messages = self.get_type_by_parameter(num)
        for each in messages:
            # deal with time, transform time form like 1995/12/17/02/23
            correct_time = each[0].split()
            date = correct_time[0]
            time = correct_time[1]
            period = time[0:2]  # "上午" (AM) or "下午" (PM); read it before stripping
            shi, feng = time[2:].split(":")
            shi = int(shi) % 12  # 12 o'clock becomes 0 before the PM offset
            if period == "下午":  # transform time-form into 24h-form
                shi += 12
            final_time = date + "/" + str(shi) + "/" + feng
            each.insert(0, final_time)
            # deal with money: strip the leading label and the trailing full stop
            money = each[-1][remove_num:][0:-1]
            each.insert(1, money)
        return messages

    def get_x_y(self):
        ''' get time (x) and money (y) as two lists of equal length '''
        messages = self.choose_message_by_time(True) + self.choose_message_by_time(False)
        time_list = []
        money_list = []
        for each in messages:
            time_list.append(each[0])
            money_list.append(float(each[1]))
        return time_list, money_list

    def draw_picture(self):
        ''' draw a picture about money change '''
        x, y = self.get_x_y()
        plt.figure(figsize=(16, 4))  # create figure object
        plt.plot(y, 'r')  # balance in red, drawn against message index (x is not used)
        plt.xlabel("Time")
        plt.ylabel("Money")
        plt.title("money")
        plt.grid(True)

        plt.savefig("line.jpg")  # save before show(); saving afterwards can write a blank image
        plt.show()  # show picture

m = DealMessage() # get a class object
m.draw_picture() # draw picture
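
The listing keeps timestamps as slash-separated strings. If real time values are wanted on the x-axis, one option is to convert each timestamp into a datetime object. The sketch below assumes the "上午"/"下午" prefix format seen in the sample data; parse_timestamp is a hypothetical helper, not part of the class above.

```python
from datetime import datetime


def parse_timestamp(text):
    """Convert a string such as '2017/1/5 下午1:47' into a datetime."""
    date_part, clock = text.split()
    period, clock = clock[:2], clock[2:]  # "上午" (AM) or "下午" (PM)
    hour, minute = (int(part) for part in clock.split(":"))
    hour %= 12  # 12 o'clock becomes 0 before the PM offset
    if period == "下午":
        hour += 12
    year, month, day = (int(part) for part in date_part.split("/"))
    return datetime(year, month, day, hour, minute)


for sample in ("2017/1/5 上午11:55", "2017/1/5 下午12:10", "2017/1/5 下午1:47"):
    print(sample, "->", parse_timestamp(sample))
```

The converted values agree with the times embedded in the message bodies (for example, 下午1:47 corresponds to 01051347), and matplotlib accepts a list of datetime objects directly as x values.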

Program output:
(Result figure)

Feel free to repost. Comments are welcome!