Public Opinion Analysis Project: Why the Chongqing Bus Plunged into the River

阿新 • Published: 2018-11-04

Project overview

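1. Event analyzed: why the Chongqing bus plunged into the river
2. Objects of analysis:
(1) Netizens' comments (basic classification: word segmentation plus keyword matching; advanced classification: natural-language understanding that maps each comment to a human emotion or intention, e.g. positive, negative, helpless, sarcastic, constructive, abusive, rational analysis, hindsight, peacemaking)
(2) Commenters' public IP addresses (used to identify internet users in different regions and measure how much attention each region pays to the event)
(3) Commenters' province (same purpose as above)
3. Data source:
Sina comments: http://comment5.news.sina.com.cn/comment/skin/default.html?channel=gn&newsid=comos-hnfikve6671738&group=0
4. Other:
Data to prepare:
(1) China's administrative-division data, covering all provinces, cities and counties
(2) Data on the world's countries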
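Item (1) describes a basic classification that matches the words of each comment against categories. A minimal sketch of that idea, assuming a tiny hand-written lexicon (the category names and keywords below are hypothetical) and using plain substring matching in place of real Chinese word segmentation, for which a tokenizer such as jieba could be used:

```python
from collections import Counter

# Hypothetical keyword lexicon: category -> trigger words (illustrative only).
LEXICON = {
    "positive": ["致敬", "加油", "感动"],
    "negative": ["可怕", "心痛", "悲剧"],
    "constructive": ["建议", "应该", "安全教育"],
}


def classify(comment: str) -> str:
    """Return the category whose keywords occur most often in the comment.

    Substring counting stands in for real word segmentation; returns
    "unknown" when no keyword matches at all.
    """
    scores = Counter()
    for category, words in LEXICON.items():
        for word in words:
            scores[category] += comment.count(word)
    if max(scores.values()) == 0:
        return "unknown"
    # On a tie, most_common keeps the category listed first in LEXICON.
    return scores.most_common(1)[0][0]


if __name__ == "__main__":
    print(classify("太可怕了，心痛"))        # negative
    print(classify("建议加强乘客安全教育"))  # constructive
    print(classify("今天天气不错"))          # unknown
```

The advanced classification in the same item (mapping comments to sarcasm, hindsight and so on) is beyond what keyword counting can do; this sketch only covers the basic tier.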

(I) Data preparation: collecting the comment data

1. Fields to collect

Three core fields: comment, IP, province
Other fields: number of likes received, etc.
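The province field has to be derived from the area value that Sina returns with each comment. A minimal sketch, assuming the area string starts with a province name (for example a province followed by a city), which the post does not document; PROVINCES and COUNTRIES are tiny hypothetical excerpts standing in for the administrative-division and country data listed above, and area_to_province is a hypothetical helper name:

```python
from typing import Optional

# Hypothetical excerpts; the real lists would come from the prepared
# administrative-division data and world country data.
PROVINCES = ["北京", "上海", "重庆", "四川", "广东", "内蒙古", "黑龙江"]
COUNTRIES = ["美国", "日本", "加拿大"]


def area_to_province(area: Optional[str]) -> str:
    """Map a raw area string to a province name, "overseas", or "unknown"."""
    if not area:
        return "unknown"
    area = area.strip()
    # Check longer names first so a longer name is never shadowed by a shorter one.
    for name in sorted(PROVINCES, key=len, reverse=True):
        if area.startswith(name):
            return name
    # Fall back to the country list for commenters outside China.
    for name in COUNTRIES:
        if area.startswith(name):
            return "overseas"
    return "unknown"


if __name__ == "__main__":
    for raw in ["广东广州", "重庆", "美国", "", None]:
        print(repr(raw), "->", area_to_province(raw))
```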

2. Data collection in Python

File structure: busremark.py (main collection script) and mylog.py (logging helper)

(1) Main Python code

In busremark.py:

import json
import time

import pymysql
import requests

from mylog import Logger

logger1 = Logger(logfile='log1.log', logname="log1", logformat=1).getlog()  # custom logger from mylog.py

# Connect to the MySQL database
connect = pymysql.connect(
    host='localhost',
    port=3306,
    user='root',
    password='root',
    database='analyze',
    charset='utf8'
)
# Get a cursor
cursor = connect.cursor()

# SQL statement that inserts one comment
sql_1 = "insert into all_remarks(nick, content, agree, area, ip, time, profile_img) values(%s,%s,%s,%s,%s,%s,%s)"

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
}

for page_num in range(1, 6001):  # page_size=1, so pages 1..6000 yield up to 6000 comments

    if page_num % 50 == 0:  # pause for 2 seconds after every 50 requests
        time.sleep(2)

    url = "http://comment5.news.sina.com.cn/page/info?version=1&format=js&channel=gn&newsid=comos-hnfikve6671738&group=0&compress=0&ie=utf8&oe=utf8&page=" + str(
        page_num) + "&page_size=1&jsvar=loader_1541133929419_28637561"
    # url = "http://comment5.news.sina.com.cn/page/info?version=1&format=js&channel=gn&newsid=comos-hnfikve6671738&group=0&compress=0&ie=gbk&oe=gbk&page=1&page_size=2&jsvar=loader_1541133929419_28637561"

    try:  # try to fetch this page
        # Send the request (with a timeout so one stalled request cannot hang the loop)
        response = requests.get(url, headers=headers, timeout=10)
        # The URL asks for UTF-8 output (oe=utf8); json.loads decodes any \uXXXX escapes itself
        data_str = response.content.decode('utf-8')
        # Drop the "var loader_...=" assignment in front of the JSON
        # (str.lstrip would strip a set of characters, not this literal prefix)
        data_str = data_str.split("=", 1)[1]
        # Parse the JSON string into a dict
        data_dict = json.loads(data_str)
        # All comments in this response
        all_remarks = data_dict['result']['cmntlist']
        print(len(all_remarks))

        for i, c in enumerate(all_remarks, start=1):  # store each comment in MySQL
            print(i, "*" * 100)
            nick = c["nick"]  # nickname
            content = c["content"]  # comment text
            agree = int(c["agree"])  # number of likes received
            area = c["area"]  # region
            ip = c["ip"]  # source IP
            cmnt_time = c["time"]  # time the comment was posted
            profile_img = c["profile_img"]  # avatar URL

            print(nick, content, agree, ip, cmnt_time, profile_img, sep="\n")

            # Insert the row and commit it
            data = (nick, content, agree, area, ip, cmnt_time, profile_img)
            cursor.execute(sql_1, data)
            connect.commit()
    except Exception as e:  # log the failure and move on to the next page
        my_e = str(e) + " ==> " + str(url)
        logger1.warning(my_e)

cursor.close()
connect.close()
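Two steps of the collector, building the page URL and removing the var loader_...= assignment in front of the JSON, can be factored into small helpers. Note that str.lstrip is not suitable for removing such a prefix: it strips any leading characters that appear in its argument, not the literal string. A minimal sketch; build_page_url and parse_jsvar_response are hypothetical names, and the query parameters are copied from the URL in the code above:

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://comment5.news.sina.com.cn/page/info"


def build_page_url(page_num: int, page_size: int = 1) -> str:
    """Build the comment-API URL for one page (parameters copied from the post)."""
    params = {
        "version": 1,
        "format": "js",
        "channel": "gn",
        "newsid": "comos-hnfikve6671738",
        "group": 0,
        "compress": 0,
        "ie": "utf8",
        "oe": "utf8",
        "page": page_num,
        "page_size": page_size,
        "jsvar": "loader_1541133929419_28637561",
    }
    return BASE_URL + "?" + urlencode(params)


def parse_jsvar_response(text: str) -> dict:
    """Remove a leading "var <name>=" assignment and parse the JSON payload."""
    text = text.strip()
    if text.startswith("var "):
        # Split at the first "=" only; the JSON body may contain "=" as well.
        _, text = text.split("=", 1)
    # Tolerate a trailing semicolon after the object.
    return json.loads(text.rstrip(";"))


if __name__ == "__main__":
    print(build_page_url(3))
    print(parse_jsvar_response('var loader_1={"result": {"cmntlist": []}};'))
```

Inside the loop, url = build_page_url(page_num) and data_dict = parse_jsvar_response(response.content.decode("utf-8")) would replace the string concatenation and the prefix removal. requests can also assemble the query string itself through requests.get(BASE_URL, params=...).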

(2) Python logging

In mylog.py:

# A small logging helper that sends log records both to the console and to a log file
import logging

# Output formats, keyed by number
format_dict = {
    1: logging.Formatter('%(asctime)s - %(name)s - %(filename)s - %(levelname)s - %(message)s'),
    2: logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'),
    3: logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'),
    4: logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'),
    5: logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
}


class Logger:
    def __init__(self, logfile, logname, logformat):
        '''
        Set up the logger named logname so that its records, in the format
        chosen by logformat, go to the file logfile and to the console.
        '''

        # Create (or fetch) the logger; getLogger returns the same object for the same name
        self.logger = logging.getLogger(logname)
        self.logger.setLevel(logging.DEBUG)

        # Attach handlers only once; otherwise creating a second Logger with
        # the same logname would make every message appear twice
        if self.logger.handlers:
            return

        # Handler that writes to the log file (UTF-8 so Chinese text is stored correctly)
        fh = logging.FileHandler(logfile, encoding='utf-8')
        fh.setLevel(logging.DEBUG)

        # Handler that writes to the console
        ch = logging.StreamHandler()
        ch.setLevel(logging.DEBUG)

        # Output format shared by both handlers
        formatter = format_dict[int(logformat)]
        fh.setFormatter(formatter)
        ch.setFormatter(formatter)

        # Attach the handlers to the logger
        self.logger.addHandler(fh)
        self.logger.addHandler(ch)

    def getlog(self):
        return self.logger


if __name__ == '__main__':
    logger1 = Logger(logfile='log1.txt', logname="fox1", logformat=1).getlog()
    logger1.debug('i am debug')
    logger1.info('i am info')
    logger1.warning('i am warning')
    logger2 = Logger(logfile='log2.txt', logname="fox2", logformat=2).getlog()
    logger2.debug('i am debug2')
    logger2.info('i am info2')
    logger2.warning('i am warning2')
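The collector logs only str(e) and the URL when a page fails, which drops the traceback. The standard logging method Logger.exception logs at ERROR level and appends the active traceback when it is called from an except block. A minimal sketch; fetch_page is a hypothetical stand-in that always fails, and the in-memory stream only makes the output easy to inspect:

```python
import io
import logging


def fetch_page(page_num: int) -> dict:
    """Hypothetical stand-in for one request; always fails in this sketch."""
    raise ValueError(f"bad response for page {page_num}")


def crawl(pages, logger: logging.Logger) -> int:
    """Try each page, log failures with their traceback, return the failure count."""
    failures = 0
    for page_num in pages:
        try:
            fetch_page(page_num)
        except Exception:
            failures += 1
            # Logs at ERROR level and appends the current traceback.
            logger.exception("page %d failed", page_num)
    return failures


if __name__ == "__main__":
    stream = io.StringIO()
    demo = logging.getLogger("crawl_demo")
    demo.setLevel(logging.DEBUG)
    demo.addHandler(logging.StreamHandler(stream))
    print(crawl([1, 2], demo))               # 2
    print("Traceback" in stream.getvalue())  # True
```

In busremark.py the same idea would mean calling logger1.exception(url) inside the except block instead of logger1.warning(my_e).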

3. SQL statement for creating the table

/*
Navicat MySQL Data Transfer

Source Server         : win7_local
Source Server Version : 50717
Source Host           : localhost:3306
Source Database       : analyze

Target Server Type    : MYSQL
Target Server Version : 50717
File Encoding         : 65001

Date: 2018-11-02 17:12:24
*/

SET FOREIGN_KEY_CHECKS=0;

-- ----------------------------
-- Table structure for all_remarks
-- ----------------------------
DROP TABLE IF EXISTS `all_remarks`;
CREATE TABLE `all_remarks` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `nick` varchar(255) DEFAULT NULL,
  `content` text,
  `agree` int(10) DEFAULT NULL,
  `area` varchar(100) DEFAULT NULL,
  `ip` varchar(45) DEFAULT NULL,
  `time` datetime DEFAULT NULL,
  `profile_img` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
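Each comment is inserted and committed individually by the collector. Collecting rows into batches and inserting each batch with executemany saves round trips; PyMySQL cursors provide executemany as well, with %s placeholders. So that the sketch runs without a MySQL server, it swaps in Python's built-in sqlite3 module with a simplified copy of the all_remarks table (sqlite3 uses ? placeholders); save_batch and the sample rows are made up for illustration:

```python
import sqlite3

# Simplified SQLite stand-in for the MySQL all_remarks table.
DDL = """
CREATE TABLE all_remarks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    nick TEXT, content TEXT, agree INTEGER,
    area TEXT, ip TEXT, time TEXT, profile_img TEXT
)
"""
INSERT = ("INSERT INTO all_remarks (nick, content, agree, area, ip, time, profile_img) "
          "VALUES (?, ?, ?, ?, ?, ?, ?)")


def save_batch(conn: sqlite3.Connection, rows: list) -> None:
    """Insert a batch of comment tuples in a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(INSERT, rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    # Made-up rows in the column order of INSERT (documentation-range IPs).
    rows = [
        ("user_a", "comment one", 3, "重庆", "192.0.2.1", "2018-11-02 10:00:00", ""),
        ("user_b", "comment two", 0, "四川成都", "198.51.100.7", "2018-11-02 10:05:00", ""),
    ]
    save_batch(conn, rows)
    print(conn.execute("SELECT COUNT(*) FROM all_remarks").fetchone()[0])  # 2
```

With PyMySQL the call would be cursor.executemany(sql_1, rows) followed by one connect.commit() per batch; a failed batch then loses more rows at once, which is the trade-off against per-row commits.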

4. Screenshots of the results

The work above only covers the main data; the province data and the country-name data still need to be prepared.

After that, the data analysis can begin.
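As a first analysis step, the stored rows can be aggregated to see how attention to the event is distributed across regions. A minimal sketch, assuming rows have already been read from all_remarks as (province, agree) pairs, with area reduced to a province name beforehand (for example by prefix-matching it against the province list); attention_by_province and the sample data are hypothetical:

```python
from collections import Counter


def attention_by_province(rows):
    """Return (comment counts, total likes) per province.

    rows is an iterable of (province, agree) pairs; agree may be None
    because the agree column allows NULL.
    """
    comments = Counter()
    likes = Counter()
    for province, agree in rows:
        comments[province] += 1
        likes[province] += agree or 0
    return comments, likes


if __name__ == "__main__":
    sample = [("重庆", 5), ("四川", 2), ("重庆", 0), ("四川", 1), ("重庆", 3), ("广东", None)]
    comments, likes = attention_by_province(sample)
    print(comments.most_common(2))  # [('重庆', 3), ('四川', 2)]
    print(likes["重庆"])             # 8
```

Comment counts measure how many people in a region spoke up, while summed likes weight regions by how much agreement their comments drew; the two can rank regions differently.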

To be continued.
