Web Scraping Part 2: Storing Python-Scraped Data in MySQL
阿新 • Published: 2020-12-19
The data could also be stored in Hive or HDFS; here we use MySQL.
1. Install MySQL (Python was already configured in the pyspark section)
https://blog.csdn.net/zhouzezhou/article/details/52446608
If the bin directory cannot be found after installation, see:
https://blog.csdn.net/cuicui_ruirui/article/details/105840107
2. Install the Python library for working with MySQL
pip install pymysql -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
3. Create the corresponding table in MySQL

mysql> create database pachong;
mysql> use pachong;
-- fields to collect: url, author, bookname, chubanshe, time_print, price
mysql> create table dangdang_book(
    url varchar(100) not null,
    author varchar(20),
    bookname varchar(50),
    chubanshe varchar(50),
    time_print varchar(50),
    price varchar(10)
) charset=utf8;
4. Test writing data from Python into MySQL

import pymysql

conn = pymysql.connect(host='localhost', port=3306, user='root',
                       password='****', db='pachong', charset='utf8')
cursor = conn.cursor()
cursor.execute("insert into `dangdang_book` values('aa','bb','c','d','e','f');")
conn.commit()
cursor.close()
conn.close()
mysql> select * from dangdang_book;
+-----+--------+----------+-----------+------------+-------+
| url | author | bookname | chubanshe | time_print | price |
+-----+--------+----------+-----------+------------+-------+
| aa  | bb     | c        | d         | e          | f     |
+-----+--------+----------+-----------+------------+-------+
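The hard-coded INSERT above is fine as a smoke test, but real scraped strings can contain quotes and other characters that break a concatenated SQL statement. A minimal sketch of the same insert using driver-side parameter binding (connection settings as in the post, password elided; `insert_one` is a helper name introduced here, not from the original):

```python
# Parameterized INSERT: pymysql substitutes each %s safely, so quotes or
# commas inside scraped values cannot corrupt the statement.
INSERT_SQL = ("insert into dangdang_book "
              "(url, author, bookname, chubanshe, time_print, price) "
              "values (%s, %s, %s, %s, %s, %s)")

def insert_one(row):
    """Insert a single 6-tuple (url, author, bookname, chubanshe, time_print, price)."""
    import pymysql  # third-party: pip install pymysql
    conn = pymysql.connect(host='localhost', port=3306, user='root',
                           password='****', db='pachong', charset='utf8')
    try:
        with conn.cursor() as cursor:
            cursor.execute(INSERT_SQL, row)
        conn.commit()
    finally:
        conn.close()
```

For example, `insert_one(('aa', 'bb', 'c', 'd', 'e', 'f'))` reproduces the test row above without building the SQL string by hand.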
5. Next, the URLs can be saved to an Excel file in bulk, and Python can read the URLs from Excel and insert each record into MySQL one by one.
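Step five can be sketched as a small pipeline. Assumptions not in the original post: the workbook has one URL per row in column A of the first worksheet, and `scrape_book(url)` is a hypothetical parser (passed in by the caller) that returns the six fields as a tuple in column order:

```python
def load_urls(xlsx_path):
    """Return the non-empty values from column A of the first worksheet."""
    import openpyxl  # third-party: pip install openpyxl
    ws = openpyxl.load_workbook(xlsx_path).active
    return [row[0].value for row in ws.iter_rows(min_col=1, max_col=1)
            if row[0].value]

def store_books(urls, conn, scrape_book):
    """Scrape each URL and batch-insert the rows; commit once at the end."""
    sql = ("insert into dangdang_book "
           "(url, author, bookname, chubanshe, time_print, price) "
           "values (%s, %s, %s, %s, %s, %s)")
    rows = [scrape_book(u) for u in urls]  # one 6-tuple per URL
    with conn.cursor() as cursor:
        cursor.executemany(sql, rows)      # single round-trip batch insert
    conn.commit()
```

A single commit after `executemany` keeps the batch atomic: either all rows from the Excel file land in `dangdang_book`, or none do.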