Persisting Python 3 Crawler Data to a File

When data is saved to a file with json.dumps(), Chinese characters do not display properly.

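import json

def write_to_file(content):
    '''
    Persist one record to a txt file.
    :param content: dictionary object
    :return:
    '''
    # 'a' opens the file in append mode; json.dumps() is called with its
    # defaults here, so Chinese characters are written as \uXXXX escapes
    with open('maoyanTop100.txt', 'a', encoding='utf8') as f:
        f.write(json.dumps(content) + '\n')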

The file contents look like this:

{"the_index": "21", "image_url": "http://p0.meituan.net/movie/[email protected]_220h_1e_1c", "title": "\u6307\u73af\u738b3\uff1a\u738b\u8005\u65e0\u654c", "actor": "\u4f0a\u83b1\u8d3e\u00b7\u4f0d\u5fb7,\u4f0a\u6069\u00b7\u9ea6\u514b\u83b1\u6069,\u4e3d\u8299\u00b7\u6cf0\u52d2", "the_time": "2004-03-15", "score": "9.2"}
...

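By default, json.dumps() serializes with ensure_ascii=True, which escapes every non-ASCII character as a \uXXXX sequence. The result is still valid JSON and loads back correctly, but it is not human-readable. To write the actual Chinese characters, pass ensure_ascii=False.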
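
The difference can be seen without touching a file. The following is a minimal sketch; the `record` dictionary is a made-up sample rather than real crawler output, and the variable names are only for illustration.

```python
import json

# Hypothetical sample record, shaped like the crawler's output
record = {"title": "霸王別姬", "score": "9.6"}

escaped = json.dumps(record)                       # default: ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)  # non-ASCII characters kept as-is

print(escaped)   # Chinese characters appear as \uXXXX escapes
print(readable)  # Chinese characters appear directly

# Both strings decode back to the same dictionary
assert json.loads(escaped) == json.loads(readable) == record
```

Either form is valid JSON, so the choice only affects how the file reads to a person, not what a program loads from it.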

Adding ensure_ascii=False

import json

def write_to_file(content):
    '''
    Persist one record to a txt file.
    :param content: dictionary object
    :return:
    '''
    # encoding='utf8' and ensure_ascii=False together make the Chinese
    # text appear as-is in the file
    with open('maoyanTop100.txt', 'a', encoding='utf8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')

The file contents look like this:

{"the_index": "1", "image_url": "http://p1.meituan.net/movie/[email protected]_220h_1e_1c", "title": "霸王別姬", "actor": "張國榮,張豐毅,鞏俐", "the_time": "1993-01-01", "score": "9.6"}
...
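
Because each record is written as one JSON object per line, the file can be read back line by line. The sketch below is an assumption-laden illustration, not part of the original crawler: `read_from_file` is a hypothetical helper, `write_to_file` is given an extra `path` parameter so the demo can use a temporary file, and the two records are made-up samples.

```python
import json
import os
import tempfile

def write_to_file(content, path):
    # Variant of the function above; the path parameter is added for this demo
    with open(path, 'a', encoding='utf8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')

def read_from_file(path):
    # Hypothetical helper: parse one JSON object per non-empty line
    with open(path, encoding='utf8') as f:
        return [json.loads(line) for line in f if line.strip()]

# Write two sample records to a temporary file, then load them again
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
write_to_file({"title": "霸王別姬", "score": "9.6"}, path)
write_to_file({"title": "指環王3", "score": "9.2"}, path)

records = read_from_file(path)
print(records[0]["title"])
```

Opening the file with the same encoding='utf8' on both the write and read side matters here; relying on the platform default encoding may fail on systems where it is not UTF-8.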