How to generate 100-million-row data that can be imported into a database
阿新 • Published: 2019-02-09
1. A Python script makes it easy to generate data in the required format:
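# coding: utf-8
import datetime
import sys

N = 100000000  # number of rows to generate (100 million)


def gen_meid():
    # The original leaves this generator empty. Placeholder (hypothetical):
    # yields N unique 14-hex-digit device IDs; replace with your own scheme.
    for i in range(N):
        yield '%014X' % i


def gen_seq():
    # The original leaves this generator empty. Placeholder (hypothetical):
    # yields a decimal sequence number for each row.
    for i in range(N):
        yield str(i)


def generate_message(meid, seq):
    time_st = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    # r'\N' is the NULL marker read by MySQL LOAD DATA. It must be a raw
    # string in Python 3, where a bare '\N' is a syntax error.
    print('\t'.join((
        meid, seq, r'\N', r'\N', r'\N', r'\N', '0', '0', '0', '0', time_st,
        r'\N', r'\N', '0', r'\N', r'\N', r'\N', r'\N', time_st,
    )))


def main(args):
    # Header row: keep it for MongoDB, delete this print for MySQL.
    print('\t'.join((
        'deviceID', 'battery',
        # ... (the remaining column names are elided in the original)
        'accumulatedTime', 'createDate',
    )))
    # zip stops when the shorter generator is exhausted
    for meid, seq in zip(gen_meid(), gen_seq()):
        generate_message(meid, seq)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))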
$ python a.py > device.tsv
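Calling `print` once per row means 100 million `print` calls. Below is a minimal sketch of a batched variant. It assumes the same 19-field row layout as `generate_message` and uses placeholder IDs. The helper names `make_row`, `iter_batches` and `write_rows` are hypothetical, as are the hex/decimal counters used for the IDs. Rows are grouped into lists, and each list goes to the output in a single `writelines` call.

```python
import datetime
import sys

NULL = r'\N'  # MySQL LOAD DATA reads \N as NULL


def make_row(meid, seq, time_st):
    # Same 19-field layout as generate_message in the script above
    fields = (meid, seq, NULL, NULL, NULL, NULL, '0', '0', '0', '0', time_st,
              NULL, NULL, '0', NULL, NULL, NULL, NULL, time_st)
    return '\t'.join(fields) + '\n'


def iter_batches(n, batch_size=100000):
    """Yield lists of up to batch_size newline-terminated rows, n rows in total.

    The meid (14-digit hex counter) and seq (decimal counter) values are
    hypothetical placeholders.
    """
    batch = []
    for i in range(n):
        time_st = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        batch.append(make_row('%014X' % i, str(i), time_st))
        if len(batch) == batch_size:
            yield batch
            batch = []  # new list: the caller may still hold the yielded one
    if batch:  # final partial batch
        yield batch


def write_rows(out, n, batch_size=100000):
    # One writelines call per batch instead of one print per row
    for batch in iter_batches(n, batch_size):
        out.writelines(batch)


if __name__ == '__main__':
    write_rows(sys.stdout, int(sys.argv[1]) if len(sys.argv) > 1 else 10)
```

Run it as `python gen_batched.py 100000000 > device.tsv` (the script name is hypothetical). This sketch writes data rows only, with no header line, which matches the MySQL case.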
2. Split the data (optional)
tail -n +1 device.tsv | head -n 100000 > part1.txt
tail -n +100001 device.tsv | head -n 100000 > part2.txt
tail -n +200001 device.tsv | head -n 100000 > part3.txt
tail -n +300001 device.tsv | head -n 100000 > part4.txt
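Each `tail -n +K | head` pipeline must read past every line before its starting offset. Later parts of a 100-million-row file therefore cost more and more to extract. Below is a minimal single-pass sketch in Python. It assumes 100,000-line parts named `part1.txt`, `part2.txt`, and so on, as in the commands above. The names `iter_parts` and `split_file` and their parameters are hypothetical. GNU coreutils `split -l 100000` does a similar job from the shell.

```python
import itertools


def iter_parts(lines, lines_per_part):
    """Yield consecutive chunks of at most lines_per_part lines."""
    it = iter(lines)  # a single iterator, so islice keeps advancing
    while True:
        chunk = list(itertools.islice(it, lines_per_part))
        if not chunk:
            return
        yield chunk


def split_file(src, lines_per_part=100000, prefix='part', suffix='.txt'):
    """Write prefix1.txt, prefix2.txt, ... in one pass; return the paths."""
    paths = []
    with open(src, encoding='utf-8') as f:
        for part_no, chunk in enumerate(iter_parts(f, lines_per_part), start=1):
            path = '%s%d%s' % (prefix, part_no, suffix)
            with open(path, 'w', encoding='utf-8') as out:
                out.writelines(chunk)
            paths.append(path)
    return paths


if __name__ == '__main__':
    split_file('device.tsv')
```

Unlike the four commands above, this sketch covers the whole file rather than only the first four parts. If the header row was kept, it ends up as the first line of `part1.txt`.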
3. Generate a .txt file
python a.py > device.txt
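Before a long import, it can be worth checking that the file matches what the loader expects. The script writes tab-separated fields and emits 19 fields per data row. It uses `\N` for NULL, which MySQL's `LOAD DATA` reads as NULL under its default field settings. Below is a minimal sketch that reads rows back the same way and reports the first line with the wrong field count. The helpers `parse_row` and `check_file` and the constant `EXPECTED_FIELDS` are hypothetical. Pass `skip_header=True` when the header row was kept for MongoDB.

```python
EXPECTED_FIELDS = 19  # fields per data row written by generate_message
NULL = r'\N'


def parse_row(line):
    # Split one tab-separated line into fields, mapping the NULL marker to None
    return [None if v == NULL else v for v in line.rstrip('\n').split('\t')]


def check_file(lines, skip_header=False):
    """Return (rows_checked, first_bad_line_no or None); line numbers are 1-based."""
    count = 0
    for line_no, line in enumerate(lines, start=1):
        if skip_header and line_no == 1:
            continue
        if len(parse_row(line)) != EXPECTED_FIELDS:
            return count, line_no
        count += 1
    return count, None


if __name__ == '__main__':
    with open('device.txt', encoding='utf-8') as f:
        print(check_file(f, skip_header=True))
```

The file is streamed line by line, so even a 100-million-row file is checked without loading it into memory.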