How to batch-import data into Neo4j with Python
阿新 • Published: 2018-12-16
When it comes to batch-importing data into Neo4j, a few options probably come to mind:
- the import tool
- LOAD CSV (loading from a file)
- the Neo4j driver for Python/Java…
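The first two only work when the data already exists as files on the file system.
But what if your data arrives continuously as a stream?
In that case you would use Python or Java to import the nodes in real time.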
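You may have tried py2neo for Python and found that importing nodes is very slow.
Here is the fix: import in batches, and use MERGE so that nothing is created twice.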
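    import time

    from py2neo import Graph

    # Connection settings are an assumption; pass your own URI and credentials.
    graph = Graph()

    BATCH_SIZE = 1000

    # Merge each node on its key first, then the relationship, so repeated rows
    # create nothing new. {name} is the legacy Cypher parameter syntax;
    # Neo4j 4.0 and later expect $name instead.
    statement_c = """MERGE (node1:Person {person_name:{person_name}})
    MERGE (node2:Company {company_name:{company_name}})
    MERGE (node1)<-[:Query {visit_time: {visit_time}}]-(node2)"""


    def add_names(items, tx):
        # Queue one statement per row, then send the queued batch to the server.
        for data in items:
            tx.append(statement_c, data)
        tx.process()


    def commit_batch(items):
        # One transaction per batch instead of one per row.
        tx = graph.begin()
        add_names(items, tx)
        tx.commit()


    def main():
        items = []
        with open("./raw.csv", "r") as f:
            for index, line in enumerate(f):
                print(">>> {}".format(index))
                person_name, company_name, visit_time = line.strip().split(",")
                items.append({
                    "person_name": person_name,
                    "company_name": company_name,
                    "visit_time": visit_time,
                })
                if len(items) >= BATCH_SIZE:
                    commit_batch(items)
                    items = []
        # Commit the final partial batch; the original loop never sent it.
        if items:
            commit_batch(items)


    if __name__ == '__main__':
        s = time.time()
        main()
        e = time.time()
        print("Elapsed: {}s".format(e - s))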
I won't walk through the code in detail; if you have questions, leave a comment below.
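The batching step can also be factored into a small helper that works on any iterable, including a stream that never exists as a file. The sketch below is an illustration and not part of the original code: the names parse_line and batched are hypothetical, and it assumes each record is a comma-separated line of person_name, company_name, visit_time, as in raw.csv. It yields every full batch and then the final partial one, so trailing records are not dropped.

```python
from itertools import islice


def parse_line(line):
    """Turn one 'person,company,time' line into a Cypher parameter dict."""
    person_name, company_name, visit_time = line.strip().split(",")
    return {
        "person_name": person_name,
        "company_name": company_name,
        "visit_time": visit_time,
    }


def batched(records, size):
    """Yield lists of at most `size` records, including the last partial one."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical in-memory stream standing in for a live data source.
stream = [
    "alice,acme,2018-12-01",
    "bob,acme,2018-12-02",
    "carol,initech,2018-12-03",
]
batches = list(batched((parse_line(line) for line in stream), 2))
print([len(batch) for batch in batches])  # [2, 1]
```

Each yielded batch could then be handed to add_names inside its own transaction, in the same way the code above handles a batch of rows.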
The MERGE statement used here is based on: https://stackoverflow.com/questions/35381968/cypher-node-already-exists-issue-with-merge
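On why the statement uses three separate MERGE clauses: MERGE either matches its whole pattern or creates all of it, so merging the person, the company and the relationship as one pattern can create duplicate nodes (or hit a uniqueness constraint) whenever the complete pattern does not exist yet. Merging each node on its key first and the relationship afterwards avoids that. The toy model below imitates this get-or-create behaviour with plain Python dictionaries. It only illustrates the idea and says nothing about how Neo4j is implemented; every name in it is hypothetical.

```python
def merge_node(nodes, label, key):
    """Return the existing node for (label, key), creating it only if absent."""
    return nodes.setdefault((label, key), {"label": label, "key": key})


def merge_rel(rels, start, rel_type, end, visit_time):
    """Record a relationship once per (start, type, end, visit_time)."""
    rels.add((start["key"], rel_type, end["key"], visit_time))


nodes, rels = {}, set()
rows = [
    ("alice", "acme", "2018-12-01"),
    ("alice", "acme", "2018-12-01"),     # exact repeat: creates nothing new
    ("alice", "initech", "2018-12-02"),  # reuses the existing alice node
]
for person_name, company_name, visit_time in rows:
    person = merge_node(nodes, "Person", person_name)
    company = merge_node(nodes, "Company", company_name)
    # Direction mirrors the original statement: (Person)<-[:Query]-(Company)
    merge_rel(rels, company, "Query", person, visit_time)

print(len(nodes), len(rels))  # 3 2
```

Three rows produce three nodes and two relationships: the repeated row is absorbed, and the third row attaches a new company to the person node that already exists.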
The Python code is based on: https://py2neo.org/2.0/cypher.html#py2neo.cypher.CypherTransaction.process
Neo4j python driver 1.6: https://neo4j.com/docs/api/python-driver/1.7-preview/index.html?highlight=import
py2neo 2.0: https://py2neo.org/2.0/cypher.html#py2neo.cypher.CypherTransaction.process
py2neo v4: https://py2neo.org/v4/