neo4j | Cypher: a complete example with CSV import, relationship building, and advanced queries (Part 3)

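Commonly used graph databases include neo4j (which supports a great many languages), JanusGraph/Titan (distributed), and OrientDB. Google has also open-sourced the graph database Cayley (written in Go), and PostgreSQL can store data in RDF format.

This third article walks through a fairly complete example: importing CSV files and then querying them. The data set is relatively large, so it is closer to a real-world scenario.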

NorthWind Introduction

To run everything in one go, enter this command:

bin/neo4j-shell -path northwind.db -file import_csv.cypher

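This article follows a fairly complete official example with three parts: loading CSV files, creating relationships between entities, and querying.
The CSV loading and relationship steps show how to build a data set for Neo4j;
the Cypher queries range from easy to hard, and the example uses them well, covering both basic and advanced queries.

Looks complicated, right? Let's sort out the logic:
[image]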

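1. Loading the basic entity data

Make sure the data is in the right encoding.
neo4j expects UTF-8, while CSV files are saved as ANSI by default, so they need to be re-saved as UTF-8 (for example with Notepad's "Save As").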
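If there are many files, the re-saving step can be scripted. The following is a minimal Python sketch, not part of the original example: the function name `reencode_to_utf8` is made up for illustration, and the source encoding `cp1252` is an assumption (what Windows calls "ANSI" depends on the system locale, so it may be `cp950`, `cp936`, or something else on your machine).

```python
import tempfile
from pathlib import Path


def reencode_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Read src using source_encoding and write the same text to dst as UTF-8."""
    text = src.read_text(encoding=source_encoding)
    dst.write_text(text, encoding="utf-8")


# Demonstration on a throwaway file; the CSV content is invented.
with tempfile.TemporaryDirectory() as tmp:
    ansi_file = Path(tmp) / "customers_ansi.csv"
    utf8_file = Path(tmp) / "customers.csv"
    ansi_file.write_bytes(
        "CustomerID,CompanyName\nKOENE,Königlich Essen\n".encode("cp1252")
    )
    reencode_to_utf8(ansi_file, utf8_file)
    converted = utf8_file.read_text(encoding="utf-8")
```

If the wrong source encoding is passed, accented or CJK characters come out garbled or raise a decode error, so it is worth checking one row by eye before loading.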

// Create customers
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///customers.csv" AS row
CREATE (:Customer {companyName: row.CompanyName, customerID: row.CustomerID, fax: row.Fax, phone: row.Phone});

// Create products
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///products.csv" AS row
CREATE (:Product {productName: row.ProductName, productID: row.ProductID, unitPrice: toFloat(row.UnitPrice)});

// Create suppliers
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///suppliers.csv" AS row
CREATE (:Supplier {companyName: row.CompanyName, supplierID: row.SupplierID});

// Create employees
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///employees.csv" AS row
CREATE (:Employee {employeeID:row.EmployeeID,  firstName: row.FirstName, lastName: row.LastName, title: row.Title});

// Create categories
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///categories.csv" AS row
CREATE (:Category {categoryID: row.CategoryID, categoryName: row.CategoryName, description: row.Description});

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///orders.csv" AS row
MERGE (order:Order {orderID: row.OrderID}) ON CREATE SET order.shipName =  row.ShipName;

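Notes:
Running the CREATE statements twice loads the data twice, so be careful!
Mind the '///' in "file:///customers.csv".

In CREATE (:Product {productName: row.ProductName}):

  • Product is the node label; the loaded nodes can be inspected with MATCH (p:Product) RETURN p;
  • row.ProductName reads a column by its header name, much like a dataframe;
  • the property map works like a dict, with productName as the key.

One file is a little odd: the last one, orders.csv. It is loaded with MERGE rather than CREATE, and the later statements read ProductID, UnitPrice, and Quantity from it, so the same OrderID presumably appears on several rows.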

To make queries faster, indexes can be created:

CREATE INDEX ON :Product(productID);
CREATE INDEX ON :Product(productName);
CREATE INDEX ON :Category(categoryID);
CREATE INDEX ON :Employee(employeeID);
CREATE INDEX ON :Supplier(supplierID);
CREATE INDEX ON :Customer(customerID);
CREATE INDEX ON :Customer(companyName);

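This indexes the more important ID fields of each node type.
Run the statements one at a time; submitting them all together raised this error for me: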

Neo.ClientError.Statement.SyntaxError
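One way around this is to split a script into single statements and submit them one by one. Below is a rough Python sketch under stated assumptions: `split_statements` and `run_one_by_one` are names invented here, the splitting is deliberately naive, and `run` stands for whatever actually sends a statement to the database (for example the `run` method of a driver session). In the demo a plain list stands in for it.

```python
from typing import Callable, List


def split_statements(script: str) -> List[str]:
    """Split a Cypher script on ';' and drop empty pieces.

    Naive on purpose: a ';' inside a string literal would break it.
    """
    return [part.strip() for part in script.split(";") if part.strip()]


def run_one_by_one(script: str, run: Callable[[str], None]) -> int:
    """Send each statement separately through `run`; return how many were sent."""
    statements = split_statements(script)
    for statement in statements:
        run(statement)
    return len(statements)


index_script = """
CREATE INDEX ON :Product(productID);
CREATE INDEX ON :Product(productName);
CREATE INDEX ON :Category(categoryID);
"""

sent = []  # stand-in for a real connection: it just records what it receives
count = run_one_by_one(index_script, sent.append)
```

After the call, `sent` holds three separate statements without their trailing semicolons, in the original order.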

Also add a constraint:

CREATE CONSTRAINT ON (o:Order) ASSERT o.orderID IS UNIQUE;

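If part of the data needs changing later, the following example can serve as a reference.
If Janet is now reporting to Steven, the change can be made like this:

// employeeID was loaded from CSV, so it is stored as a string under the key employeeID
MATCH (mgr:Employee {employeeID: "5"})
MATCH (emp:Employee {employeeID: "3"})-[rel:REPORTS_TO]->()
DELETE rel
CREATE (emp)-[:REPORTS_TO]->(mgr)
RETURN *;

This locates emp, removes its existing REPORTS_TO relationship with DELETE, and then creates the new one.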

There are two ways to load a CSV file: from a local file or from an online file.
Loading an online file:

LOAD CSV FROM 'https://neo4j.com/docs/developer-manual/3.3/csv/artists.csv' AS line
CREATE (:Artist { name: line[1], year: toInteger(line[2])})

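Local loading has a pitfall: how should the path be written? Like file:///C:\Users\mattzheng\Desktop\categories.csv? That is clearly not right.
For local loading, the file has to be placed in a fixed folder named import.
It may be at XXX\Neo4j\graph.db\import,
or it may be somewhere else. In my case the folder was buried deep: C:\Users\matt\.Neo4jDesktop\neo4jDatabases\database-b82284eb-23ab-4a42-8a83-f13af055ecf0\installation-3.3.4\import
I stumbled on this location by accident, from the error message I got after running: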

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///C:\\Desktop\\categories.csv" AS row
CREATE (:Customer {companyName: row.CompanyName, customerID: row.CustomerID, fax: row.Fax, phone: row.Phone});

It then reports this error:

Couldn't load the external resource at: file:/C:\Users\matt\.Neo4jDesktop\neo4jDatabases\database-b82284eb-23ab-4a42-8a83-f13af055ecf0\installation-3.3.4\import\categories.csv
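Once the import folder is known, getting a file into it can be scripted. Here is a small Python sketch of that step; the helper name `stage_for_load_csv` is invented for illustration, and the demo uses temporary folders rather than a real Neo4j installation, so the import location in it is purely hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def stage_for_load_csv(csv_path: Path, import_dir: Path) -> str:
    """Copy csv_path into import_dir and return the URL to write in LOAD CSV."""
    import_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(csv_path, import_dir / csv_path.name)
    # Only the file name goes after file:/// because the path is resolved
    # relative to the import folder, as the error message above shows.
    return "file:///" + csv_path.name


with tempfile.TemporaryDirectory() as tmp:
    desktop = Path(tmp) / "Desktop"
    desktop.mkdir()
    source = desktop / "categories.csv"
    source.write_text("CategoryID,CategoryName\n1,Beverages\n", encoding="utf-8")
    import_dir = Path(tmp) / "installation" / "import"  # hypothetical location
    url = stage_for_load_csv(source, import_dir)
    staged_ok = (import_dir / "categories.csv").exists()
```

The returned string is what goes into the statement, as in LOAD CSV WITH HEADERS FROM "file:///categories.csv" AS row.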


2. Creating relationships

2.1 Relationships between orders and products/employees

Linking orders to products, employees, and customers:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (product:Product {productID: row.ProductID})
MERGE (order)-[pu:PRODUCT]->(product)
ON CREATE SET pu.unitPrice = toFloat(row.UnitPrice), pu.quantity = toFloat(row.Quantity);
// ON CREATE SET also fills in the properties of the relationship when it is first created

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (employee:Employee {employeeID: row.EmployeeID})
MERGE (employee)-[:SOLD]->(order);

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (customer:Customer {customerID: row.CustomerID})
MERGE (customer)-[:PURCHASED]->(order);

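When a value is numeric, its type has to be set explicitly, as in toFloat(row.UnitPrice), because LOAD CSV reads every field as a string.
Text values need no conversion.
MATCH (order:Order {orderID: row.OrderID}) binds the variable order to the nodes labelled Order whose orderID equals row.OrderID;
in [pu:PRODUCT], pu is the variable bound to the relationship and PRODUCT is the relationship type.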
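The need for toFloat is easy to see outside Neo4j as well. The sketch below uses Python's standard csv module on two invented rows shaped like the columns read from orders.csv above; it only illustrates that CSV fields arrive as text and is not how Neo4j itself parses the file.

```python
import csv
import io

# Two made-up rows with the columns the statements above read from orders.csv.
orders_csv = io.StringIO(
    "OrderID,ProductID,UnitPrice,Quantity\n"
    "10248,11,14.00,12\n"
    "10248,42,9.80,10\n"
)

rows = list(csv.DictReader(orders_csv))
raw_price = rows[0]["UnitPrice"]  # still text at this point
# Arithmetic only works after an explicit conversion, like toFloat() in Cypher.
line_totals = [float(r["UnitPrice"]) * float(r["Quantity"]) for r in rows]
```

Without the conversion, the properties would be stored as strings, and later expressions such as pu.unitPrice * pu.quantity would not behave like arithmetic.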

2.2 Relationships among products, suppliers, and categories

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///products.csv" AS row
MATCH (product:Product {productID: row.ProductID})
MATCH (supplier:Supplier {supplierID: row.SupplierID})
MERGE (supplier)-[:SUPPLIES]->(product);

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///products.csv" AS row
MATCH (product:Product {productID: row.ProductID})
MATCH (category:Category {categoryID: row.CategoryID})
MERGE (product)-[:PART_OF]->(category);

2.3 Relationships between employees

Build a 'REPORTS_TO' relationship among employees to express who reports to whom.

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///employees.csv" AS row
MATCH (employee:Employee {employeeID: row.EmployeeID})
MATCH (manager:Employee {employeeID: row.ReportsTo})
MERGE (employee)-[:REPORTS_TO]->(manager);

The end result looks like this:
[image]

3. Basic queries

Query 1: querying two related node types on their own

MATCH (:Order)<-[:SOLD]-(e:Employee)
return *

Query 2: product prices, sorted:

match (p:Product)
return p.productName,p.unitPrice order by p.unitPrice DESC
limit 10;

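Logic: first locate p in the graph database; ORDER BY sorts the rows; LIMIT restricts how many are shown.

Query 3: the price of the product 'Chocolade', sorted: using WHERE and ORDER BY

// Version 1: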
match (p:Product)
where p.productName = 'Chocolade'
return p.productName,p.unitPrice order by p.unitPrice DESC limit 10;

// Version 2:
match (p:Product {productName : 'Chocolade'})
return p.productName,p.unitPrice order by p.unitPrice DESC limit 10;

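Version 1 locates the product with WHERE; version 2 locates it by giving the property directly inside the MATCH pattern.

Query 4: the prices of 'Chocolade' and 'Chai', sorted: using WHERE and ORDER BY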

match (p:Product)
where p.productName IN ['Chocolade','Chai']
return p.productName,p.unitPrice order by p.unitPrice DESC limit 10;

Query 5: filtering on conditions with WHERE

MATCH (p:Product)
WHERE p.productName STARTS WITH "C" AND p.unitPrice > 100
RETURN p.productName, p.unitPrice;

Meaning: select the products whose productName starts with 'C' and whose unitPrice is greater than 100.

Using indexes

To speed up lookups on a particular property, create an index on it:

CREATE INDEX ON :Product(productName);
CREATE INDEX ON :Product(unitPrice);

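Query 6: who bought 'Chocolade'? (join-style usage)

Four tables are involved:

  • Product table, keyed by productID;
  • Customer table, keyed by CustomerID;
  • orders link table, orderID + CustomerID;
  • orders_Details link table, orderID + productID.

    // Correct:
    MATCH (p:Product {productName:"Chocolade"})<-[:PRODUCT]-(:Order)<-[:PURCHASED]-(c:Customer)
    RETURN distinct c.companyName;
    // Wrong:
    // a pattern cannot stop at a relationship []; it has to continue through Order to the Product being asked about
    MATCH (c:Customer)-[:PURCHASED]
    RETURN distinct c.companyName
    // Exercise: why does this fail, and why does OPTIONAL MATCH not help either?
    match (c:Customer)
    where (p:Product {productName:"Chocolade"})<-[:Product]-(:Order)<-[:PURCHASED]-(c)
    return distinct c.companyName

My thinking on why the pattern is anchored on Product: the pattern has to follow the logical chain Customer -> Order -> Product. It can be written starting from either end, but the arrows must point the way the relationships were created.
On the exercise: with OPTIONAL MATCH the command returns every c.companyName, not only the customers who bought Chocolade, because OPTIONAL MATCH is a step that binds variables along a relationship (or null when there is none), not a step that adds a constraint. The WHERE version as written does not work either: in the Neo4j 3.x used here, a pattern inside WHERE can only test variables that are already bound and cannot introduce a new one such as p, and the relationship type is also misspelled as Product instead of PRODUCT.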

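Query 7: what did I buy, and how much? (aggregation)

What did 'Drachenblut Delikatessen' buy, and how much of each product?
Turning the match between the customer and the orders into an optional match is equivalent to an outer join.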

// Version 1: plain MATCH
MATCH  (p:Product)<-[pu:PRODUCT]-(:Order)<-[:PURCHASED]-(c:Customer {companyName:"Drachenblut Delikatessen"})
RETURN p.productName, toInteger(sum(pu.unitPrice * pu.quantity)) AS volume
ORDER BY volume DESC;

// Version 2: OPTIONAL MATCH
MATCH (c:Customer {companyName:"Drachenblut Delikatessen"})
OPTIONAL MATCH (p:Product)<-[pu:PRODUCT]-(:Order)<-[:PURCHASED]-(c)
RETURN p.productName, toInteger(sum(pu.unitPrice * pu.quantity)) AS volume
ORDER BY volume DESC ;

As I see it, OPTIONAL MATCH is mostly a binding step, and it can hold the part of the pattern that does not fit into the MATCH.
In version 2, MATCH first binds the variable c, and OPTIONAL MATCH then adds the relationships that hang off it.
Here toInteger() converts to an integer (the older spelling toInt() does the same in Neo4j 3.x), sum() adds up the values, and AS volume names the new column 'volume'.
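The difference between the two versions only shows up for a customer with no orders. The following Python sketch mimics that behaviour with plain lists; all names and numbers in it are invented, and it is an analogy for inner versus outer join semantics rather than a description of how Neo4j executes the query.

```python
customers = ["Drachenblut Delikatessen", "Hypothetical Newcomer"]  # second one is made up
purchases = [  # (customer, product, unitPrice, quantity), invented figures
    ("Drachenblut Delikatessen", "Chai", 18.0, 10.0),
    ("Drachenblut Delikatessen", "Chai", 18.0, 5.0),
    ("Drachenblut Delikatessen", "Chocolade", 12.75, 4.0),
]


def plain_match(customer):
    """Version 1: a row exists only where the whole pattern matches."""
    totals = {}
    for who, product, price, qty in purchases:
        if who == customer:
            totals[product] = totals.get(product, 0.0) + price * qty
    return sorted(((p, int(v)) for p, v in totals.items()), key=lambda r: -r[1])


def optional_match(customer):
    """Version 2: the customer row survives even when nothing else matches."""
    if customer not in customers:
        return []
    rows = plain_match(customer)
    # No purchases: product is null (None) and the sum over nothing is 0.
    return rows if rows else [(None, 0)]


bought = plain_match("Drachenblut Delikatessen")
nothing = plain_match("Hypothetical Newcomer")
still_one_row = optional_match("Hypothetical Newcomer")
```

For a customer who has orders, both functions return the same rows, which matches what the two Cypher versions do for 'Drachenblut Delikatessen'.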

Query 8: counting orders per employee ID

MATCH (:Order)<-[:SOLD]-(e:Employee)
RETURN e.employeeID,count(*) AS cnt ORDER BY cnt DESC LIMIT 10

The rows are grouped by e.employeeID and counted with count(*).

e.employeeID cnt
“4” 156
“3” 127
“1” 123
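There is no GROUP BY in the query: the non-aggregated column in RETURN acts as the grouping key. As a rough Python analogy (the list of IDs below is invented and far shorter than the real data):

```python
from collections import Counter

# One entry per (:Order)<-[:SOLD]-(e:Employee) match; the IDs are made up.
sold_by = ["4", "3", "4", "1", "4", "3"]

cnt = Counter(sold_by)        # group by employeeID and count the rows
top = cnt.most_common(10)     # ORDER BY cnt DESC LIMIT 10
```

Each tuple in `top` corresponds to one result row: the employee ID and its count.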

Query 9: returning the result as a list/array

MATCH (o:Order)<-[:SOLD]-(e:Employee)
RETURN collect(e.lastName)

collect aggregates the values into a list (array).

4. Advanced queries

Query 1: Which Employee had the Highest Cross-Selling Count of 'Chocolade' and Which Product?

The query is:

MATCH (choc:Product {productName:'Chocolade'})<-[:PRODUCT]-(:Order)<-[:SOLD]-(employee),
      (employee)-[:SOLD]->(o2)-[:PRODUCT]->(other:Product)
RETURN employee.employeeID, other.productName, count(distinct o2) as count
ORDER BY count DESC
LIMIT 5;

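In [:PRODUCT]-(:Order), [] holds a relationship type and () holds a node label.
First pattern: (employee)-(:Order)-(choc:Product) finds the employees who sold an order containing the product Chocolade.
Second pattern: (employee)-(o2)-(other:Product) finds which other products those employees sold (all of them).

[image]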

Query 2: How are Employees Organized? Who Reports to Whom?

MATCH path = (e:Employee)<-[:REPORTS_TO]-(sub)
RETURN e.employeeID AS manager, sub.employeeID AS employee;

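A simple pattern: find Employees that are connected to another Employee by REPORTS_TO. Here e is the manager and sub is the subordinate.
Note that employee 5 has people reporting to him, but he also reports to employee 2.
The logic here is that managers and subordinates all carry the same Employee label, so the REPORTS_TO relationship has to be the entry point.

[image]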

Query 3: Which Employees Report to Each Other Indirectly?

This goes a bit deeper than Query 2: indirect reports.

MATCH path = (e:Employee)<-[:REPORTS_TO*]-(sub)
WITH e, sub, [person in NODES(path) | person.employeeID][1..-1] AS path
RETURN e.employeeID AS manager, sub.employeeID AS employee, CASE WHEN size(path) = 0 THEN "Direct Report" ELSE path END AS via
ORDER BY size(path);

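Step 1 follows the same logic as Query 2: within the same Employee label, find employees linked by REPORTS_TO, except that REPORTS_TO* follows the relationship over any number of hops.
Step 2 uses WITH: a WITH clause chains query parts together, passing the result of one part on as the starting point of the next.
(Haha... I don't quite understand the rest yet; I'll fill it in after checking the docs...)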
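For the part left open above, here is my reading of the WITH and CASE steps, written as a small Python sketch. It assumes the node list runs from the manager e to the subordinate sub, takes the employee IDs of those nodes, and drops the first and last entries with [1..-1], which has the same bounds as the Python slice [1:-1]. The function name `via` and employee "6" are invented for the demo; that 5 reports to 2 comes from Query 2.

```python
def via(path_ids):
    """Mimic the WITH/CASE step for one path, given as a list of employee IDs.

    path_ids runs from the manager (first) to the subordinate (last).
    """
    middle = path_ids[1:-1]  # same bounds as [1..-1] in the query
    # An empty middle means no one sits between the two employees.
    return "Direct Report" if len(middle) == 0 else middle


direct = via(["5", "6"])          # 6 reports straight to 5
indirect = via(["2", "5", "6"])   # 6 reports to 5, who reports to 2
```

So the via column is either the text "Direct Report" or the list of managers in between, and ordering by the length of that list puts direct reports first.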

[image]

Query 4: How Many Orders were Made by Each Part of the Hierarchy?

MATCH (e:Employee)
OPTIONAL MATCH (e)<-[:REPORTS_TO*0..]-(sub)-[:SOLD]->(order)
RETURN e.employeeID, [x IN COLLECT(DISTINCT sub.employeeID) WHERE x <> e.employeeID] AS reports, COUNT(distinct order) AS totalOrders
ORDER BY totalOrders DESC;

[image]
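The key piece of Query 4 is REPORTS_TO*0.., which means "zero or more hops", so each employee counts as part of their own subtree. The Python sketch below imitates that roll-up on a tiny invented hierarchy. The dictionaries and the function names are made up, and it simplifies one detail: in the Cypher query a subordinate only shows up in reports if they sold at least one order, whereas here everyone has orders.

```python
reports_to = {"5": "2", "6": "5", "7": "5"}      # employee -> manager (partly invented)
orders_sold = {"2": 3, "5": 1, "6": 2, "7": 4}   # employee -> number of orders (invented)


def subtree(manager):
    """The manager plus everyone reporting to them directly or indirectly (*0..)."""
    members = [manager]
    for employee, boss in reports_to.items():
        if boss == manager:
            members.extend(subtree(employee))
    return members


def summarize(manager):
    """Return (reports, totalOrders) for one manager, like one row of the query."""
    members = subtree(manager)
    reports = [m for m in members if m != manager]   # the WHERE x <> e.employeeID filter
    total = sum(orders_sold.get(m, 0) for m in members)
    return reports, total


# One row per employee, largest total first (ORDER BY totalOrders DESC).
summary = sorted(((e, *summarize(e)) for e in orders_sold), key=lambda r: -r[2])
```

The employee at the top of the hierarchy ends up with the largest total because their count includes every order sold below them.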