Building a Graph Database Application

In this chapter, we discuss some of the practical issues of working with a graph database. In previous chapters, we’ve looked at graph data; in this chapter, we’ll apply that knowledge in the context of developing a graph database application. We’ll look at some of the data modeling questions that may arise, and at some of the application architecture choices available to us.
In our experience, graph database applications are highly amenable to being developed using the evolutionary, incremental, and iterative software development practices in widespread use today. A key feature of these practices is the prevalence of testing throughout the software development life cycle. Here we’ll show how we develop our data model and our application in a test-driven fashion.
At the end of the chapter, we’ll look at some of the issues we’ll need to consider when planning for production.

Data Modeling

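In Chapter 3 we covered in detail how to model and work with graph data. Here we summarize some of the more important modeling guidelines, and discuss how implementing a graph data model fits with iterative and incremental software development techniques.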

Describe the Model in Terms of the Application’s Needs

The questions we need to ask of the data help identify entities and relationships. Agile user stories provide a concise means for expressing an outside-in, user-centered view of an application’s needs, and the questions that arise in the course of satisfying this need. Here’s an example of a user story for a book review web application:
AS A reader who likes a book, I WANT to know which books other readers who like the same book have liked, SO THAT I can find other books to read.
This story expresses a user need, which motivates the shape and content of our data model. From a data modeling point of view, the AS A clause establishes a context comprising two entities (a reader and a book) plus the LIKES relationship that connects them. The I WANT clause then poses a question: which books have the readers who like the book I’m currently reading also liked? This question exposes more LIKES relationships, and more entities: other readers and other books.
The entities and relationships that we’ve surfaced in analyzing the user story quickly translate into a simple data model, as shown in Figure 4-1.

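Because this data model directly encodes the question posed by the user story, it lends itself to being queried in a way that mirrors the structure of that question: since Alice likes Dune, find the books that others who like Dune have liked.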
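
The traversal that answers the user story can be sketched in plain Python. This is a minimal illustration, not a graph database API: the `likes` list, the `books_liked_by_fans_of` helper, and every reader and book other than Alice and Dune are assumptions made up for the example.

```python
# Hypothetical sample data: (reader, book) pairs standing in for LIKES relationships.
likes = [
    ("Alice", "Dune"),
    ("Bob", "Dune"),
    ("Bob", "Neuromancer"),
    ("Carol", "Dune"),
    ("Carol", "Foundation"),
    ("Dave", "Emma"),
]

def books_liked_by_fans_of(reader, book, likes):
    """Follow reader -LIKES-> book <-LIKES- other readers -LIKES-> other books."""
    if (reader, book) not in likes:
        return set()
    # Step back along LIKES from the book to the other readers who like it.
    fans = {r for r, b in likes if b == book and r != reader}
    # Step forward along LIKES from those readers to the other books they like.
    return {b for r, b in likes if r in fans and b != book}

recommendations = books_liked_by_fans_of("Alice", "Dune", likes)
print(sorted(recommendations))
```

The function has the same shape as the question: start from the context set up by the AS A clause, then walk the additional LIKES relationships exposed by the I WANT clause.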

Nodes for Things, Relationships for Structure

• Use nodes to represent entities—that is, the things in our domain that are of interest to us, and which can be labeled and grouped.
• Use relationships both to express the connections between entities and to establish semantic context for each entity, thereby structuring the domain.
• Use relationship direction to further clarify relationship semantics. Many relationships are asymmetrical, which is why relationships in a property graph are always directed. For bidirectional relationships, we should make our queries ignore direction, rather than using two relationships.
• Use node properties to represent entity attributes, plus any necessary entity metadata, such as timestamps, version numbers, etc.
• Use relationship properties to express the strength, weight, or quality of a relationship, plus any necessary relationship metadata, such as timestamps, version numbers, etc.
It pays to be diligent about discovering and capturing domain entities. As we saw in Chapter 3, it’s relatively easy to model things that really ought to be represented as nodes using carelessly named relationships instead. If we’re tempted to use a relationship to model an entity—an email, or a review, for example—we must make certain that this entity cannot be related to more than two other entities. Remember, a relationship must have a start node and an end node—nothing more, nothing less. If we find later that we need to connect something we’ve modeled as a relationship to more than two other entities, we’ll have to refactor the entity inside the relationship out into a separate node. This is a breaking change to the data model, and will likely require us to make changes to any queries and application code that produce or consume the data.
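
To illustrate why an entity such as an email is safer as a node, here is a minimal plain-Python sketch. A relationship is written as a tuple with exactly one start and one end node, so an email modeled that way has no room for a second recipient; modeled as a node, it can connect to as many entities as needed. All names here (`EMAILED`, `SENT`, `TO`, `CC`, `email_1`, and the people) are hypothetical.

```python
# A relationship is (start, type, end): exactly two endpoints, never more.
# Modeling the email as a relationship leaves no room for a CC'd reader.
email_as_relationship = [("alice", "EMAILED", "bob")]

# Modeling the email as its own node lets it connect to any number of entities.
email_as_node = [
    ("alice", "SENT", "email_1"),
    ("email_1", "TO", "bob"),
    ("email_1", "CC", "carol"),
]

def neighbours(node, relationships):
    """Return every node connected to `node`, ignoring relationship direction."""
    found = set()
    for start, _type, end in relationships:
        if start == node:
            found.add(end)
        elif end == node:
            found.add(start)
    return found

participants = neighbours("email_1", email_as_node)
print(sorted(participants))
```

The `neighbours` helper also shows the earlier guideline about direction: each relationship is stored once, with a direction, and the query chooses to ignore it.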

Fine-Grained versus Generic Relationships

When designing relationships we should be mindful of the trade-offs between using fine-grained relationship names and using generic relationships qualified with properties. It’s the difference between using DELIVERY_ADDRESS and HOME_ADDRESS versus ADDRESS {type:'delivery'} and ADDRESS {type:'home'}.
Relationships are the royal road into the graph. Differentiating by relationship name is the best way of eliminating large swathes of the graph from a traversal. Using one or more property values to decide whether or not to follow a relationship incurs extra I/O the first time those properties are accessed because the properties reside in a separate store file from the relationships (after that, however, they’re cached).
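
The difference between the two styles can be sketched in plain Python. This is an illustration only: the tuple layout, the two helper functions, and the node ids are assumptions, and the `property_reads` counter is merely a stand-in for the extra property access described above, not a measurement of any real database.

```python
# Each relationship is (start, type, end, properties).
fine_grained = [
    ("user", "HOME_ADDRESS", "addr_1", {}),
    ("user", "DELIVERY_ADDRESS", "addr_2", {}),
]
generic = [
    ("user", "ADDRESS", "addr_1", {"type": "home"}),
    ("user", "ADDRESS", "addr_2", {"type": "delivery"}),
]

def follow_by_name(relationships, name):
    """Select relationships by type name alone; properties are never read."""
    return [end for _start, rel_type, end, _props in relationships if rel_type == name]

def follow_by_property(relationships, name, key, value):
    """Select by type name, then read a property on every candidate."""
    property_reads = 0
    matches = []
    for _start, rel_type, end, props in relationships:
        if rel_type != name:
            continue
        property_reads += 1  # stand-in for the extra property-store access
        if props.get(key) == value:
            matches.append(end)
    return matches, property_reads

by_name = follow_by_name(fine_grained, "DELIVERY_ADDRESS")
by_property, reads = follow_by_property(generic, "ADDRESS", "type", "delivery")
print(by_name, by_property, reads)
```

Both calls find the same address, but the property-qualified version has to inspect the properties of every ADDRESS relationship before it can discard the ones it does not want.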

We use fine-grained relationships whenever we have a closed set of relationship names. Weightings—as required by a shortest-weighted-path algorithm—rarely comprise a closed set, and are usually best represented as properties on relationships.
Sometimes, however, we have a closed set of relationships, but in some traversals we want to follow specific kinds of relationships within that set, whereas in others we want to follow all of them, irrespective of type. Addresses are a good example. Following the closed-set principle, we might choose to create HOME_ADDRESS, WORK_ADDRESS, and DELIVERY_ADDRESS relationships. This allows us to follow specific kinds of address relationships (DELIVERY_ADDRESS, for example) while ignoring all the rest.
But what do we do if we want to find all addresses for a user? There are a couple of options here. First, we can encode knowledge of all the different relationship types in our queries: e.g., MATCH (user)-[:HOME_ADDRESS|WORK_ADDRESS|DELIVERY_ADDRESS]->(address). This, however, quickly becomes unwieldy when there are lots of different kinds of relationships. Alternatively, we can add a more generic ADDRESS relationship to our model, in addition to the fine-grained relationships. Every node representing an address is then connected to a user using two relationships: a fine-grained relationship (e.g., DELIVERY_ADDRESS) and the more generic ADDRESS {type:'delivery'} relationship.
As we discussed in “Describe the Model in Terms of the Application’s Needs”, the key here is to let the questions we want to ask of our data guide the kinds of relationships we introduce into the model.
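
The two options for finding all of a user's addresses can be compared in a small plain-Python sketch. The relationship tuples, the `targets` helper, and the address ids are hypothetical; only the relationship names come from the text.

```python
# Each address is linked twice: a fine-grained relationship and a generic ADDRESS.
relationships = [
    ("user", "HOME_ADDRESS", "addr_home", {}),
    ("user", "ADDRESS", "addr_home", {"type": "home"}),
    ("user", "WORK_ADDRESS", "addr_work", {}),
    ("user", "ADDRESS", "addr_work", {"type": "work"}),
    ("user", "DELIVERY_ADDRESS", "addr_delivery", {}),
    ("user", "ADDRESS", "addr_delivery", {"type": "delivery"}),
]

def targets(start, rel_types, relationships):
    """Follow outgoing relationships from `start` whose type is in `rel_types`."""
    return {end for s, t, end, _props in relationships if s == start and t in rel_types}

# Option 1: name every fine-grained type in the query.
all_by_listing = targets(
    "user", {"HOME_ADDRESS", "WORK_ADDRESS", "DELIVERY_ADDRESS"}, relationships
)
# Option 2: follow the single generic relationship.
all_by_generic = targets("user", {"ADDRESS"}, relationships)
# A specific lookup still uses the fine-grained name.
delivery_only = targets("user", {"DELIVERY_ADDRESS"}, relationships)

print(all_by_listing == all_by_generic, sorted(delivery_only))
```

Option 1 must be edited whenever a new kind of address is introduced, whereas option 2 keeps the query stable at the cost of storing a second relationship per address.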

Model Facts as Nodes

When two or more domain entities interact for a period of time, a fact emerges. We represent a fact as a separate node with connections to each of the entities engaged in that fact. Modeling an action in terms of its product—that is, in terms of the thing that results from the action—produces a similar structure: an intermediate node that represents the outcome of an interaction between two or more entities. We can use timestamp properties on this intermediate node to represent start and end times.
The following examples show how we might model facts and actions using intermediate nodes.

Employment

Figure 4-2 shows how the fact of Ian being employed by Neo Technology in the role of engineer can be represented in the graph.
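
As a rough sketch of the employment fact as an intermediate node, the structure might look like this in plain Python. The figure itself is not reproduced here, so the relationship names (`EMPLOYMENT`, `EMPLOYER`, `ROLE`), the property keys, and the start date are all assumptions chosen for illustration.

```python
# Nodes: id -> properties. The employment fact is a node in its own right.
nodes = {
    "ian": {"label": "Person", "name": "Ian"},
    "neo": {"label": "Company", "name": "Neo Technology"},
    "engineer": {"label": "Role", "name": "engineer"},
    # Hypothetical timestamps; an ongoing job has no end time yet.
    "employment_1": {"label": "Employment", "from": "2011-01-05", "to": None},
}

# The fact node connects to every entity engaged in the fact.
relationships = [
    ("ian", "EMPLOYMENT", "employment_1"),
    ("employment_1", "EMPLOYER", "neo"),
    ("employment_1", "ROLE", "engineer"),
]

def describe_employment(person, nodes, relationships):
    """Walk person -> employment fact -> employer and role."""
    facts = [e for s, t, e in relationships if s == person and t == "EMPLOYMENT"]
    described = []
    for fact in facts:
        linked = {t: nodes[e]["name"] for s, t, e in relationships if s == fact}
        described.append((linked["EMPLOYER"], linked["ROLE"], nodes[fact]["from"]))
    return described

history = describe_employment("ian", nodes, relationships)
print(history)
```

Because the fact is a node, a later job or a second role becomes another fact node with its own timestamps, with no change to the existing person, company, or role nodes.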

Figure 4-3 shows how the fact that William Hartnell played The Doctor in the story The Sensorites can be represented in the graph.