
From Natural Language Processing to Neural Databases: Paper Notes

This paper introduces neural databases, a class of systems that use NLP transformers as localized answer derivation engines. The authors ground the vision in NeuralDB, a database system in which updates and queries are given as short natural language sentences. Preliminary experiments show that NeuralDB can answer select-project-join-aggregate queries over thousands of natural language sentences with very high accuracy.

By nature, neural databases are not meant to provide the same correctness guarantees as a traditional database system. Hence, to be clear about the scope of the vision, neural databases should not be considered an alternative to traditional databases in applications where such guarantees are required.

The vision faces two technical challenges: (1) finding suitable sets of facts in the database to feed to each transformer instance, and (2) further processing the answers of each transformer instance to produce the answer to the query.
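
Challenge (1) can be made concrete with a minimal sketch. It assumes a naive lexical-overlap heuristic, which is a swapped-in stand-in and not the paper's retrieval method: facts that share no word with the query are dropped, the rest are ranked by overlap, and the ranked list is cut into support sets of at most `max_facts` sentences, one set per transformer instance. The names `tokenize`, `select_support_sets`, and `max_facts` are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word tokens; a crude, hypothetical stand-in for a learned retriever."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_support_sets(facts: list[str], query: str, max_facts: int = 2) -> list[list[str]]:
    """Group facts that share words with the query into small support sets.

    Assumption: lexical overlap is a usable relevance signal. Facts with zero
    overlap are dropped; the rest are sorted by overlap (highest first, ties keep
    their original order) and chunked so that each transformer instance sees at
    most `max_facts` sentences.
    """
    q = tokenize(query)
    scored = [(len(q & tokenize(f)), f) for f in facts]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    relevant = [f for score, f in ranked if score > 0]
    return [relevant[i:i + max_facts] for i in range(0, len(relevant), max_facts)]


facts = [
    "Mary works for Acme.",
    "John lives in Paris.",
    "Acme is based in Berlin.",
    "Mary likes tea.",
]
print(select_support_sets(facts, "Who does Mary work for?", max_facts=1))
# [['Mary works for Acme.'], ['Mary likes tea.']]
```

Lexical overlap cannot follow multi-hop links: the fact about Acme is never selected for a query that only mentions Mary, so a join through Mary's employer would be missed. This gap is one way to see why choosing support sets is a challenge rather than a simple lookup.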

In NeuralDB, data and queries are represented as natural language sentences, which provides two of the key benefits of neural databases. First, the database has no predefined schema: users can mention any relationship of interest. Second, the database is usable by a broader set of users, because updates and queries can be specified in whatever linguistic form is most convenient to the user.

The architecture of NeuralDB is based on the following ideas:

  • Running multiple transformers in parallel: In practice, transformers can only take a relatively small input. Hence, to scale to larger data sets, NeuralDB runs multiple copies of a neural SPJ (select-project-join) operator in parallel, each outputting structured results. When a query does not involve aggregation, the union of the outputs of the neural SPJ operators is the answer to the query. When it does, these machine-readable outputs are fed into the aggregation operator.
  • Aggregation with a conventional operator: Since the neural SPJ operator was designed to output structured results, the architecture can use a separate, conventional aggregation operator. The aggregation operator is selected by a classifier that maps a query to an aggregation function.
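
The two ideas above can be sketched end to end under heavy assumptions. A thread pool stands in for the parallel copies of the neural SPJ operator, a keyword rule stands in for the learned query-to-aggregation classifier, and `toy_spj` is a hypothetical string-matching substitute for a transformer. `SPJOperator`, `AGGREGATES`, `classify_aggregation`, `answer`, and `toy_spj` are illustrative names, not NeuralDB's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# Hypothetical operator signature: a neural SPJ operator takes the query and one
# small support set of sentences and returns structured (machine-readable) rows.
SPJOperator = Callable[[str, list[str]], list[str]]

# Conventional aggregation operators applied to the rows produced by the SPJ step.
AGGREGATES: dict[str, Callable[[list[str]], float]] = {
    "count": lambda rows: len(rows),
    "sum": lambda rows: sum(float(r) for r in rows),
    "max": lambda rows: max(float(r) for r in rows),
    "min": lambda rows: min(float(r) for r in rows),
}


def classify_aggregation(query: str) -> Optional[str]:
    """Keyword stand-in (hypothetical) for the learned query-to-aggregation classifier."""
    q = query.lower()
    if q.startswith("how many"):
        return "count"
    if "total" in q:
        return "sum"
    if "largest" in q or "maximum" in q:
        return "max"
    if "smallest" in q or "minimum" in q:
        return "min"
    return None  # no aggregation needed


def answer(query: str, support_sets: list[list[str]], spj: SPJOperator, workers: int = 4):
    """Run one SPJ operator per support set in parallel, then union or aggregate."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(lambda support: spj(query, support), support_sets))
    rows = [row for out in outputs for row in out]
    agg = classify_aggregation(query)
    if agg is None:
        # No aggregation: the answer is the (deduplicated) union of all SPJ outputs.
        return sorted(set(rows))
    # Aggregation: hand the concatenated rows to the conventional operator.
    return AGGREGATES[agg](rows)


def toy_spj(query: str, support: list[str]) -> list[str]:
    """Toy substitute for a transformer: returns X for every 'X lives in London.' fact."""
    return [s.split(" lives in London")[0] for s in support if "lives in London" in s]


sets = [["Ann lives in London.", "Bob lives in Paris."], ["Cy lives in London."]]
print(answer("Who lives in London?", sets, toy_spj))             # ['Ann', 'Cy']
print(answer("How many people live in London?", sets, toy_spj))  # 2
```

Threads here only mirror the "many copies in parallel" idea; a GPU-backed system would more likely batch support sets into one forward pass. Note also that the sketch aggregates over the raw rows, so a fact duplicated across support sets would be counted twice.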

The experimental results show that for lookup and join queries the model attained near-perfect scores (above 99% exact match) on the template-generated data. However, the model performs poorly on queries that require aggregation or whose result is a large set. Importantly, the results indicate that the model can be robust to simple linguistic variations when processing queries.
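
Exact match, in its common form, is the fraction of queries whose predicted answer equals the reference answer exactly. The following is a minimal sketch that assumes each answer is a set of strings compared after lowercasing and stripping whitespace; the paper's own normalization may differ, and `normalize` and `exact_match` are hypothetical names.

```python
def normalize(answer: list[str]) -> frozenset[str]:
    """Treat an answer as an unordered set of case- and whitespace-insensitive strings."""
    return frozenset(a.strip().lower() for a in answer)


def exact_match(predictions: list[list[str]], references: list[list[str]]) -> float:
    """Fraction of queries whose normalized prediction equals the reference exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


# The set answer is fully correct (order and case ignored); the count answer is wrong.
print(exact_match([["Ann", "Cy"], ["2"]], [["cy", "ann"], ["3"]]))  # 0.5
```

Under this all-or-nothing scoring, a single missed or spurious row makes a large set answer count as wrong, which is consistent with (though not proof of) the weaker scores on large result sets.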