"BLINKS: Ranked Keyword Searches on Graphs": Paper Notes

阿新 • Published: 2019-01-05

ABSTRACT

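Shortcomings of current keyword search techniques: poor worst-case performance, not taking full advantage of indexes, and high memory requirements.
The paper's approach: BLINKS, a bi-level indexing and query processing scheme for top-k keyword search on graphs.
The search strategy BLINKS follows comes with a guaranteed performance bound, and the bi-level index helps prune and accelerate the search. The bi-level design exists to reduce index size: the graph is first partitioned into blocks, and the top level of the index stores only block-level information.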

1 Introduction

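Graph data of all kinds is now common.
Reasons why keyword search over tree- or graph-structured data is popular:

It is user-friendly.
Much graph-structured data has no schema, so many query languages do not apply.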

we focus on implementing efficient ranked keyword searches on schemaless node-labeled graphs.

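Challenges. Methods designed for XML, which exploit the hierarchical structure of trees, no longer apply. The lack of a schema also rules out some compile-time optimizations. Previous work has the following shortcomings:

Many existing algorithms use heuristic graph search strategies that lack performance guarantees.
Existing algorithms do not take full advantage of indexes. They use an index only to identify which nodes contain the query keywords, and rely on graph traversal to find the substructures that connect those nodes. A naive index of connectivity information, however, would lead to very high storage requirements.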

Contributions. BLINKS (Bi-Level INdexing for Keyword Search)

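Better search strategy. BLINKS is based on cost-balanced expansion, a new policy for the backward search strategy. Its worst-case cost is bounded in terms of m (the number of query keywords), a substantial improvement over earlier strategies.
Combining indexing with search. The index precomputes and stores some shortest-path information. This not only speeds up backward search but also enables forward search. BLINKS is the first scheme to make extensive use of indexing to accelerate keyword search on general graphs.
Partitioning-based indexing. An index that stores all shortest-path information would be too large, so BLINKS partitions the graph into blocks and the bi-level index stores block-level information. This gives a trade-off between space and search efficiency.

Experiments show that BLINKS outperforms earlier approaches by orders of magnitude, and that it supports sophisticated scoring functions.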

The rest of the paper is organized as follows. We formally define the problem and describe our scoring function in Section 2. We review existing graph search strategies and propose the new cost-balanced expansion policy in Section 3. To help illustrate how indexing helps search, we present a conceptually simple (but practically infeasible) single-level index and the associated search algorithm in Section 4. In Sections 5 and 6, we introduce our full bi-level index and search algorithm. We discuss optimizations in Section 7 and present results of experiments in Section 8. Finally, we survey the related work in Section 9 and conclude in Section 10.

2 Problem Definition

Data and Query

Definition 1. Given a query $q = (w_1, \ldots, w_m)$ and a directed graph $G$, an answer to $q$ is a pair $\langle r, (n_1, \ldots, n_m) \rangle$, where $r$ and the $n_i$ are nodes (not necessarily distinct) in $G$ satisfying the following properties:
(Coverage) For every $i$, node $n_i$ contains keyword $w_i$.
(Connectivity) For every $i$, there exists a directed path in $G$ from $r$ to $n_i$.

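$r$ is the root of the answer and the $n_i$ are its matches. Connectivity means that an answer must be a subtree whose root can reach every query keyword.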
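Definition 1 can be checked mechanically. The sketch below is only an illustration: the adjacency-list `graph`, the `labels` map and the function names are assumptions made for this note, not anything from the paper. It also assumes a node reaches itself through an empty path, which is how the "not necessarily distinct" remark is read here.

```python
from collections import deque

def reachable_from(graph, start):
    """Return the set of nodes reachable from start along directed edges (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def is_answer(graph, labels, query, root, matches):
    """Check the coverage and connectivity properties of Definition 1."""
    if len(matches) != len(query):
        return False
    reach = reachable_from(graph, root)
    # Coverage: the i-th match must contain the i-th keyword.
    coverage = all(w in labels.get(n, set()) for w, n in zip(query, matches))
    # Connectivity: the root must reach every match.
    connectivity = all(n in reach for n in matches)
    return coverage and connectivity

# Hypothetical toy data: adjacency lists and the keywords stored at each node.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
labels = {"a": {"paper"}, "b": set(), "c": {"graph"}, "d": {"keyword"}}
```

With this toy data, `is_answer(graph, labels, ["graph", "keyword"], "a", ["c", "d"])` holds, while the same matches rooted at `"b"` fail connectivity because `b` cannot reach `c`.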
Top-k Query

Definition 2. Given a query and a scoring function S, the (best) score of a node r is the maximum S(T) over all answers T rooted at r (or 0 if there are no such answers). An answer rooted at r with the best score is called a best answer rooted at r. A top-k query returns the k nodes in the graph with the highest best scores, and, for each node returned, the best score and a best answer rooted at the node.

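Under this definition, the k answers returned all have distinct roots. The reasons are:

It avoids the situation where a single node that points to many keyword-bearing children becomes the root of a large number of answers.
It makes the index more effective (discussed in Section 7).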

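Scoring Function. The paper focuses on indexing and query processing, so it does not study the scoring function in depth.
Its scoring function takes both graph structure and content into account, and incorporates state-of-the-art measures from the database and IR communities. For an answer $T = \langle r, (n_1, \ldots, n_m) \rangle$ to the query $(w_1, \ldots, w_m)$, the scoring function is

$$S(T) = f\left(\bar{S}_r(r) + \sum_{i=1}^{m} \bar{S}_n(n_i, w_i) + \sum_{i=1}^{m} \bar{S}_p(r, n_i)\right)$$
where $\bar{S}_p(r, n_i)$ denotes the distance from the root $r$ to the match $n_i$.
The input to $f(\cdot)$ is the sum of three components, which score (1) the answer root, (2) the matches, and (3) the paths from the answer root to the matches.
The scoring function has two properties: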

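Match-distributive semantics. In the definition of $S(T)$, the contributions of the matches and of the paths from the answer root to the matches are summed, so each path contributes to the score on its own, even when paths share edges. This way of scoring favors the answer on the right of Figure 2 in the paper.
Graph-distance semantics. $\bar{S}_p(r, n_i)$ is defined as the shortest-path distance from the root to the match, which reduces the keyword search problem to a shortest-path problem.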
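A small worked example may make match-distributive semantics concrete. The two functions and the toy paths below are hypothetical and only contrast the two ways of counting; `tree_weight_score` is not part of the paper's scoring function and is included purely for comparison.

```python
def path_sum_score(paths):
    """Match-distributive: every root-to-match path contributes its full length."""
    return sum(sum(w for _, _, w in path) for path in paths)

def tree_weight_score(paths):
    """For contrast only: each distinct edge of the answer tree is counted once."""
    edges = {(u, v): w for path in paths for u, v, w in path}
    return sum(edges.values())

# Two root-to-match paths, given as (source, target, weight) edges.
# Both paths share the edge r -> x.
paths = [
    [("r", "x", 1), ("x", "n1", 1)],
    [("r", "x", 1), ("x", "n2", 1)],
]
```

Here the shared edge `r -> x` is counted once per path under the path-sum view (total 4), but only once overall if the tree's edges are summed (total 3).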

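An Assumption for Convenience. To keep the presentation simple, we ignore the contributions of the root and the matches to the score and consider only the path component, $\sum_{i=1}^{m} \bar{S}_p(r, n_i)$. The problem then reduces to finding k nodes, each of which can reach all the query keywords, with the smallest total distance to those keywords.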
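Under this simplification, a brute-force baseline is easy to write down. The sketch below is not the BLINKS algorithm: it runs one multi-source Dijkstra per keyword over reversed edges and then ranks every node that reaches all keywords. The weighted adjacency-list format, the `labels` map and the function names are assumptions made for illustration, and the query is assumed to be non-empty.

```python
import heapq

def dist_to_keyword(graph, sources):
    """Multi-source Dijkstra on reversed edges: distance from each node to its nearest source."""
    reverse = {}
    for u, edges in graph.items():
        for v, w in edges:
            reverse.setdefault(v, []).append((u, w))
    dist = {s: 0 for s in sources}
    heap = [(0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for u, w in reverse.get(v, ()):
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist

def top_k_roots(graph, labels, query, k):
    """Return up to k (total distance, root) pairs with the smallest path-score sums."""
    per_keyword = []
    for w in query:
        sources = [n for n, words in labels.items() if w in words]
        per_keyword.append(dist_to_keyword(graph, sources))
    # Only nodes that reach every keyword are valid roots.
    candidates = set(per_keyword[0])
    for dist in per_keyword[1:]:
        candidates &= set(dist)
    scored = [(sum(dist[r] for dist in per_keyword), r) for r in candidates]
    return heapq.nsmallest(k, scored)

# Hypothetical toy data: graph[u] is a list of (v, weight) edges u -> v.
graph = {
    "a": [("b", 1), ("c", 1)],
    "b": [("d", 1)],
    "c": [],
    "d": [],
    "e": [("c", 1), ("d", 1)],
}
labels = {"c": {"graph"}, "d": {"keyword"}}
```

On this toy graph, `top_k_roots(graph, labels, ["graph", "keyword"], 2)` ranks `e` (total distance 2) ahead of `a` (total distance 3). The baseline touches the whole graph for every keyword, which is exactly the cost the search strategies in the next section try to avoid.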

3 Towards Optimal Graph Search Strategies

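Backward Search. When no index provides graph connectivity information beyond a single hop, we can start the graph search from the nodes that contain at least one query keyword. Such nodes are easy to identify with an inverted-list index. This idea leads to the backward search algorithm:

Let $E_i$ denote the set of nodes known to reach keyword $k_i$. $E_i$ is called the cluster for $k_i$.
$E_i$ starts out as $O_i$, the set of nodes that directly contain keyword $k_i$. $O_i$ is called the cluster origin, and its member nodes are called keyword nodes.
In each search step, we expand $E_i$ by following incoming edges of its nodes backward.
A node $x$ has been discovered as an answer root if, for every cluster $E_i$, either $x \in E_i$ or $x$ has an edge to some node in $E_i$.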
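The building blocks listed above can be sketched as follows. This is a simplified illustration under stated assumptions: the graph is unweighted, `expand_cluster` adds a whole layer of predecessors at once instead of following one edge per step, and all names (`build_inverted_index`, `expand_cluster`, `is_answer_root`) are made up for this note.

```python
def build_inverted_index(labels):
    """Map each keyword to the set of nodes that contain it (the cluster origins)."""
    index = {}
    for node, words in labels.items():
        for w in words:
            index.setdefault(w, set()).add(node)
    return index

def expand_cluster(cluster, in_edges):
    """One backward expansion: add the source node of every edge entering the cluster."""
    frontier = {u for v in cluster for u in in_edges.get(v, ())}
    return cluster | frontier

def is_answer_root(x, clusters, out_edges):
    """x is a root if, for each cluster, x is in it or has an edge into it."""
    return all(
        x in cluster or any(v in cluster for v in out_edges.get(x, ()))
        for cluster in clusters
    )

# Hypothetical toy data.
out_edges = {"a": ["b", "c"], "b": ["d"], "e": ["c", "d"]}
in_edges = {}
for u, targets in out_edges.items():
    for v in targets:
        in_edges.setdefault(v, []).append(u)
labels = {"c": {"graph"}, "d": {"keyword"}}

index = build_inverted_index(labels)
clusters = [index["graph"], index["keyword"]]  # O_1 = {c}, O_2 = {d}
```

Before any expansion, `e` already passes the root test because it has edges into both origins. Node `a` passes only after the cluster for "keyword" is expanded once to include `b`.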

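The first backward keyword search algorithm was proposed by Bhalotia et al. It relies mainly on the following two strategies:

Equi-distance expansion in each cluster: decides which node to visit when a keyword's cluster is expanded. The next node visited is the one closest to $O_i$ among the nodes not yet in $E_i$, so nodes are visited in order of increasing distance from the cluster origin.
Distance-balanced expansion across clusters: decides which keyword's cluster is expanded next. To balance the distance from each cluster's origin to its frontier, the algorithm picks the pair $(u, E_i)$ with $u \notin E_i$ for which the distance from $u$ to $O_i$ is smallest, and expands $E_i$.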
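The two strategies combine naturally with one priority queue per cluster. The sketch below is a rough illustration, not the algorithm of Bhalotia et al. as published: `in_edges`, `origins` and `backward_search` are hypothetical names, a cluster here contains only settled nodes, a root is reported once it is settled in every cluster, and roots are returned in discovery order without the bound check a real top-k guarantee would need.

```python
import heapq

def backward_search(in_edges, origins, k):
    """in_edges[v] lists (u, weight) for edges u -> v; origins[i] is the set O_i."""
    m = len(origins)
    dist = [dict() for _ in range(m)]  # settled distance to O_i, per cluster
    heaps = [[(0, n) for n in sorted(o)] for o in origins]
    for heap in heaps:
        heapq.heapify(heap)
    roots = []
    while len(roots) < k:
        # Discard queue entries for nodes already settled in that cluster.
        for i in range(m):
            while heaps[i] and heaps[i][0][1] in dist[i]:
                heapq.heappop(heaps[i])
        live = [i for i in range(m) if heaps[i]]
        if not live:
            break
        # Distance-balanced: expand the cluster whose next node is nearest its origin.
        i = min(live, key=lambda j: heaps[j][0][0])
        # Equi-distance: within that cluster, visit the nearest unvisited node.
        d, u = heapq.heappop(heaps[i])
        dist[i][u] = d
        for p, w in in_edges.get(u, ()):
            if p not in dist[i]:
                heapq.heappush(heaps[i], (d + w, p))
        # u is an answer root once every cluster has reached it.
        if all(u in dist[j] for j in range(m)):
            roots.append((sum(dist[j][u] for j in range(m)), u))
    return roots

# Hypothetical toy data: edges a->b, a->c, b->d, e->c, e->d, all of weight 1.
in_edges = {
    "b": [("a", 1)],
    "c": [("a", 1), ("e", 1)],
    "d": [("b", 1), ("e", 1)],
}
origins = [{"c"}, {"d"}]  # O_1 for one keyword, O_2 for the other
```

On this toy graph, `backward_search(in_edges, origins, 2)` discovers `e` first (total distance 2) and then `a` (total distance 3).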

We now look at improving the two strategies above, beginning with equi-distance expansion in each cluster.

Theorem 1. An optimal backward search algorithm must follow the s
