Query-Oriented Multi-Document Summarization via Unsupervised Deep Learning

阿新 • Published: 2018-11-07

Liu Y, Zhong S H, Li W. Query-oriented multi-document summarization via unsupervised deep learning[C]//
Twenty-Sixth AAAI Conference on Artificial Intelligence. AAAI Press, 2012:1699-1705.
##Abstract

Achieve the largest possible coverage of the documents' content.
Concentrate distributed information into the hidden units, layer by layer.

Fine-tune the whole deep architecture by minimizing the information loss in reconstruction validation.
##Related Work
It is very difficult to bridge the gap between the semantic meaning of the documents and the basic textual units.
The paper proposes a novel framework via deep learning, modeled on the architecture of the human neocortex and the procedure of intelligent perception.
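###Existing models
support vector machine (SVM)
Support vector machines (CSDN)
Support vector machines (cnblogs)
deep belief network (DBN)
This is the first paper to apply deep networks to query-oriented multi-document summarization (MDS).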
###Paper contents
Query-oriented concept extraction, reconstruction validation for global adjustment, and summary generation via dynamic programming.

##Model
###Deep Learning for Query-oriented Multidocuments Summarization

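Dozens of cortical layers are involved in even the simplest lexical-semantic processing.
Deep learning has two attractive characteristics:

-The nonlinear structure of multiple hidden layers lets a deep model represent complex problems very compactly. This suits summarization well, where as much information as possible must fit within the allowed length.
-Because most deep models learn by reconstruction between pairs of adjacent hidden layers, distributed information can be concentrated layer by layer even without supervision. This property pays off more on large datasets.

Deep learning is applicable to most domains, e.g. image classification, image generation, and audio event classification.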

Deep Architecture

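The input feature vector is $f^d = [f^d_1, f^d_2, \dots, f^d_v, \dots, f^d_V]$, where $f^d_v$ is the tf value of the $v$-th vocabulary word in document $d$.
For the hidden layers, Restricted Boltzmann Machines (RBMs) are used as building blocks.
The output is $S = [s_1, s_2, s_3, \dots, s_T]$.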
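The tf input vector can be sketched as follows. This is a minimal illustration, assuming that tf means the raw count of each vocabulary word in a tokenized document; the function name `tf_vector`, the toy vocabulary, and the whitespace tokenization are hypothetical and not taken from the paper.

```python
from collections import Counter


def tf_vector(document_tokens, vocabulary):
    """Return [f_1, ..., f_V]: the raw count of each vocabulary word.

    Assumption: tf is an unnormalized term count.
    """
    counts = Counter(document_tokens)
    return [counts[word] for word in vocabulary]


# Toy vocabulary and document, for illustration only.
vocabulary = ["deep", "learning", "summary", "query"]
tokens = "deep learning for query oriented summary of deep models".split()
print(tf_vector(tokens, vocabulary))  # prints [2, 1, 1, 1]
```

Words outside the vocabulary are simply ignored, so every document maps to a vector of the same length V.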
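Restricted Boltzmann machine study notes (very complete)
An RBM is a two-layer recurrent neural network in which stochastic binary inputs and outputs are connected by symmetrically weighted connections. RBMs serve as the building blocks of the deep model because the bottom-up connections can infer more compact high-level representations from low-level features, while the top-down connections can validate the compact representations that were generated. Except for the input layer, the parameter space of the deep architecture is initialized randomly; the initial parameters of the first RBM are additionally determined by the query words.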
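The bottom-up and top-down passes of a single RBM can be sketched in plain Python. This is only an illustration under stated assumptions: the names `A`, `b`, `c` follow the parameter list [A,b,c] used in these notes, but treating `b` as the visible bias and `c` as the hidden bias is an assumption, and the layer sizes, the input vector, and the squared-error measure are made up for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def up(v, A, c):
    """Bottom-up pass: P(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(c[j] + sum(v[i] * A[i][j] for i in range(len(v))))
            for j in range(len(c))]


def down(h, A, b):
    """Top-down pass: P(v_i = 1 | h), reusing the same symmetric weights A."""
    return [sigmoid(b[i] + sum(A[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


rng = random.Random(0)
n_visible, n_hidden = 6, 3  # illustrative sizes
# Small random Gaussian weights; biases start at zero (assumption).
A = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
b = [0.0] * n_visible
c = [0.0] * n_hidden

v = [1, 0, 1, 1, 0, 0]       # a toy binary input
h = up(v, A, c)              # compact higher-level representation
v_rec = down(h, A, b)        # reconstruction used to validate h
error = sum((x - y) ** 2 for x, y in zip(v, v_rec))
print(len(h), len(v_rec), round(error, 3))
```

The same weight matrix is used in both directions, which is what "symmetrically weighted connections" refers to; training would adjust `A`, `b`, and `c` so that the reconstruction error shrinks.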
In the concept extraction stage, three hidden layers H1, H2, and H3 abstract the documents using a greedy layer-wise extraction algorithm.
Implementation:

-H1 filters out words that appear accidentally
-H2 is supposed to discover the key words
-H3 is used for candidate sentence extraction

The reconstruction validation part reconstructs the data distribution by fine-tuning the whole deep architecture globally.

Query-oriented Concept Extraction

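To integrate query information into document summarization, we use two distinct processes: query-oriented initial weight setting and query-oriented penalty processing.
In classical neural networks, the initial weights are drawn randomly from a Gaussian distribution N(0, 0.01). In contrast, we strengthen the influence of the query: after the random initialization, the setting is adjusted for every node i in H0 whose word vi belongs to the query.
In the penalty process, reconstruction errors on query words are penalized more heavily than errors on other words.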
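The penalty idea can be sketched as a weighted reconstruction error. This is a hedged illustration, not the paper's formula: the squared-error form, the factor `penalty=2.0`, and the function name `query_weighted_error` are all assumptions made for the example.

```python
def query_weighted_error(v, v_rec, vocabulary, query_words, penalty=2.0):
    """Squared reconstruction error where query words count `penalty` times.

    Assumption: non-query words have weight 1.0; the paper's exact
    penalty scheme is not reproduced here.
    """
    total = 0.0
    for word, x, y in zip(vocabulary, v, v_rec):
        weight = penalty if word in query_words else 1.0
        total += weight * (x - y) ** 2
    return total


vocabulary = ["deep", "learning", "summary", "query"]  # toy vocabulary
v = [1.0, 0.0, 1.0, 1.0]       # original input
v_rec = [0.5, 0.0, 1.0, 0.5]   # hypothetical reconstruction
print(query_weighted_error(v, v_rec, vocabulary, {"query"}))  # prints 0.75
```

With an empty query set the same call returns the ordinary squared error (0.5 here), so the extra 0.25 comes entirely from the query word.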

-AF importance matrix
DP is utilized to maximize the query oriented importance of generated summary with the constraint of summary length.
###Reconstruction Validation for Global Adjustment
-Use a greedy layer-by-layer algorithm to learn a deep model for concept extraction. This algorithm has good global search ability.
-Use backpropagation through the whole deep model to fine-tune the parameters [A,b,c] for optimal reconstruction. This algorithm is good at searching for a local optimum.
###Summary Generation via Dynamic Programming
DP is utilized to maximize the importance of the summary with the length constraint
State transition equation:
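The paper's own recurrence is not shown in these notes. As a rough illustration only, selecting sentences to maximize importance under a length limit can be written as a standard 0/1 knapsack. The sketch assumes each candidate sentence has a single importance score and a positive integer length; the function name `select_sentences` and the toy numbers are hypothetical.

```python
def select_sentences(importance, lengths, max_length):
    """0/1 knapsack: maximize total importance with total length <= max_length.

    Assumption: lengths are positive integers (e.g. word counts).
    Returns (best_score, indices_of_chosen_sentences).
    """
    n = len(importance)
    # best[i][l]: best importance using the first i sentences within length l
    best = [[0.0] * (max_length + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for l in range(max_length + 1):
            best[i][l] = best[i - 1][l]  # option 1: skip sentence i-1
            if lengths[i - 1] <= l:      # option 2: take it if it fits
                take = best[i - 1][l - lengths[i - 1]] + importance[i - 1]
                if take > best[i][l]:
                    best[i][l] = take
    # Walk back through the table to recover which sentences were taken.
    chosen, l = [], max_length
    for i in range(n, 0, -1):
        if best[i][l] != best[i - 1][l]:
            chosen.append(i - 1)
            l -= lengths[i - 1]
    chosen.reverse()
    return best[n][max_length], chosen


score, chosen = select_sentences([3.0, 4.0, 5.0], [2, 3, 4], 5)
print(score, chosen)  # prints 7.0 [0, 1]
```

The state here is the pair (number of sentences considered, remaining length), and each step either skips or takes the current sentence.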

##Conclusion
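The paper proposes a novel deep learning model for query-oriented multi-document summarization. The framework inherits the strong extraction ability of deep learning and derives the important concepts effectively. Empirical validation on three standard datasets not only demonstrates the distinguishing extraction ability of QODE, but also clearly shows our intention to provide human-like natural language processing for multi-document summarization.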
