Learning sentence classification: using deep-learning methods to classify sentence datasets.
Problem
Sentence classification assigns a given sentence to one of several predefined categories. It includes tasks such as sentiment analysis and question classification. Sentiment analysis, also called opinion extraction, opinion mining, sentiment mining, or subjectivity analysis, is the process of analyzing, summarizing, and drawing inferences from subjective text that carries sentiment. For example, from review text one can analyze a user's sentiment toward attributes of a digital camera such as zoom, price, size, weight, flash, and ease of use.
Applications
Understanding positive and negative opinions about movies, products, tweets, and so on, in order to improve products and services, identify competitors' strengths and weaknesses, or predict stock movements.
Datasets
Column key: c = number of classes, l = average sentence length, N = dataset size, |V| = vocabulary size, |V_pre| = number of words also present in the pre-trained word vectors, Test = test-set size (CV means there is no standard train/test split, so 10-fold cross-validation is used).

Data | c | l | N | |V| | |V_pre| | Test |
---|---|---|---|---|---|---|
MR | 2 | 20 | 10662 | 18765 | 16448 | CV |
SST-1 | 5 | 18 | 11855 | 17836 | 16262 | 2210 |
SST-2 | 2 | 19 | 9613 | 16185 | 14838 | 1821 |
Subj | 2 | 23 | 10000 | 21323 | 17913 | CV |
TREC | 6 | 10 | 5952 | 9592 | 9125 | 500 |
CR | 2 | 19 | 3775 | 5340 | 5046 | CV |
MPQA | 2 | 3 | 10606 | 6246 | 6083 | CV |
- MR: Movie reviews, with one sentence per review. 1
- SST-1: Stanford Sentiment Treebank, an extension of MR with train/dev/test splits and five fine-grained labels (very positive, positive, neutral, negative, very negative).
- SST-2: Same as SST-1, but with neutral reviews removed and binary labels.
- Subj: Subjectivity dataset; the task is to classify a sentence as subjective or objective. 3
- TREC: TREC question dataset; the task is to classify a question into one of six types (about a person, a location, numeric information, etc.). 4
- CR: Customer reviews of various products; the task is to predict positive/negative reviews. 5
- MPQA: Opinion-polarity detection subtask of the MPQA dataset. 6
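For the datasets marked CV, accuracy is averaged over 10 cross-validation folds rather than measured on a fixed test split. A minimal sketch of producing such folds (plain Python over indices; the helper name is illustrative, and real experiments would also shuffle and possibly stratify):

```python
def k_fold_splits(n_examples, k=10):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    The first `n_examples % k` folds get one extra example so that
    every example lands in exactly one test fold.
    """
    indices = list(range(n_examples))
    fold_size, remainder = divmod(n_examples, k)
    start = 0
    for i in range(k):
        size = fold_size + (1 if i < remainder else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# MR has 10662 sentences and no held-out test set,
# so results are averaged over 10 folds like these.
folds = list(k_fold_splits(10662, k=10))
```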
Methods
The task is usually broken into several subtasks:
Tokenization
Split the sentence into words; this step may also involve removing stopwords, tagging parts of speech, or converting words into word vectors.
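As a minimal sketch of this step, a regex word tokenizer with a toy stopword list (the `STOPWORDS` set and `preprocess` helper are illustrative; real pipelines use a proper tokenizer and a full stopword list, e.g. from NLTK):

```python
import re

# A tiny illustrative stopword list; real pipelines would use a
# much fuller one.
STOPWORDS = {"the", "a", "an", "is", "to", "of"}

def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())

def preprocess(sentence, remove_stopwords=True):
    """Tokenize and optionally drop stopwords."""
    tokens = tokenize(sentence)
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens

print(preprocess("The zoom of the camera is easy to use."))
# → ['zoom', 'camera', 'easy', 'use']
```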
Feature extraction
Often the tokens are not fed to the classifier directly; instead, features are extracted from them to make classification easier.
Common features: TF-IDF, LDA, LSI
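TF-IDF, the most common of these, weights a term by its frequency within a document while down-weighting terms that occur in many documents. A self-contained sketch using the plain log(N/df) variant of IDF (libraries such as scikit-learn use smoothed variants):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF vectors for a list of tokenized documents.

    tf(t, d) = count of t in d / length of d
    idf(t)   = log(N / df(t)), where df(t) = number of docs containing t
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # count each term once per document
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            t: (c / len(doc)) * math.log(n / df[t])
            for t, c in counts.items()
        })
    return vectors

docs = [["good", "movie"], ["bad", "movie"], ["good", "plot"]]
vecs = tf_idf(docs)
# "movie" appears in 2 of 3 documents, so it gets a lower weight
# than "bad", which appears in only 1 of 3.
```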
Building a classifier
Feed the features or word vectors into a model that assigns the sentence a class.
Naive Bayes
- NBSVM: Naive Bayes SVM
- MNB: Multinomial Naive Bayes 7
- combine-skip
- combine-skip + NB 8
Model | MR | SST-1 | SST-2 | Subj | TREC | CR | MPQA |
---|---|---|---|---|---|---|---|
NBSVM | 79.4 | - | - | 93.2 | - | 81.8 | 86.3 |
MNB | 79.0 | - | - | 93.6 | - | 80.0 | 86.3 |
combine-skip | 76.5 | - | - | 93.6 | 92.2 | 80.1 | 87.1 |
combine-skip+NB | 80.4 | - | - | 93.6 | - | 81.3 | 87.5 |
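The MNB baseline above can be sketched from scratch. This toy implementation uses add-one (Laplace) smoothing; the class and the example data are illustrative, not the paper's exact setup:

```python
import math
from collections import Counter

class MultinomialNB:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = set(labels)
        self.priors = {c: math.log(labels.count(c) / len(labels))
                       for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
            self.vocab.update(doc)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        def log_prob(c):
            denom = self.totals[c] + len(self.vocab)   # smoothed denominator
            return self.priors[c] + sum(
                math.log((self.counts[c][t] + 1) / denom) for t in doc)
        return max(self.classes, key=log_prob)

docs = [["good", "fun"], ["great", "fun"], ["bad", "boring"], ["awful", "bad"]]
labels = ["pos", "pos", "neg", "neg"]
clf = MultinomialNB().fit(docs, labels)
print(clf.predict(["good", "great"]))  # → pos
```

NBSVM interpolates this model with an SVM over the same log-count-ratio features, which is why the two rows above track each other closely.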
RNN
- RCNN: Recurrent Convolutional Neural Networks 9
- S-LSTM: Long Short-Term Memory Over Recursive Structures 10
- LSTM: Long Short-Term Memory
- BLSTM: Bidirectional Long Short-Term Memory
- Tree-LSTM: Tree-structured Long Short-Term Memory 11
- LSTMN: Long Short-Term Memory-Network 12
- Multi-Task: Recurrent Neural Network for Text Classification with Multi-Task Learning 13
- BLSTM-Att: Bidirectional Long Short-Term Memory, attention-based model
- BLSTM-2DPooling: Bidirectional Long Short-Term Memory Networks with Two-Dimensional Max Pooling
- BLSTM-2DCNN: Bidirectional Long Short-Term Memory Networks with 2D convolution 14
Model | MR | SST-1 | SST-2 | Subj | TREC | CR | MPQA |
---|---|---|---|---|---|---|---|
RCNN | - | 47.21 | - | - | - | - | - |
S-LSTM | - | - | 81.9 | - | - | - | - |
LSTM | - | 46.4 | 84.9 | - | - | - | - |
BLSTM | - | 49.1 | 87.5 | - | - | - | - |
Tree-LSTM | - | 51.0 | 88.0 | - | - | - | - |
LSTMN | - | 49.3 | 87.3 | - | - | - | - |
Multi-Task | - | 49.6 | 87.9 | 94.1 | - | - | - |
BLSTM | 80.0 | 49.1 | 87.6 | 92.1 | 93.0 | - | - |
BLSTM-Att | 81.0 | 49.8 | 88.2 | 93.5 | 93.8 | - | - |
BLSTM-2DPooling | 81.5 | 50.5 | 88.3 | 93.7 | 94.8 | - | - |
BLSTM-2DCNN | 82.3 | 52.4 | 89.5 | 94.0 | 96.1 | - | - |
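The 2D-pooling idea behind BLSTM-2DPooling treats the sequence of hidden states as a matrix (time steps by hidden features) and max-pools over both dimensions, not only over time. A plain-Python sketch of that pooling step (shapes and values are toy examples):

```python
def max_pool_2d(matrix, pool_h, pool_w):
    """2D max pooling with non-overlapping pool_h x pool_w windows.

    `matrix` is a list of rows, e.g. one BLSTM hidden state per row.
    Edge windows that are not full are still pooled over the cells
    that exist.
    """
    rows, cols = len(matrix), len(matrix[0])
    pooled = []
    for i in range(0, rows, pool_h):
        row = []
        for j in range(0, cols, pool_w):
            window = [matrix[r][c]
                      for r in range(i, min(i + pool_h, rows))
                      for c in range(j, min(j + pool_w, cols))]
            row.append(max(window))
        pooled.append(row)
    return pooled

# 4 time steps x 4 features, pooled with 2x2 windows -> a 2x2 map.
states = [[0.1, 0.5, 0.2, 0.0],
          [0.7, 0.3, 0.4, 0.6],
          [0.2, 0.9, 0.1, 0.3],
          [0.0, 0.4, 0.8, 0.5]]
print(max_pool_2d(states, 2, 2))  # → [[0.7, 0.6], [0.9, 0.8]]
```

Pooling over the feature dimension as well lets the model keep some structure of the hidden states instead of collapsing each time step to a single max, which is the gain BLSTM-2DPooling shows over plain BLSTM in the table above.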
CNN
- DCNN: Dynamic Convolutional Neural Network 15
- CNN-non-static: Convolutional Neural Networks, the pretrained vectors are fine-tuned for each task
- CNN-multichannel: Convolutional Neural Networks with two sets of word vectors 16
- TBCNN: Tree-based Convolutional Neural Network 17
- Molding-CNN: Molding Convolutional Neural Networks 18
- CNN-Ana: Non-static GloVe+word2vec CNN 19
- MVCNN: Multichannel Variable-Size Convolution 20
- DSCNN: Dependency Sensitive Convolutional Neural Networks 21
Model | MR | SST-1 | SST-2 | Subj | TREC | CR | MPQA |
---|---|---|---|---|---|---|---|
DCNN | - | 48.5 | 86.8 | - | 93.0 | - | - |
CNN-non-static | 81.5 | 48.0 | 87.2 | 93.4 | 93.6 | 84.3 | 89.5 |
CNN-multichannel | 81.1 | 47.4 | 88.1 | 93.2 | 92.2 | 85.0 | 89.4 |
TBCNN | - | 51.4 | 87.9 | - | 96.0 | - | - |
Molding-CNN | - | 51.2 | 88.6 | - | - | - | - |
CNN-Ana | 81.02 | 45.98 | 85.45 | 93.66 | 91.37 | 84.65 | 89.55 |
MVCNN | - | 49.6 | 89.4 | - | - | - | - |
DSCNN | 81.5 | 49.7 | 89.1 | 93.2 | 95.4 | - | - |
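The core operation shared by these CNN models is a filter slid over windows of consecutive word vectors, followed by max-over-time pooling. A plain-Python sketch for a single filter (the embeddings and filter values are toy examples; real models learn many filters of several window sizes):

```python
def conv_max_over_time(embeddings, filt, bias=0.0):
    """Apply one convolutional filter over a sentence and max-pool.

    `embeddings`: list of n word vectors of dimension d.
    `filt`: a filter of window size h, i.e. h weight vectors of dim d.
    Returns the single max-over-time feature for this filter.
    """
    n, h = len(embeddings), len(filt)
    feature_map = []
    for i in range(n - h + 1):
        s = bias
        for j in range(h):  # dot product of filter row j with word i+j
            s += sum(w * x for w, x in zip(filt[j], embeddings[i + j]))
        feature_map.append(max(0.0, s))  # ReLU nonlinearity
    return max(feature_map)

# Toy sentence: 4 words with 2-dimensional embeddings, window size 2.
sent = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
filt = [[1.0, 0.0], [0.0, 1.0]]
print(conv_max_over_time(sent, filt))  # → 2.0
```

Max-over-time pooling keeps only the strongest match per filter, which makes the sentence representation length-independent; the filters' concatenated maxima feed a softmax classifier.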
Others
- RAE: Recursive Autoencoders with pre-trained word vectors from Wikipedia 22
- AdaSent: self-adaptive hierarchical sentence model 23
- RNTN: Recursive Neural Tensor Network 24
- DRNN: Deep Recursive Neural Networks 25
Model | MR | SST-1 | SST-2 | Subj | TREC | CR | MPQA |
---|---|---|---|---|---|---|---|
RAE | 77.7 | 43.2 | 82.4 | - | - | - | 86.4 |
AdaSent | 83.1 | - | - | 95.5 | 92.4 | 86.3 | 93.3 |
RNTN | - | 45.7 | 85.4 | - | - | - | - |
DRNN | - | 49.8 | 86.6 | - | - | - | - |
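The recursive models above (RAE, RNTN, DRNN) compose word vectors bottom-up along a parse tree. A minimal sketch of the basic composition p = tanh(W [c1; c2] + b) (the weights and tree are toy examples; RNTN additionally adds a tensor term to the composition):

```python
import math

def compose(left, right, W, b):
    """Compose two child vectors: p = tanh(W [left; right] + b)."""
    concat = left + right
    return [math.tanh(sum(w * x for w, x in zip(row, concat)) + bi)
            for row, bi in zip(W, b)]

def encode(tree, embeddings, W, b):
    """Recursively encode a binary parse tree.

    A tree is either a word (str) or a (left, right) pair; leaves are
    looked up in `embeddings`, internal nodes are composed bottom-up.
    The topmost vector is the sentence representation fed to the
    classifier.
    """
    if isinstance(tree, str):
        return embeddings[tree]
    left, right = tree
    return compose(encode(left, embeddings, W, b),
                   encode(right, embeddings, W, b), W, b)

emb = {"not": [1.0, 0.0], "good": [0.0, 1.0]}
W = [[0.5, 0.0, 0.5, 0.0],   # toy 2x4 composition weights
     [0.0, 0.5, 0.0, 0.5]]
b = [0.0, 0.0]
vec = encode(("not", "good"), emb, W, b)
```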
References
- (ACL 2005) Seeing Stars: Exploiting Class Relationships For Sentiment Categorization With Respect To Rating Scales. https://www.cs.cornell.edu/people/pabo/movie-review-data/
- (EMNLP 2013) Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. https://nlp.stanford.edu/sentiment/
- (ACL 2004) A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. http://www.cs.cornell.edu/people/pabo/movie-review-data
- (Language Resources and Evaluation 2005) Annotating Expressions of Opinions and Emotions in Language. http://mpqa.cs.pitt.edu/
- (ACL 2012) Baselines and Bigrams: Simple, Good Sentiment and Topic Classification
- (NIPS 2015) Skip-Thought Vectors
- (AAAI 2015) Recurrent Convolutional Neural Networks for Text Classification
- (ICML 2015) Long Short-Term Memory Over Recursive Structures
- (ACL 2015) Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
- (EMNLP 2016) Long Short-Term Memory-Networks for Machine Reading
- (IJCAI 2016) Recurrent Neural Network for Text Classification with Multi-Task Learning
- (COLING 2016) Text Classification Improved by Integrating Bidirectional LSTM with Two-dimensional Max Pooling
- (ACL 2014) A Convolutional Neural Network for Modelling Sentences
- (EMNLP 2014) Convolutional Neural Networks for Sentence Classification
- (EMNLP 2015) Discriminative Neural Sentence Modeling by Tree-Based Convolution
- (EMNLP 2015) Molding CNNs for Text: Non-linear, Non-consecutive Convolutions
- (IJCNLP 2017) A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification
- (CoNLL 2015) Multichannel Variable-Size Convolution for Sentence Classification
- (NAACL 2016) Dependency Sensitive Convolutional Neural Networks for Modeling Sentences and Documents
- (EMNLP 2011) Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions
- (IJCAI 2015) Self-Adaptive Hierarchical Sentence Model
- (EMNLP 2013) Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
- (NIPS 2014) Deep Recursive Neural Networks for Compositionality in Language