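Introduction to Common NLP Tasks

I. Common English Tasks

GLUE data download: https://gluebenchmark.com/tasks

1. CoLA

1.1 Concept

CoLA (The Corpus of Linguistic Acceptability) is a single-sentence classification task in NLP. Its goal: "The CoLA task is to predict whether an English sentence is grammatically plausible." In other words, the model predicts whether an English sentence is grammatically acceptable.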

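1.2 Data

Each line of the downloaded train.tsv and dev.tsv files contains 4 tab-separated ('\t') columns.

Column 1: the source of the sentence.
Column 2: the acceptability label (0 = unacceptable, 1 = acceptable).
Column 3: the acceptability judgement originally assigned by the author.
Column 4: the sentence text.

Each line of test.tsv contains 2 tab-separated columns.

Column 1: the sample index, counting from 0 (0 is the first sample, 1 the second, and so on).
Column 2: the sentence text.

Examples from train and dev:

clc95 is the source of the sentence, 0 and 1 are the labels, and * marks the judgement originally assigned by the author (this column is empty for acceptable sentences).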

clc95    0    *    They noticed the painting, but I don't know for how long.
clc95    0    *    John was tall, but I don't know on what occasions.
clc95    1        Joan ate dinner with someone but I don't know who.
clc95    1        Joan ate dinner with someone but I don't know who with.
clc95    0    *    I know that Meg's attracted to Harry, but they don't know who.
clc95    0    *    Since Jill said Joe had invited Sue, we didn't have to ask who.
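
The four-column train/dev layout can be read with Python's standard csv module. The following is a minimal sketch rather than an official loader: the function name `read_cola` and the dictionary keys are illustrative assumptions, and quoting is disabled on the assumption that quote characters in the sentences should be kept literally.

```python
import csv
import io

def read_cola(handle):
    """Parse CoLA train/dev rows: source, label, original judgement, sentence."""
    rows = []
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for source, label, original, sentence in reader:
        rows.append({
            "source": source,
            "label": int(label),    # 0 = unacceptable, 1 = acceptable
            "original": original,   # "*" or an empty string
            "sentence": sentence,
        })
    return rows

# Two rows in the train/dev format, held in memory instead of a file.
sample = (
    "clc95\t0\t*\tThey noticed the painting, but I don't know for how long.\n"
    "clc95\t1\t\tJoan ate dinner with someone but I don't know who.\n"
)
examples = read_cola(io.StringIO(sample))
print(examples[1]["label"], examples[1]["sentence"])
```

For a real file, an open file handle (for example `open("train.tsv", encoding="utf-8", newline="")`) can be passed in place of the in-memory sample.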

Examples from test:

index is the sentence number and sentence is the text.

index    sentence
0    Bill whistled past the house.
1    The car honked its way down the road.
2    Bill pushed Harry off the sofa.
3    the kittens yawned awake and played.
4    I demand that the more John eats, the more he pay.
5    If John eats more, keep your mouth shut tighter, OK?
6    His expectations are always lower than mine are.

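1.3 Evaluation Metric

MCC: "The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary (two-class) classifications."

MCC is essentially the correlation coefficient between the observed and the predicted binary classifications. It takes TP, TN, FP, and FN into account, so it remains a usable measure even when the positive and negative classes differ greatly in size. Its value lies in [-1, 1]: 1 is a perfect prediction, 0 is no better than random prediction, and -1 means total disagreement between prediction and observation.

Formula:

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

If any of the four sums in the denominator is zero, the denominator can be set to 1, which gives an MCC of zero; this can be shown to be the correct limiting value.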
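
As a rough illustration, MCC can be computed from the four confusion-matrix counts as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below assumes the counts are already available; the function name `mcc_from_counts` and the sample counts are made up for this example, and the zero-denominator case follows the convention described above.

```python
import math

def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # One of the four sums is zero: report 0 by convention.
        return 0.0
    return numerator / denominator

# Hypothetical counts for a binary classifier.
print(mcc_from_counts(tp=6, tn=3, fp=1, fn=2))
```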

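2. SST-2

2.1 Concept

SST-2 (The Stanford Sentiment Treebank) is a single-sentence classification task in NLP. Its goal: "The SST-2 task is to determine whether the sentiment of a sentence extracted from movie reviews is positive or negative." In other words, the model decides whether a movie review sentence is negative or positive.

2.2 Data

In train.tsv and dev.tsv, each line contains 2 tab-separated ('\t') columns.

Column 1: the review sentence.
Column 2: the sentiment label (1 = positive, 0 = negative).

Each line of test.tsv contains 2 tab-separated columns.

Column 1: the sample index, counting from 0 (0 is the first sample, 1 the second, and so on).
Column 2: the review sentence.

Examples from train and dev:

sentence is the review sentence and label is the sentiment label.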

sentence    label
hide new secretions from the parental units     0
contains no wit , only labored gags     0
that loves its characters and communicates something rather beautiful about human nature     1
remains utterly satisfied to remain the same throughout     0
on the worst revenge-of-the-nerds clichés the filmmakers could dredge up     0
that 's far too tragic to merit such superficial treatment     0
demonstrates that the director of such hollywood blockbusters as patriot games can still turn out a small , personal film with an 

Examples from test:

index is the sentence number and sentence is the text.

index    sentence
0    uneasy mishmash of styles and genres .
1    this film 's relationship to actual tension is the same as what christmas-tree flocking in a spray can is to actual snow : a poor -- if durable -- imitation .
2    by the end of no such thing the audience , like beatrice , has a watchful affection for the monster .
3    director rob marshall went out gunning to make a great one .
4    lathan and diggs have considerable personal charm , and their screen rapport makes the old story seem new .
5    a well-made and often lovely depiction of the mysteries of friendship .

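2.3 Evaluation Metric

ACC: accuracy, the number of correctly predicted positive and negative examples divided by the total number of examples.

Formula:

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$

TP: the number of positive examples predicted correctly
FP: the number of negative examples predicted incorrectly (as positive)
TN: the number of negative examples predicted correctly
FN: the number of positive examples predicted incorrectly (as negative)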
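
To make the four counts concrete, the sketch below tallies them from gold and predicted labels and then computes accuracy as (TP + TN) / (TP + TN + FP + FN). The helper names and the label lists are illustrative assumptions, with 1 as the positive class and 0 as the negative class.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Share of all examples that were predicted correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical gold labels and predictions.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(*counts))
```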

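3. STS-B

3.1 Concept

STS-B (Semantic Textual Similarity Benchmark) is a regression task: given a sentence pair, the model predicts a score in [0, 5] that expresses the semantic similarity of the two sentences.

3.2 Data

The STS benchmark comprises the English datasets used in the STS tasks organized as part of SemEval between 2012 and 2017. The selected data includes text from image captions, news headlines, and user forums.

In train.tsv and dev.tsv, each line contains 10 tab-separated ('\t') columns; the last three matter most.

index: the sample index, counting from 0 (0 is the first sample, 1 the second, and so on)
genre: one of three sources: captions, news, or forum
filename: the file name
year: the year
old_index: the old index
source1: the source of sentence 1
source2: the source of sentence 2
sentence1: sentence 1
sentence2: sentence 2
score: the semantic similarity score

Example: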

index    genre    filename    year    old_index    source1    source2    sentence1    sentence2    score
0    main-captions    MSRvid    2012test    0001    none    none    A plane is taking off.    An air plane is taking off.    5.000
1    main-captions    MSRvid    2012test    0004    none    none    A man is playing a large flute.    A man is playing a flute.    3.800
2    main-captions    MSRvid    2012test    0005    none    none    A man is spreading shreded cheese on a pizza.    A man is spreading shredded cheese on an uncooked pizza.    3.800
3    main-captions    MSRvid    2012test    0006    none    none    Three men are playing chess.    Two men are playing chess.    2.600
4    main-captions    MSRvid    2012test    0009    none    none    A man is playing the cello.    A man seated is playing the cello.    4.250
5    main-captions    MSRvid    2012test    0011    none    none    Some men are fighting.    Two men are fighting.    4.250
6    main-captions    MSRvid    2012test    0012    none    none    A man is smoking.    A man is skating.    0.500

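In test.tsv, each line contains 9 tab-separated ('\t') columns; the last two matter most.

index: the sample index, counting from 0 (0 is the first sample, 1 the second, and so on)
genre: one of three sources: captions, news, or forum
filename: the file name
year: the year
old_index: the old index
source1: the source of sentence 1
source2: the source of sentence 2
sentence1: sentence 1
sentence2: sentence 2

Example: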

index    genre    filename    year    old_index    source1    source2    sentence1    sentence2
0    main-captions    MSRvid    2012test    0024    none    none    A girl is styling her hair.    A girl is brushing her hair.
1    main-captions    MSRvid    2012test    0033    none    none    A group of men play soccer on the beach.    A group of boys are playing soccer on the beach.
2    main-captions    MSRvid    2012test    0045    none    none    One woman is measuring another woman's ankle.    A woman measures another woman's ankle.
3    main-captions    MSRvid    2012test    0063    none    none    A man is cutting up a cucumber.    A man is slicing a cucumber.
4    main-captions    MSRvid    2012test    0066    none    none    A man is playing a harp.    A man is playing a keyboard.
5    main-captions    MSRvid    2012test    0074    none    none    A woman is cutting onions.    A woman is cutting tofu.
6    main-captions    MSRvid    2012test    0076    none    none    A man is riding an electric bicycle.    A man is riding a bicycle.
7    main-captions    MSRvid    2012test    0082    none    none    A man is playing the drums.    A man is playing the guitar.
8    main-captions    MSRvid    2012test    0092    none    none    A man is playing guitar.    A lady is playing the guitar.
9    main-captions    MSRvid    2012test    0095    none    none    A man is playing a guitar.    A man is playing a trumpet.

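3.3 Evaluation Metric

Pearson-Spearman Corr: the Pearson and Spearman correlation coefficients. The name joins two methods that are computed separately. (Baidu's ERNIE model reports three values: the Pearson correlation coefficient, the Spearman correlation coefficient, and the average of the two.)

A correlation coefficient measures the degree of correlation between two variables X and Y and takes values in [-1, 1].

0 means no correlation between X and Y. When X and Y tend to increase or decrease together, they are positively correlated and the coefficient lies in (0, 1]. When X and Y move in opposite directions, they are negatively correlated and the coefficient lies in [-1, 0). The closer the coefficient is to 1 or -1, the stronger the correlation; the closer it is to 0, the weaker.

Note: computing the coefficient yields two values, r and p. The r value is the correlation coefficient itself, i.e. the strength of the correlation. The p value comes from a significance test; p < 0.05 is generally taken as significant, meaning that the correlation observed in the current sample is statistically meaningful. Looking at r alone can mislead, because the observed correlation may be due to chance.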
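
The two coefficients and their average can be sketched in plain Python. This is a simplified illustration, not the code of any particular evaluation script: Pearson is computed on the raw scores, Spearman is taken as Pearson on average ranks (ties share the mean rank), and p-values are not computed here. The gold scores are the ones from the train/dev example above; the predicted scores are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

gold = [5.0, 3.8, 3.8, 2.6, 4.25, 4.25, 0.5]   # scores from the example rows
pred = [4.7, 3.5, 4.0, 2.0, 4.1, 4.4, 1.0]     # hypothetical model outputs
r = pearson(gold, pred)
rho = spearman(gold, pred)
print(r, rho, (r + rho) / 2)
```

When p-values are needed as well, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` return both the coefficient and the p-value.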

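4. MRPC

4.1 Concept

MRPC (Microsoft Research Paraphrase Corpus) is a sentence-pair classification task: "The task is to predict whether each pair captures a paraphrase/semantic equivalence." In other words, given a sentence pair, decide whether the two sentences are semantically equivalent.

4.2 Data

The data contains 5,800 sentence pairs extracted from online news sources, together with an annotation indicating whether each pair is a paraphrase, i.e. semantically equivalent.

In train.tsv and dev.tsv, each line contains 5 tab-separated ('\t') columns.

Quality: the label; label=1 if the pair is semantically equivalent, otherwise label=0
#1 ID: the id of the first sentence
#2 ID: the id of the second sentence
#1 String: the text of the first sentence
#2 String: the text of the second sentence

Example: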

Quality    #1 ID    #2 ID    #1 String    #2 String
1    702876    702977    Amrozi accused his brother , whom he called " the witness " , of deliberately distorting his evidence .    Referring to him as only " the witness " , Amrozi accused his brother of deliberately distorting his evidence .
0    2108705    2108831    Yucaipa owned Dominick 's before selling the chain to Safeway in 1998 for $ 2.5 billion .    Yucaipa bought Dominick 's in 1995 for $ 693 million and sold it to Safeway for $ 1.8 billion in 1998 .
1    1330381    1330521    They had published an advertisement on the Internet on June 10 , offering the cargo for sale , he added .    On June 10 , the ship 's owners had published an advertisement on the Internet , offering the explosives for sale .
0    3344667    3344648    Around 0335 GMT , Tab shares were up 19 cents , or 4.4 % , at A $ 4.56 , having earlier set a record high of A $ 4.57 .    Tab shares jumped 20 cents , or 4.6 % , to set a record closing high at A $ 4.57 .
1    1236820    1236712    The stock rose $ 2.11 , or about 11 percent , to close Friday at $ 21.51 on the New York Stock Exchange .    PG & E Corp. shares jumped $ 1.63 or 8 percent to $ 21.03 on the New York Stock Exchange on Friday .

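In test.tsv, each line contains 5 tab-separated ('\t') columns.

index: the sample index, counting from 0 (0 is the first sample, 1 the second, and so on)
#1 ID: the id of the first sentence
#2 ID: the id of the second sentence
#1 String: the text of the first sentence
#2 String: the text of the second sentence

Example: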

index    #1 ID    #2 ID    #1 String    #2 String
0    1089874    1089925    PCCW 's chief operating officer , Mike Butcher , and Alex Arena , the chief financial officer , will report directly to Mr So .    Current Chief Operating Officer Mike Butcher and Group Chief Financial Officer Alex Arena will report to So .
1    3019446    3019327    The world 's two largest automakers said their U.S. sales declined more than predicted last month as a late summer sales frenzy caused more of an industry backlash than expected .    Domestic sales at both GM and No. 2 Ford Motor Co. declined more than predicted as a late summer sales frenzy prompted a larger-than-expected industry backlash .
2    1945605    1945824    According to the federal Centers for Disease Control and Prevention ( news - web sites ) , there were 19 reported cases of measles in the United States in 2002 .    The Centers for Disease Control and Prevention said there were 19 reported cases of measles in the United States in 2002 .
3    1430402    1430329    A tropical storm rapidly developed in the Gulf of Mexico Sunday and was expected to hit somewhere along the Texas or Louisiana coasts by Monday night .    A tropical storm rapidly developed in the Gulf of Mexico on Sunday and could have hurricane-force winds when it hits land somewhere along the Louisiana coast Monday night .
4    3354381    3354396    The company didn 't detail the costs of the replacement and repairs .    But company officials expect the costs of the replacement work to run into the millions of dollars .
5    1390995    1391183    The settling companies would also assign their possible claims against the underwriters to the investor plaintiffs , he added .    Under the agreement , the settling companies will also assign their potential claims against the underwriters to the investors , he added .

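4.3 Evaluation Metrics

1) ACC

2) F1: the harmonic mean of precision and recall, which takes both into account. A higher F1 score means that precision and recall are both high and well balanced.

Formula:

$$F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$

where:

$$\mathrm{precision} = \frac{TP}{TP + FP}, \qquad \mathrm{recall} = \frac{TP}{TP + FN}$$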
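
As a small worked example, precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean, 2 · precision · recall / (precision + recall). The sketch below assumes the counts are already known; the function name, the sample counts, and the choice to return 0.0 when a denominator is zero are assumptions made for this illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 6 true positives, 2 false positives, 2 false negatives.
precision, recall, f1 = precision_recall_f1(tp=6, fp=2, fn=2)
print(precision, recall, f1)
```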

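5. QQP

5.1 Concept

QQP (Quora Question Pairs) is, like MRPC, a sentence-pair classification task: detect whether a pair of texts corresponds to semantically equivalent queries.

5.2 Data

The dataset was released for a variety of Quora-related problems and contains more than 400,000 rows of potential duplicate question pairs. Each row holds the ID of each question in the pair, the full text of each question, and a binary value indicating whether the row actually contains a duplicate pair.

In train.tsv and dev.tsv, each line contains 6 tab-separated ('\t') columns.

id: the row id
qid1: the id of the first question
qid2: the id of the second question
question1: the text of the first question
question2: the text of the second question
is_duplicate: the label; label=1 if the two questions are duplicates, otherwise label=0

Example: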

id    qid1    qid2    question1    question2    is_duplicate
133273    213221    213222    How is the life of a math student? Could you describe your own experiences?    Which level of prepration is enough for the exam jlpt5?    0
402555    536040    536041    How do I control my horny emotions?    How do you control your horniness?    1
360472    364011    490273    What causes stool color to change to yellow?    What can cause stool to come out as little balls?    0
150662    155721    7256    What can one do after MBBS?    What do i do after my MBBS ?    1
183004    279958    279959    Where can I find a power outlet for my laptop at Melbourne Airport?    Would a second airport in Sydney, Australia be needed if a high-speed rail link was created between Melbourne and Sydney?    0
119056    193387    193388    How not to feel guilty since I am Muslim and I'm conscious we won't have sex together?    I don't beleive I am bulimic, but I force throw up atleast once a day after I eat something and feel guilty. Should I tell somebody, and if so who?    0

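In test.tsv, each line contains 3 tab-separated ('\t') columns.

id: the sample index, counting from 0 (0 is the first sample, 1 the second, and so on)
question1: the text of question 1
question2: the text of question 2

Example: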

id    question1    question2
0    Would the idea of Trump and Putin in bed together scare you, given the geopolitical implications?    Do you think that if Donald Trump were elected President, he would be able to restore relations with Putin and Russia as he said he could, based on the rocky relationship Putin had with Obama and Bush?
1    What are the top ten Consumer-to-Consumer E-commerce online?    What are the top ten Consumer-to-Business E-commerce online?
2    Why don't people simply 'Google' instead of asking questions on Quora?    Why do people ask Quora questions instead of just searching google?
3    Is it safe to invest in social trade biz?    Is social trade geniune?
4    If the universe is expanding then does matter also expand?    If universe and space is expanding? Does that mean anything that occupies space is also expanding?
5    What is the plural of hypothesis?    What is the plural of thesis?

5.3 Evaluation Metrics

1) ACC

2) F1

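6. MNLI

6.1 Concept

MNLI (Multi-Genre Natural Language Inference) is a natural language inference task "where the goal is to predict whether a sentence is an entailment, contradiction, or neutral with respect to the other." In other words, given two sentences, predict whether their relation is entailment (one follows from the other), contradiction, or neutral.

6.2 Data

The MultiNLI corpus is a crowd-sourced collection of 433k sentence pairs annotated with textual entailment information. It is modeled on the SNLI corpus but differs in that it covers a range of genres of spoken and written text and supports a distinctive cross-genre generalization evaluation. The MNLI validation and test sets each come in two versions, matched and mismatched. Training uses train.tsv directly; validation and testing are evaluated on the matched and mismatched sets separately.

In train.tsv, each line contains 12 tab-separated ('\t') columns; in dev_matched.tsv and dev_mismatched.tsv, each line contains 16. For both train and dev, the columns that matter are 0, 8, 9, and the last one.

index: the sample index, counting from 0 (0 is the first sample, 1 the second, and so on)
sentence1: the first sentence
sentence2: the second sentence
gold_label: the label (entailment, contradiction, or neutral)

In test_matched.tsv and test_mismatched.tsv, each line contains 10 tab-separated ('\t') columns. For the test files, the columns that matter are 0, 8, and 9.

index: the sample index, counting from 0 (0 is the first sample, 1 the second, and so on)
sentence1: the first sentence
sentence2: the second sentence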
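
Picking out columns 0, 8, 9, and the last column can be sketched with the standard csv module. This is an illustration under stated assumptions, not an official loader: the function name `read_mnli` is made up, the files are assumed to start with a header row, and the in-memory sample uses placeholder column names and filler values rather than real MNLI data.

```python
import csv
import io

def read_mnli(handle, has_label=True):
    """Keep index (col 0), sentence1 (col 8), sentence2 (col 9), and the label."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader)  # skip the header row (assumed to be present)
    examples = []
    for row in reader:
        example = {"index": row[0], "sentence1": row[8], "sentence2": row[9]}
        if has_label:
            example["gold_label"] = row[-1]  # last column in train and dev
        examples.append(example)
    return examples

# A 12-column sample in the train.tsv layout with placeholder values.
header = "\t".join(f"col{i}" for i in range(12))
fields = ["0"] + ["x"] * 7 + ["A premise.", "A hypothesis.", "x", "neutral"]
sample = header + "\n" + "\t".join(fields) + "\n"
examples = read_mnli(io.StringIO(sample))
print(examples[0])
```

For the test files, which carry no label, the same function can be called with `has_label=False`.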

6.3 Evaluation Metric

ACC

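7. QNLI

7.1 Concept

QNLI (Question Natural Language Inference) is a binary classification task: "The task involves assessing whether a sentence contains the correct answer to a given query." In other words, decide whether the sentence contains the answer to the question.

7.2 Data

The data comes from the Stanford Question Answering Dataset (SQuAD), a reading comprehension dataset made up of questions posed by crowdworkers on a set of Wikipedia articles. The answer to each question is a segment of text, or span, from the corresponding passage; otherwise the question may be unanswerable.

In train.tsv and dev.tsv, each line contains 4 tab-separated ('\t') columns.

index: the dataset index
question: the question
sentence: the candidate answer sentence
label: label=entailment if the sentence answers the question, otherwise label=not_entailment

Example: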

index    question    sentence    label
0    What came into force after the new constitution was herald?    As of that day, the new constitution heralding the Second Republic came into force.    entailment
1    What is the first major city in the stream of the Rhine?    The most important tributaries in this area are the Ill below of Strasbourg, the Neckar in Mannheim and the Main across from Mainz.    not_entailment
2    What is the minimum required if you want to teach in Canada?    In most provinces a second Bachelor's Degree such as a Bachelor of Education is required to become a qualified teacher.    not_entailment
3    How was Temüjin kept imprisoned by the Tayichi'ud?    The Tayichi'ud enslaved Temüjin (reportedly with a cangue, a sort of portable stocks), but with the help of a sympathetic guard, the father of Chilaun (who later became a general of Genghis Khan), he was able to escape from the ger (yurt) in the middle of the night by hiding in a river crevice.[citation needed]    entailment
4    What did Herr Gott, dich loben wir become known as ?    He paraphrased the Te Deum as "Herr Gott, dich loben wir" with a simplified form of the melody.    not_entailment

In test.tsv, each line contains 3 tab-separated ('\t') columns.

index: the dataset index
question: the question
sentence: the candidate answer sentence

Example:

index    question    sentence
0    What organization is devoted to Jihad against Israel?    For some decades prior to the First Palestine Intifada in 1987, the Muslim Brotherhood in Palestine took a "quiescent" stance towards Israel, focusing on preaching, education and social services, and benefiting from Israel's "indulgence" to build up a network of mosques and charitable organizations.
1    In what century was the Yarrow-Schlick-Tweedy balancing system used?    In the late 19th century, the Yarrow-Schlick-Tweedy balancing 'system' was used on some marine triple expansion engines.
2    The largest brand of what store in the UK is located in Kingston Park?    Close to Newcastle, the largest indoor shopping centre in Europe, the MetroCentre, is located in Gateshead.
3    What does the IPCC rely on for research?    In principle, this means that any significant new evidence or events that change our understanding of climate science between this deadline and publication of an IPCC report cannot be included.
4    What is the principle about relating spin and space variables?    Thus in the case of two fermions there is a strictly negative correlation between spatial and spin variables, whereas for two bosons (e.g. quanta of electromagnetic waves, photons) the correlation is strictly positive.

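7.3 Evaluation Metric

ACC

8. RTE

8.1 Concept

RTE (Recognizing Textual Entailment) is a classification task similar to MNLI: "This task requires to recognize, given two text fragments, whether the meaning of one text is entailed (can be inferred) from the other text." In other words, given two text fragments, decide whether the meaning of one can be inferred from the other.

8.2 Data

RTE is a generic task that captures the major semantic inference needs of many NLP applications, such as question answering, information retrieval, information extraction, and text summarization.

In train.tsv and dev.tsv, each line contains 4 tab-separated ('\t') columns.

index: the sample index, counting from 0 (0 is the first sample, 1 the second, and so on)
sentence1: the first sentence
sentence2: the second sentence
label: the label (entailment or not_entailment)

Example:
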
index    sentence1    sentence2    label
0    Dana Reeve, the widow of the actor Christopher Reeve, has died of lung cancer at age 44, according to the Christopher Reeve Foundation.    Christopher Reeve had an accident.    not_entailment
1    Yet, we now are discovering that antibiotics are losing their effectiveness against illness. Disease-causing bacteria are mutating faster than we can come up with new antibiotics to fight the new variations.    Bacteria is winning the war against antibiotics.    entailment
2    Cairo is now home to some 15 million people - a burgeoning population that produces approximately 10,000 tonnes of rubbish per day, putting an enormous strain on public services. In the past 10 years, the government has tried hard to encourage private investment in the refuse sector, but some estimate 4,000 tonnes of waste is left behind every day, festering in the heat as it waits for someone to clear it up. It is often the people in the poorest neighbourhoods that are worst affected. But in some areas they are fighting back. In Shubra, one of the northern districts of the city, the residents have taken to the streets armed with dustpans and brushes to clean up public areas which have been used as public dumps.    15 million tonnes of rubbish are produced daily in Cairo.    not_entailment
3    The Amish community in Pennsylvania, which numbers about 55,000, lives an agrarian lifestyle, shunning technological advances like electricity and automobiles. And many say their insular lifestyle gives them a sense that they are protected from the violence of American society. But as residents gathered near the school, some wearing traditional garb and arriving in horse-drawn buggies, they said that sense of safety had been shattered. "If someone snaps and wants to do something stupid, there's no distance that's going to stop them," said Jake King, 56, an Amish lantern maker who knew several families whose children had been shot.    Pennsylvania has the biggest Amish community in the U.S.    not_entailment
4    Security forces were on high alert after an election campaign in which more than 1,000 people, including seven election candidates, have been killed.    Security forces were on high alert after a campaign marred by violence.    entailment
5    In 1979, the leaders signed the Egypt-Israel peace treaty on the White House lawn. Both President Begin and Sadat received the Nobel Peace Prize for their work. The two nations have enjoyed peaceful relations to this day.    The Israel-Egypt Peace Agreement was signed in 1979.    entailment

In test.tsv, each line contains 3 tab-separated ('\t') columns.

index: the dataset index
sentence1: sentence 1
sentence2: sentence 2

Example:

index    sentence1    sentence2
0    Mangla was summoned after Madhumita's sister Nidhi Shukla, who was the first witness in the case.    Shukla is related to Mangla.
1    Authorities in Brazil say that more than 200 people are being held hostage in a prison in the country's remote, Amazonian-jungle state of Rondonia.    Authorities in Brazil hold 200 people as hostage.
2    A mercenary group faithful to the warmongering policy of former Somozist colonel Enrique Bermudez attacked an IFA truck belonging to the interior ministry at 0900 on 26 March in El Jicote, wounded and killed an interior ministry worker and wounded five others.    An interior ministry worker was killed by a mercenary group.
3    The British ambassador to Egypt, Derek Plumbly, told Reuters on Monday that authorities had compiled the list of 10 based on lists from tour companies and from families whose relatives have not been in contact since the bombings.    Derek Plumbly resides in Egypt.
4    Tibone estimated diamond production at four mines operated by Debswana -- Botswana's 50-50 joint venture with De Beers -- could reach 33 million carats this year.    Botswana is a business partner of De Beers.
5    His wife Strida won a seat in parliament after forging an alliance with the main anti-Syrian coalition in the recent election.    Strida elected to parliament.

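8.3 Evaluation Metric

ACC

9. WNLI

9.1 Concept

WNLI (Winograd NLI) is a binary classification task: decide whether the second sentence carries the same meaning as the first, i.e. whether it follows from it.

9.2 Data

A Winograd schema is a pair of sentences that differ in only one or two words and that may contain an ambiguity.

For example:

1) "The police arrested all of the gang members. They were trying to stop the drug trade in the neighborhood." / "The police were trying to stop the drug trade in the neighborhood." Both sentences say that the police were trying to stop the drug trade, so label=1.

2) "Steve follows Fred's example in everything. He influences him hugely." / "Steve influences him hugely." The first sentence can be read as Fred influencing Steve hugely, which differs from the second sentence; the pair is ambiguous, so label=0.

In train.tsv and dev.tsv, each line contains 4 tab-separated ('\t') columns.

index: the sample index, counting from 0 (0 is the first sample, 1 the second, and so on)
sentence1: the first sentence
sentence2: the second sentence
label: label=1 if the two sentences mean the same thing, otherwise label=0

Example: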

index    sentence1    sentence2    label
0    I stuck a pin through a carrot. When I pulled the pin out, it had a hole.    The carrot had a hole.    1
1    John couldn't see the stage with Billy in front of him because he is so short.    John is so short.    1
2    The police arrested all of the gang members. They were trying to stop the drug trade in the neighborhood.    The police were trying to stop the drug trade in the neighborhood.    1
3    Steve follows Fred's example in everything. He influences him hugely.    Steve influences him hugely.    0
4    When Tatyana reached the cabin, her mother was sleeping. She was careful not to disturb her, undressing and climbing back into her berth.    mother was careful not to disturb her, undressing and climbing back into her berth.    0
5    George got free tickets to the play, but he gave them to Eric, because he was particularly eager to see it.    George was particularly eager to see it.    0

In test.tsv, each line contains 3 tab-separated ('\t') columns.

index: the dataset index
sentence1: sentence 1
sentence2: sentence 2

Example:

index    sentence1    sentence2
0    Maude and Dora had seen the trains rushing across the prairie, with long, rolling puffs of black smoke streaming back from the engine. Their roars and their wild, clear whistles could be heard from far away. Horses ran away when they came in sight.    Horses ran away when Maude and Dora came in sight.
1    Maude and Dora had seen the trains rushing across the prairie, with long, rolling puffs of black smoke streaming back from the engine. Their roars and their wild, clear whistles could be heard from far away. Horses ran away when they came in sight.    Horses ran away when the trains came in sight.

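9.3 Evaluation Metric

ACC

10. Diagnostics Main

10.1 Concept

Diagnostics Main is a classification task. The downloaded dataset is unlabeled, and the task is closest to MultiNLI: when submitting results, run the model's MultiNLI classifier on the diagnostic data to produce predictions. The official site also provides a labeled version of the dataset.

10.2 Data

The data consists of several hundred sentence pairs, labeled with their entailment relation (entailment, contradiction, or neutral) in both directions and tagged with a set of linguistic phenomena involved in justifying the entailment labels. It was constructed manually by the GLUE authors, with text drawn from several different sources, including news, academic and encyclopedic text, and social media. The sentence pairs are carefully designed so that the two sentences in each pair are very similar, which makes the problem harder for systems that rely on simple lexical cues and statistics.

Each line of the dataset contains 3 tab-separated ('\t') columns.

index: the dataset index
sentence1: sentence 1
sentence2: sentence 2

Example: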

index    sentence1    sentence2
0    The cat sat on the mat.    The cat did not sit on the mat.
1    The cat did not sit on the mat.    The cat sat on the mat.
2    When you've got no snow, it's really hard to learn a snow sport so we looked at all the different ways I could mimic being on snow without actually being on snow.    When you've got snow, it's really hard to learn a snow sport so we looked at all the different ways I could mimic being on snow without actually being on snow.
3    When you've got snow, it's really hard to learn a snow sport so we looked at all the different ways I could mimic being on snow without actually being on snow.    When you've got no snow, it's really hard to learn a snow sport so we looked at all the different ways I could mimic being on snow without actually being on snow.

10.3 Evaluation Metric

MCC