
Perplexity Intuition (and Derivation)


Never be perplexed again by perplexity.

You might have seen something like this in an NLP class:

A slide from the NLP class at U. of Washington that I’m taking with Dr. Luke Zettlemoyer

Or

A slide from CS 124 at Stanford with Dr. Dan Jurafsky

In class, we don't really spend time deriving perplexity; maybe it's assumed to be a basic concept that you already know. This post is for those who don't.

In general, perplexity is a measurement of how well a probability model predicts a sample. In the context of Natural Language Processing, perplexity is one way to evaluate language models.

But why is perplexity in NLP defined the way it is?

The perplexity of a discrete probability distribution p is defined as:

PP(p) = 2^H(p)

where H(p) is the entropy of the distribution p(x) and x is a random variable over all possible events.

In the previous post, we derived H(p) from scratch and intuitively showed why entropy is the average number of bits needed to encode the information. If H(p) is unfamiliar, please read that post before reading further.

Now we agree that H(p) = -Σ p(x) log₂ p(x).
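
As a quick sanity check of that formula, here is a minimal Python sketch (the toy distribution below is made up for illustration) that computes H(p) in bits:

```python
import math

def entropy(p):
    """H(p) = -sum over x of p(x) * log2(p(x)), in bits."""
    return -sum(px * math.log2(px) for px in p if px > 0)

# A toy distribution over four events.
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits
```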

Then, perplexity is just an exponentiation of the entropy!

Yes. Entropy is the average number of bits needed to encode the information contained in a random variable, so the exponentiation of the entropy is the effective number of possible outcomes, or more precisely, the weighted average number of choices a random variable has.
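
Continuing the sketch above, perplexity is 2 raised to that entropy, and the "weighted average number of choices" reading is easy to check: a uniform distribution over k events has entropy log₂ k, so its perplexity is exactly k, while a skewed distribution over the same events has fewer effective choices (the skewed probabilities below are made up).

```python
def perplexity(p):
    """Perplexity is the exponentiation of the entropy: 2 ** H(p)."""
    return 2 ** entropy(p)

print(perplexity([0.5, 0.25, 0.125, 0.125]))  # 2 ** 1.75 ≈ 3.36
print(perplexity([1/6] * 6))                  # 6.0: a fair die "chooses" among 6 outcomes
print(perplexity([0.9] + [0.02] * 5))         # ≈ 1.63: heavily skewed, far fewer effective choices
```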

For example, if the average sentence in the test set could be coded in 100 bits, the model perplexity is 2¹⁰⁰ per sentence.
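
As a final numeric sketch (the per-word probabilities below are invented, not taken from any real model), those bits come from the negative log₂-probabilities the model assigns to the words of a test sentence: summing them gives the bits needed to encode the sentence, and exponentiating gives the perplexity.

```python
import math

# Hypothetical per-word probabilities a language model assigns to one test sentence.
word_probs = [0.2, 0.1, 0.05, 0.3, 0.1]

# Bits needed to encode the whole sentence under the model.
sentence_bits = -sum(math.log2(p) for p in word_probs)  # ≈ 15.02 bits

# Per-sentence perplexity, as in the example above: 2 to those bits.
print(2 ** sentence_bits)                               # ≈ 33,333 per sentence

# The per-word perplexity usually reported in NLP averages the bits first.
print(2 ** (sentence_bits / len(word_probs)))           # ≈ 8.03 per word
```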