Coursera | Andrew Ng (02-week-2-2.3): Exponentially weighted averages
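This series adds personal study notes and supplementary derivations to selected points of the original course. If you find mistakes, corrections are welcome. After working through Andrew Ng's course, I wrote it up as text to make later review easier. Because I am still learning English myself, the series is mainly in English, and I suggest readers also rely mainly on the English, with the Chinese as an aid, as preparation for reading academic papers in the field later on. - ZJ
When reposting, please credit the author and source: ZJ, WeChat official account "SelfImprovementLab"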
2.3 Exponentially weighted averages
(Subtitle source: NetEase Cloud Classroom)
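I want to show you a few optimization algorithms that are faster than gradient descent. To understand those algorithms, you need to be able to use something called exponentially weighted averages, also called exponentially weighted moving averages in statistics. Let's talk about that first, and then move on to the more sophisticated optimization algorithms. Even though I now live in the United States, I was born in London, England. So here I have last year's daily temperatures for London. On January 1 the temperature was 40 degrees Fahrenheit. I know most of the world uses Celsius, but the United States uses Fahrenheit; that is about 4 degrees Celsius. On January 2 it was 9 degrees Celsius, and so on. About halfway through the year (a year has 365 days, so that is around day 180, roughly the end of May) it was 60 degrees Fahrenheit, which is about 15 degrees Celsius, and so on. Temperatures warm up toward summer and cool down again in winter.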
So, if you plot the data, you end up with this: day one is sometime in January, this is the beginning of summer, and that is the end of the year, in late December. So this would be January 1, this is the middle of the year approaching summer, and this would be the data from the end of the year. The data looks a little noisy, and if you want to compute the trend, that is, the local average or moving average of the temperature, here is what you can do.
Let's initialize V_0 = 0. Then, on every day, we average with a weight of 0.9 on whatever the previous value was, plus 0.1 times that day's temperature: V_1 = 0.9 V_0 + 0.1 θ_1, where θ_1 is the temperature on the first day. On the second day, we again take a weighted average, 0.9 times the previous value plus 0.1 times today's temperature: V_2 = 0.9 V_1 + 0.1 θ_2. Then V_3 = 0.9 V_2 + 0.1 θ_3, and so on. The more general formula is that V on a given day is 0.9 times V from the previous day plus 0.1 times the temperature of that day: V_t = 0.9 V_{t-1} + 0.1 θ_t. If you compute this and plot it in red, this is what you get: a moving average, what's called an exponentially weighted average, of the daily temperature.
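The recurrence described above can be sketched in a few lines of Python. This is a minimal illustration rather than code from the course: the function name `exponentially_weighted_average` and the three sample temperatures are made up for the example; only the update rule (V starts at 0, then 0.9 times the previous V plus 0.1 times the day's temperature) comes from the lecture.

```python
def exponentially_weighted_average(temperatures, beta=0.9):
    """Return the running averages V_1..V_n for a sequence of daily values.

    Update rule from the lecture:
        V_0 = 0
        V_t = beta * V_(t-1) + (1 - beta) * theta_t
    """
    v = 0.0  # V_0 = 0
    averages = []
    for theta in temperatures:
        # Weighted average of the previous value and today's temperature.
        v = beta * v + (1 - beta) * theta
        averages.append(v)
    return averages


# Hypothetical temperatures (degrees Fahrenheit) for the first three days.
sample = [40.0, 49.0, 45.0]
v = exponentially_weighted_average(sample, beta=0.9)
print(v)
```

With these inputs the first three values are about 4.0, 8.5 and 12.15. They start well below the actual temperatures because V_0 is 0.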
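So, let's look at the equation we had from the previous slide. Writing β for the weight 0.9, it was V_t = β V_{t-1} + (1 − β) θ_t, where θ_t is the temperature on day t. With β = 0.9, you can think of V_t as averaging over roughly the last 1/(1 − β) = 10 days of temperature; that was the red line.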
Now, let's try something else. Let's set β very close to one, say 0.98. Then 1/(1 − 0.98) equals 50, so think of this as averaging over roughly the last 50 days of temperature. If you plot that, you get this green line. Notice a couple of things with this very high value of β. The plot is much smoother, because you're now averaging over more days of temperature, so the curve is less wavy. On the flip side, the curve has shifted further to the right, because you're now averaging over a much larger window of temperatures. By averaging over a larger window, the exponentially weighted average formula adapts more slowly when the temperature changes, so there's a bit more latency. The reason is that when β is 0.98, it gives a lot of weight to the previous value and a much smaller weight, just 0.02, to whatever you're seeing right now. So when the temperature goes up or down, the exponentially weighted average adapts more slowly when β is this large.
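To make the 1/(1 − β) rule of thumb and the extra latency concrete, here is a small Python sketch. The helper names `effective_window` and `days_to_adapt`, the temperature jump from 40 to 60 degrees, and the 90% threshold are all assumptions chosen for illustration; they are not part of the lecture.

```python
def effective_window(beta):
    """Rule of thumb from the lecture: roughly 1 / (1 - beta) days are averaged."""
    return 1.0 / (1.0 - beta)


def days_to_adapt(beta, old=40.0, new=60.0, fraction=0.9):
    """Count the days until V has covered `fraction` of a jump from `old` to `new`.

    V starts at `old` (as if the temperature had been `old` for a long time);
    from then on, every day's temperature is `new`.
    """
    v = old
    target = old + fraction * (new - old)
    days = 0
    while v < target:
        v = beta * v + (1 - beta) * new
        days += 1
    return days


for beta in (0.9, 0.98):
    print(beta, round(effective_window(beta)), days_to_adapt(beta))
```

In this toy setting, β = 0.9 covers 90% of the jump after 22 days, while β = 0.98 needs 114 days, which is the slower adaptation described above.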
Now, let's try another value. If you set β to the other extreme, say 0.5, then by the formula we have on the right, 1/(1 − β), this is something like averaging over just two days of temperature, and if you plot that you get this yellow line. By averaging over only two days, you're averaging over a much shorter window, so the result is much noisier and much more susceptible to outliers, but it adapts much more quickly to temperature changes. So, this formula is how you implement an exponentially weighted average. Again, it's called an exponentially weighted moving average in the statistics literature; we'll call it an exponentially weighted average for short. By varying this parameter, which we'll later see is a hyperparameter of your learning algorithm, you can get slightly different effects, and there will usually be some value in between that works best. That gives you the red curve, which maybe looks like a better average of the temperature than either the green or the yellow curve. You now know the basics of how to compute exponentially weighted averages. In the next video, let's get a bit more intuition about what it's doing.
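The trade-off between the three values of β can be checked on synthetic data. The sketch below is only an illustration: the "temperatures" are a made-up seasonal curve plus random noise, not the London data from the lecture, and the helper names `smooth` and `roughness` are hypothetical. Roughness is measured as the mean absolute day-to-day change of the averaged curve.

```python
import math
import random


def smooth(values, beta):
    """Exponentially weighted average with V_0 = 0."""
    v, out = 0.0, []
    for theta in values:
        v = beta * v + (1 - beta) * theta
        out.append(v)
    return out


def roughness(series, skip=100):
    """Mean absolute day-to-day change, ignoring the first `skip` days
    (where V is still climbing from its initial value of 0)."""
    tail = series[skip:]
    return sum(abs(b - a) for a, b in zip(tail, tail[1:])) / (len(tail) - 1)


# Synthetic "temperatures": a seasonal curve plus random noise (made-up data).
rng = random.Random(0)
temps = [50 + 15 * math.sin(math.pi * day / 365) + rng.uniform(-8, 8)
         for day in range(365)]

results = {beta: roughness(smooth(temps, beta)) for beta in (0.5, 0.9, 0.98)}
for beta, r in results.items():
    print(f"beta={beta}: average day-to-day change {r:.2f}")
```

On this synthetic data the day-to-day change shrinks as β grows: β = 0.5 follows the noise closely, while β = 0.98 gives the flattest curve.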
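Key points:
Exponentially weighted averages
The key formula of the exponentially weighted average: V_t = β V_{t-1} + (1 − β) θ_t, where θ_t is the temperature on day t and V_0 = 0.
The figure below is a scatter plot of temperature against day: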
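- With β = 0.9, the exponentially weighted average is the red line in the figure;
- With β = 0.98, the exponentially weighted average is the green line in the figure;
- With β = 0.5, the exponentially weighted average is the yellow line in the figure.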