Translation | Improving Distributional Similarity with Lessons Learned from Word Embeddings
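My teacher Ye Na (葉娜) says: "The best way to understand a paper is to translate it." I think this is good research training, and it is especially well suited to exploring an unfamiliar field. When I cannot follow a paper, it almost always comes down to not knowing the field. In a field I know well, the reading goes much more smoothly.
Original text
Abstract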
[1] Recent trends suggest that neural-network-inspired word embedding models outperform traditional count-based distributional models on word similarity and analogy detection tasks.
[2] We reveal that much of the performance gains of word embeddings are due to certain system design choices and hyper-parameter optimizations, rather than the embedding algorithms themselves.
[3] Furthermore, we show that these modifications can be transferred to traditional distributional models, yielding similar gains.
[4] In contrast to prior reports, we observe mostly local or insignificant performance differences between the methods, with no global advantage to any single approach over the others.
Conclusion
[1] Recent embedding methods introduce a plethora of design choices beyond network architecture and optimization algorithms.
[2] We reveal that these seemingly minor variations can have a large impact on the success of word representation methods.
[3] By showing how to adapt and tune these hyper-parameters in traditional methods, we allow a proper comparison between representations, and challenge various claims of superiority from the word embedding literature.
(The second paragraph of the conclusion follows.)
[4] This study also exposes the need for more controlled-variable experiments, and extending the concept of “variable” from the obvious task, data, and method to the often ignored preprocessing steps and hyper-parameter settings.
[5] We also stress the need for transparent and reproducible experiments, and commend authors such as Mikolov, Pennington, and others for making their code publicly available.
[6] In this spirit, we make our code available as well.
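Translation
Abstract
[1] Recent trends suggest that word embedding models inspired by neural networks outperform traditional count-based distributional models on word similarity and word analogy detection tasks.
[2] We find that much of the performance gain of word embeddings comes from particular system design choices and hyper-parameter optimizations, rather than from the embedding algorithms themselves.
[3] We further show that these modifications can be carried over to traditional distributional models, producing similar gains.
[4] Contrary to earlier reports, the performance differences we observe between the methods are mostly local or insignificant, and no single method has an overall advantage over the others.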
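Conclusion
[1] Recent embedding methods introduce a large number of design choices beyond network architecture and optimization algorithms.
[2] We find that these seemingly minor variations can have a large effect on how well word representation methods perform.
[3] By showing how to adopt and tune these hyper-parameters in traditional methods, we make a proper comparison between representations possible, and we challenge various claims of superiority made in the word embedding literature.
[4] This study also exposes the need for more controlled-variable experiments, and for extending the notion of a "variable" from the obvious task, data, and method to the often-ignored preprocessing steps and hyper-parameter settings.
[5] We also stress the need for transparent and reproducible experiments, and commend Mikolov, Pennington, and other authors for making their code publicly available.
[6] In this spirit, we release our code as well.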
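Reflections
This paper is a comparative study. It sets out to show that the gains attributed to neural-network-based word representation methods come largely from system design choices and hyper-parameter settings, rather than from improvements in the network architecture itself.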
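To make "transferring a hyper-parameter to a traditional model" concrete for myself, here is a minimal sketch of one such transfer: context distribution smoothing applied to a count-based PPMI score. The passages translated above do not spell out any specific hyper-parameter, so this choice of example is mine. The exponent 0.75 is the value word2vec uses for negative sampling, while the function name `ppmi` and the toy counts are hypothetical and made up purely for illustration.

```python
import math
from collections import Counter


def ppmi(pair_counts, alpha=1.0):
    """Positive PMI from (word, context) co-occurrence counts.

    alpha is the context-distribution smoothing exponent: context counts
    are raised to alpha before being normalized into probabilities.
    alpha=1.0 gives plain, unsmoothed PPMI.
    """
    word_counts = Counter()
    context_counts = Counter()
    for (word, context), n in pair_counts.items():
        word_counts[word] += n
        context_counts[context] += n

    total = sum(pair_counts.values())
    smoothed_total = sum(n ** alpha for n in context_counts.values())

    scores = {}
    for (word, context), n in pair_counts.items():
        p_wc = n / total
        p_w = word_counts[word] / total
        p_c = (context_counts[context] ** alpha) / smoothed_total
        # Clip negative PMI values to zero (the "positive" in PPMI).
        scores[(word, context)] = max(0.0, math.log(p_wc / (p_w * p_c)))
    return scores


# Toy counts, invented for this illustration (not data from the paper).
toy_counts = {
    ("cat", "purr"): 2,
    ("cat", "the"): 50,
    ("dog", "bark"): 3,
    ("dog", "the"): 45,
}

plain = ppmi(toy_counts, alpha=1.0)
smoothed = ppmi(toy_counts, alpha=0.75)

for pair in toy_counts:
    print(pair, round(plain[pair], 3), round(smoothed[pair], 3))
```

Raising context counts to a power below 1 increases the estimated probability of rare contexts, which lowers the score of pairs involving them. Plain PMI tends to favor such rare contexts, so this is one way a setting from the embedding side can change the behavior of a count-based model without changing the model itself.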