BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

阿新 • Published: 2022-01-11

This post is a translation of the BERT paper, with a glossary of key terms.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova (Google AI Language)

Abstract

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 points absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 points absolute improvement).
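
To make "jointly conditioning on both left and right context" more concrete, here is a minimal sketch in plain Python. It only compares which positions a token is allowed to see under a left-to-right model and under a bidirectional encoder; it is not the BERT implementation, and the function names and the example sentence are hypothetical, chosen for illustration.

```python
# Illustrative sketch only: which positions can a token condition on?
# All names below are hypothetical and do not come from the BERT paper.

def left_to_right_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every position may attend to every position, to its left and right."""
    return [[True for _ in range(n)] for _ in range(n)]


def visible_context(tokens, mask, i):
    """Return the tokens that position i is allowed to condition on."""
    return [tok for j, tok in enumerate(tokens) if mask[i][j]]


tokens = ["the", "bank", "of", "the", "river"]
causal = left_to_right_mask(len(tokens))
full = bidirectional_mask(len(tokens))

# Position 1 is "bank". A left-to-right model never sees "river" here.
print(visible_context(tokens, causal, 1))  # ['the', 'bank']
print(visible_context(tokens, full, 1))    # ['the', 'bank', 'of', 'the', 'river']
```

In the left-to-right case the representation of "bank" cannot use "river", which is the word that disambiguates it. With the full mask, both sides are available at every layer.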

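Glossary

Transformer: a neural network architecture built on self-attention (Vaswani et al., 2017). BERT uses the Transformer encoder as the basis of its language representation model.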
empirically: by means of observation or experience rather than theory or pure logic.

Introduction

ChangeLog

2022/1/10 20:32 To be continued…
