
Extracting COVID-19 diagnoses and symptoms from clinical text: A new annotated corpus and neural event extraction framework




We introduce a span-based event extraction model that jointly extracts all annotated phenomena, achieving high performance in identifying COVID-19 and symptom events with associated assertion values (0.83-0.97 F1 for events and 0.73-0.79 F1 for assertions). Our span-based event extraction model outperforms an extractor built on MetaMapLite for the identification of symptoms with assertion values. In a secondary use application, we predicted COVID-19 test results using structured patient data (e.g. vital signs and laboratory results) and automatically extracted symptom information, to explore the clinical presentation of COVID-19. Automatically extracted symptoms improve COVID-19 prediction performance, beyond structured data alone.



MetaMap is a widely used named entity recognition tool that identifies concepts from the Unified Medical Language System Metathesaurus in text. This study presents MetaMap Lite, an implementation of some of the basic MetaMap functions in Java. On several collections of biomedical literature and clinical text, MetaMap Lite demonstrated real-time speed and precision, recall, and F1 scores comparable to or exceeding those of MetaMap and other popular biomedical text processing tools, clinical Text Analysis and Knowledge Extraction System (cTAKES) and DNorm.




Two major aims

1) to describe the presence, character, and changes in symptoms associated with clinical conditions, where delays or misdiagnoses occur in clinical practice and impact patient outcomes (e.g. infectious diseases, cancer), and

2) to provide a more efficient and cost-effective mechanism to validate clinical prediction rules previously derived from large prospective cohort studies.



Annotated Corpora

CACT (COVID-19 Annotated Clinical Text)

This work presents a new corpus of clinical text annotated for COVID-19, referred to as the COVID-19 Annotated Clinical Text (CACT) Corpus. CACT consists of 1,472 notes from the University of Washington (UW) clinical repository with detailed event-based annotations for COVID-19 diagnosis, testing, and symptoms.

We used these structured fields to assign a COVID-19 Test label describing COVID-19 polymerase chain reaction (PCR) testing to each note based on patient test status within the UW system (no data external to UW was used):
• none: patient testing information is not available
• positive: patient will have at least one future positive test
• negative: patient will only have future negative tests
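
The labeling rule above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `covid_test_label` and the input format (a list of PCR result strings dated after the note) are assumptions.

```python
from typing import List


def covid_test_label(future_results: List[str]) -> str:
    """Assign a note-level COVID-19 Test label.

    future_results holds the patient's PCR results ("positive" or
    "negative") recorded after the note; this input format is hypothetical.
    """
    if not future_results:
        return "none"          # no testing information available
    if any(result == "positive" for result in future_results):
        return "positive"      # at least one future positive test
    return "negative"          # only future negative tests


print(covid_test_label([]))
print(covid_test_label(["negative", "positive"]))
print(covid_test_label(["negative", "negative"]))
```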




Annotation Scheme

Each event includes a trigger that identifies and anchors the event and arguments that characterize the event. The annotation scheme includes two types of arguments: labeled arguments and span-only arguments. Labeled arguments (e.g. Assertion) include an argument span, type, and subtype (e.g. present). The subtype label normalizes the span information to a fixed set of classes and allows the extracted information to be directly used in secondary use applications. Span-only arguments (e.g. Characteristics) include an argument span and type but do not include a subtype label, because the argument information is not easily mapped to a fixed set of classes.
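
One way to picture this scheme is as a small data structure. The sketch below is an assumption made for illustration: the class names, field names, and example sentence are invented here and are not the corpus file format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Argument:
    arg_type: str                   # e.g. "Assertion" or "Characteristics"
    span: Tuple[int, int]           # token indices, half-open [start, end)
    subtype: Optional[str] = None   # set only for labeled arguments

    @property
    def is_labeled(self) -> bool:
        # Labeled arguments carry a subtype; span-only arguments do not.
        return self.subtype is not None


@dataclass
class Event:
    event_type: str                 # "COVID" or "Symptom"
    trigger: Tuple[int, int]        # token span that anchors the event
    arguments: List[Argument] = field(default_factory=list)


# Hypothetical sentence, invented for this example.
tokens = ["Denies", "fever", "but", "has", "dry", "cough"]
events = [
    Event("Symptom", (1, 2), [Argument("Assertion", (0, 1), "absent")]),
    Event("Symptom", (5, 6), [
        Argument("Assertion", (3, 4), "present"),
        Argument("Characteristics", (4, 5)),      # span-only: no subtype
    ]),
]

for event in events:
    start, end = event.trigger
    print(event.event_type, tokens[start:end],
          [(a.arg_type, a.subtype) for a in event.arguments])
```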





Annotation Scoring and Evaluation

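Trigger: Triggers, $T_i$, are represented by a pair (event type, $e_i$; token indices, $x_i$). Trigger equivalence is defined as

$$T_i \equiv T_j \ \text{if} \ (e_i \equiv e_j) \wedge (x_i \ \text{overlaps} \ x_j).$$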

Arguments:Events are aligned based on trigger equivalence. The arguments of events with equivalent triggers are compared using different criteria for labeled arguments and span-only arguments.

Labeled arguments, $L_i$, are represented as a triple (argument type, $a_i$; token indices, $x_i$; subtype, $l_i$). For labeled arguments, the argument type and subtype capture the salient information, and equivalence is defined as

$$L_i \equiv L_j \ \text{if} \ (a_i \equiv a_j) \wedge (l_i \equiv l_j).$$

Span-only arguments with equivalent triggers and argument types are compared at the token level (rather than the span level) to allow partial matches.
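
The scoring criteria can be sketched as follows. Two points are assumptions of this sketch and not confirmed by the text above: trigger equivalence is taken to mean the same event type with overlapping token spans, and labeled arguments are compared by type and subtype only. All function names are hypothetical.

```python
def overlaps(x, y):
    """True if two half-open token spans (start, end) share a token."""
    return x[0] < y[1] and y[0] < x[1]


def triggers_equivalent(t1, t2):
    """Triggers are (event_type, span) pairs (assumed criterion)."""
    return t1[0] == t2[0] and overlaps(t1[1], t2[1])


def labeled_equivalent(l1, l2):
    """Labeled arguments are (arg_type, span, subtype) triples.

    Assumption: only the argument type and subtype are compared.
    """
    return l1[0] == l2[0] and l1[2] == l2[2]


def token_level_counts(gold_span, pred_span):
    """Token-level TP, FP, FN for a span-only argument (partial matches)."""
    gold = set(range(*gold_span))
    pred = set(range(*pred_span))
    tp = len(gold & pred)
    return tp, len(pred) - tp, len(gold) - tp


def prf(tp, fp, fn):
    """Precision, recall, and F1 from counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Gold span covers tokens 2..5, predicted span covers tokens 4..7.
print(prf(*token_level_counts((2, 6), (4, 8))))
```

Scoring span-only arguments per token gives partial credit when a predicted span only partly covers the gold span, which an exact span match would count as a complete miss.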





For Symptom events, the trigger identifies the specific symptom, for example “wheezing” or “fever,” which is characterized through Assertion, Change, Severity, Anatomy, Characteristics, Duration, and Frequency arguments. Symptoms were annotated for all conditions/diseases, not just COVID-19. Notes were annotated using the BRAT annotation tool. Figure 1 presents BRAT annotation examples.


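BRAT is a web-based text annotation tool used mainly for structured annotation of text. Annotations produced with BRAT turn unstructured raw text into structured data that a computer can process, which makes it convenient to build the annotated corpora that NLP tasks require. It supports four customizable annotation types (entity, relation, event, and attribute) and can annotate words, sentences, or text at any granularity, which covers the needs of most supervised NLP tasks.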

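Improvement: Most prior work on medical problem extraction, including symptom extraction, focused on identifying the specific problem, normalizing the extracted phenomenon, and predicting an assertion value (e.g. present versus absent). This approach omits many of the symptom details that clinicians document and that form the core of many clinical notes. Symptom details describe change (e.g. improvement, worsening, lack of change), severity (e.g. intensity and impact on daily activities), particular characteristics (e.g. productive, dry, or barking for cough), and location. We hypothesize that this symptom granularity is needed in many clinical conditions to improve timely diagnosis and to validate diagnosis prediction rules.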

Annotation Statistics

CACT includes 1,472 notes with a 70%/30% train/test split and 29.9K annotated events (5.4K COVID and 24.4K Symptom).

The hypothetical subtype applies to sentences like, “She is mildly concerned about the coronavirus” and “She cancelled nexplanon replacement due to COVID-19.”

The possible subtype applies to sentences like, “risk of Covid exposure” and “Concern for respiratory illness (including COVID-19 and influenza).”





The extracted symptoms in Figure 4 were manually normalized to aggregate different extracted spans with similar meanings (e.g. “sob” and “short of breath”→“shortness of breath”; “febrile” and “fevers”→“fever”).
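
A manual normalization of this kind amounts to a lookup table. The sketch below uses only the four mappings quoted above; the function name and the list of extracted spans are hypothetical.

```python
from collections import Counter

# Mappings taken from the examples in the text; a real table would be larger.
NORMALIZATION = {
    "sob": "shortness of breath",
    "short of breath": "shortness of breath",
    "febrile": "fever",
    "fevers": "fever",
}


def normalize_symptom(span_text: str) -> str:
    """Lowercase, collapse whitespace, then map to a canonical symptom."""
    key = " ".join(span_text.lower().split())
    return NORMALIZATION.get(key, key)


# Hypothetical extracted trigger spans.
extracted = ["SOB", "short of breath", "fevers", "febrile", "cough"]
counts = Counter(normalize_symptom(s) for s in extracted)
print(counts)
```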


Annotator Agreement


Event Extraction

1. Methods

Event extraction tasks typically require prediction of the following event phenomena:

• trigger span identification
• trigger type (event type) classification
• argument span identification
• argument type/role classification

The CACT annotation scheme differs from this configuration in that labeled arguments require the argument type (e.g. Assertion) and the subtype (e.g. present, absent, etc.) to be predicted.


We implement a span-based, end-to-end, multi-layer event extraction model that jointly predicts all event phenomena, including the trigger span, event type, and argument spans, types, and subtypes.


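Each input sentence consists of tokens, $X = \{x_1, x_2, \ldots, x_n\}$, where $n$ is the number of tokens.

For each sentence, the set of all possible spans, $S = \{s_1, s_2, \ldots, s_m\}$, is enumerated, where $m$ is the number of spans with a length of at most $M$ tokens.

For each span in $S$, the model generates trigger and argument predictions. It then predicts the pairing between each trigger and argument to create events from the individual span predictions.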
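
Span enumeration up to a maximum length can be sketched as below. The function name `enumerate_spans` and the example sentence are invented for illustration; spans are half-open token index pairs.

```python
from typing import List, Tuple


def enumerate_spans(tokens: List[str], max_len: int) -> List[Tuple[int, int]]:
    """Return every span (start, end) with 1 <= end - start <= max_len."""
    n = len(tokens)
    return [
        (start, end)
        for start in range(n)
        for end in range(start + 1, min(start + max_len, n) + 1)
    ]


# Hypothetical sentence with n = 4 tokens and M = 2.
sentence = ["patient", "denies", "fever", "today"]
spans = enumerate_spans(sentence, max_len=2)
print(len(spans))   # 7: four 1-token spans and three 2-token spans
print(spans)
```

Capping the span length at M keeps the number of candidates roughly linear in sentence length (at most n times M) instead of quadratic.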

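Input encoding: Each sentence is mapped to contextualized word embeddings using Bio+Clinical BERT, and the embeddings are fed into a bi-LSTM with hidden size $v_h$. The forward and backward states, $\overrightarrow{\mathbf{h}}_t$ and $\overleftarrow{\mathbf{h}}_t$, are concatenated into the vector $\mathbf{h}_t = [\overrightarrow{\mathbf{h}}_t, \overleftarrow{\mathbf{h}}_t]$, where $t$ is the token position.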

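Span representation: Each span is represented as the attention-weighted sum of the bi-LSTM hidden states. Separate attention mechanisms, $c$, are implemented for the trigger and for each labeled argument, and a single attention mechanism is implemented for all span-only arguments ($c \in \{1, 2, \ldots, 6\}$: 1 for the trigger, 4 for the labeled arguments, and 1 for the span-only arguments).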




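The attention score for span representation $c$ at token position $t$ is calculated as

$$\alpha_{c,t} = \mathbf{w}_{\alpha,c}^{\top} \mathbf{h}_t,$$

where $\mathbf{h}_t$ is the bi-LSTM hidden state at position $t$ and $\mathbf{w}_{\alpha,c}$ is a learned vector.

For span representation $c$, span $i$, and token position $t$, the normalized attention weights are calculated as

$$a_{c,i,t} = \frac{\exp(\alpha_{c,t})}{\sum_{k=\mathrm{START}(i)}^{\mathrm{END}(i)} \exp(\alpha_{c,k})},$$

where $\mathrm{START}(i)$ and $\mathrm{END}(i)$ are the start and end token indices of span $i$.

Span representation $c$ for span $i$ is then calculated as the attention-weighted sum of the bi-LSTM hidden states:

$$\mathbf{g}_{c,i} = \sum_{t=\mathrm{START}(i)}^{\mathrm{END}(i)} a_{c,i,t}\, \mathbf{h}_t.$$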
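
The attention-weighted span representation can be illustrated in plain Python. This is a minimal sketch and not the PyTorch implementation: it assumes the attention score is a dot product between a learned vector `w` and each hidden state, uses inclusive start and end indices, and takes toy hidden states in place of real bi-LSTM outputs.

```python
import math
from typing import List, Tuple


def span_representation(hidden: List[List[float]], w: List[float],
                        start: int, end: int) -> Tuple[List[float], List[float]]:
    """Return (attention weights, span vector) for tokens start..end inclusive."""
    # Unnormalized attention score per token (assumed dot-product form).
    scores = [sum(wi * hi for wi, hi in zip(w, hidden[t]))
              for t in range(start, end + 1)]
    # Softmax restricted to the tokens inside the span.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the hidden states inside the span.
    dim = len(hidden[0])
    g = [sum(a * hidden[start + k][d] for k, a in enumerate(weights))
         for d in range(dim)]
    return weights, g


# Toy hidden states for a 3-token sentence (dimension 2).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, g = span_representation(hidden_states, [1.0, -1.0], 0, 2)
print(weights, g)
```

Because the softmax runs only over the tokens inside the span, the weights of each span sum to 1 regardless of sentence length.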

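Span prediction: As with the span representations, separate span classifiers, $c$, are implemented for the trigger and for each labeled argument, and a single span classifier predicts all span-only arguments ($c \in \{1, 2, \ldots, 6\}$: 1 for the trigger, 4 for the labeled arguments, and 1 for the span-only arguments).

Label scores for span $i$ and classifier $c$ are calculated as

$$\phi_c(i) = \mathbf{W}_{s,c}^{\top}\, \mathrm{FFNN}_{s,c}(\mathbf{g}_{c,i}),$$

where $\phi_c(i)$ yields a vector of label scores, $\mathbf{g}_{c,i}$ is the representation of span $i$ for classifier $c$, $\mathrm{FFNN}_{s,c}$ is a non-linear projection, and $\mathbf{W}_{s,c}$ is a learned output weight matrix.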



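The trigger prediction label set is $\{null, COVID, Symptom\}$. Separate classifiers are used for each labeled argument (Assertion, Change, Severity, and Test Status), each with a label set made up of null plus that argument's subtypes (for Assertion, subtypes such as present and absent). A single classifier predicts all span-only arguments, with label set $\{null, Anatomy, Characteristics, Duration, Frequency\}$.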

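Argument role prediction: The argument role layer predicts the assignment of arguments to triggers using separate binary classifiers, $d$, one for each labeled argument and one for all span-only arguments ($d \in \{1, 2, \ldots, 5\}$: 4 for the labeled arguments and 1 for the span-only arguments).

Argument role scores for trigger $j$ and argument $k$ are calculated with argument role classifier $d$ as a function $\psi_d(j, k)$, which yields a vector of size 2.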

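Span pruning: To limit the time and space complexity of the pairwise argument role predictions, only the top-K spans for each span classifier are considered during argument role prediction. The span score is calculated as the maximum label score for the span, excluding the null label score.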
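
Top-K pruning with the null label excluded from the span score might look like the following sketch. The function name, the position of the null label at index 0, and the score values are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]


def prune_spans(label_scores: Dict[Span, List[float]],
                top_k: int, null_index: int = 0) -> List[Span]:
    """Keep the top_k spans of one classifier, ranked by best non-null score."""
    def span_score(scores: List[float]) -> float:
        # Maximum label score, ignoring the null label.
        return max(s for i, s in enumerate(scores) if i != null_index)

    ranked = sorted(label_scores,
                    key=lambda span: span_score(label_scores[span]),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical label scores: index 0 is null, indices 1-2 are real labels.
scores = {
    (0, 1): [5.0, 0.1, 0.2],    # confidently null, so it ranks low
    (1, 2): [0.0, 2.0, 0.5],
    (2, 3): [1.0, 0.3, 3.0],
    (0, 2): [0.0, -1.0, -2.0],
}
print(prune_spans(scores, top_k=2))   # [(2, 3), (1, 2)]
```

Ignoring the null score means a span is ranked by how strongly it resembles some real label, so spans that are confidently null are pruned first.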

2. Model Configuration

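3-fold cross validation (CV) is performed on the training set. The training loss is calculated by summing the cross entropy across all span and argument role classifiers. The model is implemented using the Python PyTorch module.

Overfitting is a common problem during training, and cross validation is a common way to evaluate a model on held-out data. It splits the original data into K groups (K folds). Each subset serves once as the validation set while the remaining K-1 subsets form the training set, which yields K models. Each model is evaluated on its own validation set, and the K validation errors (for example, the mean squared error, MSE) are averaged to give the cross-validation error. Cross validation makes efficient use of limited data, gives an estimate that is close to the model's performance on the test set, and can serve as a criterion for model tuning.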
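
The K-fold procedure can be sketched in plain Python. The helper `k_fold_splits`, the fixed seed, and the stand-in note IDs are assumptions for illustration; a real setup would train and score a model inside the loop.

```python
import random
from typing import Iterable, Iterator, List, Tuple


def k_fold_splits(items: Iterable, k: int = 3,
                  seed: int = 0) -> Iterator[Tuple[List, List]]:
    """Yield (train, validation) pairs; each item is validated exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]   # k roughly equal folds
    for i in range(k):
        validation = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, validation


# Hypothetical note IDs standing in for the training set.
note_ids = list(range(10))
splits = list(k_fold_splits(note_ids, k=3))
for train, validation in splits:
    print(len(train), len(validation))
```

The cross-validation score is then the mean of the K per-fold validation scores.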



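The results on the validation (CV) and test sets are similar. Compared with MetaMapLite++, the span-based extractor achieves substantially higher precision (P), recall (R), and micro-averaged F1 against the gold assertion labels (# Gold). However, performance on the Change, Severity, Characteristics, Duration, and Frequency arguments is lower than on Assertion.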


