Kaggle Beginner Series Translation (4): RSNA Pneumonia Prediction
This series follows an official tutorial consisting of four lessons; this post translates the first one:
Medical image recognition with deep learning
Lesson 1: Classification of chest vs. abdominal X-rays
This is a high-level introduction to practical machine learning for medical image classification. The goal of this tutorial is to build a deep learning classifier that accurately distinguishes chest from abdominal X-rays. The model is trained on 75 images obtained from Open-i.
The MD.ai Annotator is used to view the DICOM images and create image-level annotations. The MD.ai Python client library is then used to download the images and annotations and to prepare the dataset used to train the classification model.
The lesson list is as follows:
- Lesson 1. Classification of chest vs. abdominal X-rays using TensorFlow/Keras
- Lesson 2. Lung X-ray semantic segmentation using U-Nets
- Lesson 3. RSNA pneumonia detection using the Kaggle data format
- Lesson 4. RSNA pneumonia detection using the MD.ai Python client library
First, install the mdai package:
pip install mdai
Create an mdai client
The mdai client requires an access token, which authenticates you as a user. To create a new token or select an existing one, navigate to the "Personal Access Tokens" tab on the user settings page of the specified MD.ai domain (e.g., public.md.ai).
mdai_client = mdai.Client(domain='public.md.ai', access_token="")
Create a project
Define a project you have access to by passing its project id. The project id can be found in the URL, in the form https://public.md.ai/annotator/project/{project_id}.
For example, the project_id here is PVq9raBJ (https://public.md.ai/annotator/project/PVq9raBJ).
You can specify an optional path as the data directory (if left blank, it defaults to the current working directory).
p = mdai_client.project('PVq9raBJ', path='./lesson1-data')
Set label ids
To prepare the datasets, the selected label ids must be explicitly mapped to class ids via the project's set_labels_dict method.
p.show_label_groups()
# this maps label ids to class ids as a dict obj
labels_dict = {
    'L_38Y7Jl': 0,  # Abdomen
    'L_z8xEkB': 1,  # Chest
}
print(labels_dict)
p.set_labels_dict(labels_dict)
Label Group, Id: G_3lv, Name: Default group
 Labels:
  Id: L_38Y7Jl, Name: Abdomen
  Id: L_z8xEkB, Name: Chest
{'L_38Y7Jl': 0, 'L_z8xEkB': 1}
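To turn predicted class ids back into readable label names later (e.g. when titling prediction plots), you can invert this mapping. A minimal sketch in plain Python; the `label_names` and `class_to_label` names are my own for illustration, not part of the mdai API:

```python
# Same label-id -> class-id mapping as above
labels_dict = {'L_38Y7Jl': 0,   # Abdomen
               'L_z8xEkB': 1}   # Chest

# Human-readable names, taken from the label group listing
label_names = {'L_38Y7Jl': 'Abdomen', 'L_z8xEkB': 'Chest'}

# Invert: class id -> label name
class_to_label = {class_id: label_names[label_id]
                  for label_id, class_id in labels_dict.items()}

print(class_to_label)  # {0: 'Abdomen', 1: 'Chest'}
```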
Create the training and validation datasets
p.show_datasets()
# create training dataset
train_dataset = p.get_dataset_by_name('TRAIN')
train_dataset.prepare()
train_image_ids = train_dataset.get_image_ids()
print(len(train_image_ids))
# create the validation dataset
val_dataset = p.get_dataset_by_name('VAL')
val_dataset.prepare()
val_image_ids = val_dataset.get_image_ids()
print(len(val_image_ids))
Datasets: Id: D_8ogmzN, Name: TRAIN
          Id: D_OoJ98E, Name: VAL
          Id: D_8oAvmQ, Name: TEST
65
10
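The 75 Open-i images are already pre-split in the project into TRAIN (65 images), VAL (10 images), and a held-out TEST set. If you ever need to split your own list of image ids the same way, a simple sketch (image ids here are made up):

```python
import random

# Hypothetical list of 75 image ids
image_ids = [f'img_{i:03d}' for i in range(75)]

random.seed(0)       # make the shuffle reproducible
random.shuffle(image_ids)

n_val = 10           # hold out 10 images for validation, as in the tutorial
val_ids = image_ids[:n_val]
train_ids = image_ids[n_val:]

print(len(train_ids), len(val_ids))  # 65 10
```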
Display a few of the images
# visualize a few train images
mdai.visualize.display_images(train_image_ids[:2], cols=2)
mdai.visualize.display_images(val_image_ids[:2], cols=2)
Training and validation with Keras
from keras import applications
from keras.models import Model, Sequential
from keras.layers import Dropout, Flatten, Dense, GlobalAveragePooling2D
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ModelCheckpoint
# Define model parameters
img_width = 192
img_height = 192
epochs = 20
params = {
    'dim': (img_width, img_height),
    'batch_size': 5,
    'n_classes': 2,
    'n_channels': 3,
    'shuffle': True,
}
base_model = applications.mobilenet.MobileNet(weights='imagenet', include_top=False, input_shape=(img_width, img_height, 3))
model_top = Sequential()
model_top.add(GlobalAveragePooling2D(input_shape=base_model.output_shape[1:], data_format=None))
model_top.add(Dense(256, activation='relu'))
model_top.add(Dropout(0.5))
model_top.add(Dense(2, activation='softmax'))
model = Model(inputs=base_model.input, outputs=model_top(base_model.output))
model.compile(optimizer=Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0),
              loss='categorical_crossentropy', metrics=['accuracy'])
from mdai.utils import keras_utils
train_generator = keras_utils.DataGenerator(train_dataset, **params)
val_generator = keras_utils.DataGenerator(val_dataset, **params)
import tensorflow as tf

# Let GPU memory allocation grow as needed (TF 1.x API)
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# Set callback functions to early stop training and save the best model so far
callbacks = [
    EarlyStopping(monitor='val_loss', patience=2, verbose=2),
    ModelCheckpoint(filepath='best_model.h5', monitor='val_loss',
                    save_best_only=True, verbose=2),
]
history = model.fit_generator(
    generator=train_generator,
    epochs=epochs,
    callbacks=callbacks,
    verbose=1,
    validation_data=val_generator,
    use_multiprocessing=True,
    workers=6)
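The DataGenerator used above is supplied by mdai's keras_utils; conceptually it behaves like a keras.utils.Sequence that yields shuffled batches of images and one-hot labels. A rough NumPy-only sketch of the batching and shuffling logic (the `batch_indices` name and details are my own, not mdai's implementation):

```python
import numpy as np

def batch_indices(n_samples, batch_size, shuffle=True, seed=0):
    """Yield index arrays that cover the dataset in batches, optionally shuffled."""
    idx = np.arange(n_samples)
    if shuffle:
        rng = np.random.default_rng(seed)
        rng.shuffle(idx)
    for start in range(0, n_samples, batch_size):
        yield idx[start:start + batch_size]

# With 65 training images and batch_size=5, this yields 13 batches of 5
batches = list(batch_indices(65, 5))
print(len(batches), len(batches[0]))  # 13 5
```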
import matplotlib.pyplot as plt
print(history.history.keys())
plt.figure()
plt.plot(history.history['acc'], 'orange', label='Training accuracy')
plt.plot(history.history['val_acc'], 'blue', label='Validation accuracy')
plt.plot(history.history['loss'], 'red', label='Training loss')
plt.plot(history.history['val_loss'], 'green', label='Validation loss')
plt.legend()
plt.show()
dict_keys(['val_loss', 'val_acc', 'loss', 'acc'])
Run predictions on the test set
model.load_weights('best_model.h5')
test_dataset = p.get_dataset_by_name('TEST')
test_dataset.prepare()
import numpy as np
from PIL import Image

for image_id in test_dataset.image_ids:
    # Load the DICOM file as an RGB array, then resize to the model's input size
    image = mdai.visualize.load_dicom_image(image_id, to_RGB=True)
    image = Image.fromarray(image)
    image = image.resize((img_width, img_height))

    x = np.expand_dims(image, axis=0)
    y_prob = model.predict(x)
    y_classes = y_prob.argmax(axis=-1)

    title = 'Pred: ' + test_dataset.class_id_to_class_text(y_classes[0]) \
            + ', Prob: ' + str(round(y_prob[0][y_classes[0]], 3))
    plt.figure()
    plt.title(title)
    plt.imshow(image)
    plt.axis('off')
    plt.show()
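The prediction step above takes the argmax over the model's two softmax outputs and reports the winning class with its probability. A small NumPy example of just that logic (the probability values are made up for illustration):

```python
import numpy as np

# Fake softmax output for one image: [P(Abdomen), P(Chest)]
y_prob = np.array([[0.088, 0.912]])

y_classes = y_prob.argmax(axis=-1)   # index of the most probable class
pred_class = int(y_classes[0])       # 1 maps to 'Chest' in labels_dict
confidence = round(float(y_prob[0][pred_class]), 3)

print(pred_class, confidence)
```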