Segmentation Data Preprocessing

阿新 • Published: 2021-06-12

Another day, another error (exhausting~):

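I have recently been using YOLACT to run predictions on the BraTS dataset. The validation results were as follows:

[Figure: validation results]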

For ET, both Dice and PPV were far too low. Going by the formulas, I first assumed the model was predicting too much area as ET.
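The formulas in question are Dice = 2TP / (2TP + FP + FN) and PPV = TP / (TP + FP). False positives appear in both denominators, which is why over-prediction looked like the obvious suspect, but a prediction that contains no ET at all also drives both metrics to zero. A minimal NumPy sketch on toy binary masks (the function name dice_and_ppv and the small eps smoothing term are my assumptions, not the evaluation code behind these results):

```python
import numpy as np


def dice_and_ppv(pred, target, eps=1e-7):
    """Dice = 2TP / (2TP + FP + FN) and PPV = TP / (TP + FP) for binary masks.

    eps is an assumed smoothing term that avoids division by zero.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.logical_and(pred, target).sum()   # predicted and present
    fp = np.logical_and(pred, ~target).sum()  # predicted but absent
    fn = np.logical_and(~pred, target).sum()  # present but missed
    dice = 2 * tp / (2 * tp + fp + fn + eps)
    ppv = tp / (tp + fp + eps)
    return float(dice), float(ppv)


# Toy 10x10 slice with a 3x3 ET region
target = np.zeros((10, 10), dtype=bool)
target[2:5, 2:5] = True

over = np.zeros_like(target)
over[0:9, 0:9] = True          # covers the target plus 72 extra pixels
empty = np.zeros_like(target)  # ET never predicted

for name, pred in [("exact", target), ("over", over), ("empty", empty)]:
    print(name, [round(v, 3) for v in dice_and_ppv(pred, target)])
# exact [1.0, 1.0]
# over [0.2, 0.111]
# empty [0.0, 0.0]
```

Over-predicting lowers both metrics, but so does never predicting ET, so a low PPV alone does not tell the two causes apart.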

Later I rendered the masks as images to take a look:

[Figure: rendered target masks]

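Then it suddenly hit me: I had generated the targets incorrectly. For each of the three regions, I first ran edge extraction to get the edge coordinates, then applied a check that kept only regions with more than 20 edge points, and that is how the situation in the picture above can arise. The true target does contain an ET region (the yellow area), but the target I generated had no ET region at all.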
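To see how a point-count threshold can silently drop a small ET region, here is a NumPy-only sketch. It swaps cv2.Canny for a simple 4-neighbour boundary test (so exact counts will differ from what Canny produces), and the helper names boundary_points and keep_region are hypothetical:

```python
import numpy as np

MIN_EDGE_POINTS = 20  # the "more than 20 points" rule from the text


def boundary_points(mask):
    """Return (x, y) coordinates of mask pixels that touch the background.

    A pixel counts as boundary if any of its 4 neighbours lies outside the
    mask. This is a simple stand-in for cv2.Canny, so counts will differ.
    """
    m = np.asarray(mask, dtype=bool)
    p = np.pad(m, 1, mode="constant", constant_values=False)
    # True where the up, down, left and right neighbours are all inside the mask
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    edge = m & ~interior
    return np.argwhere(edge)[:, ::-1]  # argwhere gives (row, col); flip to (x, y)


def keep_region(mask):
    return len(boundary_points(mask)) > MIN_EDGE_POINTS


# Toy BraTS-style slice: a 20x20 edema patch (label 2) with a 3x3 ET core (label 4)
npmask = np.zeros((64, 64), dtype=np.uint8)
npmask[10:30, 10:30] = 2
npmask[18:21, 18:21] = 4

wt = np.isin(npmask, (1, 2, 4))  # whole tumour
et = npmask == 4                 # enhancing tumour
print(len(boundary_points(wt)), keep_region(wt))  # 76 True
print(len(boundary_points(et)), keep_region(et))  # 8 False
```

The whole-tumour region survives the filter while the ET region is discarded, so a target written for this slice would contain WT but no ET.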

As a result, the model never predicted ET at inference time either, which made the PPV low and, in turn, the Dice low.

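Below is part of my code for generating the edge coordinates (the improved version, which skips a slice with continue whenever one of the regions fails the check):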

import os

import cv2
import numpy as np
from tqdm import tqdm

# Defined earlier in the original script (not shown in this post):
#   image_traintest_paths, mask_traintest_paths  - lists of image / mask (.npy) paths
#   SumNum                                       - counts the edge points in an edge map
#   lowThreshold, ratio, kernel_size             - Canny parameters


def edge_annotation(label_img, class_id):
    """Return 'xmin,ymin,xmax,ymax,class_id,x1,y1,x2,y2,...' for one region,
    or None if its edge map has 20 or fewer points."""
    detected_edges = cv2.Canny(label_img, lowThreshold, lowThreshold + 10 * ratio,
                               apertureSize=kernel_size)
    if SumNum(detected_edges) <= 20:
        return None
    # Edge pixels in row-major order, as (x, y) = (column, row)
    vertices = np.argwhere(detected_edges != 0)[:, ::-1]
    min_x, min_y = vertices.min(axis=0)
    max_x, max_y = vertices.max(axis=0)
    fields = [min_x, min_y, max_x, max_y, class_id] + vertices.ravel().tolist()
    return ','.join(str(v) for v in fields)


with open('F:/DATA/train_mask4.txt', 'w') as f:
    for r in tqdm(range(len(image_traintest_paths))):
        fold_path = mask_traintest_paths[r]
        npmask = np.load(fold_path)
        # BraTS labels: 1 = necrotic / non-enhancing core, 2 = edema, 4 = enhancing tumour
        WT_Label = np.isin(npmask, (1, 2, 4)).astype(np.uint8) * 255  # whole tumour
        TC_Label = np.isin(npmask, (1, 4)).astype(np.uint8) * 255     # tumour core
        ET_Label = (npmask == 4).astype(np.uint8) * 255               # enhancing tumour
        annotations = [edge_annotation(WT_Label, 1),
                       edge_annotation(TC_Label, 2),
                       edge_annotation(ET_Label, 3)]
        # Improved version: if any region fails the >20-point check, skip the whole
        # slice instead of writing a target that silently lacks that region
        if None in annotations:
            continue
        # One line per slice: "<mask file name> <WT> <TC> <ET>"
        f.write(os.path.basename(fold_path) + ' ' + ' '.join(annotations) + '\n')
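Before retraining, it can help to read the file back and confirm that the lines really contain ET regions (class id 3). A minimal parser, assuming the line layout the loop above writes (the file name, then space-separated regions of the form xmin,ymin,xmax,ymax,class_id,x1,y1,x2,y2,...); parse_annotation_line and the sample line are hypothetical:

```python
def parse_annotation_line(line):
    """Parse '<file> xmin,ymin,xmax,ymax,cls,x1,y1,x2,y2,... <next region> ...'.

    Returns (file_name, regions), where each region is a dict with
    'box' (xmin, ymin, xmax, ymax), 'class_id' and 'points' [(x, y), ...].
    """
    file_name, *region_strs = line.split()
    regions = []
    for region in region_strs:
        values = [int(v) for v in region.split(",")]
        coords = values[5:]  # flat x1, y1, x2, y2, ...
        regions.append({
            "box": tuple(values[:4]),
            "class_id": values[4],
            "points": list(zip(coords[0::2], coords[1::2])),
        })
    return file_name, regions


# Hypothetical line with WT (1), TC (2) and ET (3) regions
name, regions = parse_annotation_line(
    "slice_070.npy 3,4,5,6,1,3,4,5,6 3,4,5,5,2,3,4,5,5 4,4,5,5,3,4,4,5,5\n")
print(name, [r["class_id"] for r in regions])  # slice_070.npy [1, 2, 3]
```

Comparing the number of lines that contain class 3 with the number of mask slices that contain label 4 would likely have exposed the missing-ET problem before any training run.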

I wonder how the next training run will turn out...
