Semantic Image Segmentation with FCN (Hands-On Part)

http://blog.csdn.net/gavin__zhou/article/details/52142696

How FCN Works

I covered the theory in my previous post; see the FCN theory post for details.

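Code

FCN has an official implementation, available at: FCN official code
However, that is not the code I used. I used a third-party rewrite of the official version, implemented with the Chainer framework. Chainer's source code is linked here:
Chainer framework source. If you have used Keras, Chainer should not feel particularly unfamiliar. Chainer: a neural network framework

The code I used is the Chainer implementation of FCN, available at:

FCN Chainer implementation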

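Installation

Installation is simple: either pip or a source install works. However, after installing it several times on my machine, I found that with pip the variable fcn.data_dir ends up pointing into the dist-packages directory of the system Python, and writing to that directory requires root permission. I therefore do not recommend installing directly with pip. For a description of this issue, see:
The fcn.data_dir issue

So I ended up installing from source. I recommend creating a virtual environment with virtualenv; in my experience this was the approach least prone to errors.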
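One way to see whether an installation suffers from this problem is to test whether the data directory is writable by the current user. The sketch below uses only the standard library; the helper name `is_writable_dir` is made up for illustration, and in practice you would pass it the value of `fcn.data_dir`.

```python
import os
import tempfile


def is_writable_dir(path):
    """Return True if path is an existing directory the current user can write to."""
    return os.path.isdir(path) and os.access(path, os.W_OK)


# Demonstration on a temporary directory, which is normally writable.
with tempfile.TemporaryDirectory() as tmp:
    print(is_writable_dir(tmp))
```

If the check fails for the directory that `fcn.data_dir` points to, a source install inside a virtualenv keeps the data under a directory you own.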

Clone the code

Install with virtualenv

sudo pip install virtualenv  # install virtualenv
# create the virtual environment directory
virtualenv test-fcn
cd test-fcn
# activate the virtual environment
source ./bin/activate
# clone the fcn code
git clone https://github.com/wkentaro/fcn.git --recursive
cd fcn
# install fcn
python setup.py develop

Demo

Download the VOC2012 dataset and place it under fcn/data/pascal/VOC2012.
1. Convert the Caffe model to a Chainer model
./scripts/caffe_to_chainermodel.py
2. Load the model and run segmentation
./scripts/fcn_forward.py --img-files data/pascal/VOC2012/JPEGImages/2007_000129.jpg

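Training on Your Own Data

It took me nearly a month, on and off, to get training fully working. There were many pitfalls along the way, so I am writing them down here for reference.

Preparing your own dataset

Lay the dataset out like the VOC2012 SegmentationClass data. An example is shown below: the first image is the original, the second is the segmentation.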

[Figure: original image]
[Figure: segmentation image]

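Note, however, that the object for each label must be drawn in one specific color. We made many mistakes here and only found the cause by tracing the code. The RGB value for each class index, as produced by labelcolormap, is:

Index RGB value
0 (0,0,0)
1 (128,0,0)
2 (0,128,0)
3 (128,128,0)
4 (0,0,128)
5 (128,0,128)
6 (0,128,128)
7 (128,128,128)
8 (64,0,0)
9 (192,0,0)
10 (64,128,0)

Only background (index 0) and the first 10 classes are listed here; for more classes, see the code below: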

<code class="language-python hljs  has-numbering">import numpy as np


def bitget(byteval, idx):
    return ((byteval & (1 << idx)) != 0)


def labelcolormap(N=256):
    cmap = np.zeros((N, 3))  # N is the number of classes
    for i in range(0, N):
        id = i
        r, g, b = 0, 0, 0
        for j in range(0, 8):
            r = np.bitwise_or(r, (bitget(id, 0) << 7-j))
            g = np.bitwise_or(g, (bitget(id, 1) << 7-j))
            b = np.bitwise_or(b, (bitget(id, 2) << 7-j))
            id = (id >> 3)
        cmap[i, 0] = r
        cmap[i, 1] = g
        cmap[i, 2] = b
    cmap = cmap.astype(np.float32) / 255  # RGB values of the colormap, scaled to [0, 1]
    return cmap


# Excerpt: a method of the dataset class (it uses self.target_names).
def _label_rgb_to_32sc1(self, label_rgb):
    assert label_rgb.dtype == np.uint8
    label = np.zeros(label_rgb.shape[:2], dtype=np.int32)
    label.fill(-1)
    cmap = fcn.util.labelcolormap(len(self.target_names))
    cmap = (cmap * 255).astype(np.uint8)  # convert back to integer values
    for l, rgb in enumerate(cmap):
        mask = np.all(label_rgb == rgb, axis=-1)
        label[mask] = l
    return label</code>
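The same bit-interleaving scheme can be written without NumPy, which makes it easy to print the palette for any number of classes. The following is a minimal pure-Python sketch; the function name `voc_colormap` is made up here and is not part of fcn.

```python
def voc_colormap(n):
    """Return the first n colormap entries as (r, g, b) integer tuples."""
    colors = []
    for i in range(n):
        r = g = b = 0
        idx = i
        for j in range(8):
            # Bits 0, 1, 2 of idx feed the red, green and blue channels,
            # filling each channel from its most significant bit downwards.
            r |= ((idx >> 0) & 1) << (7 - j)
            g |= ((idx >> 1) & 1) << (7 - j)
            b |= ((idx >> 2) & 1) << (7 - j)
            idx >>= 3
        colors.append((r, g, b))
    return colors


for index, rgb in enumerate(voc_colormap(11)):
    print(index, rgb)
```

Each group of three bits in the class index contributes one bit to each channel, so indices 1 to 7 use only the values 0 and 128, and index 8 is the first to use 64.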

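If you draw the label images with this color table, the code reads the ground-truth segmentation correctly.
Put the original images in fcn/data/pascal/VOC2012/JPEGImages
Put the segmentation images in fcn/data/pascal/VOC2012/SegmentationClass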
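Because `_label_rgb_to_32sc1` leaves every pixel whose color matches no colormap entry at -1, off-palette colors (for example from anti-aliased edges or a lossy image format) silently drop out of the ground truth. Before copying label images into SegmentationClass, it may help to check them. The sketch below works on an in-memory array so that it stays independent of any image library; `find_unknown_colors` and the 3-class palette are hypothetical names and values chosen for illustration.

```python
import numpy as np


def find_unknown_colors(label_rgb, palette):
    """Return the set of RGB tuples in label_rgb that are not in palette."""
    pixels = label_rgb.reshape(-1, 3)
    present = {tuple(int(c) for c in px) for px in np.unique(pixels, axis=0)}
    return present - set(palette)


# Hypothetical 3-class palette: background, class 1, class 2.
palette = [(0, 0, 0), (128, 0, 0), (0, 128, 0)]

# Synthetic 2x2 "label image" with one pixel in a color outside the palette.
label = np.array([[[0, 0, 0], [128, 0, 0]],
                  [[0, 128, 0], [127, 0, 0]]], dtype=np.uint8)
print(find_unknown_colors(label, palette))
```

An empty result means every pixel maps to a class; anything else lists the colors that would end up as -1.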

Then, in train.txt, trainval.txt and val.txt under fcn/data/pascal/VOC2012/ImageSets/Segmentation, list the IDs of the images to be used for the corresponding task.
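These list files can be generated rather than typed by hand. The sketch below assumes the VOC convention of one image ID per line, where the ID is the file name without its extension (as in 2007_000129), and assumes a simple fixed split; `write_split_files` and `val_fraction` are made-up names, not part of fcn.

```python
import os
import tempfile


def write_split_files(image_ids, out_dir, val_fraction=0.2):
    """Write train.txt, val.txt and trainval.txt, one image ID per line."""
    ids = sorted(image_ids)
    n_val = int(len(ids) * val_fraction)
    splits = {
        'val': ids[:n_val],
        'train': ids[n_val:],
        'trainval': ids,
    }
    os.makedirs(out_dir, exist_ok=True)
    for name, members in splits.items():
        with open(os.path.join(out_dir, name + '.txt'), 'w') as f:
            f.writelines(member + '\n' for member in members)
    return splits


# Demonstration with made-up IDs written to a temporary directory.
demo_ids = ['img_%03d' % i for i in range(10)]
with tempfile.TemporaryDirectory() as tmp:
    splits = write_split_files(demo_ids, tmp)
    print(len(splits['train']), len(splits['val']))
```

For real data you would collect the IDs from the file names in JPEGImages and pass the ImageSets/Segmentation directory as `out_dir`.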

Modifying the code

  1. fcn/scripts/fcn_train.py
<code class="language-python hljs  has-numbering"># Excerpt from the training script (O is assumed to be chainer.optimizers).
# setup optimizer
# The lr here must be small; with a larger value the program raises an error. I used 1e-9.
optimizer = O.MomentumSGD(lr=1e-10, momentum=0.99)
optimizer.setup(model)

# train
trainer = fcn.Trainer(
    dataset=dataset,
    model=model,
    optimizer=optimizer,
    weight_decay=0.0005,
    test_interval=1000,
    max_iter=100000,
    snapshot=4000,
    gpu=gpu,
)</code>
  2. fcn/fcn/pascal.py
<code class="language-python hljs  has-numbering"># Replace with your own classes, listed in the order of the color table.
target_names = np.array([
    'background',
    'aeroplane',
    'bicycle',
    'bird',
    'boat',
    'bottle',
    'bus',
    'car',
    'cat',
    'chair',
    'cow',
    'diningtable',
    'dog',
    'horse',
    'motorbike',
    'person',
    'potted plant',
    'sheep',
    'sofa',
    'train',
    'tv/monitor',
])
</code>
  3. fcn/fcn/util.py
<code class="language-python hljs  has-numbering">import numpy as np


def resize_img_with_max_size(img, max_size=500*500):  # change max_size to suit your images
    """Resize image with max size (height x width)"""
    from skimage.transform import rescale
    height, width = img.shape[:2]
    # float() avoids integer division on Python 2
    scale = float(max_size) / (height * width)
    resizing_scale = 1
    if scale < 1:
        resizing_scale = np.sqrt(scale)
        img = rescale(img, resizing_scale, preserve_range=True)
        img = img.astype(np.uint8)
    return img, resizing_scale</code>
  4. fcn/fcn/models/fcn32s.py
<code class="language-python hljs  has-numbering"># Excerpt from the model class (L is assumed to be chainer.links).
def __init__(self, n_class=21):  # change n_class to your number of classes
    self.n_class = n_class
    super(self.__class__, self).__init__(
        conv1_1=L.Convolution2D(3, 64, 3, stride=1, pad=100),
        conv1_2=L.Convolution2D(64, 64, 3, stride=1, pad=1),

        conv2_1=L.Convolution2D(64, 128, 3, stride=1, pad=1),
        conv2_2=L.Convolution2D(128, 128, 3, stride=1, pad=1),

        conv3_1=L.Convolution2D(128, 256, 3, stride=1, pad=1),
        conv3_2=L.Convolution2D(256, 256, 3, stride=1, pad=1),
        conv3_3=L.Convolution2D(256, 256, 3, stride=1, pad=1),

        conv4_1=L.Convolution2D(256, 512, 3, stride=1, pad=1),
        conv4_2=L.Convolution2D(512, 512, 3, stride=1, pad=1),
        conv4_3=L.Convolution2D(512, 512, 3, stride=1, pad=1),

        conv5_1=L.Convolution2D(512, 512, 3, stride=1, pad=1),
        conv5_2=L.Convolution2D(512, 512, 3, stride=1, pad=1),
        conv5_3=L.Convolution2D(512, 512, 3, stride=1, pad=1),

        fc6=L.Convolution2D(512, 4096, 7, stride=1, pad=0),
        fc7=L.Convolution2D(4096, 4096, 1, stride=1, pad=0),

        score_fr=L.Convolution2D(4096, self.n_class, 1, stride=1, pad=0),

        upscore=L.Deconvolution2D(self.n_class, self.n_class, 64,
                                  stride=32, pad=0),
    )
    self.train = False</code>
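The unusual pad=100 on conv1_1 makes more sense once the spatial sizes are followed through the network. The sketch below does that arithmetic for one image dimension. The layer parameters are taken from the constructor above; the pooling layers do not appear in this excerpt, so the five 2x2, stride-2 poolings with sizes rounded up are an assumption based on the usual VGG16-style FCN-32s layout. `fcn32s_upscore_size` is a made-up helper name.

```python
import math


def fcn32s_upscore_size(size):
    """Size of the upscore output for one spatial dimension of the input."""
    size = size + 2 * 100 - 3 + 1   # conv1_1: 3x3 kernel, pad=100
    # conv1_2 .. conv5_3 use 3x3 kernels with pad=1 and keep the size.
    for _ in range(5):              # assumed: five 2x2, stride-2 poolings
        size = math.ceil(size / 2)  # assumed: sizes are rounded up
    size = size - 7 + 1             # fc6: 7x7 kernel, pad=0
    # fc7 and score_fr are 1x1 convolutions and keep the size.
    return (size - 1) * 32 + 64     # upscore: kernel 64, stride 32, pad=0


for s in (224, 500):
    print(s, fcn32s_upscore_size(s))
```

Under these assumptions a 500-pixel side comes out as 544 pixels, so the score map is always at least as large as the input and can be cropped back to the input size.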

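To choose a value for max_size in `resize_img_with_max_size`, it helps to see what the pixel budget does to a concrete image. The sketch below repeats only the arithmetic of that function, without skimage; the helper name `resized_shape` is made up, and rounding the new sides to the nearest integer is an assumption about how the rescaling library behaves.

```python
import math


def resized_shape(height, width, max_size=500 * 500):
    """Return (new_height, new_width, scale) under a maximum pixel count."""
    scale = max_size / float(height * width)
    if scale >= 1:
        # The image already fits in the budget and is left unchanged.
        return height, width, 1.0
    # Both sides shrink by sqrt(scale), so the area shrinks by scale.
    resizing_scale = math.sqrt(scale)
    return (int(round(height * resizing_scale)),
            int(round(width * resizing_scale)),
            resizing_scale)


print(resized_shape(750, 1000))
print(resized_shape(375, 500))
```

A 750x1000 image has three times the default budget of 250000 pixels, so each side is scaled by about 0.577, while a 375x500 image passes through untouched.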
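Training

./scripts/fcn_train.py

  1. This creates a directory named SegmentationClassDataset_db under fcn/data/, which stores the training images as pickled data. If you change the original training images you must delete this directory; otherwise the pickled data in it is read by default as the image data.

  2. It also creates a snapshot directory under fcn, containing the saved models, log files and so on. If you retrain from scratch, I recommend deleting this directory.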
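Both clean-up steps can be scripted so that they are not forgotten before a fresh run. This is a minimal sketch that assumes the two directory locations described above, relative to the fcn checkout; `remove_stale_dirs` is a made-up helper name. It deletes data permanently, so check the paths before pointing it at a real checkout.

```python
import os
import shutil
import tempfile


def remove_stale_dirs(fcn_root):
    """Delete the cached dataset and snapshot directories if they exist."""
    removed = []
    for rel in (os.path.join('data', 'SegmentationClassDataset_db'), 'snapshot'):
        path = os.path.join(fcn_root, rel)
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(rel)
    return removed


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'snapshot'))
    print(remove_stale_dirs(root))
```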

Using your own trained model

./scripts/fcn_forward.py -c path/to/your/model -i path/to/your/image
The results are saved in fcn/data/forward_out