Lane Detection with LaneNet

阿新 • Published: 2020-03-06

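# LaneNet

- LaneNet
    - Segmentation branch: performs semantic segmentation, i.e. decides whether each pixel belongs to a lane or to the background
    - Embedding branch: produces a vector representation of each pixel, used for the later clustering step that yields instance segmentation
- H-Net

## Segmentation branch

This branch has to cope with an imbalanced sample distribution: lane pixels are far fewer than background pixels. The loss function therefore assigns different weights to different pixels and lowers the weight of the background.

The output of this branch is (w,h,2).

## Embedding branch

The loss is designed so that pixels belonging to the same lane are as close together as possible, while pixels belonging to different lanes are as far apart as possible. This is the discriminative loss.

The output of this branch is (w,h,n), where n is the dimension of the per-pixel vector.

## Instance segmentation

Once the Segmentation branch has produced the semantic segmentation and the Embedding branch has produced the per-pixel vectors, the lane pixels are clustered to obtain the instance segmentation.

![](https://img2018.cnblogs.com/blog/583030/202003/583030-20200302112317801-1679623589.png)

## H-Net

### Perspective transform

to do

### Lane fitting

The output of LaneNet is a set of pixels for each lane; a lane curve still has to be regressed from those pixels. The traditional approach projects the image into a bird's-eye view and then fits a second- or third-order polynomial. In that approach the transformation matrix H is computed only once and shared by all images, which introduces errors when the road slope changes. To address this, the paper trains a neural network, H-Net, that predicts the transformation matrix: its input is the image and its output is the transformation matrix H. I had previously ported OpenCV's inverse perspective transform code, where the transformation matrix needs 8 parameters, whereas here only 6 degrees of freedom are given. This puzzled me at first, but on reading the paper carefully I found that the authors explain it: the zeros constrain the transformation in the horizontal direction, so that horizontal lines remain horizontal.

## Code analysis

```
binary_seg_image, instance_seg_image = sess.run(
    [binary_seg_ret, instance_seg_ret],
    feed_dict={input_tensor: [image]}
)
```

Input: (1, 256, 512, 3). Outputs: binary_seg_image: (1, 256, 512), instance_seg_image: (1, 256, 512, 4).

### Pixel-level classification and vector representation

The `inference` method of class `LaneNet` has two steps.

The first step extracts the segmentation features, covering both the features used for semantic segmentation and the features used for instance segmentation.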
``` python
class LaneNet(cnn_basenet.CNNBaseModel):
    def inference(self, input_tensor, name):
        """
        :param input_tensor:
        :param name:
        :return:
        """
        with tf.variable_scope(name_or_scope=name, reuse=self._reuse):
            # first extract image features
            extract_feats_result = self._frontend.build_model(
                input_tensor=input_tensor,
                name='{:s}_frontend'.format(self._net_flag),
                reuse=self._reuse
            )
            # The result is a dict holding the feature map for semantic
            # segmentation and the feature map for instance segmentation.
            # binary_segment_logits (1,256,512,2): 2 is the number of classes, lane/background.
            # instance_segment_logits (1,256,512,64): convolved again later to
            # produce one vector per pixel.
            print('features:', extract_feats_result)

            # second apply backend process
            binary_seg_prediction, instance_seg_prediction = self._backend.inference(
                binary_seg_logits=extract_feats_result['binary_segment_logits']['data'],
                instance_seg_logits=extract_feats_result['instance_segment_logits']['data'],
                name='{:s}_backend'.format(self._net_flag),
                reuse=self._reuse
            )

            if not self._reuse:
                self._reuse = True

        return binary_seg_prediction, instance_seg_prediction
```

The features returned by the first step are:

```
features : OrderedDict([
    ('encode_stage_1_share', {'data': , 'shape': [1, 256, 512, 64]}),
    ('encode_stage_2_share', {'data': , 'shape': [1, 128, 256, 128]}),
    ('encode_stage_3_share', {'data': , 'shape': [1, 64, 128, 256]}),
    ('encode_stage_4_share', {'data': , 'shape': [1, 32, 64, 512]}),
    ('encode_stage_5_binary', {'data': , 'shape': [1, 16, 32, 512]}),
    ('encode_stage_5_instance', {'data': , 'shape': [1, 16, 32, 512]}),
    ('binary_segment_logits', {'data': , 'shape': [1, 256, 512, 2]}),
    ('instance_segment_logits', {'data': , 'shape': [1, 256, 512, 64]})])
```

Once the features are extracted, the backend processes them:

``` python
class LaneNetBackEnd(cnn_basenet.CNNBaseModel):
    def inference(self, binary_seg_logits, instance_seg_logits, name, reuse):
        """
        :param binary_seg_logits:
        :param instance_seg_logits:
        :param name:
        :param reuse:
        :return:
        """
        with tf.variable_scope(name_or_scope=name, reuse=reuse):

            with tf.variable_scope(name_or_scope='binary_seg'):
                binary_seg_score = tf.nn.softmax(logits=binary_seg_logits)
                binary_seg_prediction = tf.argmax(binary_seg_score, axis=-1)

            with tf.variable_scope(name_or_scope='instance_seg'):
                pix_bn = self.layerbn(
                    inputdata=instance_seg_logits, is_training=self._is_training, name='pix_bn')
                pix_relu = self.relu(inputdata=pix_bn, name='pix_relu')
                instance_seg_prediction = self.conv2d(
                    inputdata=pix_relu,
                    out_channel=CFG.TRAIN.EMBEDDING_FEATS_DIMS,
                    kernel_size=1,
                    use_bias=False,
                    name='pix_embedding_conv'
                )

        return binary_seg_prediction, instance_seg_prediction
```

For the per-pixel classification, softmax converts the logits into probabilities, and argmax then picks the index of the larger probability.

For the per-pixel vector representation, a 1x1 convolution produces an output whose channel dimension equals CFG.TRAIN.EMBEDDING_FEATS_DIMS (configured as 4). That is, the (1,256,512,64) tensor is convolved into a (1,256,512,4) tensor, so every pixel is represented by a four-dimensional vector.

The `inference` of the whole LaneNet therefore returns two tensors: one of shape (1,256,512) and one of shape (1,256,512,4).

### Post-processing

``` python
class LaneNetPostProcessor(object):
    def postprocess(self, binary_seg_result, instance_seg_result=None,
                    min_area_threshold=100, source_image=None,
                    data_source='tusimple'):
        ...
```

The binary_seg_result is first cleaned with a morphological operation that removes small holes.
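Removing small holes from a binary mask is usually done with a morphological closing, that is, a dilation followed by an erosion. The following is a minimal NumPy-only sketch of that idea, assuming a boolean mask and a square structuring element. The helper names `dilate`, `erode` and `close_holes` are illustrative and are not taken from the LaneNet repository, whose post-processor may use a different kernel shape or a library routine instead.

``` python
import numpy as np


def dilate(mask: np.ndarray, k: int) -> np.ndarray:
    """Binary dilation with a k x k square structuring element (k odd)."""
    pad = k // 2
    h, w = mask.shape
    # Pixels outside the image count as background.
    padded = np.pad(mask, pad, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def erode(mask: np.ndarray, k: int) -> np.ndarray:
    """Binary erosion with a k x k square structuring element (k odd)."""
    pad = k // 2
    h, w = mask.shape
    # Pixels outside the image count as foreground, so borders are not eroded.
    padded = np.pad(mask, pad, mode="constant", constant_values=True)
    out = np.ones_like(mask)
    for dy in range(k):
        for dx in range(k):
            out &= padded[dy:dy + h, dx:dx + w]
    return out


def close_holes(mask: np.ndarray, k: int = 5) -> np.ndarray:
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(mask, k), k)


# Toy "lane" band with a one-pixel hole in the middle.
mask = np.zeros((11, 20), dtype=bool)
mask[3:8, 2:18] = True
mask[5, 10] = False

closed = close_holes(mask, k=3)
print(mask.sum(), closed.sum())  # 79 80
```

Closing only fills gaps smaller than the structuring element; it does not delete small isolated blobs, which is presumably what a parameter such as `min_area_threshold` is for.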

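Clustering comes next.

``` python
def _get_lane_embedding_feats(binary_seg_ret, instance_seg_ret):
    """
    get lane embedding features according the binary seg result
    :param binary_seg_ret:
    :param instance_seg_ret:
    :return:
    """
    idx = np.where(binary_seg_ret == 255)  # idx (b,h,w)
    lane_embedding_feats = instance_seg_ret[idx]
    # idx_scale = np.vstack((idx[0] / 256.0, idx[1] / 512.0)).transpose()
    # lane_embedding_feats = np.hstack((lane_embedding_feats, idx_scale))
    lane_coordinate = np.vstack((idx[1], idx[0])).transpose()

    assert lane_embedding_feats.shape[0] == lane_coordinate.shape[0]

    ret = {
        'lane_embedding_feats': lane_embedding_feats,
        'lane_coordinates': lane_coordinate
    }

    return ret
```

This returns the coordinates of the lane pixels together with the vector representation of the pixel at each coordinate.

`np.where(condition)`: when only the condition is given, without `x` and `y`, the call returns the coordinates of the elements that satisfy the condition, i.e. the non-zero ones (equivalent to `numpy.nonzero`). The coordinates come back as a tuple containing one array per dimension of the input array, each holding the indices along that dimension of the matching elements.

## Test results

Environment: tensorflow-gpu 1.15.2, four Titan Xp GPUs.

Batch of 4, output shapes (4, 256, 512) and (4, 256, 512, 4):

I0302 17:04:31.276140 29376 test_lanenet.py:222] imgae inference cost time: 2.58794s

Batch of 32, output shapes (32, 256, 512) and (32, 256, 512, 4):

I0302 17:05:50.322593 29632 test_lanenet.py:222] imgae inference cost time: 4.31036s

The behaviour resembles high throughput with high latency: a single frame takes 1-2 s, while processing many images at once brings the average down to roughly 0.1 s per image.

The paper uses ENet as the backbone and reports an inference speed of 52 fps on an NVIDIA 1080 Ti. The repository author's explanation for the difference is: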

> 2.Origin paper use Enet as backbone net but I use vgg16 as backbone net so speed will not get as fast as that.
> 3.Gpu need a short time to warm up and you can adjust your batch size to test the speed again:)

In other words, the feature-extraction network differs from the one in the paper, and the GPU needs a short warm-up period.

In my own test, most of the time was spent in "extract image features", so switching to a different backbone may help. Note that in TF 1.x graph mode these calls only construct the graph, so the timers below measure graph-building time rather than the time spent inside `sess.run`.

``` python
import time

import glog


def inference(self, input_tensor, name):
    """
    :param input_tensor:
    :param name:
    :return:
    """
    print("***************,input_tensor shape:", input_tensor.shape)
    with tf.variable_scope(name_or_scope=name, reuse=self._reuse):
        t_start = time.time()
        # first extract image features
        extract_feats_result = self._frontend.build_model(
            input_tensor=input_tensor,
            name='{:s}_frontend'.format(self._net_flag),
            reuse=self._reuse
        )
        t_cost = time.time() - t_start
        glog.info('extract image features cost time: {:.5f}s'.format(t_cost))

        # second apply backend process
        t_start = time.time()
        binary_seg_prediction, instance_seg_prediction = self._backend.inference(
            binary_seg_logits=extract_feats_result['binary_segment_logits']['data'],
            instance_seg_logits=extract_feats_result['instance_segment_logits']['data'],
            name='{:s}_backend'.format(self._net_flag),
            reuse=self._reuse
        )
        t_cost = time.time() - t_start
        glog.info('backend process cost time: {:.5f}s'.format(t_cost))

        if not self._reuse:
            self._reuse = True

    return binary_seg_prediction, instance_seg_prediction
```

References: https://www.cnblogs.com/xuanyuyt/p/11523192.html, https://zhuanlan.zhihu.com/p/
