
Dex-Net 2.0 Paper Translation

1. Introduction

1) Main contributions of this paper
1. The Dex-Net 2.0 dataset, containing 6.7 million point clouds and grasps generated from 1,500 3D object models, with each grasp labeled via grasp wrench space (GWS) analysis.
2. The Grasp Quality Convolutional Neural Network (GQ-CNN), which predicts a set of robust grasp plans.
3. A grasp planning policy that ranks the robust grasp candidates and selects the most robust (optimal) grasp plan.
2) Related work
To grasp an object with a robotic gripper, the conventional approach is to precompute grasps for objects whose state (shape, size, object pose, camera pose, friction coefficient, etc.) is known in advance, and to index these precomputed grasps via point-cloud registration: the observed point cloud is matched against known 3D object models in a database using visual and geometric similarity, and the best matching precomputed grasp plan is executed.
A robust grasp plan (RGP) maximizes the robustness of the grasp, i.e., it maximizes the probability of success even under relatively large sensing and control errors.
The Dex-Net 2.0 dataset therefore extends Dex-Net 1.0, greatly enlarging the sampling of RGPs (pairing point clouds with robust grasp candidates); a convolutional neural network is then trained on this data to predict the most robust grasp.
3) System pipeline
[Figure: Dex-Net 2.0 grasp-planning pipeline]

2. Problem Statement

1) Assumptions
A parallel-jaw gripper of known geometry
Rigid objects singulated on a planar work surface
A single-view (2.5D) point cloud taken with a depth camera
A single depth camera with known parameters (pose and intrinsics)
2) Notation
State $x = (\mathcal{O}, T_o, T_c, \gamma)$: $\mathcal{O}$ is the geometry of the object to be grasped, $T_o$ the object pose, $T_c$ the camera pose, and $\gamma$ the friction coefficient.
Image $y$: a depth image, i.e., a 2.5D point cloud.
Grasp $g = (p, \psi)$: a candidate grasp plan, where $p \in \mathbb{R}^3$ is the grasp center (in the object frame) and $\psi$ is the orientation of the gripper relative to the antipodal contact pair.
Success metric $S$: a binary measure of grasp success, defined as
$$S(g, x) = \begin{cases} 1, & E_Q(g, x) > \delta \ \text{and} \ \mathrm{collfree}(g, x) \\ 0, & \text{otherwise}, \end{cases}$$
where $E_Q$ is the robust epsilon quality, a robustness measure that accounts for pose errors caused by uncertainty in the friction coefficient and the gripper pose, and $\mathrm{collfree}(g, x)$ indicates that executing grasp $g$ in state $x$ is collision-free; robustness is analyzed on this basis.
Here $y$ is the observed value and $x$ is the true underlying state.
Definitions:
Robustness function:
$$Q(g, y) = \mathbb{E}[S \mid g, y]$$
is the expected value of the success metric $S$ under the joint distribution over grasp plans and observations (depth image / point cloud).
Our ultimate goal is to learn a robustness function $Q_\theta(g, y)$ such that the grasp policy satisfies
$$\pi_\theta(y) = \operatorname*{argmax}_{g \in \mathcal{C}} Q_\theta(g, y),$$
where $\mathcal{C}$ is the set of candidate antipodal grasps.
That is, we train a network whose parameters $\theta$ satisfy
$$\theta = \operatorname*{argmin}_{\theta \in \Theta} \sum_i \mathcal{L}\big(S_i, Q_\theta(g_i, y_i)\big),$$
where $\Theta$ is the set of network parameters and $\mathcal{L}$ is the cross-entropy loss.
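As a concrete illustration of the policy and training objective above (a minimal sketch, not the authors' code), the following Python snippet assumes a learned scoring function `q_theta(grasp, depth_image)` and a list of antipodal candidates are given:

```python
import numpy as np

def grasp_policy(q_theta, candidates, depth_image):
    """pi_theta(y): pick the antipodal candidate with the highest predicted robustness."""
    scores = [q_theta(g, depth_image) for g in candidates]  # Q_theta(g, y) in [0, 1]
    return candidates[int(np.argmax(scores))]

def cross_entropy_loss(s, q):
    """L(S, Q_theta): binary cross-entropy between the success label S and prediction Q."""
    q = np.clip(q, 1e-7, 1 - 1e-7)
    return -(s * np.log(q) + (1 - s) * np.log(1 - q))
```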

3. Dataset Generation

[Figure: Dex-Net 2.0 dataset generation pipeline]
1) Starting from 1,500 raw 3D mesh models, several hundred candidate grasp points perpendicular to the surface are generated on each model in the manner of Dex-Net 1.0, and matching contact pairs are found by antipodal sampling.
2) Depth images of the object models are produced by rendering. To remove the need for the network to learn rotational invariance, each image is rotated so that the line connecting the antipodal contact pair is aligned with the horizontal axis of the reference frame, scaled so that the distance between the contacts is a fixed size, and cropped to a 32x32 region (see the alignment sketch after this list).
3) In this way 6.7 million grasp image datapoints are obtained.
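A minimal sketch of the rotate-and-crop alignment described in step 2, assuming OpenCV and NumPy; the grasp center pixel and grasp-axis angle are taken as given, and this is an illustrative reimplementation rather than the authors' pipeline:

```python
import cv2
import numpy as np

def aligned_crop(depth_image, center_px, grasp_angle_rad, crop=32):
    """Rotate the depth image about the grasp center so that the grasp axis lies
    along the image x-axis, then cut out a crop x crop window around the center.
    Rescaling to a canonical contact distance could be folded into `scale`."""
    h, w = depth_image.shape
    rot = cv2.getRotationMatrix2D(center=(float(center_px[0]), float(center_px[1])),
                                  angle=np.degrees(grasp_angle_rad), scale=1.0)
    rotated = cv2.warpAffine(depth_image.astype(np.float32), rot, (w, h))
    cx, cy, half = int(center_px[0]), int(center_px[1]), crop // 2
    return rotated[cy - half:cy + half, cx - half:cx + half]
```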
Because of external noise, measurement error, camera-parameter error, and similar factors, the state $x$ is treated as a sample from a joint distribution $p(x)$ over object shapes, object and camera poses, and friction coefficients.
Under this joint distribution, the robust epsilon quality $E_Q$ (a robustness measure that accounts for pose errors due to uncertainty in the friction coefficient and the gripper pose) is thresholded to split the 6.7 million datapoints into positive and negative samples.
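An illustrative sketch of this labeling step; `sample_state` and `epsilon_quality` are hypothetical helpers standing in for sampling from $p(x)$ and for the wrench-space quality computation, and the threshold value is an assumption:

```python
import numpy as np

DELTA = 0.002  # assumed robustness threshold, not necessarily the paper's value

def label_grasp(grasp, nominal_state, sample_state, epsilon_quality, n_samples=100):
    """Positive label if the epsilon quality, averaged over states drawn from the
    state distribution p(x) (perturbed poses, friction, ...), exceeds the threshold."""
    qualities = [epsilon_quality(grasp, sample_state(nominal_state))
                 for _ in range(n_samples)]
    return int(np.mean(qualities) > DELTA)
```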

4. GQ-CNN

[Figure: GQ-CNN architecture]
1) Input and output
Input: 1. The aligned image: for each grasp candidate, the grasp center pixel (i, j) is recorded, the image is rotated so that the line connecting the antipodal contact pair is parallel to the horizontal axis of the image frame (recording the rotation angle θ), scaled to a fixed size, and cropped to a 32x32 patch.
2. The depth z of the grasp point.
Output: the robustness estimate $Q_\theta(g, y)$; ranking these scores yields a robustness ordering $\pi_\theta$ of the candidates, from which the initial best grasp plan is obtained. An architecture sketch follows.
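A minimal two-stream network in the spirit of GQ-CNN (a sketch in PyTorch; the layer sizes are illustrative and not the exact architecture from the paper):

```python
import torch
import torch.nn as nn

class GQCNNSketch(nn.Module):
    """Conv stream for the 32x32 aligned depth crop plus a small fully connected
    stream for the grasp depth z, merged to predict grasp robustness Q in [0, 1]."""
    def __init__(self):
        super().__init__()
        self.image_stream = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 16x16 -> 8x8
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 1024), nn.ReLU(),
        )
        self.depth_stream = nn.Sequential(nn.Linear(1, 16), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(1024 + 16, 1024), nn.ReLU(),
            nn.Linear(1024, 1),                   # logit of the robustness score
        )

    def forward(self, image, z):
        # image: (B, 1, 32, 32) aligned depth crop; z: (B, 1) grasp depth
        feat = torch.cat([self.image_stream(image), self.depth_stream(z)], dim=1)
        return torch.sigmoid(self.head(feat))
```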
2) The special role of the first convolutional layer
[Figure: learned first-layer filters]
The filters of the first convolutional layer capture image gradient information. From these gradients the collision state between the gripper and the object can be inferred, and combining collfree with the robustness estimate then determines the optimal grasp plan.
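To make the gradient intuition concrete, here is an illustrative (hand-crafted, not learned) horizontal-gradient filter of the kind the first layer can learn; strong depth gradients mark object boundaries, i.e. regions where the gripper jaws may collide with the object:

```python
import numpy as np
from scipy.ndimage import convolve

# Sobel-style horizontal-gradient kernel, illustrative only.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

def depth_gradient_map(depth_crop):
    """Response map highlighting depth discontinuities in the aligned crop."""
    return convolve(depth_crop.astype(float), SOBEL_X, mode="nearest")
```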
3) Preprocessing and data augmentation
1. Preprocessing: the images are normalized.
2. To enlarge the dataset, the Dex-Net 2.0 images are augmented by horizontal flipping.
3. Sensor noise is approximated by adding zero-mean Gaussian noise (a short sketch of these steps follows this list).
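An illustrative preprocessing/augmentation routine under the assumptions above; the noise standard deviation is an assumed value, not taken from the paper:

```python
import numpy as np

def preprocess(depth_crop, rng, noise_std=0.005):
    """Normalize one 32x32 aligned depth crop, randomly flip it horizontally,
    and add zero-mean Gaussian noise as a stand-in for sensor noise."""
    img = (depth_crop - depth_crop.mean()) / (depth_crop.std() + 1e-8)
    if rng.random() < 0.5:
        img = img[:, ::-1]  # horizontal flip; the aligned grasp axis stays horizontal
    return img + rng.normal(0.0, noise_std, img.shape)
```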
4) Requirements on the objects to be grasped
1. Adversarial geometric properties (objects chosen to be geometrically challenging).
2. Volume smaller than the workspace.
3. Weight under 0.25 kg, with grasp points at least 1 cm above the table (constraints imposed by the YuMi gripper).

5. GQ-CNN Accuracy Comparison

For the four datasets Adv-Synth (189K datapoints), Adv-Phys (400K datapoints), Dex-Net-Small (670K datapoints), and Dex-Net-Large (6.7M datapoints), 80% of the data is used for training and 20% for validation. The accuracy comparison is shown below:
[Figure: classification accuracy on the four datasets]
Here the true positive rate (TPR), computed as TPR = TP / (TP + FN), is the fraction of all positive instances that the classifier correctly identifies as positive; the false positive rate (FPR), computed as FPR = FP / (FP + TN), is the fraction of all negative instances that the classifier mistakenly labels as positive.
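A direct encoding of these two formulas (a small sketch for binary labels and predictions):

```python
def tpr_fpr(y_true, y_pred):
    """True positive rate and false positive rate for binary labels/predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)
```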

6. Performance Comparison of GQ-CNN Against Baselines

Success Rate: the fraction of trials in which the gripper, after moving and rotating to the planned grasp, successfully lifts an object that was placed arbitrarily by hand.
Precision: the success rate on grasps classified as robust, using a predicted robustness of 50% as the threshold.
Robust Grasp Rate: the fraction of executed grasp plans whose predicted robustness exceeds the 50% threshold.
Planning Time: the time from receiving the depth image to beginning execution of the grasp.

[Table: performance comparison of the grasp-planning methods]

7. Two Causes of Grasp Failure and Analysis

The main causes of failure:
1. Small or thin object parts were not detected in the RGB-D image.
2. Collision-free regions were not correctly identified.

▲: A more precise depth sensor and a better collision-region analysis method are needed.
