A Compilation of Papers from Top Conferences in 2018

阿新 • Published: 2018-12-27

CVPR 2018

Dates: June 18-22

Location: Salt Lake City, Utah

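The Conference on Computer Vision and Pattern Recognition (CVPR) is an annual IEEE academic conference focused on computer vision and pattern recognition. CVPR is one of the world's top computer vision conferences; in recent years it has drawn about 1,000 attendees annually and typically accepts around 300 papers. Each year the conference has a fixed set of workshop topics, and companies that sponsor the conference are given the opportunity to exhibit at the venue.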

Best Paper

《Taskonomy: Disentangling Task Transfer Learning》

Amir Zamir, Alexander Sax, William Shen, Leonidas Guibas, Jitendra Malik, Silvio Savarese

【Abstract】Do visual tasks have a relationship, or are they unrelated? For instance, could having surface normals simplify estimating the depth of an image? Intuition answers these questions positively, implying existence of a structure among visual tasks. Knowing this structure has notable values; it is the concept underlying transfer learning and provides a principled way for identifying redundancies across tasks, in order to, for instance, seamlessly reuse supervision among related tasks or solve many tasks in one system without piling up the complexity.

We propose a fully computational approach for modeling the structure of the space of visual tasks. This is done via finding (first and higher-order) transfer learning dependencies across a dictionary of twenty-six 2D, 2.5D, 3D, and semantic tasks in a latent space. The product is a computational taxonomic map for task transfer learning. We study the consequences of this structure, e.g. emerged relationships, and exploit them to reduce the demand for labeled data. For example, we show that the total number of labeled data points needed for solving a set of 10 tasks can be reduced by roughly 2/3 (compared to training independently) while keeping the performance nearly the same. We provide a set of tools for computing and probing this taxonomical structure including a solver that users can employ to devise efficient supervision policies for their use cases.

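【Summary】Are visual tasks related to one another, or not? For example, can surface normals simplify estimating the depth of an image? Intuition says yes, which implies that a structure exists among visual tasks. Knowing this structure is valuable: it is the concept underlying transfer learning, and it gives a principled way to identify redundancies across tasks, for instance to reuse supervision among related tasks or to solve many tasks in one system without piling up complexity. The authors propose a fully computational approach to modeling the structure of the space of visual tasks, by finding first- and higher-order transfer learning dependencies in a latent space across a dictionary of 26 2D, 2.5D, 3D, and semantic tasks. The result is a computational taxonomic map for task transfer learning. They study the consequences of this structure, such as the nontrivial relationships that emerge, and exploit them to reduce the demand for labeled data. For example, the total number of labeled data points needed to solve a set of 10 tasks can be reduced by roughly 2/3 compared with training independently, while performance stays nearly the same.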

Best Paper Honorable Mentions

《Deep Learning of Graph Matching》

Andrei Zanfir, Cristian Sminchisescu

【Abstract】The problem of graph matching under node and pairwise constraints is fundamental in areas as diverse as combinatorial optimization, machine learning or computer vision, where representing both the relations between nodes and their neighborhood structure is essential. We present an end-to-end model that makes it possible to learn all parameters of the graph matching process, including the unary and pairwise node neighborhoods, represented as deep feature extraction hierarchies. The challenge is in the formulation of the different matrix computation layers of the model in a way that enables the consistent, efficient propagation of gradients in the complete pipeline from the loss function, through the combinatorial optimization layer solving the matching problem, and the feature extraction hierarchy. Our computer vision experiments and ablation studies on challenging datasets like PASCAL VOC keypoints, Sintel and CUB show that matching models refined end-to-end are superior to counterparts based on feature hierarchies trained for other problems.

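【Summary】Graph matching under node and pairwise constraints is a fundamental problem in many fields, including combinatorial optimization, machine learning, and computer vision, where representing both the relations between nodes and their neighborhood structure is essential. The paper presents an end-to-end model that can learn all parameters of the graph matching process, including the unary and pairwise node neighborhoods, which are represented as deep feature extraction hierarchies. The challenge lies in formulating the model's matrix computation layers so that gradients propagate consistently and efficiently through the complete pipeline: from the loss function, through the combinatorial optimization layer that solves the matching problem, to the feature extraction hierarchy. The authors' computer vision experiments and ablation studies on challenging datasets such as PASCAL VOC keypoints, Sintel, and CUB show that matching models refined end-to-end outperform counterparts built on feature hierarchies trained for other problems.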

《SPLATNet: Sparse Lattice Networks for Point Cloud Processing》

Hang Su, Varun Jampani, Deqing Sun, Subhransu Maji, Evangelos Kalogerakis, Ming-Hsuan Yang, Jan Kautz

【Abstract】We present a network architecture for processing point clouds that directly operates on a collection of points represented as a sparse set of samples in a high-dimensional lattice. Naïvely applying convolutions on this lattice scales poorly, both in terms of memory and computational cost, as the size of the lattice increases. Instead, our network uses sparse bilateral convolutional layers as building blocks. These layers maintain efficiency by using indexing structures to apply convolutions only on occupied parts of the lattice, and allow flexible specifications of the lattice structure enabling hierarchical and spatially-aware feature learning, as well as joint 2D-3D reasoning. Both point-based and image-based representations can be easily incorporated in a network with such layers and the resulting model can be trained in an end-to-end manner. We present results on 3D segmentation tasks where our approach outperforms existing state-of-the-art techniques.

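【Summary】The paper presents a network architecture for processing point clouds that operates directly on a collection of points represented as a sparse set of samples in a high-dimensional lattice. Applying convolutions naively on this lattice scales poorly in both memory and computational cost as the lattice grows. Instead, the network uses sparse bilateral convolutional layers as building blocks. These layers stay efficient by using indexing structures to apply convolutions only to occupied parts of the lattice, and they allow the lattice structure to be specified flexibly, which enables hierarchical and spatially aware feature learning as well as joint 2D-3D reasoning. Both point-based and image-based representations can easily be incorporated into a network with such layers, and the resulting model can be trained end-to-end. On 3D segmentation tasks, the approach outperforms existing state-of-the-art techniques.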

《CodeSLAM - Learning a Compact, Optimisable Representation for Dense Visual SLAM》

Michael Bloesch, Jan Czarnowski, Ronald Clark, Stefan Leutenegger, Andrew J. Davison

【Abstract】The representation of geometry in real-time 3D perception systems continues to be a critical research issue. Dense maps capture complete surface shape and can be augmented with semantic labels, but their high dimensionality makes them computationally costly to store and process, and unsuitable for rigorous probabilistic inference. Sparse feature-based representations avoid these problems, but capture only partial scene information and are mainly useful for localisation only.

We present a new compact but dense representation of scene geometry which is conditioned on the intensity data from a single image and generated from a code consisting of a small number of parameters. We are inspired by work both on learned depth from images, and auto-encoders. Our approach is suitable for use in a keyframe-based monocular dense SLAM system: While each keyframe with a code can produce a depth map, the code can be optimised efficiently jointly with pose variables and together with the codes of overlapping keyframes to attain global consistency. Conditioning the depth map on the image allows the code to only represent aspects of the local geometry which cannot directly be predicted from the image. We explain how to learn our code representation, and demonstrate its advantageous properties in monocular SLAM.

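【Summary】The representation of geometry in real-time 3D perception systems remains a critical research issue. Dense maps capture complete surface shape and can be augmented with semantic labels, but their high dimensionality makes them computationally costly to store and process, and unsuitable for rigorous probabilistic inference. Sparse feature-based representations avoid these problems, but capture only partial scene information and are mainly useful for localisation. The paper presents a new compact but dense representation of scene geometry that is conditioned on the intensity data from a single image and generated from a code consisting of a small number of parameters. The approach is inspired by work on learning depth from images and on auto-encoders. It suits a keyframe-based monocular dense SLAM system: each keyframe with a code can produce a depth map, and the code can be optimised efficiently together with pose variables and the codes of overlapping keyframes to attain global consistency. Conditioning the depth map on the image lets the code represent only those aspects of the local geometry that cannot be predicted directly from the image. The authors explain how to learn the code representation and demonstrate its advantages in monocular SLAM.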

《Efficient Optimization for Rank-based Loss Functions》

Pritish Mohapatra, Michal Rolínek, C.V. Jawahar, Vladimir Kolmogorov, M. Pawan Kumar

【Abstract】The accuracy of information retrieval systems is often measured using complex loss functions such as the average precision (AP) or the normalized discounted cumulative gain (NDCG). Given a set of positive and negative samples, the parameters of a retrieval system can be estimated by minimizing these loss functions. However, the non-differentiability and non-decomposability of these loss functions does not allow for simple gradient based optimization algorithms. This issue is generally circumvented by either optimizing a structured hinge-loss upper bound to the loss function or by using asymptotic methods like the direct-loss minimization framework. Yet, the high computational complexity of loss-augmented inference, which is necessary for both the frameworks, prohibits its use in large training data sets. To alleviate this deficiency, we present a novel quicksort flavored algorithm for a large class of non-decomposable loss functions. We provide a complete characterization of the loss functions that are amenable to our algorithm, and show that it includes both AP and NDCG based loss functions. Furthermore, we prove that no comparison based algorithm can improve upon the computational complexity of our approach asymptotically. We demonstrate the effectiveness of our approach in the context of optimizing the structured hinge loss upper bound of AP and NDCG loss for learning models for a variety of vision tasks. We show that our approach provides significantly better results than simpler decomposable loss functions, while requiring a comparable training time.

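【Summary】The accuracy of information retrieval systems is often measured with complex loss functions such as average precision (AP) or normalized discounted cumulative gain (NDCG). Given a set of positive and negative samples, the parameters of a retrieval system can be estimated by minimizing these losses. However, because the losses are non-differentiable and non-decomposable, simple gradient-based optimization cannot be used. The issue is usually circumvented either by optimizing a structured hinge-loss upper bound on the loss or by using asymptotic methods such as the direct-loss minimization framework. Both frameworks require loss-augmented inference, whose high computational complexity prevents their use on large training sets. To address this, the authors present a quicksort-flavored algorithm for a large class of non-decomposable loss functions. They give a complete characterization of the loss functions the algorithm can handle, which includes both AP-based and NDCG-based losses, and prove that no comparison-based algorithm can asymptotically improve on its computational complexity. They demonstrate the approach by optimizing the structured hinge-loss upper bound of the AP and NDCG losses when learning models for a variety of vision tasks. The approach gives significantly better results than simpler decomposable loss functions while requiring comparable training time.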
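To make the two ranking metrics named above concrete, here is a minimal Python sketch of AP and NDCG for a single ranked list. It is not the paper's quicksort-flavored algorithm; the function names and the log2 discount are assumptions chosen for illustration (log2 is a common convention for NDCG).

```python
import math

def average_precision(ranked_labels):
    """AP of a ranked list of binary labels (1 = relevant), best-ranked first."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / hits if hits else 0.0

def ndcg(ranked_gains):
    """NDCG with a log2 position discount (assumed convention)."""
    def dcg(gains):
        return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))
    ideal = dcg(sorted(ranked_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0

# Swapping the first two items changes the contribution of every relevant
# item ranked below them, which is why these scores do not decompose into
# independent per-sample terms.
ap_original = average_precision([1, 0, 1, 0])
ap_swapped = average_precision([0, 1, 1, 0])
```

Both metrics depend on the whole ordering rather than on each sample in isolation, which is the non-decomposability the abstract refers to.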

ECCV 2018

Dates: September 8-14

Location: Munich, Germany

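The European Conference on Computer Vision (ECCV) is held every two years and is one of the three major computer vision conferences, alongside ICCV and CVPR. Each edition accepts around 300 papers worldwide, mostly from leading laboratories and research institutes in the United States and Europe; papers from mainland China typically number between 10 and 20. The acceptance rate at ECCV 2010 was 27%.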

This year's conference received 2,439 submissions and accepted 776 (31.8%): 59 oral papers and 717 poster papers. ECCV 2018 also hosted 43 workshops and 11 tutorials.

Best Paper Award (one paper)

《Implicit 3D Orientation Learning for 6D Object Detection from RGB Images》

Martin Sundermeyer, Zoltan-Csaba Marton, Maximilian Durner, Manuel Brucker, Rudolph Triebel

【Abstract】We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization.

This so-called Augmented Autoencoder has several advantages over existing methods: It does not require real, pose-annotated training data, generalizes to various test sensors and inherently handles object and view symmetries. Instead of learning an explicit mapping from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space. Experiments on the T-LESS and LineMOD datasets show that our method outperforms similar model-based approaches and competes with state-of-the-art approaches that require real pose-annotated images.

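【Summary】The paper proposes a real-time RGB-based pipeline for object detection and 6D pose estimation. Its new 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization. This Augmented Autoencoder (AAE) has several advantages over existing methods: it does not require real, pose-annotated training data, it generalizes to various test sensors, and it inherently handles object and view symmetries. Rather than learning an explicit mapping from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space. Experiments on the T-LESS and LineMOD datasets show that the method outperforms similar model-based approaches and is competitive with state-of-the-art approaches that require real pose-annotated images.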

Best Paper Award, Honorable Mention (two papers)

《Group Normalization》

【Abstract】Batch Normalization (BN) is a milestone technique in the development of deep learning, enabling various networks to train. However, normalizing along the batch dimension introduces problems: BN’s error increases rapidly when the batch size becomes smaller, caused by inaccurate batch statistics estimation. This limits BN’s usage for training larger models and transferring features to computer vision tasks including detection, segmentation, and video, which require small batches constrained by memory consumption. In this paper, we present Group Normalization (GN) as a simple alternative to BN. GN divides the channels into groups and computes within each group the mean and variance for normalization. GN’s computation is independent of batch sizes, and its accuracy is stable in a wide range of batch sizes. On ResNet-50 trained in ImageNet, GN has 10.6% lower error than its BN counterpart when using a batch size of 2; when using typical batch sizes, GN is comparably good with BN and outperforms other normalization variants. Moreover, GN can be naturally transferred from pre-training to fine-tuning. GN can outperform its BN-based counterparts for object detection and segmentation in COCO, and for video classification in Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks. GN can be easily implemented by a few lines of code in modern libraries.

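【Summary】Batch Normalization (BN) is a milestone technique in the development of deep learning that enables a wide range of networks to train. However, normalizing along the batch dimension causes problems: when the batch size becomes small, batch statistics are estimated inaccurately and BN's error rises rapidly. This limits the use of BN for training larger models and for transferring features to computer vision tasks such as detection, segmentation, and video, where memory consumption forces small batches. The paper presents Group Normalization (GN) as a simple alternative to BN. GN divides the channels into groups and computes the mean and variance within each group for normalization. Its computation is independent of the batch size, and its accuracy is stable across a wide range of batch sizes. For ResNet-50 trained on ImageNet with a batch size of 2, GN has 10.6% lower error than its BN counterpart; with typical batch sizes, GN is comparable to BN and outperforms other normalization variants. GN also transfers naturally from pre-training to fine-tuning. It can outperform its BN-based counterparts for object detection and segmentation on COCO and for video classification on Kinetics, which shows that GN can effectively replace BN in a variety of tasks. In modern deep learning libraries, GN can be implemented in a few lines of code.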
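The grouping step described above can be sketched in plain Python for a single sample. This is an illustrative sketch, not the authors' code: the function name `group_norm`, the nested-list data layout, and the `eps` value are assumptions, and the learnable per-channel scale and shift used in practice are omitted.

```python
import math

def group_norm(x, num_groups, eps=1e-5):
    """Normalize one sample; x is a list of C channels, each a flat list of activations."""
    channels = len(x)
    if channels % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    per_group = channels // num_groups
    out = []
    for g in range(num_groups):
        group = x[g * per_group:(g + 1) * per_group]
        # Statistics are pooled over every activation in the group's channels.
        values = [v for channel in group for v in channel]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * scale for v in channel] for channel in group])
    return out

# Four channels split into two groups; no batch dimension is involved,
# so the result does not depend on how many samples are processed together.
sample = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
normalized = group_norm(sample, num_groups=2)
```

Because the mean and variance are computed per sample, the same code gives the same output at batch size 2 or 64, which is the property the abstract highlights.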

《GANimation: Anatomically-aware Facial Animation from a Single Image》

【Abstract】Recent advances in Generative Adversarial Networks (GANs) have shown impressive results for the task of facial expression synthesis. The most successful architecture is StarGAN [4], that conditions GANs’ generation process with images of a specific domain, namely a set of images of persons sharing the same expression. While effective, this approach can only generate a discrete number of expressions, determined by the content of the dataset. To address this limitation, in this paper, we introduce a novel GAN conditioning scheme based on Action Units (AU) annotations, which describes in a continuous manifold the anatomical facial movements defining a human expression. Our approach allows controlling the magnitude of activation of each AU and combine several of them. Additionally, we propose a fully unsupervised strategy to train the model, that only requires images annotated with their activated AUs, and exploit attention mechanisms that make our network robust to changing backgrounds and lighting conditions. Extensive evaluation shows that our approach goes beyond competing conditional generators both in the capability to synthesize a much wider range of expressions ruled by anatomically feasible muscle movements, as in the capacity of dealing with images in the wild.

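【Summary】Generative Adversarial Networks (GANs) have recently shown impressive results in facial expression synthesis. The most successful architecture is StarGAN, which conditions the GAN generation process on images of a specific domain, namely a set of images of different people sharing the same expression. Although effective, this approach can only generate a discrete number of expressions, determined by the content of the dataset. To address this limitation, the paper introduces a new GAN conditioning scheme based on Action Unit (AU) annotations, which describe, in a continuous manifold, the anatomical facial movements that define a human expression. The approach makes it possible to control the activation magnitude of each AU and to combine several of them. The paper also proposes a fully unsupervised training strategy that needs only images annotated with their activated AUs, and uses attention mechanisms to make the network robust to changing backgrounds and lighting conditions. Extensive evaluation shows that the approach goes beyond competing conditional generators, both in synthesizing a much wider range of expressions governed by anatomically feasible muscle movements and in handling images in the wild.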

IJCAI-ECAI-2018

Dates: July 13-19

Location: Stockholm, Sweden

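The International Joint Conference on Artificial Intelligence (IJCAI) is one of the leading academic conferences in artificial intelligence. It was originally held in odd-numbered years and has been held annually since 2015. In recent years Chinese researchers have become increasingly involved in IJCAI; in particular, Professor Zhi-Hua Zhou (周志華) of Nanjing University will serve as program chair of IJCAI-21, the first Chinese scholar to hold that role in IJCAI's history.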

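The European Conference on Artificial Intelligence (ECAI) is the main artificial intelligence and machine learning conference held in Europe. It dates back to 1974 and is organized by the European Coordinating Committee for Artificial Intelligence. ECAI is commonly grouped with IJCAI and AAAI as the three top conferences in AI.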

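This year IJCAI and ECAI were held jointly in Stockholm, the capital of Sweden, on July 13-19. IJCAI did not present Best Paper or Best Student Paper awards this year; instead it named seven Distinguished Papers. Research from Peking University, Wuhan University, Tsinghua University, and Beijing Institute of Technology is on the list.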

Distinguished Papers:

《SentiGAN: Generating Sentimental Texts via Mixture Adversarial Networks》

Ke Wang, Xiaojun Wan

【Abstract】Generating texts of different sentiment labels is getting more and more attention in the area of natural language generation. Recently, Generative Adversarial Net (GAN) has shown promising results in text generation. However, the texts generated by GAN usually suffer from the problems of poor quality, lack of diversity and mode collapse. In this paper, we propose a novel framework SentiGAN, which has multiple generators and one multi-class discriminator, to address the above problems. In our framework, multiple generators are trained simultaneously, aiming at generating texts of different sentiment labels without supervision. We propose a penalty based objective in the generators to force each of them to generate diversified examples of a specific sentiment label. Moreover, the use of multiple generators and one multi-class discriminator can make each generator focus on generating its own examples of a specific sentiment label accurately. Experimental results on four datasets demonstrate that our model consistently outperforms several state-of-the-art text generation methods in the sentiment accuracy and quality of generated texts.

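【Summary】Generating text with different sentiment labels is receiving growing attention in natural language generation. Generative adversarial networks (GANs) have recently shown promising results in text generation, but the text they produce usually suffers from poor quality, lack of diversity, and mode collapse. The paper proposes a new framework, SentiGAN, with multiple generators and one multi-class discriminator to address these problems. The generators are trained simultaneously, with the aim of generating texts of different sentiment labels without supervision. A penalty-based objective in the generators forces each of them to generate diversified examples of a specific sentiment label. In addition, using multiple generators with one multi-class discriminator lets each generator focus on accurately producing examples of its own sentiment label. Experiments on four datasets show that the model consistently outperforms several state-of-the-art text generation methods in sentiment accuracy and in the quality of generated text.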

《Reasoning about Consensus when Opinions Diffuse through Majority Dynamics》

Vincenzo Auletta, Diodato Ferraioli, Gianluigi Greco

【Abstract】Opinion diffusion is studied on social graphs where agents hold binary opinions and where social pressure leads them to conform to the opinion manifested by the majority of their neighbors. Within this setting, questions related to whether a minority/majority can spread the opinion it supports to all the other agents are considered. It is shown that, no matter of the underlying graph, there is always a group formed by a half of the agents that can annihilate the opposite opinion. Instead, the influence power of minorities depends on certain features of the given graph, which are NP-hard to be identified. Deciding whether the two opinions can coexist in some stable configuration is NP-hard, too.

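【Summary】The paper studies opinion diffusion on social graphs in which agents hold binary opinions and social pressure leads them to conform to the opinion expressed by the majority of their neighbors. In this setting it considers whether a minority or a majority can spread the opinion it supports to all other agents. The results show that, whatever the underlying graph, there is always a group made up of half of the agents that can eliminate the opposite opinion. By contrast, the influence of minorities depends on certain features of the given graph, and identifying those features is NP-hard. Deciding whether the two opinions can coexist in some stable configuration is NP-hard as well.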
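The diffusion process described above can be simulated in a few lines. The sketch below is a simplified illustration rather than the paper's exact model: it assumes synchronous updates, assumes ties leave an agent's opinion unchanged, and caps the number of rounds because synchronous updates are not guaranteed to settle. All names are hypothetical.

```python
def majority_step(neighbors, opinions):
    """One synchronous round: each agent adopts the strict majority opinion of its neighbors."""
    updated = {}
    for agent, current in opinions.items():
        ones = sum(opinions[n] for n in neighbors[agent])
        zeros = len(neighbors[agent]) - ones
        if ones > zeros:
            updated[agent] = 1
        elif zeros > ones:
            updated[agent] = 0
        else:
            updated[agent] = current  # tie: keep the current opinion (assumption)
    return updated

def run_until_stable(neighbors, opinions, max_rounds=100):
    """Iterate until no agent changes, or until max_rounds is reached."""
    for _ in range(max_rounds):
        following = majority_step(neighbors, opinions)
        if following == opinions:
            break
        opinions = following
    return opinions

# A triangle in which two of three agents start with opinion 1.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
final = run_until_stable(triangle, {0: 1, 1: 1, 2: 0})
```

In this small example the lone dissenting agent sees two neighbors with opinion 1 and switches, after which the configuration is stable.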

《R-SVM+: Robust Learning with Privileged Information》

Xue Li, Bo Du, Chang Xu, Yipeng Zhang, Lefei Zhang, Dacheng Tao

【Abstract】In practice, the circumstance that training and test data are clean is not always satisfied. The performance of existing methods in the learning using privileged information (LUPI) paradigm may be seriously challenged, due to the lack of clear strategies to address potential noises in the data. This paper proposes a novel Robust SVM+ (R-SVM+) algorithm based on a rigorous theoretical analysis. Under the SVM+ framework in the LUPI paradigm, we study the lower bound of perturbations of both example feature data and privileged feature data, which will mislead the model to make wrong decisions. By maximizing the lower bound, tolerance of the learned model over perturbations will be increased. Accordingly, a novel regularization function is introduced to upgrade a variant form of SVM+. The objective function of R-SVM+ is transformed into a quadratic programming problem, which can be efficiently optimized using off-the-shelf solvers. Experiments on real-world datasets demonstrate the necessity of studying robust SVM+ and the effectiveness of the proposed algorithm.

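【Summary】In practice, training and test data are not always clean. Existing methods in the learning using privileged information (LUPI) paradigm lack clear strategies for handling potential noise in the data, so their performance may be seriously challenged. Based on a rigorous theoretical analysis, the paper proposes a new Robust SVM+ (R-SVM+) algorithm. Under the SVM+ framework in the LUPI paradigm, the authors study the lower bound of the perturbations of both example feature data and privileged feature data that would mislead the model into wrong decisions. Maximizing this lower bound increases the learned model's tolerance to perturbations. Accordingly, a new regularization function is introduced to upgrade a variant form of SVM+. The R-SVM+ objective is transformed into a quadratic programming problem that can be optimized efficiently with off-the-shelf solvers. Experiments on real-world datasets demonstrate the need to study robust SVM+ and the effectiveness of the proposed algorithm.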

《From Conjunctive Queries to Instance Queries in Ontology-Mediated Querying》

Cristina Feier, Carsten Lutz, Frank Wolter

【Abstract】We consider ontology-mediated queries (OMQs) based on expressive description logics of the ALC family and (unions) of conjunctive queries, studying the rewritability into OMQs based on instance queries (IQs). Our results include exact characterizations of when such a rewriting is possible and tight complexity bounds for deciding rewritability. We also give a tight complexity bound for the related problem of deciding whether a given MMSNP sentence is equivalent to a CSP.

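【Summary】The paper considers ontology-mediated queries (OMQs) based on expressive description logics of the ALC family and on (unions of) conjunctive queries, and studies their rewritability into OMQs based on instance queries (IQs). The results include exact characterizations of when such a rewriting is possible and tight complexity bounds for deciding rewritability. The authors also give a tight complexity bound for the related problem of deciding whether a given MMSNP sentence is equivalent to a CSP.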

《What Game are We Playing? End-to-end Learning in Normal and Extensive Form Games》

Chun Kai Ling, Fei Fang, J. Zico Kolter

【Abstract】Although recent work in AI has made great progress in solving large, zero-sum, extensive-form games, the underlying assumption in most past work is that the parameters of the game itself are known to the agents. This paper deals with the relatively under-explored but equally important “inverse” setting, where the parameters of the underlying game are not known to all agents, but must be learned through observations. We propose a differentiable, end-to-end learning framework for addressing this task. In particular, we consider a regularized version of the game, equivalent to a particular form of quantal response equilibrium, and develop 1) a primal-dual Newton method for finding such equilibrium points in both normal and extensive form games; and 2) a backpropagation method that lets us analytically compute gradients of all relevant game parameters through the solution itself. This ultimately lets us learn the game by training in an end-to-end fashion, effectively by integrating a “differentiable game solver” into the loop of larger deep network architectures. We demonstrate the effectiveness of the learning method in several settings including poker and security game tasks.

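【Summary】Although recent AI research has made great progress in solving large, zero-sum, extensive-form games, most past work assumes that the parameters of the game itself are known to the agents. This paper addresses the relatively under-explored but equally important "inverse" setting, in which the parameters of the underlying game are not known to all agents and must be learned from observations. The authors propose a differentiable, end-to-end learning framework for this task. Specifically, they consider a regularized version of the game, equivalent to a particular form of quantal response equilibrium, and develop (1) a primal-dual Newton method for finding such equilibrium points in both normal-form and extensive-form games, and (2) a backpropagation method that analytically computes the gradients of all relevant game parameters through the solution itself. This makes it possible to learn the game by end-to-end training, in effect by integrating a "differentiable game solver" into the loop of larger deep network architectures. The authors demonstrate the learning method in several settings, including poker and security game tasks.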

《Commonsense Knowledge Aware Conversation Generation with Graph Attention》

Hao Zhou, Tom Young, Minlie Huang, Haizhou Zhao, Jingfang Xu, Xiaoyan Zhu

【Abstract】Commonsense knowledge is vital to many natural language processing tasks. In this paper, we present a novel open-domain conversation generation model to demonstrate how large-scale commonsense knowledge can facilitate language understanding and generation. Given a user post, the model retrieves relevant knowledge graphs from a knowledge base and then encodes the graphs with a static graph attention mechanism, which augments the semantic information of the post and thus supports better understanding of the post. Then, during word generation, the model attentively reads the retrieved knowledge graphs and the knowledge triples within each graph to facilitate better generation through a dynamic graph attention mechanism. This is the first attempt that uses large-scale commonsense knowledge in conversation generation. Furthermore, unlike existing models that use knowledge triples (entities) separately and independently, our model treats each knowledge graph as a whole, which encodes more structured, connected semantic information in the graphs. Experiments show that the proposed model can generate more appropriate and informative responses than state-of-the-art baselines.

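【Summary】Commonsense knowledge is vital to many natural language processing tasks. The paper presents a new open-domain conversation generation model to show how large-scale commonsense knowledge can help language understanding and generation. Given a user post, the model retrieves relevant knowledge graphs from a knowledge base and encodes them with a static graph attention mechanism, which augments the semantic information of the post and so supports a better understanding of it. During word generation, the model then attentively reads the retrieved knowledge graphs and the knowledge triples within each graph through a dynamic graph attention mechanism, which leads to better generation. This is the first attempt to use large-scale commonsense knowledge in conversation generation. Moreover, unlike existing models that use knowledge triples (entities) separately and independently, the model treats each knowledge graph as a whole and therefore encodes more structured, connected semantic information. Experiments show that the model generates more appropriate and informative responses than state-of-the-art baselines.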

《A Degeneracy Framework for Graph Similarity》

Giannis Nikolentzos, Polykarpos Meladianos, Stratis Limnios, Michalis Vazirgiannis

【Abstract】The problem of accurately measuring the similarity between graphs is at the core of many applications in a variety of disciplines. Most existing methods for graph similarity focus either on local or on global properties of graphs. However, even if graphs seem very similar from a local or a global perspective, they may exhibit different structure at different scales. In this paper, we present a general framework for graph similarity which takes into account structure at multiple different scales. The proposed framework capitalizes on the well-known k-core decomposition of graphs in order to build a hierarchy of nested subgraphs. We apply the framework to derive variants of four graph kernels, namely graphlet kernel, shortest-path kernel, Weisfeiler-Lehman subtree kernel, and pyramid match graph kernel. The framework is not limited to graph kernels, but can be applied to any graph comparison algorithm. The proposed framework is evaluated on several benchmark datasets for graph classification. In most cases, the core-based kernels achieve significant improvements in terms of classification accuracy over the base kernels, while their time complexity remains very attractive.

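【Summary】Accurately measuring the similarity between graphs is at the core of many applications across disciplines. Most existing graph similarity methods focus on either local or global properties of graphs. Yet even when graphs look very similar from a local or a global perspective, they may exhibit different structure at different scales. The paper presents a general framework for graph similarity that takes structure at multiple scales into account. The framework uses the well-known k-core decomposition of graphs to build a hierarchy of nested subgraphs. The authors apply it to derive variants of four graph kernels: the graphlet kernel, the shortest-path kernel, the Weisfeiler-Lehman subtree kernel, and the pyramid match graph kernel. The framework is not limited to graph kernels and can be applied to any graph comparison algorithm. It is evaluated on several benchmark datasets for graph classification. In most cases the core-based kernels improve classification accuracy significantly over the base kernels, while their time complexity remains very attractive.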
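The k-core decomposition that the framework builds on can be sketched as follows. This is a simple quadratic-time peeling procedure written for clarity, not the authors' implementation; the function names and the adjacency-dictionary input format are assumptions. The k-core of a graph is the largest subgraph in which every vertex has degree at least k, so the cores are nested.

```python
def core_numbers(adjacency):
    """Core number of every vertex, found by repeatedly peeling a minimum-degree vertex."""
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])  # the core level never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for u in adjacency[v]:
            if u in remaining:
                degree[u] -= 1
    return core

def nested_cores(adjacency):
    """Map each k to the vertex set of the k-core, giving the nested hierarchy."""
    core = core_numbers(adjacency)
    if not core:
        return {}
    top = max(core.values())
    return {k: {v for v, c in core.items() if c >= k} for k in range(top + 1)}

# A triangle (a, b, c) with one pendant vertex d attached to a.
graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
hierarchy = nested_cores(graph)
```

A base graph kernel could then be evaluated on the subgraph induced by each level of `hierarchy` and the results combined, which is the general idea the abstract describes.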

ICML 2018

Dates: July 10-15

Location: Stockholm, Sweden

The International Conference on Machine Learning (ICML) has grown into a top-tier annual international machine learning conference organized by the International Machine Learning Society (IMLS).

Best Paper Awards

《Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples》

Anish Athalye, Nicholas Carlini, David Wagner

【Abstract】We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples. While defenses that cause obfuscated gradients appear to defeat iterative optimization-based attacks, we find defenses relying on this effect can be circumvented. We describe characteristic behaviors of defenses exhibiting the effect, and for each of the three types of obfuscated gradients we discover, we develop attack techniques to overcome it. In a case study, examining noncertified white-box-secure defenses at ICLR 2018, we find obfuscated gradients are a common occurrence, with 7 of 9 defenses relying on obfuscated gradients. Our new attacks successfully circumvent 6 completely, and 1 partially, in the original threat model each paper considers.

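【Summary】The authors identify obfuscated gradients, a kind of gradient masking, as a phenomenon that creates a false sense of security in defenses against adversarial examples. Defenses that cause obfuscated gradients appear to defeat iterative optimization-based attacks, but the authors find that defenses relying on this effect can be circumvented. They describe the characteristic behaviors of defenses that exhibit the effect and, for each of the three types of obfuscated gradients they discover, develop attack techniques to overcome it. In a case study of the non-certified white-box-secure defenses at ICLR 2018, they find that obfuscated gradients are common: 7 of 9 defenses rely on them. Under the original threat model each paper considers, the new attacks circumvent 6 of these defenses completely and 1 partially.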

《Delayed Impact of Fair Machine Learning》

Lydia T. Liu, Sarah Dean, Esther Rolf, Max Simchowitz, Moritz Hardt

【Abstract】Fairness in machine learning has predominantly been studied in static classification settings without concern for how decisions change the underlying population over time. Conventional wisdom suggests that fairness criteria promote the long-term well-being of those groups they aim to protect.

We study how static fairness criteria interact with temporal indicators of well-being, such as long-term improvement, stagnation, and decline in a variable of interest. We demonstrate that even in a one-step feedback model, common fairness criteria in general do not promote improvement over time, and may in fact cause harm in cases where an unconstrained objective would not. We completely characterize the delayed impact of three standard criteria, contrasting the regimes in which these exhibit qualitatively different behavior. In addition, we find that a natural form of measurement error broadens the regime in which fairness criteria perform favorably.

Our results highlight the importance of measurement and temporal modeling in the evaluation of fairness criteria, suggesting a range of new challenges and trade-offs.

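【Summary】Fairness in machine learning has mostly been studied in static classification settings, without regard for how decisions change the underlying population over time. Conventional wisdom holds that fairness criteria promote the long-term well-being of the groups they aim to protect.

The authors study how static fairness criteria interact with temporal indicators of well-being, such as long-term improvement, stagnation, and decline in a variable of interest. They show that even in a one-step feedback model, common fairness criteria in general do not promote improvement over time, and may in fact cause harm in cases where an unconstrained objective would not. They completely characterize the delayed impact of three standard criteria and contrast the regimes in which these behave in qualitatively different ways. They also find that a natural form of measurement error broadens the regime in which fairness criteria perform favorably.

The results highlight the importance of measurement and temporal modeling in evaluating fairness criteria, and point to a range of new challenges and trade-offs.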

Best Paper Runner Up Awards

《Near Optimal Frequent Directions for Sketching Dense and Sparse Matrices》

Zengfeng Huang

【Abstract】Given a large matrix $A \in \mathbb{R}^{n \times d}$, we consider the problem of computing a sketch matrix $B \in \mathbb{R}^{\ell \times d}$ which is significantly smaller than but still well approximates $A$. We are interested in minimizing the covariance error $\|A^T A - B^T B\|_2$. We consider the problems in the streaming model, where the algorithm can only make one pass over the input with limited working space. The popular Frequent Directions algorithm of (Liberty, 2013) and its variants achieve optimal space-error tradeoff. However, whether the running time can be improved remains an unanswered question. In this paper, we almost settle the time complexity of this problem. In particular, we provide new space-optimal algorithms with faster running times. Moreover, we also show that the running times of our algorithms are near-optimal unless the state-of-the-art running time of matrix multiplication can be improved significantly.

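【Summary】Given a large matrix $A \in \mathbb{R}^{n \times d}$, the paper considers the problem of computing a sketch matrix $B \in \mathbb{R}^{\ell \times d}$ that is significantly smaller than $A$ but still approximates it well. The goal is to minimize the covariance error $\|A^T A - B^T B\|_2$. The problem is studied in the streaming model, where the algorithm can make only one pass over the input with limited working space. The popular Frequent Directions algorithm (Liberty, 2013) and its variants achieve the optimal space-error tradeoff, but whether the running time can be improved has remained an open question. The paper almost settles the time complexity of this problem. In particular, it provides new space-optimal algorithms with faster running times, and shows that these running times are near-optimal unless the state-of-the-art running time of matrix multiplication can be improved significantly.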

《The Mechanics of n-Player Differentiable Games》

David Balduzzi, Sebastien Racaniere, James Martens, Jakob Foerster, Karl Tuyls, Thore Graepel

【Abstract】The cornerstone underpinning deep learning is the guarantee that gradient descent on an objective converges to local minima. Unfortunately, this guarantee fails in settings, such as generative adversarial nets, where there are multiple interacting losses. The behavior of gradient-based methods in games is not well understood – and is becoming increasingly important as adversarial and multi-objective architectures proliferate. In this paper, we develop new techniques to understand and control the dynamics in general games. The key result is to decompose the second-order dynamics into two components. The first is related to potential games, which reduce to gradient descent on an implicit function; the second relates to Hamiltonian games, a new class of games that obey a conservation law, akin to conservation laws in classical mechanical systems. The decomposition motivates Symplectic Gradient Adjustment (SGA), a new algorithm for finding stable fixed points in general games. Basic experiments show SGA is competitive with recently proposed algorithms for finding stable fixed points in GANs – whilst at the same time being applicable to – and having guarantees in – much more general games.

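【Summary】The cornerstone of deep learning is the guarantee that gradient descent on an objective converges to local minima. Unfortunately, this guarantee fails in settings with multiple interacting losses, such as generative adversarial networks. The behavior of gradient-based methods in games is not well understood, and it is becoming more important as adversarial and multi-objective architectures proliferate. The paper develops new techniques for understanding and controlling the dynamics in general games. The key result is a decomposition of the second-order dynamics into two components. The first relates to potential games, which reduce to gradient descent on an implicit function; the second relates to Hamiltonian games, a new class of games that obey a conservation law akin to those in classical mechanical systems. The decomposition motivates Symplectic Gradient Adjustment (SGA), a new algorithm for finding stable fixed points in general games. Basic experiments show that SGA is competitive with recently proposed algorithms for finding stable fixed points in GANs, while also being applicable to, and having guarantees in, much more general games.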

《Fairness Without Demographics in Repeated Loss Minimization》

Tatsunori Hashimoto, Megha Srivastava, Hongseok Namkoong, Percy Liang

【Abstract】Machine learning models (e.g., speech recognizers) are usually trained to minimize average loss, which results in representation disparity: minority groups (e.g., non-native speakers) contribute less to the training objective and thus tend to suffer higher loss. Worse, as model accuracy affects user retention, a minority group can shrink over time. In this paper, we first show that the status quo of empirical risk minimization (ERM) amplifies representation disparity over time, which can even make initially fair models unfair. To mitigate this, we develop an approach based on distributionally robust optimization (DRO), which minimizes the worst case risk over all distributions close to the empirical distribution. We prove that this approach controls the risk of the minority group at each time step, in the spirit of Rawlsian distributive justice, while remaining oblivious to the identity of the groups. We demonstrate that DRO prevents disparity amplification on examples where ERM fails, and show improvements in minority group user satisfaction in a real-world text autocomplete task.

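【Summary】Machine learning models (such as speech recognizers) are usually trained to minimize average loss, which leads to representation disparity: minority groups (such as non-native speakers) contribute less to the training objective and therefore tend to suffer higher loss. Worse, because model accuracy affects user retention, a minority group can shrink over time. The paper first shows that the status quo of empirical risk minimization (ERM) amplifies representation disparity over time, which can make even initially fair models unfair. To mitigate this, the authors develop an approach based on distributionally robust optimization (DRO), which minimizes the worst-case risk over all distributions close to the empirical distribution. They prove that the approach controls the minority group's risk at each time step, in the spirit of Rawlsian distributive justice, while remaining oblivious to the identity of the groups. They show that DRO prevents disparity amplification on examples where ERM fails, and they report improved minority-group user satisfaction on a real-world text autocomplete task.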
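
The gap between an average objective and a worst-case objective can be sketched in a few lines. This example swaps in conditional value-at-risk (CVaR, the mean of the worst alpha fraction of losses) as a simpler stand-in for the paper's DRO objective; it is not the authors' formulation. The loss values, the 20% minority share, and the function names are made up for illustration. Like the approach described above, the worst-case objective never looks at group labels.

```python
def average_loss(losses):
    """ERM-style objective: the mean loss over all users."""
    return sum(losses) / len(losses)

def cvar_loss(losses, alpha):
    """Mean of the worst alpha fraction of losses (no group labels needed)."""
    k = max(1, int(round(alpha * len(losses))))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k

# Hypothetical per-user losses: 8 majority users served well, 2 minority users served badly.
losses = [0.1] * 8 + [0.9] * 2

erm_objective = average_loss(losses)            # dominated by the majority
robust_objective = cvar_loss(losses, alpha=0.2) # tracks the worst-off 20%
```

Here the average stays low even though the minority users fare badly, whereas the worst-case objective equals the minority's loss, so minimizing it would put pressure on exactly those users.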

NIPS 2018

Conference dates: December 3-8

Location: Montreal, Canada

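The Conference and Workshop on Neural Information Processing Systems (NIPS) is an international conference on machine learning and computational neuroscience. It is held every December and is organized by the NIPS Foundation. NIPS is a top conference in machine learning. In the China Computer Federation's ranking of international academic conferences, NIPS is a class-A conference in artificial intelligence.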

Best Papers

《Neural Ordinary Differential Equations》

Tian Qi Chen, Yulia Rubanova, Jesse Bettencourt, David Duvenaud

【Abstract】We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.

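【Summary】The paper introduces a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, it parameterizes the derivative of the hidden state with a neural network, and computes the network's output with a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. The authors demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. They also construct continuous normalizing flows, a generative model that can be trained by maximum likelihood without partitioning or ordering the data dimensions. For training, they show how to backpropagate scalably through any ODE solver without access to its internal operations, which allows end-to-end training of ODEs within larger models.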
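
The forward pass of such a model can be sketched as numerically integrating dh/dt = f(h, t). The sketch below uses a fixed-step Euler integrator and a one-unit `dynamics` function with hand-picked weights; both are assumptions made for illustration. The paper relies on black-box (typically adaptive) solvers and trains through them, which this sketch does not attempt.

```python
import math

def dynamics(h, t, w=-1.0, b=0.0):
    """Hypothetical one-unit 'network' giving dh/dt; w and b stand in for learned weights."""
    return math.tanh(w * h + b)

def odeint_euler(f, h0, t0, t1, steps):
    """Integrate dh/dt = f(h, t) from t0 to t1 with fixed-step Euler."""
    h, t = h0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * f(h, t)
        t = t + dt
    return h

# The step count plays the role of network depth: fewer steps are cheaper but less precise.
output_coarse = odeint_euler(dynamics, 1.0, 0.0, 1.0, steps=10)
output_fine = odeint_euler(dynamics, 1.0, 0.0, 1.0, steps=1000)
```

Choosing `steps` is one simple way to see the trade between numerical precision and speed that the abstract mentions.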

《Non-delusional Q-learning and Value-iteration》

Tyler Lu, Dale Schuurmans, Craig Boutilier

【Abstract】We identify a fundamental source of error in Q-learning and other forms of dynamic programming with function approximation. Delusional bias arises when the approximation architecture limits the class of expressible greedy policies. Since standard Q-updates make globally uncoordinated action choices with respect to the expressible policy class, inconsistent or even conflicting Q-value estimates can result, leading to pathological behaviour such as over/under-estimation, instability and even divergence. To solve this problem, we introduce a new notion of policy consistency and define a local backup process that ensures global consistency through the use of information sets, that is, sets that record constraints on policies consistent with backed-up Q-values. We prove that both the model-based and model-free algorithms using this backup remove delusional bias, yielding the first known algorithms that guarantee optimal results under general conditions. These algorithms furthermore only require polynomially many information sets (from a potentially exponential support). Finally, we suggest other practical heuristics for value-iteration and Q-learning that attempt to reduce delusional bias.

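【Summary】The paper identifies a fundamental source of error in Q-learning and other forms of dynamic programming with function approximation. Delusional bias arises when the approximation architecture limits the class of expressible greedy policies. Because standard Q-updates make action choices that are globally uncoordinated with respect to the expressible policy class, they can produce inconsistent or even conflicting Q-value estimates, leading to pathological behaviour such as over- or under-estimation, instability, and even divergence. To solve this, the authors introduce a new notion of policy consistency and define a local backup process that ensures global consistency through information sets, which record the constraints on policies that are consistent with the backed-up Q-values. They prove that both the model-based and the model-free algorithms using this backup remove delusional bias, yielding the first known algorithms that guarantee optimal results under general conditions. These algorithms also require only polynomially many information sets (from a potentially exponential support). Finally, the authors suggest other practical heuristics for value-iteration and Q-learning that attempt to reduce delusional bias.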

《Optimal Algorithms for Non-Smooth Distributed Optimization in Networks》

Kevin Scaman, Francis Bach, Sebastien Bubeck, Laurent Massoulié, Yin Tat Lee

【Abstract】In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in O(1/√t), the structure of the communication network only impacts a second-order term in O(1/t), where t is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a \(d^{1/4}\) multiplicative factor of the optimal convergence rate, where d is the underlying dimension.

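【Summary】The paper studies distributed optimization of non-smooth convex functions over a network of computing units, under two regularity assumptions: (1) Lipschitz continuity of the global objective function, and (2) Lipschitz continuity of the local individual functions. Under the local regularity assumption, it gives the first optimal first-order decentralized algorithm, multi-step primal-dual (MSPD), together with its optimal convergence rate. Notably, for non-smooth functions, the dominant term of the error is in O(1/√t), while the structure of the communication network affects only a second-order term in O(1/t), where t is time. In other words, the error caused by limited communication resources decreases quickly even for objective functions that are not strongly convex. Under the global regularity assumption, it gives a simple yet efficient algorithm, distributed randomized smoothing (DRS), based on a local smoothing of the objective function, and shows that DRS is within a \(d^{1/4}\) multiplicative factor of the optimal convergence rate, where d is the underlying dimension.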

《Nearly Tight Sample Complexity Bounds for Learning Mixtures of Gaussians via Sample Compression Schemes》

Hassan Ashtiani, Shai Ben-David, Nick Harvey, Christopher Liaw, Abbas Mehrabian, Yaniv Plan

【Abstract】We prove that \(\widetilde{\Theta}(kd^{2}/\varepsilon^{2})\) samples are necessary and sufficient for learning a mixture of k Gaussians in \(R^{d}\), up to error ε in total variation distance. This improves both the known upper bounds and lower bounds for this problem. For mixtures of axis-aligned Gaussians, we show that \(\widetilde{O}(kd/\varepsilon^{2})\) samples suffice, matching a known lower bound.

The upper bound is based on a novel technique for distribution learning based on a notion of sample compression. Any class of distributions that allows such a sample compression scheme can also be learned with few samples. Moreover, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions. The core of our main result is showing that the class of Gaussians in \(R^{d}\) has an efficient sample compression.

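【Summary】The paper proves that \(\widetilde{\Theta}(kd^{2}/\varepsilon^{2})\) samples are necessary and sufficient for learning a mixture of k Gaussians in \(R^{d}\) up to error ε in total variation distance. This improves both the known upper and lower bounds for the problem. For mixtures of axis-aligned Gaussians, it shows that \(\widetilde{O}(kd/\varepsilon^{2})\) samples suffice, matching a known lower bound. The upper bound rests on a new technique for distribution learning based on a notion of sample compression. Any class of distributions that admits such a sample compression scheme can also be learned with few samples. The core of the main result is showing that the class of Gaussians in \(R^{d}\) has an efficient sample compression.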

AAAI 2018

Conference dates: February 2-7

Location: New Orleans, USA

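The American Association for Artificial Intelligence (AAAI) is one of the main academic organizations in artificial intelligence. Its annual meeting (AAAI, The National Conference on Artificial Intelligence) is a major academic conference in the field.

AAAI 2018 received 3,808 paper submissions, of which 938 were accepted. The number of submissions rose 47% over the previous year.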

Best Paper

《Memory-Augmented Monte Carlo Tree Search》

Chenjun Xiao, Jincheng Mei, Martin Müller

【Abstract】This paper proposes and evaluates Memory-Augmented Monte Carlo Tree Search (M-MCTS), which provides a new approach to exploit generalization in online real-time search. The key idea of M-MCTS is to incorporate MCTS with a memory structure, where each entry contains information of a particular state. This memory is used to generate an approximate value estimation by combining the estimations of similar states. We show that the memory based value approximation is better than the vanilla Monte Carlo estimation with high probability under mild conditions. We evaluate M-MCTS in the game of Go. Experimental results show that M-MCTS outperforms the original MCTS with the same number of simulations.

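【Summary】The paper proposes and evaluates Memory-Augmented Monte Carlo Tree Search (M-MCTS), a new approach to exploiting generalization in online real-time search. The key idea of M-MCTS is to combine MCTS with a memory structure in which each entry holds information about a particular state. The memory is used to produce an approximate value estimate by combining the estimates of similar states. The authors show that, under mild conditions, the memory-based value approximation is better than the vanilla Monte Carlo estimate with high probability. They evaluate M-MCTS in the game of Go, where it outperforms the original MCTS with the same number of simulations.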

Best Student Paper

《Counterfactual Multi-Agent Policy Gradients》

Jakob N. Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson

【Abstract】Many real-world problems, such as network packet routing and the coordination of autonomous vehicles, are naturally modelled as cooperative multi-agent systems. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents’ policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent’s action, while keeping the other agents’ actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.

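【Summary】Many real-world problems, such as network packet routing and the coordination of autonomous vehicles, are naturally modelled as cooperative multi-agent systems. Such problems call for new reinforcement learning methods that can efficiently learn decentralised policies. To this end, the paper proposes a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. To address multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. The authors evaluate COMA on StarCraft unit micromanagement, using a decentralised variant with significant partial observability. In this setting COMA significantly improves average performance over other multi-agent actor-critic methods, and the best-performing agents are competitive with state-of-the-art centralised controllers that have access to the full state.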
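
The counterfactual baseline described above can be written out for a single agent. In this minimal sketch, `q_values` is assumed to hold the centralised critic's Q-estimate for each action the agent could take, with the other agents' actions held fixed, and `policy` holds the agent's action probabilities. The numbers and names are hypothetical and the critic itself is not modelled.

```python
def counterfactual_advantage(q_values, policy, taken_action):
    """Advantage of the taken action against the counterfactual baseline.

    The baseline averages the critic's Q-values over this agent's own policy,
    which marginalises out the agent's action while the other agents'
    actions stay fixed.
    """
    baseline = sum(p * q for p, q in zip(policy, q_values))
    return q_values[taken_action] - baseline

# Hypothetical critic outputs for one agent with two available actions.
q_values = [1.0, 3.0]
policy = [0.5, 0.5]
advantage = counterfactual_advantage(q_values, policy, taken_action=1)
```

A positive advantage means the chosen action did better than what the agent's own policy would achieve on average in the same joint situation, which is the credit signal each actor learns from.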

ACL 2018

Conference dates: July 15-20

Location: Melbourne, Australia

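The ACL conference (Annual Meeting of the Association for Computational Linguistics) is the annual meeting of the Association for Computational Linguistics and the most important academic conference in the field. The association was founded in 1962 as the Association for Machine Translation and Computational Linguistics (AMTCL) and was renamed ACL in 1968. Every summer, researchers in related fields from around the world gather to exchange ideas on natural language processing.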
