
State of the Art in Domain Adaptation (CVPR in Review IV)

We have already had three installments about the CVPR 2018 (Computer Vision and Pattern Recognition) conference: the first part was devoted to GANs for computer vision, the second part dealt with papers about recognizing human beings (pose estimation and tracking), and the third part tackled synthetic data. Today we dive deeper into the details of one field of deep learning that has been on the rise lately: domain adaptation. For this NeuroNugget, I’m happy to present to you my co-author Anastasia Gaydashenko, who has already left Neuromation to join Cisco…but her texts live on, and this is one of them.

What is Domain Adaptation?

There are a couple of specific research directions that have been trending lately (including at CVPR 2018), and one of them is domain adaptation. As this field is closely related to synthetic data, it is of great interest for us here at Neuromation, but the topic is also increasingly popular and important in and of itself.

Let’s start at the beginning. We have already discussed the most common tasks that constitute the basis for modern computer vision: image classification, object and pose detection, instance and semantic segmentation, object tracking, and so on. These problems are solved quite successfully due to deep convolutional neural architectures and large amounts of labeled data.

But, as we discussed in the last installment, a big challenge always remains: for supervised learning, you always need to find or create labeled datasets. Almost any paper you read about some fancy state of the art model will mention some problems with the dataset, unless they use one of the few standard “vanilla” datasets that everybody usually compares on. Thus, collecting labeled data has become as important as designing the networks themselves. These datasets should be reliable and diverse enough so researchers would be able to use them to develop and evaluate novel architectures.

We have already talked many times about how manual data collection is both expensive and time-consuming, often exceedingly so. Sometimes it is even flat out impossible to label the data manually (for example, how do you label for depth estimation, the problem of evaluating the distances from points on the image to the camera?). Of course, many standard problems already have large labeled datasets that are freely or easily available. But first, this readily labeled data can (and does) bias research towards the specific field where it is available, and second, your own problem will never be exactly the same, and standard datasets will often simply not fit your demands: they will contain different classes, will be biased in different ways, and so on.

The main problem with using existing datasets, or even synthetic data generators that were not made specifically for your particular problem, is that when the data is generated and already labeled we are still facing the problem of domain transfer: how do we use one kind of data to prepare the networks to cope with different kinds? This problem also looms large for the entire field of synthetic data: however realistic you make your data, it still cannot be made completely indistinguishable from real-world photographs. The major underlying challenge here is known as domain shift: basically, the distribution of data in the target domain (say, real images) is different from that in the source domain (say, synthetic images). Devising models that can cope with this shift is exactly the problem called domain adaptation.

Let us see how people are handling this problem now, considering a few papers from CVPR 2018 in slightly more detail than in previous “CVPR in Review” installments.

Unsupervised Domain Adaptation with Similarity Learning

This work by Pedro Pinheiro (see pdf here) comes from ElementAI, a Montreal company co-founded in 2016 by none other than Yoshua Bengio. It deals with an approach to domain adaptation based on adversarial networks, the kind we touched upon a little bit before (see also this post, the second part for which is coming really soon… it is, it is, I promise!).

The simplest adversarial approach to unsupervised domain adaptation is a network that tries to extract features that remain the same across the domains. To achieve this, the network tries to make them indistinguishable for a separate part of the network, a discriminator (“disc” in the figure below). But at the same time, these features should be representative for the source domain so the network will be able to classify objects:




In this way, the network has to extract features that would achieve two objectives at once: (1) be informative enough that the “class” network (usually very simple) can classify, and (2) be independent of the domain so that the “disc” network (usually as complex as the feature extractor itself, or more) cannot really distinguish. Note that we don’t have to have any labels for the target domain, only for the source domain, where it is usually much easier (again, think synthetic data for the source domain).
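To make this setup concrete, here is a minimal PyTorch sketch of a domain-adversarial feature extractor (our own illustration, not the paper's code; the layer sizes and the gradient-reversal trick are assumptions chosen for brevity):

```python
# A toy domain-adversarial setup: a shared feature extractor feeds both a label
# classifier ("class") and a domain discriminator ("disc"); a gradient reversal
# layer pushes the extractor towards domain-agnostic features.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output  # reverse gradients flowing back into the extractor

features = nn.Sequential(nn.Linear(2048, 512), nn.ReLU())   # shared feature extractor
classifier = nn.Sequential(nn.Linear(512, 10))               # the "class" network
discriminator = nn.Sequential(nn.Linear(512, 2))             # the "disc" network

def losses(x_src, y_src, x_tgt):
    ce = nn.CrossEntropyLoss()
    f_src, f_tgt = features(x_src), features(x_tgt)
    cls_loss = ce(classifier(f_src), y_src)                  # objective (1): classify source data
    dom = torch.cat([GradReverse.apply(f_src), GradReverse.apply(f_tgt)])
    dom_labels = torch.cat([torch.zeros(len(x_src)), torch.ones(len(x_tgt))]).long()
    dom_loss = ce(discriminator(dom), dom_labels)             # objective (2): fool the discriminator
    return cls_loss + dom_loss
```

In practice this min-max game is played until the features carry little domain information while remaining useful for classification; note that the target batch needs no labels at all.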

In Pinheiro’s paper, this approach is improved by replacing the classifier part with a similarity-based one. The discriminative part remains the same, and the classification part now compares the embedding of an image with a set of prototypes; all these representations are learned jointly and in an end-to-end fashion:



Basically, we are asking one network, g, to extract features from a labeled source domain and another network, f, to extract features from an unlabeled target domain, with a similar but different data distribution. The difference is that now f and g are different (we had the same f in the picture above), and the classification is now different: instead of training a classifier, we train the model to discriminate the target prototype from all other prototypes. And to label the image from the target domain, we compare the embedding of an image with embeddings of prototype images from the source domain, assigning the label of its nearest neighbors:



The paper shows that the proposed similarity-based classification approach is more robust to the domain shift between the two datasets.
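As a rough sketch of how such similarity-based labeling might look (our own toy code with made-up dimensions, not the paper's implementation):

```python
import torch
import torch.nn as nn

# f embeds target-domain images, g embeds labeled source (prototype) images;
# in the paper both are learned jointly together with the adversarial objective.
f = nn.Sequential(nn.Linear(2048, 256), nn.ReLU(), nn.Linear(256, 128))
g = nn.Sequential(nn.Linear(2048, 256), nn.ReLU(), nn.Linear(256, 128))

def label_target_image(target_image, prototype_images, prototype_labels):
    t = f(target_image)                    # embedding of the unlabeled target image
    p = g(prototype_images)                # embeddings of the source prototypes, one row each
    similarities = p @ t                   # dot-product similarity to every prototype
    return prototype_labels[similarities.argmax()]  # label of the most similar prototype
```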


Image to Image Translation for Domain Adaptation

In this work by Murez et al. (full pdf), coming from UCSD and HRL Laboratories, the main idea is actually rather simple, but the implementation is novel and interesting. The work deals with a more complex task than classification, namely image segmentation (see, e.g., our previous post), which is widely used in autonomous driving, medical imaging, and many other domains. So what is this “image translation” thing they are talking about?

Let us begin with regular translation. Imagine that we have two large text corpora in different languages, say English and French, and we don’t know which phrases correspond to which. They may even be slightly different and may lack the corresponding translations in the other language corpus. Just like the pictures from synthetic and real domains. Now, to get a machine translation model we translate a phrase from English to French and try to make the embedding of the resulting phrase indistinguishable from embeddings of phrases from the original French corpus. And then the way to check that we haven’t lost much is to try to translate this phrase back to English; now, even if the original corpora were completely unaligned, we know what we’re looking for: the answer is just the original sentence!

Now let us look at the image to image translation which is, actually, pretty similar. Basically, domain adaptation techniques aim to address the domain shift problem by finding a mapping from the source data distribution to the target distribution. Alternatively, both domains X and Y could be mapped into a shared domain Z where the distributions are aligned; this is the approach used in this paper. This embedding must be domain-agnostic (independent of the domain), hence we want to maximize the similarity between the distributions of embedded source and target images.



For example, suppose that X is the domain of driving scenes on a sunny day and Y is the domain of driving scenes on a rainy day. While “sunny” and “rainy” are characteristics of the source and target domains, they are in fact variations that mean next to nothing for the annotation task (e.g., semantic segmentation of the road), and they should not affect the annotations. Treating such characteristics as structured noise, we would like to find a latent space Z that would be invariant to such variations. In other words, domain Z should not contain domain-specific characteristics, that is, be domain-agnostic.

In this case, we also want to restore annotations for an image from the target domain. Therefore, we also need to add a mapping from the shared embedding space to the labels. It may be image-level labels such as classes in a classification problem or pixel-level labels such as semantic segmentation:




Basically, that’s the whole idea! Now, to obtain the annotation for an image from the target domain we just need to get its embedding in the shared space Z and restore its annotation from C. This is the basic idea of the approach, but it can be further improved by the ideas proposed in this paper.

Specifically, there are three main tools needed to achieve successful unsupervised domain adaptation:

  • domain-agnostic feature extraction, which means that distributions of features extracted from both domains should be indistinguishable as judged by an adversarial discriminator network,

  • domain-specific reconstruction, which means that we should be able to decode embeddings back to the source and target domains, that is, we should be able to learn functions gX and gY like shown here:



  • cycle consistency to ensure that the mappings are learned correctly, that is, we should be able to get back where we started in cycles like this:



The whole point of the framework proposed in this work is to ensure these properties with loss functions and adversarial constructions. We will not go into the gritty details of the architectures since they may change for other domains and problems.
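Here is a deliberately simplified sketch (our own, with toy fully-connected "encoders" and flattened images standing in for the paper's convolutional architecture) of how these three ingredients translate into loss terms; the adversarial term is really optimized with alternating updates, which we omit here:

```python
import torch
import torch.nn as nn

fX, fY = nn.Linear(3 * 64 * 64, 128), nn.Linear(3 * 64 * 64, 128)  # encoders into the shared space Z
gX, gY = nn.Linear(128, 3 * 64 * 64), nn.Linear(128, 3 * 64 * 64)  # decoders back to each domain
C = nn.Linear(128, 19)                                              # label predictor on top of Z
D = nn.Linear(128, 2)                                               # domain discriminator on Z

mse, ce = nn.MSELoss(), nn.CrossEntropyLoss()

def total_loss(x, y, x_labels):
    zx, zy = fX(x), fY(y)
    # (1) domain-agnostic features: D should not be able to tell zx from zy
    #     (shown here only as what D compares; the real training is adversarial)
    adv = ce(D(zx), torch.zeros(len(x)).long()) + ce(D(zy), torch.ones(len(y)).long())
    # (2) domain-specific reconstruction: decode embeddings back to their own domain
    rec = mse(gX(zx), x) + mse(gY(zy), y)
    # (3) cycle consistency: X -> Z -> Y -> Z -> X should bring us back where we started
    cyc = mse(gX(fY(gY(zx))), x) + mse(gY(fX(gX(zy))), y)
    # supervised labels are available only for the source domain X
    sup = ce(C(zx), x_labels)
    return sup + rec + cyc + adv
```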

But let’s have a look at the results! At the end of the post, we will make a detailed comparison between three papers on domain adaptation, but now let’s just have a look at a single example. The paper used two datasets: a synthetic dataset from Grand Theft Auto 5 and a real-world Cityscapes dataset with pictures of cities. Here are two sample pictures:



And here are the segmentation results for the real-world image (B above):



In this picture, E is the ground truth segmentation, C is the result produced without domain adaptation, simply by training on the synthetic GTA5 dataset, and D is the result with domain adaptation. It does look better, and the numbers (the intersection-over-union metric) do bear this out.
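For reference, the numbers quoted in this and the following comparisons are per-class intersection-over-union (IoU) scores; a minimal sketch of how such a metric can be computed from predicted and ground-truth label maps (our illustration) looks like this:

```python
import numpy as np

def iou_per_class(pred, gt, num_classes=19):
    # pred and gt are arrays of per-pixel class indices of the same shape
    ious = []
    for c in range(num_classes):
        intersection = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious.append(intersection / union if union > 0 else float("nan"))
    return ious  # mean IoU is the average over the classes present in the data
```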


Conditional Generative Adversarial Network for Structured Domain Adaptation

This paper by Hong et al. (full pdf) proposes another modification of a standard discriminator-segmentator architecture. At first glance at the architecture, we may not even notice any difference:



But actually this architecture does something very interesting: it integrates a GAN into a fully convolutional network (FCN). We have discussed FCNs in a previous NeuroNugget post; it is the network architecture used for the segmentation problem that returns labels for each pixel in the picture by feeding the features through deconvolution layers.

In this model, a GAN is used to mitigate the gap between source and target domains. For example, the previous paper aligns two domains via an intermediate feature space and thereby implicitly assumes the same decision function for both domains. This approach relaxes that assumption: here we learn the residual between feature maps from the two domains, because the generator learns to produce features like the ones from a real image in order to fool the discriminator; afterwards, the FCN parameters are updated to accommodate the changes the GAN has made.
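As a very rough sketch of this idea (our own simplification with toy layers; the paper's conditional generator and discriminator are considerably more elaborate): the generator adds a learned residual to the synthetic feature map, the discriminator judges whether the adapted features look "real", and the FCN's segmentation head is trained on the adapted features:

```python
import torch
import torch.nn as nn

backbone = nn.Conv2d(3, 64, 3, padding=1)      # stand-in for the FCN encoder
generator = nn.Conv2d(64, 64, 3, padding=1)    # predicts a residual feature map
discriminator = nn.Conv2d(64, 1, 1)            # real-vs-fake score per spatial location
seg_head = nn.Conv2d(64, 19, 1)                # pixel-wise classifier (19 classes, as in Cityscapes)

def adapt(synthetic_images):
    feats = backbone(synthetic_images)
    adapted = feats + generator(feats)          # residual learned to fool the discriminator
    realness = discriminator(adapted)           # used in the adversarial loss
    logits = seg_head(adapted)                  # segmentation trained on the adapted features
    return realness, logits
```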

Again, we will show a numerical comparison of the result below but here are some examples from the dataset:



Remarkably, in this work the authors have also provided something very similar to what we are doing in our studies into the efficiency of synthetic data: they have measured the accuracy of the results (again with intersection-over-union) depending on the proportion of synthetic images in the dataset:




Learning from Synthetic Data: Addressing Domain Shift for Semantic Segmentation

This work by Sankaranarayanan et al. (full pdf) presents another modification of the basic approach based on GANs that brings the embeddings closer in the learned feature space. This time, let us begin with the picture and then explain it:



The base network, whose architecture is similar to a pre-trained model such as VGG-16, is split into two parts: the embedding denoted by F and the pixel-wise classifier denoted by C. The output of C is a map of labels upsampled to the same size as the input of F. The generator network G takes as input the learned embedding and reconstructs the RGB image. The discriminator network D performs two different tasks given an input: it classifies the input as real or fake in a domain-consistent manner and also performs a pixel-wise labeling task similar to the network C (this is applied only to source data since target data does not come with any labels during training).

So the main contribution of this work is a technique that employs generative models to align the source and target distributions in the feature space. For this purpose, the authors first project intermediate feature representations obtained using a CNN to the image space by training a reconstruction part of the network and then impose the domain alignment constraint by forcing the network to learn features such that source features produce target-like images when passed to the reconstruction module and vice versa.
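To summarize the moving parts, here is a skeletal sketch (our own; the layer choices are placeholders, not the authors' architecture) of the four components and how an input flows through them:

```python
import torch.nn as nn

F = nn.Conv2d(3, 64, 3, padding=1)   # embedding network (a VGG-16-like backbone in the paper)
C = nn.Conv2d(64, 19, 1)             # pixel-wise classifier; upsampled to the input size in the real model
G = nn.Conv2d(64, 3, 3, padding=1)   # generator: reconstructs an RGB image from the embedding
D_realness = nn.Conv2d(3, 1, 1)      # discriminator head: real vs. fake, in a domain-consistent way
D_labels = nn.Conv2d(3, 19, 1)       # discriminator head: pixel-wise labels (used for source data only)

def forward(image):
    z = F(image)                      # shared feature embedding
    labels = C(z)                     # segmentation prediction
    recon = G(z)                      # reconstructed image fed into the discriminator
    return labels, D_realness(recon), D_labels(recon)
```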

Sounds complicated, doesn’t it? Well, let’s see how all of these methods actually compare.

A Numerical Comparison of the Results

We have chosen these three papers for an in-depth look because their results are actually comparable! All three papers used domain adaptation with GTA5 as the source (synthetic) dataset and Cityscapes as the target dataset, so we can literally just compare the numbers.

The Cityscapes dataset contains 19 classes characteristic of outdoor city scenes such as “road”, “wall”, “person”, “car”, etc. And all three papers actually contain tables with results broken down with respect to these classes.

Murez et al., image-to-image translation:



Hong et al., conditional GAN:




Sankaranarayanan et al., GAN in an FCN:



The mean results (mean IoU) are 31.8, 44.5, and 37.1 respectively, so it appears that the image-to-image approach is the least successful and the conditional GAN is the winner. For clarity, let us also compare the top-3 most and least distinguishable classes (i.e., those with the best and worst results) for every approach.

Most distinguishable, in the same order of models:

  • road (85.3), car (76.7), veg (72.0)

  • road (89.2), veg (77.9), car (77.8)

  • road (88.0), car (80.4), veg (78.7)

This is not too interesting: obviously, roads and cars are always the best. But with the worst classes the situation is different:

  • train (0.3), bike (0.6), rider (3.3)

  • train (0.0), fence (10.9), wall (13.5)

  • train (0.9), t sign (11.6), pole (16.7)

Again, the “train” class seems to pose some kind of an insurmountable challenge (probably there are just not that many trains in the training set, pardon the pun), but the others are all different. So let us compare all models based on the “bike”, “rider”, “fence”, “wall”, “t sign”, and “pole” classes. Now their scores will be very distinct:



You can draw different conclusions from these results. But the main thing that we personally find truly exciting is that, among the many different approaches that could be proposed for such a complex task, the results in different papers at the same conference (so the authors could not follow one another; these results appeared independently) are perfectly comparable with each other, and researchers do not hesitate to publish these comparable numbers instead of some comfortable self-developed metrics that would prove their unquestionable supremacy. Way to go, modern machine learning!

And finally, let us finish on a lighter note, with one more fun paper about synthetic data.


Free Supervision from Video Games

In this work, Philipp Krähenbühl (full pdf) created a wrapper for the ever-popular Microsoft DirectX rendering API and added specialized code into the game as it is running. This enables the DirectX engine to produce ground truth labels for instance segmentation, semantic labeling, depth estimation, optical flow, intrinsic image decomposition, and instance tracking in real time! Which sounds super cool because now, instead of labeling data manually or creating special-purpose synthetic data engines, a researcher can just play video games all day long! All you need to do is find a suitable 3D game:



And with that, we finish the fourth installment on CVPR 2018. Thank you for your attention — and stay tuned!

Sergey Nikolenko
Chief Research Officer, Neuromation

Anastasia Gaydashenko
former Research Intern at Neuromation, currently Machine Learning Intern at Cisco

