
Paper Reading Notes: StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery

StyleCLIP combines CLIP with StyleGAN.

I. Introduction and Related Work

1. CLIP's main task: given an image, find the matching text among 32,768 randomly sampled text snippets. To accomplish this, CLIP must learn to recognize a wide variety of visual concepts in images and associate those concepts with the images, which is why CLIP can be applied to almost any visual classification task. For example, if a dataset's task is to distinguish cats from dogs, CLIP predicts whether an image better matches the text description "a photo of a dog" or "a photo of a cat".
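
To make the dog-vs-cat example concrete, here is a minimal sketch of CLIP zero-shot classification, assuming OpenAI's public `clip` package and PyTorch; the image file name is hypothetical:

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# "pet.jpg" is a placeholder for any image of a cat or a dog.
image = preprocess(Image.open("pet.jpg")).unsqueeze(0).to(device)
texts = clip.tokenize(["a photo of a dog", "a photo of a cat"]).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, texts)   # image-text similarity scores
    probs = logits_per_image.softmax(dim=-1)    # distribution over the two prompts

print(probs)  # e.g. [[0.92, 0.08]] -> the image matches "a photo of a dog" better
```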

2. Text prompt: the natural-language description that guides the manipulation.

3. Related work on text-guided image manipulation:

Some methods [10, 31, 27] use a GAN-based encoder-decoder architecture to disentangle the semantics of both input images and text descriptions. ManiGAN [22] introduces a novel text-image combination module, which produces high-quality images.

A concurrent work to ours, TediGAN [51], also uses StyleGAN for text-guided image generation and manipulation.

[10] H. Dong, Simiao Yu, Chao Wu, and Y. Guo. Semantic image synthesis via adversarial learning. Proc. ICCV, pages 5707–5715, 2017.

[27] Yahui Liu, Marco De Nadai, Deng Cai, Huayang Li, Xavier Alameda-Pineda, N. Sebe, and Bruno Lepri. Describe what to change: A text-guided unsupervised image-to-image translation approach. Proceedings of the 28th ACM International Conference on Multimedia, 2020.

[31] Seonghyeon Nam, Yunji Kim, and S. Kim. Text-adaptive generative adversarial networks: Manipulating images with natural language. In NeurIPS, 2018.

4. While most works perform image manipulation in the W or W+ space, Wu et al. [50] proposed to use the StyleSpace S, and showed that it is better disentangled than W and W+.

Our latent optimizer and mapper work in the W+ space, while the input-agnostic directions that we detect are in S.
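
For reference, a rough sketch of what these latent codes look like in code, assuming the interface of NVIDIA's stylegan2-ada-pytorch generator (`G.mapping` / `G.synthesis`); the checkpoint name is illustrative:

```python
import pickle
import torch

# Load a pretrained StyleGAN2 generator ("ffhq.pkl" is illustrative).
with open("ffhq.pkl", "rb") as f:
    G = pickle.load(f)["G_ema"].eval()

z = torch.randn(1, G.z_dim)   # Z: Gaussian input noise
w_plus = G.mapping(z, None)   # W+: shape [1, num_ws, 512]; one 512-d style vector
                              # per synthesis layer (in W, all rows are copies of
                              # a single vector)
img = G.synthesis(w_plus)     # StyleSpace S consists of the per-layer style
                              # vectors produced by the affine layers inside
                              # the synthesis network
```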

II. Contributions

In this work we explore three ways for text-driven image manipulation:

1. We first introduce an optimization scheme that utilizes a CLIP-based loss to modify an input latent vector in response to a user-provided text prompt.

2. We describe a latent mapper that infers a text-guided latent manipulation step for a given input image, allowing faster and more stable text-based manipulation.

3. Finally, we present a method for mapping a text prompt to input-agnostic directions in StyleGAN's style space, enabling interactive text-driven image manipulation.

In short:

Latent Optimization: uses CLIP as a loss network. This is the most general method, but editing a single image takes several minutes (a minimal sketch follows this list).
Latent Mapper: trained for a fixed text prompt. Starting from the image to be edited, the mapper infers how the image should be modified according to the prompt, and the edit is then applied.
Global Direction: similar to method 2, but maps the text prompt into StyleGAN's style space, and the image is edited by moving along the resulting direction.
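
A minimal sketch of the first method (latent optimization), reusing `clip` and the generator `G` from the sketches above. Here `w_source` stands for the W+ code of the input image, obtained with an off-the-shelf inversion encoder; the loss weights and step count are illustrative, and the identity loss used in the paper is omitted:

```python
import torch
import torch.nn.functional as F
import clip

device = "cuda"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # keep weights in fp32 so gradients flow cleanly
text = clip.tokenize(["a person with blue hair"]).to(device)  # example prompt

w = w_source.clone().requires_grad_(True)  # w_source: W+ code of the input image
opt = torch.optim.Adam([w], lr=0.01)

for _ in range(300):                      # step count is illustrative
    img = G.synthesis(w)                  # regenerate the image from W+
    img = F.interpolate(img, size=224)    # CLIP's visual encoder expects 224x224
                                          # (CLIP's input normalization omitted)
    clip_loss = 1 - F.cosine_similarity(model.encode_image(img),
                                        model.encode_text(text)).mean()
    l2_loss = ((w - w_source) ** 2).mean()  # keep the edit close to the source
    loss = clip_loss + 0.008 * l2_loss      # L2 weight is illustrative
    opt.zero_grad()
    loss.backward()
    opt.step()

edited = G.synthesis(w)  # final manipulated image
```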

III. Method