
Paper reading notes: GAN Inversion: A Survey

GAN inversion aims to obtain the latent codes of real images, so that subsequent image-processing tasks can be performed by manipulating those codes in the latent space.

1. GAN models

  • DCGAN
  • WGAN
  • BigGAN
  • PGGAN
  • StyleGAN

2. Datasets

  • ImageNet
  • CelebA
  • Flickr-Faces-HQ(FFHQ)
  • LSUN
  • DeepFashion, AnimeFaces, and StreetScapes

3. Evaluation metrics

For GAN inversion, evaluation covers two important aspects: how photorealistic the generated image is (image quality) and how faithful it is to the input (inversion accuracy). IS, FID, and LPIPS are widely used measurements to assess the quality of GAN-generated images; recent studies have also used SWD.

IS and FID are metrics for image diversity, while LPIPS is a metric for similarity. For inversion accuracy, most methods use a reconstruction distance, e.g., PSNR or SSIM. Some other methods [59] use cosine or Euclidean distance to evaluate different attributes between the input and output, while other approaches [95] use classification accuracy for assessment.

3.1 Image quality

(1) The mean opinion score (MOS) and difference mean opinion score (DMOS) have been used for subjective image quality assessment, where human raters are asked to assign perceptual quality scores to images.


(Scores range from 1 to 5, bad to good, and the final MOS is calculated as the arithmetic mean.)
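The MOS aggregation described above is just an arithmetic mean of per-rater scores; a minimal sketch (the function name and example scores are hypothetical):

```python
def mean_opinion_score(ratings):
    """MOS: arithmetic mean of per-rater perceptual quality scores (1-5 scale)."""
    if not ratings:
        raise ValueError("need at least one rating")
    return sum(ratings) / len(ratings)

scores = [4, 5, 3, 4, 4]  # hypothetical scores from five human raters
print(mean_opinion_score(scores))  # → 4.0
```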

(2) The inception score (IS) is a widely used metric to measure the quality and diversity of images generated by GAN models.

(3) FID (Fréchet inception distance)

(4) FSD

(5) SWD (sliced Wasserstein distance)

(6) LPIPS (learned perceptual image patch similarity)

3.2 Inversion accuracy

(1) Reconstructor classification accuracy (RCA) measures model interpretability by predicting the direction in the latent space from which a given image transformation is generated.

(2) Reconstruction distances. To evaluate the reconstruction, the most widely used metrics are peak signal-to-noise ratio (PSNR) and structural similarity (SSIM).
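PSNR is simple enough to sketch directly from its definition, 10·log10(MAX²/MSE); the sketch below works on flat pixel lists for illustration (real implementations operate on image arrays):

```python
import math

def psnr(img1, img2, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-sized images,
    given here as flat lists of pixel values in [0, max_val]."""
    if len(img1) != len(img2) or not img1:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(img1, img2)) / len(img1)
    if mse == 0:
        return float("inf")  # identical images: perfect reconstruction
    return 10.0 * math.log10(max_val ** 2 / mse)

print(psnr([0, 128, 255], [0, 128, 255]))  # identical → inf
print(psnr([0, 128, 255], [1, 127, 255]))  # small error → large finite PSNR
```

Higher PSNR means a reconstruction closer to the input, which is why it serves as an inversion-accuracy metric.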

4. GAN inversion methods

A good latent space should have a simple structure and be easy to embed into. A latent code in such a space should have the following two properties:

(1) it should reconstruct the input image faithfully and photorealistically;

(2) it should facilitate downstream tasks (e.g., image editing).

4.1 Which space to embed

(1) Z space (the original latent space): the generative model in a GAN learns to map values sampled from a simple distribution, e.g., a normal or uniform distribution, to generated images.

The constraint that the Z space follow a normal distribution limits its representative capacity for semantic attributes.

(2) W and W+ spaces:

Recent work [16] further transforms the native z into a mapped style vector through a nonlinear mapping network implemented as an 8-layer MLP; this network forms another intermediate latent space, the W space.

Thanks to the mapping network and affine transformations, StyleGAN's W space contains more disentangled features than the Z space. Several studies have analyzed the separability and semantics of the two spaces. In [21], Shen et al. show that models using the W space perform better in terms of separability and representation than those based on the Z space. StyleGAN's generator tends to learn semantic information based on the W space, and it outperforms generators that use the Z space.

For semantics, the above work evaluates classification accuracy with respect to the latent separation boundaries of different attributes. Since it is not easy to embed directly into the W or Z space, some works [24], [25] exploit another latent space, W+, in which a different intermediate latent vector w is fed into each layer of the generator via AdaIN [81]. For an 18-layer 1024×1024 StyleGAN, w ∈ W has 512 dimensions, while w ∈ W+ has 18×512 dimensions.
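The z → w → w+ shape arithmetic above can be made concrete with a toy sketch; the stand-in "mapping network" below is hypothetical (just an elementwise nonlinearity, not StyleGAN's learned 8-layer MLP), but the dimensions match the 1024×1024 StyleGAN case:

```python
import random

LATENT_DIM = 512   # dimensionality of z and of w in StyleGAN
NUM_LAYERS = 18    # generator layers of the 1024x1024 StyleGAN

def toy_mapping_network(z):
    """Stand-in for StyleGAN's 8-layer MLP mapping network; it only
    illustrates the z -> w shape (512 -> 512), not learned behavior."""
    return [max(0.0, v) for v in z]  # hypothetical elementwise nonlinearity

z = [random.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]  # z ~ N(0, I)
w = toy_mapping_network(z)
# W+ supplies a separate w to each generator layer: 18 x 512 in total.
w_plus = [list(w) for _ in range(NUM_LAYERS)]
print(len(w), len(w_plus), len(w_plus[0]))  # → 512 18 512
```

The extra degrees of freedom in W+ (one w per layer) are what make it easier to embed arbitrary real images.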

(3) S space:

The S space is proposed to achieve disentanglement in the spatial dimension instead of at the semantic level.

Spatial entanglement arises from the intrinsic complexity of style-based generators and the spatial invariance of AdaIN.

Recent methods [63], [67] have used learned affine transformations to turn z ∈ Z or w ∈ W into channel-wise style parameters s for each layer of the generator. By directly intervening on the style codes s ∈ S, both methods can achieve fine-grained control over local translations.

(4) P space: I didn't quite understand this one...

4.2 Inversion methods

(1) Learning-based GAN inversion

The learning-based approach trains an encoder to map an image to its latent code; it often achieves better performance than direct optimization and does not fall into local optima.
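The idea can be reduced to a toy sketch (a hypothetical one-parameter generator and encoder, not any architecture from the survey): train an encoder E so that G(E(x)) ≈ x over a dataset, then inversion of a new image is a single forward pass.

```python
# Toy learning-based inversion: fixed scalar generator G(z) = 2z,
# so the ideal encoder is E(x) = 0.5 * x; learn that by gradient descent.
def G(z):
    return 2.0 * z

w = 0.0                       # encoder parameter: E(x) = w * x
lr = 0.01                     # learning rate
data = [1.0, -2.0, 3.0, 0.5]  # hypothetical training "images"
for _ in range(500):
    for x in data:
        # loss = (G(E(x)) - x)^2 ; d(loss)/dw = 2*(2*w*x - x) * 2*x
        grad = 2.0 * (G(w * x) - x) * 2.0 * x
        w -= lr * grad
print(round(w, 3))  # → 0.5
```

Once trained, the encoder amortizes inversion cost across images, which is why it avoids the per-image local optima of direct optimization.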

(2) Optimization-based GAN inversion

(3) Hybrid GAN inversion

4.3 Characteristics of GAN inversion methods

1. Supported resolution

2. Semantic awareness

3. Layer-wise

4. Out-of-distribution

4.4 Latent space navigation

5. Applications

5.1 Image manipulation

5.2 Image generation

5.3 Image restoration

5.4 Image interpolation
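Image interpolation morphs between two inverted images by interpolating their latent codes and decoding the intermediate points; the simplest scheme is linear interpolation (a sketch on plain lists; the function name is ours):

```python
def lerp(z1, z2, t):
    """Linear interpolation between two latent codes of equal length:
    t = 0 gives z1, t = 1 gives z2, intermediate t morphs between them."""
    return [(1.0 - t) * a + t * b for a, b in zip(z1, z2)]

z1 = [0.0, 1.0, -1.0]  # latent code of the first inverted image
z2 = [2.0, 1.0, 1.0]   # latent code of the second inverted image
print(lerp(z1, z2, 0.5))  # → [1.0, 1.0, 0.0]
```

Feeding each interpolated code through the generator yields a smooth visual transition, which is only meaningful because inversion placed both images in the same latent space.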

5.5 Style transfer

5.6 Compressive sensing

5.7 Semantic diffusion

5.8 Category transfer

5.9 Adversarial defense

5.10 3D reconstruction

5.11 Image understanding

5.12 Multimodal learning

5.13 Medical imaging

6. Challenges and future directions