A face style transfer network with changeable style intensity

Liao Yuanhong1, Qian Wenhua1, Cao Jinde2 (1. School of Information Science and Engineering, Yunnan University, Kunming 650504, China; 2. School of Mathematics, Southeast University, Nanjing 214135, China)

Abstract
Objective To address the shortcomings of face style transfer algorithms such as StarGAN (star generative adversarial network) and MSGAN (mode seeking generative adversarial network), namely poor learning of detail style, monotonous transfer effects, and distortion in the generated images, we propose MStarGAN (multilayer StarGAN), a face style transfer algorithm that reduces distortion and generates images with different style intensities. Method First, a pre-encoder is built with a feature pyramid network (FPN) to generate multilayer feature vectors that carry the detail features of an image, strengthening the detail style that the generated image can learn from the style image during style transfer. Second, the pre-encoder produces one style vector for the original image and one for the style image, and the two are combined; style transfer is then performed with the combined style vector, so that the generated images exhibit different style transfer intensities. Finally, a weight demodulation algorithm is adopted as the style transfer module of the generator; it replaces the normalization operation on feature maps with an operation on the convolution weights, eliminating feature artifacts in the feature maps and reducing distortion in the generated images. Result In experiments on the Celeba_HQ dataset, compared with MSGAN and StarGAN v2, MStarGAN reduces the FID (Fréchet inception distance) by 18.9 and 3.1, respectively, and raises the LPIPS (learned perceptual image patch similarity) by 0.094 and 0.018 in reference-guided synthesis. In latent-guided synthesis, it reduces the FID by 20.2 and 0.8 and raises the LPIPS by 0.155 and 0.092, and it can generate result images with different style intensities. Conclusion The proposed algorithm can transfer the detail style of an image, generate output images with different style intensities, and reduce distortion in the generated images.
Keywords
MStarGAN: a face style transfer network with changeable style intensity

Liao Yuanhong1, Qian Wenhua1, Cao Jinde2(1.School of Information Science and Engineering, Yunnan University, Kunming 650504, China;2.School of Mathematics, Southeast University, Nanjing 214135, China)

Abstract
Objective The style transfer algorithm transfers the style of an art image to an original natural image. The style image provides features such as style texture and strokes, while the content image provides the contour structure. The goal of style transfer is to synthesize a new stylized image with the texture and strokes of the style image and the contour structure of the content image. Early face style transfer algorithms applied mathematical modeling to build filters that collect statistics of the local features of the target image and thereby describe its style. However, these algorithms generate only a single style, the resulting style is not pronounced, and the models must be built manually, which limits their efficiency. With the rise of deep learning, style transfer algorithms have adopted deep learning models as their core. Given that a generative adversarial network (GAN) can generate images that follow a given distribution, training a GAN can produce target images that resemble real images, so GANs have been widely used in image style transfer. Mainstream image style transfer algorithms fall into two categories. Algorithms in the first category only improve the GAN itself without using a pre-encoder, such as pix2pix and CycleGAN, while those in the second category add a pre-encoder before the GAN structure; the network becomes more complex but achieves highly realistic results, as in StyleGAN and StarGAN. To overcome the shortcomings of face style transfer algorithms such as StarGAN and MSGAN, namely poor detail style learning, insignificant style transfer effects, and distorted generated images, we present a face style transfer algorithm with controllable style intensity called multilayer StarGAN (MStarGAN). Method First, we construct the pre-encoder with a feature pyramid network (FPN) to generate multilayer feature vectors containing image detail features. Compared with the original 1 × 64 feature vector, the FPN-based pre-encoder outputs a 6 × 256 feature vector that carries additional details of the original image, so the generated image can learn the detail style of the style image during style transfer. Second, we use the pre-encoder to generate a style vector for the original image and one for the style image and then combine them; the combined style vector drives the style transfer. By adjusting how many layers of this vector come from each image, the style of the generated image can be biased toward either the original or the style image, yielding different style transfer intensities. Third, we introduce a new loss function that keeps the style of the generated image balanced so that it is not biased too far toward either the original or the style image. Fourth, we apply weight demodulation as the style transfer module in the generator. The traditional AdaIN method has been shown to distort generated images; by replacing the normalization operation on the feature maps with an operation on the convolution weights, we eliminate feature artifacts in the feature maps and reduce distortion in the generated images.
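As a concrete illustration of the first step, the following is a minimal PyTorch sketch of an FPN-style pre-encoder that maps a face image to a 6 × 256 multilayer style code. The backbone depth, channel widths, the class name FPNStyleEncoder, and the mapping of pyramid levels to the six style layers are illustrative assumptions, not the paper's exact architecture.

# Minimal sketch: FPN-style pre-encoder producing a 6 x 256 style code.
# Channel sizes and the level-to-layer mapping are assumptions for
# illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNStyleEncoder(nn.Module):
    def __init__(self, style_dim=256, num_layers=6):
        super().__init__()
        # Bottom-up backbone: three downsampling stages (assumed depth).
        self.stage1 = nn.Sequential(nn.Conv2d(3, 64, 3, 2, 1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU())
        self.stage3 = nn.Sequential(nn.Conv2d(128, 256, 3, 2, 1), nn.ReLU())
        # Lateral 1x1 convs project every stage to a common width.
        self.lat1 = nn.Conv2d(64, 256, 1)
        self.lat2 = nn.Conv2d(128, 256, 1)
        self.lat3 = nn.Conv2d(256, 256, 1)
        # One head per style layer; here two heads per pyramid level.
        self.heads = nn.ModuleList(
            [nn.Linear(256, style_dim) for _ in range(num_layers)])

    def forward(self, x):
        c1 = self.stage1(x)
        c2 = self.stage2(c1)
        c3 = self.stage3(c2)
        # Top-down pathway with lateral connections (standard FPN fusion).
        p3 = self.lat3(c3)
        p2 = self.lat2(c2) + F.interpolate(p3, scale_factor=2, mode="nearest")
        p1 = self.lat1(c1) + F.interpolate(p2, scale_factor=2, mode="nearest")
        # Global-pool each level; coarse and fine features both reach the code.
        feats = [F.adaptive_avg_pool2d(p, 1).flatten(1) for p in (p3, p2, p1)]
        styles = [self.heads[i](feats[i // 2]) for i in range(len(self.heads))]
        return torch.stack(styles, dim=1)  # shape: (batch, 6, 256)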
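The second step, the layer-wise combination of the two style codes, can be sketched as follows. The specific rule that the first k of the six layers come from the reference (style) image is our own assumption consistent with the abstract; k then acts as the style-intensity knob.

# Sketch: layer-wise style mixing with an intensity knob k (assumed rule).
import torch

def mix_styles(s_src: torch.Tensor, s_ref: torch.Tensor, k: int) -> torch.Tensor:
    """s_src, s_ref: (batch, 6, 256) codes from the pre-encoder; 0 <= k <= 6."""
    mixed = s_src.clone()
    mixed[:, :k] = s_ref[:, :k]  # take the first k layers from the reference
    return mixed

# Usage idea: sweep k to render the same face at increasing style strength
# (encoder, generator, x_src, and x_ref are assumed to exist):
# for k in range(7):
#     out = generator(x_src, mix_styles(encoder(x_src), encoder(x_ref), k))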
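For the fourth step, the abstract adopts weight demodulation in place of AdaIN; the sketch below follows the well-known StyleGAN2 formulation of that operation, taking a single 256-dimensional layer of the style code per block. Class and parameter names are illustrative.

# Sketch: modulated convolution with weight demodulation (StyleGAN2-style).
# The style scales the conv weights per input channel, and each output
# filter is rescaled by the inverse of its L2 norm, so no per-feature-map
# normalization (and none of AdaIN's droplet artifacts) is needed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModulatedConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel=3, style_dim=256):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel, kernel))
        self.affine = nn.Linear(style_dim, in_ch)  # style -> per-channel scale
        self.affine.bias.data.fill_(1.0)           # start near identity scale
        self.pad = kernel // 2

    def forward(self, x, style):
        b, cin, h, w = x.shape
        s = self.affine(style).view(b, 1, cin, 1, 1)
        weight = self.weight.unsqueeze(0) * s       # modulate: (B,Cout,Cin,k,k)
        # Demodulate: normalize each output filter's L2 norm to 1.
        demod = torch.rsqrt(weight.pow(2).sum(dim=(2, 3, 4), keepdim=True) + 1e-8)
        weight = weight * demod
        # Grouped conv applies a different weight tensor per batch sample.
        weight = weight.view(-1, cin, *self.weight.shape[2:])
        out = F.conv2d(x.reshape(1, b * cin, h, w), weight,
                       padding=self.pad, groups=b)
        return out.view(b, -1, h, w)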
Result We implement our model in Python and test it on the Celeba_HQ dataset with an RTX 2080Ti GPU. Our model not only generates high-quality random face images but also lets the generated images learn the style of the style images, including hair and skin color. Compared with the multimodal unsupervised image-to-image translation (MUNIT), diverse image-to-image translation (DRIT), MSGAN, and StarGAN v2 algorithms in the latent-guided synthesis experiment, the Fréchet inception distance (FID) of the proposed algorithm is reduced by 18.5, 39.2, 20.2, and 0.8, respectively, while its learned perceptual image patch similarity (LPIPS) is increased by 0.181, 0.366, 0.155, and 0.092, respectively. In the reference-guided synthesis experiment, the FID of the proposed algorithm is reduced by 86.4, 32.6, 18.9, and 3.1, respectively, while its LPIPS is increased by 0.23, 0.095, 0.094, and 0.018, respectively. In sum, our algorithm can generate result images with different styles and intensities. Conclusion The proposed algorithm can transfer the detail style of the image, control the style intensity of the output image, and reduce the distortion of the generated image.
Keywords
