A review of deep-learning-based Chinese character generation and font style transfer

Wang Chen, Wu Guohua, Yao Ye, Ren Yizhi, Wang Qiuhua, Yuan Lifeng (School of Cyberspace, Hangzhou Dianzi University, Hangzhou 310018)

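Abstract
Chinese character font style transfer aims to transform the glyph shape of a character while keeping its semantic content unchanged. Because deep learning performs well on image style transfer tasks, Chinese character generation can start from character images and use this technology to convert fonts, which reduces manual intervention in font design and lightens its workload. However, how to improve the quality of the generated images remains a problem that urgently needs to be solved. This paper first systematically reviews current work on Chinese character font style transfer and divides it into three categories: methods based on convolutional neural networks (CNN), auto-encoders (AE), and generative adversarial networks (GAN). It then compares 22 Chinese character font style transfer methods in terms of their dataset-size requirements and their applicability to conversions between different font categories, and summarizes the characteristics of these methods, including refining character image features, relying on pre-trained models to extract effective features, and supporting de-stylization. In addition, an image dataset of simplified and traditional Chinese characters in multiple fonts is constructed according to the Chinese character radical index table, and representative font style transfer methods are selected for comparative experiments that convert a source font (FangSong) into target fonts (printed and handwritten). The generated results of four representative methods, namely Rewrite2, zi2zi, TET-GAN (texture effects transfer GAN), and Unet-GAN, are presented and analyzed. Finally, the current state and challenges of the field are summarized and its future directions are discussed. Because Chinese characters are vast in number and diverse in style, deep-learning-based Chinese character generation and font style transfer is not yet mature. Future work in this field will further explore directions such as integrating the stylization and de-stylization of Chinese characters and extracting character features effectively, so that font design becomes more flexible and personalized.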
Keywords
Review of Chinese character generation and font transfer based on deep learning

Wang Chen, Wu Guohua, Yao Ye, Ren Yizhi, Wang Qiuhua, Yuan Lifeng (School of Cyberspace, Hangzhou Dianzi University, Hangzhou 310018, China)

Abstract
Deep learning has recently proven effective for image style transfer tasks. Chinese character font transfer converts the font attribute of a character while preserving its content. With deep learning, the workload of font design for Chinese characters can be reduced effectively and the need for human intervention is lessened. However, the quality of the generated images remains a challenging open issue. This review analyzes the most representative image generation and font transfer methods for Chinese characters. Contemporary font transfer methods for Chinese characters are systematically summarized and divided into three categories: 1) convolutional neural network based (CNN-based), 2) auto-encoder based (AE-based), and 3) generative adversarial network based (GAN-based). To avoid losing information during data reconstruction, a convolutional neural network extracts image features without changing the dimensions of the data. An auto-encoder processes the data through a deep neural network to learn the distribution of real samples and generate realistic fake samples. Generative adversarial networks became popular in Chinese character font transfer after being proposed by Goodfellow. A GAN generally consists of a generator and a discriminator. Its core idea comes from the Nash equilibrium of game theory, which is reflected in the continuous, alternating optimization of the generator and the discriminator. The generator learns the distribution of the real data, generates fake images, and tries to induce the discriminator to make wrong decisions. The discriminator tries to determine whether its input is real or fake. Through this game, the discriminator eventually can no longer distinguish real images from fake ones.
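As an illustration of the generator-discriminator game described above, the standard minimax objective from the original GAN formulation can be written as follows. This is the generic GAN objective, not necessarily the exact loss used by any of the surveyed font transfer methods; the symbols $G$, $D$, $p_{\text{data}}$, and $p_z$ are notation introduced here and do not appear in the abstract.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

Here $D(x)$ is the discriminator's estimate of the probability that $x$ is a real sample and $G(z)$ is the image generated from noise $z$. At the equilibrium of this game the discriminator outputs $1/2$ everywhere, which corresponds to being unable to tell real images from generated ones.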
According to how the font style features of Chinese characters are learned, we divide the GAN-based methods into three categories: 1) self-learned font style features, 2) external font style features, and 3) extracted font style features. We introduce twenty-two font transfer methods for Chinese characters and summarize their performance in terms of dataset requirements, supported font categories, and evaluation of the generated images. The key characteristics of these methods are introduced, compared, and analyzed, including refining Chinese character features, relying on a pre-trained model for effective feature extraction, and supporting de-stylization. According to the unified table of radicals for Chinese characters, we build a data set consisting of 6 683 simplified and traditional characters in five fonts. Comparative experiments that transform a source font (simfs.ttf) into target fonts (printed and handwritten) are carried out on this same data set, covering four representative font transfer methods for Chinese characters: Rewrite2, zi2zi, TET-GAN, and Unet-GAN. The quantitative evaluation metrics are root mean square error (RMSE) and pixel-level accuracy (pix_acc), and several generated results of each method are shown for comparison. According to both the subjective and the objective evaluation of the generated images, the strokes of characters generated by Unet-GAN are the most complete and clear, which makes it suitable for transferring and generating both printed and handwritten fonts. In contrast, Rewrite2, zi2zi, and TET-GAN are better suited to font transfer for printed characters, and their ability to generate the strokes of Chinese characters needs to be improved.
We summarize several challenging issues, including blurred strokes in generated Chinese characters, immature methods for multi-domain transformation, and the need for large-scale training data sets. Future research can be extended in three directions: 1) integrating the stylization and de-stylization of Chinese characters, 2) reducing the size of the required data set, and 3) extracting features of Chinese characters more effectively. Furthermore, these techniques could be combined with information hiding technology for document watermarking and for embedding secret messages.
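The two quantitative metrics named in the abstract, RMSE and pix_acc, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the abstract does not define pix_acc, so it is assumed here to be the fraction of pixels whose binarized value matches the target, and the function names, the `threshold` parameter, and the assumption that images are grayscale arrays scaled to [0, 1] are all hypothetical.

```python
import numpy as np


def rmse(generated, target):
    """Root mean square error between two images of identical shape."""
    g = np.asarray(generated, dtype=np.float64)
    t = np.asarray(target, dtype=np.float64)
    if g.shape != t.shape:
        raise ValueError("images must have the same shape")
    return float(np.sqrt(np.mean((g - t) ** 2)))


def pix_acc(generated, target, threshold=0.5):
    """Fraction of pixels that agree after binarizing both images.

    Assumption: pixel values lie in [0, 1] and a pixel counts as
    "on" when it is >= threshold. The paper may define this differently.
    """
    g = np.asarray(generated, dtype=np.float64)
    t = np.asarray(target, dtype=np.float64)
    if g.shape != t.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((g >= threshold) == (t >= threshold)))


# Tiny 2x2 example: the images differ in exactly one of four pixels.
target_img = np.array([[0.0, 1.0], [1.0, 0.0]])
generated_img = np.array([[0.0, 1.0], [0.0, 0.0]])
print(rmse(generated_img, target_img))     # 0.5
print(pix_acc(generated_img, target_img))  # 0.75
```

Lower RMSE and higher pix_acc both indicate that a generated glyph is closer to the target glyph; in practice the scores would be averaged over all characters in the test set.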
Keywords
