• Semiconductor Optoelectronics
  • Vol. 45, Issue 6, 960 (2024)
WEI Yiran, YI Junkai, ZHU Kequan, and TAN Lingling
Author Affiliations
  • College of Automation, Beijing Information Science and Technology University, Beijing 100192, CHN
    DOI: 10.16818/j.issn1001-5868.2024052602
    WEI Yiran, YI Junkai, ZHU Kequan, TAN Lingling. Deep Fusion-GAN Enhancement Model for Text-to-Image[J]. Semiconductor Optoelectronics, 2024, 45(6): 960

    Abstract

    A deep fusion generative adversarial network (DF-GAN) enhancement model combined with a self-attention mechanism is proposed to address the low semantic relevance, fuzzy details, and inadequate structural integrity common in text-to-image tasks. First, the bidirectional encoder representations from transformers (BERT) model is used to mine the semantic features of the text context, and it is combined with the deep text-image fusion block to match deep text semantics with regional image features. Second, a self-attention mechanism module is introduced at the architecture level as a supplement to the convolution module, strengthening long-range and multilevel dependencies. The experimental results demonstrate that the proposed enhancement model not only strengthens the semantic relationship between the text and image but also ensures precise details and overall structural integrity in the generated images.
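    The two mechanisms the abstract names can be illustrated with a minimal NumPy sketch: an affine text-image fusion step in the spirit of DF-GAN's fusion block (per-channel scale and shift predicted from a text embedding), followed by a SAGAN-style self-attention step in which every spatial position attends to all others. The weight shapes, the residual gain `gamma`, and all function names here are illustrative assumptions, not the paper's actual implementation.

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        """Numerically stable softmax along the given axis."""
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def affine_fusion(feat, text_emb, Ws, bs, Wb, bb):
        """Affine text-image fusion: modulate each channel of the image
        feature map with a scale and shift predicted from the text embedding."""
        gamma = Ws @ text_emb + bs                        # per-channel scale, shape (C,)
        beta = Wb @ text_emb + bb                         # per-channel shift, shape (C,)
        return feat * (1 + gamma)[:, None, None] + beta[:, None, None]

    def self_attention(feat, Wq, Wk, Wv, gamma=0.1):
        """Self-attention over a feature map: each spatial position attends
        to all others, capturing long-range dependencies that local
        convolutions miss; a residual connection preserves the input."""
        C, H, W = feat.shape
        x = feat.reshape(C, H * W)                        # flatten spatial positions
        q, k, v = Wq @ x, Wk @ x, Wv @ x                  # queries/keys (C', N), values (C, N)
        attn = softmax(q.T @ k, axis=-1)                  # (N, N) attention map, rows sum to 1
        out = v @ attn.T                                  # attention-weighted value aggregation
        return gamma * out.reshape(C, H, W) + feat        # scaled residual connection

    # Toy example with random weights (dimensions are illustrative).
    rng = np.random.default_rng(0)
    C, D, H, W = 8, 16, 4, 4                              # channels, text dim, spatial size
    feat = rng.standard_normal((C, H, W))
    text = rng.standard_normal(D)
    fused = affine_fusion(feat, text,
                          rng.standard_normal((C, D)), np.zeros(C),
                          rng.standard_normal((C, D)), np.zeros(C))
    refined = self_attention(fused,
                             rng.standard_normal((2, C)),
                             rng.standard_normal((2, C)),
                             rng.standard_normal((C, C)))
    print(refined.shape)
    ```

    Both operations keep the feature map's shape, so they can be stacked at every resolution of the generator; setting `gamma=0` in the attention step reduces it to the identity, which is why such modules are typically initialized near zero and learned gradually.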