Chinese Optics Letters, Vol. 23, Issue 12, 120501 (2025)
Yunrui Wang1, Wenqiang Wan1,*, Jiahui Fu1, and Yanfeng Su2
Author Affiliations
  • 1School of Science, East China Jiaotong University, Nanchang 330013, China
  • 2College of Optical and Electronic Technology, China Jiliang University, Hangzhou 310018, China
DOI: 10.3788/COL202523.120501
Yunrui Wang, Wenqiang Wan, Jiahui Fu, and Yanfeng Su, "Complex-valued dense atrous neural network for high-quality computer-generated holography," Chin. Opt. Lett. 23, 120501 (2025)

    Abstract

In this paper, we propose, to our knowledge, a new complex-valued dense atrous neural network (CDANN) for phase-only hologram (POH) generation. The network architecture integrates a complex-valued partial convolution (C-PConv) module into the down-sampling stages of dual U-Net structures, enhancing computational efficiency through selective channel-wise processing. To improve feature extraction, we introduce a novel complex-valued dense atrous convolution (DAC) module, which employs four cascaded branches with multi-scale atrous convolutions to capture intricate features while maintaining spatial resolution. Additionally, we integrate a spatial pyramid pooling (SPP) module into the U-Net architecture to encode multi-scale contextual features derived from the DAC module. This hierarchical integration expands the U-Net's receptive field while facilitating cross-layer feature fusion. The proposed method achieves an average peak signal-to-noise ratio (PSNR) of 32.19 dB and an average structural similarity index measure (SSIM) of 0.892 within a running time of 24 ms, outperforming conventional approaches. Experiments confirm significant improvements in both reconstruction quality and computational efficiency, making the CDANN suitable for real-time holographic displays.
    © 2025 Chinese Optics Letters

    1. Introduction

The computer-generated hologram (CGH) has emerged as a solution for real-time holographic display[1]. In contrast to traditional optical holography based on physical optical processes, CGH accomplishes the computation and generation of holograms within a computational environment. With the remarkable enhancement in computational capabilities for data processing and the rapid advancements in digital imaging technologies, CGH not only improves the computational efficiency but also significantly enriches the complexity and detail of the generated holograms[2,3]. However, constrained by the modulation mechanism of the spatial light modulator (SLM), complex amplitude holograms must be converted into phase-only holograms (POHs) or amplitude-only holograms. POHs have emerged as the preferred choice for CGH due to their higher diffraction efficiency and the absence of conjugate images during reconstruction[4,5].

The paramount challenge in CGH technology lies in striking a balance between the quality of reconstructed images and computational efficiency[6]. Traditional iterative algorithms, such as the Gerchberg–Saxton (GS) algorithm[7–10] and stochastic gradient descent (SGD) algorithm[11,12], typically require several tens of iterations to achieve high-quality reconstructed images, resulting in time-consuming CGH generation. Non-iterative algorithms, such as double-phase amplitude encoding[13,14] and error diffusion methods[15], can offer faster computation speeds, but the reconstructed images are prone to speckle noise and artifacts[16,17].

    Given the limitations of traditional CGH algorithms, deep-learning-based CGH has been introduced to address the trade-off between computational efficiency and the quality of reconstructed images[18]. Learning-based CGH algorithms are primarily divided into two categories: data-driven deep learning and model-driven deep learning. Data-driven CGH algorithms tackle inverse problems by learning the encoding strategies of traditional algorithms, necessitating the construction of a large-scale dataset comprising target images and corresponding CGHs[19]. Horisaki et al. pioneered the use of convolutional neural networks (CNNs) to infer holograms from handwritten digit images[20]. Sinha et al. demonstrated the potential of deep neural networks for solving end-to-end inverse problems in computational imaging[21]. Kavaklı et al. proposed a single complex-valued point spread function to optimize the propagation of POH towards the target plane[22]. However, the requirement for large-scale labeled CGH datasets in these approaches increases the computational burden, and the quality of these datasets limits the generalization capability of CGHs. To address these two issues, model-driven deep learning has been proposed for the generation of CGHs by integrating physical diffraction models into the network architecture[23]. Model-driven CGH algorithms enable unsupervised auto-learning of the latent encodings of POHs, and the loss function between the image dataset and the output reconstruction can be directly computed, eliminating the need for labeled datasets and directly predicting the POHs[24]. For instance, Peng et al. introduced a neural network architecture, HoloNet, that uses an in-loop camera algorithm to reduce the mismatches between reconstructed and target images for generating high-quality holograms[25]. Wu et al. developed Holo-encoder based on an autoencoder neural network architecture, which can autonomously learn the latent encoding of POHs in an unsupervised manner[26]. Liu et al. developed a 4K diffraction-model-driven network (4K-DMDNet) that incorporates strict frequency-domain constraints to enhance the quality of reconstructed images[27]. Building on this, Song et al. constructed a real-time 3D holographic imaging framework capable of processing dynamic real-world scenes by integrating physical models with deep learning architectures[28]. Zhong et al. employed complex-valued convolutional neural networks (CCNNs) to directly learn the complex-valued wave field at the spatial light modulator plane, enabling the generation of high-quality POHs[29]. However, these methods primarily rely on the U-Net architecture for image feature extraction, which results in inefficient utilization of computational resources and fails to fully exploit the neural network’s receptive field for processing extracted features. These limitations not only constrain further advancements in reconstruction quality but also contribute to increased computational complexity.

    In this paper, we propose a novel complex-valued dense atrous neural network (CDANN) for generating POHs, which demonstrates superior performance in both reconstruction quality and computational efficiency compared to conventional neural network approaches. The network architecture integrates a complex-valued partial convolution (C-PConv) module into the down-sampling stages of dual U-Net structures, enhancing computational efficiency through selective channel-wise processing. To enhance feature extraction, we introduce a novel complex-valued dense atrous convolution (DAC) module that incorporates four cascade branches with multi-scale atrous convolutions, enabling the capture of sophisticated features while preserving spatial information. Additionally, a spatial pyramid pooling (SPP) module is integrated into the U-Net structure to encode multi-scale contextual features derived from the DAC module. This architectural enhancement not only enlarges the network’s receptive field but also enables cross-layer feature fusion. Both simulation and experimental results demonstrate that the proposed CDANN achieves high-quality image reconstruction while significantly reducing computational time and resource requirements.

    2. Principles and Methods

    2.1. CDANN algorithm principle

The schematic representation of the proposed network architecture is delineated in Fig. 1, encompassing two distinct networks: the complex amplitude prediction network (CAPN) and the hologram generation network (HGN). Initially, the complex amplitude of the target plane is predicted by the CAPN, where the amplitude of the input image A is multiplied by a zero phase as the initial input complex amplitude. The resulting complex-valued wavefront in the target plane, modulated by the amplitude of input image A, can be mathematically represented as

$$H_1 = A\exp(i\varphi),$$

where φ is the phase predicted by the CAPN. The diffraction simulation from the target plane to the SLM plane can be accomplished through the forward-propagation angular spectrum method (ASM)[30], which can be expressed as

$$H_2 = \mathcal{F}^{-1}\left\{\mathcal{F}[H_1]\cdot T_1(f_x,f_y)\right\},$$

$$T_1(f_x,f_y)=\begin{cases}\exp\!\left(i\dfrac{2\pi}{\lambda}z\sqrt{1-\lambda^2 f_x^2-\lambda^2 f_y^2}\right), & \text{if } \sqrt{f_x^2+f_y^2}<\dfrac{1}{\lambda},\\[4pt] 0, & \text{otherwise},\end{cases}$$

where F[·] and F⁻¹[·] represent the Fourier transform and the inverse Fourier transform, respectively. T1(fx, fy) refers to the ASM transfer function, with fx and fy representing the spatial frequencies along the orthogonal x- and y-axes, respectively. The propagation distance z characterizes the separation between the target plane and SLM plane, while λ corresponds to the operational wavelength of the optical system.
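For concreteness, the following is a minimal PyTorch sketch of this ASM propagation. It implements the transfer function above directly; the band-limiting refinement of Ref. [30] is omitted, and the helper name asm_propagate is ours, not the authors'.

```python
import torch

def asm_propagate(field: torch.Tensor, z: float, wavelength: float,
                  pitch: float) -> torch.Tensor:
    """Angular spectrum propagation of a complex field over distance z.

    Minimal sketch of the forward ASM above; the band-limiting
    refinement of Ref. [30] is omitted for brevity.
    """
    ny, nx = field.shape[-2:]
    fx = torch.fft.fftfreq(nx, d=pitch)                  # spatial frequencies
    fy = torch.fft.fftfreq(ny, d=pitch)
    fyy, fxx = torch.meshgrid(fy, fx, indexing="ij")

    # Keep only propagating waves; evanescent components are zeroed.
    arg = 1.0 - (wavelength * fxx) ** 2 - (wavelength * fyy) ** 2
    kz = (2 * torch.pi / wavelength) * torch.sqrt(arg.clamp(min=0.0))
    transfer = torch.where(arg > 0, torch.exp(1j * z * kz),
                           torch.zeros((), dtype=torch.complex64))

    return torch.fft.ifft2(torch.fft.fft2(field) * transfer)

# Example: propagate the CAPN-modulated wavefront H1 to the SLM plane,
# assuming the parameters of Sec. 3 (532 nm, 4.5 um pitch, 20 cm).
# h2 = asm_propagate(amplitude * torch.exp(1j * phi), z=0.20,
#                    wavelength=532e-9, pitch=4.5e-6)
```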

Figure 1. Detailed structure of the proposed network architecture.

    The HGN architecture generates the complex amplitude distribution H3 at the SLM plane, which is subsequently translated into a POH. The reconstructed image of the POH can be calculated using the backward propagation ASM from the SLM plane to the image plane. To optimize the network performance, the training procedure incorporates a mean squared error (MSE) loss function to evaluate the deviation between the reconstructed image and the ground truth distribution. Through iterative backpropagation optimization, the network parameters are systematically adjusted to minimize the loss function, consequently improving both the reconstruction fidelity and the phase encoding accuracy of the generated POH.
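A hedged sketch of this unsupervised objective, reusing the asm_propagate helper above, might look as follows; the variable names h3 and target are illustrative, not taken from the paper.

```python
import torch.nn.functional as F

# h3: complex amplitude predicted by the HGN at the SLM plane.
poh = torch.angle(h3)                      # encode as a phase-only hologram
recon = asm_propagate(torch.exp(1j * poh), z=-0.20,   # backward ASM
                      wavelength=532e-9, pitch=4.5e-6)
loss = F.mse_loss(recon.abs(), target)     # MSE against the ground truth
loss.backward()                            # backpropagate through both networks
```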

    2.2. Structure of CDANN

The detailed architectures of the CAPN and HGN, as delineated in Fig. 2, are predicated on a modified U-Net framework. The HGN incorporates three hierarchical down-sampling (DS) blocks and three corresponding up-sampling (US) blocks, complemented by skip connections, a DAC module, and an SPP module. Each DS block comprises a C-PConv layer followed by a complex-valued rectified linear unit (CRelu) layer, while the first two US blocks pair a complex-valued deconvolutional (CDeconv) layer with a CRelu layer. The final US block (denoted as US1) maintains a simplified structure with a single CDeconv layer to preserve the complex amplitude characteristics of the output. The skip connections enable gradient flow during backpropagation, enhancing the network's learning capability. The convolutional operations employ 3×3 kernels with a stride of 2 for feature extraction, while the deconvolutional layers utilize 4×4 kernels with a matching stride for feature reconstruction. The complex-valued convolutional operations are realized through dual real-valued kernels that independently process the real and imaginary components of the input signal, and nonlinear transformations are achieved through CRelu layers, which apply ReLU activation separately to the real and imaginary components of the complex feature maps (a minimal code sketch of these operations follows Fig. 2). In comparison to the HGN, the CAPN adds an additional US1 layer and a complementary DS layer, forming an enhanced symmetrical structure. This deeper design strengthens the network's ability to discern intricate features at higher levels of the computational hierarchy; when processing high-dimensional data such as that encountered in CGH, the CAPN can therefore extract richer information. Such dedicated handling of complex-valued data is pivotal for advancing deep learning algorithms for CGH.

Figure 2. Detailed structures of the (a) CAPN and (b) HGN.
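As a concrete illustration of these building blocks, the sketch below implements a complex-valued convolution with dual real kernels and the CRelu nonlinearity. The paper does not spell out whether the two kernels perform full complex multiplication or filter each component independently; this sketch follows the common complex-CNN convention and should be read as an assumption.

```python
import torch
import torch.nn as nn

class CConv2d(nn.Module):
    """Complex convolution via two real-valued kernels (a sketch).

    Follows the usual complex-CNN convention:
    (Wr + iWi)(xr + ixi) = (Wr*xr - Wi*xi) + i(Wr*xi + Wi*xr).
    """
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=2, padding=1):
        super().__init__()
        self.conv_r = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
        self.conv_i = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)

    def forward(self, x):                       # x: complex-valued tensor
        real = self.conv_r(x.real) - self.conv_i(x.imag)
        imag = self.conv_r(x.imag) + self.conv_i(x.real)
        return torch.complex(real, imag)

def crelu(x):
    """CRelu: ReLU applied separately to the real and imaginary parts."""
    return torch.complex(torch.relu(x.real), torch.relu(x.imag))
```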

To improve computational efficiency, we introduce a C-PConv layer in the down-sampling process[31]. This design addresses the significant redundancy observed in feature maps, where traditional convolution operations unnecessarily process all input channels, resulting in increased computational burden and inefficient memory usage. The proposed layer selectively applies complex-valued convolution to a subset of input channels based on feature map redundancy characteristics, preserving critical feature information while maintaining the remaining channels unchanged. This selective processing strategy retains essential feature map information while reducing redundant computations. The processed and unprocessed channels are subsequently concatenated along the channel dimension to produce the complete output feature map. This approach maintains robust feature extraction capabilities while decreasing computational requirements, ultimately enhancing both model efficiency and practical performance. The computational procedure is formally expressed as

$$\mathrm{Out}=\mathrm{Concat}\left\{\mathrm{Conv}[S_1(x)],\,S_2(x)\right\}.$$

Here, x denotes the input feature map, and S1 and S2 represent the channel-wise splitting operations. The complex-valued convolution (Conv) processes one partition while leaving the other unchanged, and the output feature map (Out) is obtained through channel-wise concatenation (Concat) of both components.
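A minimal sketch of this partial convolution, built on the CConv2d above, is shown below; the split ratio of 1/4 follows Ref. [31] and is an assumption rather than a value reported in the paper.

```python
class CPConv2d(nn.Module):
    """Complex-valued partial convolution (sketch of the equation above)."""
    def __init__(self, channels, ratio=0.25):    # ratio assumed, per Ref. [31]
        super().__init__()
        self.n1 = max(1, int(channels * ratio))  # channels routed through S1
        self.conv = CConv2d(self.n1, self.n1, stride=1)  # spatial size kept

    def forward(self, x):
        x1, x2 = torch.split(x, [self.n1, x.shape[1] - self.n1], dim=1)
        # Out = Concat{Conv[S1(x)], S2(x)}: convolve one partition, pass the
        # other through untouched, then rejoin along the channel dimension.
        return torch.cat([self.conv(x1), x2], dim=1)
```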

In the context of semantic segmentation and other dense prediction tasks, a notable reduction in the spatial resolution of the resulting feature maps is observed, attributable to the successive application of max pooling and striding across the layered architecture of the network. To ameliorate this, we adopt atrous convolution, a technique initially conceived for the efficient computation of wavelet transforms[32]. This methodology affords the computation of responses at arbitrary resolutions for any layer within the network, and it can be directly applied to pre-trained networks or seamlessly integrated into the training regimen. Mathematically, atrous convolution of a two-dimensional signal is computed as

$$y[i,j]=\sum_{m=1}^{M}\sum_{n=1}^{N}x[i+r_h\cdot m,\,j+r_w\cdot n]\cdot \omega[m,n],$$

where x[i, j] represents the input feature map and y[i, j] is the output. M and N are the numbers of kernel elements in the vertical and horizontal directions, respectively; ω[m, n] is the kernel weight at position (m, n); and rh and rw are the dilation rates in the vertical and horizontal directions, respectively. See Fig. 3 for an illustration (a short example follows the figure). By inserting dilated intervals within the convolution kernel, atrous convolution enlarges the receptive field while preserving computational efficiency and parameter count. This design not only enhances contextual information aggregation but also facilitates multi-scale feature extraction without compromising spatial resolution, and the improved feature fusion capability further increases model performance and architectural flexibility.

Figure 3. Illustration of the atrous convolution.
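In PyTorch, atrous convolution is exposed through the dilation argument of nn.Conv2d; the snippet below (real-valued for brevity, with illustrative dilation rates) verifies that a dilated 3×3 kernel keeps the spatial size while its effective extent grows as k + (k − 1)(r − 1).

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 64, 64)
for r in (1, 2, 4):                        # illustrative dilation rates
    conv = nn.Conv2d(8, 8, kernel_size=3, dilation=r, padding=r)
    extent = 3 + 2 * (r - 1)               # effective kernel extent
    print(conv(x).shape, extent)           # spatial size preserved: 64 x 64
```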

As depicted in Fig. 4, the DAC module consists of four sequentially integrated branches with atrous convolutions organized in a hierarchical, stacked manner. As the number of atrous convolutional layers increases along the branches, the receptive field of each branch expands incrementally to 3, 7, 9, and 19. Each branch includes a 1×1 convolution for linear activation, which enhances the module's feature discrimination capacity. The mathematical formulation of this operation is

$$\mathrm{Out}=x+\sum_{i=1}^{4}\mathrm{RL}\left[\mathrm{Conv}_i(x)\right],$$

where Convi represents the convolution operations along the i-th branch and RL denotes the activation function (a hedged implementation sketch follows Fig. 4). Convolutional layers with large receptive fields are particularly adept at capturing features of larger objects, providing more sophisticated and abstract information, whereas layers with smaller receptive fields exhibit heightened sensitivity to the nuances of smaller objects. By combining convolutional operations with varying attributes, the DAC module extracts and represents the diverse characteristics of objects across a range of sizes. This multi-tiered approach ensures a comprehensive representation of object features, enabling the network to discern both macroscopic and microscopic details with precision. The feature fusion employs direct channel-wise concatenation instead of weighted summation, preserving scale-specific information while maintaining inter-scale independence. This strategy prevents feature conflicts across scales, enabling optimal multi-scale feature utilization without interference-induced performance degradation.

Figure 4. Framework of the DAC module.
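The sketch below shows one plausible DAC layout, real-valued for brevity; the per-branch dilation rates (1, 3, 5) are inferred from the stated receptive fields of 3, 7, 9, and 19 and should be read as assumptions, and the paper's version additionally uses complex-valued convolutions.

```python
class DACBlock(nn.Module):
    """Dense atrous convolution block (sketch; dilation rates assumed)."""
    def __init__(self, ch):
        super().__init__()
        def d(r):                                   # dilated 3x3, 'same' size
            return nn.Conv2d(ch, ch, 3, padding=r, dilation=r)
        def one():                                  # 1x1 linear activation
            return nn.Conv2d(ch, ch, 1)
        self.branches = nn.ModuleList([
            nn.Sequential(d(1)),                    # receptive field 3
            nn.Sequential(d(3), one()),             # receptive field 7
            nn.Sequential(d(1), d(3), one()),       # receptive field 9
            nn.Sequential(d(1), d(3), d(5), one()), # receptive field 19
        ])

    def forward(self, x):
        out = x                                     # Out = x + sum_i RL[Conv_i(x)]
        for branch in self.branches:
            out = out + torch.relu(branch(x))
        return out
```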

    The SPP module is instrumental in retaining spatial information through the implementation of pooling operations within localized spatial domains. The dimensions of these spatial domains are scaled proportionally to the size of the input image, ensuring a consistent number of regions regardless of the image’s dimensions. The SPP module’s design accommodates input images of various aspect ratios and scales, offering processing flexibility. This capability allows for resizing input images to any desired scale, after which the same CNN can be applied.

As illustrated in Fig. 5, the proposed SPP module comprises four hierarchical levels and encodes global contextual information through four distinct receptive fields: 2×2, 3×3, 5×5, and 6×6. The outputs at each level yield feature maps of diverse spatial extents. To reduce the dimensionality of the weight matrices and the computational cost, a 1×1 convolution is applied after each pooling stage, compressing the feature maps to 1/n of their initial dimensionality, where n denotes the number of channels in the original feature maps. Finally, the original features are concatenated with the upsampled feature maps, amalgamating information from varying spatial scales. The computational process can be mathematically represented as

$$\mathrm{Out}=\mathrm{Conv}\left\{\mathrm{Concat}_{i=1}^{4}\left[\mathrm{MaxPool}_i(x)\right]+x\right\},$$

where MaxPooli represents the i-th complex-valued max pooling layer (a sketch follows Fig. 5). The SPP module employs complex-valued convolutions and multi-scale pooling operations, which not only strengthen the neural network's capacity to recognize objects across varying scales but also enhance the efficiency of processing phase and amplitude information through complex feature fusion. This integration leads to improved model performance and generalization when handling complex data.

Figure 5. Framework of the SPP module.
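The sketch below follows Fig. 5 with real-valued operations for brevity: four max-pooling scales, a 1×1 squeeze to a single channel at each level, bilinear upsampling back to the input size, and concatenation with the original features. The pooling strides and the final 1×1 fusion convolution are assumptions.

```python
import torch.nn.functional as F

class SPPBlock(nn.Module):
    """Spatial pyramid pooling block (sketch; real-valued for brevity)."""
    def __init__(self, ch):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=k) for k in (2, 3, 5, 6))
        self.squeeze = nn.ModuleList(nn.Conv2d(ch, 1, 1) for _ in range(4))
        self.fuse = nn.Conv2d(ch + 4, ch, 1)        # fuse pyramid + input

    def forward(self, x):
        h, w = x.shape[-2:]
        levels = [F.interpolate(sq(pool(x)), size=(h, w), mode="bilinear",
                                align_corners=False)
                  for pool, sq in zip(self.pools, self.squeeze)]
        return self.fuse(torch.cat([x, *levels], dim=1))
```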

    These modules operate synergistically to optimize the performance of the CDANN architecture. During feature extraction, the C-PConv module’s computational optimization enables more efficient propagation of enhanced feature representations to the subsequent DAC module. The DAC module employs a hierarchical multi-scale feature extraction strategy through parallel convolutional operations with distinct dilation rates, enabling comprehensive representation of object characteristics across varying spatial scales. The module’s design bridges the gap between macroscopic and microscopic feature extraction, allowing for precise discrimination of objects with diverse spatial characteristics. Meanwhile, the SPP module captures multi-scale features through pooling operations with varying kernel sizes, followed by upsampling, concatenation, and dimensionality reduction processes, improving the model’s ability to handle targets at different scales. The synergistic combination of these modules improves the U-Net architecture’s capability in multi-scale feature extraction and representation. This architectural enhancement leads to improved model performance across diverse datasets while maintaining strong generalization capabilities, particularly in computational holography applications.

    3. Simulation and Experiments

The proposed CDANN model was implemented in Python 3.10 with PyTorch 2.1.1. The network was trained on the super-resolution DIV2K dataset, with 800 images serving as the training set and 100 images as the validation set[33]. The laser wavelength was 532 nm, the pixel pitch of the POH was 4.5 µm, and the diffraction distance was set to 20 cm. For optimization, the CDANN used the Adam optimizer with an initial learning rate of 0.001. All computational simulations and training procedures were executed on an Intel(R) Xeon(R) Platinum 8160T CPU @ 2.10 GHz and an NVIDIA Tesla V100 16 GB GPU.
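A condensed sketch of this training configuration is given below, mirroring the unsupervised objective of Sec. 2.1; CDANN and div2k_loader are hypothetical stand-ins for the authors' unpublished implementation.

```python
model = CDANN().cuda()                         # hypothetical model class
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(50):                        # cf. convergence in Fig. 7(b)
    for target in div2k_loader:                # DIV2K amplitude images
        target = target.cuda()
        poh = model(target)                    # predicted phase-only hologram
        recon = asm_propagate(torch.exp(1j * poh), z=-0.20,
                              wavelength=532e-9, pitch=4.5e-6).abs()
        loss = torch.nn.functional.mse_loss(recon, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```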

    3.1. Simulation results

A comparison of the numerical reconstructions produced by the CDANN, CCNN, HoloNet, Holo-encoder, and GS algorithms is depicted in Fig. 6. The peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) were employed as the evaluation metrics. In our evaluation of the GS algorithm, we ran 1000 iterations and observed that convergence began at approximately the 100th iteration; based on this behavior, we adopted 100 iterations as the operating point for subsequent GS implementations. The test images, sourced from the DIV2K dataset, were resized to 1920 pixel × 1072 pixel, and the detail insets in Fig. 6 are shown at threefold magnification. The simulation results demonstrate that while the GS algorithm achieves a PSNR of 36.48 dB in image reconstruction, its computational efficiency is severely limited by a processing time of 36.75 s per frame, rendering it impractical for real-time holographic display applications. Although the Holo-encoder produces reconstructions in the shortest time, the resulting images exhibit lower PSNR and SSIM values than the other methods. The HoloNet and CCNN are capable of producing smooth reconstructed images, but distinct artifacts remain perceptible. In contrast to these methodologies, the proposed CDANN effectively mitigates the scattering noise and artifacts, providing high-quality reconstructed images.
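The PSNR and SSIM figures reported here can be reproduced with standard implementations, e.g., scikit-image; target and recon below are assumed to be grayscale arrays normalized to [0, 1].

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

psnr = peak_signal_noise_ratio(target, recon, data_range=1.0)
ssim = structural_similarity(target, recon, data_range=1.0)
```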

Figure 6. Comparison of numerical reconstruction images of POHs generated by several algorithms.

Figure 7(a) illustrates the relationship between the running time for generating a POH and the quality of reconstruction. The evaluation used a randomly sampled image from the DIV2K validation dataset, with both reconstruction time and PSNR measured systematically. The CDANN achieves nearly the fastest computation among all methods while sustaining high-quality reconstructed images. The iterative GS algorithm can enhance reconstruction quality by increasing the number of iterations, but this process is time-consuming. The proposed CDANN generates a high-quality 1920×1072 resolution POH with a PSNR of 32.34 dB within 24 ms, demonstrating superior performance compared to existing methods. Specifically, CDANN outperforms Holo-encoder by 8.53 dB in reconstruction quality while maintaining comparable computational efficiency, and it achieves a 13 ms reduction in generation time and a 1.50 dB improvement in reconstruction quality compared to the CCNN-based approach. These metrics indicate that our method significantly reduces both running time and computational resource requirements while minimizing the discrepancy between predicted and ground truth values, making it particularly suitable for real-time holographic display applications. Figure 7(b) illustrates the convergence behavior of the different methods by plotting the loss function versus training epochs: the CDANN reaches the minimum loss within 50 epochs, outperforming the baseline methods and confirming its superior optimization efficiency and stable convergence.

Figure 7. (a) Correlation between the running time for generating POH and the quality of reconstruction; (b) relationship between the loss function and the training epoch.

    To further assess the performance enhancement of the proposed CDANN relative to conventional CNN architectures, we conducted an ablation study utilizing 100 images from the DIV2K validation dataset. The quantitative evaluation, as detailed in Table 1, focuses on two key metrics: average PSNR (AvgPSNR) and average SSIM (AvgSSIM), providing a systematic comparison of reconstruction quality across different network configurations. The results demonstrate that CDANN achieves superior performance compared to traditional CNN, dense atrous neural network (DANN), and CCNN, with statistically significant improvements in both PSNR and SSIM metrics. Specifically, the conventional CNN architectures achieve an average PSNR of 28.15 dB and SSIM of 0.818 on the test dataset, highlighting their limitations in processing complex-valued holographic data. By incorporating dense atrous convolutional structures into the CNN, the DANN achieves improvements, with the average PSNR and SSIM increasing to 28.62 dB and 0.843, respectively, demonstrating enhanced capability in multi-scale feature extraction and structural detail preservation. The implementation of complex-valued convolutional operations in CCNN demonstrates performance gains, achieving an average PSNR of 29.38 dB and SSIM of 0.858. These results underscore the inherent advantages of complex-valued network architectures in effectively processing both phase and amplitude components of holographic data, leading to superior reconstruction quality compared to real-valued network implementations. The proposed CDANN architecture, which synergistically integrates complex-valued convolutional operations with dense atrous structures, achieves optimal performance with an average PSNR of 32.19 dB and SSIM of 0.892. These quantitative metrics not only validate the architectural efficacy of CDANN but also demonstrate its superior capability in artifact reduction and speckle noise suppression during holographic reconstruction.

Table 1. Ablation Study on Various CNNs

                 CNN      DANN     CCNN     CDANN
AvgPSNR (dB)     28.15    28.62    29.38    32.19
AvgSSIM          0.818    0.843    0.858    0.892

    To systematically evaluate the generalization capability of the proposed model, we conducted rigorous experiments on the Flickr2K dataset[34]. A representative test set of 200 randomly selected images was established to objectively assess practical performance. Comparative analysis was performed against three established models (Holo-encoder, HoloNet, and CCNN) alongside our proposed CDANN. Experimental results (Table 2) demonstrate CDANN’s superior performance, achieving state-of-the-art metrics of 32.85 dB PSNR and 0.902 SSIM. The comparative analysis revealed significant performance gaps: Holo-encoder (24.59 dB PSNR, 0.783 SSIM) exhibited limitations in noise suppression and detail preservation; HoloNet (27.62 dB PSNR, 0.816 SSIM) showed improvement but still did not meet the ideal application standards; while CCNN (30.83 dB PSNR, 0.858 SSIM) approached but still fell short of CDANN’s performance in critical aspects of detail recovery and structural similarity. In summary, the experimental results demonstrate the robust cross-dataset generalization capability of the proposed model. The model consistently generates high-fidelity holograms with significantly improved reconstruction quality and superior detail preservation. These findings confirm the model’s distinct advantages in complex practical applications, particularly when handling diverse data distributions, where it exhibits remarkable performance.

Table 2. Quantitative Comparison of Four Algorithms Using PSNR/SSIM (Flickr2K)

                 Holo-encoder    HoloNet    CCNN     CDANN
AvgPSNR (dB)     24.59           27.62      30.83    32.85
AvgSSIM          0.783           0.816      0.858    0.902

    3.2. Experiments and results

    To validate the effectiveness of our proposed methodology, we have implemented an experimental prototype as illustrated in Fig. 8. The system employed a 532 nm laser source, which was spatially filtered and collimated before illuminating a phase-only SLM with a 4.5 µm pixel pitch and 1920×1080 resolution. The SLM-modulated wavefront was then processed through a 4f optical system consisting of two 250 mm Fourier lenses and a filter, effectively suppressing zero-order and higher-order diffraction components. Prior to being loaded onto the SLM, a hologram with 1920×1072 resolution underwent zero-padding preprocessing to match the SLM’s native format. To ensure that the SLM operates exclusively in phase-modulation mode, a polarizer was employed to regulate the polarization orientation of the illuminating light source. Ultimately, the holographic reconstruction images were captured by a camera (Canon EOS 90D), with meticulous attention to maintaining consistent imaging conditions throughout the experiments. The entire optical setup was mounted on a vibration-isolated platform to eliminate mechanical disturbances and ensure measurement reproducibility.
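As a small illustration, the zero-padding step can be written as below; the symmetric 4-pixel split in height is an assumption, since the paper does not state how the 8 missing rows are distributed.

```python
import torch.nn.functional as F

# poh: (1072, 1920) phase map; pad height to the SLM's native 1080 rows.
poh_padded = F.pad(poh, (0, 0, 4, 4))   # (left, right, top, bottom), zeros
```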

Figure 8. (a) Schematic representation of the experimental optical setup; (b) photograph of the implemented optical configuration.

    Figure 9 presents the optical reconstruction results, with insets showing threefold magnified views of selected regions for detailed analysis. A comprehensive comparative evaluation was performed to assess the reconstruction quality of CDANN, three alternative approaches (Holo-encoder, HoloNet, and CCNN), and the conventional iterative GS algorithm, with particular emphasis on both image fidelity and feature preservation. It is pertinent to acknowledge that various factors, including the suboptimal fill factor of the SLM, camera instability, and ambient light conditions, may introduce certain aberrations within the experimental results. Although the iterative GS algorithm achieves the highest optical reconstruction fidelity, this superior performance is attained at the expense of substantially increased computational complexity. The experimental results reveal that holographic reconstructions generated by the Holo-encoder algorithm exhibit noticeable artifacts and irregularities. These limitations primarily stem from the architecture’s reliance on a single network for hologram prediction, resulting in inadequate representation of complex scene details and compromised feature fidelity. In comparison with Holo-encoder, the HoloNet approach demonstrates superior artifact suppression capabilities through architectural optimization, yielding enhanced reconstruction quality. While CCNN exhibits improved performance over Holo-encoder in both artifact reduction and overall image clarity, its ability to accurately reproduce fine structural details remains limited, indicating potential areas for further architectural refinement. Notably, the CDANN algorithm demonstrates superior performance in both artifact suppression and feature representation, outperforming both HoloNet and CCNN in detailed scene reconstruction. This enhanced capability stems from the network’s comprehensive feature extraction mechanism, which enables more accurate reproduction of fine details and improved texture fidelity in the reconstructed images. Both numerical simulations and experimental results consistently demonstrate that the CDANN algorithm achieves significant speckle noise reduction while maintaining computational efficiency. This dual capability of enhanced reconstruction quality and low computational overhead results in superior overall performance compared to existing approaches.

Figure 9. Captured optical reconstruction images of POHs generated by several methods. (a) The GS method; (b) the Holo-encoder method; (c) the HoloNet method; (d) the CCNN method; (e) the CDANN method.

    4. Discussion and Conclusion

In this paper, we have introduced a novel CDANN for generating POHs, with the dual objectives of enhancing the fidelity of reconstructed images and increasing computational speed relative to conventional neural network architectures. The network incorporates a C-PConv module within the down-sampling stages of dual U-Net architectures, improving processing efficiency via channel-selective operations. To further enhance the quality of reconstructed images, a DAC module and an SPP module have been seamlessly integrated into the U-Net architecture; this integration expands the U-Net's receptive field while facilitating effective cross-layer feature fusion. The computational results demonstrate that the proposed CDANN generates high-quality 1920×1072 resolution POHs with an average PSNR of 32.19 dB and SSIM of 0.892 at a speed of 24 ms per frame. Both numerical simulations and experimental validations consistently show that the CDANN produces high-quality reconstructed images while reducing the requisite running time and computational resources. These capabilities not only validate the method's technical merits but also underscore its significant potential for deployment in real-time holographic display technologies, where rapid and accurate holographic reconstruction is of paramount importance.

    References

    [1] C. Slinger, C. Cameron, M. Stanley. Computer-generated holography as a generic display technology. Computer, 38, 46(2005).

    [2] E. Sahin, E. Stoykova, J. Mäkinen et al. Computer-generated holograms for 3D imaging. ACM Comput. Surv., 53, 32(2021).

    [3] D. Pi, J. Liu, Y. Wang. Review of computer-generated hologram algorithms for color dynamic holographic three-dimensional display. Light Sci. Appl., 11, 231(2022).

    [4] Y. Pan, J. Liu, X. Li et al. A review of dynamic holographic three-dimensional display: Algorithms, devices, and systems. IEEE Trans. Ind. Inform., 12, 1599(2016).

    [5] J. Zhang, N. Pégard, J. Zhong et al. 3D computer-generated holography by non-convex optimization. Optica, 4, 1306(2017).

    [6] D. Pi, Y. Ye, K. Cheng et al. Temporal multiplexing complex amplitude holography for 3D display with natural depth perception. Opt. Lett., 50, 1160(2025).

    [7] R. W. Gerchberg, W. O. Saxton. A practical algorithm for the determination of phase from image and diffraction plane pictures. Optik, 35, 237(1972).

    [8] Y. Wu, J. Wang, C. Chen et al. Adaptive weighted Gerchberg-Saxton algorithm for the generation of the phase-only hologram with artifacts suppression. Opt. Express, 29, 1412(2021).

    [9] M. Makowski, M. Sypek, A. Kolodziejczyk et al. Three-plane phase-only computer hologram generated with iterative Fresnel algorithm. Opt. Eng., 44, 125805(2005).

    [10] P. Zhou, Y. Li, S. Liu et al. Dynamic compensatory Gerchberg-Saxton algorithm for multiple-plane reconstruction in holographic displays. Opt. Express, 27, 8958(2019).

    [11] Z. Wang, T. Chen, Q. Chen et al. Reducing crosstalk of a multi-plane holographic display by the time-multiplexing stochastic gradient descent. Opt. Express, 31, 7413(2023).

    [12] C. Chen, B. Lee, N. N. Li et al. Multi-depth hologram generation using stochastic gradient descent algorithm with complex loss function. Opt. Express, 29, 15089(2021).

    [13] C. K. Hsueh, A. A. Sawchuk. Computer-generated double-phase holograms. Appl. Opt., 17, 3874(1978).

    [14] X. Sui, Z. He, G. Jin et al. Band-limited double-phase method for enhancing image sharpness in complex modulated computer-generated holograms. Opt. Express, 29, 2597(2021).

    [15] P. Tsang, T. C. Poon. Novel method for converting digital Fresnel hologram to phase-only hologram based on bidirectional error diffusion. Opt. Express, 21, 23680(2013).

    [16] D. Pi, Y. Ye, K. Cheng et al. Speckle-free 3D holography in the Wigner domain. Laser Photonics Rev., 19, 2401828(2025).

    [17] D. Pi, J. Liu, J. Wang et al. Optimized computer-generated hologram for enhancing depth cue based on complex amplitude modulation. Opt. Lett., 47, 6377(2022).

    [18] D. Blinder, T. Birnbaum, T. Ito et al. The state-of-the-art in computer-generated holography for 3D display. Light Adv. Manuf., 3, 35(2022).

    [19] J. W. Kang, B. S. Park, J. K. Kim et al. Deep-learning-based hologram generation using a generative model. Appl. Opt., 60, 7391(2021).

    [20] R. Horisaki, R. Takagi, J. Tanida. Deep-learning-generated holography. Appl. Opt., 57, 3859(2018).

    [21] A. Sinha, J. Lee, S. Li et al. Lensless computational imaging through deep learning. Optica, 4, 1117(2017).

    [22] K. Kavaklı, H. Urey, K. Akşit. Learned holographic light transport: invited. Appl. Opt., 61, B50(2022).

    [23] L. Shi, B. Li, C. Kim et al. Towards real-time photorealistic 3D holography with deep neural networks. Nature, 591, 234(2021).

    [24] M. H. Eybposh, N. W. Caira, M. Atisa et al. DeepCGH: 3D computer-generated holography using deep learning. Opt. Express, 28, 26636(2020).

    [25] Y. Peng, S. Choi, N. Padmanaban et al. Neural holography with camera-in-the-loop training. ACM Trans. Graph., 39, 185(2020).

    [26] J. Wu, K. Liu, X. Sui et al. High-speed computer generated holography using an autoencoder-based deep neural network. Opt. Lett., 46, 2908(2021).

    [27] K. Liu, J. Wu, L. Cao. 4K-DMDNet: Diffraction model-driven network for 4K computer-generated holography. Opto-Electron. Adv., 6, 220135(2023).

    [28] X. Song, J. Dong, M. Liu et al. Real-time intelligent 3D holographic photography for real-world scenarios. Opt. Express, 32, 24540(2024).

    [29] C. Zhong, X. Sang, B. Yan et al. Real-time high-quality computer-generated hologram using complex-valued convolutional neural network. IEEE Trans. Vis. Comput. Graph., 30, 3709(2023).

    [30] K. Matsushima, T. Shimobaba. Band-limited angular spectrum method for numerical simulation of free-space propagation in far and near fields. Opt. Express, 17, 19662(2009).

[31] J. Chen, S. Kao, H. He et al. Run, don't walk: Chasing higher FLOPS for faster neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 12021(2023).

[32] L. C. Chen, G. Papandreou, I. Kokkinos et al. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell., 40, 834(2017).

[33] R. Timofte, E. Agustsson, L. Van Gool et al. NTIRE 2017 challenge on single image super-resolution: Methods and results. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 114(2017).

[34] E. Agustsson, R. Timofte. NTIRE 2017 challenge on single image super-resolution: Dataset and study. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 126(2017).
