- Photonics Research
- Vol. 13, Issue 12, 3383 (2025)
Abstract
1. INTRODUCTION
Light field manipulation, by controlling the information carried by the wavefront to achieve complex spatial distributions and novel physical effects, has long been a significant frontier in photonics research. In recent years, the advantages of light field manipulation through metasurfaces have gained widespread attention. Among all light field manipulation applications based on metasurfaces [1–5], the most representative is 3D holography. Its applications in 3D display [6], virtual reality [7], augmented reality [8], and human-computer interaction [9] have driven considerable research on realizing 3D holography using metasurfaces [10,11]. While it is relatively simple to use polarization or frequency multiplexing to achieve multi-channel 3D holography [12,13], doing so consumes the degrees of freedom available for wavefront modulation. In the more challenging realm of single-channel 3D holography, iterative algorithms such as the Gerchberg–Saxton (GS) algorithm are often used to optimize the corresponding metasurfaces [14,15]. However, these works mainly produce 2.5D holograms, where different image planes are formed at different distances and the transitions between images are discontinuous. True 3D holographic metasurfaces enable continuous control of the light field distribution over a certain propagation distance. They hold great potential for specialized beam shaping and the generation of specific modes, such as high-order laser beams and various vector beams, which are vital for optical component research across all frequency bands, particularly the underdeveloped terahertz regime with its limited device options. Previously, we demonstrated 3D holography of a simple bar pattern by applying a propagation phase to 2D holograms [16]. However, there is no analytical expression for the propagation phase required for complex 3D holographic patterns.
The reason lies in the fact that traditional metasurface design methods only cover limited points and regions within their design space, wasting a significant portion of design degrees of freedom.
AI-enabled inverse design offers unprecedented opportunities to fully exploit the design freedom of metasurfaces. Deep learning, as a powerful computational tool, can reveal the complex relationships linking the geometry and material properties of meta-atoms to their control over amplitude and phase [17,18]. Thus, in recent years, AI-assisted metasurface design has attracted considerable research interest in the holography community. Fan
In this work, at 0.75 THz, we demonstrate that, using only a limited set of image planes as feedback, a well-trained neural network can understand the full spatial constraints formed by the 3D motion of image content, such as continuous rotations and reciprocal translations accompanying propagation. The network can then automatically design silicon pillar-based terahertz metasurfaces to realize the target 3D hologram. Furthermore, we propose a novel loss function evaluation strategy that assesses the similarities and differences between the generated image and the target image by partitioning the image and applying weights, which significantly enhances the recognizability of image patterns. Notably, compared with the GS algorithm, the 3D hologram designed by AI shows more uniform intensity. Additionally, for physically unsolvable problems such as curved light propagation, AI is capable of finding 2.5D approximate solutions that GS algorithms could not find. This work demonstrates that AI empowerment is a necessary complement for designing complex light field control devices in the terahertz band and holds promise for extension to other frequencies.
2. RESULTS
Figure 1 illustrates the architecture and design process of the deep learning network proposed in this work, which consists of two sub-modules connected in tandem: the forward prediction network and the inverse design network. The design process of this framework includes a preparation section and an inverse design section. (1) The first step in the preparation section is to train the forward network with the gradient descent algorithm. Given the input geometric dimensions, this network can accurately, quickly, and automatically predict the transmission amplitude and phase of the meta-atoms. (2) Five image planes are sampled at various distances in the target 3D hologram, and the 2D grayscale images on these five planes are combined to serve as the ground truth. Then, the inverse design process begins. (i) The ground truth is fed into the inverse design network, which adjusts the weights and biases of the network’s neurons using the gradient descent algorithm to predict the geometric distribution of the meta-atoms. (ii) The generated structural parameters are input into the forward network to predict the transmission amplitude and phase of the meta-atoms. (iii) The Rayleigh–Sommerfeld integral is applied to calculate the 2D patterns produced on the five sampled image planes, which are compared with the ground truth to compute the total loss function. (iv) Based on the evaluation results, the inverse network’s neurons are updated to generate a new geometric distribution of the metasurface, and the process returns to step (i) for the next iteration. This iterative process continues until the loss function is minimized. At this point, the output meta-atom geometric parameters are taken as the final design.

Figure 1. Flow chart of the 3D holographic metasurface design method assisted by the deep tandem neural network.
A. Pretrained Forward Prediction Network
First, we use a dataset of all-silicon rectangular pillars, constructed from numerical simulations, to train the forward prediction network using the gradient descent algorithm. The meta-atoms in this dataset consist of center-symmetric rectangular high-resistivity silicon pillars, 200 μm in height, and a substrate with a thickness of 2.8 mm and a fixed period of . The training and test sets account for 80% and 20% of the 8224 silicon pillars in the dataset, respectively. The forward network, composed of fully connected layers, uses the complex transmission amplitude of the meta-atoms from 0.3 to 1.2 THz as the ground truth, which is computed using numerical simulation software under periodic boundary conditions. The input to the network consists of the length and width of the silicon pillars (see the inset in Fig. 1), and the output is the predicted transmission complex amplitude . After 50,000 iterations of training, the mean squared error (MSE) loss between the predicted and the ground truth for all the silicon pillars in the training and test sets reaches approximately with no further significant decrease, signaling the end of the training process, as shown in Fig. 2(a). The trained forward network demonstrates excellent spectral prediction capabilities, as shown in Figs. 2(d) and 2(e), in which the predicted transmission amplitude and phase spectra closely match the results obtained from commercial electromagnetic simulation software for a randomly chosen meta-atom in the test set. Importantly, on an Intel Core™ i5-12600K CPU with 32 GB of RAM, the network’s single prediction time is less than 32 ms, which is only 0.4‰ of the 80 s required by commercial numerical software. This rapid and highly accurate prediction ensures the feasibility of the subsequent inverse design of large-area metasurfaces in terms of both time efficiency and precision.
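The quoted ratio can be checked directly from the two timings given in the text, taking 32 ms as the upper bound on the network's prediction time (the symbols below are introduced here for illustration only):

```latex
\frac{t_{\mathrm{network}}}{t_{\mathrm{simulation}}}
  \le \frac{32\ \mathrm{ms}}{80\ \mathrm{s}}
  = \frac{0.032}{80}
  = 4\times 10^{-4}
  = 0.4\text{‰},
\qquad
\frac{t_{\mathrm{simulation}}}{t_{\mathrm{network}}} \ge 2500 .
```

In other words, a single forward-network evaluation is at least about 2500 times faster than one full-wave simulation of a meta-atom.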
Figure 2. Training performance of the forward prediction neural network. (a) Convergence of the network training. (b) Amplitude and phase coverage of the original meta-atom dataset. (c) Amplitude and phase coverage of the rotated meta-atom dataset. (d) and (e) Example of amplitude and phase fitting using the forward network. (f) and (g) Example of comparison between the calculation results and the simulation results for rotated meta-atom amplitudes and phases.
To further expand the control over the amplitude and phase of the meta-atoms, we introduce planar rotation, in addition to the length and width of the silicon pillars, as a new design degree of freedom. This allows for an extended design space for the meta-atoms (as shown in the inset of Fig. 1). The transmission polarization change induced by planar rotation can be calculated as
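As an illustration of this kind of calculation, the sketch below uses the textbook Jones-calculus form for an anisotropic element rotated in-plane by an angle θ, T(θ) = R(−θ) diag(t_x, t_y) R(θ). This is an assumption made for illustration and may differ in sign or axis convention from Eq. (1); the function name and the sample coefficients are hypothetical.

```python
import cmath
import math


def rotated_jones(t_x, t_y, theta_deg):
    """Jones matrix of a pillar rotated in-plane by theta_deg (textbook form).

    t_x, t_y: complex transmission coefficients of the unrotated pillar along
    its two principal axes (assumed here to come from a forward model).
    Returns ((t_xx, t_xy), (t_yx, t_yy)) in the laboratory frame.
    """
    theta = math.radians(theta_deg)
    c, s = math.cos(theta), math.sin(theta)
    t_xx = t_x * c * c + t_y * s * s
    t_yy = t_x * s * s + t_y * c * c
    t_cross = (t_x - t_y) * s * c  # x->y and y->x conversion terms are equal
    return ((t_xx, t_cross), (t_cross, t_yy))


# Hypothetical coefficients, not taken from the paper's dataset.
t_x = 0.9 * cmath.exp(1j * 0.3)
t_y = 0.8 * cmath.exp(1j * 2.1)

cross_0 = rotated_jones(t_x, t_y, 0.0)[1][0]    # no polarization conversion
cross_45 = rotated_jones(t_x, t_y, 45.0)[1][0]  # maximum conversion
```

In this form the cross-polarized term is (t_x − t_y) sin θ cos θ, so it vanishes at θ = 0° and peaks at θ = 45°, which is why rotation adds a useful amplitude control on top of the length and width.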
B. Inverse Design by Tandem Neural Networks
In this study, the inverse network consists of a convolutional neural network (CNN), aiming to design a metasurface to achieve true 3D holography at a specific frequency . To convey the target 3D hologram to the network, five grayscale image planes at various distances in the target 3D hologram are combined as the ground truth for training the inverse network. These five images are arranged as a single-channel matrix with the format (image x-dimension pixels, image y-dimension pixels, image z-dimension pixels, batch size, channels) to fit the PyTorch framework. The output of the network is the geometric parameter distribution of each meta-atom. Specifically, the inverse network is composed of two parts: 3D convolutional layers for feature extraction and transposed 3D convolutional layers for structure generation.
The feature extraction part uses 3D convolution kernels to reorganize dimensions and extract information from the input images. The structure generation part consists of three identical transposed 3D convolution modules, each responsible for reconstructing one of the three geometric parameters of the meta-atoms at a given position: length, width, and rotation angle. The transposed convolution kernels recover the length and width data to match the pixel count of the metasurface, while the z-dimension pixel count remains constant at 1 (since the metasurface consists of a single layer). The format is (metasurface x-dimension pixels, metasurface y-dimension pixels, 1, batch size, channels). Based on GPU parallel computing, the network generates the geometric size distributions , , and for each silicon pillar on the metasurface in each iteration, where the superscript denotes the th iteration. This approach realizes an end-to-end automated inverse prediction from the five-image-plane input to the geometric parameter distribution of the meta-atoms.
The distributions and are fed into a forward network with frozen gradients to predict the electric field components and for non-rotated meta-atoms, followed by a rotation of angle to calculate by using Eq. (1). Then is applied to calculate the five image planes through the Rayleigh–Sommerfeld (RS) diffraction integral, as shown in Eq. (2):
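For orientation, the following is a minimal direct-summation sketch of the first Rayleigh–Sommerfeld solution, U(x, y, z) = (1/2π) Σ U₀ (z/r)(1/r − ik) e^{ikr}/r ΔS. It assumes an e^{ikr} convention and may differ in normalization from Eq. (2). The lattice pitch, aperture size, and function name are hypothetical, and the paper's implementation runs on a GPU rather than in nested Python loops.

```python
import cmath
import math

C_MM_PER_NS = 299.792458                 # speed of light in mm/ns
FREQ_GHZ = 750.0                         # 0.75 THz, the design frequency
WAVELENGTH_MM = C_MM_PER_NS / FREQ_GHZ   # about 0.4 mm


def rs_field(aperture, pitch, wavelength, x, y, z):
    """Field at (x, y, z) from a sampled aperture field in the plane z = 0.

    aperture: 2D list of complex amplitudes, one per meta-atom, on a square
    lattice of the given pitch centered on the origin (all lengths in mm).
    """
    k = 2 * math.pi / wavelength
    ny, nx = len(aperture), len(aperture[0])
    total = 0j
    for iy in range(ny):
        y0 = (iy - (ny - 1) / 2) * pitch
        for ix in range(nx):
            x0 = (ix - (nx - 1) / 2) * pitch
            r = math.sqrt((x - x0) ** 2 + (y - y0) ** 2 + z ** 2)
            kern = (z / r) * (1 / r - 1j * k) * cmath.exp(1j * k * r) / r
            total += aperture[iy][ix] * kern
    return total * pitch ** 2 / (2 * math.pi)


# Hypothetical 5 x 5 uniform aperture with an assumed 0.15 mm pitch.
aperture = [[1 + 0j] * 5 for _ in range(5)]
u_right = rs_field(aperture, 0.15, WAVELENGTH_MM, 0.3, 0.0, 5.0)
u_left = rs_field(aperture, 0.15, WAVELENGTH_MM, -0.3, 0.0, 5.0)
```

Evaluating `rs_field` on a grid of (x, y) points at each of the five distances gives the image planes that enter the loss. For a symmetric aperture such as the one above, the field is mirror-symmetric, which is a convenient sanity check.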
C. Deep Learning-Assisted Metasurface Design for Single-Channel 3D Holography
As mentioned in the introduction, single-channel true 3D holography remains the most challenging task in metasurface design. Here, a uniformly rotating letter “S” with a rotation speed of 20 deg/mm and an image plane scale of is chosen as the first true 3D hologram target. The letter “S” is chosen because it is more complex than the bar shape in Ref. [15]: the method of adding propagation-direction rotating phases to the pattern, as shown in Appendix A, is not applicable to “S.” Five image planes at 3, 4, 5, 6, and 7 mm above the metasurface are calculated as the ground truth, as shown in Fig. 3(a), to train the inverse network, which then outputs the , , and for meta-atoms. When the training converges (), the image planes calculated through the RS integral are shown in the second row of Fig. 3(a), where the counterclockwise rotation of the letter “S” with propagation matches the ground truth. On closer inspection, the edges of the letter “S” at 5 mm are the sharpest, while those at 3 and 7 mm exhibit noticeable blurring. However, the letter “S” is generally clear and distinguishable at all sampled image planes. The third row of Fig. 3(a) displays the simulation results of the designed metasurface in commercial electromagnetic software, with a Gaussian beam as the incident light and open boundary conditions. The numerically calculated rotation angle still matches the true value, validating the inverse network design in simulation. Compared to the RS results in Fig. 3(a), there are more noise points outside the letter in the simulation, and the sharpness of the letter’s edges slightly decreases. The differences between the second and the third rows of Fig. 3(a) are due to the lower calculation accuracy of the diffraction integral compared to electromagnetic simulations.
To experimentally verify the effectiveness of the inverse network design, we fabricated the metasurface designed by the network on a high-resistivity silicon wafer using photolithography and dry etching. The five images at different distances were scanned under cross-polarized detection using a home-built terahertz probe spectroscopy system. See Appendix B for the sample preparation procedures and experimental details. The fourth row of Fig. 3(a) presents the experimental results, where the letter “S” is quite clear on all five image planes, and the rotation angles closely match the ground truth. Compared to the calculated results, the measured field intensity in the middle region of the letter “S” is more concentrated, and the noise in the blank areas is the most prominent. These deviations are mainly attributed to the incident beam not fully satisfying the plane-wave condition. In fact, as described in Appendix B, the incident terahertz beam is a collimated Gaussian-like beam, which does not fully match the incident conditions assumed in the integral calculation and simulation. Additionally, manufacturing error during the fabrication of the metasurface sample is another significant source of these deviations. Even so, the measured results agree well with the true values, integral calculations, and simulations.
Figure 3. 3D holography and experimental setup. (a) Holographic imaging results of the letter “S” rotating counterclockwise during propagation: ground truth (first row), RS integral calculation (second row), simulation results (third row), and experimental results (fourth row). (b) Simulation results at the target distances and intermediate distances. (c) Optical microscope image of metasurface made up of silicon pillars. Scale bar, 400 μm. (d) Schematic of the spatially resolved terahertz probe detection system. Antenna, terahertz photoconductive antenna; Lens, terahertz lens; LP, terahertz linear polarizer; MS, metasurface sample; Probe, terahertz near-field probe.
The next crucial question is whether the inverse network fully understands the goal of “uniform rotation during propagation” and has generated a metasurface that realizes this 3D holography, or whether it has only achieved rotation on the five image planes of interest (i.e., 2.5D holography). Given the high consistency between the simulation and measurement results, we investigated five additional image planes at 2.5, 3.5, 4.5, 5.5, and 6.5 mm in the simulation that were not involved in the network training process. The results, shown in Fig. 3(b), indicate that the letter “S” rotates continuously at the preset rotation speed of 20 deg/mm. This confirms that (1) the trained inverse network understands the preset true 3D hologram target through a limited but related set of images, and (2) the trained inverse network successfully finds a solution in the design space of the metasurface to realize true 3D holography.
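The expected orientation on each plane follows from the rotation speed alone. The short sketch below tabulates it for the trained and the held-out planes, assuming (as an illustrative convention not stated in the text) that the angle is measured relative to the pattern at 3 mm; the names are hypothetical.

```python
ROTATION_RATE_DEG_PER_MM = 20.0  # rotation speed stated in the text
REFERENCE_Z_MM = 3.0             # assumption: 0 degrees is defined at the 3 mm plane


def target_angle(z_mm):
    """Rotation angle (degrees) expected at distance z_mm for uniform rotation."""
    return ROTATION_RATE_DEG_PER_MM * (z_mm - REFERENCE_Z_MM)


trained_planes = [3.0, 4.0, 5.0, 6.0, 7.0]   # used as ground truth
held_out_planes = [2.5, 3.5, 4.5, 5.5, 6.5]  # checked only in simulation

trained_angles = [target_angle(z) for z in trained_planes]
held_out_angles = [target_angle(z) for z in held_out_planes]
```

Under this convention the trained planes sit at 0°, 20°, 40°, 60°, and 80°, and the held-out planes fall halfway between them (and 10° before the first), so a truly continuous rotation must pass through orientations the network never saw during training.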
To check the scalability and practicality of the proposed design architecture, we packaged the five image planes of each of the 10 letters from “A” to “J” together as input to train the inverse network. The trained inverse network achieved a total MSE loss of 0.030 and possessed the capability to design the corresponding metasurface for any of these letters within milliseconds to achieve a continuous and uniform 3D hologram rotation of 20 deg/mm. Details are provided in Appendix C.
D. Achieving Complex 3D Holography through Partitioned and Weighted Loss Function
The network architecture introduced above, along with the method of using five image planes to constrain true 3D holography, effectively addresses the uniform rotation of patterns with propagation. However, when applied to more complex non-uniform rotations, even for the simplest patterns, the metasurface given by the inverse network is not satisfactory. As shown in Fig. 4(a), the 3D hologram target is a bar rotating non-uniformly with propagation, where the rotation angles on the five image planes at 3, 4, 5, 6, and 7 mm are 0°, 30°, 60°, 70°, and 80°, respectively (rotating more slowly as it propagates farther). The metasurface designed using the process described in the previous section does not produce satisfactory holograms in numerical simulations. As depicted in Fig. 4(a), although the bar is still recognizable on the image planes at 4, 5, and 6 mm, the edges of the pattern are no longer clear at 3 and 7 mm, and even the shape of the bar cannot be maintained.
Figure 4. Holography of non-uniform rotation during propagation. (a) Ground truth for 3D imaging (first row) and simulation without loss function control (second row). (b) Schematic of the partitioned loss function control strategy, where the blue box represents the pattern loss region and the black area represents the background loss region. (c) Imaging results with loss function control: RS integral calculation (first row), simulation results (second row), and experimental results (third row).
On the other hand, Fig. 4(a) demonstrates that the rotation angle of the bar aligns with the preset target, indicating the feasibility of achieving non-uniform rotation. The image blurriness can be attributed to the imperfect setting of the total loss function for the image planes: the total MSE defined in Eq. (3) requires minimizing noise everywhere on the five image planes, which, combined with the intricate requirements of 3D holography, may leave no solution that balances both non-uniform rotation and noise reduction on the desired image planes. The design approach should therefore differentiate the noise: noise close to the bar strongly affects pattern recognition and the accuracy of the rotation angle, so the loss it induces should be weighted higher, whereas noise farther from the pattern can be weighted lower. Accordingly, the calculation of the loss function is divided into two regions, as shown in Fig. 4(b). The loss within the blue box is the pattern loss, and the loss outside the box is the background loss, with weights of 1 and 0.85, respectively. By retraining the inverse network with this partitioned and weighted loss function and designing the corresponding metasurface, good consistency is achieved among the RS-calculated, simulated, and experimentally obtained holographic images, as shown in Fig. 4(c). It can be seen that the non-uniform rotation with a partitioned and weighted loss function is equally accurate. More importantly, the bar images on each plane exhibit significantly improved edge clarity, especially at 3 and 5 mm, where the integrity of the pattern far exceeds the corresponding results without partitioning. Analysis of the simulation results in Fig. 4(c) reveals that the number of noise points outside the box exceeds that at the same locations without the partitioned loss function.
However, these noise points in the background loss region do not affect the clarity and recognition of the pattern, which aligns perfectly with our conjecture about the cause of image blurriness. Therefore, the method proposed in this paper, partitioning and weighting the loss function, is an effective network training optimization approach from a practical perspective. Its core lies in focusing the attention of the network on the realization of 3D hologram and weighing the demand for noise reduction according to the impact of noise on the image.
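The two-region weighting can be sketched as follows. The weights 1 and 0.85 are those quoted above; how the two regional losses are normalized and combined is not specified in the text, so the per-region mean used here, along with the function name and the toy data, is an assumption.

```python
def partitioned_mse(pred, target, pattern_mask, w_pattern=1.0, w_background=0.85):
    """Weighted sum of the MSE inside the pattern box and the MSE outside it.

    pred, target: 2D lists of intensities on one image plane.
    pattern_mask: 2D list of booleans, True inside the box around the pattern.
    Each region is averaged over its own pixel count (an assumed normalization).
    """
    pattern_err = background_err = 0.0
    n_pattern = n_background = 0
    for p_row, t_row, m_row in zip(pred, target, pattern_mask):
        for p, t, inside in zip(p_row, t_row, m_row):
            err = (p - t) ** 2
            if inside:
                pattern_err += err
                n_pattern += 1
            else:
                background_err += err
                n_background += 1
    pattern_loss = pattern_err / max(n_pattern, 1)
    background_loss = background_err / max(n_background, 1)
    return w_pattern * pattern_loss + w_background * background_loss


# Toy 3 x 3 plane: one bright pattern pixel in the center, one noisy background pixel.
target = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
pred = [[0.2, 0.0, 0.0], [0.0, 0.8, 0.0], [0.0, 0.0, 0.0]]
mask = [[False, False, False], [False, True, False], [False, False, False]]
loss = partitioned_mse(pred, target, mask)
```

With these toy values the same absolute error of 0.2 costs 0.04 inside the box but only 0.85 × 0.005 in the background, illustrating how the optimizer is steered toward preserving the pattern before suppressing distant noise.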
Although the uniform and non-uniform rotations of the aforementioned letter “S” and bar pattern during propagation are complicated, the results indicate that correct solutions for these two types of 3D holograms can still be found within the rich design space of metasurfaces. From the practical application perspective of inverse networks, however, not all light field manipulation requirements are necessarily reasonable (convex optimization-friendly), such as the curved light propagation of the cross pattern shown in Fig. 5(a). Specifically, as the light propagates from 3 mm to 7 mm, the cross continuously moves downward from (), reaching its lowest point at () via (), and then continuously moves upward along the same path until back to (). This objective is physically unreasonable because it challenges the fundamental principle that light travels in a straight line in vacuum. Although some special beams, such as Airy beams, exhibit curved (self-accelerating) propagation, the cross pattern differs significantly from Airy spots, and the degree of curved propagation set for the beam in this target far exceeds the capabilities of Airy beams.
Figure 5. Holography of quasi-curved light propagation effect. (a) Ground truth for 3D holography (first row) and experimental imaging results without partitioned control loss function (second row). (b) Schematic of the partitioned control loss function strategy: the blue box represents the pattern loss region, the gray blurred cross pattern represents the ghosting loss region, and the black area represents the background loss region. (c) Imaging results with partitioned control loss function: RS integral calculation (first row), simulation results (second row), and experiment results (third row). (d) RS integral calculation results at the target distances and intermediate distances.
To optimize the partition and weights of the loss function, the network was initially trained using a non-partitioned loss function, and the measurement results of the designed metasurface on the five image planes are shown in Fig. 5(a). It is evident that the curved light propagation of the cross was not achieved. Specifically, although the cross appears at the correct -position for the 4 and 7 mm image planes, its shape and intensity are not prominent. A more serious issue is the severe ghosting, i.e., cross patterns appear at incorrect -positions on the 3, 5, and 6 mm image planes and are far more obvious than the images at the correct -position, severely affecting the recognition of the correct cross. Therefore, in this example, not only was a box drawn around the edge of the cross, as shown in Fig. 5(b), to divide the pattern region and background region in the loss function, but a ghosting loss was also defined to evaluate the impact of the ghosting on the recognition of the correct cross pattern. Additionally, considering the divergence of the beam as it propagates, the weights of different image planes can be adjusted accordingly. The total loss function was thus weighted and adjusted along four dimensions: pattern, background, ghosting, and image plane (detailed process in Appendix D). The inverse network was trained until the total MSE loss converged to 0.152, and the meta-atoms’ , , and were obtained by inputting the five target image planes. The first row of Fig. 5(c) shows the generated images at the five distances, where it can be seen that the cross appears clearly at the correct -position. Although the noise at other positions is severe, the correct cross and its position can be clearly identified in terms of shape and intensity. The middle and bottom rows of Fig. 5(c) show the simulated and the measured results of the fabricated metasurface. (The five images were measured at , 3.5, 4.5, 6, and 7 mm.)
It can be seen that the measured results agree well with the simulation and are significantly improved compared to the non-partitioned loss function case [Fig. 5(a)], especially during the upward movement of the cross from to 6.2 mm, which further demonstrates the advantages of our strategy for balancing the loss function through partitioning. The deviation of the measured results from the simulation results in Fig. 5(c) is again due to the imperfect terahertz incidence in the measurement and the fabrication errors.
It is important to note that the metasurface design does not produce a true target 3D hologram, as this limitation is dictated by the physically unreasonable nature of curved light propagation. As shown in the RS results in Fig. 5(d), the cross does not move continuously along the -axis. Instead, the intensity of the cross is distributed differently on the three fixed -axis positions: upper (), middle (), and lower (). This results in a 2.5D hologram, which most closely approximates the true patterns of the five image planes. This 2.5D “approximate solution” is achievable because, while the five image planes are derived from the target true 3D hologram, the limited number of image planes allows the network to converge to only these five planes. The loss function’s optimized partition and weighting further reduce the demand for noise suppression by distinguishing the significance of noise, preventing the network from collapsing due to overfitting or becoming constrained to local solutions. Instead, the proposed network finds a globally optimal 2.5D holographic approximation within the metasurface design space for this physically unreasonable problem. This not only reduces dependence on human expertise but even surpasses traditional design experience.
E. Comparison between Deep Learning-Assisted Metasurface Design and GS Algorithm
Apart from the inverse network-based design approach for metasurfaces showcased previously, iterative algorithms such as the Gerchberg–Saxton (GS) algorithm [22] can also serve in the automated design of light field manipulation devices. Compared with traditional iterative algorithms, what are the advantages of AI-powered true 3D holographic metasurface design? Since the GS algorithm is primarily used for 2D computational holography, we have modified the method for updating the amplitude and phase of the object plane in the GS algorithm: as before, five image planes in the 3D hologram are selected as the ground truth. In each iteration, all image planes are computed based on the current object plane. Then, the amplitude and phase of each image plane are backpropagated to the object plane. After adding the five complex amplitudes together, the total amplitude and phase are calculated to update the object plane, completing one iteration (for detailed calculations of the GS algorithm, refer to Appendix E).
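One iteration of such a multi-plane update can be sketched in a 1D toy model as below. This is not the implementation of Appendix E. The spherical-wave kernel, the use of its conjugate transpose as the backpropagator, the amplitude-replacement constraint on each image plane, and the final normalization are all assumptions made for illustration, as are the function names and the toy targets.

```python
import cmath
import math


def kernel(n, pitch, z, wavelength):
    """n x n matrix mapping object-plane samples to image-plane samples at depth z."""
    k = 2 * math.pi / wavelength
    return [[cmath.exp(1j * k * math.hypot((i - j) * pitch, z)) / math.hypot((i - j) * pitch, z)
             for j in range(n)] for i in range(n)]


def apply(mat, vec):
    """Matrix-vector product for lists of complex numbers."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def adjoint(mat):
    """Conjugate transpose, used here as an approximate backpropagator."""
    n = len(mat)
    return [[mat[i][j].conjugate() for i in range(n)] for j in range(n)]


def gs_step(obj, targets, kernels):
    """One multi-plane GS iteration: propagate, constrain, backpropagate, sum."""
    total = [0j] * len(obj)
    for target, mat in zip(targets, kernels):
        field = apply(mat, obj)
        # Keep the computed phase, impose the target amplitude on this plane.
        constrained = [t * cmath.exp(1j * cmath.phase(f)) for t, f in zip(target, field)]
        back = apply(adjoint(mat), constrained)
        total = [a + b for a, b in zip(total, back)]
    peak = max(abs(v) for v in total)
    return [v / peak for v in total]  # normalize the summed complex amplitude


n, pitch, wavelength = 8, 0.2, 0.4  # toy values in mm, not the paper's parameters
depths = [3.0, 4.0, 5.0, 6.0, 7.0]
kernels = [kernel(n, pitch, z, wavelength) for z in depths]
targets = [[1.0 if i == idx + 1 else 0.0 for i in range(n)] for idx in range(len(depths))]

obj = [1 + 0j] * n
for _ in range(5):
    obj = gs_step(obj, targets, kernels)
```

The key difference from the standard 2D algorithm is the summation inside `gs_step`: every image plane contributes a backpropagated estimate of the object plane, and the object plane is updated from their complex sum rather than from a single target.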
Figures 6(a) and 6(b) present the simulated images generated by the metasurface designed using the improved GS algorithm for the rotated letter “S” and bar, respectively. It is evident that the GS algorithm can also interpret the target 3D holography and find a corresponding solution within the design space of the metasurface. However, there are notable differences between the AI design and the GS counterpart. (1) The geometric parameters of the metasurface and, thus, the distribution given by AI and the GS algorithm are completely different. (2) The holographic patterns realized by the GS algorithm exhibit more energy concentrated at the center of the image planes. Specifically, in the case of the letter “S,” the average variance of the amplitudes of the pattern areas in the five image planes realized by the GS algorithm is approximately 0.030, while that of the deep learning method is approximately 0.024. For the bar, the average variance given by the GS algorithm is about 0.043, compared to approximately 0.033 for the deep learning method. Since the pattern brightness in the ground truth is constant, the AI design is superior to the GS algorithm from the perspective of pattern uniformity. (3) For the rotated letter “S,” the GS imaging results at 3 and 4 mm no longer maintain the “S” shape, but tend to look like “.” (4) For the rotated bar, the GS imaging at 3 mm shows incorrect rotation instead of being horizontal, and the bar shapes at different distances are also distorted.
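The uniformity metric used in point (2) can be sketched as the variance of the amplitudes inside the pattern area, averaged over the image planes. Whether the paper uses the population or sample variance, and how the pattern area is delimited, is not stated; the population variance and boolean masks below are assumptions, and the toy numbers are not the paper's data.

```python
from statistics import pvariance


def mean_pattern_variance(planes, masks):
    """Average, over image planes, of the amplitude variance inside the pattern.

    planes: list of 2D lists of amplitudes, one per image plane.
    masks:  list of 2D lists of booleans, True where the target pattern is bright.
    """
    variances = []
    for amp, mask in zip(planes, masks):
        values = [a for a_row, m_row in zip(amp, mask)
                  for a, inside in zip(a_row, m_row) if inside]
        variances.append(pvariance(values))
    return sum(variances) / len(variances)


# Toy data: a perfectly uniform plane and a plane with uneven brightness.
planes = [[[1.0, 1.0], [0.0, 0.0]],
          [[0.5, 1.5], [0.0, 0.0]]]
masks = [[[True, True], [False, False]],
         [[True, True], [False, False]]]
uniformity = mean_pattern_variance(planes, masks)
```

A lower value means the bright region is more evenly filled, so on this metric the reported 0.024 versus 0.030 for “S” and 0.033 versus 0.043 for the bar favor the network design.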
Figure 6. Comparison of simulated results for improved GS algorithm and deep learning for holographic effects. (a) Counterclockwise rotation of the letter “S”: GS algorithm (first row), network design (second row). (b) Nonlinear rotation of a bar pattern: GS algorithm (first row), network design (second row). (c) Quasi-curved light propagation: GS algorithm (first row), network design (second row).
For the physically unreasonable problem of curved light propagation of the “cross” shape, the failure of the GS algorithm is illustrated in Fig. 6(c). Whether evaluated in terms of reproducing the “cross” shape or the reciprocal translation along the -direction, the GS algorithm falls significantly short of the performance achieved by AI across the five image planes. We have also attempted other iterative methods within the GS algorithm (see Appendix E), but none of them could find an approximate solution for the 2.5D hologram as effectively as AI. This contrast demonstrates the powerful capabilities of deep learning in designing complex light field manipulation devices: it not only performs better than other methods in achieving 3D holography but also provides approximate solutions for target 3D holography based on limited inputs when faced with physically unreasonable problems. We speculate that the moderately relaxed training objectives, namely the constrained number of image planes and the partitioned, weighted loss function, enabled the network to discover approximate solutions where conventional methods fail. The AI-assisted metasurface design methodology discussed here serves as a necessary complement and versatile approach to the research and development of complex light field manipulation devices, surpassing traditional methods and human expertise in this field. The source code has been made publicly available on GitHub [23].
3. CONCLUSION
In this paper, we employ a limited number of image planes as criteria to guide the inverse design of metasurfaces using neural networks for achieving complex 3D light field manipulations such as 3D rotation, non-uniform rotation, and quasi-curved light propagation of holographic patterns along the propagation direction. Experimental validation in the terahertz band demonstrates the ability of AI to fully explore the design space of metasurfaces. The peak signal-to-noise ratios (PSNRs) for holographic reconstructions of the “S,” bar, and cross patterns reach 11.91 dB, 13.11 dB, and 12.39 dB, respectively. Starting from a pragmatic design philosophy, we take the recognizability of patterns as the sole basis for assessing the impact of noise. On this foundation, we refine the calculation regions and weights of the loss function, thereby maximizing the design freedom of the neural networks. This method is capable of exploring the entire design parameter space of metasurfaces. For physically reasonable problems, it can find metasurface parameters comparable to those obtained by analytical methods. More importantly, for physically unreasonable problems, the neural network successfully avoids failure and finds specific combinations within the design space of metasurfaces to best meet the requirements of all image planes, surpassing the capabilities of traditional analytical or iterative forward design. The 3D holographic metasurfaces presented in this paper provide an approach for continuously controlling the 3D light field, which can benefit the design of specific mode-generating devices that are currently lacking in the terahertz band. Moreover, the approach demonstrated in this study is not confined to the terahertz band; it can be extended to the design of true 3D holographic devices operating across other frequency bands.
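For readers who wish to reproduce a PSNR figure of this kind, a minimal sketch using the common definition PSNR = 10 log₁₀(peak² / MSE) is given below. The text does not state which normalization or peak value was used, so the choice of images normalized to a peak of 1 is an assumption, and the function name and toy images are hypothetical.

```python
import math


def psnr(pred, target, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images."""
    sq_err = 0.0
    count = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            sq_err += (p - t) ** 2
            count += 1
    mse = sq_err / count
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


# Toy example: every pixel is off by 0.1 from a unit-intensity target.
target = [[1.0, 1.0], [1.0, 1.0]]
pred = [[0.9, 0.9], [0.9, 0.9]]
value_db = psnr(pred, target)
```

Under this definition and normalization, a PSNR near 12 dB corresponds to a mean squared error of roughly 0.06 relative to the peak intensity.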
APPENDIX A: FAILURE OF PROPAGATING PHASE FOR COMPLEX 3D HOLOGRAPHY
We referenced a method proposed in Ref. [
Figure 7. Simulated images of complex patterns obtained by applying a rotational phase.
APPENDIX B: EXPERIMENTAL CHARACTERIZATION
The performance of the designed metasurface was experimentally characterized using a laboratory-made spatially resolved terahertz probe detection system. A photoconductive antenna emitted -polarized THz pulses, which were collimated by a 25 mm focal length lens into a 10 mm diameter beam. To enable polarization conversion—necessary due to the cross-polarized design and the -polarization sensitivity of the probe—the beam passed through two THz linear polarizers (P1 at 45° and P2 at 90°), resulting in -polarized output. A photoconductive probe mounted on a 2D motorized translation stage with 0.2 mm steps detected the amplitude and phase of the THz field 3–7 mm above the metasurface.
APPENDIX C: TRAINING FOR TEN LETTERS WITH VARIED INITIAL ANGLES
The proposed framework not only enables 3D rotation of individual letters but also supports simultaneous training of multiple patterns with varying initial angles. Using 10 letters from “A” to “J” as examples, we randomly assigned their initial orientations and trained a 3D hologram design network to achieve rotation for each letter. Figure
Figure 8. Training results for 10 letters with diverse initial angles.
APPENDIX D: LOSS CONTROL STRATEGY FOR CROSS PATTERN PSEUDO-3D HOLOGRAMS
Finally, the linear combination of the three losses forms the total loss function for training the network, as shown in the following equation:
APPENDIX E: GS ALGORITHMS FOR 3D HOLOGRAPHY
We improved the traditional GS algorithm to simultaneously iterate five images at different depths. Unlike the standard 2D GS algorithm, which uses a single target image as ground truth, our approach applies inverse Rayleigh–Sommerfeld diffraction to five target images, generating corresponding complex amplitude distributions at the metasurface. We tested three superposition methods to combine these distributions, as illustrated in Fig.
Figure 9. Improved GS algorithm for 3D holography. (a) Schematic of the improved GS algorithm for 3D holography. (b) Uniform rotation of the letter “S.” (c) Non-uniform rotation of the bar pattern. (d) Reciprocal movement of the cross generated by the metasurface.
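The multi-plane iteration described in this appendix can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it swaps in the angular spectrum method as a stand-in for Rayleigh–Sommerfeld diffraction, combines the back-propagated fields with a plain complex mean (only one conceivable superposition rule), and uses illustrative values for the wavelength, pixel pitch, and grid size. The function names `angular_spectrum` and `multiplane_gs` are hypothetical.

```python
import numpy as np

def angular_spectrum(field, wavelength, pitch, z):
    """Propagate a square complex field by z (negative z back-propagates)."""
    n = field.shape[0]
    freqs = np.fft.fftfreq(n, d=pitch)  # spatial frequencies in cycles per unit length
    fx, fy = np.meshgrid(freqs, freqs, indexing="ij")
    arg = 1.0 / wavelength**2 - fx**2 - fy**2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    # Evanescent components (arg <= 0) are discarded in this sketch.
    transfer = np.where(arg > 0.0, np.exp(1j * kz * z), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

def multiplane_gs(targets, distances, wavelength, pitch, iterations=20, seed=0):
    """Phase-only GS-style iteration constrained by several image planes."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(-np.pi, np.pi, targets[0].shape)
    for _ in range(iterations):
        source = np.exp(1j * phase)  # unit-amplitude, phase-only surface
        back_fields = []
        for amplitude, z in zip(targets, distances):
            image = angular_spectrum(source, wavelength, pitch, z)
            # Impose the target amplitude, keep the propagated phase.
            constrained = amplitude * np.exp(1j * np.angle(image))
            back_fields.append(angular_spectrum(constrained, wavelength, pitch, -z))
        # Assumed superposition rule: complex mean of the back-propagated fields.
        combined = np.mean(back_fields, axis=0)
        phase = np.angle(combined)
    return phase

# Illustrative use: a bar that shifts sideways across five planes (units: mm).
n = 32
targets = [np.zeros((n, n)) for _ in range(5)]
for k, t in enumerate(targets):
    t[12:20, 6 + 4 * k : 10 + 4 * k] = 1.0
phase = multiplane_gs(targets, distances=[3.0, 4.0, 5.0, 6.0, 7.0],
                      wavelength=0.4, pitch=0.2)
```

The five distances mirror the 3–7 mm detection range mentioned in Appendix B; every other numerical value is a placeholder chosen only to keep the example small.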
We drew inspiration for handling amplitude and phase from the calculation of the centroid. For a polygon defined by a set of vertices , for , where is the number of vertices in the polygon, its centroid can be calculated using the following formulas:
The five complex amplitude distributions at the metasurface were first obtained via inverse Rayleigh–Sommerfeld diffraction. For each position on the metasurface, plotting the five values on the complex plane (real part on the horizontal axis, imaginary part on the vertical axis) gives five points that form a pentagon. Since the centroid of a regular polygon is equidistant from all vertices, we used the centroid’s coordinates to represent the real and imaginary parts (Re, Im) of the complex amplitude on the metasurface.
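As a hedged illustration of the centroid idea, the sketch below shows two standard centroid definitions, since the text here does not make clear which one was used: the plain mean of the vertices, applied per pixel to the five complex fields, and the area-weighted centroid of a simple polygon from the shoelace formula. The function names are hypothetical, and the shoelace form assumes the vertices are listed in order around a non-self-intersecting polygon, which five arbitrary complex values need not satisfy.

```python
import numpy as np

def vertex_centroid(fields):
    """Per-pixel mean of several complex fields (centroid of the vertex set)."""
    stack = np.stack(fields)  # shape (K, H, W)
    return stack.mean(axis=0)

def polygon_centroid(xs, ys):
    """Area-weighted centroid of a simple polygon via the shoelace formula."""
    x = np.asarray(xs, dtype=float)
    y = np.asarray(ys, dtype=float)
    x_next = np.roll(x, -1)  # vertex i+1, wrapping around to vertex 0
    y_next = np.roll(y, -1)
    cross = x * y_next - x_next * y
    area = 0.5 * cross.sum()  # signed area
    if np.isclose(area, 0.0):
        # Degenerate polygon: fall back to the vertex mean.
        return x.mean(), y.mean()
    cx = ((x + x_next) * cross).sum() / (6.0 * area)
    cy = ((y + y_next) * cross).sum() / (6.0 * area)
    return cx, cy

# Five unit-amplitude values spaced evenly in phase form a regular pentagon.
fields = [np.full((2, 2), np.exp(2j * np.pi * k / 5)) for k in range(5)]
combined = vertex_centroid(fields)
amplitude, phase = np.abs(combined), np.angle(combined)
```

For a regular pentagon both definitions coincide at the center of the circumscribed circle; for irregular point sets they generally differ, so the choice affects the resulting amplitude and phase.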
APPENDIX F: CROSS-FREQUENCY IMAGING PERFORMANCE OF THE “S”-PATTERN
The cross-frequency imaging performance shown in Fig.
Figure 10. Holograms of the letter “S” across the frequency band from 0.5 to 1.0 THz.
APPENDIX G: EXPERIMENTAL RESULTS OF “S”-PATTERN IMAGING AT TARGET AND INTERMEDIATE DISTANCES
As for the rotation continuity of the image, Fig.
Figure 11. The “S” letter images at the trained and non-trained
APPENDIX H: COMPARATIVE ANALYSIS OF GANS AND OUR FRAMEWORK FOR METASURFACE DESIGN
Generative adversarial networks (GANs), as representative generative models, demonstrate notable advantages in metasurface design. For instance, GANs possess excellent structure-generation capabilities, making them suitable for creating high-degree-of-freedom mosaic meta-atom structures. Additionally, their training requires smaller datasets than conventional approaches. However, GANs exhibit inherent limitations: their training relies on adversarial optimization between a generator and a discriminator, which amounts to solving a minimax problem rather than directly minimizing a conventional loss function. This adversarial mechanism often causes unstable training dynamics, including potential mode collapse and convergence difficulties.
Conversely, our proposed framework integrates an inverse-design network with a forward-prediction network into a differentiable architecture. This end-to-end differentiable system enables direct backpropagation of imaging-plane errors to structural parameters, achieving stable gradient-descent optimization toward loss minimization. To further enhance controllability, we introduced a regionally weighted loss strategy that allows customizable zone partitioning and prioritized optimization based on imaging priorities, significantly improving the recognizability of the holographic images. Importantly, the rectangular silicon pillars used in this work inherently provide sufficient amplitude-phase modulation capabilities, eliminating the need for the more complex pixelated meta-atoms where GANs typically excel.
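A regionally weighted loss of the kind described above might look like the following minimal sketch. The actual regions, weights, and loss terms used in this work are not given in this text, so the two-zone split (pattern versus background), the weight values, and the function name `regional_weighted_mse` are all assumptions made for illustration.

```python
import numpy as np

def regional_weighted_mse(predicted, target, masks, weights):
    """Sum of per-region mean squared errors, each scaled by its weight."""
    total = 0.0
    for mask, weight in zip(masks, weights):
        if mask.any():  # skip empty regions to avoid a mean over nothing
            diff = predicted[mask] - target[mask]
            total += weight * np.mean(diff**2)
    return total

# Illustrative target: a 2x2 bright patch in a 4x4 image plane.
target = np.zeros((4, 4))
target[1:3, 1:3] = 1.0
pattern_mask = target > 0          # zone where the pattern must appear
background_mask = ~pattern_mask    # zone where stray light is tolerated more

# Hypothetical prediction: perfect pattern, uniform 0.5 error in the background.
predicted = target.copy()
predicted[background_mask] += 0.5

loss = regional_weighted_mse(
    predicted, target,
    masks=[pattern_mask, background_mask],
    weights=[1.0, 0.1],  # assumed priorities: pattern fidelity over background
)
```

Lowering the background weight relaxes the objective outside the pattern, which is one way to trade background noise for recognizability; in a trainable pipeline the same expression would be written in an automatic-differentiation framework rather than NumPy.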
APPENDIX I: TRAINING CURVES OF THE INVERSE NETWORK AND PHASE DISTRIBUTIONS AT THE METASURFACE DURING THE OPTIMIZATION PROCESS
The training curves of the inverse network and phase distributions at the metasurface during the optimization process are shown in Figs.
Figure 12. Training curve of the inverse network.
Figure 13. Variation of the phase distribution at the metasurface.
APPENDIX J: COMPARISON OF HOLOGRAPHIC PERFORMANCE: NETWORK APPROACH VERSUS GS ALGORITHM
The performance comparison between our proposed deep learning approach and the GS algorithm is shown in Fig.
Figure 14. Comparison of holographic performance.
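Comparisons of this kind are often quantified with the peak signal-to-noise ratio, the metric quoted in the conclusion. The sketch below uses the common definition, PSNR = 10 log10(peak² / MSE), and assumes both images are normalized so that the peak value is 1; the normalization actually used for the reported values is not stated in this text, and the function name `psnr` is hypothetical.

```python
import numpy as np

def psnr(reconstruction, target, peak=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reconstruction = np.asarray(reconstruction, dtype=float)
    target = np.asarray(target, dtype=float)
    mse = np.mean((reconstruction - target) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak**2 / mse)

# Illustrative check: a uniform error of 0.1 on a dark target gives MSE = 0.01.
target = np.zeros((8, 8))
reconstruction = target + 0.1
value_db = psnr(reconstruction, target)
```

Because PSNR depends on the chosen peak value and on which image region is included, values computed with a different normalization are not directly comparable.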
APPENDIX K: NETWORK ARCHITECTURE AND TRAINING CONFIGURATION
The forward network architecture is shown in Fig.
Figure 15. Architecture of the forward network.
Figure 16. Architecture of the inverse network.
APPENDIX L: METASURFACE PHASE DISTRIBUTIONS DESIGNED BY THE GS ALGORITHM AND DEEP LEARNING APPROACH
Figure
Figure 17. Comparison of the metasurfaces’ phase distributions designed by AI and GS.
References
[22] R. W. Gerchberg. A practical algorithm for the determination of phase from image and diffraction plane pictures. Optik, 35, 237-246 (1972).
