
- Advanced Imaging
- Vol. 1, Issue 3, 031001 (2024)
1. Introduction
Ghost imaging is a groundbreaking imaging technology that relies on computing correlations of light-field intensity fluctuations[1,2]. Unlike conventional point-to-point imaging methods, ghost imaging employs a single-pixel detector (SPD) to capture the intensity of the light field reflected from or transmitted through the object. The object image is subsequently reconstructed by calculating the correlation between the intensity sequence captured by the SPD and the intensity distribution of the actively projected light field. This unique method significantly improves imaging quality in low-light environments[3,4]. Nevertheless, light propagating in dynamic scattering media, such as underwater environments, suffers from scattering and absorption, leading to significant imaging challenges, including low contrast and blurred details[5,6]. The inherent strengths of ghost imaging offer promising avenues for recovering and enhancing images captured in these media.
To further improve the quality of ghost images in low-light and high-scattering environments, both compressed sensing and deep learning algorithms have been proposed. Compressed sensing algorithms[7,8] leverage sparse representation for image reconstruction, achieving high image quality but often at the expense of increased acquisition times and computational complexity. Deep learning[9–12] has emerged as a powerful tool for ghost imaging. Early network architectures explored the capabilities of convolutional neural networks (CNNs) for high-quality image reconstruction[13,14]. However, data-driven models are inherently data-hungry, and the challenges associated with underwater data acquisition can limit their generalization capabilities. To address this limitation, deep learning methods combined with physical constraints, such as GIDC[15], VGenNet[16], and ILNet[17], have been proposed. These approaches leverage untrained neural networks to generate high-quality images by embedding prior physical knowledge within the network architecture. This integration of physical constraints fosters the development of robust imaging solutions. However, a key limitation of these techniques lies in their reliance on one-dimensional intensity information, neglecting richer multi-dimensional object information, such as polarization, especially in dynamic scattering media.
Polarization technology has emerged as a critical tool for overcoming imaging challenges in dynamic scattering media, especially underwater[18–20]. Polarization imaging approaches, which combine physical models based on polarization information with image processing techniques, have been proposed to address these limitations[21–23]. These techniques typically require acquiring object images across various polarization states, followed by processing to optimize image quality. Inspired by these advancements, the concept of polarization ghost imaging (PGI) was proposed[24]. Building upon this concept, Shi et al.[25] leveraged a single detector to simultaneously capture multi-polarization information; their method reconstructs high-fidelity images by correlating intensity across different polarizations and subsequently merging the weighted components. However, PGI currently relies on only two orthogonally polarized signals to differentiate objects from backgrounds within dynamic scattering media. Importantly, circularly polarized light exhibits superior resilience to multiple scattering events compared to linearly polarized light due to its inherent point symmetry. This property allows circular polarization to maintain superior polarization characteristics in both forward- and backward-scattering scenarios[26]. Indeed, circular polarization has been demonstrated to enhance underwater object image quality[27,28]. Unfortunately, current PGI methods have overlooked the use of circularly polarized light to improve imaging quality. More importantly, multi-polarization fusion methods have not yet been employed for ghost imaging.
In this paper, we present a multi-polarization fusion mutual supervision network (MPFNet) for ghost imaging. MPFNet leverages a multi-branch spatial-channel cross-attention architecture to process one-dimensional intensity signals acquired under both linear and circular polarization illumination. This framework facilitates the reconstruction of underwater objects with minimal reliance on pre-trained models. Instead, the network parameters are optimized using a loss function based on the intensity difference between the reconstructed image and the detector’s recorded intensity signals. This approach effectively incorporates physical priors, mitigating the dependency on large datasets and significantly reducing data acquisition costs, particularly in noisy underwater environments. The feasibility and efficacy of this multi-polarization fusion approach are validated through experimental demonstrations in both free-space and underwater environments.
2. Method and Experimental Setup
2.1. Methodology
In ghost imaging architecture, a beam splitter divides the optical path into two arms: the reference arm and the object arm. The reference arm uses an array detector to record the illumination pattern, while the object arm employs the SPD to capture the signal. The mathematical framework of both traditional ghost imaging (TGI)[29] and differential ghost imaging (DGI)[30] involves the light field distribution in the reference arm and the data collected by the SPD. Image reconstruction is performed by computing the second-order correlation between the data acquired from the two detectors. Subsequently, a method known as computational ghost imaging leverages a digital micromirror device (DMD) to project pre-modulated speckle patterns, eliminating the need for a reference arm.
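As a concrete illustration of the computational ghost imaging pipeline described above, the following minimal sketch generates a sequence of random binary speckle patterns of the kind loaded onto a DMD; the 64 × 64 grid and the super-pixel (speckle) size are illustrative assumptions, not the parameters of our setup.

```python
import numpy as np

def make_speckle_patterns(n_patterns, height=64, width=64, speckle=4, seed=0):
    """Generate random binary speckle patterns for DMD projection.

    Each pattern is drawn on a coarse grid and upsampled so that one
    'speckle' covers a speckle-by-speckle block of micromirrors.
    """
    rng = np.random.default_rng(seed)
    coarse = rng.integers(0, 2, size=(n_patterns, height // speckle, width // speckle))
    # Nearest-neighbour upsampling: repeat each coarse cell into a block.
    return coarse.repeat(speckle, axis=1).repeat(speckle, axis=2).astype(np.float32)

patterns = make_speckle_patterns(1500)  # shape (1500, 64, 64)
```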
The polarization state of polarized light is defined by the time-averaged behavior of its two field components transverse to the propagation direction. Specifically, polarized light can be represented as the vector sum of its $E_x$ and $E_y$ components, as shown in Eq. (1):

$$\mathbf{E}(z,t) = E_{0x}\cos(kz-\omega t)\,\hat{\mathbf{x}} + E_{0y}\cos(kz-\omega t+\delta)\,\hat{\mathbf{y}}, \tag{1}$$

where $E_{0x}$ and $E_{0y}$ are the component amplitudes, $k$ is the wavenumber, $\omega$ is the angular frequency, and $\delta$ is the phase difference between the two components.
Eliminating the propagation term $(kz-\omega t)$ from Eq. (1) yields the polarization ellipse:

$$\left(\frac{E_x}{E_{0x}}\right)^2 + \left(\frac{E_y}{E_{0y}}\right)^2 - \frac{2E_xE_y}{E_{0x}E_{0y}}\cos\delta = \sin^2\delta. \tag{2}$$
From the above representation, the trajectory of the light vector is determined by the phase difference $\delta$. When $\delta$ is an integer multiple of $\pi$, the light is linearly polarized. Conversely, when $\delta = \pm\pi/2$ and $E_{0x} = E_{0y}$, the light is circularly polarized. During the interaction of light with a medium, factors such as surface structure, material properties, and incident angles can alter the polarization state, embedding characteristic information about the medium in the polarization data. Polarized light passing through various optical components, transmission media, and systems can be considered as undergoing optical transformations. These transformations are typically described using the Jones matrix or the Mueller matrix.
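The Jones formalism mentioned above can be made concrete with a short numerical example. The sketch below, a minimal illustration rather than part of our processing chain, applies the Jones matrix of a quarter-wave plate with its fast axis at 45° to horizontally polarized light and verifies that the output components carry a π/2 phase difference, i.e., circular polarization, mirroring how the QWP is used in our setup (Sec. 2.3).

```python
import numpy as np

def qwp(theta):
    """Jones matrix of a quarter-wave plate with fast axis at angle theta (rad)."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    retard = np.array([[1, 0], [0, 1j]])  # pi/2 retardation in the plate frame
    return rot @ retard @ rot.T

E_in = np.array([1.0, 0.0], dtype=complex)   # horizontally polarized light
E_out = qwp(np.pi / 4) @ E_in                # fast axis at 45 degrees
phase_diff = np.angle(E_out[1]) - np.angle(E_out[0])
print(np.abs(E_out), phase_diff)             # equal amplitudes, -pi/2 -> circular
```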
To obtain the ghost image, a series of modulated mask light fields with different spatial distributions is projected onto the object. For the modulation speckle pattern sequence $\{I_i(x,y)\}_{i=1}^{N}$, where $i$ denotes the frame index of the modulation speckle pattern and $N$ denotes the total number of patterns, the corresponding light intensity value captured by the SPD after interacting with the object is

$$S_i = \iint I_i(x,y)\,T(x,y)\,\mathrm{d}x\,\mathrm{d}y, \tag{3}$$

where $T(x,y)$ is the reflection (or transmission) function of the object. The image is then reconstructed from the second-order correlation $G(x,y) = \langle S_i I_i(x,y)\rangle - \langle S_i\rangle\langle I_i(x,y)\rangle$.
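For reference, the second-order correlation of Eq. (3) with the recorded intensity sequence can be computed in a few lines. The following is a minimal sketch of the standard TGI and DGI estimators (the DGI weighting follows Ferri et al.[30]); the variable names are ours.

```python
import numpy as np

def tgi(patterns, signals):
    """TGI: G(x,y) = <S_i I_i(x,y)> - <S_i><I_i(x,y)>."""
    dS = signals - signals.mean()
    dI = patterns - patterns.mean(axis=0)
    return np.tensordot(dS, dI, axes=1) / len(signals)

def dgi(patterns, signals):
    """DGI: subtract the per-frame total illumination R_i, weighted by <S>/<R>,
    before correlating, which cancels fluctuations of the source power."""
    R = patterns.sum(axis=(1, 2))
    S_diff = signals - (signals.mean() / R.mean()) * R
    return np.tensordot(S_diff, patterns - patterns.mean(axis=0), axes=1) / len(signals)

# usage: image = dgi(patterns, signals), with patterns (N, H, W) and signals (N,)
```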
To estimate the spatial resolution of the reconstructed ghost image, the slanted-edge method is employed. A region of the image spanning both the signal and the background is selected; pixel values parallel to the edge of the object are averaged to create a line profile perpendicular to the edge, which is then fitted with an error function. Applying the 10% to 90% criterion to the fitted profile provides a reliable measure of the image resolution. Three metrics are used throughout this work. The CNR reflects the contrast between the target area and the background in the reconstructed image: the larger the CNR, the higher the contrast of the target object and the easier it is to distinguish from the background. The PSNR reflects the ratio of signal to noise in the reconstructed image: the higher the PSNR, the less noise the reconstruction contains and the closer it is to the original image. The resolution metric captures the smallest discernible detail in the image: the smaller its value, the finer the detail that can be resolved. Thus, higher CNR and PSNR values and a lower resolution value all indicate better image quality.
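The three metrics can be computed as in the sketch below; the exact CNR convention (here, the mean difference over the pooled standard deviation) and the choice of region masks are our assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import erf

def cnr(img, sig_mask, bg_mask):
    """Contrast-to-noise ratio between a target region and a background region."""
    s, b = img[sig_mask], img[bg_mask]
    return abs(s.mean() - b.mean()) / np.sqrt(s.var() + b.var())

def psnr(img, ref):
    """Peak signal-to-noise ratio, for images normalized to [0, 1]."""
    return 10 * np.log10(1.0 / np.mean((img - ref) ** 2))

def edge_resolution(profile, pixel_pitch):
    """Slanted-edge resolution: fit an error function to the averaged edge
    profile and return the 10%-90% rise distance in physical units."""
    x = np.arange(len(profile)) * pixel_pitch
    esf = lambda x, a, b, x0, s: a + b * erf((x - x0) / (s * np.sqrt(2)))
    p0 = [profile.mean(), np.ptp(profile) / 2, x[len(x) // 2], pixel_pitch]
    (a, b, x0, s), _ = curve_fit(esf, x, profile, p0=p0)
    return 2 * 1.2816 * abs(s)  # 10%-90% width of the fitted Gaussian edge
```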
2.2. Network Structure
The MPFNet incorporates basic modules such as encoder, decoder, and skip connections, along with ResBlocks[34] for linking data between adjacent layers as shown in Fig. 1. The network’s input consists of one-dimensional intensity sequence signals detected by the SPD, corresponding to the number of samples taken. Different from classical single-input structures, we integrate detection data from multiple polarization states. Within each input branch, multi-scale convolution operations are employed to extract information from the raw data. Skip connections are used to prevent gradient vanishing and enhance feature reuse.
The one-dimensional intensity signals $S_1$ and $S_2$ collected by the SPD are first mapped into two-dimensional images $O_1$ and $O_2$ by the fully connected net (FCN). In each feature extraction layer, convolution features with different kernel sizes are extracted from the input two-dimensional image. Since convolution operations usually focus only on local information, multi-layer convolution expands the receptive field but also leads to a certain degree of loss of global information. After each image is generated, the data from each branch are convolved with the speckle patterns and reconverted into a one-dimensional sequence. The actually collected signal is used for supervision, with the loss functions for the individual branches denoted as $L_1$ and $L_2$, respectively. In the multi-branch data fusion stage, the multi-branch fusion (MBF) module reintegrates and processes the object and background-noise features extracted by the different branches during feature extraction. As illustrated in Fig. 2, this fusion process preserves the integrity of the data features by combining results from the various stages of feature extraction.
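The per-branch supervision described above can be written as a differentiable physics-prior loss: the branch's reconstructed image is re-projected through the known speckle patterns and compared with the SPD sequence actually recorded. The sketch below (PyTorch, with our naming; the normalization choice is an assumption) illustrates the loss terms $L_1$ and $L_2$.

```python
import torch
import torch.nn.functional as F

def branch_loss(recon, patterns, measured):
    """Physics-prior loss for one branch of MPFNet.

    recon:    (H, W) image produced by this branch's decoder
    patterns: (N, H, W) speckle patterns used for this branch
    measured: (N,) intensity sequence recorded by the SPD
    """
    est = (patterns * recon).sum(dim=(1, 2))        # simulated SPD readings
    # Normalize so the loss is insensitive to an unknown global scale.
    est = (est - est.mean()) / est.std()
    ref = (measured - measured.mean()) / measured.std()
    return F.mse_loss(est, ref)

# total loss: L = branch_loss(O1, P1, S1) + branch_loss(O2, P2, S2)
```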
Since the data of each branch may have different resolutions and channel configurations due to downsampling, they are represented as output features of different sizes. The dual-branch stage processes the image mapped from the original data into different resolutions (ignoring changes in the channel dimension) through the feature extraction modules shown in Fig. 1. In the downward pass, the feature maps take sizes $H \times W$, $H/2 \times W/2$, and $H/4 \times W/4$. In order to unify the dual-branch outputs to the size $H/4 \times W/4$ and input them into the MBF module, the MaxPooling kernel size is set to 4, 2, and 1, respectively, to unify the resolution of the feature maps, and then a convolutional layer with a kernel size of $1 \times 1$ is used to unify the number of channels.
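The pooling-based unification just described can be sketched as follows; the concatenation at the end is our assumption, since the text specifies only that the resolutions and channel counts are unified before the MBF fusion.

```python
import torch
import torch.nn as nn

class MBFUnify(nn.Module):
    """Unify multi-stage features (H x W, H/2 x W/2, H/4 x W/4) to H/4 x W/4:
    max-pool with kernels 4, 2, 1, then align channels with 1x1 convolutions."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.pools = nn.ModuleList(nn.MaxPool2d(k) for k in (4, 2, 1))
        self.projs = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)

    def forward(self, feats):  # feats: [full-res, half-res, quarter-res]
        unified = [proj(pool(f))
                   for f, pool, proj in zip(feats, self.pools, self.projs)]
        return torch.cat(unified, dim=1)  # fused along the channel axis
```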
Figure 1.Schematic diagram of the MPFNet.
Figure 2.Schematic diagram of the MBF module.
Additionally, at the downsampling stages, the feature extraction results of multiple branches are finally converged with the output of the MBF module. Inspired by standard channel attention and spatial attention mechanisms, we propose the multi-branch spatial-channel attention (MBSCA) module. Unlike classical channel and spatial attention mechanisms[35], the MBSCA module is designed to capture multi-scale information by integrating branch and spatial dimensions to enhance the attention mechanism. As shown in Fig. 1, the MBSCA module is situated between the encoder and decoder within the network architecture. This module processes inputs from three sources: (1) Stream 1 and (2) Stream 2, which represent one-dimensional signal inputs from each branch, are first passed through fully connected layers and then through convolutional layers of varying sizes to extract features and generate two-dimensional image outputs. Then, (3) the MBF stream consolidates convolution results from different branches at various stages, standardizing them to match the size of the outputs from the other two streams. At this stage, the outputs from all branches have the same channel number and resolution.
Figure 3 primarily illustrates the feature fusion structure but omits the reshaping and permutation operations required for channel variation and matrix manipulation during the multi-scale fusion process. Inputs from different branches undergo spatial and channel attention modules, resulting in feature maps of varying dimensions. These maps are then unified in size through operations such as matrix manipulation and concatenation. Signals from different branches are deeply fused and extracted at the connection points between the encoder and decoder. Ultimately, the decoder processes these features to reconstruct the final image.
Figure 3.Schematic diagram of the MBSCA structure.
In order to effectively integrate the feature information extracted from the two single branches and the MBF fusion data, the MBSCA module connects the encoder and the decoder, as shown in Fig. 3. On the one hand, MBSCA uses the extracted detail features to correct the varying features in each single branch by fusing the single-branch data with the MBF output. On the other hand, MBSCA reduces redundant features in space and channels and enhances the varying features through multiple channel attention and spatial attention modules. For the spatial dimension, we use $F_1$ and $F_2$ to represent the deep feature maps from branch Stream 1 and Stream 2, and $F_m$ to represent the features of the MBF module. Next, we use a convolution operation to increase nonlinearity and enhance the robustness of the features. We use the spatial attention module (SAM) to identify and enhance the changes in the convolution features and obtain two spatial distribution feature maps. The process takes the form

$$M_i = \mathrm{SAM}\big(\mathrm{Conv}([F_i, F_m])\big), \quad i = 1, 2,$$

where $[\cdot,\cdot]$ denotes concatenation.
Considering that the fusion of the MBF features with the two branch features in the spatial dimension still carries some data redundancy, we obtain a new feature map by averaging them, which is recorded as

$$F_s = \frac{1}{2}\,(M_1 + M_2).$$
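A minimal sketch of this spatial branch of MBSCA is given below; the CBAM-style spatial attention follows [35], while the 3 × 3 fusion convolution and the sharing of weights between the two branch-MBF pairs are our assumptions.

```python
import torch
import torch.nn as nn

class SAM(nn.Module):
    """Spatial attention module in the CBAM style [35]."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx = x.max(dim=1, keepdim=True).values
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn

class MBSCASpatial(nn.Module):
    """Fuse each branch feature F_i with the MBF feature F_m, apply conv + SAM
    to obtain M_1 and M_2, then average them into F_s (see the equations above)."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)
        self.sam = SAM()

    def forward(self, f1, f2, fm):
        m1 = self.sam(self.conv(torch.cat([f1, fm], dim=1)))
        m2 = self.sam(self.conv(torch.cat([f2, fm], dim=1)))
        return 0.5 * (m1 + m2)
```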
After the MBSCA module fuses the data of the three branches, it produces the final features. These fused features are passed to the decoder stage through the skip-connection structure to supplement the change information at different scales. The overall algorithmic flow of the network framework is summarized in Table 1.
1: Input: 1D signal $S_1$, 1D signal $S_2$, speckle pattern sequence $\{I_i\}$
2: Initialize the network parameters $\theta$
3: for iteration $t = 1$ to $T$ do
4: $O_1 \leftarrow \mathrm{FCN}(S_1)$; $O_2 \leftarrow \mathrm{FCN}(S_2)$
5: $F_1 \leftarrow \mathrm{Encoder}_1(O_1)$; $F_2 \leftarrow \mathrm{Encoder}_2(O_2)$
6: $F_m \leftarrow \mathrm{MBF}$(multi-stage features of both branches)
7: $F \leftarrow \mathrm{MBSCA}(F_1, F_2, F_m)$
8: $\hat{O} \leftarrow \mathrm{Decoder}(F)$
9: $\hat{S}_1, \hat{S}_2 \leftarrow$ projection of $\hat{O}$ onto the speckle patterns of each branch
10: $L \leftarrow L_1(\hat{S}_1, S_1) + L_2(\hat{S}_2, S_2)$
11: Update $\theta$ by backpropagating $L$
12: end for
13: Output: reconstructed image $\hat{O}$
Table 1. MPFNet Algorithm.
2.3. Experimental Setup
To validate the effectiveness of MPFNet in unfamiliar environments, we conduct experiments in free-space and highly noisy underwater environments. The specific experimental setup is shown in Fig. 4. The objects used in these experiments are laser-cut digits and letters. A 532 nm laser with a power of 10 mW is expanded through a beam-expander lens; this laser offers high coherence over a long coherence length, together with low-noise output and high stability, making it well suited to optical imaging. The expanded beam then illuminates the DMD, which is equipped with micromirrors at a 10.6 µm pitch. By synchronizing the DMD projection with the SPD signal reception, we project the modulated patterns at a rate of 50 Hz and record intensity sequences for 500, 750, 1000, 1250, and 1500 modulation masks, corresponding to sampling rates of 4.8%, 7.3%, 9.7%, 12.2%, and 14.6%, respectively. The DMD model is FLDISCOVERY F430 DDR 0.65 WXGA, and the SPD model is GLGYZN SPDM30. The reconstructed images are quantitatively evaluated using the CNR, PSNR, and resolution metrics.
Figure 4.Schematic diagram of the experimental setup. (a) The free-space ghost imaging experimental setup. The laser passes through a lens and illuminates the DMD; the modulated light passes through the object and is received by the SPD. (b) The underwater ghost imaging experimental setup. The light source is modulated by a PBS and a QWP. The first dashed box represents the acquisition of linear polarization signals, and the second dashed box represents the acquisition of circular polarization signals.
The experimental setup consisted of a laser beam expanded and directed through a lens onto a DMD in free space [Fig. 4(a)]. The modulated beam interacted with the object, and the resulting light intensity signals were captured by an SPD. For the underwater experiments, a polarizing beam splitter (PBS) and a quarter-wave plate (QWP) generated horizontally and circularly polarized light [Fig. 4(b)]. The laser beam was converted to horizontally polarized light by the PBS and maintained this polarization state when the QWP's fast axis was aligned horizontally. The beam reflected off the DMD, passed through two glass layers and the turbid water, and was focused onto the SPD. To capture data from the linearly polarized light source, two SPDs were employed: one positioned behind the PBS for the co-polarized signals and another for the vertically polarized signals. For each polarization state, 1500 light intensity data points were collected, allowing for single-polarization and combined-polarization reconstructions. Additionally, after rotating the QWP by 45° to convert the linearly polarized light source into a circularly polarized one, we collected 1500 light intensity data points for different objects. The speckle patterns remained consistent throughout the experiment.
This experiment considered two different imaging media: data collected in air, to test the baseline imaging performance of our proposed method, and data collected in an underwater environment made turbid with putty powder. Since the DMD and SPD are separated on either side of the water tank, the two layers of glass make the environment more complicated than typical computational imaging environments. The impurity particles in our underwater environment are relatively large, and the attenuation corresponds to a mean free path of about 2 m. Under these conditions, we compared the performance of our proposed method against two common deep learning methods.
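For context, under the Beer–Lambert law the quoted mean free path implies an attenuation coefficient of roughly 0.5 m⁻¹; this value is our inference from the stated mean free path, not an independently measured quantity:

```latex
I(z) = I_0\, e^{-cz}, \qquad \ell = \frac{1}{c} \approx 2\ \mathrm{m}
\;\;\Rightarrow\;\; c \approx 0.5\ \mathrm{m^{-1}}.
```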
3. Results and Discussion
3.1. Imaging Quality at Different Iterations and Sampling Rates for Different Methods
To validate MPFNet's capabilities in reconstructing object images, we perform experiments in free-space environments. As illustrated in Fig. 5, the images reconstructed by TGI and DGI contain noticeable background noise at fewer than 1500 measurements, underperforming MPFNet, which achieves comparable results with only 500 measurements. Increasing the measurement count further improves noise suppression in MPFNet's reconstructions, demonstrating that MPFNet can reconstruct object images with high quality. To ensure consistent sampling across experiments, each network branch processes half of the total measurement data; thus, with 500 measurements, each branch utilizes 250, a ratio maintained at the higher measurement counts.
Figure 5.Reconstruction results in free space, with the corresponding grayscale and ground truth of the object.
Subsequently, we assess the reconstructed images using metrics such as CNR, PSNR, and resolution analysis to quantitatively demonstrate MPFNet’s superior reconstruction performance. The experimental results are shown in Fig. 6, where blue, green, yellow, and red represent DGI with 1500 iterations (DGI-1500), TGI with 1500 iterations (TGI-1500), our method with single input and 1500 iterations (Ours-s1500), and our method with dual-branch input where each branch has 750 iterations (Ours-m1500). It can be clearly seen that the proposed method achieves higher image quality with only 500 iterations compared to 1500 iterations using TGI methods, as indicated by the curves. This highlights the effectiveness of MPFNet in delivering high-quality reconstructions in fewer iterations. As illustrated in Fig. 6, the dual-branch reconstruction marginally surpasses the single-input approach in both CNR and PSNR metrics. Moreover, the MPFNet with dual-branch input consistently exhibits the best performance in terms of resolution. These results collectively provide compelling evidence that the fusion of multi-polarization information effectively mitigates the adverse impact of dynamic scattering environments on the quality of reconstructed images.
Figure 6.Quantitative evaluations of the CNR, PSNR, and resolution of TGI, DGI, and the MPFNet. Object-DGI and Object-TGI represent the reconstruction results of DGI and TGI at different sampling rates. Object-Ours-s represents the single input and Object-Ours-m represents the dual-branch input.
3.2. Imaging Quality in Turbid Water with Different Methods
To further verify the effectiveness of MPFNet, we reconstruct the object image with different speckle pattern sizes in turbid water. The experimental results are shown in Fig. 7(a). Figure 7(b) shows the effect of varying speckle size on the DGI method in underwater environments. Larger speckle sizes yield higher CNR but coarser spatial resolution, consistent with the resolution trends discussed above. Although finer speckles resolve more detail, we opted for a speckle size of 60 for underwater data collection to ensure adequate object discernibility. To validate the superiority of MPFNet, we compare its performance against the TGI and DGI methods using the CNR, PSNR, and resolution metrics; MPFNet significantly outperforms these conventional methods. Unlike in free space, underwater light scattering and absorption distort the light field, resulting in more pronounced image distortion and detail loss in the reconstructions.
Figure 7.(a) Relationship between CNR and resolution for images reconstructed by the network using different speckle sizes. (b) Relationship between CNR and resolution for DGI reconstruction results of light fields modulated with various speckle sizes in underwater environments.
To evaluate the effectiveness of PGI, we compare the quality of images reconstructed from horizontally polarized, vertically polarized, and combined orthogonally polarized data samples. All images were reconstructed using 1500 sampled data points. The traditional TGI and DGI algorithms, using 1500 light intensity samples from an unpolarized light source, serve as the baseline. The linearly polarized reconstructions include the horizontal component (H), the vertical component (V), and Multi-LP, where each branch input of the network in the H and V reconstructions consists of 750 data samples of one of the two mutually perpendicular linear polarization components. The Multi-LP method fuses the two orthogonal components of the linearly polarized light source, each component again comprising 750 acquisitions. For CP, each branch of the dual-branch input comes from the circularly polarized light source. In the LP + CP setting, the linearly and circularly polarized data are fused, and the total number of samples remains unchanged. For consistency, all experiments use the same speckle sequence.
In addition, we use reconstruction results on simulated data to illustrate the robustness and effectiveness of the proposed structure. The simulated data model the process by which the laser passes through the optical components and the underwater medium to reach the bucket detector, yielding the light intensity sequence captured by the detector. The TGI and DGI methods are used as benchmarks, while the ResUNet and GIDC networks are used for comparison. Ours-s denotes the reconstruction results with single-branch input, and Ours-m denotes the multi-branch input with the MBF and MBSCA frameworks integrated on top of Ours-s. As shown in Fig. 8, comparing common deep learning methods with and without the MBF and MBSCA modules reveals their impact on performance. The experimental results show that adding the MBF and MBSCA modules to the reconstruction process sharpens the overall boundary of the target and recovers the target signal strength. Calculating indicators such as CNR and PSNR, we find that Ours-m improves every indicator.
Figure 8.Reconstructed image results using simulated data. Ours-s represents single-branch imaging results; Ours-m represents dual-branch imaging results including MBF and MBSCA structures.
We first use two deep learning network frameworks widely used in computational imaging to compare reconstructions of the underwater data. UNet is the framework used in most computational imaging; we adopt its residual variant, ResUNet, and we also use the GIDC network, which likewise requires no training data, as a comparison. In Fig. 8, the overall imaging quality of GIDC is higher than that of ResUNet. In general, the CNR, PSNR, and resolution indicators show that the imaging performance with circular polarization is slightly better than with linear polarization. In addition, since the ResUNet and GIDC frameworks have only one input branch, they struggle to reconstruct a clear image from the vertical component of the linear polarization alone; in this case, the network's ability to distinguish target features in the data is weak. Our proposed method uses a dual-branch structure together with the MBF and MBSCA modules to fully extract target feature information from the data, and its overall imaging quality is significantly better than that of these two deep learning methods.
In free space, as shown in Fig. 9(a), the CNR and PSNR values of the images reconstructed using TGI and DGI are relatively low compared to the network reconstructions, indicating that these images contain more noise and that the contrast between the object region and the background is low. In the experiments using the linearly polarized light source, the data from the vertical (cross-polarized) component can still reconstruct the object, while the reconstruction from the horizontal (co-polarized) component shows relatively higher CNR and PSNR. As illustrated in Fig. 9(b), we present the fluctuations of the 1500 data points collected for the letter "P" under the two linear polarization directions. After normalizing the randomly selected intensity data, we found that the intensity fluctuations in both directions are similar; the relative fluctuations of the two components were calculated to be 1.0213 and 0.8274, respectively. For the cross-polarized component, the water environment depolarized the detection light; however, since the detector was positioned behind the object, the depolarized light also carried object information, allowing the object image to be reconstructed. The co-polarized signal contained more object information, resulting in better reconstruction quality. Furthermore, the Multi-LP reconstruction, which integrates the information from both polarization directions, yielded better results than either single polarization direction because more object information is acquired. Compared to linearly polarized light, whose polarization state is altered by collisions with particles in a dynamic scattering medium, circularly polarized light can be regarded as two orthogonal linearly polarized components with a constant phase difference: the two components maintain a constant rotation during propagation, and particle collisions have little effect on the overall polarization state. In addition, owing to its symmetry and uniform optical properties, circularly polarized light remains more stable in a dynamic scattering medium, reducing the impact of polarization attenuation. In our experiment, putty powder was used as the scattering medium; its particle size is much larger than the 532 nm laser wavelength, so Mie scattering dominates. Under Mie scattering, the polarization-preserving characteristics of circularly polarized light exceed those of linearly polarized light, which explains why the reconstruction using circularly polarized light is better than that using horizontal linear polarization alone. Circular polarization likewise complements the linear polarization in the dual-branch structure, and the corresponding light intensity signal received by the detector is stronger, making the circularly polarized (CP) reconstruction close to the multi-linearly polarized (Multi-LP) result, with slightly improved CNR and PSNR. Additionally, since LP + CP not only fuses intensity information from various directions but also reinforces the object information, the combination of multi-branch fusion in MPFNet and the data augmentation module leads to superior CNR and PSNR compared to the other methods, as well as superior resolution metrics within the fixed regions. From the perspective of intuitive visualization, however, the results of combining CP and LP light do not exhibit significant visual differences compared with using CP or LP alone.
However, when assessing the quality of computational imaging, we employ quantitative metrics such as CNR, PSNR, and resolution to evaluate the reconstruction performance. As shown in Fig. 9, our method demonstrates notable advantages over conventional deep learning approaches in terms of these objective measures, highlighting its effectiveness in enhancing image quality.
Figure 9.(a) Reconstructed underwater ghost images and the corresponding CNR, PSNR, and resolution for each method and polarization configuration. (b) Normalized intensity fluctuations of the 1500 data points collected for the letter "P" under the two linear polarization directions.
4. Conclusion
The proposed MPFNet enhances image reconstruction quality through three key innovations, enabling effective reconstruction with low sampling rates in challenging underwater environments. First, by employing dual-structured inputs with identical sample counts but different speckle patterns of light intensity data, we integrate a physical model into the network, thereby eliminating the need for pre-training. This method addresses generalization issues commonly encountered in traditional data-driven networks. Second, utilizing the information extraction capabilities of CNNs, our approach progressively maps high-resolution, low-channel initial images to low-resolution, high-channel outputs, effectively reducing the sampling rate. While traditional networks often use residual structures with skip connections to preserve information, our method employs the MBF module to standardize results from different stages with varying resolutions and channel configurations to a uniform size. This strategy minimizes data loss during convolutional downsampling and enhances data by integrating dual-branch inputs, allowing the comparison of object and background features to correct reconstruction anomalies. Finally, we introduce an attention module that combines three inputs: data processed at each encoder stage of the dual-branch structure and results unified through the MBF module. By adjusting channel and pooling kernel sizes, we maintain consistent resolutions and channel numbers across inputs. This attention module features channel counts significantly higher than the single channel count of the final reconstruction image. With the image resolution reduced to 1/256 of its original size, the combined spatial and channel attention mechanism enables focused feature map analysis, assigning weights to different spatial positions and channels. Each branch input provides contextual information from various dimensions, thereby improving the ability to extract feature map information effectively.
References
[14] M. Lyu et al. Deep-learning-based ghost imaging. Sci. Rep., 7, 17865(2017).
[24] D. Shi, S. Hu, Y. Wang. Polarimetric ghost imaging. Opt. Lett., 39, 1231(2014).
[25] D. Shi et al. Polarization-multiplexing ghost imaging. Opt. Lasers Eng., 102, 100(2018).
[30] F. Ferri et al. Differential ghost imaging. Phys. Rev. Lett., 104, 253603(2010).
[34] K. He et al. Deep residual learning for image recognition. Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 770(2016).
[35] S. Woo et al. CBAM: convolutional block attention module. Proc. European Conf. on Computer Vision (ECCV), 3(2018).
