
Chinese Optics Letters, Vol. 21, Issue 8, 080501 (2023)
1. Introduction
As a versatile digital holography technique, optical scanning holography (OSH) has been widely used in microscopy[1], remote sensing[2], image encryption[3,4], etc. In OSH, the object is scanned by heterodyne beams, which are launched from the same light source and acquire a frequency difference through a frequency shifter. Unlike traditional digital holography, OSH can record 3D objects into 2D holograms by single-pixel 2D scanning. The hologram preserves both the amplitude and the phase information of the object, which can then be retrieved by hologram reconstruction.
In image reconstruction, there are two important issues. The first is auto-focusing, i.e., finding the accurate reconstruction distance, and the other is choosing a reconstruction method that increases the depth resolution while introducing less defocus noise. A great deal of research has been devoted to retrieving the reconstruction distance automatically, for example, extended focused imaging[5], the structure tensor[6], and the connected domain[7]. Researchers have also used the time-reversal (TR) technique to obtain the depth information by calculating the pseudo-spectrum of the TR matrix generated from the hologram[8]. To further improve the depth resolution, methods based on double measurements have also been proposed, including the use of a dual-wavelength laser[9], double-location detection[10], and a reconfigurable pupil[11].
The out-of-focus haze, also known as defocus noise, consists of undesired residual signals from other sections. Many methods have been presented to perform image reconstruction and to suppress the defocus noise, such as inverse imaging[12,13], Wiener filtering[14], and 3D imaging[15]. For example, Zhou et al. used a random phase pupil to transform the defocus noise into speckle-like patterns[16]. This noise can be further suppressed by averaging[16], connected-component analysis[17], and image fusion[18].
In recent years, deep learning has undergone rapid development and has found wide application in areas such as language processing, image processing, biomedical imaging, machine vision, and digital holography[19–22]. Ren et al. presented a convolutional neural network (CNN) based on a regression method to achieve fast auto-focusing[23]; the CNN was trained a priori on a set of holograms. Pitkäaho et al. showed that CNNs can also predict the in-focus depth by learning from half a million hologram amplitude images in advance[24]. Compared with traditional methods, their work showed better precision and efficiency. Rivenson et al. used deep learning to rapidly perform phase recovery and image reconstruction simultaneously; the calculation was based on only one hologram and could reconstruct both the phase and amplitude images of the object[25]. Nguyen et al. presented a phase aberration compensation method based on a deep learning CNN[26], which performs automatic background region detection for most aberrations. Deep learning has also proved to be an effective tool in molecular diagnostics[27]: Kim et al. trained neural networks to classify holograms without reconstruction, using the captured holograms of cells as raw training data, after which the networks could classify individual cells.
In this paper, we present, for the first time to the best of our knowledge, a reconstruction method based on a U-shaped convolutional neural network (U-net) to remove the speckle-like defocus noise in an OSH system. The U-net is adopted to learn the mapping between various holograms and the corresponding sectional images. Unlike other CNN methods, which require large training data sets, U-net can work with very few training images and yields more precise results. The proposed method can eliminate the speckle-like noise generated by the random phase pupil. Simulation results show that the algorithm works well with both simple and complex graphics. It also outperforms traditional reconstruction methods in terms of sectional image quality and processing speed.
This paper is organized as follows. In Section 2, we first introduce the OSH system and the principle of the random phase pupil system; the theory of U-net deep learning is also explained in this section. Simulation results are presented and discussed in Section 3 to demonstrate the feasibility of the proposed method. Concluding remarks are given in Section 4.
2. Principle
2.1. Optical scanning holography
The holographic system is illustrated in Fig. 1. A He–Ne laser at the input of the system emits a plane wave, which is split into two beams by beam splitter BS1. An acousto-optic frequency shifter (AOFS) shifts the frequency of one beam, so that the two beams form a heterodyne pair with a temporal frequency difference when they are recombined to scan the object.
Figure 1. OSH system setup.
In a random phase pupil holographic system, one of the two pupils is set as a random phase mask while the other remains a conventional (point) pupil, so that the scanning beam carries a random phase pattern[16].
To recover the sectional image at a given depth, the recorded hologram is correlated with the free-space impulse response corresponding to that depth. Because of the random phase pupil, the contributions from the other, out-of-focus sections appear as speckle-like defocus noise in the reconstruction.
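As an illustration of this conventional reconstruction step, the following minimal sketch (in Python/NumPy, under assumed grid size, pixel pitch, and reconstruction distance; the function names are ours, not the authors') correlates a complex hologram with the conjugate Fresnel impulse response of a chosen depth:

```python
import numpy as np

def fresnel_kernel(n, dx, wavelength, z):
    """Free-space (Fresnel) impulse response sampled on an n x n grid."""
    k = 2 * np.pi / wavelength
    x = (np.arange(n) - n // 2) * dx
    X, Y = np.meshgrid(x, x)
    return (np.exp(1j * k * z) / (1j * wavelength * z)
            * np.exp(1j * k / (2 * z) * (X**2 + Y**2)))

def reconstruct_section(hologram, dx, wavelength, z):
    """Reconstruct one section by FFT-based convolution of the complex
    hologram with the conjugate Fresnel kernel for depth z."""
    h = np.conj(fresnel_kernel(hologram.shape[0], dx, wavelength, z))
    H = np.fft.fft2(np.fft.ifftshift(h))   # kernel re-centered at the origin
    return np.fft.ifft2(np.fft.fft2(hologram) * H)

# Illustrative parameters only: 512 x 512 grid, 10 um pixel pitch,
# 632.8 nm He-Ne wavelength, 9 mm reconstruction distance.
if __name__ == "__main__":
    holo = np.random.randn(512, 512) + 1j * np.random.randn(512, 512)
    section = np.abs(reconstruct_section(holo, 10e-6, 632.8e-9, 9e-3))
```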
For a better sectioning effect, this noise should be further eliminated. One can suppress the speckle haze by averaging multiple sectional images or by using connected-component methods[16,17]. These methods succeed in suppressing the overriding noise, but they all require multiple frames, which greatly reduces the efficiency. Here, we present a special CNN-based method, the U-net, to suppress the speckle-like noise. In addition to requiring no prior information, the U-net method is simpler, takes a shorter operation time, and is more robust than the conventional methods mentioned above.
Unlike other CNN methods, which require large training data sets, U-net can work with very few training images and yield more precise results[29,30]. This is the main advantage, especially in the situation where it is difficult to retrieve large data sets, such as biological applications.
2.2. U-shaped convolutional neural network
The network is composed of a contracting path and an expansive path. Because its shape resembles the letter 'U', it is called U-net. The U-net was first proposed for biomedical image segmentation and has proved to be a very effective end-to-end image processing tool[30,31].
Figure 2 illustrates the architecture of the U-net, which contains two paths: the contracting one (convolution + downsampling) and the expansive one (deconvolution + upsampling). The contracting path consists of repeated layers, each of which applies two 3 × 3 convolutions followed by rectified linear unit (ReLU) activations and a 2 × 2 max-pooling operation for downsampling.
Figure 2. The architecture of U-net. ‘Conv, 3 × 3’ represents a 3 × 3 convolution kernel with the ReLU activation function. ‘Padding=same’ means that the matrix dimensions of the input and output in the convolution layer are the same. ‘Maxpool 2 × 2’ represents the function to choose the maximum value from a 2 × 2 matrix. ‘Upsampling and conv, 2 × 2’ stands for upsampling using a 2 × 2 convolution kernel. Each blue box represents a multi-channel feature map, while the white ones represent the copied feature maps.
It is worth noting that the feature maps are copied and cropped after each downsampling process, as denoted by the white boxes in Fig. 2. These high-resolution features from the contracting path are then merged with the upsampled output to generate a more precise result.
For the expansive path, each step upsamples the feature map with a 2 × 2 convolution, concatenates it with the corresponding copied feature map from the contracting path, and refines the result with two further 3 × 3 convolutions and ReLU activations. A final convolution layer maps the feature channels to the output image.
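The following is a minimal PyTorch sketch of such an architecture, written under our own assumptions (a shallow two-level network, single-channel input and output, and illustrative channel counts); the network used in the paper is deeper, as shown in Fig. 2, but the building blocks are the same 3 × 3 convolutions with ReLU, 2 × 2 max-pooling, 2 × 2 up-convolutions, and copy-and-merge skip connections:

```python
import torch
import torch.nn as nn

def double_conv(c_in, c_out):
    """Two 3 x 3 convolutions with ReLU and 'same' padding, as in Fig. 2."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class SmallUNet(nn.Module):
    """Shallow illustrative U-net; the paper's network is deeper."""
    def __init__(self, ch=(1, 64, 128, 256)):
        super().__init__()
        self.enc1 = double_conv(ch[0], ch[1])
        self.enc2 = double_conv(ch[1], ch[2])
        self.bottom = double_conv(ch[2], ch[3])
        self.pool = nn.MaxPool2d(2)                                # 2 x 2 max-pooling
        self.up2 = nn.ConvTranspose2d(ch[3], ch[2], 2, stride=2)   # 2 x 2 up-conv
        self.dec2 = double_conv(ch[3], ch[2])                      # after skip concat
        self.up1 = nn.ConvTranspose2d(ch[2], ch[1], 2, stride=2)
        self.dec1 = double_conv(ch[2], ch[1])
        self.out = nn.Conv2d(ch[1], 1, 1)                          # map to one output channel

    def forward(self, x):                                   # x: (N, 1, H, W), H and W multiples of 4
        e1 = self.enc1(x)                                   # contracting path
        e2 = self.enc2(self.pool(e1))
        b = self.bottom(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1)) # copy-and-merge skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.out(d1)                                 # expansive path output
```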
To suppress the defocus noise, a sufficient training set should be collected. The training set contains the reconstructions of the encoded holograms, which carry speckle-like noise, and the corresponding labeled images without noise. In the training process, the noisy reconstructed images are used as input images and propagated forward through the network to obtain predicted images. The loss function is then computed between the predicted images and the noise-free labeled images, and the network weights are updated by back-propagation until the loss converges.
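A minimal sketch of such a paired training set, assuming the noisy reconstructions and clean sectional images are already available as NumPy arrays (the class name and the [0, 1] normalization are our own illustration):

```python
import torch
from torch.utils.data import Dataset

class SpecklePairDataset(Dataset):
    """Pairs of (noisy reconstruction, clean sectional image) for supervised
    training; in the paper, each clean image is paired with reconstructions
    obtained under several different random phase pupils."""
    def __init__(self, noisy_images, clean_images):
        assert len(noisy_images) == len(clean_images)
        self.noisy = noisy_images     # sequences of 2D arrays, scaled to [0, 1]
        self.clean = clean_images

    def __len__(self):
        return len(self.noisy)

    def __getitem__(self, idx):
        x = torch.as_tensor(self.noisy[idx], dtype=torch.float32).unsqueeze(0)
        y = torch.as_tensor(self.clean[idx], dtype=torch.float32).unsqueeze(0)
        return x, y
```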
3. Results and Discussion
The proposed method was demonstrated via simulation. The optical process was simulated with MATLAB, while the reconstruction results based on the proposed U-net method were generated with PyTorch. The GPU used in the simulation was an NVIDIA GTX 1080Ti with 16 GB memory. A He–Ne laser centered at 632.8 nm was used as the source, and lenses L1 and L2 shared the same focal length.
3.1. Simple graphics
The U-net method was first verified with simple graphics, such as the English alphabet. In the training process, it is important to generate enough data sets. In the simulation, 386 original images were used, with each one passing through 27 different random phase pupils in the OSH system as shown in Fig. 1. In this way, 10,422 data sets were produced for training. Some of the sample images with speckle-like noise are shown in Fig. 3, which are used as the input images of the U-net model. The speckle-like noises are generated from the other section based on Eqs. (1) and (2). The sectional images are in the database as mentioned above. One can observe from this figure that different speckle-like noise was added according to Eq. (2). The corresponding noise-free images, also denoted as the standard images or reference images of the U-net, are shown in Fig. 4.
Figure 3. Input images for the U-net model.
Figure 4. Standard images of the U-net model.
To accelerate the convergence, we chose the Adam method for stochastic optimization with a learning rate of 0.0001[32]. Dropout with a rate of 0.5 was applied to prevent over-fitting[33]. The relationship between the training loss and the number of iterations is shown as the blue curve in Fig. 5, while the orange curve represents the relationship between the validation loss and the number of iterations. It can be seen from this figure that the losses of both the training data and the validation data decrease with the number of iterations.
Figure 5. The relationship between the loss function and iteration times.
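A minimal training-loop sketch under these settings (Adam with a learning rate of 1 × 10⁻⁴ as stated above; the mean-squared-error loss, the batch size, and the 90/10 train/validation split are our own assumptions, and the dropout layers mentioned in the text are omitted from the architecture sketch for brevity):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split

def train(model, dataset, epochs=400, device="cuda"):
    """Train the sketch network on (noisy, clean) pairs and track both losses."""
    n_val = max(1, int(0.1 * len(dataset)))               # illustrative 90/10 split
    train_set, val_set = random_split(dataset, [len(dataset) - n_val, n_val])
    train_loader = DataLoader(train_set, batch_size=8, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=8)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)   # Adam, lr = 0.0001
    loss_fn = nn.MSELoss()                                # assumed loss function
    model.to(device)
    for epoch in range(epochs):
        model.train()
        train_loss = 0.0
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()                               # back-propagation
            opt.step()
            train_loss += loss.item() * x.size(0)
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for x, y in val_loader:
                x, y = x.to(device), y.to(device)
                val_loss += loss_fn(model(x), y).item() * x.size(0)
        print(f"epoch {epoch}: train {train_loss / len(train_set):.4f}, "
              f"val {val_loss / len(val_set):.4f}")
```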
The first simulation results show that the U-net architecture learns the characteristics of the speckle-like noise well. Here, we demonstrate the reconstruction results under different random phase pupils in Fig. 6. Three different test images were used to verify the proposed method. The image with the letters ‘ABC’ was in the training data sets, while the ones with the letters ‘XYZ’ and the Chinese character ‘光’ were not.
Figure 6. The reconstruction results with U-net. (a), (d), and (g) are the original images. (b), (e), and (h) are input images with speckle-like noise generated by different random phase pupils. (c), (f), and (i) are the corresponding reconstructed output images.
To evaluate the reconstruction effect of the U-net, we compare the results using two important metrics: the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM). The PSNR can be defined as[34] $\mathrm{PSNR} = 10\log_{10}\!\left(I_{\max}^{2}/\mathrm{MSE}\right)$, where $I_{\max}$ is the maximum possible pixel value and MSE is the mean squared error between the output image and the reference image.
The SSIM is used to quantify the visibility of differences between the output image and the corresponding reference image. The quality assessment is based on the degradation of structural information and can be expressed as[35] $\mathrm{SSIM}(x,y) = \dfrac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$, where $\mu_x$, $\mu_y$ are the local means, $\sigma_x^2$, $\sigma_y^2$ the variances, $\sigma_{xy}$ the covariance of the two images, and $C_1$, $C_2$ are small stabilizing constants.
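Both metrics can be computed directly from the output and reference images. A short sketch using NumPy and scikit-image, assuming 8-bit images (a maximum value of 255, consistent with the MSE and PSNR values in Table 1):

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(reference, test, max_val=255.0):
    """PSNR = 10 log10(max_val^2 / MSE), matching the definition above."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def evaluate(reference, test, max_val=255.0):
    """Return (PSNR in dB, SSIM in [0, 1]) for a pair of grayscale images."""
    ssim = structural_similarity(reference, test, data_range=max_val)
    return psnr(reference, test, max_val), ssim
```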
The quantified assessment results are shown in Table 1. One can observe from the ‘Input VS Original’ rows that the PSNR is around 15 dB for all cases, and the SSIM is relatively low, around 0.6. This means that the speckle-like noise has greatly degraded the signal-to-noise ratio and has considerably reduced the similarity between the noisy images and the original ones. For the ‘Output VS Original’ rows, the PSNR increases to above 32 dB for all cases, and the SSIM rises to close to 1. This indicates that the speckle haze has been eliminated successfully, and that the output images of the U-net have small distortion and high similarity with the original ones.
| Comparison | Test sample | MSE | PSNR (dB) | SSIM (∈[0, 1]) |
|---|---|---|---|---|
| Input VS Original | ‘ABC’ | 1909.14 | 15.32 | 0.5782 |
| | ‘XYZ’ | 1942.50 | 15.25 | 0.6171 |
| | ‘光’ | 2138.30 | 14.80 | 0.5584 |
| Output VS Original | ‘ABC’ | 15.8 | 36.15 | 0.9535 |
| | ‘XYZ’ | 36.4 | 32.52 | 0.9326 |
| | ‘光’ | 32.7 | 32.98 | 0.9383 |
Table 1. The Quantified Performance of the U-net Using Different Test Images
It is worth noting that the computation time is around 33.3 ms for all cases on the same NVIDIA GTX 1080Ti GPU with 16 GB memory.
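For reference, such a per-image inference time could be measured with a simple GPU timing loop like the one below (the input size, warm-up count, and number of runs are our own illustrative choices, not the authors' configuration):

```python
import time
import torch

def time_inference(model, shape=(1, 1, 256, 256), n_runs=100, device="cuda"):
    """Average forward-pass time in milliseconds for one input image."""
    model.to(device).eval()
    x = torch.rand(shape, device=device)
    with torch.no_grad():
        for _ in range(10):                  # warm-up runs
            model(x)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(n_runs):
            model(x)
        torch.cuda.synchronize()             # wait for all GPU kernels to finish
    return (time.perf_counter() - start) / n_runs * 1e3
```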
We have also measured the sectioning results of the Chinese character in Fig. 6(h) with different noise ratios. The noise is generated from different sections with the traditional method[16], as shown in Figs. 7(a)–7(c). The sectional images generated by the U-net method are shown in Figs. 7(d)–7(f), respectively. Table 2 presents the corresponding quantified results. It can be seen from Fig. 7 and Table 2 that, for sectional images with different noise ratios, the values of the PSNR and the SSIM all increase significantly (with the PSNR around 32 dB and the SSIM close to 1). This result shows that the U-net method can successfully remove the defocus noise at different noise ratios.
| Comparison | Test sample in Fig. 7 | MSE | PSNR (dB) | SSIM (∈[0, 1]) |
|---|---|---|---|---|
| Input VS Original | (a) | 2256.34 | 13.35 | 0.3732 |
| | (b) | 2387.33 | 12.33 | 0.3245 |
| | (c) | 2489.95 | 10.58 | 0.2788 |
| Output VS Original | (d) | 33.1 | 32.56 | 0.9305 |
| | (e) | 32.1 | 32.47 | 0.9289 |
| | (f) | 33.7 | 32.18 | 0.9245 |
Table 2. The Quantified Performance of the U-net with Different Noise Ratios
Figure 7. Sectioning results with different noise ratios based on the U-net method.
3.2. Complex graphics
In this subsection, complex graphics are tested. Some samples are shown in Fig. 8, including standard test images from digital image processing such as Barbara, Cameraman, and Peppers. These images were used in the OSH system to generate holograms of complex graphics in order to test the noise-suppressing ability of the proposed method.
Figure 8. The original images for generating the training data sets.
To generate the training data sets, 364 original images were used, with each one passing through 30 different random phase pupils. In this way, 10,920 data sets were produced for the U-net model. It is important to mention that the speckle-like noises are generated from the other section based on Eqs. (1) and (2). The sectional images are in the database as mentioned above.
The loss function for both the training data and the validation data were calculated in the training process. The results are shown in Fig. 9. As can be seen from this figure, the loss function decreases with the iteration times. One can expect that the defocus noise can be suppressed after 400 iterations. The generalization gap between training loss and validation loss is measured to be around 0.002 in this case.
Figure 9. The relationship between the loss function and iteration times in the complex graphics.
The reconstruction results using the U-net with complex graphics are shown in Fig. 10. Two complex graphics named ‘Monkey’ and ‘Rice’ were used to test the U-net method. It can be seen from this figure that most of the speckle haze has been eliminated.
Figure 10. The test results of the complex graphics with U-net. (a),(d) are the original images of ‘Monkey’ and ‘Rice’. (b),(e) are the input images with speckle-like noise generated by different random phase pupils. (c),(f) are the corresponding output images of the U-net.
The quantified evaluation results are listed in Table 3. By comparing the data in the ‘Input VS Original’ rows with those in the ‘Output VS Original’ rows, one can observe that the values of the PSNR and the SSIM have both increased after the U-net processing. The increased PSNR represents an improvement of the signal-to-noise ratio, which means that the noise has been reduced, while the increased SSIM indicates higher similarity between the output images of the U-net and the original ones.
| Comparison | Test sample | MSE | PSNR (dB) | SSIM (∈[0, 1]) |
|---|---|---|---|---|
| Input VS Original | ‘Monkey’ | 996.3 | 18.15 | 0.5613 |
| | ‘Rice’ | 1218.2 | 18.15 | 0.5613 |
| Output VS Original | ‘Monkey’ | 278.5 | 23.68 | 0.8127 |
| | ‘Rice’ | 57.6 | 30.53 | 0.8378 |
Table 3. The Quantified Performance of the U-net Using Complex Test Images
It can also be concluded from Tables 1 and 3 that, as the input image becomes more complex, the improvement in the PSNR decreases. This indicates that it is harder for the U-net to distinguish the features of the image from the speckle noise when the complexity of the image increases.
3.3. Reconstruction of 3D objects
To verify the feasibility of eliminating the defocus noise in OSH, we have also evaluated the reconstruction results of two different 3D objects in this subsection. The performance of the conventional reconstruction method is also compared with that of the proposed one.
The first 3D object used in the simulation contains two slices, as is shown in Fig. 11(a). Each slice has a size of
Figure 11. (a) Object with two slices, and (b) the recorded hologram.
Figures 12(a) and 12(b) show the retrieved sectional images using the conventional algorithm[16], with reconstruction distances of z1 = 9 mm and z2 = 10 mm, respectively; the corresponding results of the proposed U-net method are shown in Figs. 12(c) and 12(d).
Figure 12. (a), (b) Sectional results of the conventional method. (c), (d) Sectional results of the proposed U-net method, with z1 = 9 mm and z2 = 10 mm.
We have also tested the U-net method with a simulated hologram of a rocket. The semi-transparent 3D rocket is shown in Fig. 13. It has been divided into six uniformly separated sections along the axial (depth) direction, whose sectional images are shown in Figs. 14(a)–14(f).
Figure 13. The 3D rocket.
Figure 14. (a)–(f) Sectional images of the 3D rocket. (g)–(l) Reconstructed images with the traditional method. (m)–(r) Reconstructed images with the proposed method.
The reconstructed images obtained with the traditional method are shown in Figs. 14(g)–14(l), while those obtained with the U-net-based method are shown in Figs. 14(m)–14(r). The corresponding reconstruction distances are set to match the axial positions of the six sections.
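As a sketch of how the trained network can be applied to such a multi-section object, the loop below reconstructs each section from the hologram and passes its normalized amplitude through the network; `reconstruct_section` is the earlier Fresnel sketch, and the pixel pitch, wavelength, and list of distances are illustrative assumptions rather than the paper's exact values:

```python
import numpy as np
import torch

# `reconstruct_section` refers to the Fresnel-reconstruction sketch in Section 2.1.
def denoise_sections(hologram, distances, model, dx=10e-6, wavelength=632.8e-9):
    """Reconstruct each section of a multi-plane object and suppress its
    speckle-like defocus noise with the trained U-net sketch."""
    device = next(model.parameters()).device
    model.eval()
    outputs = []
    with torch.no_grad():
        for z in distances:
            amp = np.abs(reconstruct_section(hologram, dx, wavelength, z))
            amp = amp / amp.max()                                  # normalize to [0, 1]
            x = torch.as_tensor(amp, dtype=torch.float32)[None, None].to(device)
            outputs.append(model(x).squeeze().cpu().numpy())       # denoised section
    return outputs

# Example call with six illustrative, uniformly spaced depths:
# sections = denoise_sections(holo, np.linspace(9e-3, 10e-3, 6), trained_model)
```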
The quantified assessment results of each section are also analyzed. The results are shown in Figs. 15 and 16, in which the section number denotes the order of the sections along the depth direction. For every section, the U-net based method yields a higher PSNR and SSIM than the traditional method.
Figure 15. PSNR of the sectioning results.
Figure 16. SSIM of the sectioning results.
In conclusion, the U-net based method can be adapted to 3D objects with multiple sections. It outperforms the traditional method in removing defocus noise as well as in recovering multiple sections. One can also deduce from the tables and figures in Section 3 that the improvement of the PSNR ranges from around 5 dB to 20 dB, which indicates better sectioning results than the conventional method. However, the scenario of an object with complex sectional images still needs further investigation, either by increasing the training data sets or by adjusting the training mode in the deep learning algorithm; these are left as future work.
4. Conclusion
Speckle-like noise generated by the random phase pupil in an OSH system is hard to reduce. This work verifies the feasibility of using the U-net neural network for this task, providing a new way to realize fast and effective defocus noise suppression in OSH. Simulation results show that the proposed method works well with both simple and complex graphics. We believe that the proposed method can also be applied to other digital holography systems, especially in biomedical applications where it is hard to obtain large training data sets.
References
[28] T.-C. Poon. Optical Scanning Holography with MATLAB (2017).
[29] K. Simonyan, A. Zisserman. Very deep convolutional networks for large-scale image recognition (2015).
[30] O. Ronneberger, P. Fischer, T. Brox. U-net: convolutional networks for biomedical image segmentation (2015).
[32] D. P. Kingma, J. Ba. Adam: a method for stochastic optimization (2014).
[33] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res., 15, 1929 (2014).
[34] Y. Fisher. Fractal Image Compression (1995).
