Abstract
This study reviews recent advances in data-driven polarimetric imaging technologies across a wide range of practical applications. The widespread international research activity in polarimetric imaging techniques demonstrates the breadth of their applications and interest. Polarization information is increasingly incorporated into convolutional neural networks (CNN) as a supplemental feature of objects to improve performance in computer vision tasks. Together, polarimetric imaging and deep learning can extract abundant information to address various challenges. Therefore, this article briefly reviews recent developments in data-driven polarimetric imaging, including polarimetric descattering, 3D imaging, reflection removal, target detection, and biomedical imaging. Furthermore, we synthetically analyze the input data, datasets, and loss functions, listing the existing datasets and loss functions with an evaluation of their advantages and disadvantages. We also highlight the significance of data-driven polarimetric imaging for future research and development.
Polarization is a fundamental physical property of light that reflects its nature as a transverse vector wave1, 2. When light interacts with objects or media, it exhibits various polarization characteristics and representations that correspond to the intrinsic characteristics of the material. This unique quality provides an additional dimension of information with applications in various fields, such as polarimetric descattering3-9, 3D shape reconstruction10-13, reflection removal14, 15, target detection16-20, biomedical imaging21-23, pathological diagnosis24-29, remote sensing30-33, and semantic segmentation34-36. Harnessing the features of polarization opens new possibilities for research and technology development.
Conventional polarization methods are limited in accurately capturing and utilizing this information owing to the complicated interactions in the transmission process. In contrast, convolutional neural networks (CNN) excel at nonlinear expression and information extraction based on large datasets, making them better suited for modeling and interpreting polarization information than traditional algorithms and capable of bridging the gap between theory and practice37-41.
To overcome the lack of a physical theoretical model, the use of deep learning technology to optimize the information processing procedure has been proposed. Several experiments have proven its advantages in extracting potential features from images and determining the relationships within information transmission systems based on massive datasets. However, existing data-driven methods commonly use only intensity information, discarding the other dimensions of the light field. Relying on a single type of information exposes imaging performance to various challenges: in target detection tasks, spoofing and camouflaged targets reduce the accuracy rate; in semantic segmentation, water hazards and metallic surfaces are key challenges in road-scene segmentation; in pathological diagnosis, relying only on the color information of medical images increases the risk of misdiagnosis; and for transparent objects, parts of the imaging process, such as 3D reconstruction and segmentation in the intensity field, are challenging to implement. Therefore, the introduction of additional light-field physical information, such as polarization, is gaining increasing attention as a supplemental feature of objects for improved performance in higher-level visual tasks, expanding beyond intensity-only coverage, and polarimetric imaging combined with deep learning will contribute to future research and development. This review summarizes the existing methods of combining polarimetric imaging and deep learning and demonstrates the current gains visually and comprehensively.
The remainder of this study is organized as follows: First, we review recent trends in data-driven polarimetric imaging from four aspects. Second, seven existing research fields are categorized and analyzed with their corresponding algorithms, and we discuss the various algorithmic approaches used in each field. Third, we analyze three critical aspects, the input data, existing datasets, and loss functions, together with their advantages and disadvantages. Next, we discuss the strengths and weaknesses of the practical applications of data-driven polarimetric imaging and possible opportunities to address these challenges. Finally, we provide conclusions, highlighting the importance and potential of data-driven polarimetric imaging in various applications and research areas, which could extend its current uses and provide insights to advance its future development.
Data-driven polarimetric imaging
Short history
Polarimetric imaging is a technique used to capture and analyze the polarization properties of light; the underlying phenomenon was first observed by Sir Isaac Newton and Christiaan Huygens in the 17th century, who noticed that light could be separated into two polarized beams when interacting with calcite crystals. In the 19th century, scientists such as Malus and Fresnel made significant contributions to the understanding of polarization. Since its inception, polarimetric imaging has progressed and, in the 20th century, been applied in three essential fields: polarization microscopy, which allows scientists to study the birefringence properties of materials under polarized light; remote sensing and polarimetry, enhancing the detection and discrimination of objects based on their polarization characteristics; and medical imaging, probing the optical properties of tissue. In recent years, digital cameras and other innovative imaging techniques have become capable of capturing valuable information on surfaces, materials, and biological samples. Recent developments in computer vision have enabled the incorporation of polarization information into various tasks, offering potential advantages over traditional RGB imaging.
Deep learning, a subfield of machine learning, is inspired by the human brain. Artificial neural networks emerged during the 1940s and 1950s. Research on neural networks continued through the 1960s to 1980s, when the perceptron and backpropagation algorithms were developed. However, neural networks faced limitations owing to computational constraints and insufficient data. In the early 2000s, traditional machine learning techniques, such as support vector machines and decision trees, outperformed deep learning in many tasks, leading to a decline in interest in deep learning. After 2010, new network structures and improvements in computational power propelled the advancement of deep learning. Key developments during this decade include the use of CNN for computer vision and recurrent neural networks for natural language processing. With ongoing research, deep learning continues to evolve with more efficient training algorithms, model architectures, and applications across various domains, making it a central part of the field of artificial intelligence.
Since the beginning of the 21st century, the combination of deep learning and polarimetric imaging has become more pronounced. In the mid-2010s, data-driven techniques were applied to polarimetric imaging, and machine learning and deep learning were used for tasks including object recognition in polarimetric images. Subsequently, researchers explored the potential of deep learning for processing and analyzing polarimetric data in more diverse domains because deep learning helps extract valuable information from polarization data. The synergy between deep learning and polarimetric imaging continues to evolve with advancements in models, algorithms, and applications, forming the significant field called “data-driven polarimetric imaging”, a novel approach that combines learning methods with polarimetric imaging. Data-driven polarimetric imaging can revolutionize various fields and represents a promising area of interdisciplinary research and technological innovation. A diagram of this short history is shown in Fig. 1.
Figure 1.Schematic of the short history of data-driven polarimetric imaging.
Data-driven polarimetric imaging is a novel approach aimed at compensating for the defects and difficulties of single-information interpretation models. Recently, the advantages of combining polarimetric imaging and deep learning have been exploited in several fields, and dozens of algorithms based on data-driven polarimetric imaging have been proposed, covering several application areas. This section describes recent trends in data-driven polarimetric imaging from four perspectives. Schematics of the trends in existing data-driven polarimetric imaging are shown in Fig. 2.
Figure 2.Schematics of the trends of existing data-driven polarimetric imaging. The solid black arrows indicate data flow; the dotted black arrows indicate that data may or may not flow. The gradient blue arrows indicate the trends of the corresponding aspects. (a) The utilization of polarimetric information has gradually deepened from raw polarimetric images to preprocessed features. (b) Physical models have become crucial during network training, evolving from end-to-end architectures to physical-model-combined structures. (c) Various physical properties have gradually been fused into the network. (d) The tasks have developed from image processing to semantic tasks.
With the exploration of data-driven polarimetric imaging, the application fields and utilization of polarization information have gradually increased. Regarding the input and use of polarization information, directly captured images from detectors, such as division of focal plane (DoFP) images42-45 or 0°, 45°, 90°, and 135°46-52 polarization images, are the most common inputs. It should be noted that, despite there being four components, the last one is often neglected in practical applications, leading to the predominant use of only the first three; theoretically, the last parameter can be derived from the first three, thereby reducing the difficulty of image acquisition. In addition, polarimetric parameter features are calculated to represent the polarization information more intuitively, such as the degree of polarization (DoP) and angle of polarization (AoP)52-57, [S0, S1, S2]52, 55, 58, 59, Mueller matrix images60-63, and other combinations of these elements. Features based on the physical model of each task are also computed to guide network training, such as the zenith and azimuth angle maps derived from specular and diffuse reflection64, 65 in 3D shape reconstruction tasks.
Furthermore, physical models are crucial during network training, such as polarimetric descattering models, three-dimensional imaging models, the Fresnel equations, Mueller matrix interpretation models, and other traditional polarization models. The preliminary methods are often end-to-end architectures43, 46, 47, 56, 64, 66-70, meaning that the polarization information is input into the network to generate the desired outputs directly; gradually, however, physical models have come to guide46, 47, 49, 71 or be integrated into network training48, 72-74. Furthermore, the physical model and its inverse process can form a self-supervised closed loop to achieve better performance. Data-driven polarimetric imaging also enables the physical interpretability of networks compared with conventional deep network methods. Polarimetric parameters were initially adopted to perform different tasks without exploring the hidden mechanisms; in more in-depth research, the estimation of specific physical parameters based on the nonlinear representation of deep networks36, 58 and the physical interpretation of network layers63 have gradually been studied. This process contributes to future research and development. Beyond the polarimetric information fused into the network, other physical properties of light have also been introduced in network training, such as spectrum172 and phase71, 75. Finally, the application fields have expanded from enhancement- and restoration-based image processing toward semantic tasks. Initially, image processing was the main field of data-driven polarimetric imaging, including descattering imaging46-49, 66, 70, 72, 76, denoising45, 77, 78, demosaicing43, 44, 68, dynamic range enhancement79, reflection removal53, 73, 74, low-light imaging42, 50, and even 3D shape reconstruction64, 65, 67, 71, 80, 81. Subsequently, semantic tasks gradually appeared, such as semantic segmentation56, 57, 69, 82, 83, camouflaged object detection84, classification60, 61, and pathological diagnosis62, 63, 85-87.
Applications of data-driven polarimetric imaging
According to the application field, existing data-driven polarimetric imaging methods can be classified into seven categories: polarimetric descattering, 3D shape reconstruction, reflection removal, restoration and enhancement of polarization information, target detection, biomedical imaging and pathological diagnosis, and semantic segmentation, as shown in Fig. 3.
Figure 3.Applications of data-driven polarimetric imaging.
Restoration and enhancement of accurate polarization information
Accurate polarization information is the foundation of imaging and its applications. In the real world, theoretical constraints and technological limitations lead to distortion of polarization information. Additionally, the polarization parameters, obtained through nonlinear operators, are sensitive to noise. Consequently, the effective restoration and enhancement of polarization information are crucial for subsequent applications. In this section, we analyze the limitations of polarimetry techniques and review existing methods for the restoration and enhancement of accurate polarization information.
Polarimetry techniques
Polarimetry techniques are crucial for obtaining polarization information. Several sub-polarized direction images (0°, 45°, 90°, and 135°, or 0°, 60°, and 120°) must be captured to obtain the polarization characteristics based on the Stokes vector model. However, this acquisition and calculation process, which obtains polarization information indirectly, introduces extra imaging noise that sharply reduces the accuracy of the polarization information. To date, there are four typical methods for measuring polarization images: division of time/rotating elements, division of amplitude88-90, division of aperture91, 92, and division of the focal plane93-95.
The division of time/rotating elements is the most common method and depends on time-sequential operation: polarizers and retarders are rotated, and measurements are made at different positions of the polarimetric elements. However, the time gap between measurements may cause misregistration of polarization images in dynamic scenes or when the camera moves. The division of amplitude can capture multiple images simultaneously; however, its inherent drawback is that the intensity of each image decreases to less than a quarter of that of the original signal, which sharply reduces image contrast and amplifies image noise; moreover, misregistration must still be addressed. The division of aperture captures images simultaneously using four coaxial cameras with different polarization directions; distinctly, it is expensive and has a fixed misregistration. The division of the focal plane enables real-time polarimetric imaging, even in dynamic scenes. However, instantaneous field-of-view errors (i.e., mosaicking and low-resolution problems) affect the calculation of polarization parameters. Table 1 presents a comparison of these polarimetric techniques.
Table 1. Comparison of typical polarimetric imaging techniques. (Table content unavailable.)
After capturing the images using the aforementioned polarimetric techniques, the Stokes vectors are adopted to display the polarization characteristics. The relationship between them can be characterized by the following equation:

$$ I_\varphi = \frac{1}{2}\left( S_0 + S_1\cos 2\varphi + S_2\sin 2\varphi \right), \tag{1} $$

where $I_\varphi$ is the image with the polarization direction $\varphi$. Then, the Stokes vectors can be calculated from the sub-polarization direction images96, 97:

$$ S_0 = I_{0^\circ} + I_{90^\circ},\quad S_1 = I_{0^\circ} - I_{90^\circ},\quad S_2 = I_{45^\circ} - I_{135^\circ},\quad S_3 = I_R - I_L, \tag{2} $$

where S0, S1, S2, S3 are the components of the Stokes vector, and IR and IL are the right- and left-circularly polarized intensities. The parameters [S0, S1, S2] are the most common representations of the linear polarization components, and S3 represents the circular polarization component. The polarization characteristic parameters can then be obtained using Eq. (3):

$$ P = \frac{\sqrt{S_1^2 + S_2^2 + S_3^2}}{S_0},\quad P_L = \frac{\sqrt{S_1^2 + S_2^2}}{S_0},\quad P_C = \frac{\left|S_3\right|}{S_0},\quad \theta = \frac{1}{2}\arctan\frac{S_2}{S_1}, \tag{3} $$

where P, PL, PC are the DoP, degree of linear polarization (DoLP), and degree of circular polarization (DoCP), respectively, and θ is the AoP.
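As an illustrative sketch (not code from the cited works), Eqs. (2) and (3) for the linear components can be implemented in a few lines of NumPy, assuming four registered sub-polarization images; the circular component S3 is omitted because it requires additional retarder measurements:

```python
import numpy as np

def linear_stokes(i0, i45, i90, i135, eps=1e-8):
    """Linear Stokes components, DoLP, and AoP from four sub-polarization
    images (Eqs. (2) and (3)); eps guards the division for dark pixels."""
    s0 = i0 + i90                                  # total intensity
    s1 = i0 - i90
    s2 = i45 - i135
    dolp = np.sqrt(s1**2 + s2**2) / (s0 + eps)     # degree of linear polarization
    aop = 0.5 * np.arctan2(s2, s1)                 # angle of polarization
    return s0, s1, s2, dolp, aop

# A uniform toy scene: DoLP = 0.6, AoP = 0
i0, i45, i90, i135 = (np.full((2, 2), v) for v in (0.8, 0.5, 0.2, 0.5))
s0, s1, s2, dolp, aop = linear_stokes(i0, i45, i90, i135)
print(dolp[0, 0], aop[0, 0])
```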
Furthermore, the Mueller matrix is another common but comprehensive parameter that describes the modulation of light after interaction with a material or medium98-100. The Mueller matrix contains 16 elements arranged in a 4×4 matrix. The Stokes vector of the output light Sout can be expressed via the Mueller matrix after the incident light Sin propagates through the medium as follows:

$$ S_{\mathrm{out}} = \mathbf{M}\, S_{\mathrm{in}} = \begin{pmatrix} m_{00} & m_{01} & m_{02} & m_{03}\\ m_{10} & m_{11} & m_{12} & m_{13}\\ m_{20} & m_{21} & m_{22} & m_{23}\\ m_{30} & m_{31} & m_{32} & m_{33} \end{pmatrix} S_{\mathrm{in}}, \tag{4} $$
where m00 represents the transformation of intensity and the other 15 elements encode the vectorial properties of the object. Furthermore, Mueller matrix polar decomposition (MMPD)101, Mueller matrix transformation (MMT)102, Mueller matrix anisotropy coefficients (MMAC)103, and other decompositions of the Mueller matrix104-106 have been proposed to quantitatively characterize the properties of an object.
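As a numerical illustration of Eq. (4) (an example written for this review, using the textbook Mueller matrix of an ideal linear polarizer as the medium):

```python
import numpy as np

def linear_polarizer_mueller(theta):
    """Mueller matrix of an ideal linear polarizer at angle theta (a standard
    textbook form, used here only to illustrate S_out = M @ S_in)."""
    c, s = np.cos(2 * theta), np.sin(2 * theta)
    return 0.5 * np.array([
        [1,    c,    s,    0],
        [c, c**2,  c*s,    0],
        [s,  c*s, s**2,    0],
        [0,    0,    0,    0],
    ])

s_in = np.array([1.0, 0.0, 0.0, 0.0])               # unpolarized light
s_out = linear_polarizer_mueller(np.pi / 4) @ s_in
print(s_out)   # [0.5, 0, 0.5, 0]: half intensity, fully 45-degree polarized
```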
The estimation of these parameters, which represent different polarization properties, is always based on nonlinear operations, which unavoidably introduce errors and noise. To achieve the desired performance, the calculated polarization parameters must be refined or restored to obtain highly accurate information. In addition, obtaining precise polarization information can be challenging when operating in special imaging environments, such as low light and noise, or with special imaging devices. In these scenarios, the disturbed polarization parameters amplify errors and negatively affect subsequent imaging performance.
In summary, the acquisition device for raw sub-polarization images, the use of nonlinear operators to calculate polarization characteristic parameters, and special imaging environments all decrease the accuracy of the polarization information. Therefore, polarization information with high accuracy must be restored and enhanced. In this section, we examine photon starvation, mosaicking, and noise to demonstrate the restoration and enhancement of polarization information, as depicted in Fig. 4.
Figure 4.The considered tasks of restoration and enhancement of polarization information, including low-light imaging, Mueller matrix denoising, high dynamic range (HDR) reconstruction, polarimetric parameter denoising, demosaicing, transformation between holographic amplitude & phase and polarization channels, and transformation between Stokes vectors and MMPD images.
Restoration and enhancement of polarization information methods
In photostarved environments, imaging always suffers from a low signal-to-noise ratio, which degrades the imaging quality and makes low-light imaging challenging. In polarimetric imaging, the accuracy of polarization information is likewise degraded. Existing methods consider denoising, correction of color bias, and exposure time in intensity imaging107, 108. However, the difference between polarimetric and conventional intensity imaging has not been fully considered. Leveraging the power of data-driven methods, Hu et al. presented a one-to-three (intensity, DoLP, and AoP) hybrid network called IPLNet to simultaneously enhance the image quality of intensity and polarization information42, as shown in Fig. 5(a). The enhanced RGB image generated by the chromatic RGB subnetwork is divided into three channels, and each channel is fed into the polarization subnetwork to predict the polarization information. Enhanced results and corresponding comparisons with mainstream methods are shown in Fig. 5(b), with the corresponding structural similarity index (SSIM) value of each image. However, this network generates many parameters, which sharply reduces operational efficiency, and the image color is also inaccurate. Therefore, Xu et al. first performed initial denoising and color deviation correction of the four polarization orientation images using a network named ColorPolarNet and then used a polarization difference network to enhance the intensity details and the DoLP and AoP maps50. The results demonstrate that the proposed method has a faster processing speed and better performance regarding signal fidelity, contrast enhancement, and color reproduction50, 109. Compared with IPLNet, ColorPolarNet demonstrates slightly superior performance, achieving higher peak signal-to-noise ratio (PSNR), SSIM, and patch-based contrast quality index (PCQI) in terms of S0, DoLP, and AoP. Additionally, ColorPolarNet achieves a notably lower color difference (CD), indicating reduced distortion compared with IPLNet. In processing speed, ColorPolarNet (2.88 s) is more than twice as fast as IPLNet (6.10 s).
Figure 5.Enhancement method for polarization information. (a) Architecture of Hu et al.'s method. (b) Enhanced results and corresponding comparisons with mainstream methods. Figure reproduced with permission from ref.42, Optical Society of America Publishing AG.
The division of focal plane polarimeter is one of the most common polarimetric imaging sensors and can instantaneously capture dynamic polarization information. Each pixel in a 2×2 superpixel has a different polarization orientation and records only one of the four essential intensity measurements; therefore, demosaicing and reconstruction of full-resolution, accurate polarization information are indispensable. Zhang et al. proposed a convolutional demosaicing network called PDCNN to learn an end-to-end mapping between coarse interpolation results and full-resolution polarization images68, the first typical demosaicing architecture, as shown in Fig. 6(a). The bicubic-interpolated results are used as the input, which introduces an interpolation bias, resulting in inaccurate reconstructed results. In comparisons with several mainstream methods, the reconstruction results for DoLP and AoP outperformed those of the other methods, as shown in Fig. 6(b) and 6(c) with the corresponding PSNR value of each image110-113. Zeng et al. proposed a four-layer end-to-end fully convolutional neural network that directly learns the mapping from DoFP images to three polarization properties: intensity, DoLP, and AoP43. However, the noise in the AoP images remained significant. Wu et al. provided a more physically relevant loss function for angle of linear polarization (AoLP) reconstruction, establishing a two-stage lightweight approach for reconstructing the intensity and polarization information in real time44; the improved version meets the demand for real-time inference. Wen et al.114, Sargent et al.115, Sun et al.116, and Pistellato et al.117 also proposed data-driven demosaicing methods to ensure the fidelity of polarization signatures and enhance image resolution. Besides deep learning-based methods, other approaches also yield favorable results in this domain, such as the sparse tensor factorization-based model, which introduced the combination of tensor factorization and sparse coding for the first time118.
Figure 6.The convolutional demosaicing network proposed by Zhang et al. and its results68. (a) Architecture of Zhang et al.'s method. (b) Reconstructed DoLP images of different methods on the test images. (b1) Results of the method in ref.113. (b2) Results of the method in ref.110. (b3) Results of the method in ref.111. (b4) Results of the method in ref.112. (b5) and (b6) are PDCNN and the ground truth, respectively. (c) Reconstructed AoP images of different methods on the test images. (c5) and (c6) are PDCNN and the ground truth, respectively. Figure reproduced with permission from ref.68, Optical Society of America Publishing AG.
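Before demosaicing, the raw DoFP frame must be split into its four sub-sampled channels. The sketch below is illustrative only and assumes a hypothetical [[0°, 45°], [135°, 90°]] superpixel layout; actual layouts differ between sensors, so the offsets must be adjusted for a specific camera:

```python
import numpy as np

def split_dofp_mosaic(raw):
    """Split a DoFP raw frame into four quarter-resolution sub-images.
    Assumes a [[0, 45], [135, 90]] degree 2x2 superpixel layout (hypothetical;
    check the sensor datasheet and adjust the row/column offsets)."""
    return {
        "i0":   raw[0::2, 0::2],   # top-left pixel of each superpixel
        "i45":  raw[0::2, 1::2],   # top-right
        "i135": raw[1::2, 0::2],   # bottom-left
        "i90":  raw[1::2, 1::2],   # bottom-right
    }

# Each sub-image has half the resolution per axis; demosaicing networks such
# as PDCNN then recover full-resolution polarization information.
subs = split_dofp_mosaic(np.zeros((1024, 1224)))
print({k: v.shape for k, v in subs.items()})   # all (512, 612)
```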
Because polarimetric parameters are always derived from the measured intensities through nonlinear operators, which amplify the noise (particularly for the AoP), removing noise to precisely restore polarization information is a significant task107, 119, 120. CNN have distinct advantages in extracting image features and hidden structures; thus, they are suitable for image restoration and enhancement in noisy environments. Li et al. employed deep neural networks to significantly suppress the noise in polarimetric images and enhance image quality45. However, all channel-wise features are treated equally in their network, resulting in a lack of flexibility. Inspired by the attention mechanism, Liu et al. proposed an attention-based residual neural network to remove noise and restore the polarization information of polarimetric images77, as shown in Fig. 7(a). The proposed method suppresses noise more effectively and restores polarization information more accurately, as shown in Fig. 7(b)109; SSIM is used to compare the quality of the images obtained by different methods. Focusing on the denoising of Mueller matrix images, Yang et al. built a deep residual U-Net incorporating channel attention, trained with many paired low- and high-SNR Mueller matrix images78. The ground truth is obtained based on a low equally weighted variance (EWV), which can be expressed as:

$$ \mathrm{EWV} = \sigma^2\, \mathrm{trace}\!\left[ \left( W^{\mathrm T} W \right)^{-1} \right] \propto \frac{\sigma^2}{N}, $$
Figure 7.Liu et al.'s method. (a) Architecture of the proposed method. (b) Enhanced results and corresponding comparisons with mainstream methods. Figure reproduced with permission from ref.77, Optical Society of America Publishing AG.
where σ² is the variance of the Gaussian noise, W is the polarimetric measurement matrix, and N denotes the number of states of polarization. The larger the value of N, the higher the signal-to-noise ratio. The proposed method can effectively resolve the conflict between measurement accuracy and acquisition time.
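For intuition, the EWV of a linear Stokes polarimeter with N evenly spaced analyzer angles can be computed directly from the measurement matrix. The sketch below is an illustrative construction (not the cited authors' code) showing the 1/N scaling that motivates trading acquisition time against measurement accuracy:

```python
import numpy as np

def ewv(measure_matrix, sigma2=1.0):
    """Equally weighted variance: the trace of the covariance of the
    pseudo-inverse estimator under additive Gaussian noise of variance sigma2."""
    w = np.asarray(measure_matrix)
    return sigma2 * np.trace(np.linalg.inv(w.T @ w))

def linear_stokes_matrix(n):
    """Measurement matrix of N evenly spaced linear analyzers for [S0, S1, S2];
    each row is 0.5 * (1, cos(2a), sin(2a))."""
    ang = np.arange(n) * np.pi / n
    return 0.5 * np.column_stack([np.ones(n), np.cos(2 * ang), np.sin(2 * ang)])

for n in (4, 8, 16):
    print(n, ewv(linear_stokes_matrix(n)))   # 5.0, 2.5, 1.25: EWV shrinks as 1/N
```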
Fused images contain more information than single physical properties because the mixed information describes various characteristics. The fusion of polarization and intensity is the most common approach in practical applications121-125. The intensity image describes the reflectivity and transmissivity of the object, whereas the polarization image describes texture details, material properties, shape, shading, and roughness. These two types of images provide complementary information that yields images with rich physical features and improves the performance of practical tasks. Conventional fusion methods struggle to handle various scenes because they rely on manually designed fusion factors. Building on the excellent performance of CNN, Zhang et al. proposed an unsupervised deep network called PFNet to fuse intensity and DoLP images126. The feature extraction module transforms the S0 and DoLP images into high-dimensional nonlinear feature maps using two Dense Blocks, and a concatenation operator fuses the feature maps before the reconstruction module reconstructs the fused image. The architecture of PFNet is shown in Fig. 8(a). The deep learning-based method outperforms other state-of-the-art methods, as shown in Fig. 8(b)122-125; SSIM is used to compare the quality of the images obtained by different methods. Later, they modified the architecture to enhance performance: a Dense Block is used to encode the input images, and a fusion subnetwork, rather than a concatenation operator, fuses the feature maps127. New loss function strategies are adopted, such as a loss between the fused and input images and a loss between the fused and encoded features. The proposed architecture can also be used for infrared and visible image fusion and multi-focus image fusion.
Figure 8.The unsupervised deep network PFNet proposed by Zhang et al.126 (a) The architecture of PFNet. (b) Fused results compared with conventional methods. Figure reproduced with permission from ref.126, Optical Society of America Publishing AG.
Because of the limitations of the camera response function, a digital camera captures only a limited fraction of the dynamic range, resulting in low-dynamic-range images with over- or underexposed areas that cannot reflect real-world scenes as high-dynamic-range images do128, 129. Ting et al. studied the relationship between the polarization parameters and the exposure time of a pixel in a polarization image and trained a reconstruction framework to recover a high-dynamic-range image from polarization images79.
Other applications of restoration and enhancement are based on data-driven polarimetric imaging. The polarization parameters and other physical properties are interconvertible. Liu et al. used a deep neural network to transform holographic images reconstructed from a single state of polarization into images equivalent to those captured using a single-shot computational polarized light microscope75. Si et al. fed Stokes images to a well-designed deep learning network to generate Mueller matrix-based parameter images, such as linear retardance and diattenuation parameters58.
Polarimetric descattering
Clear vision in scattering media is critical for various applications, such as industrial and civil fields130, traffic surveillance systems131, autonomous driving132, remote sensing133, rescue operations134, seabed mapping135, monitoring of marine species migration and coral reefs, and scene analysis136. However, when capturing images in a scattering environment, the visibility of objects is typically sharply degraded owing to scattering by suspended particles, such as clouds, water droplets, haze, smoke, smog, fog, mist, and soil particles in the air, and the floating excrement of marine animals, algae, and mineral salts in underwater scenes. The backscattered light mixes with the object signal during propagation towards the camera. Because the physical properties of imaging in haze and underwater environments are similar, we analyze the dehazing and descattering processes based on deep learning and polarimetric imaging together.
Polarimetric imaging model in scattering media
Based on the atmospheric transport model137, 138, the image I captured by the camera after propagation in scattering media consists of two components: (1) the direct transmission signal D, which represents the object-reflected light attenuated by the medium during propagation towards the camera; and (2) the backscattered light A, which denotes the light backscattered by particles along the line of sight and carries no object signal139. The model can be expressed as:

$$ I(x, y) = D(x, y) + A(x, y). \tag{5} $$
First, as the light reflected from the object propagates towards the camera through the medium, the object radiance suffers from absorption and scattering, yielding a degraded signal. This process is described as follows:

$$ D(x, y) = L_{\mathrm{object}}(x, y)\, e^{-\beta z}, \tag{6} $$

where z is the distance between the object and the camera, β is the attenuation coefficient, and Lobject is the original object signal unattenuated by the scattering media along the line of sight. The transmission term e−βz is an exponential decay, which is also expressed as t.
Second, backscattered light A is an undesired component that veils the object light and reduces image contrast. The backscattered light can be expressed as:

$$ A(x, y) = A_\infty\left( 1 - e^{-\beta z} \right), \tag{7} $$

where A∞ is the saturated backscattered light at infinite distance. We aim to reconstruct the original object signal Lobject by combining Eqs. (6) and (7). Noting that e−βz = 1 − A/A∞, Lobject can be expressed as:

$$ L_{\mathrm{object}} = \frac{I - A}{1 - A/A_\infty}. \tag{8} $$
As a result, estimating the two unknown components A and A∞ is the key to reconstructing Lobject. Schechner et al. proposed a polarization descattering model based on the atmospheric scattering model. The method obtains two orthogonally polarized images, I∥ and I⊥, through two orthogonal polarization states of the polarizer139. It assumes that the air light is usually partially polarized, whereas the object light is not polarized. Thus, the captured images can be described by:

$$ I^{\parallel} = \frac{D}{2} + \frac{1 + P_A}{2}A, \qquad I^{\perp} = \frac{D}{2} + \frac{1 - P_A}{2}A. \tag{9} $$
Because the object light is not polarized, the DoP of the image background equals that of the backscattered light. Thus, the backscattered light is estimated using the polarization model:

$$ A = \frac{I^{\parallel} - I^{\perp}}{P_A} = \frac{P}{P_A}\, I, \tag{10} $$

where PA and P are the DoPs of the backscattered light and the captured image I = I∥ + I⊥, respectively. The saturated backscattered light A∞ is the mean of a background region of the image containing no object, and PA, the DoP of the total backscattered light, is calculated over the same region. Finally, the reconstructed Lobject can be expressed as:

$$ L_{\mathrm{object}} = \frac{I - A}{1 - A/A_\infty}. \tag{11} $$
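A minimal sketch of this estimation pipeline, Eqs. (9)-(11), assuming a user-supplied object-free background mask and an unpolarized object signal (an illustrative implementation, not the cited authors' code), is:

```python
import numpy as np

def polarization_descatter(i_par, i_perp, bg_mask, eps=1e-6):
    """Schechner-style polarimetric descattering from two orthogonally
    polarized images; bg_mask is a boolean array selecting an object-free
    region used to estimate A_inf and P_A."""
    i_total = i_par + i_perp
    p = (i_par - i_perp) / (i_total + eps)      # DoP of the captured image
    p_a = p[bg_mask].mean()                     # DoP of the backscattered light
    a_inf = i_total[bg_mask].mean()             # saturated backscatter
    a = (i_par - i_perp) / (p_a + eps)          # backscattered light, Eq. (10)
    t = np.clip(1.0 - a / a_inf, eps, 1.0)      # transmission map
    return (i_total - a) / t                    # object radiance, Eq. (11)
```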
A polarimetric imaging model is a physical, low-cost, and effective method for restoring clear images in a scattering environment. However, the method is based on the following assumptions:
1) The backscattered light at infinity is assumed uniform; however, in the real world, clouds, the solar radiation angle, and other factors may influence its distribution.
2) The object light is assumed to be unpolarized, which does not hold for many real surfaces.
3) The DoP of the backscattered light is assumed constant, whereas in reality it is spatially variant.
4) The polarization direction of the image is assumed equal to that of the backscattered light and the object signal, but differences are possible.
5) The attenuation coefficients of the backscattered light and the object signal are assumed similar; however, Akkaynak et al. proved that the attenuation coefficient of the object signal depends on the distance z, the reflectance ρ, the spectrum of the ambient light E, the spectral response of the camera Sc, and the beam attenuation coefficient of the water body βb, whereas the backscattered light is related to E, Sc, βb, and the scattering coefficient of the water body b84, 140, 141.
Several improved methods focusing on these insufficiencies have been proposed. For example, Huang et al. estimated the polarization difference image (PDI) of the object signal using feasible region fitting to overcome the limitation of the second assumption. Hu et al. estimated the spatial distributions of the DoP of the object and backscattered light by extrapolation fitting to overcome the first and third assumptions; however, selecting the fitting function when the light field was irregular remained challenging142. Wei et al. considered the difference between the AoP of the backscattered light and the object signal, using independent component analysis (ICA) to estimate the object signal with nonuniform polarization characteristics and avoid the limitations of assumptions 3) and 4)143. These physical-model-based methods lack robustness and effectiveness in complex scenes because accurate estimation of the key parameters may not be achievable, and the latent factors influencing image quality remain unexplored.
Deep learning methods based on CNN are adept at extracting hidden features and fitting the nonlinear relationships between backscattered light and object signals. Thus, they are a promising choice for descattering and dehazing. Polarimetric data-driven descattering methods combine polarization information and deep learning, and the existing physical model can guide the training of the descattering network to combine the advantages of both. Three different pipelines are used in data-driven polarization descattering methods. The first is the end-to-end architecture without a physical model, which typically feeds polarization images directly into the network. The second is the physical-model-guided network, in which an existing or proposed model guides the design but does not participate in network training. The third is the physical-model-integrated network, which integrates the physical model into the training of the descattering network. The following sections explain these three perspectives in detail.
End-to-end descattering network
The end-to-end architecture is a common structure that enhances image quality by relying on higher-order nonlinear representations. The descattering process of such a network amounts to fitting the descattering transmission with a high-order function, which has been successfully applied to scattering removal using intensity information136, 144-146. In recent studies, the introduction of polarization information has proven that feeding polarization images into the network can improve both qualitative and quantitative evaluations. This section discusses polarization end-to-end descattering networks.
In 2020, Hu et al. first employed a deep learning technique in polarimetric underwater imaging46, as shown in Fig. 9(a), which is a typical end-to-end descattering architecture. Three-channel inputs with linear polarization orientations of 0°, 45°, and 90° are fed into an end-to-end polarimetric dense network (PDN) rather than intensity images. The network contains three main components: shallow feature extraction (SFE), used to extract shallow polarization features; residual dense blocks (RDB), the basic interconnected structure; and dense feature fusion (DFF), which fuses all the features and outputs the descattering results. A dataset containing abundant polarization image pairs was built using a commercial DoFP camera and a water tank filled with milk to capture turbid and clear object signals. The same network structure, trained only on intensity images, was used to verify the significance of the polarization information. Compared with the intensity network and existing methods, the polarization network achieved higher values of image contrast (IC), measure of enhancement (EME), PSNR, and SSIM, indicating higher image quality6, 147, as shown in Fig. 9(b).
Figure 9.The flow chart and results of Hu et al.'s method, which is a typical end-to-end descattering architecture46. (a) Architecture of the proposed method. (b) Comparison of the images enhanced by different methods. Figure reproduced with permission from ref.46, Elsevier BV.
Another end-to-end architecture was proposed by Zhang et al., which contains four pairs of subnetworks consisting of polarized and gray versions. This method treats the intensity and polarization images as two information streams that flow through their respective networks and join at the end of the model. Moreover, adding the polarization information to the gray information and feeding both into fusion blocks, called DENSE-U-NET BLOCKs, at the forefront of the network achieved better results than fusion at other positions in the network70. Experiments at different turbidity levels outperformed other methods and demonstrated the excellent robustness of the proposed method.
In remote sensing, the scattering medium significantly impacts the results of object reconstruction, even producing speckle patterns. Obtaining the correspondence between the original object and the imaging process is both challenging and crucial. Li et al. combined an object's polarization information with a modified U-net-based deep learning network (MU-DLN) to retrieve the information of an original object degraded by the scattering medium66. Data were acquired using a Monte Carlo simulation system, and deep learning was used to learn a physical model of the scattering process. The experimental results show that the object's information in the Q-component can be reconstructed very well owing to the suppression of scattered light and the highlighting of ballistic light. Several fixed optical thickness environments were tested to demonstrate the superiority of the trained MU-DLN.
Physical-model-guided descattering network
The physical-model-guided descattering network is trained to remove the scattering effect under the guidance of a physical model that does not itself participate in network training. Based on the physical model, theoretical feasibility is proven before guiding the design of the network architecture. This intermediate stage of combining physical priors and models is instructive for further fusing the physical model into the design of the network pipeline.
Guided by a physical imaging model, Ren et al. trained a lightweight dehazing CNN to rapidly process turbid images, comparing it with conventional dehazing methods and introducing additional circular polarization information148; this was the first time circular polarization information was fed into a network. Directly generating the two unknown parameters of the polarimetric descattering model results in an underdetermined problem. Therefore, the proposed method combines the two parameters into a single formula to avoid underdetermination and minimize the reconstruction error. The new parameter K is expressed as149:

$$ K(x) = \frac{\dfrac{1}{t(x)}\left( I(x) - A_\infty \right) + \left( A_\infty - b \right)}{I(x) - 1}, \tag{12} $$

where b is a constant bias. Eq. (12) allows the imaging model to be rewritten as Lobject(x) = K(x)I(x) − K(x) + b, after which the descattering process is viable once the parameter K is obtained. Tests were conducted in different turbid environments to verify the feasibility of the proposed method, and the results indicated the effectiveness and high efficiency of the lightweight architecture.
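A brief sketch of this K-based recovery (following the AOD-Net-style unified parameter cited above; the function names and the supervision helper below are illustrative, not the authors' code) is:

```python
import numpy as np

def descatter_with_k(i, k, b=1.0):
    """Recover scene radiance from the unified parameter K via
    L_object(x) = K(x) * I(x) - K(x) + b; K is predicted by the network."""
    return k * i - k + b

def k_from_parameters(i, t, a_inf, b=1.0, eps=1e-6):
    """Illustrative ground-truth K of Eq. (12), assuming the transmission
    map t and the saturated backscattered light A_inf are known."""
    return ((i - a_inf) / np.clip(t, eps, None) + (a_inf - b)) / (i - 1.0 + eps)
```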
Subsequently, Ding et al. adopted a multi-polarization fusion generative adversarial network to enhance turbid images47. Compared with the conventional model, the proposed method introduces the angle of polarization of the backscattered light, computed from a selected background region, into the calculation of the backscattered light. They built the first color polarization image dataset in a natural underwater environment, selecting as references the visually best enhanced results among those produced by several conventional methods150-154. Compared with the four supervised data-driven polarimetric methods mentioned above, the experimental results in laboratory scenes simulated with milk show a large improvement; however, the natural-scene results improve to a lesser extent because the complicated environment increases the diversity of known and hidden parameters, making the networks more generalized, with higher robustness but lower performance on specific examples.
Physical-model-integrated descattering network
The physical-model-integrated descattering network integrates the descattering model into the network as the backbone to guide the descattering process, which introduces constraints compared to the physical-model-guided descattering network. Therefore, the main task of the network is to generate or refine specific parameters before they are utilized to generate improved results. Furthermore, the physical model and its inverse process can form a self-supervised closed loop to achieve improved performance.
To further combine the physical formation model with deep learning methods, some researchers have embedded existing dehazing approaches into their proposed pipelines. Zhou et al. proposed a robust polarization-based dehazing architecture with a generalized physical formation model that requires no specific clues for estimating the required physical parameters and no handcrafted priors48. Figure 10(a1) and 10(a2) show the network architecture and a corresponding example with evaluation indices (PSNR and multi-scale SSIM). The transmitted light D (T in Fig. 10(a1)) and the original scene radiance Lobject (R in Fig. 10(a2)) can be calculated using the following equations:

$$ T = \frac{P - P_A}{P_T - P_A}\, I, \qquad R = \frac{T}{1 - \left( I - T \right)/A_\infty}, $$
Figure 10.Methods combining the physical formation model with deep learning and corresponding examples. (a1) Zhou et al.'s method and (a2) the corresponding example. (b1) Shi et al.'s method and (b2) the corresponding examples.
where PT and PA define the DoPs of the transmitted light and backscattered light, respectively, which are estimated by the subnetworks. The symbols I and P denote the hazy image and its DoP, respectively, which can be calculated from the polarized images. The DoPs of the object signal and scattered light are generated by subnetwork g1 before the transmitted light is estimated using the imaging model. The transmitted light refined by subnetwork g2 is used to calculate the original scene radiance, in which the saturated backscattered light is obtained by subnetwork g3. Finally, the refinement subnetwork g4 is adopted to generate the refined results. The raw polarization direction images are fed into each subnetwork.
The generation of the synthetic dataset is instructive. Clear images with depth and semantic segmentation maps must be provided for the generation process. The Foggy Cityscapes-DBF dataset was eligible, and reasonable values of the corresponding parameters were set to generate the synthetic dataset155-157. Gaussian noise was introduced to make the parameters spatially variant and conform to real-world scattering conditions, improving the robustness of the network158, 159.
In contrast, Shi et al. proposed a polarization-based self-supervised dehazing network called PSDNet to eliminate the influence of haze on images72. Figure 10(b1) and 10(b2) show the proposed network architecture and the corresponding results with assessment criteria (IC and the entropy-based no-reference image quality assessment (ENIQA)). The network consists of three subnetworks that compute the object radiance, transmitted light, and backscattered light. The pipeline forms a self-supervised closed loop to optimize the network. The end-to-end descattering network is part of the total pipeline, which effectively reduces the scale of the network and enhances performance. Additionally, an accurate transmission map is produced as a by-product, which may be helpful for other computer vision tasks, such as 3D reconstruction. Several experiments demonstrated that the proposed architecture effectively improves the visibility of object details and is highly robust across scenes.
Because it is hard to capture ground truth corresponding to the object in an underwater environment, unsupervised polarimetric underwater methods have been proposed. Zhu et al. synergistically used an untrained network and a polarimetric imaging formation model to recover images from scattering in underwater scenarios without requiring additional datasets49. There are two stages during network training. First, the raw input images are fed into the network to generate an optimized image. Then, the imaging formation model is used to recalculate the degraded image. After this circular process, the loss between the raw image and the recalculated image is adopted to optimize the network:

$$ \min_{w}\ \mathcal{L}\!\left( f_{\Theta}\!\left( G_{w}\!\left( I_{\mathrm{raw}} \right) \right),\ I_{\mathrm{raw}} \right), $$
where w denotes the network weights, Gw is the untrained network, fΘ is the image formation model, and Θ = {α, η, ps°, S∞} are its parameters. These parameters are estimated using the neural network, with initial values selected by traditional methods. Specifically, α, a compensation factor for water absorption, is set to 0.99 because the subject was placed at a comparably shallow position in pure water, where the influence on the absorption of natural light can be ignored. η is a bias factor and ps° is the degree of polarization of the scattered light, where η ranges from 1 to 1/ps°; the measured value of ps° is 0.8333, and η = 1.13 is chosen to satisfy the range 1 ≤ η ≤ 1/ps°. S∞ is the scattered light radiance at an infinite distance, estimated from the intensities of the brightest object-free region.
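The closed loop can be summarized by the following PyTorch-style sketch (the network `net` and differentiable formation model `formation_model` are placeholders written for this review, not the authors' implementation):

```python
import torch

def self_supervised_step(net, formation_model, i_raw, theta, optimizer):
    """One step of the self-supervised closed loop (illustrative sketch).
    `net` restores the image; `formation_model` re-degrades it with the
    physical parameters theta = {alpha, eta, p_s, S_inf}; the loss compares
    the re-degraded image against the raw input, so no ground truth is needed."""
    optimizer.zero_grad()
    restored = net(i_raw)                           # candidate clear image
    re_degraded = formation_model(restored, theta)  # forward imaging model
    loss = torch.nn.functional.l1_loss(re_degraded, i_raw)
    loss.backward()
    optimizer.step()
    return loss.item()
```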
The proposed method not only avoids the conventional acquisition of the polarization characteristics of the environment and object but also minimizes the dependency on datasets, training on even a single image. Moreover, the mismatch between the model and a real scene lacking environmental priors is significantly reduced. Figure 11(a) and 11(b)49 show the proposed architecture and a visual comparison among the different descattering methods147, 160-163, together with the statistical index of image contrast. The method represents a pioneering attempt in the realm of unsupervised descattering imaging; however, its capacity for enhancing imaging outcomes remains somewhat limited.
Figure 11.Unsupervised underwater descattering method. (a) Architecture of Zhu et al.'s method49 and (b) visual comparison among different descattering methods. Figure reproduced with permission from ref.49, Optical Society of America Publishing AG.
Yang et al. trained a network to inpaint backscattered light with different polarization orientations, which is used to calculate the DoP and AoP of the background light76; this is another unsupervised method that does not require clear ground truth. The primary task of the proposed method is to estimate the complete backscattered light. After the object region is removed from the captured polarized images using the GrabCut algorithm, the incomplete image, with a region randomly erased, is input to the network to generate the missing region; a filtering method then compares the gray value of each pixel with those of nearby pixels and replaces singular points with the averaged value in a 7×7 square. A clear image is then calculated using a modified recovery function in which a bias factor ε is introduced. Consequently, the object radiance is optimized and recovered based on the underwater image recovery process. The proposed method has a much lower cost for preparing training datasets and demonstrates the capability of recovering underwater images under different nonuniform optical fields. The flow chart and proposed architecture of the Yang et al. method76, the corresponding results, and comparisons with existing methods142, 147, 164, 165 are shown in Fig. 12.
Figure 12.Unsupervised underwater descattering method and corresponding results. For the equations and variables, see ref.76. (a) The flow chart. (b) Proposed architecture of Yang et al.'s method. (c) Results of the generated backscattered light: (c1) the captured polarized image; (c2) the backscattered image generated by the proposed trained network; (c3) the backscattered image after smoothing filtering; (c4) the corresponding ground truth. (d) Final descattering results and comparisons with other methods. Figure reproduced with permission from ref.76, Optical Society of America Publishing AG.
Data-driven polarization descattering methods have gradually incorporated physical models to guide network training, resolving the limitations of traditional methods. Data-driven methods learn more comprehensive features and adapt to more complex media. Furthermore, integrating the physical model improves training performance and enables a self-supervised closed loop to optimize the network. Additionally, the physical model in the network may generate extra parameters that are probably helpful for other tasks, such as depth maps, backgrounds, and inherent coefficients of the media. The acquisition of datasets is key for network training in scattering media; synthetic and generated data may solve this problem, although it remains crucial and challenging for the future.
Three-dimensional shape reconstruction
By analyzing the interactions between light and surface geometry, we can reconstruct the 3D shapes of objects166, 167, in which polarization plays a crucial role. Natural illumination becomes partially polarized after reflection from an object's surface. The polarized reflection implies shape information because the Fresnel equations relate the DoP and AoP to the zenith and azimuth angles of the micro-surface. An intrinsic drawback of deriving shape from polarization is the ambiguous estimation of surface orientation: the arctangent function in the model results in a multivalued azimuth, commonly known as azimuth ambiguity. Cues from various aspects, such as geometry168-170, spectrum171, 172, and photometry173-176, have been introduced to resolve this ambiguity. However, relying only on a physics-based imaging model, recovering shape with high accuracy remains challenging under nonlaboratory conditions. The excellent nonlinear representation ability of deep neural networks can narrow the gap between ideal and real-world conditions. This section reviews existing shape-from-polarization methods combined with deep learning (DL).
Principles of polarization 3D shape reconstruction
The surface shape changes the polarization state of the incident illumination, making it possible to recover shape from polarization. In polarization detection, polarization information can be obtained using a camera with a rotating linear polarizer mounted in front of it or a camera with a pixelated polarizer. The captured image intensity varies sinusoidally with the polarizer angle:

$$ I\left( \phi_{\mathrm{pol}} \right) = \frac{I_{\max} + I_{\min}}{2} + \frac{I_{\max} - I_{\min}}{2}\cos\!\left( 2\left( \phi_{\mathrm{pol}} - \phi \right) \right), $$

where ϕpol denotes the angle of the polarizer axis relative to a chosen reference orientation, ϕ denotes the azimuth angle of the micro-surface of the object, and Imax and Imin refer to the maximum and minimum intensities observed as ϕpol varies from 0 to π. Because the sinusoid depends on 2(ϕpol − ϕ), the azimuth angles ϕ and ϕ + π produce identical intensities, resulting in a π ambiguity. For instance, if ϕ = ϕpol, the maximum intensity is obtained, whereas the minimum intensity corresponds to two azimuth angles, i.e., ϕ ± π/2. This π-ambiguity problem is one of the key challenges of shape from polarization.
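The π ambiguity can be verified directly from the sinusoidal model; in the sketch below (illustrative only), azimuth angles ϕ and ϕ + π generate identical measurements at every polarizer angle:

```python
import numpy as np

def intensity_at_polarizer(i_max, i_min, phi, phi_pol):
    """Sinusoidal intensity model: the dependence on 2*(phi_pol - phi) makes
    azimuth angles phi and phi + pi indistinguishable (the pi ambiguity)."""
    return 0.5 * (i_max + i_min) + 0.5 * (i_max - i_min) * np.cos(2 * (phi_pol - phi))

phi = np.pi / 3
phi_pols = np.array([0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4])
samples = intensity_at_polarizer(1.0, 0.2, phi, phi_pols)
flipped = intensity_at_polarizer(1.0, 0.2, phi + np.pi, phi_pols)
print(np.allclose(samples, flipped))   # True: both azimuths fit the data
```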
The polarization state of the reflected light directly depends on the reflection type occurring over the surface, which is primarily specular or diffuse reflection, as shown in Fig. 13.
Figure 13.Polarization of specular reflection and diffuse reflection. (a) Specular reflection. (b) Diffuse reflection.
The Fresnel equations describe how incident light changes when propagating between media with different refractive indices. When specular reflection dominates, the DoP of the specular reflection is calculated using the Fresnel reflection coefficients:

$$ \rho_s = \frac{R_s - R_p}{R_s + R_p}, $$

where Rs and Rp are the perpendicular and parallel reflection coefficients. Combined with the Fresnel equations, this yields the expression in Eq. (20)177:

$$ \rho_s = \frac{2\sin^2\theta\cos\theta\sqrt{n^2 - \sin^2\theta}}{n^2 - \sin^2\theta - n^2\sin^2\theta + 2\sin^4\theta}, \tag{20} $$
where n denotes the refractive index and θ refers to the zenith angle, assuming ηi = 1 because in most conditions light is incident from air; the refractive index of the specular surface is denoted ηt = n. Because the azimuth angle is perpendicular to the phase of the specular polarization178, a π/2 shift of the azimuth angle occurs. This is another ambiguity problem of shape from polarization.
Reconstruction of metallic objects is more complicated than that of regular specular surfaces. The refractive index of a metal is a complex number defined as n̂ = n(1 + iκ), where κ is the attenuation coefficient. Eq. (20) can then be derived as:

$$ \rho_s = \frac{2n\tan\theta\sin\theta}{\tan^2\theta\sin^2\theta + \left|\hat n\right|^2}, \tag{21} $$

where |n̂|² = n²(1 + κ²).
Diffuse reflection originates from light refracted by the shallow surface of an object, where it becomes partially polarized owing to irregular interactions between the light and the interior particles. Therefore, the DoP of diffuse reflection is determined by the Fresnel transmission coefficients. The relationship between the DoP and the Fresnel coefficients of diffuse reflection is defined as:

$$ \rho_d = \frac{T_p - T_s}{T_p + T_s}, \tag{22} $$

where Ts and Tp are the perpendicular and parallel transmission coefficients. In this case, light is refracted from the object into the air; therefore ηt = 1, i.e., the refractive index of air, and ηi = n, i.e., the refractive index of the diffuse surface. Eq. (23) can then be derived:

$$ \rho_d = \frac{\left( n - 1/n \right)^2 \sin^2\theta}{2 + 2n^2 - \left( n + 1/n \right)^2\sin^2\theta + 4\cos\theta\sqrt{n^2 - \sin^2\theta}}. \tag{23} $$
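Because the diffuse DoP-zenith mapping is one-to-one, the zenith angle can be recovered by numerically inverting Eq. (23); a simple lookup-table sketch (illustrative only, assuming a known refractive index n = 1.5) is:

```python
import numpy as np

def diffuse_dop(theta, n=1.5):
    """Diffuse-reflection DoP as a function of zenith angle theta, Eq. (23)."""
    s2 = np.sin(theta) ** 2
    num = (n - 1.0 / n) ** 2 * s2
    den = (2 + 2 * n**2 - (n + 1.0 / n) ** 2 * s2
           + 4 * np.cos(theta) * np.sqrt(n**2 - s2))
    return num / den

def zenith_from_dop(rho, n=1.5, samples=10001):
    """Invert the monotonic diffuse DoP curve by dense table lookup."""
    thetas = np.linspace(0, np.pi / 2 - 1e-3, samples)
    return thetas[np.argmin(np.abs(diffuse_dop(thetas, n) - rho))]

theta_est = zenith_from_dop(diffuse_dop(np.deg2rad(40.0)))
print(np.rad2deg(theta_est))   # ~40.0 degrees recovered
```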
Given the azimuth angle ϕ and zenith angle θ, the normal vector of the shape surface at any point can be expressed as179:

$$ \mathbf{n}(u) = \left( n_x(u),\ n_y(u),\ n_z(u) \right)^{\mathrm T} = \left( \sin\theta\cos\phi,\ \sin\theta\sin\phi,\ \cos\theta \right)^{\mathrm T}, \tag{24} $$

where nx(u), ny(u), nz(u) are the components of the normal vector of the surface element u. The normal vector can also be expressed using the surface gradient, as shown in Eq. (25):

$$ \mathbf{n}(u) = \frac{1}{\sqrt{p^2 + q^2 + 1}}\left( -p,\ -q,\ 1 \right)^{\mathrm T}, \tag{25} $$

where p = ∂z/∂x and q = ∂z/∂y, i.e., p = −nx/nz and q = −ny/nz. Finally, the surface shape z(u) is reconstructed by integrating the gradient field.
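The final integration step can be sketched with the FFT-based Frankot-Chellappa projection (the surface reconstruction method also referenced later in this section183); this is an illustrative implementation written for this review, not taken from the cited works:

```python
import numpy as np

def gradients_from_normals(n):
    """Surface gradients from unit normals via Eq. (25): p = -nx/nz, q = -ny/nz."""
    return -n[..., 0] / n[..., 2], -n[..., 1] / n[..., 2]

def frankot_chellappa(p, q):
    """Integrate a gradient field into a surface z(u) by projecting onto the
    set of integrable surfaces in the Fourier domain (the FFT implicitly
    assumes periodic boundaries)."""
    h, w = p.shape
    wx, wy = np.meshgrid(np.fft.fftfreq(w) * 2 * np.pi,
                         np.fft.fftfreq(h) * 2 * np.pi)
    denom = wx**2 + wy**2
    denom[0, 0] = 1.0                    # avoid dividing by zero at the DC term
    z_hat = (-1j * wx * np.fft.fft2(p) - 1j * wy * np.fft.fft2(q)) / denom
    z_hat[0, 0] = 0.0                    # global height offset is unrecoverable
    return np.real(np.fft.ifft2(z_hat))
```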
Shape from polarization differs according to the type of reflection. Compared with specular reflection, the mapping between the DoP and zenith angle for diffuse reflection is one-to-one. However, diffuse reflection makes the reconstruction process challenging because of the lower signal-to-noise ratio and the higher dependence of the DoP on the refractive index. For specular reflection, the mirror-like surface maintains a relatively uniform direction and phase, thus avoiding the influence of random noise. However, specular reflection leads to more ambiguity problems than diffuse reflection. Moreover, it is challenging to observe an object when the viewing direction deviates far from the reflection direction. Table 2 compares the advantages and disadvantages of shape from polarization based on specular and diffuse reflections.
Table 2. Comparison of shape from polarization based on specular and diffuse reflections. (Table content unavailable.)
Shape from polarization depends on the estimated azimuth and zenith angles, and ambiguity is a critical challenge. Regarding the azimuth angle, two phase angles with a π shift are derived from the period of the sinusoidal function; for specular reflection, the azimuth angle must additionally be retrieved with a ±π/2 operation. Moreover, for a given DoP in specular reflection, two ambiguous zenith angle solutions are determined, which cannot be excluded without other information. These factors result in high error rates and limit generalization to mixed materials and lighting conditions when using only polarization images.
In addition to the ambiguity problem, shape from polarization has other limitations. First, estimating the zenith angle requires a prior on the refractive index, which is usually unknown; this limits the reconstruction of complex objects and natural scenes. Second, when the zenith angle is close to zero, the influence of noise increases because the DoP is small. Third, mixed reflections are common in real-world scenarios, and achieving satisfactory reconstruction results in complex scenarios using the linear superposition of a single physical model is challenging. Finally, depth discontinuities are a significant challenge when recovering the shape from the derived surface normals by integration. Consequently, introducing other information is essential to avoid the problems mentioned above and to expand the application fields.
Typical methods combine heuristic priors, such as the boundary and convexity of objects180, shading181, and photometric stereo64, but noise is a major limitation: the complicated calculations amplify the noise, leading to degraded textures or profiles in the recovered shapes. Data-driven imaging is powerful for 3D imaging owing to its nonlinear modeling ability, and it primarily exploits the semantic information of the image. Under the guidance of physical models, it brings new possibilities for shape from polarization.
Data-driven shape from polarization in single reflection
In certain situations, the reflection can be purely specular or diffuse. For instance, in human face recognition or clothed body reconstruction tasks, the skin, clothes, and other human tissues are diffuse surfaces, and specular reflection is negligible. In transparent object reconstructions, specular reflection dominates.
For 3D clothed human shape reconstruction with clothing details, Zou et al. introduced polarization images and two ambiguous normal maps into the designed network65, as shown in Fig. 14. Specular reflection was omitted because of the rough surface of the clothing. Owing to the azimuth ambiguity problem, two possible normal maps were derived and input into the network as physical priors. The two ambiguous normal maps, n1 and n2, together with the background define three categories; each pixel is classified as belonging to one of these categories, and the maps are then merged into n3 using Eq. (26) with the probabilities p0, p1 and p2.
Figure 14.The proposed architecture of the Zou et al. method.
The final surface-normal prediction is refined using a denoising network: the smoothed normal map, concatenated with the fused normal and the raw polarization direction images, serves as the input for accurately estimating the surface normal. Subsequently, the skinned multi-person linear (SMPL) representation182 and a deformation stage were used to reconstruct the refined 3D human shape with clothing details rather than a naked body.
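Returning to the fusion step of Eq. (26), the sketch below gives one plausible reading, in which the per-pixel class probabilities weight the two ambiguous normal maps; the helper names and the renormalization are our assumptions.

```python
import numpy as np

def fuse_ambiguous_normals(n1, n2, probs):
    # n1, n2: (H, W, 3) ambiguous normal maps; probs: (H, W, 3) softmax
    # probabilities for (background, n1, n2), i.e., p0, p1, p2.
    p1 = probs[..., 1:2]
    p2 = probs[..., 2:3]
    fused = p1 * n1 + p2 * n2              # background contributes nothing
    norm = np.linalg.norm(fused, axis=-1, keepdims=True)
    return fused / np.clip(norm, 1e-8, None)
```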
Regarding diffuse-reflection-dominated cases, such as human face reconstruction, Han et al. proposed a learning-based method for passive 3D face reconstruction from polarization183, as shown in Fig. 15(a). The method derives the ambiguous normal of each microfacet over the face at the pixel level based on the polarization of the diffuse reflection. A CNN-based 3D morphable model (3DMM) generates a rough depth map of the face from a directly captured polarization image, which is used to disambiguate the polarization normals and further reconstruct an accurate 3D face using the Frankot-Chellappa surface recovery algorithm. Figure 15(b) illustrates the final results, including a male face under indoor lighting, a male face under natural outdoor illumination, and an indoor plaster statue. The 3D rendering features fit well with the original appearance, and the lighting conditions have little influence. The experiments also demonstrate the benefits of introducing deep learning into 3D polarization reconstruction.
Figure 15.Han et al. passive 3D polarization face reconstruction method. (a) Overall schematic of the proposed method. (b) 3D polarization face reconstruction results. Figure reproduced with permission from ref.183, Multidisciplinary Digital Publishing Institute.
Transparent objects exhibit typical specular reflection. Shao et al. proposed a multibranch fusion network to reconstruct the 3D shape of transparent objects from specular reflection81. However, the transmitted background light is largely diffuse; therefore, separating the transmitted component is critical. The AoLP features indicate stronger background noise in areas with higher transmittance: the closer to the center area, the stronger the noise, as shown in Fig. 16. The physics-based prior confidence concept builds on these intrinsic faults in the AoLP maps of transparent objects.
Figure 16.The physics-based prior confidence map according to the differences in polarization characteristics between the transparent object and the background.
where p_{i,j} represents the pixel values in the K×K neighborhood of the point (x, y), \bar{p} is the mean of the pixel values in this area, and m is the smoothing exponent. H and W denote the height and width of the map, respectively. This physics-based prior confidence map is then input into the network as an attention map to guide the fusion of the original-polar features (DoLP and AoLP maps) and the physics-based priors (four ambiguous normal maps). The proposed method achieves optimal performance and provides a new perspective for further research on transparent shape from polarization.
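The exact confidence formula follows the description above; as a hedged illustration of the idea only, a local-variance statistic over the AoLP map can be turned into a confidence map that is low where background transmission makes the AoLP noisy. The window size, exponent, and normalization here are our assumptions, not the authors' parameters.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def prior_confidence(aolp, k=7, m=0.5):
    # Local mean and variance over a K x K neighborhood.
    mean = uniform_filter(aolp, size=k)
    var = np.maximum(uniform_filter(aolp**2, size=k) - mean**2, 0.0)
    conf = 1.0 / (1.0 + var) ** m          # noisy regions -> low confidence
    return (conf - conf.min()) / (conf.max() - conf.min() + 1e-8)
```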
Data-driven shape from polarization in mixed reflections
Mixed reflections, stemming from two primary conditions, are prevalent in natural scenarios. First, the reflectance of the surface determines the type of reflection: for materials such as ceramics, plastics, and lacquers, specular reflection dominates in the highlighted areas, whereas diffuse reflection dominates elsewhere. Second, objects made of different materials create reflections that vary across segmented areas. This problem could be addressed by reconstructing each area separately; however, the segmentation algorithm and the stitching of the per-area 3D shapes are major challenges. Neural networks provide solutions for fusing explainable or inexplainable features under mixed reflections, relying on their excellent nonlinear representation ability.
The first method combines deep learning and polarization-reconstructed models; Ba et al. fed polarization images and ambiguous normal maps into the network and trained the network to learn the effective inputs from training data automatically64. The inputs were four images captured with a polarizer, and the ambiguous normal maps consisted of one diffuse and two specular ambiguous maps. The proposed method achieved the lowest test error on the tested data under the three lighting conditions compared with conventional methods.
Based on a polarimetric Bidirectional Reflectance Distribution Function (pBRDF) model and real polarization scene rendering, Kondo et al. applied rendered polarized images to train a network for an accurate surface normal estimation71. A physics-based renderer was built to simulate the polarization behavior of the rays based on the proposed pBRDF model for each material. Furthermore, it can correctly reproduce the polarization property, including the inter-reflection effect, in real-life scenes. Therefore, the synthetic-colored image and simulated polarization information, such as the phase and DoP, were fed into the CNN to estimate the surface normal. The detailed process and reconstruction results are shown in Fig. 17(a).
Figure 17.The 3D shape reconstruction methods based on the pBRDF. (a) Y Kondo et al. method and the reconstruction examples. (b) V Deschaintre et al. method and corresponding results.
The proposed pBRDF model is described by the angle of the incident light, the incident plane, the reflection angle, the camera direction, and the half-vector, which allows accurate transmission Mueller matrix modeling for arbitrary camera and lighting positions. Specular and diffuse reflection models were established separately. The Mueller matrix of specular reflection comprises the rotation matrix of light into the incident plane, Fresnel elements, a delay (retardance) matrix, and the rotation matrix of light into the camera frame, as described by Eq. (28),
where the leading coefficients are the Fresnel reflection coefficients and the remaining coefficients are elements of the rotation matrices. Accordingly, the Mueller matrix of diffuse reflection contains a rotation matrix of light into the incident plane, two Fresnel elements for light entering and leaving the surface, a depolarization matrix, and a rotation matrix of light into the camera frame, as denoted by Eq. (29),
where the corresponding coefficients are the Fresnel transmission coefficients. The final normalized Mueller matrix is the linear superposition of the reflection matrices and the depolarization matrix mentioned above, the latter representing the diffraction and scattering of light inside the material. This linear superposition can effectively simulate mixed reflections. A generalized Lambertian reflection distribution function model was used to parameterize the luminance and the linear combination coefficients, and all parameters can be calculated through the final optimization. Experiments show well-rendered results close to real ones, which were used to generate polarized images as synthetic datasets. This study guides the establishment of polarized datasets for 3D reconstruction and encourages exploration of how to accurately model the interaction between polarized light and objects, even across entire scenes.
Similarly, Deschaintre et al. coupled polarimetric imaging with a CNN to estimate the 3D shape and the spatially varying bidirectional reflectance distribution function (SVBRDF) from single-view polarimetric imaging under frontal flash illumination80, as shown in Fig. 17(b). A U-Net with three branched decoders generates 1) surface normal and depth maps, 2) spatially varying diffuse albedo maps, and 3) specular albedo and specular roughness maps. The network is fed with the flash image, the normalized diffuse color, and the Stokes map computed from the polarization images. The results were plausible, and the proposed method captured the real appearance of the inputs. However, as the lighting or objects become more complex, for instance, with multi-illumination or multiple objects with blurred details, BRDF-based methods71, 80 suffer poor recovery, and accurate simulation of mixed polarization remains an open challenge.
Mixed reflections dominate outdoor scenarios containing multiple objects with different refractive indices. Lei et al. constructed the first real-world SfP dataset for complex scenes to train a network67. The proposed network input includes three parts: 1) the four raw captured polarization images; 2) polarization feature images, including the intensity, DoP, and AoP encoded by sine and cosine operations, where the encoding solves the problem of raw AoP maps being identical at two polarization angles separated by π; and 3) a viewing encoding to account for non-orthographic projection in scene-level SfP, which effectively calibrates the polarization parameters influenced by the spatially varying viewing directions. The AoP encoding is sketched below.
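Because the AoP is defined modulo π, mapping it through the sine and cosine of twice the angle removes the wrap-around discontinuity. A minimal sketch (the function name is ours):

```python
import numpy as np

def encode_aop(aop):
    # AoP is defined modulo pi, so 2*aop is defined modulo 2*pi and the
    # (sin, cos) pair is continuous across the wrap-around.
    return np.stack([np.sin(2 * aop), np.cos(2 * aop)], axis=-1)
```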
In summary, existing data-driven polarization 3D shape reconstruction methods are usually end-to-end structures guided by a physical model. The basic structure takes the raw polarization images and other prior images, which are often ambiguous, as input to optimize the network and generate a refined normal image. The prior images are crucial for enhancing the reconstruction quality, and different inputs may also lead to different network architectures. Therefore, the inputs used in the 3D shape reconstruction task are visually displayed in Fig. 18.
Figure 18.The inputs of network in 3D shape reconstruction task. (a) The raw polarization images as well as the diffuse and specular normal prior. (b) The raw polarization images and two diffuse ambiguous normal. (c) The DoFP image consists of four polarization sub-images, viewing encoding, intensity, DoP image and encoded AoP image. (d) The chromatic intensity, phase and DoP image. (e) The raw polarization images, flash image, normalized color and Stokes map. (f) The maximum and minimum polarization images. (g) The raw polarization images, physics-based prior confidence, DoLP and AoLP.
The ground truths for data-driven datasets are obtained in two ways. First, a Kinect depth camera is the most commonly used equipment for capturing a coarse depth map as the ground truth; further operations may be conducted to refine the captured depth map, such as denoising, excluding inaccurate values, and processing the sparse point cloud67. Second, a simulation method based on the BRDF is adopted to generate synthetic datasets. Plausible results can be produced, and other features such as roughness and depth maps80 can also be generated, which may be helpful for other computer vision tasks; however, robustness decreases in complicated environments.
Reflection removal
Limitations of reflection removal based on polarization
Removing reflection contamination is a challenging but critical and frequently encountered task because reflections degrade image quality. Several studies have been conducted based on diverse physical and image characteristics, but the task remains unsolved14, 184, 185. Because the image transmitted through the surface and the image reflected by the surface are captured simultaneously, recovering two images from a single mixture image is a highly ill-posed problem, and constraints are essential to solve it.
The reflected light is polarized, and polarization has proven to be a feasible constraint for this problem. Based on the Fresnel functions, the number of unknowns can be matched to the number of input images. The reflection and transmission components can be separated by capturing near the Brewster angle; the closer to the Brewster angle, the better the reflection removal performance. However, the incident angle must be known, and obtaining it in the real world is challenging. Additionally, the robustness and generalization of existing polarization-based methods cannot meet the requirements of real-world high-quality imaging, considering the various viewing angles, complex refractive indices, smoothness, and local curvature of surfaces.
Deep learning methods based on CNNs are excellent at extracting hidden features: they can predict potential prior information from captured images and show good performance in the reflection removal task. Introducing polarization and imaging models into the network can improve removal performance and expand the diversity of datasets and architectures. However, acquiring ground truth datasets is crucial; thus, both synthetic dataset generation methods and refined real-world dataset collection methods have been proposed. Table 3 lists the elements that must be considered in synthetic and real-world dataset acquisition, together with their advantages and disadvantages.
Therefore, artificial manipulations must be added to generate synthetic images or collect real-world images. In this section, based on the acquired aspects of the training datasets, we review the existing reflection removal methods using synthetic and real-world datasets.
Data-driven polarization reflection removal based on synthetic datasets
Synthetic datasets are commonly used to train networks because real-world datasets are hard to acquire. The traditional approach directly sums candidate reflection and transmission images using normalized weights, which performs poorly in real-world tests. Therefore, synthetic datasets of higher quality and robustness are generated based on polarimetric imaging models.
By leveraging the properties of light polarization and residual representation, Wieschollek et al. presented the first deep learning approach to separate reflected and transmitted components73, as shown in Fig. 19(a1). The proposed network architecture uses three polarization orientation images to calculate the parallel and perpendicular components I∥, I⊥ as the input to the network. The output of the network comprises the residual images and two single-channel weights ξ∥, ξ⊥. The final estimates of the reflection and transmission can be computed as follows:
Figure 19.The typical data-driven polarimetric methods of reflection removal. (a1) P Wieschollek et al. method. (a2) The image-based data generation procedure of (a1). (b) Y Lyu et al. method.
where Ir and It denote the estimation of reflected and transmitted light, respectively, represented as predicted transmission and reflection in Fig. 19(a1).
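The I∥ and I⊥ inputs can be derived from three polarizer orientations through the linear Stokes parameters. The sketch below shows one common derivation, under the simplifying assumption that the extremal transmitted intensities stand in for the parallel and perpendicular components; the exact preprocessing in ref.73 may differ.

```python
import numpy as np

def parallel_perpendicular(i0, i45, i90):
    # Linear Stokes parameters from three polarizer orientations.
    s0 = i0 + i90
    s1 = i0 - i90
    s2 = 2 * i45 - i0 - i90
    dolp = np.sqrt(s1**2 + s2**2) / np.clip(s0, 1e-8, None)
    i_max = 0.5 * s0 * (1 + dolp)   # reflection-dominant (perpendicular) part
    i_min = 0.5 * s0 * (1 - dolp)   # transmission-dominant (parallel) part
    return i_min, i_max
```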
An accurate synthetic data generation pipeline is introduced, including the simulation of realistic reflections, such as high-dynamic-range scenes, nonstatic scenes, and curved and nonideal surfaces, to enhance the robustness of the proposed method, as shown in Fig. 19(a2). First, because the world consists of high-dynamic-range elements, the light intensity naturally diminishes as it travels. Consequently, artificial adjustments match this phenomenon in real-world environments. The proposed method separately manipulates the dynamic range of the transmitted and reflected input images using a random factor. Second, for nonstatic scenes, such as cases where a swaying tree branch occurs during capture, local and nonrigid deformations are adopted by perturbing each grid over a patch. Third, for curved and non-ideal surfaces, a parabola was utilized to simulate unconstrained surface curvatures with four variable parameters.
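As a hedged illustration of the dynamic-range step of this pipeline (the actual parameter ranges of ref.73 are not reproduced here), the two layers can be rescaled independently by random factors before mixing:

```python
import numpy as np

rng = np.random.default_rng(0)

def synth_mixture(transmission, reflection):
    # Rescale each layer's dynamic range with an independent random factor
    # before mixing, mimicking high-dynamic-range capture conditions.
    t = np.clip(transmission * rng.uniform(0.5, 1.5), 0.0, 1.0)
    r = np.clip(reflection * rng.uniform(0.2, 1.0), 0.0, 1.0)
    return np.clip(t + r, 0.0, 1.0)
```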
Lyu et al. exploited the physical constraints from a pair of unpolarized and polarized images to separate reflection and transmission74, as shown in Fig. 19(b). The coefficients of the glass plane are predicted by the semireflector orientation module to compute the reflection and transmission based on the proposed physical image formation, denoted as:
where ϕpol denotes the polarization angle and ϕ denotes the azimuth angle. The unpolarized and polarized images were then calculated using:
where Iunpol, Ipol, ξ and ζ denote the unpolarized image, the polarized image, and the weights for reflection and transmission, respectively. Next, the reflection and transmission images can be computed as:
Finally, to close the gap between the physical model and real data, a refinement module was adopted to improve the initial estimation. Additionally, the proposed capture setup can potentially be integrated into smartphones without affecting the original photography quality while achieving reliable results. The dataset was generated from the Places2186 dataset, in which two random images were selected as the original reflection and transmission images. The reflection is blurred by Gaussian smoothing based on the assumption that people focus their photos on the transmitted scene.
Pang et al. proposed a progressive polarization-based reflection removal network (P2R2Net) that generates a preliminary estimation of coarse transmission images before guiding the final reflection removal187. The input to the network consists of two parts: a reflection-obstructed image and hypercolumn features from a pre-trained VGG19, a successful example of using pre-trained features as prior information. The reflection-obstructed images are synthesized based on the physical formation function in Eq. (33). The high dynamic range of the real world, considered as light intensity, is nonlinearly compressed in the captured image through the power function of gamma encoding73, and two independent parameters are used to simulate diverse practical imaging environments. Additionally, flat and parabolic surface models are adopted to simulate curved surfaces, which can be calculated from the camera position (xc, yc) and the incidence point (xp, yp):
where θ is the angle of incidence. Additionally, random deformation, rotation, and warping expand the scale of the dataset and improve robustness when synthesizing the reflection-obstructed images.
Data-driven polarization reflection removal based on real-world datasets
Although synthetic datasets are comparatively easy to obtain, they are often too idealized, and complex real-world conditions cannot be fully simulated. Real-world datasets are therefore also crucial, but they are challenging to collect owing to the influence of the glass and misalignment issues.
Real-world datasets are typically collected using removable glass: reflection-obstructed images are captured with a piece of glass in front of the detector, and the transmission ground truth is captured after the glass is removed. However, the difference between the direct transmission and the refracted transmission cannot be ignored; intensity attenuation and color distortion caused by tinted glass are also common. To mitigate the misalignment between the two captures, losses measuring image similarity at the perceptual level, such as the perceptual loss and the contextual loss, have been designed. However, the intensity attenuation and color distortion persist.
The collected reflection-free images are not perfectly aligned with the input mixed images owing to glass refraction. To avoid misalignment issues, Lei et al. used a piece of black cloth to cover the back of the glass, blocking all transmission to obtain clean reflections53. This dataset includes approximately 100 types of glass in the real world, which helps the proposed method handle different types of reflections without introducing artifacts. The reflection removal network takes multiple polarization-direction images as input, combined with the calculated intensity, degree and angle of polarization, and an overexposure mask that excludes overexposed areas. Two stages are adopted to estimate the reflection and the transmission. This design improves the performance of the proposed method by a large margin.
In summary, reflection removal is crucial because obtaining analytical solutions to this ill-posed problem is challenging. Polarization information can guide reflection removal through the Fresnel functions despite the unknown incident angle, and combining it with deep learning to learn the prior parameters is a feasible approach. Dataset acquisition determines the effectiveness and robustness of the parameter estimation; both synthetic and real-world acquisition methods have been reviewed here. In the future, more comprehensive environments and more complete theories must be developed to solve reflection removal tasks effectively.
Target detection
For target classification or detection, polarimetric data-driven methods can improve efficiency and do not require manual extraction of image features compared to traditional methods. However, existing methods use only intensity information images, resulting in a reduction in the accuracy rate for low-light environments or camouflaged targets188-193. The targets and backgrounds also differ in their polarization characteristics. Polarimetric imaging can effectively reveal these differences and assist in target detection194, 195. Therefore, we can expect positive results by introducing polarization into data-driven target detection.
Fan et al. first proposed the use of polarization complementary to intensity-based information to improve car detection accuracy54. A feature-selection process was performed to select the most informative polarization feature. Final detection is based on a fusion rule that takes the polarization-based model to confirm the color-based one. Gao et al. presented a similar work51. Blin et al. proved that polarimetric imaging is useful for target detection in road scenes52. Sun et al. adopted three-dimensional convolutions to consider the relationship among S0, S1, and S2 images to improve the detection rate with limited polarimetric images59. Xie et al. used the Stokes vector to obtain four different configurations of polarization parameter image datasets: I, DoP, [I, DoP, AoP], and [S0, S1, S2] and trained different polarization image detection models, indicating that increased polarization information fusion enabled more learned target features and better target detection55. Tian et al. proposed a human face anti-spoofing method for real-life scenarios, which extracts and classifies the unique polarized features of faces using a CNN and an SVM together196. Experiments covering diverse face-spoofing attacks (print, replay, and mask) under uncontrolled indoor and outdoor conditions were conducted. Usmani et al. proposed unified polarimetric target detection and classification in degraded environments using 3D polarimetric integral imaging data197. 3D polarimetric images with deep neural networks can effectively detect and classify polarimetric targets under different low-light conditions and in the presence of occlusions. Shen et al. combined the advantages of polarimetric imaging and deep learning for rapid target detection of artificial targets camouflaged in natural scenes198, as shown in Fig. 20. The color difference of each image is calculated to prove the proposed method can highlight the camouflaged artificial targets to a greater extent.
Figure 20.Shen et al. method. (a) The flow chart of proposed method. (b) The detected results. Figure reproduced with permission from ref.198, Institute of Electrical and Electronics Engineers.
Biomedical imaging and pathological diagnosis
Biomedical imaging and pathological diagnosis methods based on Mueller matrix features, a typical polarization representation, are emerging label-free and noninvasive techniques suitable for characterizing the microstructures of biological tissues with anisotropic properties. Recently, results based on Mueller matrix imaging for digital pathology have been published25, 28, 29, 199-207. However, for interns, achieving accurate pathological diagnosis by observing and evaluating stained pathological sections is challenging. Pathological diagnosis is essentially a classification problem; therefore, learning-based methods are crucial for achieving fast and accurate digital pathology. This section reviews the existing data-driven biomedical imaging and pathological diagnosis methods and applications. We then discuss the interpretation of the physical properties of the network layers based on a distance-based learning classifier.
Existing biomedical imaging methods
Li et al. first presented a Mueller matrix imaging system to classify morphologically similar algae using a CNN60. Because previous measurements of the algal Mueller matrix show low contrast in the polarimetric signals, performing classification without high-precision instruments is challenging. The proposed method compares the performance of various stacks of network layers to identify a suitable number of convolution layers. The classifier network was trained to extract features from the Mueller matrix and achieved a classification accuracy of 97%. Subsequently, they introduced a distance metric learning method based on a Siamese network, which aims to learn good distance metrics for algal Mueller matrix images in low-dimensional feature spaces61. Compared with the plain CNN method, the Siamese approach stochastically generates data pairs as inputs and trains the network to determine whether they belong to the same category. The experiments demonstrated that coupling Mueller matrix imaging with a Siamese CNN may be an efficient solution for the automatic classification of morphologically similar algae.
Zhao et al. proposed a giant cell tumor of bone detection method using Mueller matrix polarization microscopic imaging and a multi-parameter fusion network (MPFN) that combines three extracted polarimetric features: deep micro-Pol features, MMPD features, and MMT features62, as shown in Fig. 21. Wang et al. and Zhou et al. used polarized speckle images for in vivo skin cancer detection85 and polarized hyperspectral images for head and neck squamous cell carcinoma detection86, respectively. Yao et al. characterized the microstructures of endometrial samples at the typical proliferative and secretory phases using Mueller matrix polar decomposition and a set of rotation-invariant parameters and their corresponding angular parameters87. In this study, polarimetric imaging was combined with a digital pathology technique to quantitatively study the microstructural features of endometrial samples. Furthermore, incorporating local image texture information through local binary pattern (LBP) analysis improves the characterization ability of the polarization parameter images. The experiments demonstrated the feasibility of combining polarimetric imaging with digital pathology techniques in the typical proliferative and secretory phases.
Figure 21.The architecture of the multi-parameter fusion network (MPFN) and the corresponding results.
However, the physical properties of the network layers remain unclear. In data-driven polarimetric imaging, the Mueller matrix provides the most comprehensive representation of the polarization information, and many decomposition methods that provide fundamental parameters have been proposed. Thus, the Mueller matrix is crucial for exploring the interpretation of network layers.
In ref.61, the authors calculated the Pearson correlation coefficients between the elements of the algal Mueller matrix and the features f0 to f15 extracted by the CNN. The experiments demonstrate that features f2, f3, f6, and f7 are positively correlated with depolarization-related elements, whereas f1, f4, f10, f12, and f13 are negatively correlated. In addition, the fast-axis-orientation-dependent periodic variations were preserved in f0, f5, f9, and f15. Dong et al. proposed a data-driven polarimetric imaging framework and constructed a dual-modality machine-learning framework for the quantitative diagnosis of cervical precancerous lesions63, as shown in Fig. 22. A U-Net architecture was adopted to segment the epithelium in digitized cervical hematoxylin-eosin-stained images and mask the corresponding cervical sample's polarimetry basis parameters (PBPs), which were decomposed based on the MMPD, MMT, and other Mueller matrix rotation-invariant parameters. These masked parameters are then processed by the designed statistical distance-based learning classifier to derive a polarimetry feature parameter (PFP). The classifier separating the negative class from the CIN1 (mild dysplasia) samples can be expressed as:
Figure 22.Polarization-imaging-based ML framework for quantitative diagnosis of cervical precancerous lesions. Figure reproduced with permission from ref.63, Institute of Electrical and Electronics Engineers.
where xi is an M×1 vector representing the PBP elements, and M and N are the numbers of PBPs and target pixels, respectively. X is an N×1 vector calculated as a linear projection of the input PBPs; PNormal(X) is the probability distribution of X from normal cervical pathological tissues, whereas PCIN1(X) represents CIN1 tissues. ω represents the weight coefficients of the PBPs, and L(ω) is the energy distance between PNormal(X) and PCIN1(X) under the energy distance function d. The PFP can be represented as a simplified linear combination of the PBPs that mirrors the distribution of specific microstructural variations, and the different weights indicate the significance of each element in the PFP feature.
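A compact sketch of the statistic at the core of this classifier is given below: the empirical energy distance between the projections of the two classes, with the projection weights ω as the learnable quantity. The array shapes and function names are our assumptions.

```python
import numpy as np

def energy_distance(a, b):
    # Empirical energy distance between two 1-D samples a and b.
    a, b = np.asarray(a)[:, None], np.asarray(b)[:, None]
    d_ab = np.abs(a - b.T).mean()
    d_aa = np.abs(a - a.T).mean()
    d_bb = np.abs(b - b.T).mean()
    return 2 * d_ab - d_aa - d_bb

def pfp_objective(w, pbps_normal, pbps_cin1):
    # Project the (N, M) PBP stacks with weights w and score the
    # separation between the two projected distributions.
    return energy_distance(pbps_normal @ w, pbps_cin1 @ w)
```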
The results demonstrate the physical interpretability of the polarimetry feature parameters. For example, complex cervical precancerous samples exhibit polarization characteristics of various types of anisotropic superpositions. The depolarization ability of precancerous cervical samples changed with the development of lesions. In addition, changes in retardation and depolarization occur during the propagation and scattering of pathological cervical samples at different stages. Therefore, the proposed method has high sensitivity and precision for the screening of cervical lesion pathological tissues, and may bring physical interpretability to the CNN.
Semantic segmentation
Segmentation is a popular topic for scene understanding in the remote sensing and automatic navigation fields. By learning from massive data of different types, data-driven segmentation methods achieve good performance. However, intensity-based methods always suffer from degradation in scenes with similar colors, clutter, or reflective areas208-212. This motivates polarimetric imaging, which provides the ability to distinguish materials and recover information in complex scenes. Several approaches have been proposed to achieve segmentation in remote sensing, road scenes, and transparent objects via polarimetric imaging and deep learning.
Shaunak et al. transformed the information in an augmented dataset into a compact representation of polarimetric synthetic aperture radar data to classify and segment urban areas82. The segmentation of road scenes is a typical application in which water hazards, transparent glass, and metallic surfaces are key challenges. Yang et al. proposed predicting polarization information from monocular RGB images as a complement to RGB-based pixel-wise semantic segmentation for real-world wearable assistive navigation systems69, as shown in Fig. 23. Similarly, Zhang et al.56 and Blanchon et al.83 used different architectures to achieve the same goals: robust and accurate scene parsing of outdoor environments that paves the way for autonomous navigation and relationship inference. For transparent object segmentation, the polarization textures of transparent objects provide extra information that is very different from the background. Therefore, a polarized CNN framework can be trained based on the intensity and polarization information57, which is helpful for applications in broad areas such as robotics, autonomous driving, and face authentication, as shown in Fig. 24. The mean average precision (mAP) is used to measure accuracy.
Figure 23.Yang et al. method: proposed architecture and produced results of depth and segmented results.
Figure 24.Transparent object segmentation. (a) The designed architecture. (b) The segmented results of intensity Mask R-CNN and polarized Mask R-CNN in several dataset.
Input information is a crucial element in network training with polarimetric imaging and deep learning. However, various inputs and polarization parameters exist for different tasks. Three perspectives were considered as network inputs: original polarization images (OPI), polarimetric parameter feature maps (PPFM), and associated parameter maps (APM), as shown in Fig. 25.
Figure 25.Input and utilization of polarization information. (a) Original polarization images (OPI). (b) Polarimetric parameter feature maps (PPFM). (c) Associated parameters maps (APM).
Original polarization images are among the most widely used inputs. OPI refers to images captured directly using a DoFP camera, a Mueller matrix polarization microscope, or other equipment. Because of the rapid response and the comprehensive polarization information captured in one shot, raw super-pixel images captured by DoFP sensors are a common input42-45. Other common inputs are polarization-oriented images, usually at 0°, 45°, and 90°46-49, 52, with variants adding 135°50, 51 or circular polarization information46. Parallel and perpendicular polarization components have been used to train dehazing networks72. Polarization speckle images, captured by the detector after the light is scattered by the media, are another type of OPI66, 85.
Polarimetric parameter feature maps are calculated from the OPI. Based on the Stokes representation, [S0, S1, S2] is a common set widely used in polarization network training52, 55, 58, 59. The DoP and AoP computed from the OPI or the Stokes vector images are another common input52-57. In biomedical diagnosis, Mueller matrix images are the most common network input91, 95, and parameters decomposed from Mueller matrix images can also be fed into network training62, 208. Table 4 lists the existing Mueller matrix decomposition elements.
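For reference, the standard computation of these PPFM quantities from four orientation images takes only a few lines (variable names ours):

```python
import numpy as np

def stokes_features(i0, i45, i90, i135):
    # Linear Stokes parameters and the derived DoLP / AoLP maps.
    s0 = 0.5 * (i0 + i45 + i90 + i135)
    s1 = i0 - i90
    s2 = i45 - i135
    dolp = np.sqrt(s1**2 + s2**2) / np.clip(s0, 1e-8, None)
    aolp = 0.5 * np.arctan2(s2, s1)        # defined modulo pi
    return s0, s1, s2, dolp, aolp
```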
Associated parameter maps are images in which the OPI and PPFM are combined with other information or preprocessing for different tasks. Intensity information is the most common complement to the OPI and PPFM53, 55-57, 66, 70, 74, 196. Similarly, the spectrum86 and the phase71, 75 are general additions to the polarization information. In 3D shape reconstruction tasks, there are various complements depending on the conditions, such as zenith and azimuth angle maps derived from specular and diffuse reflection64, 65, viewing encoding and encoded AoP67, normalized color80, and physics-based prior confidence81. In the demosaicing task, the raw images are interpolated using bicubic interpolation68. An overexposure mask is used in the network input to avoid overexposed areas during reflection removal53. A scene segmentation network utilizes an HSL color space representation by incorporating a polarization pseudo-color image83.
Datasets
Owing to the demands of different tasks, the number of data-driven polarimetric imaging datasets has gradually increased, as listed in Table 5. Seven types of datasets are associated with the corresponding tasks described in the previous sections. Three strategies for building the datasets can be distinguished. First, when ground truths corresponding to the inputs exist, the real ground truth is captured directly to supervise the outputs. Second, the transfer function of the imaging system is modeled, and the generative process is simulated to produce the ground truth. Third, comparing different traditional methods and selecting the best result as the ground truth combines the advantages of existing methods.
The datasets differ greatly from one another in number and size, even within the same task, as shown in Table 5, which inevitably causes differences in the features extracted by the CNN. Furthermore, researchers use self-collected training and test datasets, which makes evaluating and comparing different methods challenging. Therefore, authoritative benchmark datasets must be built for each task.
Loss function
Loss functions are critical elements, and their selection is crucial in guiding network training. Each loss function has advantages and disadvantages; therefore, a specific loss function is adopted according to the given task and imaging environment. Table 6 lists the loss functions used in recent data-driven polarimetric imaging. The following is a detailed description of several functions that differ from pure intensity loss functions.
i) The mean squared error (MSE) is the most widely used indicator in deep learning, measuring the difference between the output images and the ground truth. In data-driven polarimetric imaging, in addition to the difference in intensity, the polarization parameters also ensure the accuracy of the polarimetric representations. The loss can be expressed as:
where y is the ground truth and ŷ is the output of the network. The measured information therefore consists of polarimetric representations, such as the DoP (DoLP), AoP (AoLP), Stokes vectors, Mueller matrix, spectrum, and HSV color space, as well as their perceptual representations computed by a VGG network model.
ii) The mean absolute error (MAE) is another widely used indicator. Compared with the MSE, the MAE produces less blurriness and noise but is more unstable. The MAE loss is used similarly to the MSE, but the range of the AoLP is 0 to π, which is always mapped into 0–177. However, 0 and 1 indicate the same physical angle, precisely where the naive error is largest. Therefore, the HSV spatial display rule was introduced to define a closer distance on the circle of the AoLP. The loss function is defined as follows:
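A minimal sketch of this circular MAE, with the AoLP normalized to [0, 1] so that 0 and 1 coincide and the error is the shorter arc on the circle (the wrap-around rule described above):

```python
import numpy as np

def aolp_mae(pred, gt):
    # pred, gt: AoLP maps normalized to [0, 1], where 0 and 1 are the
    # same physical angle; take the shorter of the two arc distances.
    diff = np.abs(pred - gt)
    return np.mean(np.minimum(diff, 1.0 - diff))
```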
iii) Cosine similarity (CS) is commonly used for surface-normal estimation in 3D reconstruction based on polarimetric imaging. In the 3D polarimetric imaging task, the surface normal map is calculated using the Fresnel formulas to generate normal vectors, whose form differs from other kinds of information, making the CS loss the most suitable indicator. The loss function is defined as follows:
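A corresponding sketch for normal maps, where the loss is one minus the cosine of the angle between the predicted and ground-truth normals (the epsilon guard is our addition):

```python
import numpy as np

def cosine_similarity_loss(n_pred, n_gt, eps=1e-8):
    # n_pred, n_gt: (H, W, 3) surface-normal maps.
    dot = np.sum(n_pred * n_gt, axis=-1)
    norm = np.linalg.norm(n_pred, axis=-1) * np.linalg.norm(n_gt, axis=-1)
    return np.mean(1.0 - dot / np.maximum(norm, eps))
```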
iv) SSIM is a widely used indicator in end-to-end networks. SSIM focuses on the brightness, contrast, and structural similarity between two images. SSIM increases toward 1 within the range [0, 1] as the images become more similar, which is opposite to the goal of minimizing a loss; the loss is therefore defined as its complement, as follows:
where μ and σ denote the mean and standard deviation of an image, respectively; σ_{yŷ} is the cross-covariance computed from the images y and ŷ; and c1 and c2 are constants.
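For completeness, a windowed SSIM loss following the standard formula, implemented with uniform local windows; the window size and the constants c1 and c2 are the conventional defaults for images scaled to [0, 1], chosen here as assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def ssim_loss(y, y_hat, c1=0.01**2, c2=0.03**2, win=7):
    # Local means, variances, and cross-covariance over a uniform window.
    my, mh = uniform_filter(y, win), uniform_filter(y_hat, win)
    vy = uniform_filter(y * y, win) - my * my
    vh = uniform_filter(y_hat * y_hat, win) - mh * mh
    cyh = uniform_filter(y * y_hat, win) - my * mh
    ssim = ((2 * my * mh + c1) * (2 * cyh + c2)) / \
           ((my**2 + mh**2 + c1) * (vy + vh + c2))
    return 1.0 - ssim.mean()               # complement for minimization
```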
Future of data-driven polarimetric imaging
The field of polarimetric imaging has been influenced by deep learning, which has recently become one of the most disruptive technologies. First, we analyzed the trends in data-driven polarimetric imaging, focusing on its applications, and reviewed the existing research achievements. The acquisition of high-accuracy polarization information is the foundation for subsequent imaging and semantic processing. Descattering, 3D shape reconstruction, reflection removal, biomedical imaging, pathological diagnosis, target detection, and semantic segmentation are the crucial applications of data-driven polarimetric imaging, and a comprehensive discussion of the input and utilization of polarization information, datasets, and loss functions is essential.
This section evaluates the future of data-driven polarimetric imaging based on its strengths, weaknesses, and opportunities. Furthermore, this approach is suitable for developing instructive strategies for further studies that combine deep learning and polarization information.
Strengths
Data-driven polarimetric imaging represents a new paradigm that combines deep learning and traditional physical properties, including altering the patterns of physical properties to achieve better results and balancing the physical model and information extraction between the traditional physical and high-order nonlinear representation domains. Applying the information represented by the physical model and the deep network layers enables researchers to exploit the potential features embedded in all information-transmitted paths.
During the imaging process, the light source, the transmitting media, the imaging system, and the image processing method all influence the visual performance, and modeling these complex processes using physical functions or traditional methods is challenging. Network layers act as high-order nonlinear functions expressed by the convolutional neural network; therefore, the nonlinear representations of the network layers may be promising for simulating these processes.
Introducing deep learning into conventional polarimetric imaging and pattern recognition generates more accurate coefficients for the physical model and simulates complicated processing that cannot be modeled by physical functions. Conversely, introducing a polarimetric imaging model into the deep network adds physical constraints and polarization information to guide the training, achieving better performance than an intensity-only network. Therefore, data-driven polarimetric imaging probably enables capabilities that cannot be realized using traditional methods.
Weaknesses
To achieve better performance by training a polarimetric imaging network, researchers must weigh the costs associated with data-driven polarimetric imaging against those of conventional approaches, including the establishment of polarimetric imaging datasets, data storage that is typically four times that of traditional datasets, and the imaging systems themselves. Because balancing these costs and benefits is inexact, some uncertainty must be accepted in this process.
A fundamental element of data-driven polarimetric imaging is the availability of comprehensive datasets. Most data-driven polarimetric imaging methods have focused on supervised deep learning. However, depending on the polarimetric imaging method and imaging environment, the corresponding ground truth is more challenging to capture than for an intensity-based network. Additionally, different polarimetric imaging techniques introduce different errors and influence the visual performance of the network; for example, division-of-time systems always suffer from mismatching in dynamic scenes, and division-of-focal-plane systems have an intrinsic mosaicking problem. Existing methods have established their own datasets for specific tasks, but guaranteeing similar performance on other datasets is difficult; moreover, the existing datasets are insufficient to cover all conditions, and their generalization ability is limited. Real data also change over time, increasing the data volume and challenging methods that cannot adapt.
The loss function, which is crucial in guiding the training of the network, is usually similar to that of the intensity-based methods in data-driven polarimetric imaging: general operators simply replace intensity images with polarization parameters. However, intensity and polarization information have entirely different optical properties. The intensity image describes the reflectivity and transmissivity of the object, whereas the polarization image describes texture details, material properties, shape, shading, and roughness. These differences call for distinct loss function designs, yet few loss functions focus specifically on polarization parameters.
The black-box nature of deep learning and its limited acceptance by applied researchers, such as health professionals, are inherent drawbacks. Most researchers in practical applied fields are wary because deep learning theory has not yet provided complete and reasonable answers. In addition, further development and optimization depend only on the task performance, without guidance from theory, which introduces indeterminacy into the research. Moreover, the legal implications of black-box functionality could be another challenge; for example, who would be responsible if the results were incorrect in a pathological diagnosis or target detection? In data-driven polarimetric imaging, the introduction of polarization information may help with the interpretability of deep learning. Dong et al. attempted to use a linear projection of the input PBPs to interpret their significance by learning the weight of each parameter, but many more studies are needed to achieve these goals63.
The combination and utilization of polarization information in deep learning are still in their infancy. In most existing methods, feeding polarization parameter images into the network is the only way polarization information is used, and feature extraction relies on the automatic processing of the network layers. Utilizing polarization information more deliberately thus remains a challenge, despite the vast unexplored room for improvement.
Opportunities
Based on the weaknesses of data-driven polarimetric imaging, solutions have been proposed to address these gaps. Many novel training methods and physical models exist, such as unsupervised or semi-supervised training, transfer learning, and computational imaging; therefore, data-driven polarimetric imaging should be combined with other imaging or training theories to guide the optimization process. In addition, three broad application areas, namely descattering imaging even in high-scattering media, detection of camouflaged and spoofing targets, and enhancement and fusion of information, demonstrate the potential of data-driven polarimetric imaging in future applications.
The opportunities of methods
The assistance of physical models: Synthetic datasets alleviate the scarcity of datasets, and an accurate physical model that simulates the information transmission is crucial for dataset generation. With an additional physical constraint on the CNN, fewer training data are required to achieve a more generalized result than with conventional methods. To obtain a synthetic dataset, parameters covering various imaging conditions are crucial. The development of conventional polarimetric imaging methods is thus instructive for designing network architectures.
Unsupervised or semi-supervised learning216-218: Obtaining the ground truth for a large polarization dataset is challenging; therefore, unsupervised or semi-supervised learning is required to reduce the dependence on ground truth. However, image enhancement and image processing are end-to-end tasks, and existing learning methods without ground truth achieve poor performance. More comprehensive physical models must be established, and more effective loss functions must be designed to guide the pipelines. In addition, intermediate parameters may be generated without ground truth, which is also a feasible way to improve performance.
Transfer learning219, 220: Transfer learning uses the optimized parameters from one dataset as initialization values when training a new network on another dataset, which is a feasible approach for reducing the dependence on datasets in data-driven polarimetric imaging because learned features can be promptly transferred from a trained network to a new network for another task. Fine-tuning is the typical transfer learning technique and is faster and easier than training a network from scratch: because the features extracted by shallow layers are similar across tasks, the shallow layers of a trained network can be copied to a new network for another task to reduce the training time.
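A hypothetical fine-tuning sketch in PyTorch illustrates the idea: freeze the transferred shallow layers and retrain only a new task head. The backbone choice and the output size are placeholders, not taken from any reviewed method.

```python
import torch
import torchvision

# Backbone whose weights would, in practice, come from a network already
# trained on another (e.g., intensity-based) dataset.
model = torchvision.models.resnet18(weights=None)

# Freeze the transferred layers so their shallow features are reused as-is.
for param in model.parameters():
    param.requires_grad = False

# Replace the head for the new polarimetric task (output size is a placeholder).
model.fc = torch.nn.Linear(model.fc.in_features, 10)

# Only the new head is optimized, which is faster than training from scratch.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
```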
Multi-dimensional learning221, 222: Introducing polarization information, which reflects different physical properties, into a traditional intensity network can provide more constraints and information sources to promote network inference. Similarly, the phase, the spectrum, and other physical properties can be embedded into a network to enhance performance. Phase is a representative description of the change in light as it propagates, and the spectrum describes its wavelength characteristics. These properties would fill the gap left by a single domain, which cannot fully represent all physical properties of light.
Federated learning223: To provide a practical training set for deep learning in polarimetric imaging applications, pooling the available datasets from different institutes or corporations may be a possible solution. Several datasets exist for different tasks, as shown in Table 5; however, the datasets collected by different groups are not uniform, and it is difficult to guarantee similar performance across them. Nevertheless, combining different datasets is beneficial because it increases the diversity of the collected samples. In addition, different imaging systems, detectors, environments, and observation directions are challenging to simulate using existing physical functions, so real data covering them can improve the generalization ability of the network and help avoid overfitting. This is also instructive for building and optimizing evaluation criteria that focus on polarization images.
Emergence of metasurface and metalens224-227: The utilization of lenses and metasurfaces allows for tailored control over light with specific polarization states, achieved through deliberate design. This deliberate control enables superior capture, separation, and analysis of polarized light signals, thereby significantly enhancing the sensitivity and accuracy of acquiring polarization datasets. These advancements not only amplify the potential of data-driven polarimetric imaging but also complement the capabilities of deep learning methodologies, promising refined insights and higher precision in polarization imaging applications.
The opportunities of applications
In the future, the optimization of methods will aim for better visual performance in more widely applied fields. For polarimetric imaging, the applications that depend on polarization properties include descattering imaging in high-scattering media, detection of camouflaged and spoofing targets, and enhancement and fusion of information. Deep learning, with its nonlinear representation ability and potential feature extraction, improves the accuracy of estimated parameters and yields feasible transmission functions compared with conventional methods.
For descattering imaging in high-scattering media, such as clouds, haze, smoke, smog, fog, and mist in the air, and soil particles, algae, and mineral salts in underwater scenes, there are substantial opportunities for data-driven polarimetric imaging, and the further development of physical models will amplify this ability. Traditional model functions struggle in complex imaging environments and always rely on simple assumptions to approximate the real parameters, whereas deep learning can model complicated conditions nonlinearly using convolutional layers. Future opportunities lie in deriving more accurate parameters for improved imaging functions generated by deep learning.
The detection of camouflaged and spoofing targets is another opportunity. Target detection is widely applied in polarimetric imaging because polarization information can describe the material of an object, making it possible to reveal camouflaged and spoofing targets of the same color that intensity information cannot distinguish. In the future, more comprehensive extraction of these special features by neural networks may further improve the success rate of target detection.
The material surface, texture, and contrast are the main characteristics described by polarization information for the enhancement and fusion of information. Polarization parameters remain observable in low-light or harsh lighting environments because they are largely independent of intensity. Consequently, fusing polarization images with other images can extend the feature domain of an object. Data-driven polarization fusion enhances performance by extracting more features and providing more information about the imaged objects or scenes than artificially designed fusion coefficients. Moreover, complementary features from various domains are advantageous for other computer vision tasks, such as object detection.
Conclusion
This review provides an overview of recent efforts in data-driven polarimetric imaging based on seven classifications and discusses them comprehensively from three perspectives. Based on the application fields, the classifications consist of polarimetric descattering, 3D shape reconstruction, reflection removal, restoration and enhancement of polarization information, target detection, biomedical imaging and pathological diagnosis, and semantic segmentation. Subsequently, we synthetically analyzed the inputs, datasets, and loss functions, which are crucial in data-driven polarimetric imaging, listing the existing datasets and loss functions with an evaluation of their advantages and disadvantages. In conclusion, deep-learning-based polarimetric imaging introduces polarization information into convolutional neural networks to achieve better performance than traditional intensity imaging, while physical models bring physical interpretability to the CNN. Research on existing data-driven polarimetric imaging can raise the corresponding fields to a higher level and enhance high-level visual tasks.
Acknowledgements
We are grateful for the financial support from the National Natural Science Foundation of China (Nos. 62205259, 62075175, 61975254, 62375212, 62005203 and 62105254), the Open Research Fund of CAS Key Laboratory of Space Precision Measurement Technology (No. B022420004), and the Fundamental Research Funds for the Central Universities (No. ZYTS23125).
K Yang, F Liu and SY Liang contributed equally to this work and drafted the manuscript. M Xiang contributed to the part on restoration and enhancement of accurate polarization information. PL Han contributed to the part on polarimetric descattering and three-dimensional shape reconstruction. JP Liu contributed to the part on reflection removal. X Dong contributed to the part on target detection and semantic segmentation. Y Wei contributed to the part on biomedical imaging and pathological diagnosis. BJ Wang, K Shimizu, XP Shao provided resource support and supervised the project. All authors read, corrected and approved the manuscript.
The authors declare no competing financial interests.
References
[1] V Ronchi, V Barocas. The Nature of Light: An Historical Survey(1970).
[10] X Li, F Liu, PL Han et al. Near-infrared monocular 3D computational polarization imaging of surfaces exhibiting nonuniform reflectance. Opt Express, 29, 15616-15630(2021).
[20] BM Ratliff, DA Lemaster, RT Mack et al. Detection and tracking of RC model aircraft in LWIR microgrid polarimeter data. Proc SPIE, 8160, 816002(2011).
[26] DL Le, TN Huynh, DT Nguyen et al. Characterization of healthy and nonmelanoma-induced mouse utilizing the Stokes-Mueller decomposition. J Biomed Opt, 23, 125003(2018).
[55] RC Xie, HY Zu, Y Xue et al. Target detection method for polarization imaging based on convolutional neural network. Proc SPIE, 11455, 114557Z(2020).
[66] DK Li, B Lin, XY Wang et al. High-performance polarization remote sensing with the modified U-Net based deep-learning network. IEEE Trans Geosci Remote Sens, 60, 5621110(2022).
[91] HJ Ju, LY Ren, J Liang et al. A Mueller matrix measurement technique based on a division-of-aperture polarimetric camera. Proc SPIE, 10839, 108391F(2019).
[94] T York, V Gruev. Calibration method for division of focal plane polarimeters in the optical and near-infrared regime. Proc SPIE, 8012, 80120H(2011).
[102] HH He, N Zeng, E Du et al. A possible quantitative Mueller matrix transformation technique for anisotropic scattering media. Photonics Lasers Med, 2, 129-137(2013).
[130] Y Xu, J Wen, LK Fei, Z Zhang. Review of video and image defogging algorithms and related studies on image restoration and enhancement. IEEE Access, 4, 165-188(2015).
[132] K Nguyen, P Nguyen, DC Bui et al. Analysis of the influence of de-hazing methods on vehicle detection in aerial images. Int J Adv Comput Sci Appl, 13, 846-856(2022).
[135] YF Song, D Nakath, MK She et al. Optical imaging and image restoration techniques for deep ocean mapping: a comprehensive survey. PFG J Photogramm Remote Sens Geoinf Sci, 90, 243-267(2022).
[142] HF Hu, L Zhao, XB Li et al. Underwater image recovery under the nonuniform optical field based on polarimetric imaging. IEEE Photonics J, 10, 6900309(2018).
[152] PLJ Drews, ER Nascimento, SSC Botelho et al. Underwater depth estimation and image restoration based on single images. IEEE Comput Graph Appl, 36, 24-35(2016).
[198] Y Shen, WF Lin, ZF Wang et al. Rapid detection of camouflaged artificial target based on polarization imaging and deep learning. IEEE Photonics J, 13, 7800309(2021).