1Precision Optical Manufacturing and Testing Center, Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, Shanghai 201800, China
2Key Laboratory for High Power Laser Material of Chinese Academy of Sciences, Shanghai Institute of Optics and Fine Mechanics, Shanghai 201800, China
3Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
4China-Russia Belt and Road Joint Laboratory on Laser Science, Shanghai 201800, China
Abstract
Traditional hyperspectral imaging (HI) systems are constrained by a limited depth of field (DoF), necessitating refocusing for any out-of-focus objects. This requirement not only slows down imaging but also complicates the system architecture. It is challenging to balance speed, resolution, and DoF within an ultra-simple system. While some studies have reported advances in extending the DoF, the improvements remain insufficient. To address this challenge, we propose a novel, to our knowledge, differentiable framework that integrates an extended DoF (E-DoF) wave propagation model and an achromatic hyperspectral reconstructor powered by deep learning. Through experimental validation, we demonstrate that the compact HI system can capture high-fidelity snapshot images with a DoF reaching approximately 5 m, an improvement of over three orders of magnitude. Additionally, the system achieves over 90% spectral accuracy without aberration, nearly doubling the accuracy of existing methods. An asymmetric freeform surface design is introduced for diffractive optical elements, providing both greater design freedom and E-DoF. Sparse priors on the spatial texture and spectral features of hyperspectral data cubes are integrated into the reconstruction network, effectively mitigating texture blurring and chromatic aberration. We foresee that this strategy for achromatic E-DoF can be adopted in other optical systems, such as polarization imaging and depth measurement.
1. INTRODUCTION
Hyperspectral imaging (HI) enables us to obtain both spatial and spectral information of a target object, offering numerous application prospects in agriculture [1–3], remote sensing [4–6], biomedical imaging [7–9], environmental monitoring [10–12], military reconnaissance [13–15], and other fields [16–19]. Traditional HI systems are based on physical spectroscopic elements, such as prisms, gratings, and filters, which need to be scanned along the spectrum or space to obtain hyperspectral images, leading to large volume, high complexity, and slow speed [20,21]. Building upon the compressive sensing theory [22,23], a number of scan-less HI systems have been successively proposed [24–29] that can reconstruct a complete spectral image snapshot without the need for scanning, but with a coded aperture.
Nevertheless, a common yet critical issue that significantly affects the speed and fidelity of HI is the depth of field (DoF) of the lens [30,31]. A limited DoF requires refocusing for any out-of-focus objects. It also blurs the image, resulting in loss of detail and reduced data quality. In compressive HI, this issue is particularly critical because it directly affects the spectral subdivision accuracy, which, in turn, affects the imaging resolution. The theoretical DoF is directly tied to the input wavelength $\lambda$ and to the numerical aperture (NA) of the lens, which is set by the focal length $f$ and the radius $R$ of the lens: the DoF grows with $\lambda$ and shrinks as the NA increases. In traditional lens design, achieving a larger DoF therefore requires minimizing the NA, which lowers the resolution. Traditional lenses thus struggle to strike a balance between DoF and spatial resolution. To address this challenge, extensive studies have been conducted to extend the DoF. Some studies [32–41] proposed wavefront coding and related approaches to shape the light field of an optical system and extend its DoF. However, most of these approaches require additional optics in the imaging system, thereby increasing its complexity. Owing to recent advances in diffractive optical elements (DOEs) [42–52], there has been growing interest in incorporating coding capabilities into a single DOE as a new form of scene codification that allows smaller, portable setups. Building on this, several compact extended-DoF (E-DoF) system design methods based on computational enhancement have been proposed [53–60]. For these methods, the improvement in DoF over the diffraction limit is given by the ratio of the achieved DoF to the theoretical, diffraction-limited DoF.
In the vast majority of cases, this enhancement factor is relatively small, less than one order of magnitude. While Banerji et al. [59] achieved an enhancement of three orders of magnitude, their system is limited to monochromatic conditions, because it is difficult to balance high imaging resolution, high spectral resolution, and a large DoF in HI.
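As a rough numerical illustration of the diffraction-limited DoF and the enhancement factor discussed above, the sketch below uses the textbook approximation DoF ≈ λ/NA² with NA = sin(arctan(R/f)). This form is an assumption and may differ from the exact expression used by the authors. The 3 mm diameter, the 540 nm central wavelength, and the 0.5–5.2 m range are quoted elsewhere in this paper; treating the 50 mm DOE-to-sensor spacing as the focal length is an assumption made only for this example, and all function names are hypothetical.

```python
import math

def numerical_aperture(radius_m, focal_length_m):
    """NA of a lens in air from its radius R and focal length f."""
    return math.sin(math.atan(radius_m / focal_length_m))

def diffraction_limited_dof(wavelength_m, radius_m, focal_length_m):
    """Textbook estimate DoF ~ lambda / NA^2 (assumed form, see lead-in)."""
    na = numerical_aperture(radius_m, focal_length_m)
    return wavelength_m / na ** 2

wavelength = 540e-9      # central wavelength quoted in the paper
radius = 1.5e-3          # half of the 3 mm DOE diameter
focal_length = 50e-3     # assumption: DOE-to-sensor spacing used as f

dof_limit = diffraction_limited_dof(wavelength, radius, focal_length)
achieved_dof = 5.2 - 0.5                 # reported working range in metres
enhancement = achieved_dof / dof_limit   # ratio of achieved to theoretical DoF

print(f"diffraction-limited DoF: {dof_limit * 1e3:.2f} mm")
print(f"enhancement factor: {enhancement:.0f}")
```

Under these assumptions the diffraction limit comes out at roughly 0.6 mm, so a 4.7 m working range corresponds to an enhancement of several thousand, in line with the three-orders-of-magnitude figure quoted in the text.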
Although a few DOE-based E-DoF works in the spectral domain [61–66] have been reported, the DoF enhancement in these systems is limited and the image quality is inferior. The reasons lie in both hardware and software. In terms of hardware, the traditional DOE in single-lens HI is designed mainly with a symmetric microstructure to reduce design difficulty and optimization complexity. However, a symmetric structure limits the design freedom and the light-beam modulation, which negatively affects the reconstruction quality. In terms of the reconstruction algorithm, U-net networks have shown satisfactory performance and are widely used in HI. The loss function is the core of the network and decisively affects the reconstruction quality. Existing loss functions focus mainly on the similarity between the reconstruction result and the ground truth. Some studies have considered a sparse prior on spatial texture features [expressed mathematically as total variation (TV)], which performed favorably in terms of imaging resolution. Nevertheless, they neglected the sparse prior on spectral features, which can result in poor spectral resolution for some complex targets.
In this paper, we propose a compact achromatic extreme-DoF (AED) HI system based on a freeform-DOE, as shown in Fig. 1. An end-to-end joint design strategy significantly improves the compatibility between the freeform-DOE and the subsequent recovery algorithm, which results in high-quality recovered hyperspectral images. We introduce an asymmetric freeform design model for the DOE that offers both high design freedom and strong light-field modulation. The point spread function (PSF) modulated by our freeform-DOE differs strongly between spectral bands while remaining highly similar across object distances over a large DoF range, which provides the basis for AED imaging. For high-performance HI reconstruction, sparse priors on the spatial texture and spectral features of hyperspectral data cubes are introduced into our reconstruction network, named the spectral-spatial sparse algorithm (SSA). In the proposed SSA, the sparse priors act as TV regularization terms in the optimization, to eliminate texture blurring and chromatic aberration (see Dataset 1, Ref. [67]).
Figure 1. Proposed E-DoF HI system. The AED hyperspectral images are recovered from the blurred image captured by the camera through subsequent processing by a deep neural network with SSA.
We demonstrate a dual-innovation approach that integrates hardware (the freeform-DOE) with a reconstruction algorithm (SSA) to reconstruct hyperspectral images across a vast imaging volume. The proposed system can capture a snapshot image with an extreme DoF of approximately 5 m (0.5–5.2 m), far beyond the theoretical DoF of an optical element with the same diameter at the central wavelength of 540 nm. This improvement is larger than that of any previously reported system. In addition, our system performs well in other metrics, with a spectral resolution of 10 nm, a PSNR of 34.85 dB, an SSIM of 0.976, and a spectral accuracy (SpA) of up to 92.21%. With such a large DoF, it becomes feasible to eliminate focusing mechanisms from cameras and rapidly acquire hyperspectral images across an extremely large range of object distances.
2. DESIGN METHODS
A. Schematic of HI System with E-DoF
Our AED HI system [Fig. 2(a)] comprises a single flat freeform-DOE that modulates and encodes the object light field. The freeform-DOE, the key hardware of our system, is made by photolithographically etching the asymmetric structure onto a glass substrate, as described in Section 2.D. The PSF, determined by the height map of the freeform-DOE, encodes the object light field, and the robustness and accuracy of this encoding directly affect the quality of the hyperspectral reconstruction. The PSF of the system is consistent across the entire working distance range, making it advantageous for hyperspectral compressive encoding and reconstruction across a large imaging volume. Figure 2(b) shows a schematic of the end-to-end joint design process. During the forward pass, the object image (considered as the ground truth) is convolved with the PSFs of the freeform-DOE in each spectral channel at different working distances. Within one spectral channel, the corresponding PSF provides the spatial encoding of the object. Across spectral channels, the differing PSFs sparsely encode the spectrum. The convolved images are captured by a complementary metal oxide semiconductor (CMOS) sensor and used as the input to the recovery network. To achieve high-fidelity recovery of spectral and texture information, we propose SSA [as shown in Fig. 2(c)], which is detailed in Section 2.C. In the backward pass, the calculated error is back-propagated to the height map of the freeform-DOE and the parameters of the recovery network, guiding their refinement until a well-performing system is achieved. This end-to-end joint optimization matches the hardware (freeform-DOE) to the software (recovery network), resulting in high-quality recovered hyperspectral images.
Figure 2. Overview of AED HI system. (a) Schematic of the proposed system. The system allows for high-fidelity imaging across a broad range of object distances. (b) Learning pipeline for the freeform-DOE. In the forward pass, the images formed by convolving the object with the PSFs modulated by the freeform-DOE are captured by a camera and reconstructed by the network. In the backward pass, the loss function of the recovery network guides the optimization of the freeform-DOE until a satisfactory image is obtained. (c) Basic framework of SSA. The image is segmented into nine distinct subregions within the 2D plane for processing, each represented by blocks of a unique color. The SSA method is then used to limit the difference between neighboring pixels in 2D space and in the spectral dimension to obtain an optimal output hyperspectral image. (d) Results of hyperspectral reconstruction (illustrated by RGB false color) and its images at different spectral channels.
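The forward pass described above (per-band convolution with a PSF, followed by integration against the sensor response) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the random test cube, the delta-function PSFs, and the random three-channel `response` matrix are all hypothetical stand-ins.

```python
import numpy as np

def fft_convolve2d(image, kernel):
    """Linear 2D convolution via zero-padded FFTs, cropped back to the image size."""
    h, w = image.shape
    kh, kw = kernel.shape
    full = (h + kh - 1, w + kw - 1)
    spectrum = np.fft.rfft2(image, s=full) * np.fft.rfft2(kernel, s=full)
    out = np.fft.irfft2(spectrum, s=full)
    top, left = (kh - 1) // 2, (kw - 1) // 2
    return out[top:top + h, left:left + w]

def simulate_capture(cube, psfs, response):
    """Blur each spectral band with its PSF, then mix bands into sensor channels.

    cube:     (bands, H, W) ground-truth hyperspectral image
    psfs:     (bands, k, k) one PSF per band for the sampled object distance
    response: (channels, bands) hypothetical sensor response curves
    """
    blurred = np.stack([fft_convolve2d(b, p) for b, p in zip(cube, psfs)])
    # Discrete version of integrating field times response over wavelength.
    return np.tensordot(response, blurred, axes=([1], [0]))

# Toy usage: random data and delta-function PSFs (which introduce no blur).
rng = np.random.default_rng(0)
cube = rng.random((25, 16, 16))
psfs = np.zeros((25, 5, 5))
psfs[:, 2, 2] = 1.0
response = rng.random((3, 25))
coded = simulate_capture(cube, psfs, response)
print(coded.shape)  # (3, 16, 16)
```

With delta-function PSFs the coded image reduces to a plain spectral mix of the cube, which is a convenient sanity check before substituting PSFs computed from a DOE height map.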
Combining these two innovations, we design and fabricate a freeform-DOE with a 3 mm diameter capable of encoding over the extended DoF [Fig. 2(a)]. We illustrate the PSFs of the optimized system for different object distances [Fig. 3(a)] and spectral channels [Fig. 3(b)]. The features of these PSFs remain largely consistent within the working range of 0.5–5.0 m, indicating shift invariance in the 3D imaging space. This shift invariance reduces the complexity of the reconstruction algorithm and eases the task of extending the DoF. At a given distance, large differences between the PSFs of different spectral bands give higher measurement randomness, which better satisfies the restricted isometry property (RIP) criterion of compressive sensing theory and provides the necessary conditions for high spectral performance in reconstruction. In the range of 520–660 nm, the shapes of the zoomed-in PSFs are similar, but the extent of the PSF gradually decreases from 75 to 12 μm, i.e., from about 25 to 4 pixels at a pixel size of 3 μm.
Figure 3. PSF characteristics of proposed system. (a) PSFs of the system at different object distances. (b) Zoomed-in PSFs at different spectral channels.
To achieve a balance between extreme DoF and high-quality HI, our objective is to achieve consistency in the corresponding PSFs at different bands and spatial positions after they have been encoded by a freeform-DOE. Mathematically, this encoding is achieved through differential modulation of the amplitude and phase of light at different wavelengths by the freeform-DOE. More specifically, the compression and encoding of light by the freeform-DOE are determined by its PSFs, which in turn depend on the surface profile features of the freeform-DOE, known as the height map. Therefore, the desired PSFs can be obtained by optimizing and designing the height map of the freeform-DOE.
The phase modulation of the freeform-DOE for different wavelengths of light can be expressed as
$$\phi(x, y, \lambda) = \frac{2\pi}{\lambda}\,\Delta n(\lambda)\,h(x, y),$$
where $\Delta n(\lambda)$ is the difference between the refractive indices of air and the freeform-DOE at wavelength $\lambda$, and $h(x, y)$ is the height map of the freeform-DOE, which is expressed by
$$h(x, y) = \sum_{j} a_j Z_j(x, y),$$
where $Z_j$ is the $j$th Zernike polynomial and $a_j$ is its coefficient. The number of Zernike terms and the values of the coefficients together determine the profile of the freeform-DOE, which in turn affects the imaging results of the entire system. Based on the physical significance of the Zernike polynomials and a large number of simulations, we use a fixed set of polynomial terms to express the surface profile of the freeform-DOE. Further, we optimize the coefficients $a_j$ to obtain an ideal distribution of the PSFs.
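To make the parameterization concrete, the sketch below builds a height map as a weighted sum of a few low-order Zernike polynomials and converts it to a phase delay with the standard thin-element relation, phase = (2π/λ) · Δn · height. Everything specific here is an assumption for illustration: the six unnormalized polynomials, the coefficient values, the constant index difference of 0.5 (dispersion ignored), and the function names. The number of terms actually used by the authors is not given in this section.

```python
import numpy as np

def zernike_basis(n_pix):
    """Six low-order, unnormalized Zernike polynomials on a unit-disk grid."""
    coords = np.linspace(-1.0, 1.0, n_pix)
    x, y = np.meshgrid(coords, coords)
    r2 = x ** 2 + y ** 2
    inside = r2 <= 1.0
    terms = np.stack([
        np.ones_like(x),    # piston
        x,                  # tilt along x
        y,                  # tilt along y
        2.0 * r2 - 1.0,     # defocus
        x ** 2 - y ** 2,    # astigmatism (cos 2-theta)
        2.0 * x * y,        # astigmatism (sin 2-theta)
    ])
    return terms * inside, inside

def height_map(coeffs, basis):
    """Weighted sum of basis polynomials: h = sum_j a_j * Z_j."""
    return np.tensordot(coeffs, basis, axes=1)

def phase_modulation(height, wavelength, delta_n):
    """Thin-element phase delay for a given wavelength and index difference."""
    return 2.0 * np.pi / wavelength * delta_n * height

basis, inside = zernike_basis(65)
# Arbitrary illustrative coefficients in metres; odd terms make the surface asymmetric.
coeffs = np.array([0.5e-6, 0.0, 0.1e-6, 0.2e-6, 0.0, 0.1e-6])
h = height_map(coeffs, basis)
phi = phase_modulation(h, wavelength=540e-9, delta_n=0.5)
print(h.shape, float(phi.max()))
```

In an end-to-end design loop the coefficient vector would be the trainable quantity, which keeps the number of optical parameters small compared with optimizing every pixel of the height map.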
According to scalar diffraction theory, after passing through the freeform-DOE, the light field on the front surface of the CMOS is obtained by propagating the modulated field over the distance between the DOE and the CMOS. The modulated field is the incident field multiplied by the amplitude modulation of the freeform-DOE for each wavelength, its phase modulation, and the aperture limit of the freeform-DOE. By definition, the light field produced on the front surface of the CMOS by a single object point is the PSF of the proposed system for that point. Thus, the whole object field at the surface of the CMOS for one wavelength is the linear convolution of the PSF and the entire object field. Ultimately, the camera captures a coded RGB image by integrating the product of this field and the camera response function across the spectrum, where the response function covers the 420–660 nm wavelength range. The captured image therefore depends on the response function and the field. Since the response function is completely determined by the parameters of the CMOS, the field is the only variable for the captured image. From Eq. (6), we find that the PSF of the system directly affects the captured image when the input object field is known. Furthermore, from the derivation of the PSF, it can be seen that the PSF depends on the field incident on the DOE, which is closely related to the object distance. In practice, the object can be discretized into a practically unlimited number of points across space and spectrum. To simplify the computation and reduce memory use, we define a pool of object distances instead of using a constant value. This pool contains several object distance values within the desired E-DoF range, which are randomly selected to optimize the freeform-DOE, as shown in Fig. 2(a). During the end-to-end joint optimization, the pool spans 0.5 to 5 m in increments of 0.5 m. We assume that each value in the pool is used with the same probability when the number of computations is sufficiently large.
Extensive experiments show that 150 epochs suffice for the PSF at each distance in the pool to be optimized with equal probability; thus the number of training epochs is set to 150. In each training iteration, a distance is randomly selected from the pool as the object distance for each image. If the recovery network then produces good-quality, stable output, we can assume that every distance in the pool corresponds to the same PSF, i.e., different object distances yield the same PSF. This indicates that the imaging system can produce high-quality images of all objects across a large imaging volume.
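The propagation and distance-sampling steps above can be illustrated with a simplified model. The sketch below is an assumption-laden stand-in for the paper's propagation equation, which is not reproduced in this section: it uses a paraxial, single-FFT Fresnel model for an on-axis point source, ignores amplitude modulation, and draws the object distance uniformly from a pool of 0.5 to 5 m in 0.5 m steps. The function names and the grid parameters are hypothetical.

```python
import numpy as np

def psf_from_phase(doe_phase, aperture, wavelength, obj_dist, sensor_dist, pitch):
    """Normalized intensity PSF on the sensor for an on-axis point source.

    doe_phase, aperture: (n, n) arrays sampled at `pitch` metres in the DOE plane.
    """
    n = doe_phase.shape[0]
    coords = (np.arange(n) - n // 2) * pitch
    x, y = np.meshgrid(coords, coords)
    k = 2.0 * np.pi / wavelength
    r2 = x ** 2 + y ** 2
    # Paraxial spherical wave from a point at obj_dist, combined with the
    # quadratic kernel of Fresnel propagation over sensor_dist.
    quad = np.exp(1j * k * r2 * (1.0 / (2.0 * obj_dist) + 1.0 / (2.0 * sensor_dist)))
    field = aperture * np.exp(1j * doe_phase) * quad
    sensor_field = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(field)))
    psf = np.abs(sensor_field) ** 2
    return psf / psf.sum()

# Pool of object distances: 0.5, 1.0, ..., 5.0 m.
DISTANCE_POOL = np.linspace(0.5, 5.0, 10)

def sample_distance(rng):
    """Draw one object distance uniformly from the pool for a training image."""
    return float(rng.choice(DISTANCE_POOL))

# Toy usage: a flat (zero-phase) element inside a 3 mm circular aperture.
n, pitch = 128, 6e-3 / 128
c = (np.arange(n) - n // 2) * pitch
xx, yy = np.meshgrid(c, c)
aperture = ((xx ** 2 + yy ** 2) <= (1.5e-3) ** 2).astype(float)
rng = np.random.default_rng(0)
d = sample_distance(rng)
psf = psf_from_phase(np.zeros((n, n)), aperture, 540e-9, d, 50e-3, pitch)
print(d, psf.shape)
```

In a training loop, one distance would be drawn per image and the resulting PSFs for all bands would feed the convolutional forward model. Note that the quadratic phase term varies rapidly near the aperture edge, so a real simulation would need a sampling pitch fine enough to avoid aliasing; the coarse grid in this toy usage only demonstrates the call pattern.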
C. SSA for Achromatic Reconstruction
We employ the widely used U-net [68] neural network as the recovery network, which acts as the decoder in our proposed system. The captured 2D encoded images serve as inputs to the recovery network, which incorporates skip connections and three scales connected by 2D max-pooling and 2D upsampling operations. Each scale comprises two consecutive 2D convolutions with a spatial filter, zero padding, and rectified linear unit (ReLU) activation.
For a hyperspectral recovery network, it is crucial to consider both spatial and spectral resolution. In addition, reconstruction speed and memory consumption are criteria for evaluating the quality of a network. Thus, we propose a high-speed SSA based on the U-net framework, as shown in Fig. 2(c). In the proposed SSA, the loss function is defined as a weighted combination of three terms. The first is a data-fidelity term: the difference between the output (the reconstructed image) and the input (considered as the ground truth during training) in each spectral band and in 2D space, which maintains data consistency. The other two are regularization terms that improve the quality and detail fidelity of the reconstructed images: one is built from the horizontal and vertical differences of the reconstructed image at each pixel in the spatial dimension, and the other from the difference of the reconstructed image at each pixel in the spectral dimension. Each of the three terms has its own weight. The TV operation on the spatial–spectral dimensions encodes the sparse priors on spatial texture and spectral features to eliminate texture blurring and chromatic aberration. The SSA method is then used to optimize the hyperspectral reconstruction model and determine the optimal result as the output hyperspectral image. Figure 2(d) shows a blurred image captured by the camera in our system and the recovered hyperspectral image reconstructed by SSA, illustrated in false RGB colors. The results in different spectral channels show that each channel is recovered with high spatial and spectral resolution by the proposed system, which covers the range from 420 to 660 nm in steps of 10 nm.
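A loss of this kind, with a data-fidelity term plus total-variation penalties along the two spatial axes and the spectral axis, can be sketched as follows. This is a minimal NumPy illustration, not the authors' loss: the use of mean absolute differences for every term, the default weight values, and the function name `ssa_style_loss` are all assumptions.

```python
import numpy as np

def ssa_style_loss(recon, target, w_data=1.0, w_spatial=0.01, w_spectral=0.01):
    """Weighted sum of data fidelity, spatial TV, and spectral TV.

    recon, target: arrays of shape (bands, H, W).
    """
    # Data consistency between the reconstruction and the ground truth.
    data = np.mean(np.abs(recon - target))
    # Spatial TV: vertical (axis 1) and horizontal (axis 2) neighbor differences.
    tv_spatial = (np.mean(np.abs(np.diff(recon, axis=1)))
                  + np.mean(np.abs(np.diff(recon, axis=2))))
    # Spectral TV: differences between adjacent bands (axis 0).
    tv_spectral = np.mean(np.abs(np.diff(recon, axis=0)))
    return w_data * data + w_spatial * tv_spatial + w_spectral * tv_spectral

# Toy usage: a reconstruction that matches the target except for added noise.
rng = np.random.default_rng(0)
target = rng.random((25, 32, 32))
noisy = target + 0.05 * rng.standard_normal(target.shape)
print(ssa_style_loss(noisy, target) > ssa_style_loss(target, target))  # True
```

For training, the same three terms would be written with differentiable tensor operations in the deep learning framework of choice; the spectral term is what penalizes band-to-band jumps that would otherwise appear as color fringing.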
D. Fabrication
The fabricated freeform-DOE has a diameter of 3 mm and is etched on a glass substrate with a thickness of 1 mm and an edge length of 25.4 mm, using a positive photoresist layer with a thickness on the micrometer scale. The photoresist is baked at 100°C for 5 min. The grayscale pattern of the designed freeform-DOE is transferred to the photoresist layer by grayscale laser lithography at a wavelength of 405 nm. Finally, the freeform-DOE with the designed asymmetric height map is obtained after developing the exposed photoresist layer through rotation, spraying, soaking, and development.
3. RESULTS AND DISCUSSION
A. Achromatic Imaging
We first assess the final imaging performance of three different end-to-end designed HI systems based on a single DOE: (1) a conventional system without E-DoF, (2) a system introducing E-DoF, and (3) the proposed AED HI system. Simulations of these systems are performed on a computer equipped with an NVIDIA RTX 3090 with 24 GB of memory. The ARAD hyperspectral dataset [69] is used. It consists of 454 hyperspectral images with 31 spectral bands ranging from 400 to 700 nm in 10 nm steps. The dataset is divided into 404 images for training and 50 images for testing. In our evaluation, we select the 25 spectral bands from 420 to 660 nm to match the spectral response of our system. During training, the hyperspectral images are read sequentially, one at a time.
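The band selection and split sizes described above are easy to check numerically. The short sketch below assumes a cube stored with the spectral axis first; the array layout, the placeholder cube size, and the helper name `select_bands` are illustrative assumptions.

```python
import numpy as np

# 31 dataset bands: 400, 410, ..., 700 nm.
wavelengths = np.arange(400, 701, 10)

# Keep only the bands inside the system's 420-660 nm working range.
keep = (wavelengths >= 420) & (wavelengths <= 660)
selected = wavelengths[keep]

def select_bands(cube):
    """Return the bands of a (31, H, W) cube that fall within 420-660 nm."""
    return cube[keep]

n_train, n_test = 404, 50

cube = np.zeros((31, 8, 8))   # placeholder cube with a small spatial size
print(len(wavelengths), len(selected), select_bands(cube).shape, n_train + n_test)
```

This confirms that a 10 nm grid from 420 to 660 nm yields 25 bands and that the training and test sets together account for all 454 images.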
Figure 4 compares the simulation results of the three HI systems for the same object at different object distances, together with the corresponding PSNR, SSIM, and SpA. The recovered images are displayed in a false-color RGB representation for more intuitive visualization. The results show that the conventional system without E-DoF performs unsatisfactorily when the object is out of focus, as shown in Fig. 4(a). As the object distance increases, PSNR, SSIM, and SpA degrade noticeably, which appears as blurring or even loss of detail. This is because the parameters and performance of the trained recovery network are directly tied to the training images. Insufficient DoF leads to defocus and blur, which changes the imaging model under which the training images were acquired (a defocus degradation model would need to be added). The result is a mismatch of neural network parameters that reduces both imaging quality and spectral accuracy. In addition, when the object moves out of the DoF, the system must be refocused or moved to capture it, which is time-consuming. It follows that a limited DoF seriously affects the dynamic performance of the system, including imaging quality, spectral accuracy, and imaging speed. Therefore, we introduce E-DoF for the freeform-DOE in the end-to-end design. As shown in Fig. 4(b), this system performs similarly for objects at different distances: the PSNR and SSIM of the reconstructed images are almost unchanged (the PSNR remains around 26 dB), which shows that the proposed E-DoF design method works well. However, the reconstructed images suffer from severe blurring and chromatic aberration, so their SpA is quite low throughout the DoF range.
There are three reasons for this problem: (1) the conventional algorithm focuses on data similarity but neglects the sparse prior on spatial texture features, resulting in poor technical metrics; (2) it also does not consider the sparse prior on spectral features, so the reconstructed spectrum has poor SpA; (3) for such high-dimensional optimization problems, the end-to-end joint learning gets trapped in a local optimum that is unbalanced between the objectives of extending the DoF and reconstructing the hyperspectral image.
Figure 4. Comparison of simulation results of the three different HI systems considered in the study for the same object at different object distances. The images are illustrated by RGB false color (Dataset 1, Ref. [67]).
Accordingly, we introduce the sparse priors on the spatial texture and spectral features of hyperspectral images into our reconstruction network and perform another simulation using the proposed E-DoF freeform-DOE and SSA. Notably, our system exhibits superior quantitative metrics compared to the conventional system (designed with an optimal object distance of 1 m), a significant improvement attributed to the integration of the SSA. For the proposed system, the PSNR, SSIM, and SpA remain relatively high (30.387 dB, 0.961, and 87.050%) even away from the best-performing object distance, where they reach 35.148 dB, 0.981, and 97.324%. Moreover, as the object distance varies, the PSNR varies by less than 5 dB, the SSIM by less than 0.02, and the SpA by less than 10%. Although the SpA of the proposed system varies with the object distance, we consider this variation acceptable. Furthermore, compared with the “Introduce E-DoF” system, only the reconstruction algorithm is changed to SSA, and the reconstructed hyperspectral images are no longer blurred or affected by chromatic aberration. In summary, the proposed AED system demonstrates excellent performance in the simulations, yielding high-quality recovered images that agree closely with the ground truth.
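For readers who want to reproduce the kind of comparison reported here, the PSNR of a reconstructed cube against its ground truth can be computed with the standard definition, 10·log10(peak² / MSE). The sketch below assumes data scaled to a peak value of 1 and uses a hypothetical function name. SSIM and SpA are not sketched because their exact definitions are not given in this section.

```python
import numpy as np

def psnr(recon, target, peak=1.0):
    """Peak signal-to-noise ratio in dB over the whole hyperspectral cube."""
    mse = np.mean((np.asarray(recon) - np.asarray(target)) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy usage: a constant error of 0.1 on unit-peak data.
target = np.zeros((25, 8, 8))
recon = np.full((25, 8, 8), 0.1)
print(round(psnr(recon, target), 2))  # 20.0
```

Evaluating this metric separately at each object distance gives the spread across the working range that the text summarizes as a variation of less than 5 dB.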
B. Extended DoF
Based on the proposed hyperspectral design method, a freeform-DOE is fabricated using grayscale lithography (more details in Section 2.D). We use an industrial camera from FLIR (Blackfly S) to acquire the coded images. The freeform-DOE is placed 50 mm in front of the sensor. Before starting our experiments, we calibrate the installation and alignment errors that affect the performance of the proposed method. The images from the dataset are first displayed on a screen, and the blurred coded images are obtained by photographing the displayed images with our proposed system. Using the hyperspectral images from the dataset as the ground truth (GT), we retrain the recovery network with a batch size of one for 100 epochs. Finally, because the proposed freeform-DOE provides a reliable and precise PSF, yielding a compressive encoding matched to the reconstruction algorithm, a test with the available data is carried out to recover high-fidelity spectral information with 25 spectral bands from 420 to 660 nm for working distances from 0.5 to 5.2 m. To the best of our knowledge, this is the first system able to recover this number of bands in the visible spectrum over such a large DoF.
To assess the extended DoF capabilities of the proposed system, multiple objects are positioned simultaneously at varying distances. For recovery, we use the same network architecture as the decoder, which successfully restores 25 spectral bands. These recovery networks are trained using the hyperspectral datasets constructed as described in Section 2. Figure 5 shows the experimental results obtained using the proposed hyperspectral system. To ensure that all the light entering the camera is modulated by the DOE rather than passing through the surrounding glass, we place an aluminum sheet with a small hole of the same diameter as the DOE in front of the camera. Owing to field-of-view limitations, accommodating numerous objects in both the near and far regions at once is impractical. Therefore, two groups of objects are considered, with the experimental scenes depicted in Fig. 5(a): (1) simultaneous imaging at 1.2 m (digital blocks), 2.2 m (3D portrait sculpture), 3.2 m (cartoon dog), and 5.2 m (house); (2) simultaneous imaging at 0.5 m (avocado doll) and 0.7 m (tiger doll). Notably, the digital blocks at 1.2 m are reconstructed very well, and the details of the numbers (white dots) are completely recovered, as shown in Fig. 5(b). Additionally, the hair, face, and lighting details of the 3D portrait sculpture positioned 2.2 m away are properly restored. Furthermore, the intricate features of the object situated 5.2 m from the camera, such as the distribution of clouds and windows, are also preserved. We then conduct another experiment with a tiger doll positioned at 0.7 m and an avocado doll positioned at 0.5 m to verify the extended DoF at close distances, as shown in Fig. 5(c). Because the avocado doll is small, it is placed upside down in a black box.
From the provided views, we can observe that the colors of both dolls are accurately restored. Although almost absent in the captured image, the ears of the tiger doll and the letters on the box are recovered in the reconstructed image. Moreover, the expression and outline of the avocado doll are remarkably clear. To selectively illuminate the desired objects, we switch off the indoor lighting and direct a flashlight onto the target objects. However, the flashlight used as a supplementary light source has a blue hue, which causes the recovered images to appear overly blue. The results demonstrate that our proposed system can deliver high-performance extreme-DoF images of multiple objects positioned at different distances.
Figure 5. Experimental results for extended DoF. (a) The experimental scenario diagram. (b) Objects at 1.2–5.2 m from the camera are all in focus. (c) Experimental results for extended DoF at a close distance. Objects at 0.5–0.7 m from the camera are also in focus (Dataset 1, Ref. [67]).
Figure 6. (a) Visual comparison between proposed system and ground truth (GT). Recovered hyperspectral image and GT are both illustrated by RGB false color. (b) Intensity and accuracy of chosen dots at different wavelengths.
Furthermore, to quantify the advance made by the proposed system, we compare the spectral resolution, PSNR, system complexity, and DoF of the state-of-the-art methods for E-DoF in the spectral domain, as shown in Table 2. By combining innovations in hardware and in the reconstruction algorithm, we realize, for the first time, an achromatic high-fidelity freeform-DOE HI system with an extreme DoF of approximately 5 m, a spectral resolution of 10 nm, and a PSNR of up to 34.85 dB.
Table 2. Quantitative Comparison of the State-of-the-Art Methods for E-DoF in Spectral Domain

| Application Scenario | Reference | Dispersion | Spectral Resolution (nm) | PSNR (dB) | System Complexity | DoF (m) |
| --- | --- | --- | --- | --- | --- | --- |
| Depth detection | Zhang et al. [63] | Metalens | 40 | N/A | Simple | 0.16 |
| RGB imaging | Fontbonne et al. [65] | Phase mask | N/A | N/A | Complex | 0.5 (0.4–0.9) |
| HI | Baek et al. [51] | DOE | 10 | 29.31 | Ultra-simple | 1.6 (0.4–2.0) |
| HI | Kou et al. [66] | Spectral camera | 1 | 30.56 | Ultra-complex | 1.5 |
| HI | Sahin et al. [64] | DOE | 10 | 28.86 | Simple | 1.6 (0.4–2.0) |
| HI | Ours | DOE | 10 | 34.85 | Ultra-simple | 4.7 (0.5–5.2) |
D. Imaging Speed
This section demonstrates the effectiveness of the proposed device for snapshot HI. The speed of image acquisition, as determined by the sensor, and the recovery speed of the neural network are the key factors influencing the HI speed of our system. The camera used in our setup supports continuous acquisition at a rate of 30 frames per second, which is equivalent to a speed of 33 ms/frame. Additionally, the recovery speed of the neural network is approximately 10 ms for a single image. To assess the imaging speed of our system, we record the motion of falling blocks using continuous acquisition. Figure 7 shows snapshots of the block movement at specific moments. We select only a subset of images with notable differences for display purposes (the complete set of results can be found in Visualization 1). The results show that our method successfully preserves both fine details and color fidelity. Upon analyzing the recovery results depicting the captured falling process of the moving objects, we infer that the proposed AED HI system is capable of providing accurate achromatic hyperspectral information for rapidly moving objects.
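The timing budget implied by these figures can be worked out directly. The sketch below uses only the two numbers quoted above (30 frames per second and roughly 10 ms per reconstruction); whether acquisition and reconstruction run in parallel or one after the other is not stated in the text, so both cases are shown as assumptions.

```python
frame_rate_hz = 30.0
frame_period_ms = 1000.0 / frame_rate_hz   # about 33.3 ms per frame
recovery_ms = 10.0                         # approximate reconstruction time per image

# If reconstruction runs in parallel with acquisition (assumption), the sensor
# is the bottleneck as long as recovery fits within one frame period.
keeps_up_with_sensor = recovery_ms <= frame_period_ms
pipelined_fps = frame_rate_hz if keeps_up_with_sensor else 1000.0 / recovery_ms

# If each frame is acquired and then reconstructed back to back (assumption),
# the two times add.
sequential_fps = 1000.0 / (frame_period_ms + recovery_ms)

print(f"frame period: {frame_period_ms:.1f} ms")
print(f"pipelined: {pipelined_fps:.1f} fps, sequential: {sequential_fps:.1f} fps")
```

Either way, the reconstruction time is well below the frame period, so the sensor frame rate rather than the network sets the achievable hyperspectral video rate.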
Figure 7. Experimental results of reconstruction of moving objects. The blocks are pushed down from a height by a pen and their falling process is captured by the camera used in our system. The results of selected moments are shown in RGB false color and the full results can be found in Visualization 1.
In summary, we propose a novel AED imaging optimization paradigm that combines a state-of-the-art imaging element, the freeform-DOE, with a physics-driven deep learning reconstruction algorithm. Zernike polynomials are selected as the base surface of the freeform-DOE prior to optimization, which reduces the overall design workload and improves the light-field modulation ability. The SSA with spatial–spectral regularization is introduced into the deep neural network, enabling high-fidelity reconstruction of achromatic hyperspectral images. We then adopt an end-to-end architecture to optimize the details of the freeform-DOE surface, achieving a compact HI system that can capture images in a single frame with a large DoF. Through iterative optimization, the compact system with a spectral range of 420–660 nm achieves an extreme DoF that is over three orders of magnitude larger than that of a traditional lens with the same aperture. This joint optimization greatly extends the DoF range of HI without degrading the performance of the compact flat optical system.
Although ultralight and ultrathin structures can be achieved with flat lenses and meta-lenses, they are limited by chromatic aberration. As the illumination bandwidth and the DoF increase, the chromatic aberration becomes more severe and degrades the quality of HI. By optimizing the freeform-DOE for an extended DoF and a specific chromatic aberration, we transform the unfavorable chromatic aberration into a spectral compression coding of the light field, which benefits our task. The PSFs modulated by our asymmetric freeform-DOE differ strongly between spectral bands and remain highly similar across object distances over a large DoF range. The large differences between PSFs in different spectral bands increase the randomness of the measurement, which better satisfies the restricted isometry property (RIP) criterion of compressive sensing theory and provides the necessary conditions for high-quality spectral reconstruction. The high similarity of PSFs at different object distances implies shift invariance in the 3D imaging space, which reduces both the complexity of the subsequent hyperspectral reconstruction algorithm and the difficulty of extending the DoF. For HI reconstruction, the sparse prior conditions for the spatial texture and spectral features of hyperspectral cubic data are introduced into our reconstruction network. In our SSA, these sparse prior conditions are imposed through a regularized optimization with a TV operation along the spatial–spectral dimensions to eliminate texture blurring and chromatic aberration.
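To make the TV prior on the spatial–spectral dimensions concrete, the following is a minimal sketch of a generic anisotropic total-variation penalty on a hyperspectral cube. It is not the authors' SSA regularizer, whose exact form is not given in this section; the function name `spatial_spectral_tv`, the `(height, width, bands)` layout, and the `spectral_weight` parameter are assumptions made for illustration.

```python
import numpy as np


def spatial_spectral_tv(cube: np.ndarray, spectral_weight: float = 1.0) -> float:
    """Anisotropic total variation of a cube shaped (height, width, bands).

    Sums absolute finite differences along the two spatial axes and,
    scaled by `spectral_weight`, along the spectral axis. Small values
    mean piecewise-smooth textures and smooth spectra.
    """
    if cube.ndim != 3:
        raise ValueError("expected a 3D array shaped (height, width, bands)")
    tv_rows = np.abs(np.diff(cube, axis=0)).sum()   # vertical neighbors
    tv_cols = np.abs(np.diff(cube, axis=1)).sum()   # horizontal neighbors
    tv_bands = np.abs(np.diff(cube, axis=2)).sum()  # adjacent spectral bands
    return float(tv_rows + tv_cols + spectral_weight * tv_bands)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    flat = np.ones((8, 8, 5))
    noisy = flat + 0.1 * rng.standard_normal(flat.shape)
    # A constant cube has zero TV; added noise increases the penalty.
    print(spatial_spectral_tv(flat), spatial_spectral_tv(noisy))
```

In a regularized reconstruction, a term of this kind would be added to the data-fidelity loss with a tunable weight, so that solutions with blurred-out noise in space and smooth spectra are preferred.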
Combined with our SSA, which targets achromatic reconstruction in both the spatial and spectral dimensions, end-to-end joint learning reaches a solution that balances the two goals of extending the DoF and reconstructing the hyperspectral image. We demonstrate that an ultralight and ultrathin snapshot achromatic HI system with a flat extended-DoF lens can be jointly optimized to obtain achromatic HI across a large imaging volume of up to 5.2 m. The HI system captures achromatic, high-fidelity hyperspectral snapshot images at 33 ms/frame with 25 spectral channels ranging from 420 to 660 nm, covering distances from 0.5 to 5.2 m. To the best of our knowledge, this is the first time that HI with an extreme DoF has been achieved, and no existing achromatic flat lens has been reported to approach comparable performance.
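The figures quoted above are mutually consistent, as a short check shows. The sketch assumes the 25 channels are uniformly spaced and that 420 nm and 660 nm are the centers of the first and last channels; neither assumption is stated explicitly in the text, and the variable names are illustrative.

```python
import numpy as np

# Values quoted in the text.
START_NM, STOP_NM, NUM_CHANNELS = 420.0, 660.0, 25
NEAR_M, FAR_M = 0.5, 5.2

# Assumption: uniformly spaced channel centers including both endpoints.
centers_nm = np.linspace(START_NM, STOP_NM, NUM_CHANNELS)
spacing_nm = float(centers_nm[1] - centers_nm[0])

# Extent of the in-focus range along the optical axis.
dof_m = FAR_M - NEAR_M

if __name__ == "__main__":
    print(f"channel spacing: {spacing_nm:.1f} nm")
    print(f"depth of field:  {dof_m:.1f} m")
```

Under these assumptions the channel spacing matches the 10 nm spectral resolution and the range matches the 4.7 m DoF listed for our system in the comparison table.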
Finally, we believe that the proposed end-to-end joint optimization approach combines the advantages of asymmetric freeform surfaces, computational imaging, and deep learning reconstruction, providing a new perspective for optical design. Guided by the loss function, the proposed AED HI system sets a new record for E-DoF achromatic spectral imaging, which opens up further applications of spectral imaging such as spectral-depth detection and coherent chromatography.
Acknowledgment
Zhenqi Niu, Chaoyang Wei, and Jianda Shao conceived the original idea and supervised the experiment. Yitong Pan wrote the original draft of the manuscript and designed, established, and performed simulation of the model. Xiaolin Li and Zhen Cao helped design and draw the figures. Songlin Wan helped develop the ideas and concepts presented herein. Yuying Lu assisted with data processing and analysis. All authors discussed the results and reviewed and revised the manuscript accordingly.
[26] D. P. Casasent, A. A. Wagadarikar, T. Clark. Single disperser design for compressive, single-snapshot spectral imaging. Proc. SPIE, 6714, 67140A(2007).
[51] S.-H. Baek, H. Ikoma, D. S. Jeon. Single-shot hyperspectral-depth imaging with learned diffractive optics. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2651-2660(2021).
[64] E. Sahin, U. Akpinar, A. Kim. Learning extended depth of field hyperspectral imaging. IEEE International Conference on Image Processing, 1850-1854(2023).
[69] B. Arad, R. Timofte, O. Ben-Shahar. NTIRE 2020 challenge on spectral reconstruction from an RGB image. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 446-447(2020).