
- Journal of the European Optical Society-Rapid Publications
- Vol. 19, Issue 1, 2023023 (2023)
1 Introduction
Driver assistance systems need to be robust against a variety of environmental conditions, especially bad weather such as heavy rain, fog or snow, if they are one day to drive our cars autonomously. Current systems rely on many different sensors, such as cameras, lidar, radar etc., which produce a significant amount of data. Nevertheless, the performance in bad weather conditions is still often poor. One promising sensor for bad weather conditions is the so-called time-gated camera [
Apart from robustness on the hardware side, we also need to detect objects very robustly in real time, combining as many different sensor data streams as possible. Although computation power seems to be ever increasing, the live readout and interpretation of different sensor data is still challenging – even more so with modern evaluation algorithms such as neural networks. A reduction of data on the recording side, while still maintaining all relevant scene information, is therefore highly desirable. Such a measurement system can be understood in the framework of compressed sensing (CS) [
In recent years, some pioneering work towards single-pixel 3D-scene detection was reported in literature. Ren et al. first proposed the combination of a single-pixel-camera with time-gating in 2011 [
In this paper, we investigate the possibility of using a time-gated-single-pixel-camera for autonomous vehicles in harsh environmental conditions. Using a sensor in this setting directly leads to two major problems:
Illumination power: Due to the extended intermediate medium between illumination/detector and object, only a very small fraction of the illumination power ever reaches the sensor. In order to overcome the inherent detector noise, the pulse energy of the illumination system should be high. On the other hand, the overall power needs to be low enough to respect eye-safety norms.
Rapid scene changes: Driving inherently implies a highly dynamic environment. Accordingly, high frame rates of 25 Hz or more are mandatory. Moreover, the obscuring medium itself is dynamic, such that we have to deal with additional fluctuations in our single-pixel measurements.
We believe that a low compression ratio is the key to solving both problems. If only a few recordings are necessary, we can either measure fast enough to ensure a quasi-static scene during acquisition or even measure in parallel, as briefly mentioned in [
The basic idea of image-free classification, i.e. direct classification on the single-pixel signal without reconstructing an image, was formulated by Davenport et al. very early after CS-theory was established, but went mostly unnoticed at the time [
This paper contains our preliminary study on the possible data compression we can hope to achieve in a time-gated-single-pixel-camera for autonomous vehicles in harsh weather conditions. We will demonstrate that a time-gated-single-pixel-camera is able to robustly detect objects with a minimum of recorded data, i.e. fast enough to even cope with adverse weather conditions and highly dynamic environments.
We will briefly review the principles of time-gating and the single-pixel-camera in
2 Theoretical background
A time-gated camera consists of a pulsed laser and a camera with a very fast shutter (opening time typically in the nanosecond range). The camera shutter is triggered such that it opens only for a very short time, at a user-defined delay after a laser pulse was sent. The pulse length is typically much shorter than the shutter or gating time of the camera. Independently thereof, the gate can be expressed as the convolution of the detector gate with the laser pulse:
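$$G(z) = \left(G_\mathrm{D} * P\right)(z) = \int G_\mathrm{D}(z - z')\, P(z')\, \mathrm{d}z'. \tag{1}$$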
G denotes the gate, G_D the detector gate and P the illumination pulse. Here, we directly express the gate as a range gate, as the range z is proportional to the transit time t via the speed of light c: z = ct/2. Therefore, photons are filtered according to their path lengths through the medium: only photons with path lengths corresponding to the delay time arrive at the camera while the gate is open. Mathematically, this is expressed by the convolution of the gate with the product of atmospheric attenuation and object or medium reflectivity:
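$$I(z_\mathrm{d}) = \big(G * (T\rho)\big)(z_\mathrm{d}) = \int G(z_\mathrm{d} - z)\, T(z)\, \rho(z)\, \mathrm{d}z, \tag{2}$$

where z_d denotes the range corresponding to the chosen delay time, T the atmospheric attenuation and ρ the object or medium reflectivity.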
Figure 1. Operation principle: For time-gating, a laser pulse illuminates the scene. After some delay time t_delay the camera shutter is opened for a very short time (nanoseconds). Thereby, the ballistic photons of a certain depth are filtered (a). A single-pixel-camera consists of a photodiode in combination with several (binary) masks (b).
In order to capture the whole scene information, several images with different delay times need to be recorded. Apart from an increased recording time, this leads to more data than is strictly necessary (gated images have a lot of dark pixels, see e.g. Fig. 2).
Figure 2. Examples of image conversion: Original simulated RGB images (top row) and corresponding simulated foggy gated NIR images with active laser illumination (bottom). The original image size is 512 × 512 pixels (13.3° FOV), whereas the gated images are downsampled to 64 × 64 pixels. The attenuation length of the fog was set to 13 m, the gate from 40 m to 60 m, the illumination FWHM to 6.4°, the image blurring to one pixel (Gaussian filter) and the background illumination to 1%. RGB images taken from [
In a single-pixel-camera, the signal is recorded by a photodiode, which makes it especially suitable in wavelength ranges away from the visible spectrum, where no low-cost cameras exist. The lateral resolution is gained by pixel masks in front of the detector (see Fig. 1b).
In contrast to conventional image compression methods, we want to use neural networks to generate the masks of our single-pixel-camera. Thereby, we are not restricted to specific basis functions for our compression, but are able to find the optimal basis for our dataset. Moreover, we aim to implement an image-free detection scheme, i.e. we want to directly extract the object information from the single-pixel signal using neural networks.
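The following minimal Keras sketch illustrates this idea; the sizes are exemplary (64 × 64 scene, 41 masks) and the code is an illustration of the concept, not the exact implementation behind our networks:

```python
import tensorflow as tf

# Illustrative only: a learnable, bias-free dense layer plays the role of the
# single-pixel masks, so training learns a measurement basis for the dataset.
n_pixels, n_masks = 64 * 64, 41   # assumed sizes (cr ~ 1%)

inp = tf.keras.Input(shape=(n_pixels,))             # flattened gated image
# Each column of this layer's weight matrix corresponds to one mask pattern.
signal = tf.keras.layers.Dense(n_masks, use_bias=False, name="masks")(inp)
# In hardware, the learned weights would be thresholded to binary DMD
# patterns (e.g. via a straight-through estimator during training).
encoder = tf.keras.Model(inp, signal)
```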
Combining equations (1) and (2) with the mask-based sampling of the single-pixel-camera, the i-th single-pixel measurement can be written as:
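$$s_i = \sum_{x,y} M_i(x, y)\, I_{x,y}(z_\mathrm{d}), \qquad i = 1, \ldots, M, \tag{3}$$

where M_i denotes the i-th mask and I_{x,y}(z_d) the gated image of equation (2) at pixel (x, y). The compression ratio is then cr = M/N, with N the total number of image pixels.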
3 Results and discussion
In the introduction, we mentioned that a low compression ratio may deliver better preconditions for real-time data evaluation for autonomous vehicles in harsh weather conditions. Therefore, we carried out several simulation experiments to understand which compression ratio might be feasible for our specific use case. In a first step, we reconstructed images from the single-pixel information in order to have a reference. As we envision directly detecting objects within the single-pixel signal to further reduce the compression ratio, we additionally trained a classification network.
3.1 Determination of feasible compression ratio via simulations and neural networks
In this section, we first explain how we simulated an adequate dataset, then give details about the neural networks and finish with their results.
3.1.1 Creation of dataset
We are not aware of any dataset comprising time-gated images with different delay and gating times in harsh weather conditions. Therefore, we created our own dataset. The data was taken from simulated RGB images of the DENSE dataset [
The synthetic DENSE dataset is not labeled. To produce an adequate dataset for the classification task, we added objects of different classes. We fixed the classes to “traffic sign”, “human” and “vehicle”. A total of ten different object images per class were used, each of which was pasted with a random size at a random position within our background gated images (see Fig. 3); a minimal version of this pasting step is sketched below.
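The following sketch illustrates the pasting augmentation; the size range and the use of scipy are exemplary choices, not necessarily the exact procedure used here:

```python
import numpy as np
from scipy.ndimage import zoom

rng = np.random.default_rng(0)

def paste_object(background, obj):
    """Paste an object image at a random size and position into a gated image."""
    obj_small = zoom(obj, rng.uniform(0.2, 0.8))  # random size (assumed range)
    h, w = obj_small.shape
    y = rng.integers(0, background.shape[0] - h + 1)
    x = rng.integers(0, background.shape[1] - w + 1)
    out = background.copy()
    out[y:y + h, x:x + w] = obj_small
    return out

# Example: paste a 16 x 16 object into a 64 x 64 gated background image.
sample = paste_object(np.zeros((64, 64)), np.ones((16, 16)))
```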
Figure 3. Examples from the datasets produced for classification: One of ten different object images for each of the three classes “traffic sign”, “human” and “vehicle” was pasted with a different size at a random position within the gated images. We produced two different datasets, one with a gating range of 1 m (left column) and one with a gating range of 15 m, both around a distance of 50 m. The images with the longer gating range exhibit significantly more background.
We split all datasets into 94.5% training data, 5% validation data and 0.5% test data.
3.1.2 Neural network architecture
For the image reconstruction task, we trained an autoencoder-type network. An autoencoder (AE) compresses the data in the encoder part down to a latent vector of size L, followed by a subsequent decompression in the decoder to reconstruct the original image [
Figure 4. Schematic drawing of the network architecture of the autoencoder (a) as well as the single-pixel-decoder (b) for a 6.25% compression ratio. For lower compression ratios, a down/up-convolutional block as well as an encoder block was added. The encoder/decoder block consists of two convolutional layers with kernel size 3 and 128 filters each, as well as a skip connection. As activation function we used LeakyReLU. The weights of the decoder part of the single-pixel-decoder are shared with the autoencoder network and not retrained. DownConv: convolutional layer with stride 2, EncBlock/DecBlock: encoder/decoder block, MaxPool: maximum pooling layer, UpConv: convolutional layer followed by a transpose convolutional layer with stride 2, UpSamp: upsampling layer.
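A simplified Keras sketch of the autoencoder branch follows; the block counts, the latent size (here L = 256 for cr = 6.25% on 64 × 64 images), the decoder head and the training loss are exemplary readings of Figure 4a, not the exact implementation:

```python
import tensorflow as tf
from tensorflow.keras import layers

def block(x):
    # Encoder/decoder block: two 3x3 convolutions with 128 filters,
    # LeakyReLU activations and a skip connection.
    y = layers.LeakyReLU()(layers.Conv2D(128, 3, padding="same")(x))
    y = layers.LeakyReLU()(layers.Conv2D(128, 3, padding="same")(y))
    if x.shape[-1] != 128:
        x = layers.Conv2D(128, 1)(x)  # match channel count for the skip
    return layers.Add()([x, y])

inp = layers.Input((64, 64, 1))
x = layers.Conv2D(128, 3, strides=2, padding="same")(inp)    # DownConv
x = layers.MaxPool2D()(block(x))                             # EncBlock + MaxPool
x = layers.MaxPool2D()(block(x))
latent = layers.Dense(256)(layers.Flatten()(x))              # latent vector (L = 256)

x = layers.Reshape((8, 8, 4))(layers.Dense(8 * 8 * 4)(latent))
x = block(layers.UpSampling2D()(x))                          # UpSamp + DecBlock
x = layers.Conv2DTranspose(128, 3, strides=2, padding="same")(
    layers.Conv2D(128, 3, padding="same")(x))                # UpConv
x = layers.UpSampling2D()(x)
out = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)

autoencoder = tf.keras.Model(inp, out)
autoencoder.compile(optimizer="adam", loss="mse")  # loss assumed
```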
The classification network consisted of two hidden dense layers: the first with 128 nodes, the second with 12. As loss function we chose the categorical crossentropy and as evaluation metric the accuracy, i.e. the number of correctly classified images over all test images. A sketch of this head is given below.
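A minimal sketch of this classification head follows; the input size (41 mask measurements) and the hidden-layer activations are assumptions:

```python
import tensorflow as tf

# Illustrative classification head acting directly on the single-pixel signal.
classifier = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(41,)),                 # compressed signal (assumed size)
    tf.keras.layers.Dense(128, activation="relu"),      # first hidden layer
    tf.keras.layers.Dense(12, activation="relu"),       # second hidden layer
    tf.keras.layers.Dense(3, activation="softmax"),     # three object classes
])
classifier.compile(optimizer="adam",
                   loss="categorical_crossentropy", metrics=["accuracy"])
```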
3.1.3 Reconstruction
We trained our reconstruction networks with two different compression ratios, cr = 6.25% and cr = 1.56%. Some exemplary results thereof can be found in Figure 5.
Figure 5. Examples of reconstruction: Comparison of ground truth (GT) images with reconstructed (rec) results of the autoencoder (AE) and single-pixel-decoder (SPD) networks for two different compression ratios, cr = 6.25% and cr = 1.56%. The first row shows the reconstructed images, whereas the second row shows the difference image Image_rec − Image_GT. Generally, fine details get neglected for lower compression ratios. Therefore, only the shapes of objects with fine details, such as greenery (see first row), get reconstructed. Objects with fewer high-frequency components, like traffic signs (last two examples), get reconstructed near-perfectly even for the lower compression ratio.
Figure 6. Quantitative analysis of reconstruction results: The pixel-wise reconstruction performance – expressed by the MSE – decreases with decreasing compression ratio, as expected. Moreover, the autoencoder (AE) always outperforms the single-pixel-decoder (SPD). The SSIM, on the other hand, only decreases slightly with decreasing compression ratio, which indicates that the overall structure of the reconstructed images, i.e. the general shapes of the objects, is preserved even for the lower compression ratio of 1.56%.
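For reference, both metrics can be computed per image pair as follows; scikit-image is an exemplary choice of tooling, not necessarily the one used for Figure 6:

```python
import numpy as np
from skimage.metrics import mean_squared_error, structural_similarity

# Placeholder image pair; in practice these are the GT and reconstructed images.
img_gt = np.random.rand(64, 64)
img_rec = img_gt + 0.05 * np.random.randn(64, 64)

mse = mean_squared_error(img_gt, img_rec)
ssim = structural_similarity(img_gt, img_rec,
                             data_range=img_gt.max() - img_gt.min())
```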
Generally, the lower the compression ratio, the more the high spatial frequency components get neglected (see the difference images in Fig. 5).
3.1.4 Classification
For the three-class object classification with a short gating interval of 1 m, i.e. negligible background (see Fig. 3, left column), the test accuracy stays nearly 100% down to a compression ratio of 0.5% (see Fig. 7a).
Figure 7. Quantitative analysis of classification results for the short gate of 1 m (a) and the longer gate of 15 m (b) around a range distance of 50 m. Whereas the test accuracy is nearly 100% for the short gate down to a compression ratio of 0.5%, it never exceeds 90% for the longer gate, even if the compression ratio is set to one. Nevertheless, the test accuracy only starts decreasing at a compression ratio of 0.5% for the longer gate as well.
We attribute the misclassifications observed even for the uncompressed signal to the construction of our dataset: the network cannot classify correctly if either the object is pasted unfavourably within the background or the background itself confuses the network. Generally, we observe that the prediction probability, i.e. the probability with which the network associates an image with a specific class, is more ambiguous if the image displays a richer background. One example thereof is given in Figure 8.
Figure 8. Two exemplary images of one specific vehicle with the long gating range of 15 m and their corresponding prediction probabilities for the three classes: one with significant background (a, b) and one with nearly no background (c, d). For the former, the fidelity of the prediction is low for all compression ratios (b), while for the latter the prediction probability is 100% down to a compression ratio of 0.5% (d).
3.1.5 Feasible compression ratio
In summary, our simulations indicate that a compression ratio of about 1%, corresponding to 41 masks for our 64 × 64 images, is sufficient for robust image-free classification in our use case.
3.2 Determination of system performance
The results of the last section indicate that a very low compression ratio is sufficient to carry out object detection on the single-pixel information. As already discussed in the introduction, this is crucial to deal with the highly dynamic environment as well as to keep the illumination power within eye-safety constraints. This is near impossible in the visible or near-infrared waveband; we therefore opt for an operating wavelength above 1400 nm. The exact value can be chosen according to the availability of laser sources; for our system we chose 1550 nm. Due to the dynamic environment, we believe a frame rate of f_fr = 25 Hz or higher is necessary. In this case, the limiting factor for eye-safety is the total intensity emitted by the system within 10 s [
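A back-of-the-envelope budget illustrates how the 10 s limit, the frame rate and the number of masks per frame share the available energy; all numbers below are illustrative placeholders, not the values derived in this section:

```python
# Illustrative energy budget under a 10 s exposure limit.
E_10s_limit = 1.0   # maximum energy emitted within 10 s [J] (assumed placeholder)
f_fr = 25           # frame rate [Hz]
n_masks = 41        # mask measurements per frame

E_per_measurement = E_10s_limit / (10 * f_fr * n_masks)
print(f"{E_per_measurement * 1e6:.0f} uJ per mask measurement")  # ~98 uJ here
```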
The maximum possible energy per measurement can then be calculated to be,
Here, ρ′ is the pixelwise reflection coefficient considering only absorption, as the Lambertian nature is expressed via
Let us proceed with an example. According to the simulations in
Let us consider for the moment only one illuminated pixel with no absorption and a fill factor of 0.8. Then our system would be able to detect such an object down to 5.6 attenuation lengths. This is comparable to time-gated systems which have been reported to perform down to approximately six attenuation lengths in fog and smoke [
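This figure can be reproduced with a simple Beer-Lambert link budget; the energy values below are illustrative assumptions chosen only to show the structure of the calculation:

```python
import numpy as np

# Beer-Lambert link budget: the echo of a pixel at n one-way attenuation
# lengths is damped by exp(-2n) over the round trip. All values are assumed.
E_tx = 1e-3    # energy reaching the pixel [J] (assumed)
rho = 1.0      # reflection coefficient, no absorption (as in the example)
fill = 0.8     # fill factor of the object within the pixel
E_min = 1e-8   # minimum detectable energy, e.g. from the NEP [J] (assumed)

# Maximum one-way attenuation lengths before the echo drops below E_min:
n_att_max = 0.5 * np.log(E_tx * rho * fill / E_min)
print(f"detectable down to {n_att_max:.1f} attenuation lengths")  # ~5.6 here
```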
In our noise calculation, we have estimated the noise floor with the NEP; the true noise floor might be slightly higher due to additional electronics and the rather high operating frame rate. Moreover, we will most certainly have background illumination in the scene. We have not included it in our analysis, as we believe it to be small if a narrowband wavelength filter is used in front of the photodiode. Even if the sun shines directly into our sensor, sunlight in the short-wave-infrared (SWIR) region is comparable to our active illumination (0.62 W/m²/nm for the reference air mass 1.5 spectrum [
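As an order-of-magnitude check, the solar background passing the filter can be estimated as follows; the filter bandwidth and aperture area are assumed values:

```python
# Solar background power behind a narrowband filter at 1550 nm.
solar_irradiance = 0.62   # W/m^2/nm, AM1.5 reference value from the text
filter_bandwidth = 10.0   # nm (assumed)
aperture_area = 1e-4      # m^2, about 1 cm^2 (assumed)

P_background = solar_irradiance * filter_bandwidth * aperture_area
print(f"{P_background * 1e3:.2f} mW of solar background")  # 0.62 mW here
```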
In a next step, we plan to first implement a time-gated camera on a vehicle to validate our simulated data with real data. Moreover, we want to implement direct object detection on the single-pixel information, which we are currently working on in another project. Then, all relevant parameters for the time-gated-single-pixel-camera can be fixed and such a sensor system tested.
4 Conclusion
We have introduced the concept of a time-gated-single-pixel-camera as a promising sensor to tackle robust object detection in bad weather conditions for autonomous vehicles. Simulations of the concept in combination with neural networks show good performance. In particular, they indicate that as few as 41 masks could suffice. In this case, the masks can either be hard-coded in front of several photodiodes or projected onto them with a single digital mirror device. Due to the live read-out of the photodiodes, a true single-shot detection of the whole scene would then be possible.
References
[1] A. Medina.
[2] Y. Grauer, E. Sonn. Active gated imaging for automotive safety applications.
[3] T. Gruber, F. Julca-Aguilar, M. Bijelic, W. Ritter, K. Dietmayer, F. Heide.
[4] B. Göhler, P. Lutzmann. Review on short-wavelength infrared laser gated-viewing at Fraunhofer IOSB.
[5] A.H. Willitsford, D.M. Brown, K. Baldwin, R.T. Hanna, L. Marinello. Range-gated active short-wave infrared imaging for rain penetration.
[6] D.L. Donoho. Compressed sensing.
[7] M.F. Duarte, M.A. Davenport, D. Takhar, J.N. Laska, T. Sun, K.F. Kelly, R.G. Baraniuk. Single-pixel imaging via compressive sampling.
[8] C.F. Higham, R. Murray-Smith, M.J. Padgett, M.P. Edgar. Deep learning for real-time single-pixel video.
[9] X. Ren, L. Li, E. Dang. Compressive sampling and gated viewing three-dimensional laser radar.
[10] L. Li, L. Wu, X. Wang, E. Dang. Gated viewing laser imaging with compressive sensing.
[11] M.J. Sun, M.P. Edgar, G.M. Gibson, B. Sun, N. Radwell, R. Lamb, M.J. Padgett. Single-pixel three dimensional imaging with time-based depth resolution.
[12] W. Gong, C. Zhao, H. Yu, M. Chen, W. Xu, S. Han. Three-dimensional ghost imaging lidar via sparsity constraint.
[13] L. Li, W. Xiao, W. Jian. Three-dimensional imaging reconstruction algorithm of gated-viewing laser imaging with compressive sensing.
[14] G.A. Howland, P.B. Dixon, J.C. Howell. Photon counting compressive sensing laser radar for 3D imaging.
[15] G.A. Howland, D.J. Lum, M.R. Ware, J.C. Howell. Photon counting compressive depth mapping.
[16] N. Radwell, S.D. Johnson, M.P. Edgar, C.F. Higham, R. Murray-Smith, M.J. Padgett. Deep learning optimized single-pixel lidar.
[17] M. Bashkansky, S.D. Park, J. Reintjes. Single pixel structured imaging through fog.
[18] C.O. Quero, D. Durini, R. Ramos-Garcia, J. Rangel-Magdaleno, J. Martinez-Carranza. Evaluation of a 3D imaging vision system based on a single-pixel InGaAs detector and the time-of-flight principle for drones.
[19] M.A. Davenport, M.F. Duarte, M.B. Wakin, J.N. Laska, D. Takhar, K.F. Kelly, R.G. Baraniuk. The smashed filter for compressive classification and target recognition.
[20] S. Jiao. Fast object classification in single-pixel imaging.
[21] Z. Zhang, X. Li, S. Zheng, M. Yao, G. Zheng, J. Zhong. Image-free classification of fast-moving objects using learned structured illumination and single-pixel detection.
[22] Z. Yang, Y.M. Bai, L.D. Sun, K.X. Huang, J. Liu, D. Ruan, J.L. Li. SP-ILC: Concurrent single-pixel imaging, object location, and classification by deep learning.
[23] D.J. Field. Relations between the statistics of natural images and the response properties of cortical cells.
[26] L. Theis, W. Shi, A. Cunningham, F. Huszár.
[27] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli. Image quality assessment: From error visibility to structural similarity.
[30] F. Christnacher, S. Schertzer, N. Metzger, E. Bacher, M. Laurenzis, R. Habermacher. Influence of gating and of the gate shape on the penetration capacity of range-gated active imaging in scattering environments.
[31] R. Tobin, A. Halimi, A. McCarthy, P.J. Soan, G.S. Buller. Robust real-time 3D imaging of moving scenes through atmospheric obscurant using single photon lidar.
[33] B. Sun, M.P. Edgar, R. Bowman, L.E. Vittert, S. Welsh, A. Bowman, M.J. Padgett.
[34] F. Soldevila, P. Clemente, E. Tajahuerce, N. Uribe-Patarroyo, P. Andrés, J. Lancis. Computational imaging with a balanced detector.
[35] M. Laurenzis, J.M. Poyet, Y. Lutz, A. Matwyschuk, F. Christnacher. Range gated imaging with speckle-free and homogeneous laser illumination.
[36] M. Laurenzis, Y. Lutz, F. Christnacher, A. Matwyschuk, J.M. Poyet. Homogeneous and speckle-free laser illumination for range-gated imaging and active polarimetry.
