Yangyundou Wang, Hao Wang, Min Gu. High performance “non-local” generic face reconstruction model using the lightweight Speckle-Transformer (SpT) UNet[J]. Opto-Electronic Advances, 2023, 6(2): 220049


Fig. 1. SpT UNet architecture for spatially dense feature reconstruction (a), with the multi-head attention (or cross-attention) module (b) included in the transformer encoder block (c) and decoder block (d).
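For readers unfamiliar with the block structure in Fig. 1, the following is a minimal PyTorch sketch of a transformer encoder block built around multi-head self-attention; the decoder block additionally applies cross-attention, with queries from the decoder tokens and keys/values from the encoder output. The embedding size, head count, and MLP width below are illustrative placeholders, not the values used in the SpT UNet.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Minimal transformer encoder block: multi-head self-attention
    followed by an MLP, each wrapped in a residual connection with
    LayerNorm. Hyperparameters are placeholders, not the paper's."""
    def __init__(self, dim=256, num_heads=8, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x):                       # x: (batch, tokens, dim)
        h = self.norm1(x)
        a, _ = self.attn(h, h, h)               # self-attention
        x = x + a                               # residual around attention
        x = x + self.mlp(self.norm2(x))         # residual around MLP
        return x
```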

Fig. 2. The puffed downsampling module architecture.

Fig. 3. The leaky upsampling module architecture.
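The captions do not spell out the internal layers of these two modules. Purely as a generic point of reference (a stand-in, not the paper's "puffed"/"leaky" designs), a conventional UNet halves resolution with a strided convolution on the way down and restores it with a transposed convolution, often paired with a LeakyReLU activation, on the way up:

```python
import torch.nn as nn

# Generic UNet-style resampling stages, shown only as stand-ins: the
# internals of the paper's "puffed"/"leaky" modules are not given in
# the captions. Downsampling halves H and W while doubling channels;
# upsampling does the reverse, here with a LeakyReLU activation.
def downsample_stage(in_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch * 2, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(in_ch * 2),
        nn.ReLU(inplace=True),
    )

def upsample_stage(in_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, in_ch // 2, kernel_size=2, stride=2),
        nn.LeakyReLU(0.2, inplace=True),
    )
```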

Fig. 4. Experimental setup.

Fig. 5. Overview of data acquisition under various conditions and of the training/testing/validation of the SpT UNet. (a) The training/testing datasets are captured at T1 (0 mm) and T2 (20 mm); the validation dataset is captured at T3 (40 mm). The training/testing stage (b) and the validation stage (c) of the SpT UNet for speckle reconstruction of generic face images.

Fig. 6. The ground truth (left column) and prediction (right column) of the trained SpT UNet with the camera placed 40 mm away from the focal plane. The prediction results are overlaid with true positives (white), false positives (green), and false negatives (red).
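An overlay like the one in Fig. 6 can be built from a binarized prediction and ground truth. The sketch below is a minimal illustration; the threshold value and the binarization procedure are assumptions, since the caption does not specify how the maps were compared.

```python
import numpy as np

def overlay_tp_fp_fn(gt, pred, thresh=0.5):
    """Color-code a pixel-wise comparison of prediction vs. ground
    truth: true positives white, false positives green, false
    negatives red. `gt` and `pred` are 2-D float arrays in [0, 1];
    the threshold is an illustrative choice."""
    g = gt >= thresh
    p = pred >= thresh
    rgb = np.zeros(gt.shape + (3,), dtype=np.uint8)
    rgb[g & p] = (255, 255, 255)   # true positive  -> white
    rgb[~g & p] = (0, 255, 0)      # false positive -> green
    rgb[g & ~p] = (255, 0, 0)      # false negative -> red
    return rgb
```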

Fig. 7. Quantitative analysis of the trained SpT UNet using NPCC as the loss function (a) and SSIM as the accuracy indicator (b).
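Both quantities in Fig. 7 are standard. NPCC is the negative of the Pearson correlation coefficient between prediction and ground truth, so minimizing it drives the correlation toward 1; SSIM is usually taken from an off-the-shelf implementation such as skimage.metrics.structural_similarity. A minimal per-image NPCC loss in PyTorch (batching omitted) looks like this:

```python
import torch

def npcc_loss(pred, target, eps=1e-8):
    """Negative Pearson correlation coefficient (NPCC). Perfectly
    correlated images give -1, so minimizing this loss maximizes
    the correlation between prediction and ground truth."""
    p = pred - pred.mean()
    t = target - target.mean()
    return -(p * t).sum() / (p.norm() * t.norm() + eps)
```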
Table 1. Performance of the SpT UNet.

Table 2. The validation performance of the trained SpT UNet.

Table 3. Comparison of the SpT UNet, ViT, and Swin Transformer in terms of parameter count.
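Parameter counts such as those compared in Table 3 are conventionally obtained by summing the elements of all trainable tensors; in PyTorch this reduces to a one-liner:

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Total number of trainable parameters, the quantity compared
    across SpT UNet, ViT, and Swin Transformer in Table 3."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```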
