Photonics Research, Vol. 13, Issue 11, 3121 (2025)
1. INTRODUCTION
Single-photon avalanche diode (SPAD) arrays provide single-photon sensitivity and eliminate readout noise [1–6]. SPAD technology has been widely adopted and continues to advance in LiDAR [7–9], fluorescence lifetime microscopy [10], and quantum imaging [11,12]. However, the practical implementation of SPAD arrays in high-speed imaging systems is limited by two types of dead time introduced by the quenching and readout circuits. The first, in-frame dead time, refers to the interval after an avalanche event during which the quenching circuit deactivates the SPAD [13,14]. Photons arriving in this interval go undetected, disrupting continuous photon counting and degrading temporal fidelity. The second, inter-frame dead time, is caused by the hardware readout time, as SPAD arrays cannot detect photons until data readout completes [1]. Although SPADs support ns-scale acquisition, transferring 8-bit data typically requires around 3 μs to 4 μs. The combined impact of these two dead times interrupts photon detection and compromises the continuity of data acquisition, limiting the performance of high-speed imaging systems.
Recent hardware developments have focused on reducing dead time and improving the temporal resolution of SPAD arrays through architectural mechanisms. Active quenching is a circuit technique that, immediately after an SPAD avalanche, forcibly drops the diode's bias to stop the current, drains the residual charge, and then quickly restores the bias, allowing the diode to be rebiased almost immediately. Malanga
Deep-learning-based interpolation techniques tackle temporal constraints introduced by dead time. Choi
In this study, we first establish a physics-based space–time model for SPAD imaging. A key feature of SPAD arrays is their parallel integrate-and-read architecture, which enables photon counting and data readout to operate simultaneously. Figure 1 shows the resulting inter-frame dead time and how it changes with the hardware integration interval. We then present two reconstruction strategies. The non-equivalent time integration strategy (NETIS) reconstructs two frames from a single measurement, while the equivalent time integration strategy (ETIS) reconstructs an intermediate frame from two consecutive measurements. Next, we design a transformer architecture integrating a temporal encoder, a spatial transformer encoder, and a decoder to extract complementary temporal and spatial features. We built three experimental setups to capture both macroscopic and microscopic scenes. Experimental results confirm state-of-the-art temporal and spatial super-resolution performance in single-photon imaging.

Figure 1. SPAD pixel hardware timing and reconstruction strategy. (a) Non-equivalent time integration strategy: integration, readout, and inter-frame dead time. (b) Equivalent time integration strategy: integration, readout, and inter-frame dead time. (c) Illustration of measurement and reconstruction with NETIS and ETIS. (d) Reconstructed frames with NETIS and ETIS.
2. METHOD
Single-photon imaging systems that use SPAD arrays detect individual photons with high sensitivity. However, this technology introduces challenges that do not arise with conventional CMOS sensors. One key difference is the relationship between speed and sensitivity. In traditional cameras, shorter exposure times lead to higher frame rates. In our type of SPAD system, where the frame rate is determined by the hardware readout time, reducing the integration time below this readout-defined limit does not increase the fundamental frame acquisition rate. This decoupling arises from dead time: hardware constraints introduce in-frame and inter-frame dead time during integration and readout, which prevents continuous acquisition and sets an upper bound on the frame rate. The detection mechanism adds further complexity, as each photon event leaves a pixel inactive for a brief period. The inter-frame dead time limits the system's information capacity, reduces the achievable frame rate, and constrains the bit depth of each measurement, often forcing trade-offs in image quality or temporal resolution that conventional sensors do not face.
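To make the readout-defined limit concrete, consider the 10.40 μs readout time used later in this paper; this is a simple back-of-the-envelope figure, not a new measurement:

$$ f_{\max} = \frac{1}{T_{\mathrm{read}}} = \frac{1}{10.40\ \mu\mathrm{s}} \approx 9.6\times 10^{4}\ \mathrm{frames/s}, $$

regardless of how short the integration time is set. Shortening the integration time below $T_{\mathrm{read}}$ only converts acquisition time into inter-frame dead time.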
Overcoming the temporal limitations and noise of SPAD data is crucial for high-performance imaging. Our framework starts with a physics-based spatial–temporal model that captures both dead time and noise sources. We then explore two reconstruction strategies. The first, the non-equivalent time integration strategy (NETIS), features a long hardware integration time and negligible inter-frame dead time, so the output reflects the total photons detected in each integration period. The second, the equivalent time integration strategy (ETIS), employs an integration time shorter than the readout time, which introduces inter-frame dead time and limits frame-rate improvements despite the shorter exposures. Finally, we introduce the single-photon temporal and spatial resolution network (SPTSR-Net), a bespoke deep learning architecture that fuses temporal and spatial features to deliver high-fidelity reconstructions directly from raw SPAD outputs.
A. Physics-Based Temporal-Spatial SPAD Imaging Model
1. Dead Time Modeling
Dead time is the period after photon detection during which the SPAD cannot detect new photons. This period reduces measurement accuracy and temporal resolution. We model two categories of dead time.
The overall measurement is modeled as the photon count accumulated during the active (non-dead) portion of each integration window, corrupted by additive noise; one plausible formulation is given below.
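Writing $\Phi(x,y,t)$ for the incident photon flux, $\eta$ for the photon detection efficiency, $A(t)\in\{0,1\}$ for an indicator that is zero during dead time, and $N(x,y)$ for the aggregate noise (all symbol names being assumptions for illustration), a Poisson counting model for frame $k$ would read

$$ C_k(x,y) \sim \operatorname{Poisson}\!\left(\int_{t_k}^{t_k+T_{\mathrm{int}}} \eta\, A(t)\, \Phi(x,y,t)\, \mathrm{d}t\right) + N(x,y), $$

where $T_{\mathrm{int}}$ is the hardware integration time.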
2. Spatial Noise Modeling
Accurate noise modeling is crucial for effective data simulation for SPAD arrays. Building on our previous work [26], we represent the total noise as the sum of multiple independent noise sources, expressed as follows.
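Consistent with the noise components calibrated below (dark counts, afterpulsing, crosstalk, and fixed-pattern noise), a plausible decomposition is

$$ N(x,y) = N_{\mathrm{dark}}(x,y) + N_{\mathrm{ap}}(x,y) + N_{\mathrm{ct}}(x,y) + N_{\mathrm{fpn}}(x,y), $$

where the subscripts denote dark counts, afterpulsing, crosstalk, and fixed-pattern noise, respectively; this grouping is an assumption consistent with the calibration procedure described next.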
To mitigate these noise sources, we employ the same strategies as in our previous work [26]. First, we correct for fixed-pattern noise using the manufacturer's photon detection efficiency map. Subsequently, we acquired 60,000 single-photon (1-bit) dark-field images to calibrate the time-dependent noise sources. Based on a temporal and spatial correlation analysis of these dark frames, we distinguish and quantify the afterpulsing probability, crosstalk probability, and dark count rate for our noise model.
B. Time Integration Strategies
Time integration strategies in SPAD arrays improve the temporal resolution and dynamic range of captured images. The choice of strategy affects how temporal information is recorded and reconstructed, as well as influencing the frame rate and data quality. In this study, we consider two strategies: the non-equivalent time integration strategy and the equivalent time integration strategy. Understanding the characteristics and limitations of these strategies enables the effective design of data processing and reconstruction algorithms.
1. Non-Equivalent Time Integration Strategy
In NETIS, the hardware integration time is set significantly longer than the 10.40 μs readout time. The inter-frame dead time remains minimal, often below 10 ns, owing to synchronization signals and fast quenching circuits. Photon events are collected continuously over the integration period.
The main advantage of this mode is that it enables large photon counts to be accumulated, thereby improving the signal-to-noise ratio and allowing low-light information to be detected. The data acquired in this strategy represents the total number of photons detected during integration.
The measurement in NETIS can be expressed as the total photon count accumulated over the full integration window; a sketch of one such expression follows.
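Reusing the flux, efficiency, and noise symbols introduced in Section 2.A (all of which are illustrative notation), a plausible expression is

$$ C^{\mathrm{NETIS}}_k(x,y) \sim \operatorname{Poisson}\!\left(\int_{kT_{\mathrm{int}}}^{(k+1)T_{\mathrm{int}}} \eta\,\Phi(x,y,t)\,\mathrm{d}t\right) + N(x,y); $$

because the inter-frame dead time is negligible, consecutive integration windows tile the time axis almost without gaps.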
2. Equivalent Time Integration Strategy
In ETIS, the hardware integration time is shorter than or equal to the readout time. Reducing the integration period enables more accurate temporal information to be captured, but the readout circuitry limits the increase in frame rate. Photons arriving after the integration time but before the readout is complete cannot be counted, which introduces a significant inter-frame dead time.
The measurement in ETIS for each frame can be expressed as the photon count accumulated only during the integration portion of each readout period; a corresponding sketch follows.
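Under the same illustrative notation, with $T_{\mathrm{read}}$ the readout period and $T_{\mathrm{int}} \le T_{\mathrm{read}}$:

$$ C^{\mathrm{ETIS}}_k(x,y) \sim \operatorname{Poisson}\!\left(\int_{kT_{\mathrm{read}}}^{kT_{\mathrm{read}}+T_{\mathrm{int}}} \eta\,\Phi(x,y,t)\,\mathrm{d}t\right) + N(x,y). $$

Photons arriving in $[kT_{\mathrm{read}}+T_{\mathrm{int}},\,(k+1)T_{\mathrm{read}})$ are lost to inter-frame dead time, so the detection duty cycle is $T_{\mathrm{int}}/T_{\mathrm{read}}$.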
To address these challenges, we generated simulated datasets based on the physics-based temporal–spatial model for both integration strategies and designed a network architecture that improves temporal and spatial resolution. We generated the simulated datasets from the UCF101 dataset [28], and the generation details are provided in Algorithm 1.
Algorithm 1: SPAD Data Simulation
1. Apply the in-frame dead-time effect.
2. Calculate the total photon flux.
3. Downsample the spatial resolution to the SPAD array size.
4. Scale the temporal data using the photon flux.
5. Generate time-resolved counts with SPAD noise over each integration window.
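As a concrete illustration of this pipeline, the following minimal Python sketch simulates ETIS-style SPAD measurements from a high-frame-rate intensity video. It assumes NumPy only; the function name, default parameter values, and the simplified noise model (Poisson counting plus dark counts, with in-frame dead time omitted) are assumptions for illustration, not the paper's exact implementation:

```python
import numpy as np

def simulate_spad_frames(video, t_int, t_read, eta=0.3,
                         dark_rate=100.0, spatial_factor=4, rng=None):
    """Simulate SPAD frame counts from a high-frame-rate intensity video.

    video          : (T, H, W) array, mean photons per pixel per frame.
    t_int, t_read  : integration and readout times in seconds (t_int <= t_read).
    eta            : assumed photon detection efficiency.
    dark_rate      : assumed dark count rate in counts/s/pixel.
    spatial_factor : downsampling factor to the SPAD array resolution.
    """
    rng = rng or np.random.default_rng()
    T, H, W = video.shape

    # Downsample spatial resolution by block-averaging to the SPAD array size.
    h, w = H // spatial_factor, W // spatial_factor
    flux = video[:, :h * spatial_factor, :w * spatial_factor]
    flux = flux.reshape(T, h, spatial_factor, w, spatial_factor).mean(axis=(2, 4))

    # Scale temporal data: photons arriving after t_int in each readout
    # period are lost to inter-frame dead time (ETIS duty cycle).
    duty = min(t_int / t_read, 1.0)
    mean_counts = eta * flux * duty + dark_rate * t_int

    # Generate time-resolved counts with Poisson (shot + dark) noise.
    # NOTE: in-frame dead time (pixel paralysis after each detection)
    # is omitted here for brevity.
    return rng.poisson(mean_counts)
```

For example, `simulate_spad_frames(video, t_int=5.2e-6, t_read=10.4e-6)` would emulate the ETIS timing used in the experiments below.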
The two proposed strategies present a clear trade-off between temporal resolution and signal quality. ETIS is designed for dynamic scenes, using short integration times to provide high-temporal-resolution information. However, this approach results in low photon counts, a reduced signal-to-noise ratio (SNR), and greater sensitivity to the readout circuit. In contrast, NETIS is optimized for static or slow-moving scenes. It uses long integration times to increase photon counts and SNR, at the expense of lower temporal resolution. Our models adapt to different integration times and noise levels, providing high-quality reconstructions that balance temporal resolution and image quality in SPAD imaging systems.
C. Single-Photon Temporal and Spatial Resolution Network (SPTSR-Net) Architecture and Evaluation
We propose the single-photon temporal and spatial resolution network (SPTSR-Net), a framework that jointly enhances temporal and spatial resolution in single-photon imaging. The network comprises a spatiotemporal encoder, a vision transformer backbone for feature refinement, and a U-Net decoder for image reconstruction, as shown in Fig. 2(a). SPTSR-Net takes low-resolution single-photon frames as input and outputs reconstructed frames at a higher spatial resolution, thereby achieving spatial resolution enhancement.
Figure 2. Qualitative comparison of single-photon temporal super-resolution. (a) Overall architecture of the proposed network. (b) PSNR and SSIM comparison across different methods. (c) Reconstruction results in the equivalent time integration strategy. (d) Reconstruction results in the non-equivalent time integration strategy.
The temporal–spatial encoder takes the raw measurement sequence as input and processes it in three stages.
First, a 3D convolution with batch normalization and GELU activation extracts primary features. Next, a depthwise 3D convolution enriches temporal cues. Finally, a pointwise convolution block mixes channel information; one plausible realization of these three stages is sketched below.
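A minimal PyTorch sketch of such a three-stage encoder follows; the channel width, kernel sizes, and class name are illustrative assumptions rather than the paper's exact configuration:

```python
import torch.nn as nn

class TemporalSpatialEncoder(nn.Module):
    """Three-stage encoder: 3D conv -> depthwise 3D conv -> pointwise conv."""

    def __init__(self, in_ch=1, feat_ch=64):
        super().__init__()
        # Stage 1: 3D convolution with batch normalization and GELU
        # extracts primary spatiotemporal features.
        self.primary = nn.Sequential(
            nn.Conv3d(in_ch, feat_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(feat_ch),
            nn.GELU(),
        )
        # Stage 2: depthwise 3D convolution (groups == channels)
        # enriches temporal cues at low computational cost.
        self.depthwise = nn.Sequential(
            nn.Conv3d(feat_ch, feat_ch, kernel_size=3, padding=1, groups=feat_ch),
            nn.BatchNorm3d(feat_ch),
            nn.GELU(),
        )
        # Stage 3: pointwise (1x1x1) convolution mixes channel information.
        self.pointwise = nn.Sequential(
            nn.Conv3d(feat_ch, feat_ch, kernel_size=1),
            nn.GELU(),
        )

    def forward(self, x):
        # x: (batch, channels, frames, height, width)
        return self.pointwise(self.depthwise(self.primary(x)))
```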
A vision transformer refines these features. The encoder output is split into non-overlapping patches and flattened into token embeddings. Each transformer block applies multi-head self-attention and a feed-forward network with residual connections, following the standard formulation sketched below.
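Assuming the standard pre-norm transformer block, with $z_\ell$ the token sequence at block $\ell$, MSA multi-head self-attention, FFN the feed-forward network, and LN layer normalization (all notation illustrative):

$$ z'_\ell = z_{\ell-1} + \operatorname{MSA}(\operatorname{LN}(z_{\ell-1})), \qquad z_\ell = z'_\ell + \operatorname{FFN}(\operatorname{LN}(z'_\ell)). $$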
The U-Net decoder integrates skip connections and upsampled features to reconstruct the high-resolution frame. At each decoder level, the features are upsampled and fused with the corresponding encoder skip connection, and a final projection layer produces the output frame; a generic form of these steps is sketched below.
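Under the assumption of a standard U-Net layout, with $d_\ell$ the decoder features at level $\ell$, $s_\ell$ the matching encoder skip features, $\operatorname{Up}$ a 2× upsampling operator, and $\|$ channel concatenation (all notation illustrative):

$$ d_{\ell-1} = \operatorname{Conv}\big(\operatorname{Up}(d_\ell) \,\|\, s_{\ell-1}\big), \qquad \hat{I} = \operatorname{Conv}_{1\times 1}(d_0), $$

where $\hat{I}$ is the reconstructed high-resolution frame.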
Figure 2 presents qualitative simulation results. Owing to space constraints, Figs. 2(c) and 2(d) show different image samples rather than identical ones; this also demonstrates the generalization of the ETIS and NETIS strategies across diverse reconstruction scenarios. Figure 2(c) shows reconstructions with ETIS, and Fig. 2(d) shows results with NETIS. SPTSR-Net effectively suppresses noise and preserves fine motion details. Quantitatively, SPTSR-Net achieves a peak signal-to-noise ratio (PSNR) of 25.57 dB and an SSIM of 0.85, significantly outperforming baseline methods, as shown in Fig. 2(b). The insets highlight robustness in preserving intricate structures such as limb contours and ground textures. The model was trained for 150 epochs on a single NVIDIA RTX 4090 GPU.
For NETIS, the training objective distinguishes the current and future frames, adds a temporal consistency loss, and reduces motion blur via optical flow; one plausible form of such a composite loss is sketched below.
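With $\hat{I}_t$ the reconstruction, $I_t$ the ground truth, $\mathcal{W}$ an optical-flow warping operator, and $\lambda_{\mathrm{tc}}, \lambda_{\mathrm{f}}$ weighting coefficients (the decomposition and symbols are assumptions consistent with the description above):

$$ \mathcal{L} = \big\|\hat{I}_t - I_t\big\|_1 + \lambda_{\mathrm{tc}} \big\|\hat{I}_{t+1} - \mathcal{W}(\hat{I}_t)\big\|_1 + \lambda_{\mathrm{f}}\, \mathcal{L}_{\mathrm{flow}}. $$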
3. RESULTS
All high-speed imaging experiments were performed with an MPD-SPC3 SPAD array offering nanosecond-scale dead time and configurable integration timing. The specific timing parameters used in our experiments, such as the 20 ns reference clock period and the 50–150 ns selectable dead-time range, follow the manufacturer's hardware specifications for this device. In the non-equivalent time integration strategy, frames are acquired in rapid succession with only a 20 ns interval, while in the equivalent time integration strategy, the exposure can be shortened below the readout time to capture fast dynamics at the expense of additional dead time. Using these strategies, we captured and analyzed the high-speed events described below.
A. High-Speed Imaging and Temporal Enhancement of Fan Rotation
To demonstrate the capabilities of our temporal super-resolution imaging enhancement techniques for capturing rapid mechanical motion, we performed an experiment involving a rotating fan, as illustrated in Fig. 3(a). This setup consists of a high-speed camera and an illumination source aimed at the rapidly rotating fan. Capturing the position and structure of the fan blades at high speeds usually results in noisy and blurred images, making accurate analysis challenging. We applied two temporal processing strategies to enhance the captured data.
Figure 3. High-speed imaging of fan rotation with temporal enhancement. (a) Experimental setup. (b) Comparison of raw frames and reconstructions with ETIS and NETIS. Timestamps indicate the corresponding time; angles indicate the measured fan orientation.
The top row of Fig. 3(b) shows results with the ETIS method. Raw frames captured at specific time points, such as frame 0 at 0 μs and frame 1 at 10.4 μs, show significant noise due to the short exposure times. Although the shape of the fan is visible, the details remain blurred. The ETIS reconstruction at 5.2 μs, between the input frames, provides a clear improvement in image quality. Noise is greatly reduced, and the structure of the fan hub and blades appears much sharper. Quantitative tracking captures the change in relative angle from 70.99° to 65.52° over 10.4 μs in the first example, with the reconstruction yielding an intermediate angle of 67.86°. The enhanced clarity of the reconstruction enables more precise angle measurements than the noisy raw frames allow.
The bottom row of Fig. 3(b) shows results from the NETIS method. Raw frames recorded over a 10 μs interval display motion blur and noise. NETIS reconstructions captured at 5.2 μs and 10.4 μs suppress both noise and motion blur compared to the raw data and provide sharper details of the fan orientation. Measured angles of 71.34° at 5.2 μs and 67.35° at 10.4 μs in the first NETIS example contrast with 69.10° in the raw frame and demonstrate how the enhancement process extracts sharp temporal features from blurred input.
Both ETIS and NETIS enhance the quality of high-speed video frames of the rotating fan. These improvements enable clearer visualization and more accurate measurement of the angular position in rapid mechanical dynamics occurring on microsecond timescales.
B. High-Speed Imaging and Temporal Enhancement of Plasma Discharge
To evaluate the performance of our temporal super-resolution imaging strategies on highly dynamic phenomena, we conducted experiments capturing the arc discharge within a plasma ball. The experimental setup, shown in Fig. 4, involved imaging the plasma ball with SPAD arrays. The plasma ball generates transient discharge patterns from the central electrode that are challenging for conventional imaging due to their rapid changes and complex structures. We employed two different temporal strategies to improve the reconstruction of the discharge dynamics shown in Fig. 4.
Figure 4. High-speed imaging of plasma ball discharge with temporal super-resolution. (a) Experimental setup. (b) Raw frames and reconstructions with ETIS and NETIS. Timestamps denote the corresponding time.
Figure 4(b), top row, shows the ETIS results. Raw frames captured at frame 0 and frame 1, separated by 10.4 μs, show high noise levels and limited detail due to the short exposures. Frame 614 shows a very sparse signal, illustrating the photon-limited conditions when imaging such rapid events. The central reconstruction in each ETIS sequence is produced at 5.2 μs, between the input frames. These reconstructions reduce noise and sharpen spatial details, making the intricate branching of the plasma filaments visible and enabling effective observation of intermediate dynamics.
The bottom row of Fig. 4(b) illustrates the NETIS method. Raw frames recorded during 0–10 μs serve as the input. The first enhanced image represents data integrated up to 5.2 μs, providing a temporally super-resolved view of the discharge activity. The second corresponds to 10.4 μs, providing a sharper snapshot at the end of that interval. Both enhanced images reduce noise while preserving the dynamic morphology of the filaments, yielding a clearer view of the discharge.
Both ETIS and NETIS address the noise and low photon counts in the raw high-speed data. Their reconstructions provide much clearer visualizations of transient plasma discharge dynamics, enabling detailed analysis on microsecond timescales.
C. Temporal Super-Resolution Imaging of Fluorescence Quenching Dynamics
We conducted an experiment on fluorescence quenching in dyed microspheres to demonstrate the capabilities of fast fluorescence imaging under photon-limited conditions. We observed this process with a microscope that guided emission light onto SPAD arrays, as shown in Fig. 5(a). Because the fluorescence decay was relatively slow, we used each set of 30 raw data intervals as our inputs and employed the equivalent time integration reconstruction method. Two frames were used to record the brightness decay, and our objective was to recover the continuous intensity drop with improved temporal resolution and reduced noise.
Figure 5. Temporal–spatial super-resolution imaging of fluorescence quenching in microspheres. Here, F1 and F2 refer to the raw input frames Frame 1 and Frame 2, and "Recon." denotes a reconstructed frame. (a) Experimental setup schematic. (b) Comparative analysis, with an upper histogram showing the F2–F1 difference from the raw frames and the Recon.–F1 difference from the reconstructions, and a lower plot tracking the mean reconstructed intensity in regions A, B, and C for Recon. Frame 1, Recon. Frame 1.5, and Recon. Frame 2. (c) Raw frames from the SPAD arrays. (d) Recon. Frame 1 and Recon. Frame 2 are network outputs when Frame 1 and Frame 2 are used as inputs, respectively, while Recon. Frame 1.5 is reconstructed with both Frame 1 and Frame 2 as inputs. (e) Difference maps highlighting intensity change, with the F2–F1 difference on the left and the Recon.–F1 difference on the right.
Figure 5(c) presents the raw data. Each frame shows raw photon counts capturing the fluorescence quenching process. After applying our temporal super-resolution technique, the corresponding reconstructed sequence is shown in Fig. 5(d). The first and second reconstructed frames correspond directly to the two measurements, while the middle frame is interpolated from both. We effectively reduce noise, clearly reconstruct individual microspheres, and enable precise tracking of the fading signals in regions A, B, and C.
The plots in Fig. 5(b) illustrate these results. The histogram compares pixel intensity change between the two sampling measurements for raw and reconstructed data. The reconstructed distribution is narrower but centered on the same mean, which shows that the method preserves the quenching process. The lower graph tracks average reconstructed intensity in regions A, B, and C across three relative time indices. Figure 5(e) complements this analysis. The difference map from the raw frames is noisy, whereas the map from the reconstructed frames clearly shows spatial patterns of diminishing fluorescence. Negative values indicate areas where quenching is strong, demonstrating that the method achieves temporal super-resolution in terms of both numerical values and visual clarity.
In summary, our temporal super-resolution method significantly improves temporal resolution and reduces photon noise. This enables the reconstruction of dynamics and provides reliable statistics on fluorescence processes.
D. Evaluation of Geometric Accuracy and Measurement Precision
Figure 6 benchmarks the proposed method on three fundamental geometric reconstruction experiments, demonstrating improvements in temporal resolution between two synthetic noisy observations. The synthetic data here are generated through the physics-based simulation rather than captured from SPAD arrays. Figure 6(a) evaluates our method's ability to perform temporal super-resolution for motion along the depth axis. In this synthetic experiment, we use the square's vertical position in the image as a proxy for its depth, simulating the effect of the square moving closer to or farther from the camera. For instance, given noisy inputs in which the square's position is recorded at 28 and 32 pixels from the border, our method accurately reconstructs the intermediate frame at its ground-truth position of 30 pixels.
Figure 6. Evaluation of the proposed method on synthetic geometric shapes under severe noise. (a) Reconstructions of depth estimation for squares of varying sizes. (b) Reconstructions of translation for triangles with horizontally shifted bases. (c) Reconstructions of rotation for diamonds. Columns one, three, and five show noisy inputs with initial measurements, while columns two and four show reconstructions with sharper edges and better measurement accuracy.
Figure 6(b) evaluates the accuracy of horizontal translation. Initial noisy measurements indicate the horizontal location at 74, 70, and 66 pixels. The reconstructed frames accurately refine the horizontal positions to 72 and 68 pixels, clearly demonstrating the capability of the method to enhance temporal resolution and precisely reconstruct horizontal translation.
Figure 6(c) shows the accuracy of estimating rotation with diamond shapes. Initial noisy frames produce rotational angles of 65.64° and 78.53°. Our method accurately reconstructs these frames, providing corrected angles of 71.85° and 83.97°, respectively. This demonstrates the effectiveness of our approach in improving both temporal resolution and rotational accuracy in noisy conditions.
E. Ablation Studies
We conducted ablation experiments to evaluate the impact of transformer layer depth and temporal encoding strategies. Table 1 reports the performance measured by PSNR and SSIM. Four transformer configurations with varying layer depths and three temporal encoding methods were examined.

Table 1. Results of Ablation Studies on Transformer Layer Depth and Temporal Encoding

| Ablation Aspect | Configuration | PSNR (dB) | SSIM |
|---|---|---|---|
| Transformer layer depth | 1 layer | 25.0299 | 0.8362 |
| | 2 layers | 25.2172 | 0.8411 |
| | 4 layers | 25.5717 | 0.8489 |
| | 8 layers | 25.5739 | 0.8500 |
| Temporal encoding strategy | 3D encoder | 25.5717 | 0.8489 |
| | 2D Conv. | 25.5267 | 0.8466 |
| | Reshape | 25.3291 | 0.8448 |
Increasing the number of transformer layers improved reconstruction quality, with the PSNR rising from 25.03 dB for one layer to 25.57 dB for four layers. The corresponding SSIM also improved from 0.84 to 0.85. Further increasing to eight layers only slightly enhanced performance, suggesting limited benefits from deeper architectures beyond four layers.
Temporal encoding methods affected the results differently. The 3D encoder method produced the best results, achieving a PSNR of 25.57 dB and an SSIM of 0.85. The 2D convolution approach decreased the PSNR slightly to 25.53 dB, while the reshape method produced the lowest performance, with a PSNR of 25.33 dB. This demonstrates the effectiveness of explicit temporal modeling.
We attribute the robustness of our network to three architectural principles designed to process spatial–temporal information. First, an efficient temporal encoder with depthwise separable convolutions captures temporal dynamics without excessive computational cost. Second, our core fusion encoder implements a dual-path strategy, fusing these encoded temporal features with a parallel stream of raw input features. This fusion ensures that both high-level motion information and low-level spatial details are preserved for the transformer blocks. Finally, within the transformer encoder, we introduce an additional long-range skip connection to continuously provide the initial raw patch embeddings. This persistent access to low-level information serves as a strong regularizer, crucial for reconstructing sharp edges and textures that may not be fully captured by global metrics. Together, these design choices create a robust architecture that produces high-fidelity reconstructions.
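A compact PyTorch sketch of the dual-path fusion and long-range skip described above; the dimensions, the use of `nn.TransformerEncoderLayer`, and the per-block re-injection scheme are assumptions made for illustration, not the paper's exact implementation:

```python
import torch
import torch.nn as nn

class FusionEncoder(nn.Module):
    """Dual-path fusion with persistent access to raw patch embeddings."""

    def __init__(self, dim=64, num_blocks=4, nhead=8):
        super().__init__()
        # Fuse encoded temporal features with the parallel raw-feature stream.
        self.fuse = nn.Linear(2 * dim, dim)
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=dim, nhead=nhead, batch_first=True)
            for _ in range(num_blocks)
        )
        # Long-range skip: re-inject the initial raw embeddings after each block.
        self.reinject = nn.Linear(2 * dim, dim)

    def forward(self, temporal_tokens, raw_tokens):
        # Both inputs: (batch, tokens, dim).
        z = self.fuse(torch.cat([temporal_tokens, raw_tokens], dim=-1))
        for block in self.blocks:
            z = block(z)
            # Persistent low-level information acts as a regularizer,
            # helping downstream layers recover sharp edges and textures.
            z = self.reinject(torch.cat([z, raw_tokens], dim=-1))
        return z
```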
In summary, the most effective configuration combines a four-layer transformer with the 3D temporal encoding strategy, providing high-quality reconstruction while balancing complexity and performance.
4. CONCLUSION AND DISCUSSION
We present a physics-based temporal–spatial model combined with a transformer network trained on noise-calibrated simulations. We propose two integration strategies and achieve improvements in both temporal and spatial resolution. The average PSNR increases by 8.14 dB without any hardware modification. This approach tackles the challenge of the temporal cutoff in single-photon imaging, enabling enhanced fidelity and temporal and spatial resolution that significantly improve SPAD imaging performance. Our method could improve the performance of real-time single-photon microscopy [29], high-speed LiDAR [30,31], live-cell fluorescence imaging [32,33], and plasma diagnostics [34]. It increases robustness to SPAD array noise and provides a significant improvement over current reconstruction methods.
However, this implementation relies on training sequences with calibrated statistics. Performance may degrade if the photon flux falls far below the calibrated range. Reconstruction also introduces computational overhead during high-resolution video processing. Furthermore, dependence on simulated data may result in certain complexities of real scenes being ignored, which could lead to differences in performance [35].
Future work will refine the temporal–spatial integration strategy. We will investigate self-supervised learning using real photon data and extend the framework to multispectral, quantum, and autonomous sensing. Adaptive mechanisms that dynamically respond to fluctuating noise and photon flux will further enhance system robustness. Integrating the method with emerging SPAD technologies and diverse imaging platforms will broaden its applicability to multiple domains.
References
[4] A. Kirmani, D. Venkatraman, D. Shin. First-photon imaging. Science, 343, 58-61(2014).
[9] O. Kumagai, J. Ohmachi, M. Matsumura. 7.3 A 189 × 600 back-illuminated stacked SPAD direct time-of-flight depth sensor for automotive LiDAR systems. IEEE International Solid-State Circuits Conference (ISSCC), 110-112(2021).
[13] E. Sarbazi, M. Safari, H. Haas. The impact of long dead time on the photocount distribution of SPAD receivers. IEEE Global Communications Conference (GLOBECOM), 1-6(2018).
[21] M. Choi, H. Kim, B. Han. Channel attention is all you need for video frame interpolation. Proceedings of the AAAI Conference on Artificial Intelligence, 10663-10671(2020).
[22] L. Lu, R. Wu, H. Lin. Video frame interpolation with transformer. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3532-3542(2022).
[23] G. Zhang, Y. Zhu, H. Wang. Extracting motion and appearance via inter-frame attention for efficient video frame interpolation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5682-5692(2023).
[24] J. Liang, J. Cao, G. Sun. SwinIR: image restoration using Swin transformer. Proceedings of the IEEE/CVF International Conference on Computer Vision, 1833-1844(2021).
[25] M. Cheng, H. Ma, Q. Ma. Hybrid transformer and CNN attention network for stereo image super-resolution. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1702-1711(2023).
[34] D. Faccio, G. Gariepy, G. S. Buller. SPAD array imaging and applications: from laser plasma diagnostics to tracking objects behind a wall. Imaging and Applied Optics, LM3D.3(2015).
