Complex-valued universal linear transformations and image encryption using spatially incoherent diffractive networks

Xilin Yang; Md Sadman Sakib Rahman; Bijie Bai; Jingxi Li; Aydogan Ozcan

doi:10.1117/1.APN.3.1.016010

Note: This section is automatically generated by AI . The website and platform operators shall not be liable for any commercial or legal consequences arising from your use of AI generated content on this website. Please be aware of this.

Abstract

As an optical processor, a diffractive deep neural network (D²NN) utilizes engineered diffractive surfaces designed through machine learning to perform all-optical information processing, completing its tasks at the speed of light propagation through thin optical layers. With sufficient degrees of freedom, D²NNs can perform arbitrary complex-valued linear transformations using spatially coherent light. Similarly, D²NNs can also perform arbitrary linear intensity transformations with spatially incoherent illumination; however, under spatially incoherent light, these transformations are nonnegative, acting on diffraction-limited optical intensity patterns at the input field of view. Here, we expand the use of spatially incoherent D²NNs to complex-valued information processing for executing arbitrary complex-valued linear transformations using spatially incoherent light. Through simulations, we show that as the number of optimized diffractive features increases beyond a threshold dictated by the multiplication of the input and output space-bandwidth products, a spatially incoherent diffractive visual processor can approximate any complex-valued linear transformation and be used for all-optical image encryption using incoherent illumination. The findings are important for the all-optical processing of information under natural light using various forms of diffractive surface-based optical processors.

Keywords

diffractive neural networks diffractive optical networks image encryption machine learning optical computing optical networks

1 Introduction

The recent resurgence of analog optical information processing has been spurred by advancements in artificial intelligence (AI), especially deep-learning-based inference methods.1^–9 These advances in data-driven learning methods have also benefited optical hardware engineering, giving rise to new computing architectures such as diffractive deep neural networks ( $D^{2} NN$ ), which exploit the passive interaction of light with spatially engineered surfaces to perform visual information processing. $D^{2} NNs$ , also referred to as diffractive optical networks, diffractive networks, or diffractive processors, have emerged as powerful all-optical processors9^,10 capable of completing various visual computing tasks at the speed of light propagation through thin passive optical devices; examples of such tasks include image classification,11^–13 information encryption,14^–17 and quantitative phase imaging (QPI),18^,19 among others.20^–24 Diffractive optical networks comprise a set of spatially engineered surfaces, the transmission (and/or reflection) profiles of which are optimized using machine-learning techniques. After their digital optimization (a one-time effort), these diffractive surfaces are fabricated and assembled in 3D to form an all-optical visual processor, which axially extends at most a few hundred wavelengths ( $λ$ ).

Our earlier work10^,25 demonstrated that a spatially coherent $D^{2} NN$ can perform arbitrary complex-valued linear transformations between a pair of arbitrary input and output apertures if its design has a sufficient number ( $N$ ) of diffractive features that are optimized, i.e., $N \geq N_{i} N_{o}$ , where $N_{i}$ and $N_{o}$ represent the space-bandwidth product of the input and output apertures, respectively. In other words, $N_{i}$ and $N_{o}$ represent the size of the desired complex-valued linear transformation $A \in C^{N_{o} \times N_{i}}$ that can be all-optically performed by an optimized $D^{2} NN$ . For a phase-only diffractive network, i.e., only the phase profile of each diffractive layer is trainable, the sufficient condition becomes $N \geq 2 N_{i} N_{o}$ due to the reduced degrees of freedom within the diffractive volume. Similar conclusions can be reached for a diffractive network that operates under spatially incoherent illumination: Rahman et al.26 demonstrated that a diffractive network can be optimized to perform an arbitrary nonnegative linear transformation of optical intensity through phase-only diffractive processors with $N \geq 2 N_{i} N_{o}$ . However, encoding information with spatially incoherent light inherently confines both the input and output to nonnegative values, as they are represented by intensity patterns at the input and output apertures of a $D^{2} NN$ . To process complex-valued data with spatially incoherent light, other optical approaches were also developed;1^,27^–29 however, these earlier systems are limited to one-dimensional (1D) optical inputs and do not cover arbitrary input and output apertures, limiting their functionality and processing throughput. An extension of these earlier 1D input approaches introduced the processing of 2D incoherent source arrays using relatively bulky and demanding optical projection systems that are hard to operate at the diffraction limit of light.30^,31

Here, we demonstrate the processing of complex-valued data with compact diffractive optical networks under spatially incoherent illumination. We show that a spatially incoherent diffractive network that axially spans $< 100 \times λ$ can perform any arbitrary complex-valued linear transformation on complex-valued input data with negligible error if the number of optimizable diffractive features is above a threshold dictated by the multiplication of the input and output space-bandwidth products, determined by both the spatial extent and the pixel size of the input and output apertures. To represent complex-valued spatial information using spatially incoherent illumination, we preprocessed the input information by mapping complex-valued data to a real and nonnegative, optical intensity-based representation at the input field of view (FOV) of the diffractive network. We term this mapping the “mosaicking” operation, indicating the utilization of multiple intensity pixels at the input FOV to represent one complex-valued input data point. Similarly, we used a postprocessing step, which involved mapping the output FOV intensity patterns back to the complex number domain, which we termed the “demosaicking” operation. Through these mosaicking/demosaicking operations, we show that a spatially incoherent $D^{2} NN$ can be optimized to perform an arbitrary complex-valued linear transformation between its input and output apertures while providing optical information encryption. The presented spatially incoherent visual information processor, with its universality and thin form factor ( $< 100 \times λ$ ), shows significant promise for image encryption and computational imaging applications under natural light.

Sign up for Advanced Photonics Nexus TOC. Get the latest issue of Advanced Photonics Nexus delivered right to you！Sign up now

2 Results

Figure 1(a) outlines a spatially incoherent $D^{2} NN$ architecture to synthesize an arbitrary complex-valued linear transformation ( $A$ ) such that $o = A i$ , where the input is $i \in C^{N_{i}}$ , the target is $o \in C^{N_{o}}$ and $A \in C^{N_{o} \times N_{i}}$ . The mosaicking process involves finding the nonnegative (optical intensity-based) representation of each complex-valued element of $i$ using $E$ nonnegative values; here, $E$ bases, $e_{k}$ , $k = 0, \dots, E - 1$ [see Fig. 1(c)], are used for representing the intensity-based encoding of complex numbers. Based on this representation, the 2D input aperture of a spatially incoherent $D^{2} NN$ will have $E N_{i}$ nonnegative (optical intensity) values, denoted as $i_{r} \in R_{+}^{E N_{i}}$ , representing the input information under spatially incoherent illumination. The output intensity distribution, denoted with $\hat{o_{r}} \in R_{+}^{E N_{o}}$ , undergoes a demosaicking process where a complex number is synthesized from the intensity values of $E$ output pixels, yielding the complex output vector $\hat{o} \in C^{N_{o}}$ such that $\hat{o} \approx A i$ .

$(a) Complex-valued universal linear transformations using spatially incoherent diffractive optical networks. (b) Amplitude and phase of the target complex-valued linear transformation. (c) Mosaicking and demosaicking processes. (d)–(e) Image encryption. (d) Complex-valued images are digitally encrypted (A−1), and subsequently decrypted using the diffractive system that performs A (diffractive key). (e) The encryption is performed through the spatially incoherent diffractive network (diffractive lock), and the decryption is performed digitally (digital key).$

Figure 1.(a) Complex-valued universal linear transformations using spatially incoherent diffractive optical networks. (b) Amplitude and phase of the target complex-valued linear transformation. (c) Mosaicking and demosaicking processes. (d)–(e) Image encryption. (d) Complex-valued images are digitally encrypted ( $A^{- 1}$ ), and subsequently decrypted using the diffractive system that performs $A$ (diffractive key). (e) The encryption is performed through the spatially incoherent diffractive network (diffractive lock), and the decryption is performed digitally (digital key).

In our analyses, we used $E = 3$ , except in Fig. S5 in the Supplementary Material, where $E = 4$ results are shown for comparison. We chose the basis complex numbers as $e_{k} = \exp (j k \frac{2 π}{E})$ , $k = 0, \dots, E - 1$ such that the set of bases $S$ is closed under multiplication, and the product of any two of the bases in the set is also a basis; for example, for $E = 3$ we have $e_{k} e_{l} = e_{(k + l \mod 3)}$ . Based on this representation of information, with $E = 3$ and $e_{0}$ , $e_{1}, e_{2}$ , we can decompose any arbitrarily selected complex-valued transformation matrix $A$ into $E = 3$ matrices ( $A_{0}$ , $A_{1}$ , $A_{2}$ ) with real nonnegative entries such that $A = e_{0} A_{0} + e_{1} A_{1} + e_{2} A_{2} .$ (1)

For a given complex-valued input $i = e_{0} i_{0} + e_{1} i_{1} + e_{2} i_{2}$ , where $i_{k} \in R_{+}$ , the corresponding target output vector can be written as $o = A i = (e_{0} A_{0} + e_{1} A_{1} + e_{2} A_{2}) (e_{0} i_{0} + e_{1} i_{1} + e_{2} i_{2}),$ (2) $o = e_{0} (A_{0} i_{0} + A_{2} i_{1} + A_{1} i_{2}) + e_{1} (A_{1} i_{0} + A_{0} i_{1} + A_{2} i_{2}) + e_{2} (A_{2} i_{0} + A_{1} i_{1} + A_{0} i_{2}),$ (3)i.e., we have $o_{r} = [\begin{matrix} o_{0} \\ o_{1} \\ o_{2} \end{matrix}] = [\begin{matrix} A_{0} & A_{2} & A_{1} \\ A_{1} & A_{0} & A_{2} \\ A_{2} & A_{1} & A_{0} \end{matrix}] [\begin{matrix} i_{0} \\ i_{1} \\ i_{2} \end{matrix}] = A_{r} i_{r},$ (4)with a nonnegative real-valued matrix $A_{r}$ $A_{r} = [\begin{matrix} A_{0} & A_{2} & A_{1} \\ A_{1} & A_{0} & A_{2} \\ A_{2} & A_{1} & A_{0} \end{matrix}] .$ (5)

For $E = 4$ , where $e_{k} e_{l} = e_{(k + l \mod 4)}$ and $A = e_{0} A_{0} + e_{1} A_{1} + e_{2} A_{2} + e_{3} A_{3}$ , a similar analysis yields $A_{r} = [\begin{matrix} A_{0} & A_{2} & A_{3} & A_{1} \\ A_{2} & A_{0} & A_{1} & A_{3} \\ A_{1} & A_{3} & A_{0} & A_{2} \\ A_{3} & A_{1} & A_{2} & A_{0} \end{matrix}] .$ (6)

Based on these equations, one can conclude that to all-optically implement an arbitrary complex-valued transformation $o = A i$ using a spatially incoherent $D^{2} NN$ , the layers of the $D^{2} NN$ need to be optimized to perform an intensity linear transformation $A_{r} \in R_{+}^{E^{2} N_{i} N_{o}}$ such that $o_{r} = A_{r} i_{r}$ . The entire system, upon convergence, performs the predefined complex-valued linear transformation $A$ on any given input data using spatially incoherent light, based on Eqs. (2) and (4). In the following sections, we numerically explore the number of optimizable diffractive features ( $N$ ) needed for accurate approximation of $A$ using a spatially incoherent $D^{2} NN$ .

2.1 Complex-Valued Linear Transformations through Spatially Incoherent Diffractive Networks

We numerically demonstrated the capabilities of diffractive optical processors to universally perform any arbitrarily chosen complex-valued linear transformation with spatially incoherent light. Throughout the paper, we used $N_{i} = N_{o} = 16$ . To visually represent the data, we rearranged the 16-element vectors into $4 \times 4$ arrays of complex numbers, hereafter referred to as the “complex image.” We arbitrarily selected a desired complex-valued transformation, $A \in C^{16 \times 16}$ , as shown in Fig. 1(b).

To explore the number of diffractive features needed, we trained nine models with varying values of $N$ and evaluated the mean-squared-error (MSE) between the numerically measured ( $\hat{A_{r}}$ ) and the target all-optical linear transformation, $A_{r}$ (see Fig. 2). Our results, summarized in Fig. 2, highlight that with a sufficient number of optimizable diffractive features, i.e., $N \geq 2 E^{2} N_{i} N_{o} = 2 N_{i, r} N_{o, r}$ , our system achieves a negligible approximation error with respect to the target $A_{r} \in R_{+}^{48 \times 48}$ . In Fig. 2(c), we also visualize the resulting all-optical intensity transformation $\hat{A_{r}}$ compared to the ground truth $A_{r}$ . In essence, this comparison reveals the spatially varying incoherent point spread functions (PSFs) of our diffractive system optimized using deep learning; a negligible MSE between $\hat{A_{r}}$ and $A_{r}$ shows that the resulting spatially varying incoherent PSFs match the target set of PSFs dictated by $A_{r}$ .

$Performance of spatially incoherent diffractive networks on arbitrary complex-valued linear transformations. (a) The all-optical linear transformation error as a function of the number of diffractive features (N). The red dot represents the design corresponding to the results shown in (b)–(d). (b) The phase profiles of the K=4 diffractive layers of the optimized model (N=2×2Ni,rNo,r). (c) Evaluation of the resulting all-optical intensity transformation, i.e., the spatially varying PSFs. (d) The complex linear transformation evaluation. For εr and ε, |·|2 represents an element-wise operation.$

Figure 2.Performance of spatially incoherent diffractive networks on arbitrary complex-valued linear transformations. (a) The all-optical linear transformation error as a function of the number of diffractive features ( $N$ ). The red dot represents the design corresponding to the results shown in (b)–(d). (b) The phase profiles of the $K = 4$ diffractive layers of the optimized model ( $N = 2 \times 2 N_{i, r} N_{o, r}$ ). (c) Evaluation of the resulting all-optical intensity transformation, i.e., the spatially varying PSFs. (d) The complex linear transformation evaluation. For $ε_{r}$ and $ε$ , $| {\cdot |}^{2}$ represents an element-wise operation.

We also evaluated the numerical accuracy of our complex-valued transformation in an end-to-end manner, as illustrated in Fig. 2(d). For this numerical test, we sequentially set each entry of $i$ to $e_{0}$ , evaluated the corresponding complex output $\hat{o}$ , and stacked them to form $\hat{A_{0}}$ , where the subscript represents that the measurement was evaluated using the complex impulse along the basis $e_{0}$ as input. Then, we repeated this process for the other two bases to obtain $\hat{A_{1}}$ and $\hat{A_{2}}$ , and stacked these matrices as a block matrix $[\hat{A_{0}} | \hat{A_{1}} | \hat{A_{2}}]$ , shown in Fig. 2(d). Each row of the images $amp (\hat{o})$ and $phase (\hat{o})$ in Fig. 2(d) represents one of these complex output vectors, while the corresponding target vectors are presented in the same figure through $amp (o)$ and $phase (o)$ . The small magnitude of the error $ε = {| \hat{o} - o |}^{2}$ shown in Fig. 2(d) illustrates the success of this spatially incoherent $D^{2} NN$ model in accurately approximating the complex-valued linear transformation $o = A i$ , implemented for an arbitrarily selected $A$ .

2.2 Complex Number-based Image Encryption Using Spatially Incoherent Diffractive Networks

In this section, we demonstrate a complex number-based image encryption–decryption scheme using a spatially incoherent $D^{2} NN$ . In the first scheme, shown in Fig. 1(d), the message is encoded into a complex image, employing either amplitude and phase encoding or real and imaginary part encoding. Then, a digital lock encrypts the image by applying a linear transformation ( $A^{- 1}$ ) to conceal the original message within the image. At the optical receiver, the encrypted message is deciphered by an optimized incoherent $D^{2} NN$ that all-optically implements the inverse transformation $A$ . In an alternative scheme, as depicted in Fig. 1(e), the key and lock are switched, i.e., the spatially incoherent $D^{2} NN$ is used to encrypt the message with a complex-valued $A$ while the decryption step involves the digital inversion using $A^{- 1}$ .

For our analysis, we used the letters “U,” “C,” “L,” and “A” as sample messages. “U” and “C” are used in amplitude-phase-based encoding (Fig. 3), whereas “L” and “A” are used for real-imaginary-based encoding of information (Fig. S1 in the Supplementary Material), forming complex-number-based images. To accurately model the spatially incoherent propagation26 of light through the $D^{2} NN$ , we averaged the output intensities over a large number of $N_{φ} = 20,000$ of randomly generated 2D phase profiles at the input (see Sec. 4 for details).

Figure 3.Image encryption with the letters “U” and “C” encoded into amplitude and phase, respectively, of the complex-valued image. (a) The input, target, output, and the approximatn error, both in complex and real nonnegative (intensity) domains. The original information is represented by $o$ , while $i$ is obtained by digital encrypting $o$ following Fig. 1(d). (b) The input, output (resulting from optical encryption), and digitally decrypted output and the error between the input and the decrypted output. The result of digital decryption matches the input information. The second row shows the corresponding input, target, and output intensities and the approximation error. ${| \cdot |}^{2}$ represents an element-wise operation.

In Fig. 3(a), we show the results corresponding to digital encryption and optical diffractive decryption, i.e., the system shown in Fig. 1(d). The digitally encrypted complex information $i = A^{- 1} o$ , and its intensity representation $i_{r}$ , are shown in Fig. 3(a). The optically decrypted output $\hat{o}$ (through the spatially incoherent $D^{2} NN$ ) and its intensity-based representation $\hat{o_{r}}$ are shown in the same Fig. 3(a), together with the resulting error maps, i.e., ${| \hat{o} - o |}^{2}$ and ${| \hat{o_{r}} - o_{r} |}^{2}$ , which reveal a very small degree of error. This agreement of the recovered and the ground-truth messages in both the intensity and complex-valued domains confirms the accuracy of the diffractive decryption process through an optimized spatially incoherent $D^{2} NN$ . Figure 3(b) shows the successful performance of the sister scheme [Fig. 1(e)], which involves diffractive encryption through a spatially incoherent $D^{2} NN$ and digital decryption, also revealing a negligible amount of error in both ${| A^{- 1} \hat{o} - i |}^{2}$ and ${| \hat{o_{r}} - o_{r} |}^{2}$ . As reported in Fig. S1 in the Supplementary Material, we also conducted a numerical experiment using the letters “L” and “A,” encoded using the real and imaginary parts of the message. The visualizations are arranged the same way as in Fig. 3, where for both schemes depicted in Figs. 1(d) and 1(e), the degree of error between the recovered and the original messages is negligible, affirming the success of using the real and imaginary part-based encoding method. For the assessment of the approximation errors when the number of diffractive features is smaller, we compared the decryption performance of three models with different numbers of diffractive features/neurons, i.e., $N = (0.5, 0.7, 2) \times 2 E^{2} N_{i} N_{o}$ , for the same setup outlined in Fig. S1(a) in the Supplementary Material. The results are summarized in Fig. S2 in the Supplementary Material: for models with $N < 2 E^{2} N_{i} N_{o}$ , the decryption quality is compromised, exhibiting a pixel absolute error of $> 0.1$ . However, this error reduces to $< 0.05$ for $N = 4 E^{2} N_{i} N_{o}$ where the decrypted images display significantly enhanced contrast and reduced noise levels.

To further evaluate the efficacy of our encryption method, we analyzed the complex image entropy, examining both the real and imaginary components separately (refer to the Sec. 4 for details). The original image $i$ , $D^{2} NN$ encrypted output $\hat{o}$ , and the digitally encrypted output $A i$ , along with the corresponding image entropies, are shown in Fig. S3(a) in the Supplementary Material for two complex image examples. We repeated this analysis for a set of 1000 complex images with the resulting entropy distributions reported in Fig. S3(b) in the Supplementary Material. These results demonstrate that the entropy of the encrypted images is statistically higher than that of the original images. This increase in entropy signifies a heightened level of randomness within the encrypted images, thereby validating the effectiveness of our encryption process. In addition, the entropy distributions of the $D^{2} NN$ encrypted images show excellent agreement with the digitally encrypted corresponding images, further demonstrating the success of our spatially incoherent optical encryption scheme.

2.3 Different Mosaicking and Demosaicking Schemes in a Spatially Incoherent $D^{2} NN$

How we assign each element in the vector $i_{r}$ and $o_{r}$ to the pixels at the input and output FOVs of the diffractive network does not affect the final accuracy of the image/message reconstruction. For example, we can arrange the FOVs in such a manner that the components $i_{r, k}$ corresponding to a basis $e_{k}$ are assigned to the neighboring pixels, in two adjacent rows, as shown in Fig. S4(a) in the Supplementary Material; in an alternative implementation, the assignment/mapping can be completely arbitrary, which is equivalent to applying a random permutation operation on the input and output vectors (see Sec. 4). When compared to each other, these two approaches of mosaicking and demosaicking schemes show negligible differences in the error of the final reconstruction of the letters “U,” “C,” “L,” and “A” as shown in Fig. S4(b) in the Supplementary Material. These results underscore that the specific arrangement of the mosaicking/demosaicking schemes at the input and output FOVs does not impact the performance of the incoherent $D^{2} NN$ system.

3 Discussion and Conclusion

In this article, we employed a data-free PSF-based $D^{2} NN$ optimization method (see Sec. 4),26 since we can determine the nonnegative intensity transformation $A_{r}$ from the target complex-valued transformation $A$ based on the mosaicking and demosaicking schemes; the columns of $A_{r}$ represent the desired spatially varying PSFs of the $D^{2} NN$ . The advantage of this data-free learning-based $D^{2} NN$ optimization approach is that computationally demanding simulation of wave propagation with large $N_{φ}$ is not required during the training. Coherent propagation is appropriate for simulating the spatially varying PSFs, point by point, since a point emitter at the input aperture coherently interferes with itself during optical diffraction within a $D^{2} NN$ ; this approach makes the training time much shorter. On the other hand, this approach necessitates prior knowledge of $A_{r}$ , which might not always be available, e.g., for tasks such as data classification. An alternative to this data-free PSF-based optimization approach is to train the diffractive network in an end-to-end manner, using a data-driven direct training approach.26 This strategy advances by minimizing the differences between the outputs and the targets on a large number of randomly generated examples, thereby learning the spatially varying PSFs implicitly from numerous input-target intensity patterns corresponding to the desired task – instead of learning from an explicitly predetermined $A_{r}$ . This direct approach, however, requires a longer training time, necessitating the simulation of incoherent propagation for each training sample on a large data set.

In our presented approach, the choice of $E$ is not restricted to $E = 3$ , as we have used throughout the main text. As another example of encoding, we show the image encryption results with $E = 4$ in Fig. S5 in the Supplementary Material, where the four bases are $\exp (j \frac{π}{2} k) (k = 0,1, 2,3)$ . The reconstructed “U,” “C,” “L,” and “A” letters are also reported in the same figure, confirming that given sufficient degrees of freedom (with $N \geq 2 E^{2} N_{i} N_{o}$ ), the linear transformation performances are similar to each other. However, compared to $E = 3$ , this choice of $E = 4$ necessitates 4/3 times more pixels on both the diffractive network input and output FOVs—reducing the throughput (or spatial density) of complex-valued linear transformations that can be performed using a spatially incoherent $D^{2} NN$ . Accordingly, more diffractive features and a larger number of independent degrees of freedom (by 16/9-fold) are required within the $D^{2} NN$ volume to achieve an output performance level that is comparable to a design with $E = 3$ . Note that while $E \geq 3$ is sufficient to reconstruct the original complex-valued images regardless of the image complexity, the redundancy provided by larger $E$ values might offer increased resilience against noise at the cost of reducing the image-processing throughput (per input aperture area) with larger $E$ .

Our framework offers several flexibilities in implementation, which could be useful for different applications. First, the flexibility to arbitrarily permute the input and the output pixels following different mosaicking and demosaicking schemes (as introduced earlier in Sec. 2) could enhance the security of optical information transmission. A user would not be able to either spam or hack valuable information that is transferred optically without specific knowledge of the mosaicking and demosaicking schemes, thus ensuring the security of this scheme. Note that this enhancement in security is achieved without adding complexity to the system by just permuting the assignment of data elements to the pixels of the input and output devices, e.g., spatial light modulators (SLMs) and complementary metal-oxide-semiconductor (CMOS) detector-arrays. Second, the flexibility in choosing $E$ , as discussed above, could be useful in adding an extra layer of security against unauthorized access, albeit with a trade-off in system throughput that comes with larger $E$ . Furthermore, we can use different sets of bases for mosaicking and demosaicking operations by applying offset phase angles $θ_{i}$ and $θ_{o}$ , respectively, to the original bases $e_{k} = \exp (j k \frac{2 π}{E})$ , $k = 0, \dots, E - 1$ . This will result in a set of modified/encrypted bases: $e_{k, i} = \exp [j (k \frac{2 π}{E} + θ_{i})]$ for mosaicking and $e_{k, o} = \exp [j (k \frac{2 π}{E} + θ_{o})]$ for demosaicking. This powerful flexibility in representation further enhances the security of the system.

Regarding image encryption-related applications, we demonstrated two approaches [Figs. 1(d) and 1(e)] to utilize $D^{2} NNs$ for encryption or decryption. However, it is also possible to deploy a pair of diffractive systems in tandem, with one undertaking the matrix operation $A$ for encryption and the other undertaking the inverse operation $A^{- 1}$ for decryption. Furthermore, potential extensions of our work could explore a harmonized integration of polarization state controls32 and wavelength multiplexing33 to build a multifaceted, fortified encryption platform. In addition to increasing the data throughput, these additional degrees of freedom enabled by different illumination wavelengths and polarization states would further enhance the security of a diffractive processor-based system.

In this work, we focused on the numerical analysis of the presented concept. However, we should note that various $D^{2} NNs$ designed using deep-learning-based approaches have been experimentally validated over different parts of the electromagnetic spectrum, e.g., from terahertz (THz)9^,14 to near-infrared (NIR)15 and visible wavelengths,24 showing a good agreement between the numerical and experimental results. To address some of the experimental challenges associated with fabrication errors and mechanical misalignments, a “vaccination” strategy34^,35 has been devised. This approach enhances the robustness of the diffractive optical designs by incorporating such aberrations/imperfections as random variables during the training phase, thereby preparing the system to better withstand and adapt to the uncertainties inherent in real-world experimental conditions.

Although spatially coherent light is more suitable for complex-valued information processing in laboratory settings, the use of spatially incoherent light offers various practical advantages. For example, speckle noise, which is inevitable in coherent systems, can be suppressed by using partially or fully incoherent illumination. An additional benefit of spatially incoherent designs is the range of viable illumination sources that can be used: instead of using specialized coherent sources, a spatially incoherent system can work with standard light-emitting diodes (LEDs), or even under natural light, which is important for some applications of diffractive information processing.

To conclude, we demonstrated the capability of spatially incoherent diffractive networks to perform arbitrary complex-valued linear transformations. By incorporating various forms of mosaicking and demosaicking operations, we paved the way for a wider array of applications by leveraging incoherent $D^{2} NNs$ for complex-valued data processing. We also showcased potential applications of these spatially incoherent $D^{2} NNs$ for complex number-based image encryption or decryption, highlighting the security benefits arising from the system’s flexibility. Our exploration marks a significant stride toward enhanced versatility and robustness in optical information processing with spatially incoherent diffractive systems that can work under natural light.

4 Appendix: Methods

4.1 Linear Transformation Matrix

In this paper, we use $N_{i} = N_{o} = 16$ so that $A \in C^{16 \times 16}$ ; see Fig. 1(b). To generate $A$ , we randomly sample the amplitude of each element from the uniform distribution $U n i f o r m (0, 1)$ and the phases from $U n i f o r m (0, 2 π)$ . For the encryption application, to ensure that the result of inversion is not sensitive to small errors, we performed QR-factorization on $A$ to obtain a condition number of one.36

4.2 Real-Valued Nonnegative Representation of Complex Numbers

Following Eq. (4), the complex-valued input and target vectors $i \in C^{N_{i}}$ and $o \in C^{N_{o}}$ are represented by the corresponding real and nonnegative intensity vectors $i_{r} = [\begin{matrix} i_{0}^{T} & \dots & i_{E - 1}^{T} \end{matrix}]^{T} \in R_{+}^{E N_{i}}$ and $o_{r} = [\begin{matrix} o_{0}^{T} & \dots & o_{E - 1}^{T} \end{matrix}]^{T} \in R_{+}^{E N_{o}}$ , where $i = \sum_{k = 0}^{E - 1} e_{k} i_{k}$ and $o = \sum_{k = 0}^{E - 1} e_{k} o_{k}$ . The desired all-optical intensity transformation $A_{r}$ between $i_{r}$ and $o_{r}$ is derived from the target complex-valued linear transformation $A$ following Eqs. (1) and (5). We should note that deriving $A_{r}$ from $A$ requires mapping each complex element $a$ to its real and nonnegative representation $(a_{0}, \dots, a_{E - 1})$ based on the $E \geq 3$ complex bases $e_{k}$ such that $a = \sum_{k = 0}^{E - 1} e_{k} a_{k}$ . To define a unique mapping, we follow an algorithm29 by imposing additional constraints: $a_{k} = 0$ if $\frac{2 π}{E} \leq phase (e_{k} a^{*}) \leq 2 π - \frac{2 π}{E}$ , i.e., $a_{k} = 0$ if the angle between $a$ and $e_{k}$ is greater than $\frac{2 π}{E}$ ; here $a^{*}$ represents the complex conjugate of $a$ . The same constraints were also used while mapping the complex input vectors $i$ to the real and nonnegative intensity vectors $i_{r}$ .

4.3 Mosaicking and Demosaicking Schemes

For mosaicking (demosaicking) assignment of each element of $i_{r}$ ( $o_{r}$ ) to one of the $N_{i, r} = E N_{i}$ ( $N_{o, r} = E N_{o}$ ) pixels of the 2D input (output), the arrangement of the FOV can be regular, e.g., in a row-major order as shown in Fig. S4(a) in the Supplementary Material, “Regular mosaicking.” Alternatively, the pixel assignment on the input (output) FOV can follow any arbitrary mapping which can be defined by a permutation matrix $P_{i}$ ( $P_{o}$ ) operating on the input (output) vector; see Fig. S4(a) in the Supplementary Material, “Arbitrary mosaicking.” For such cases, when ordered in a row-major format, intensities on the input (output) FOVs $i_{r}$ ( $o_{r}$ ) can be written as $i_{r} = P_{i} [\begin{matrix} i_{0}^{T} & \dots & i_{E - 1}^{T} \end{matrix}]^{T}$ ( $o_{r} = P_{o} [\begin{matrix} o_{0}^{T} & \dots & o_{E - 1}^{T} \end{matrix}]^{T}$ ). Accordingly, such an arbitrary arrangement of pixels was accounted for by redefining the all-optical intensity transformation as $P_{o} A_{r} P_{i}^{T}$ .

4.4 Spatially Incoherent Light Propagation through a $D^{2} NN$

The 1D vector $i_{r}$ is rearranged as a 2D distribution of intensity $I (x, y)$ at the input FOV of the $D^{2} NN$ . To numerically model the spatially incoherent propagation of the input intensity distribution $I (x, y)$ through the $D^{2} NN$ , we coherently propagated the optical field $\sqrt{I} \exp (j φ)$ through the trainable diffractive surfaces to the output plane, where $φ$ is a random 2D phase distribution, i.e., $φ (x, y) \sim U n i f o r m (0, 2 π)$ for all $(x, y)$ . If we denote the coherent field propagation operator as $D {\cdot}$ (see Sec. 4.5), then the instantaneous output intensity is $| D {\sqrt{I (x, y)} \exp [j φ (x, y)]} |^{2}$ and the time-averaged output intensity $O (x, y)$ for spatially incoherent light can be written as $O (x, y) = ⟨ {| D {\sqrt{I (x, y)} \exp [j φ (x, y)] |}^{2} ⟩ .$ (7)

The average output intensity can be approximately calculated by repeating the coherent wave propagation $D {\cdot}$ $N_{φ}$ -times, each time with a different random phase distribution $φ_{r} (x, y)$ , and averaging the resulting $N_{φ}$ output intensities, $O (x, y) = \lim_{N_{φ} \to \infty} \frac{1}{N_{φ}} \sum_{r = 1}^{N_{φ}} {| D {\sqrt{I (x, y)} \exp [j φ_{r} (x, y)]} |}^{2} .$ (8)

We used $N_{φ} = 20,000$ for estimating the incoherent output intensity $O (x, y)$ corresponding to any arbitrary input intensity $I (x, y)$ . Note that when only one pixel at the input aperture is activated, with all other input pixels being inactive with zero intensity, as is the case while evaluating spatially varying PSFs, the application of Eq. (8) becomes redundant, although one could still use it. In this scenario, all the light diffracted from a single point source is mutually coherent. Consequently, for the purposes of evaluating the spatially varying PSFs of the system, as elaborated later in Sec. 4.7, employing a coherent propagation model for each point emitter at the input aperture is accurate and provides a faster solution.

4.5 Coherent Propagation of Optical Fields: $D {\cdot}$

The propagation of spatially coherent light patterns through a diffractive processor, denoted by $D {\cdot}$ , involves a series of interactions with consecutive diffractive surfaces, interleaved by wave propagation through the free space separating these surfaces. We assume that these modulations are introduced by phase-only diffractive surfaces, i.e., the field amplitude remains unchanged during the light–matter interaction. Specifically, we assume that a diffractive surface alters the incident optical field, symbolized as $u (x, y)$ , in a localized manner according to the optimized phase values $ϕ_{M} (x, y)$ of the diffractive features, resulting in the phase-modulated field $u (x, y) \exp [j φ_{M} (x, y)]$ . The diffractive surfaces are coupled by free-space propagation, allowing the light to travel from one surface to the next. We used the angular spectrum method to simulate the free-space propagation,37 $u (x, y; z = z_{0} + d) = F^{- 1} {F {u (x, y; z = z_{0})} \times H (f_{x}, f_{y}; d)},$ (9)where $F {\cdot}$ is the 2D Fourier transform and $F^{- 1} {\cdot}$ is its inverse operation. $H (f_{x}, f_{y}; d)$ is the free-space transfer function corresponding to propagation distance $d$ . For wavelength $λ$ , $H (f_{x}, f_{y}; d) = {\begin{cases} \exp [j \frac{2 π}{λ} d \sqrt{1 - {(λ f_{x})}^{2} - {(λ f_{y})}^{2}}], & f_{x}^{2} + f_{y}^{2} < 1 / λ^{2} \\ 0, & otherwise \end{cases} .$ (10)

The fields were discretized with a lateral sampling interval of $δ \approx 0.53 λ$ to accommodate all the propagating modes and sufficiently zero-padded to remove aliasing artifacts.38

4.6 Diffractive Network Architecture

We modeled the diffractive surfaces by their laterally discretized heights $h$ , which correspond to phase delays $φ_{M} = \frac{2 π}{λ} (n - 1) h$ , where $n$ is the refractive index of the material. The connectivity between consecutive diffractive layers9 was kept equal across the diffractive designs with varying $N$ by setting the separation between the layers as $d = \frac{W δ}{λ}$ , where the width of each diffractive layer is $W = \sqrt{\frac{N}{K}} δ$ . Here, $K$ is the number of diffractive layers; we used $K = 4$ throughout the paper.

4.7 Training and Evaluation of Spatially Incoherent Diffractive Processors

For performing an arbitrary complex-valued linear transformation with a diffractive processor, we used the PSF-based data-free design approach, where the diffractive features were optimized so that the all-optical intensity transformation of the diffractive processor achieves $\hat{A_{r}} \approx A_{r}$ . To evaluate $\hat{A_{r}}$ , we used $E N_{i}$ intensity vectors ${i_{r, t}}_{t = 1}^{E N_{i}}$ , where $i_{r, t} [l] = 1$ if $l = t$ and 0 otherwise. In other words, ${i_{r, t}}_{t = 1}^{E N_{i}}$ are unit impulses, located at different input pixels. We simulated the all-optical output intensity vectors ${o_{r, t}^{'}}_{t = 1}^{E N_{i}}$ corresponding to these unit impulses and stacked them, i.e., $\hat{A_{r}^{'}} = [o_{r, 1}^{'} | o_{r, 2}^{'} | \dots | o_{r, E N_{i}}^{'}] .$ (11)

Finally, we compensated for the optical diffraction efficiency-related scale mismatch through multiplication by a scalar, i.e., $\hat{A_{r}} = σ \hat{A_{r}^{'}},$ (12)where $σ$ was defined as $σ = \frac{\sum_{n = 1}^{E N_{i}} \sum_{m = 1}^{E N_{o}} A_{r} [m, n] \hat{A_{r}^{'}} [m, n]}{\sum_{n = 1}^{E N_{i}} \sum_{m = 1}^{E N_{o}} {(\hat{A_{r}^{'}} [m, n])}^{2}} .$ (13)

The MSE loss function to be minimized was defined as $L_{PSF} = \frac{1}{N_{i} N_{o}} \sum_{n = 1}^{E N_{i}} \sum_{m = 1}^{E N_{o}} {(A_{r} [m, n] - \hat{A_{r}} [m, n])}^{2} .$ (14)

The height $h$ of the diffractive features at each layer was constrained between zero and a maximum value $h_{\max}$ by employing a latent variable $h_{latent}$ . The relationship between the constrained height $h$ and the latent variable $h_{\max}$ was defined as $h = \frac{h_{\max}}{2} \times [\sin (h_{latent}) + 1]$ , where we chose $h_{\max} \approx \frac{λ}{n - 1}$ , which corresponds to a differential phase modulation of $2 π$ . The latent variables were initialized randomly from the standard normal distribution $N (0,1)$ .

The optimization of the diffractive layers was carried out using the AdamW optimizer39 for 12,000 iterations, with an initial learning rate of $10^{- 3}$ . The model state corresponding to the minimum of the MSEs evaluated after every 400 iterations was selected for the final evaluation. The $D^{2} NN$ models were implemented and trained using PyTorch (v1.12.1)40 with Compute Unified Device Architecture (CUDA) version 12.2. Training and testing were done on GeForce RTX 3090 graphics processing units (GPUs) in workstations with 256 GB of random-access memory (RAM) and Intel Core i9 central processing unit (CPU). The training time of the models varied with the size of the models. For example, the model used in Figs. 2(b) and 2(c) took around 1 h for 12,000 iterations. Inference for each input vector with $N_{φ} = 20,000$ takes around 30 s.

To visualize the all-optical transformation error in Fig. 2, we used the error matrix $ε_{r} = {(A_{r} - \hat{A_{r}})}^{2}$ ; here $(\cdot)^{2}$ denotes an element-wise operation. To evaluate the error $ε$ at complex linear transformation, we applied demosaicking to the columns of $\hat{A_{r}}$ to form the block matrix $[\hat{A_{0}} | \dots | \hat{A_{E - 1}}] \in C^{N_{o} \times {E N}_{i}}$ . Here, the subscript $k$ represents that $\hat{A_{k}}$ is measured by applying the columns of $e_{k} I$ as input and stacking the corresponding demosaicked (complex-valued) output vectors. Accordingly, we have $ε = {| [e_{0} A - \hat{A_{0}} | \dots | e_{E - 1} A - \hat{A_{E - 1}}] |}^{2} .$ (15)

Here, $| {\cdot |}^{2}$ represents an element-wise operation.

4.8 Entropy Evaluation

For the evaluation of the image encryption strength, we computed the entropy separately for the real and imaginary parts of a complex image as follows: $H^{Re / Im} (x) = - \sum_{i} p_{i}^{Re / Im} \cdot \log (p_{i}^{Re / Im}),$ (16)where we calculated the distribution of either the real or imaginary part (denoted by the superscript) over the pixels of $x$ ; here $x$ denotes the complex image. $p_{i}$ is the probability (normalized histogram count) for a certain pixel value $i$ .

For the histograms presented in Fig. S3(b) in the Supplementary Material, the data set is adapted from the Extended MNIST (EMNIST).41 For the creation of the input complex images, we randomly selected two distinct images from the EMNIST data set, using one as the real part and the other as the imaginary part of the complex image. To ensure compatibility with the input dimensionality, these images were bilinearly downsampled to a resolution of $4 \times 4 pixels$ . We randomly formed a set of 1000 such complex images to compile the histograms presented in Fig. S3(b) in the Supplementary Material.