
- Journal of Electronic Science and Technology
- Vol. 22, Issue 2, 100250 (2024)
1 Introduction
An important aspect of modern power inspection is detecting the state of equipment on transmission lines. Traditional edge detection cannot accurately resolve the overlapping structures in complex insulator images. Insulator detection based on deep learning is more efficient, but deep learning needs a sufficiently large data set as the training basis before it can judge the status of samples accurately in actual scenes. Data in the field of power grid transmission lines is often captured in high-risk scenarios, so it is difficult to obtain and the data amount is low. Data augmentation based on the traditional generative adversarial network can generate nearly real samples by learning from a large number of samples, but the traditional algorithms demand too much data and can hardly meet the augmentation requirements of insulators. An enhancement scheme based on the cycle generative adversarial network (Cycle-GAN) has low requirements on the initial data. Combined with the characteristics of insulator data, it can expand the corresponding types of data through mutual conversion, which makes it well suited to the insulator data augmentation task.
When converting between images with very different data distributions (such as oil paintings and landscape photographs), the traditional Cycle-GAN can realize the conversion within a few training batches, because the distributions of the two image data sets are simple: the conversion does not need to attend to many target details in the image and can learn the distribution law from the entire image sample. When applied to insulator data sets, however, the traditional generator is no longer suitable, because the complex background information of the insulator must be preserved.
This paper studies insulator image augmentation based on the generative adversarial network (GAN) in the transmission environment. A survey of existing image sample augmentation methods shows that most of them still have problems when applied to insulator samples, such as poor sample quality and generated feature information that contradicts reality. In addition, the instability of GAN training and the uncontrollability of the generated images must also be addressed. This paper therefore carries out the following work around these issues: an attention mechanism based on the Cycle-GAN structure, combining a channel attention mechanism and a self-attention mechanism, is designed. The attention maps guide the generator to process the sample region, largely preserve the true background of the sample, and generate images consistent with the distribution of insulator samples in the real environment.
2 Related works
2.1 Data augmentation review
Traditional image data augmentation generally operates on a single image; its operations include flipping, cropping, erasing, blurring, adding noise, scaling, affine transformation, histogram equalization, and wavelet transformation. Blending-based generation falls into two areas: pixel-level blending and layer blending. Augmentation that expands the learned data distribution developed after the birth of generative models, with the generative adversarial network as the main line of development. For pixel-level mixing, Tokozume et al. extended between-class learning (BC learning) [1], originally a linear mixing method for sound clips, to the image domain, improving image classification performance. Layer blending overlays images in the form of patches and is often combined with augmentation methods such as random cropping. The random image cropping and patching (RICAP) method proposed by Takahashi et al. is a layer-mixing method, verified on the Canadian Institute for Advanced Research 10/100 (CIFAR-10/100) data sets [2]: on CIFAR-10 the classification error rate was reduced from 3.89% to 2.95%, and on CIFAR-100 from 18.86% to 17.45%.
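As a concrete illustration of the single-image operations listed above, the sketch below applies flipping, random cropping, and additive noise with NumPy. The crop ratio, noise level, and function name are illustrative assumptions, not values from this paper.

```python
import numpy as np

def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Classical single-image augmentation: random flip, random crop, noise."""
    out = img
    if rng.random() < 0.5:                       # horizontal flip, half the time
        out = out[:, ::-1]
    # random crop to 7/8 of each spatial dimension (assumed ratio)
    h, w = out.shape[:2]
    ch, cw = h * 7 // 8, w * 7 // 8
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    out = out[top:top + ch, left:left + cw]
    # additive Gaussian noise, clipped back to the valid pixel range
    out = np.clip(out + rng.normal(0, 5, out.shape), 0, 255)
    return out.astype(img.dtype)

rng = np.random.default_rng(0)
sample = rng.integers(0, 256, size=(64, 64, 3)).astype(np.uint8)
aug = augment(sample, rng)
print(aug.shape)  # (56, 56, 3)
```

In practice these operations are usually chained with random parameters per epoch so that each pass over the data set sees slightly different samples.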
2.2 Generative adversarial network data augmentation
Augmentation applications based on the GAN [3] benefit from the development of GAN itself. Early on, Antoniou et al. proposed the data augmentation GAN (DAGAN) to address sparse data sets [4]. By learning the distribution of real data from a small number of real samples, the generator produces fake data matching that distribution, and the validity of the augmented data was verified on multiple public data sets: the classification accuracy on the Omniglot data set [5] increased by 13% to 82%, and on the extended mixed national institute of standards and technology (EMNIST) data set [6] by 2.1%, from 74% to 76.1%. In practical applications, Frid-Adar et al. [7] applied the deep convolutional GAN (DCGAN) to expanding a data set of liver computed tomography (CT) images. With a small number of real samples as the learning basis, a large number of labeled samples were synthesized, and on the new data the sensitivity and specificity of the test data set increased by 7.1% and 4%, respectively. Han et al. [8] generated a tumor image data set of brain CT images on the basis of the progressively growing GAN (PG-GAN) framework, trained and tested on a series of lesion CT samples such as cysts and thrombosis, and used the You Only Look Once Version 3 (YOLOv3) [9] target detection algorithm to verify the generated data, obtaining a 3% increase in the mean average precision (mAP) index and a 9.9% increase in the sensitivity index. In addition, Zhu et al. [10] proposed using Cycle-GAN to augment an expression image data set, converting training data into samples of other expression categories through the generator, which makes data generation feasible when a category of the data set lacks samples.
3 Method
3.1 Generator design
The method used in this paper is based on Cycle-GAN [11–13]. Cycle-GAN takes images as the input, starts from the real data, converts according to real characteristics, retains the complex background environment, learns the difference between insulator samples in the two distributions, and masters the main structural characteristics of the insulator in the image. The color-gamut transformation within the structural features of interest is achieved by the generator structure designed in this method, which incorporates two attention modules that produce the attention focus area during the conversion process. The input image first passes through two attention channels, A and B, which adopt two different attention mechanisms: the channel attention mechanism and the self-attention mechanism. The self-attention mechanism performs well on correlations among global features, and the channel attention mechanism performs well on the degree of correlation between channels. After the two attention networks, connected in parallel, generate their respective attention maps, a convolutional layer with shared parameters transforms them into a black-and-white distribution map containing the attention map. This map is superimposed on the original input image as an occlusion, so that the regions of interest learned by the attention network are preserved in the attention distribution map. The superimposed image is used as the input for generation; after the generator conversion, it is superimposed with the low-weight part of the attention layer learned before, generating a new sample in which the background information of the image is largely kept and only the information of the main insulator is transformed. The network structure integrated with the attention module is shown in Fig. 1.
Figure 1.Structure of attention Cycle-GAN.
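The masking-and-recombination step described above (occlude with the attention map, convert the focused content, then paste the low-weight background back) can be sketched as follows. The toy `generator` and the square attention mask are hypothetical stand-ins for the paper's trained networks.

```python
import numpy as np

def attention_composite(x, attn_map, generator):
    """Blend per the attention mask: only the attended region is translated,
    the rest of the image is copied from the input unchanged.
    `attn_map` holds one weight per pixel, with values in [0, 1]."""
    focused = attn_map * x                     # occlude the background
    converted = generator(focused)             # translate the focused content
    return attn_map * converted + (1.0 - attn_map) * x

# toy check with a hypothetical "generator" that inverts pixel values
x = np.random.default_rng(1).random((8, 8, 3))
attn = np.zeros((8, 8, 1)); attn[2:6, 2:6] = 1.0   # assumed square region of interest
y = attention_composite(x, attn, generator=lambda t: 1.0 - t)
assert np.allclose(y[0, 0], x[0, 0])               # background pixel preserved
assert np.allclose(y[3, 3], 1.0 - x[3, 3])         # attended pixel converted
print("composite ok")
```

The key design property, visible in the two assertions, is that pixels with zero attention weight pass through untouched, which is what lets the method keep the cyclic consistency loss of the background at zero.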
3.1.1 Fusion attention module
In the attention module, the channel attention mechanism and the self-attention mechanism are adopted simultaneously. The inputs of module A and module B both use the convolutional feature matrix with dimension
Figure 2.Attention module structure diagram.
In the dual attention module, channel attention will convolve the input feature convolved eigenmatrix
3.1.2 Channel attention mechanism
The representative model of the channel attention mechanism is squeeze-and-excitation networks (SENet) [14]. By learning from the samples, the model automatically obtains the importance of different feature channels from the sample distribution, then strengthens the influence of the features with greater weight according to the learned importance while suppressing the features that contribute little to the current training task. Its function is to give the network the interdependence between the convolutional feature vectors of different channels, so that a network segment equipped with the channel attention mechanism can suppress secondary information and enhance the main information. In the feature dimension, a common convolution operation fuses the features of all channels by default. The channel attention structure is shown in Fig. 3.
Figure 3.Channel attention network diagram.
In the channel attention module above,
where the * in the formula represents the convolution operation.
where Fsq stands for global average pooling,
After the data features are compressed, the gathered information must still be used to excite the subsequent learning process so that the channel dependencies can be fully fitted. After the features of the C one-dimensional channels are obtained by compression, the influence weight of each channel is learned through a fully connected layer and then applied to the corresponding path of the input features, as shown in the formula:
where σ represents the sigmoid function. Fex represents a weight value for each feature channel. ReLU() is a commonly used activation function in artificial neural networks.
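The squeeze-excite-rescale pipeline above can be sketched in a few lines. The matrix sizes, the reduction ratio, and the plain fully connected weights are assumptions for illustration; a real SE block learns these weights during training.

```python
import numpy as np

def relu(v): return np.maximum(v, 0.0)
def sigmoid(v): return 1.0 / (1.0 + np.exp(-v))

def se_block(feat, w1, w2):
    """Squeeze-and-excitation on a (C, H, W) feature map:
    squeeze = global average pooling, excite = FC -> ReLU -> FC -> sigmoid,
    then rescale each channel by its learned weight."""
    z = feat.mean(axis=(1, 2))            # squeeze: one scalar per channel, (C,)
    s = sigmoid(w2 @ relu(w1 @ z))        # excite: per-channel weights in (0, 1)
    return feat * s[:, None, None]        # reweight every channel

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2                   # r is the assumed reduction ratio
feat = rng.random((C, H, W))
w1 = rng.standard_normal((C // r, C))     # channel-reducing FC layer
w2 = rng.standard_normal((C, C // r))     # channel-restoring FC layer
out = se_block(feat, w1, w2)
print(out.shape)  # (8, 4, 4)
```

Because the sigmoid output lies strictly in (0, 1), the block can only attenuate channels relative to the input, which is how secondary channels are suppressed while important ones are (relatively) enhanced.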
3.1.3 Self-attention mechanism
In order to balance the conversion between the main insulator feature region and the background region, a self-attention mechanism is introduced; its implementation architecture is shown in Fig. 4. The same vector containing the image information is input into the different feature converters f, g, and h in Fig. 4, and the area of attention is calculated from f(x), g(x), and h(x). The three conversion modules are all 1×1 convolutions; the difference is that the number of channels of the three paths is inconsistent, and in the Transformer they are called the query, key, and value, respectively. Since the convolution operation sets parameters such as the stride and kernel size, the three convolutions can reduce the number of image channels. An activation function is usually added at the end of this process, which introduces more nonlinear transformations and enhances the neural network's ability to express nonlinear distributions.
Figure 4.Self-attention network diagram.
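A 1×1 convolution acts as an independent linear map at each spatial position, so the query/key/value paths can be sketched with plain matrix products. All sizes and the channel-reduction factor here are illustrative assumptions.

```python
import numpy as np

def softmax(v, axis=-1):
    e = np.exp(v - v.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(feat, wq, wk, wv):
    """Self-attention over the spatial positions of a (C, H, W) feature map.
    wq/wk play the role of channel-reducing 1x1 convolutions (query/key),
    wv keeps the full channel count (value)."""
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)            # flatten spatial dims: (C, N)
    q, k, v = wq @ x, wk @ x, wv @ x      # per-position linear projections
    attn = softmax(q.T @ k, axis=-1)      # (N, N) affinity between positions
    out = v @ attn.T                      # aggregate values over all positions
    return out.reshape(C, H, W)

rng = np.random.default_rng(0)
C, Cr, H, W = 8, 2, 4, 4                  # Cr: assumed reduced channel count
feat = rng.random((C, H, W))
wq = rng.standard_normal((Cr, C))
wk = rng.standard_normal((Cr, C))
wv = rng.standard_normal((C, C))
out = self_attention(feat, wq, wk, wv)
print(out.shape)  # (8, 4, 4)
```

The (N, N) affinity matrix is what gives self-attention its global reach: every output position is a weighted mixture of all input positions, unlike a local convolution.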
3.1.4 Attention loss function
In the structure of Fig. 1, the attention module is denoted by A and the input by x. When x is input to A, the output A(x) has the same size as the original input, with a single channel whose values lie between 0 and 1. Based on the difference in the distribution of the given data, the attention module generates an attention layer A(x) by assigning more weight to regions of interest while reducing the weight of the remaining regions. In the other branch, the generator G takes as input the temporary sample produced by overlaying the input image x with the attention layer A(x) and converts the region of interest, and the final image is generated as
where ⨀ represents element-wise multiplication. The mapping F is introduced on the other side of the model, so that when the transformed image is mapped back to the original space, its distribution still belongs to the original domain, as shown in the formula:
The expression of the F map is as follows:
In other conversion networks, the generator G converts the entire image to the target domain, and the generator F then restores the converted sample to the source domain. As a result, the background of the generated image appears very unreal and differs greatly from the background of the original image; it is barely recognizable, and the cyclic consistency loss is difficult to drive to 0. In the method proposed in this paper, the generator converts under the constraint of the attention image: the input image is divided into a focus area and a non-focus area, and the non-focus area is retained, so the cyclic consistency loss of the output sample in the background part is exactly 0 and only the attention area is optimized.
The training process is similar to the cyclic consistency network. The input into the attention module predicts the attention distribution
Similarly, for data Y from the y domain, the mapping of the attention module should also satisfy cyclic consistency, as shown in the following formula:
To satisfy the above formula, this paper sets the cyclic consistency loss function as
where
On the basis of cyclic consistency, the attention network is also expected to focus on small areas related to the main features rather than the entire image, so as to avoid failure of the attention module. Therefore, a sparse loss is introduced as shown
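Since the loss formulas are truncated in this copy of the text, the sketch below assumes the common L1 form of the cyclic-consistency term and a mean-activation form of the sparsity term; both forms are assumptions, chosen because they match the behavior described above (a perfect round trip costs nothing, and small attended regions cost less than large ones).

```python
import numpy as np

def cycle_loss(x, x_rec):
    """L1 cyclic-consistency loss between an input and its round-trip
    reconstruction F(G(x)); zero when the round trip is perfect."""
    return np.abs(x - x_rec).mean()

def sparse_loss(attn_map):
    """Sparsity penalty on the attention map: its mean activation.
    Minimizing it pushes attention onto small regions, not the whole image."""
    return attn_map.mean()

x = np.full((4, 4, 3), 0.5)
x_rec = x.copy()
attn = np.zeros((4, 4, 1)); attn[1:3, 1:3] = 1.0   # attend to a quarter of the pixels
assert cycle_loss(x, x_rec) == 0.0                 # perfect round trip costs nothing
print(sparse_loss(attn))                           # 0.25
```

The two terms pull in opposite directions: cyclic consistency rewards copying the input, while the sparse loss penalizes attending everywhere, and their balance determines how tightly attention shrinks onto the insulator.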
3.2 Defect insulator generation network based on transfer learning
The parameters of a network trained on intact insulators also have certain applicability to defective insulators. The distributions of defective and non-defective samples largely share a common subset, but defective insulator samples are very scarce, so some difference between the data distributions of the two domains is inevitable. Therefore, based on this difference and the scarcity of defect samples, a feature regeneration module and a transfer-discriminator compensation module are designed to apply transfer learning to the parameters of the trained insulator conversion network, so that the network can not only transform different types of samples but also generate local defects according to the attention distribution map of the insulator during the conversion process. To address the lack of defective insulator samples, the transfer network structure combined with prior knowledge is shown in Fig. 5.
Figure 5.Transfer training GAN structure diagram.
In Fig. 5, the compensation module adds random local noise to the attention distribution map generated by the attention module, disturbing the region of interest of the input sample and disrupting local features of the input image. The disrupted local features are then regenerated to match the feature distribution of the defect position on a defective insulator. The discriminators Dx and Dy from the previous stage are still retained in the network; during training they are no longer updated, and only the new discriminator D1 is updated. Gy and Dy are retained to constrain the compensated generator module, so that the generation ability of the compensation module does not affect the Gx and Gy already trained in the previous stage. The generators Gx and Gy can still convert the sample category, and their parameters are no longer updated.
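The noise-injection step of the compensation module can be sketched as below. The patch size, its uniform-noise distribution, and the function name are assumptions for illustration; the paper does not specify these details in this copy.

```python
import numpy as np

def perturb_attention(attn_map, rng, patch=4):
    """Compensation-module sketch: overwrite one local patch of the attention
    map with random noise, so the generator must re-synthesize that region.
    That regenerated region is where a local defect can be introduced."""
    h, w = attn_map.shape[:2]
    top = rng.integers(0, h - patch + 1)       # random patch location
    left = rng.integers(0, w - patch + 1)
    out = attn_map.copy()
    noise = rng.random((patch, patch) + attn_map.shape[2:])
    out[top:top + patch, left:left + patch] = noise
    return out

rng = np.random.default_rng(0)
attn = np.ones((16, 16, 1))                    # toy attention map, all attended
disturbed = perturb_attention(attn, rng)
print(int(np.sum(disturbed != attn)))          # 16: one 4x4 patch overwritten
```

Restricting the perturbation to a small patch mirrors the design intent above: only a local region is disrupted and regenerated, while the rest of the attention map, and hence the rest of the image, is left for the frozen generators to handle as before.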
4 Experiment
To verify the feasibility of the insulator sample conversion method described in section 3 and to assess the effect of sample generation, an experimental software platform was built, and insulator data was collected from a specific transmission line scene. The effectiveness of the algorithm was proved by comparing the samples. GPU acceleration is required in this experiment, and the data running platform is shown in Table 1.
Device name | Model number | Quantity |
CPU | i7-12700K | 1 |
GPU | RTX 3080 12 GB | 1 |
Mainboard | ROG STRIX Z690-E | 1 |
RAM | DDR5 16 GB | 2 |
Hard disk | M.2 solid-state drive, 1 TB | 1 |
Table 1. List of experimental equipment.
4.1 Insulator sample conversion experiment
For different types of data above, the experiment sets corresponding conversion experiments and compares the peak signal-to-noise ratio (PSNR) and the structure similarity index measure (SSIM) indexes of the converted data with the real data in the corresponding target domain with Cycle-GAN [15] and Distance GAN [16]. In this experiment, 1000 samples of different kinds were used for mutual conversion training. The detailed results are shown in Table 2.
Index | Task | Cycle-GAN | Distance GAN | Ours |
PSNR | Glass→Ceramic | 18.224 | 12.139 | 24.543 |
 | Ceramic→Glass | 18.190 | 11.940 | 23.939 |
 | Glass→Composite | 17.935 | 211.940 | 23.446 |
 | Composite→Glass | 17.990 | 12.7262 | 23.105 |
SSIM | Glass→Ceramic | 0.687 | 0.263 | 0.938 |
 | Ceramic→Glass | 0.703 | 0.291 | 0.923 |
 | Glass→Composite | 0.683 | 0.278 | 0.921 |
 | Composite→Glass | 0.690 | 0.280 | 0.913 |
Table 2. Comparison of sample conversion indicators of Cycle-GAN insulators based on the attention mechanism.
In Table 2, PSNR denotes the peak signal-to-noise ratio; the larger the value, the better the image quality. From Table 2, the indexes of the samples transformed by the model designed in this paper are significantly better than those of Cycle-GAN and Distance GAN. SSIM denotes the structural similarity, which measures how similar two images are; its value lies in [0, 1], tending to 1 as the similarity increases and to 0 otherwise. With the model structure in this paper, both the quality of sample conversion and the degree of background information retained after conversion are clearly superior to Cycle-GAN and Distance GAN in the insulator sample conversion task.
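For reference, PSNR is computed from the mean squared error between two images; a minimal sketch (assuming 8-bit images with peak value 255) is below. SSIM is more involved; in practice a library implementation such as `skimage.metrics.structural_similarity` is typically used rather than a hand-rolled one.

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means `test` is closer to `ref`."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((8, 8), 128.0)
noisy = ref + 2.0                       # uniform error of 2 grey levels
print(round(psnr(ref, noisy), 2))       # 42.11
```

Because PSNR depends only on pixel-wise error, it rewards the background preservation of the attention-based method directly, which helps explain the gap over plain Cycle-GAN in Table 2.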
4.2 Generate image effect comparison
In this paper, the conversion augmentation of different types of insulators is compared on experimental transmission line insulator data collected in mountainous areas. The overall effect is shown in Fig. 6.
Figure 6.Example of converting glass insulators into composite insulators.
As shown in Fig. 6, in the training of glass-to-composite conversion, the insulator samples achieve accurate conversion to the specific target, but some of the attention is distributed over the background area. Analyzing the data samples, a possible reason is that the background features of the two data sets are too similar, so the background region became a main feature for the attention network to recognize during learning. Appropriately adding samples with different background styles to the network could reduce the weight of background region recognition.
The conversion test from composite insulators to glass insulators in Fig. 7 is analyzed next. When an insulator sample has a large background area similar in color gamut and structure to the insulator, the attention network also recognizes the background as the insulator. Here the color gamuts of the two samples are too close, and the background also has a hierarchical feature resembling an insulator, which is not the case in other scenes.
Figure 7.Example of converting composite insulators into glass insulators.
Fig. 8 shows ceramic insulators converted into glass insulators. The color-gamut features are well converted, but the gloss of a glass insulator is a detail that the current network model cannot generate; for samples with strong reflections, such as those in the first row of Fig. 8, the details of glass insulators are difficult to generate with existing models. During learning, the attention module can accurately locate the shape features of the ceramic insulator. However, when the edge of the insulator is highly similar to the edge of the background in a small number of samples, the recognition area spreads to the background adjacent to the insulator. Given the current conversion effect, and since glass insulators far outnumber ceramic insulators, the ceramic insulator data can be further expanded by converting glass insulator data into ceramic insulators.
Figure 8.Example of converting ceramic insulators into glass insulators.
In the glass-to-ceramic insulator training shown in Fig. 9, the insulator samples achieve accurate conversion to the specific target, but some of the attention is concentrated in the background region. Analyzing the data samples, a possible reason is again that the background features of the two data sets are too similar, so the background region became a main feature for the attention network during learning; appropriately adding samples with different background styles to the network could reduce the weight of background region recognition.
Figure 9.Example of converting glass insulators into ceramic insulators.
4.3 Insulator defect sample generation experiment
In the defect sample generation experiment, the generation of defective composite insulators from glass insulators and its reverse, and the generation of defective ceramic insulators from glass insulators and its reverse, are experimentally verified. Using 2000 real samples, the generation experiments were conducted separately, and the generated samples were evaluated against the real data set samples with the SSIM, PSNR, and Fréchet inception distance (FID) parameters.
Comparing the indicators in Table 3, the transfer generation model designed in this paper significantly outperforms the deep convolutional GAN [17] and the Wasserstein GAN with gradient penalty (WGAN-GP) [18,19] on FID, the parameter that evaluates the distance between the distributions of generated and real samples. The analysis of the signal-to-noise ratio and structural similarity indexes shows that the attention network and the insulator conversion module trained with prior knowledge also play a positive role in ensuring structural similarity and image quality during generation: the method maintains the PSNR and SSIM levels achieved by the conversion network model.
Index | DCGAN | WGAN-GP | Ours |
FID | 25.33 | 18.78 | 12.54 |
PSNR | 15.46 | 19.12 | 23.76 |
SSIM | 0.58 | 0.67 | 0.92 |
Table 3. Comparison of defect sample indicators based on background PSNR, SSIM, and FID.
As shown in Fig. 10, in the mutual generation of glass and ceramic insulators, the attention network can identify and position the insulator accurately. Defects occur in the process of generating glass insulators from composite insulators, possibly because the composite insulator samples used in training are homogeneous and their backgrounds highly repetitive. The background complexity can be increased by appropriately converting other samples into the composite insulator sample set to improve the network's generalization ability.
Figure 10.Example of defect sample generation.
5 Conclusions
To address the difficulty of obtaining insulator data in the power grid field, this paper proposes an attention-assisted method for generating samples through adversarial network conversion. The method makes full use of the existing data set and offers a direction for expanding insulator data. Verified by comprehensive data indexes, it provides a favorable reference for expanding transmission line insulator data. However, the current method cannot completely solve the insulator data problem in the power grid field: the generated data still has flaws when the color gamuts are similar. Subsequent research can consider combining edge detection techniques to obtain features and generate new samples.
Disclosures
The authors declare no conflicts of interest.
References
[1] Y. Tokozume, Y. Ushiku, T. Harada, Learning from between-class examples for deep sound recognition, in: Proc. of the 6th Intl. Conf. on Learning Representations, Vancouver, Canada, 2018, pp. 1–13.
[2] R. Takahashi, T. Matsubara, K. Uehara, RICAP: Random image cropping and patching data augmentation for deep CNNs, in: Proc. of the 10th Asian Conf. on Machine Learning, Beijing, China, 2018, pp. 786–798.
[3] I.J. Goodfellow, J. Pouget-Abadie, M. Mirza, et al., Generative adversarial nets, in: Proc. of the 27th Intl. Conf. on Neural Information Processing Systems, Montreal, Canada, 2014, pp. 2672–2680.
[4] A. Antoniou, A. Storkey, H. Edwards, Data augmentation generative adversarial networks [Online]. Available: https://arxiv.org/abs/1711.04340, November 2017.
[6] A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, in: Proc. of the 4th Intl. Conf. on Learning Representations, San Juan, America, 2016, pp. 1–15.
[7] M. Frid-Adar, E. Klang, M. Amitai, J. Goldberger, H. Greenspan, Synthetic data augmentation using GAN for improved liver lesion classification, in: Proc. of the 15th IEEE Intl. Symposium on Biomedical Imaging, Washington, America, 2018, pp. 289–293.
[8] C. Han, K. Murao, T. Noguchi, et al., Learning more with less: Conditional PG-GAN-based data augmentation for brain metastases detection using highly-rough annotation on MR images, in: Proc. of the 28th ACM Intl. Conf. on Information and Knowledge Management (CIKM '19), Beijing, China, 2019, pp. 119–127.
[9] J. Redmon, A. Farhadi, YOLOv3: An incremental improvement [Online]. Available: https://arxiv.org/abs/1804.02767, April 2018.
[10] X.Y. Zhu, Y.F. Liu, J.H. Li, T. Wan, Z.C. Qin, Emotion classification with data augmentation using generative adversarial network, in: Proc. of the 22nd Pacific-Asia Conf. on Knowledge Discovery and Data Mining, Melbourne, Australia, 2018, pp. 349–360.
[11] J.Y. Zhu, T. Park, P. Isola, A.A. Efros, Unpaired image-to-image translation using cycle-consistent adversarial networks, in: Proc. of IEEE Intl. Conf. on Computer Vision, Venice, Italy, 2017, pp. 2242–2251.
[12] T. Kim, M. Cha, H. Kim, J.K. Lee, J. Kim, Learning to discover cross-domain relations with generative adversarial networks, in: Proc. of the 34th Intl. Conf. on Machine Learning, Sydney, Australia, 2017, pp. 1857–1865.
[13] Z.L. Yi, H. Zhang, P. Tan, M.L. Gong, DualGAN: Unsupervised dual learning for image-to-image translation, in: Proc. of IEEE Intl. Conf. on Computer Vision, Venice, Italy, 2017, pp. 2868–2876.
[15] Z. Liang, J.X. Huang, Cycle-GAN with dynamic criterion for malaria blood cell image synthetization, in: Proc. of AMIA Joint Summits on Translational Science, Online, 2022, pp. 323–330.
[16] S. Benaim, L. Wolf, One-sided unsupervised domain mapping, in: Proc. of the 31st Intl. Conf. on Neural Information Processing Systems, Long Beach, America, 2017, pp. 752–762.
[17] B. Liu, J. Lv, X. Fan, et al., Application of an improved DCGAN for image generation, Mobile Information Systems 2022 (July 2022) 1–14.
[18] M. Arjovsky, S. Chintala, L. Bottou, Wasserstein generative adversarial networks, in: Proc. of the 34th Intl. Conf. on Machine Learning, Sydney, Australia, 2017, pp. 214–223.
[19] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, A. Courville, Improved training of Wasserstein GANs, in: Proc. of the 31st Intl. Conf. on Neural Information Processing Systems, Long Beach, America, 2017, pp. 5769–5779.
