• Journal of Electronic Science and Technology
  • Vol. 22, Issue 4, 100287 (2024)
Yan Guo1, Hong-Chen Liu1, Fu-Jiang Liu2,*, Wei-Hua Lin2, Quan-Sen Shao1, and Jun-Shun Su3
Author Affiliations
  • 1School of Computer Science, China University of Geosciences, Wuhan, 430078, China
  • 2School of Geography and Information Engineering, China University of Geosciences, Wuhan, 430078, China
  • 3Xining Comprehensive Natural Resources Survey Centre, China Geological Survey (CGS), Xining, 810000, China
DOI: 10.1016/j.jnlest.2024.100287
Yan Guo, Hong-Chen Liu, Fu-Jiang Liu, Wei-Hua Lin, Quan-Sen Shao, Jun-Shun Su. Chinese named entity recognition with multi-network fusion of multi-scale lexical information[J]. Journal of Electronic Science and Technology, 2024, 22(4): 100287
Fig. 1. An example of nested entities.
Fig. 2. An example of an ambiguous entity word.
Fig. 3. Overall architecture of the BCWC model, consisting of four main layers: the embedding layer, feature extraction layer, feature fusion layer, and CRF layer. The embedding layer uses BERT for character embeddings and a word embedding model for word embeddings. In the feature extraction layer, dashed boxes of different colors mark the ranges of word sequences captured by convolutional kernels of different sizes; the outputs of these convolutions are concatenated to form the layer's output. The feature fusion layer then applies a multi-head attention mechanism to weight the word features and passes the result to the CRF layer for decoding. This layered structure provides a comprehensive and efficient pipeline for information processing in the BCWC model.
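The following PyTorch sketch illustrates the data flow described in Fig. 3. It is a minimal illustration under assumed dimensions and module names, not the authors' released code: BERT-base-Chinese character features and pretrained word vectors are stubbed with plain nn.Embedding layers so the snippet runs standalone, the word-level multi-scale convolutions are simplified, and a linear emission layer stands in for the CRF decoder (a real implementation would add one, e.g. from pytorch-crf).

```python
import torch
import torch.nn as nn


class BCWCSketch(nn.Module):
    """Hypothetical skeleton of the BCWC pipeline: embeddings -> feature
    extraction -> attention-based fusion -> per-token label scores."""

    def __init__(self, char_vocab=21128, word_vocab=50000, char_dim=768,
                 word_dim=300, hidden=64, num_labels=9, heads=8):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, char_dim)   # stand-in for BERT-base-Chinese
        self.word_emb = nn.Embedding(word_vocab, word_dim)   # stand-in for pretrained word vectors
        # Feature extraction: a BiLSTM over characters and multi-scale convolutions
        # over words (the multi-scale IDCNN itself is sketched after Fig. 4).
        self.char_rnn = nn.LSTM(char_dim, hidden, batch_first=True, bidirectional=True)
        self.word_conv = nn.ModuleList(
            [nn.Conv1d(word_dim, hidden, k, padding=k // 2) for k in (1, 3)]
        )
        # Feature fusion: multi-head attention weights word features against character features.
        self.fuse = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)
        # Emission scores per label; a CRF layer would decode these into a tag sequence.
        self.emit = nn.Linear(4 * hidden, num_labels)

    def forward(self, char_ids, word_ids):
        # char_ids, word_ids: (batch, seq_len); one word id aligned to each character position.
        c, _ = self.char_rnn(self.char_emb(char_ids))                 # (B, T, 2H) character features
        w = self.word_emb(word_ids).transpose(1, 2)                   # (B, D_w, T)
        w = torch.cat([conv(w) for conv in self.word_conv], dim=1)    # (B, 2H, T) multi-scale word features
        fused, _ = self.fuse(w.transpose(1, 2), c, c)                 # word-query attention over characters
        return self.emit(torch.cat([c, fused], dim=-1))               # (B, T, num_labels) emission scores


scores = BCWCSketch()(torch.randint(0, 21128, (2, 16)), torch.randint(0, 50000, (2, 16)))
print(scores.shape)  # torch.Size([2, 16, 9])
```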
Fig. 4. Multi-scale IDCNN process diagram. The input consists of word embedding vectors. The diagram shows iterated dilated convolution blocks at two different scales: when the dilation rate is 1, a regular convolution kernel is used, and when the dilation rate is greater than 1, dilated convolution is employed. Each convolution block comprises two stacked layers. The results are concatenated and activated using the ReLU function, followed by normalization with LayerNorm. The outputs from the different scales are concatenated to obtain the final output.
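A hedged sketch of such a multi-scale iterated dilated convolution block follows. The kernel sizes, dilation rates, and channel counts are assumptions for illustration only; each scale stacks two dilated Conv1d layers, and the scale outputs are concatenated, ReLU-activated, and normalized with LayerNorm, as the caption describes.

```python
import torch
import torch.nn as nn


class MultiScaleIDCNN(nn.Module):
    """Illustrative multi-scale IDCNN: one stack of dilated convolutions per kernel size."""

    def __init__(self, in_dim=300, channels=64, kernel_sizes=(3, 5), dilations=(1, 2)):
        super().__init__()
        self.scales = nn.ModuleList()
        for k in kernel_sizes:
            layers, dim = [], in_dim
            for d in dilations:                          # two stacked layers per scale;
                pad = (k - 1) // 2 * d                   # d = 1 is a plain convolution,
                layers.append(nn.Conv1d(dim, channels, k, dilation=d, padding=pad))
                dim = channels                           # d > 1 widens the receptive field.
            self.scales.append(nn.Sequential(*layers))
        self.norm = nn.LayerNorm(channels * len(kernel_sizes))

    def forward(self, x):                                # x: (batch, seq_len, in_dim)
        x = x.transpose(1, 2)                            # Conv1d expects (batch, channels, seq_len)
        outs = [scale(x).transpose(1, 2) for scale in self.scales]
        # Concatenate the scales, apply ReLU, then LayerNorm.
        return self.norm(torch.relu(torch.cat(outs, dim=-1)))


feats = MultiScaleIDCNN()(torch.randn(2, 20, 300))
print(feats.shape)  # torch.Size([2, 20, 128])
```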
    Fig. 5. [in Chinese]
Fig. 6. Correctly identified named entities of the sample.
Fig. 7. Tokenization and multi-scale convolution kernel capturing process.
Fig. 8. Entity recognition under the same information capture window size.
Fig. 9. Results of different models on the three datasets.
Fig. 10. Quantitative analysis of different structures: Results of IDCNN (a) with different structures on various datasets and (b) with the same structure but different convolutional kernel sizes on various datasets. In the notation m×n, m represents the number of iterative layers and n represents the size of the DCNN blocks.
Fig. 11. Comparison of different structures on three datasets.
Hyperparameter                  Value
RNN dimension                   64
BERT learning rate              2×10^−5
BERT dropout rate               0.35
RNN (CNN) learning rate         1×10^−3
Kernel size                     3
Boundary embedding dimension    16
BERT model                      BERT-base-Chinese
Epochs                          30
Optimizer                       AdamW
Multi-head attention heads      8
Table 1. Common hyperparameter settings for the experiments.
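As a minimal sketch of how the two learning rates in Table 1 could be wired together, the snippet below builds an AdamW optimizer with separate parameter groups: 2×10^−5 for the BERT encoder and 1×10^−3 for the RNN/CNN layers. The module names (bert, head) and the stand-in Linear layers are purely illustrative assumptions, not the authors' actual code.

```python
import torch

# Stand-ins for the real submodules: "bert" for BERT-base-Chinese,
# "head" for the RNN/CNN feature extractors and fusion/CRF layers.
model = torch.nn.ModuleDict({
    "bert": torch.nn.Linear(768, 768),
    "head": torch.nn.Linear(768, 64),
})

# Two parameter groups with the learning rates listed in Table 1.
optimizer = torch.optim.AdamW([
    {"params": model["bert"].parameters(), "lr": 2e-5},
    {"params": model["head"].parameters(), "lr": 1e-3},
])
```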
Dataset     Training    Evaluation    Test     Categories    Samples    Characters
CLUENER     10.74K      1.34K         1.34K    10            13.00K     503K
Weibo       1.35K       0.27K         0.27K    8             1.89K      103K
Youku       8.00K       1.00K         1.00K    9             10.00K     170K
Table 1. Statistics of the datasets.
Dataset     max_seq_len    Batch size    max_word_len
CLUENER     128            32            25
Weibo       64             16            20
Youku       128            16            20
Table 2. Hyperparameter settings for the experiments on the three datasets.
Model                              Precision (%)    Recall (%)    F1-score (%)
BiLSTM-CRF (2020) [32]             71.06            68.97         70.00
stkBiGRU-CRF (2021)                73.41            70.02         71.68
TextCNN+CRF (2022)                 75.07            69.78         72.12
Lattice-LSTM (2018) [44]           74.13            73.84         74.41
BERT-CRF (2019)                    76.47            77.01         76.73
BERT (2018) [32]                   77.24            80.46         78.82
ALBERT (2019)                      78.96            64.58         71.05
ALBERT-BiLSTM (2020)               77.25            81.28         79.22
RoBERTa-WWM-BiLSTM-CRF (2021)      78.81            80.82         79.80
DSpERT (2023)                      78.24            79.82         79.02
Ours                               80.29            79.43         79.86
Table 2. Performance on CLUENER.
Environment                 Value
Operating system            Windows 11
Processor                   12th Gen Intel(R) Core(TM) i5-12600KF
Random access memory        32.0 GB
Graphics processing unit    NVIDIA GeForce RTX 3070 Ti (8 GB)
Python version              3.8.17
PyTorch version             2.0.0
Table 3. Hardware and software environments for the experiments.
Model                     Precision (%)    Recall (%)    F1-score (%)
BiLSTM-CRF (2020) [11]    60.80            52.90         56.58
Lattice-LSTM (2018)       53.04            62.25         58.79
LR-CNN (2019)             57.14            66.67         59.92
LGN (2019)                56.44            64.52         60.21
BERT-CRF (2019) [11]      67.12            66.88         67.00
FGN (2021)                69.02            73.65         71.25
MTL-HWS (2023)            73.03            73.21         73.12
DSpERT (2023)             69.52            68.80         69.12
KCB-FLAT (2024)           72.36            70.41         71.37
LkLi-CNER (2023)          77.43            68.23         72.54
Ours                      71.96            75.32         73.60
Table 3. Performance on Weibo.
Model                        Precision (%)    Recall (%)    F1-score (%)
BiLSTM-CRF (2020)            80.31            79.22         79.76
BERT (2018)                  85.06            76.75         80.69
BERT-CRF (2019) [34]         83.00            81.70         82.40
Lattice-LSTM (2018)          84.43            81.28         82.82
BiLSTM+SSCNN-CRF (2023)      87.00            85.10         86.10
DSpERT (2023)                86.62            80.17         83.27
Ours                         87.41            86.97         87.19
Table 4. Performance on Youku.
Model              BERT    F1-score (%)
                           CLUENER           Weibo              Youku
BCWC               +       79.86             73.60              87.19
CM                 +       76.81 (↓3.05)     69.19 (↓4.41)      85.79 (↓1.40)
CM                 −       70.00 (↓9.86)     56.58 (↓17.02)     79.76 (↓7.43)
WM                 +       76.48 (↓3.38)     68.75 (↓4.85)      85.46 (↓1.73)
MSWM               +       76.72 (↓3.14)     69.75 (↓3.85)      85.64 (↓1.55)
CM⊕MSWM⊕FCF        +       77.63 (↓2.23)     70.39 (↓3.21)      86.12 (↓1.07)
CM⊕WM⊕MHAF         +       79.51 (↓0.35)     73.07 (↓0.53)      86.82 (↓0.37)
Table 5. F1-score results of the ablation study.