• Journal of Electronic Science and Technology
  • Vol. 22, Issue 4, 100287 (2024)
Yan Guo1, Hong-Chen Liu1, Fu-Jiang Liu2,*, Wei-Hua Lin2, Quan-Sen Shao1, and Jun-Shun Su3
Author Affiliations
  • 1School of Computer Science, China University of Geosciences, Wuhan, 430078, China
  • 2School of Geography and Information Engineering, China University of Geosciences, Wuhan, 430078, China
  • 3Xining Comprehensive Natural Resources Survey Centre, China Geological Survey (CGS), Xining, 810000, China
DOI: 10.1016/j.jnlest.2024.100287
Yan Guo, Hong-Chen Liu, Fu-Jiang Liu, Wei-Hua Lin, Quan-Sen Shao, Jun-Shun Su. Chinese named entity recognition with multi-network fusion of multi-scale lexical information[J]. Journal of Electronic Science and Technology, 2024, 22(4): 100287
Fig. 1. An example of nested entities.
Fig. 2. An example of an ambiguous entity word.
Fig. 3. Overall architecture of the BCWC model, consisting of four main layers: the embedding layer, feature extraction layer, feature fusion layer, and CRF layer. The embedding layer uses BERT for character embeddings and a word embedding model for word embeddings. In the feature extraction layer, dashed boxes of different colors mark the ranges of word sequences captured by convolutional kernels of different sizes; the outputs of these convolutions are concatenated to form the layer's output. The feature fusion layer then applies a multi-head attention mechanism to weight the word features and passes the result to the CRF layer for decoding. This layered structure provides a comprehensive and efficient pipeline for information processing in the BCWC model.
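The following PyTorch sketch illustrates the data flow described in Fig. 3. It is a minimal illustration under assumed dimensions and module names, not the authors' released code: BERT-base-Chinese character features and pretrained word vectors are stubbed with plain nn.Embedding layers so the snippet runs standalone, the word-level multi-scale convolutions are simplified, and a linear emission layer stands in for the CRF decoder (a real implementation would add one, e.g. from pytorch-crf).

```python
import torch
import torch.nn as nn


class BCWCSketch(nn.Module):
    """Hypothetical skeleton of the BCWC pipeline: embeddings -> feature
    extraction -> attention-based fusion -> per-token label scores."""

    def __init__(self, char_vocab=21128, word_vocab=50000, char_dim=768,
                 word_dim=300, hidden=64, num_labels=9, heads=8):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, char_dim)   # stand-in for BERT-base-Chinese
        self.word_emb = nn.Embedding(word_vocab, word_dim)   # stand-in for pretrained word vectors
        # Feature extraction: a BiLSTM over characters and multi-scale convolutions
        # over words (the multi-scale IDCNN itself is sketched after Fig. 4).
        self.char_rnn = nn.LSTM(char_dim, hidden, batch_first=True, bidirectional=True)
        self.word_conv = nn.ModuleList(
            [nn.Conv1d(word_dim, hidden, k, padding=k // 2) for k in (1, 3)]
        )
        # Feature fusion: multi-head attention weights word features against character features.
        self.fuse = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)
        # Emission scores per label; a CRF layer would decode these into a tag sequence.
        self.emit = nn.Linear(4 * hidden, num_labels)

    def forward(self, char_ids, word_ids):
        # char_ids, word_ids: (batch, seq_len); one word id aligned to each character position.
        c, _ = self.char_rnn(self.char_emb(char_ids))                 # (B, T, 2H) character features
        w = self.word_emb(word_ids).transpose(1, 2)                   # (B, D_w, T)
        w = torch.cat([conv(w) for conv in self.word_conv], dim=1)    # (B, 2H, T) multi-scale word features
        fused, _ = self.fuse(w.transpose(1, 2), c, c)                 # word-query attention over characters
        return self.emit(torch.cat([c, fused], dim=-1))               # (B, T, num_labels) emission scores


scores = BCWCSketch()(torch.randint(0, 21128, (2, 16)), torch.randint(0, 50000, (2, 16)))
print(scores.shape)  # torch.Size([2, 16, 9])
```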
Fig. 4. Multi-scale IDCNN process diagram. The input consists of word embedding vectors. The diagram shows iterated dilated convolution blocks at two different scales: when the dilation rate is 1, a regular convolution kernel is used, and when the dilation rate is greater than 1, dilated convolution is employed. Each convolution block comprises two stacked layers. The results are concatenated and activated using the ReLU function, followed by normalization with LayerNorm. The outputs from the different scales are concatenated to obtain the final output.
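A hedged sketch of such a multi-scale iterated dilated convolution block follows. The kernel sizes, dilation rates, and channel counts are assumptions for illustration only; each scale stacks two dilated Conv1d layers, and the scale outputs are concatenated, ReLU-activated, and normalized with LayerNorm, as the caption describes.

```python
import torch
import torch.nn as nn


class MultiScaleIDCNN(nn.Module):
    """Illustrative multi-scale IDCNN: one stack of dilated convolutions per kernel size."""

    def __init__(self, in_dim=300, channels=64, kernel_sizes=(3, 5), dilations=(1, 2)):
        super().__init__()
        self.scales = nn.ModuleList()
        for k in kernel_sizes:
            layers, dim = [], in_dim
            for d in dilations:                          # two stacked layers per scale;
                pad = (k - 1) // 2 * d                   # d = 1 is a plain convolution,
                layers.append(nn.Conv1d(dim, channels, k, dilation=d, padding=pad))
                dim = channels                           # d > 1 widens the receptive field.
            self.scales.append(nn.Sequential(*layers))
        self.norm = nn.LayerNorm(channels * len(kernel_sizes))

    def forward(self, x):                                # x: (batch, seq_len, in_dim)
        x = x.transpose(1, 2)                            # Conv1d expects (batch, channels, seq_len)
        outs = [scale(x).transpose(1, 2) for scale in self.scales]
        # Concatenate the scales, apply ReLU, then LayerNorm.
        return self.norm(torch.relu(torch.cat(outs, dim=-1)))


feats = MultiScaleIDCNN()(torch.randn(2, 20, 300))
print(feats.shape)  # torch.Size([2, 20, 128])
```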
    Fig. 5. [in Chinese]
Fig. 6. Correctly identified named entities of the sample.
Fig. 7. Tokenization and multi-scale convolution kernel capturing process.
Fig. 8. Entity recognition under the same information capture window size.
Fig. 9. Results of different models on the three datasets.
Fig. 10. Quantitative analysis of different structures: Results of IDCNN (a) with different structures on various datasets and (b) with the same structure but different convolutional kernel sizes on various datasets. In the notation m×n, m represents the number of iterative layers and n represents the size of the DCNN blocks.
Fig. 11. Comparison of different structures on three datasets.
Hyperparameter                  Value
RNN dimension                   64
BERT learning rate              2×10^−5
BERT dropout rate               0.35
RNN (CNN) learning rate         1×10^−3
Kernel size                     3
Boundary embedding dimension    16
BERT model                      BERT-base-Chinese
Epochs                          30
Optimizer                       AdamW
Multi-head attention heads      8
Table 1. Common hyperparameter settings for the experiments.
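As a minimal sketch of how the two learning rates in Table 1 could be wired together, the snippet below builds an AdamW optimizer with separate parameter groups: 2×10^−5 for the BERT encoder and 1×10^−3 for the RNN/CNN layers. The module names (bert, head) and the stand-in Linear layers are purely illustrative assumptions, not the authors' actual code.

```python
import torch

# Stand-ins for the real submodules: "bert" for BERT-base-Chinese,
# "head" for the RNN/CNN feature extractors and fusion/CRF layers.
model = torch.nn.ModuleDict({
    "bert": torch.nn.Linear(768, 768),
    "head": torch.nn.Linear(768, 64),
})

# Two parameter groups with the learning rates listed in Table 1.
optimizer = torch.optim.AdamW([
    {"params": model["bert"].parameters(), "lr": 2e-5},
    {"params": model["head"].parameters(), "lr": 1e-3},
])
```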
Dataset     Training    Evaluation    Test     Categories    Samples    Characters
CLUENER     10.74K      1.34K         1.34K    10            13.00K     503K
Weibo       1.35K       0.27K         0.27K    8             1.89K      103K
Youku       8.00K       1.00K         1.00K    9             10.00K     170K
Table 1. Statistics of the datasets.
Dataset     max_seq_len    Batch size    max_word_len
CLUENER     128            32            25
Weibo       64             16            20
Youku       128            16            20
Table 2. Hyperparameter settings for the experiments on the three datasets.
Model                              Precision (%)    Recall (%)    F1-score (%)
BiLSTM-CRF (2020) [32]             71.06            68.97         70.00
stkBiGRU-CRF (2021)                73.41            70.02         71.68
TextCNN+CRF (2022)                 75.07            69.78         72.12
Lattice-LSTM (2018) [44]           74.13            73.84         74.41
BERT-CRF (2019)                    76.47            77.01         76.73
BERT (2018) [32]                   77.24            80.46         78.82
ALBERT (2019)                      78.96            64.58         71.05
ALBERT-BiLSTM (2020)               77.25            81.28         79.22
RoBERTa-WWM-BiLSTM-CRF (2021)      78.81            80.82         79.80
DSpERT (2023)                      78.24            79.82         79.02
Ours                               80.29            79.43         79.86
Table 2. Performance on CLUENER.
Environment                 Value
Operating system            Windows 11
Processor                   12th Gen Intel(R) Core(TM) i5-12600KF
Random access memory        32.0 GB
Graphics processing unit    NVIDIA GeForce RTX 3070 Ti (8 GB)
Python version              3.8.17
PyTorch version             2.0.0
Table 3. Hardware and software environments for the experiments.
Model                     Precision (%)    Recall (%)    F1-score (%)
BiLSTM-CRF (2020) [11]    60.80            52.90         56.58
Lattice-LSTM (2018)       53.04            62.25         58.79
LR-CNN (2019)             57.14            66.67         59.92
LGN (2019)                56.44            64.52         60.21
BERT-CRF (2019) [11]      67.12            66.88         67.00
FGN (2021)                69.02            73.65         71.25
MTL-HWS (2023)            73.03            73.21         73.12
DSpERT (2023)             69.52            68.80         69.12
KCB-FLAT (2024)           72.36            70.41         71.37
LkLi-CNER (2023)          77.43            68.23         72.54
Ours                      71.96            75.32         73.60
Table 3. Performance on Weibo.
Model                        Precision (%)    Recall (%)    F1-score (%)
BiLSTM-CRF (2020)            80.31            79.22         79.76
BERT (2018)                  85.06            76.75         80.69
BERT-CRF (2019) [34]         83.00            81.70         82.40
Lattice-LSTM (2018)          84.43            81.28         82.82
BiLSTM+SSCNN-CRF (2023)      87.00            85.10         86.10
DSpERT (2023)                86.62            80.17         83.27
Ours                         87.41            86.97         87.19
Table 4. Performance on Youku.
Model              BERT    F1-score (%)
                           CLUENER           Weibo              Youku
BCWC               +       79.86             73.60              87.19
CM                 +       76.81 (↓3.05)     69.19 (↓4.41)      85.79 (↓1.40)
CM                 −       70.00 (↓9.86)     56.58 (↓17.02)     79.76 (↓7.43)
WM                 +       76.48 (↓3.38)     68.75 (↓4.85)      85.46 (↓1.73)
MSWM               +       76.72 (↓3.14)     69.75 (↓3.85)      85.64 (↓1.55)
CM⊕MSWM⊕FCF        +       77.63 (↓2.23)     70.39 (↓3.21)      86.12 (↓1.07)
CM⊕WM⊕MHAF         +       79.51 (↓0.35)     73.07 (↓0.53)      86.82 (↓0.37)
Table 5. F1-score results of the ablation study.