Journal of Electronic Science and Technology, Vol. 22, Issue 4, 100287 (2024)

Yan Guo1, Hong-Chen Liu1, Fu-Jiang Liu2,*, Wei-Hua Lin2, Quan-Sen Shao1, and Jun-Shun Su3

Author Affiliations
1 School of Computer Science, China University of Geosciences, Wuhan 430078, China
2 School of Geography and Information Engineering, China University of Geosciences, Wuhan 430078, China
3 Xining Comprehensive Natural Resources Survey Centre, China Geological Survey (CGS), Xining 810000, China

DOI: 10.1016/j.jnlest.2024.100287

Citation: Yan Guo, Hong-Chen Liu, Fu-Jiang Liu, Wei-Hua Lin, Quan-Sen Shao, Jun-Shun Su. Chinese named entity recognition with multi-network fusion of multi-scale lexical information[J]. Journal of Electronic Science and Technology, 2024, 22(4): 100287

    Abstract

    Named entity recognition (NER) is an important part of knowledge extraction and one of the main tasks in constructing knowledge graphs. In today’s Chinese named entity recognition (CNER) task, the BERT-BiLSTM-CRF model is widely used and often yields notable results. However, recognizing each entity with high accuracy remains challenging. Many entities do not appear as single words but as parts of complex phrases, making accurate recognition difficult with word embedding information alone, because the intricate lexical structure often degrades performance. To address this issue, we propose an improved Bidirectional Encoder Representations from Transformers (BERT) character word conditional random field (CRF) (BCWC) model. It incorporates a word embedding model pre-trained with the skip-gram with negative sampling (SGNS) method alongside traditional BERT embeddings. By comparing datasets segmented with different word segmentation tools, we obtain enhanced word embedding features for the segmented data. These features are then processed by multi-scale convolution and iterated dilated convolutional neural networks (IDCNNs) with varying dilation rates to capture features at multiple scales and extract diverse contextual information. Additionally, a multi-attention mechanism is employed to fuse word and character embeddings. Finally, CRFs are applied to learn sequence constraints and optimize entity label annotations. Experiments on three public datasets demonstrate that the proposed method outperforms recent advanced baselines. BCWC addresses the challenge of recognizing complex entities by combining character-level and word-level embedding information, thereby improving the accuracy of CNER. Such a model has potential applications in more precise knowledge extraction tasks such as knowledge graph construction and information retrieval, particularly in domain-specific natural language processing tasks that require high entity recognition precision.
    $ P\left( {\left. {{w_c}} \right|{w_t}} \right) = \frac{{\exp \left( {{{{\mathbf{v}}}_{{w_c}}'}^{\text{T}} \cdot {{\mathbf{v}}_{{w_t}}}} \right)}}{{\displaystyle\sum\limits_{w = 1}^W {\exp \left( {{{{\mathbf{v}}}_w'}^{\text{T}} \cdot {{\mathbf{v}}_{{w_t}}}} \right)} }} $(1)


    $ P\left( {\left. {{w_{{n_k}}}} \right|{w_t}} \right) = \frac{{\exp \left( {{{{\mathbf{v}}}_{{w_{{n_k}}}}'}^{\text{T}}{{\mathbf{v}}_{{w_t}}}} \right)}}{{\displaystyle\sum\limits_{w = 1}^W {\exp \left( {{{{\mathbf{v}}}_w'}^{\text{T}}{{\mathbf{v}}_{{w_t}}}} \right)} }} . $(2)


    $ {\mathrm{max}} \prod\nolimits_{({w_t}{\mathrm{,}}{w_c}) \in R} {P\left( {\left. {{w_c}} \right|{w_t}} \right)} \prod\nolimits_{({w_t}{\mathrm{,}}{w_{{n_k}}}) \in N} {P\left( {\left. {{w_{{n_k}}}} \right|{w_t}} \right)} $(3)

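Eqs. (1)–(3) define the SGNS training objective: a softmax over the vocabulary scores each context word against a target word, and the objective maximizes the product of these probabilities over the observed pairs R and the negatively sampled pairs N. A minimal NumPy sketch with a toy vocabulary and randomly initialized vector tables (the variable names here are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
W = 10    # toy vocabulary size
dim = 4
V_in = rng.normal(size=(W, dim))    # target-word vectors v_w
V_out = rng.normal(size=(W, dim))   # context-word vectors v'_w

def softmax_prob(c, t):
    """Eq. (1): P(w_c | w_t) = exp(v'_c . v_t) / sum_w exp(v'_w . v_t)."""
    logits = V_out @ V_in[t]
    logits -= logits.max()          # numerical stability
    p = np.exp(logits)
    return p[c] / p.sum()

def log_likelihood(pos_pairs, neg_pairs):
    """Log of the product in Eq. (3), over positive pairs R and
    sampled pairs N; each pair is (target, context)."""
    return (sum(np.log(softmax_prob(c, t)) for t, c in pos_pairs)
            + sum(np.log(softmax_prob(n, t)) for t, n in neg_pairs))
```

In practice SGNS avoids the full softmax denominator by scoring the sampled pairs with a sigmoid; the full-softmax form above follows Eqs. (1) and (2) as written.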

    $ {{\mathbf{I}}_t} = \sigma \left( {{{\mathbf{W}}_{{\mu _i}}}{{\mathbf{x}}_t} + {{\mathbf{W}}_{{h_i}}}{{\mathbf{H}}_{t - 1}} + {{\mathbf{b}}_i}} \right) $(4)


    $ {{\mathbf{f}}_t} = \sigma \left( {{{\mathbf{W}}_{{\mu _f}}}{{\mathbf{x}}_t} + {{\mathbf{W}}_{{h_f}}}{{\mathbf{H}}_{t - 1}} + {{\mathbf{b}}_f}} \right) $(5)


    $ {{{\tilde {\bf{C}}}}_t} = {\text{tanh}} \left( {{{\mathbf{W}}_{{\mu _c}}}{{\mathbf{x}}_t} + {{\mathbf{W}}_{{h_c}}}{{\mathbf{H}}_{t - 1}} + {{\mathbf{b}}_c}} \right) $(6)


    $ {{\mathbf{C}}_t} = {{\mathbf{f}}_t}{{\mathbf{C}}_{t - 1}} + {{\mathbf{I}}_t}{{{\tilde {\bf{C}}}}_t} $(7)


    $ {{\mathbf{O}}_t} = \sigma \left( {{{\mathbf{W}}_{{\mu _o}}}{{\mathbf{x}}_t} + {{\mathbf{W}}_{{h_o}}}{{\mathbf{H}}_{t - 1}} + {{\mathbf{b}}_o}} \right) $(8)


    $ {{\mathbf{H}}_t} = {{\mathbf{O}}_t} \tanh \left( {{{\mathbf{C}}_t}} \right) $(9)

    $ {\mathbf{h}}_t^{{\text{final}}} = {\text{Concat}}\left[ {{\mathbf{h}}_t^{{\text{forward}}}{\mathrm{,}}{\text{ }}{\mathbf{h}}_t^{{\text{backward}}}} \right]. $(10)

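Eqs. (4)–(10) are the standard LSTM gate updates followed by the bidirectional concatenation of Eq. (10). A NumPy sketch under assumed weight shapes; the products involving the cell state in Eqs. (7) and (9) are elementwise:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step following Eqs. (4)-(9)."""
    Wi, Ui, bi, Wf, Uf, bf, Wc, Uc, bc, Wo, Uo, bo = params
    i = sigmoid(Wi @ x_t + Ui @ h_prev + bi)         # input gate, Eq. (4)
    f = sigmoid(Wf @ x_t + Uf @ h_prev + bf)         # forget gate, Eq. (5)
    c_tilde = np.tanh(Wc @ x_t + Uc @ h_prev + bc)   # candidate cell, Eq. (6)
    c = f * c_prev + i * c_tilde                     # cell state, Eq. (7)
    o = sigmoid(Wo @ x_t + Uo @ h_prev + bo)         # output gate, Eq. (8)
    h = o * np.tanh(c)                               # hidden state, Eq. (9)
    return h, c

def bilstm(xs, fwd_params, bwd_params, hidden):
    """Run the sequence both ways and concatenate per step, Eq. (10)."""
    def run(seq, params):
        h, c, hs = np.zeros(hidden), np.zeros(hidden), []
        for x in seq:
            h, c = lstm_step(x, h, c, params)
            hs.append(h)
        return hs
    fwd = run(xs, fwd_params)
    bwd = run(xs[::-1], bwd_params)[::-1]   # reverse pass, re-aligned
    return [np.concatenate([hf, hb]) for hf, hb in zip(fwd, bwd)]
```

Each output vector has twice the hidden width, since forward and backward states are concatenated position by position.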

    $ {{\mathbf{O}}_i} = \left( {{{\mathbf{W}}_i} * {{\mathbf{W}}_e}\left[ {i:i + {k_1}} \right]} \right) + {\mathbf{b}} $(11)


    $ {{\mathbf{O}}_i} = \left( {{{\mathbf{W}}_i} * {{\mathbf{W}}_e}\left[ {i:i + {k_2}} \right]} \right) + {\mathbf{b}}. $(12)

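Eqs. (11) and (12) are one-dimensional convolutions over the embedding sequence with two different kernel widths k1 and k2, so each output position i is computed from the window W_e[i : i + k]. A single-filter NumPy sketch (real implementations use many filters and channels):

```python
import numpy as np

def conv1d(We, W, b, k):
    """Eqs. (11)-(12): slide a width-k kernel W over the embedding
    sequence We; output position i uses the window We[i:i+k]."""
    T = We.shape[0] - k + 1          # valid positions only
    return np.array([np.sum(W * We[i:i + k]) + b for i in range(T)])
```

Running two such convolutions with widths k1 and k2 over the same sequence yields the two output streams that Eqs. (15) and (16) later concatenate.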

    $ {{\mathbf{O}}_{{\text{id}}}} = \left( {{{\mathbf{W}}_{{\text{id}}}} * {{\mathbf{O}}_i}\left[ {i:i + R} \right]} \right) + {\mathbf{b}} $(13)


    $ r = k + \left( {k - 1} \right)\left( {d - 1} \right) $(14)

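Eq. (14) gives the effective receptive field of a dilated convolution: a kernel of width k with dilation d covers r = k + (k − 1)(d − 1) positions. A small helper, generalized to a stack of IDCNN layers under the usual assumption that each extra layer adds (k − 1)·d positions:

```python
def receptive_field(k, dilations):
    """Effective receptive field of stacked dilated convolutions with
    kernel width k. For a single layer this reduces to Eq. (14):
    r = k + (k - 1)(d - 1) = 1 + (k - 1) d."""
    r = 1
    for d in dilations:
        r += (k - 1) * d    # each layer widens the field by (k-1)*d
    return r
```

For example, with k = 3 a single layer at dilation 2 covers 5 positions, matching Eq. (14), while a stack at dilations 1, 2, 4 covers 15 positions without adding parameters: this is why IDCNNs capture long-range context cheaply.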

    $ {{{\tilde {\bf{O}}}}_1} = {\text{Concat}}\left( {{{\mathbf{O}}_1}{\mathrm{,}}{\text{ }}{{\mathbf{O}}_2}{\mathrm{,}}{\text{ }} \cdots {\mathrm{,}}{\text{ }}{{\mathbf{O}}_m}{\mathrm{,}}{\text{ }}{\mathrm{dim}} = 2} \right) $(15)

    $ {{{\tilde {\bf{O}}}}_2} = {\text{Concat}}\left( {{{\mathbf{O}}_1}{\mathrm{,}}{\text{ }}{{\mathbf{O}}_2}{\mathrm{,}}{\text{ }} \cdots {\mathrm{,}}{\text{ }}{{\mathbf{O}}_m}{\mathrm{,}}{\text{ }}{\mathrm{dim}} = 2} \right) $(16)

    $ {{\tilde {\bf{O}}}} = {\text{Layer}}\left( {{\text{ReLU}}\left( {{\text{Concat}}\left( {{{{{\tilde {\bf{O}}}}}_1}{\mathrm{,}}{\text{ }}{{{{\tilde {\bf{O}}}}}_2}{\mathrm{,}}{\text{ }}{\mathrm{dim}} = 1} \right)} \right)} \right) $(17)

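Eqs. (15)–(17) fuse the two convolutional streams: each stream's outputs are concatenated along one axis, the two streams are joined along the other, and the result passes through ReLU and a Layer(·) step. A NumPy sketch; the specific axis choices and the reading of Layer(·) as layer normalization are assumptions:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Per-row normalization, standing in for Layer() in Eq. (17)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def fuse(stream1, stream2):
    """Eqs. (15)-(17): concatenate each stream's conv outputs along the
    feature axis, join the two streams along the other axis, then apply
    ReLU followed by the Layer() step."""
    o1 = np.concatenate(stream1, axis=1)          # Eq. (15)
    o2 = np.concatenate(stream2, axis=1)          # Eq. (16)
    joined = np.concatenate([o1, o2], axis=0)     # inner Concat of Eq. (17)
    return layer_norm(np.maximum(joined, 0.0))    # ReLU, then Layer()
```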

    $ {{\mathbf{e}}_r} = {\text{Concat}}\left( {{{\mathbf{e}}_c}{\mathrm{,}}{\text{ }}{{\mathbf{e}}_w}{\mathrm{,}}{\text{ }}{\mathrm{dim}} = 1} \right) $(18)


    $ {{\mathbf{e}}_\alpha } = {\text{Dense}}\left( {{\mathbf{e}}_r^{\text{T}}} \right) $(19)


    $ {{\mathbf{e}}_s} = {\mathbf{e}}_\alpha ^{\text{T}} $(20)

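Eqs. (18)–(20) fuse the character and word embeddings: the two are concatenated along the feature axis, a dense layer is applied to the transpose (i.e., across sequence positions), and the result is transposed back. A NumPy sketch in which the dense-layer weights `W_d` and bias `b_d` are hypothetical placeholders:

```python
import numpy as np

def fuse_embeddings(e_c, e_w, W_d, b_d):
    """Eqs. (18)-(20): concatenate character embeddings e_c and word
    embeddings e_w along the feature axis, apply Dense() to the
    transpose, and transpose back."""
    e_r = np.concatenate([e_c, e_w], axis=1)   # Eq. (18)
    e_alpha = e_r.T @ W_d + b_d                # Eq. (19), Dense(e_r^T)
    return e_alpha.T                           # Eq. (20)
```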

    $ {d_k} = {\text{embedding\_dim}}/{\text{head}} $(21)

    $ {\mathbf{Q}}{\text{ = }}{\mathbf{h}}_t^{{\text{final}}} = {\text{Concat}}\left( {{{\mathbf{Q}}_1}{\mathrm{,}}{\text{ }}{{\mathbf{Q}}_2}{\mathrm{,}}{\text{ }} \cdots {\mathrm{,}}{\text{ }}{{\mathbf{Q}}_{{\text{head}}}}} \right) $(22)


    $ {\mathbf{K}}{\text{ = }}{\mathbf{c}}_t^{{\text{final}}} = {\text{Concat}}\left( {{{\mathbf{K}}_1}{\mathrm{,}}{\text{ }}{{\mathbf{K}}_2}{\mathrm{,}}{\text{ }} \cdots {\mathrm{,}}{\text{ }}{{\mathbf{K}}_{{\text{head}}}}} \right) $(23)


    $ {\mathbf{V}}{\text{ = }}{\mathbf{c}}_t^{{\text{final}}} = {\text{Concat}}\left( {{{\mathbf{V}}_1}{\mathrm{,}}{\text{ }}{{\mathbf{V}}_2}{\mathrm{,}}{\text{ }} \cdots {\mathrm{,}}{\text{ }}{{\mathbf{V}}_{{\text{head}}}}} \right) $(24)


    $ {\mathbf{Resul}}{{\mathbf{t}}_i} = {\text{Attention}}\left( {{{\mathbf{Q}}_i}{\mathrm{,}}{\text{ }}{{\mathbf{K}}_i}{\mathrm{,}}{\text{ }}{{\mathbf{V}}_i}} \right) = {\text{Softmax}}\left( {\frac{{{{\mathbf{Q}}_i}{\mathbf{K}}_i^{\text{T}}}}{{\sqrt {{d_k}} }}} \right){{\mathbf{V}}_i} $(25)


    $ {\mathbf{Result}} = {\text{Concat}}\left( {{\mathbf{Resul}}{{\mathbf{t}}_1}{\mathrm{,}}{\text{ }}{\mathbf{Resul}}{{\mathbf{t}}_2}{\mathrm{,}}{\text{ }} \cdots {\mathrm{,}}{\text{ }}{\mathbf{Resul}}{{\mathbf{t}}_{{\text{head}}}}} \right) $(26)

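Eqs. (21)–(26) are standard multi-head scaled dot-product attention: Q, K, and V are split into `head` slices of width d_k along the embedding axis, each head computes Softmax(QKᵀ/√d_k)V, and the per-head results are concatenated. A NumPy sketch under assumed 2-D shapes (sequence length × embedding dimension):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(Q, K, V, head):
    """Eqs. (21)-(26): split along the embedding axis into `head` slices,
    apply scaled dot-product attention per head (Eq. (25)), then
    concatenate the per-head results (Eq. (26))."""
    d_k = Q.shape[-1] // head                   # Eq. (21)
    outs = []
    for i in range(head):
        s = slice(i * d_k, (i + 1) * d_k)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_k)
        outs.append(softmax(scores) @ V[:, s])  # Eq. (25)
    return np.concatenate(outs, axis=-1)        # Eq. (26)
```

In the model, Q comes from the BiLSTM outputs while K and V come from the convolutional character features, so attention weights the two representations against each other.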

    $ P\left( {\left. {\mathbf{Y}} \right|{\mathbf{X}}} \right) = \frac{{\displaystyle\prod\limits_{t = 1}^\omega {{\varphi _t}\left( {\left. {{{\mathbf{Y}}_{t - 1}}{\mathrm{,}}{\text{ }}{{\mathbf{Y}}_t}} \right|{\mathbf{X}}} \right)} }}{{\displaystyle\sum\limits_{{\mathbf{Y}}' \in {Y_{{\text{label}}}}} {\prod\limits_{t = 1}^\omega {{\varphi _t}\left( {\left. {{{{\mathbf{Y}}}_{t - 1}'}{\mathrm{,}}{\text{ }}{{{\mathbf{Y}}}_t'}} \right|{\mathbf{X}}} \right)} } }} $(27)


    $ {\varphi _t}\left( {\left. {{{\mathbf{Y}}_{t - 1}}{\mathrm{,}}{{\mathbf{Y}}_t}} \right|{\mathbf{X}}} \right) = \exp \left( {{{\mathbf{W}}_t}{{\mathbf{H}}_{{{\mathbf{Y}}_{t - 1}}{\mathrm{,}}{{\mathbf{Y}}_t}}} + {{\mathbf{b}}_{{{\mathbf{Y}}_{t - 1}}{\mathrm{,}}{{\mathbf{Y}}_t}}}} \right) $(28)


    $ {\mathrm{Loss}} = - \sum {{\mathrm{log}} \left( {P\left( {\left. {\mathbf{Y}} \right|{\mathbf{X}}} \right)} \right)} {\mathrm{.}} $(29)


    $ {\text{Precision = }}\frac{{{\text{TP}}}}{{{\text{TP + FP}}}}{\mathrm{.}} $(30)


    $ {\text{Recall = }}\frac{{{\text{TP}}}}{{{\text{TP + FN}}}}{\mathrm{.}} $(31)


    $ {F_1}{\text{-score = }}\frac{{{\text{2}} \times {\text{Precision}} \times {\text{Recall}}}}{{{\text{Precision}} + {\text{Recall}}}}{\mathrm{.}} $(32)

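Eqs. (30)–(32) are the usual entity-level evaluation metrics, computed from true positives (TP), false positives (FP), and false negatives (FN). A direct implementation:

```python
def prf1(tp, fp, fn):
    """Eqs. (30)-(32): precision, recall, and F1-score."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g. 80 correctly recognized entities, 20 spurious, 10 missed:
# precision = 0.8, recall = 80/90 ≈ 0.889, F1 ≈ 0.842
```

For NER these counts are taken at the entity level: a prediction counts as a true positive only when both the span boundaries and the entity type match the gold annotation.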
