Search results

1 – 10 of 893
Article
Publication date: 28 December 2023

Na Xu, Yanxiang Liang, Chaoran Guo, Bo Meng, Xueqing Zhou, Yuting Hu and Bo Zhang

Abstract

Purpose

Safety management plays an important part in coal mine construction. Because the data involved are complex, applying the construction safety knowledge scattered across standards is challenging. This paper aims to develop a knowledge extraction model that automatically and efficiently extracts domain knowledge from unstructured texts.

Design/methodology/approach

In this paper, a bidirectional encoder representations from transformers (BERT)-bidirectional long short-term memory (BiLSTM)-conditional random field (CRF) method based on a pre-trained language model was applied to knowledge entity recognition in the field of coal mine construction safety. First, 80 safety standards for coal mine construction were collected, sorted and annotated as a descriptive corpus. Then, the BERT pre-trained language model was used to obtain dynamic word vectors. Finally, the BiLSTM-CRF model determined the optimal tag sequence for the entities.
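The final CRF step can be illustrated with Viterbi decoding, the standard way a linear-chain CRF selects the highest-scoring tag sequence. This is a minimal sketch, not the authors' code: it assumes per-token emission scores (such as a BiLSTM would output), a learned tag-transition matrix and start scores. The three-tag scheme and all numbers in the usage lines are made up for illustration.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   start: Sequence[float]) -> List[int]:
    """Return the index sequence of the highest-scoring tag path.

    emissions[t][j]   score of tag j at token t (e.g. BiLSTM output)
    transitions[i][j] score of moving from tag i to tag j (learned by the CRF)
    start[j]          score of beginning the sentence with tag j
    """
    n_tags = len(start)
    # score[j]: best total score of any path that ends in tag j at the current token
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            # best previous tag i for arriving at tag j on token t
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    # follow backpointers from the best final tag
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backpointers):
        best = ptrs[best]
        path.append(best)
    return path[::-1]


TAGS = ["O", "B-ENT", "I-ENT"]   # hypothetical BIO tag set
NEG = -1e4                       # effectively forbids a transition
emissions = [[2.0, 1.0, 0.0], [0.0, 0.5, 1.0], [1.0, 0.0, 0.8]]
transitions = [[0.0, 0.0, NEG],  # from O: O -> I-ENT is illegal
               [0.0, 0.0, 0.5],  # from B-ENT
               [0.0, 0.0, 0.5]]  # from I-ENT
start = [0.0, 0.0, NEG]          # a sentence cannot start inside an entity
print([TAGS[i] for i in viterbi_decode(emissions, transitions, start)])
```

Taken token by token, the emissions alone would favour the illegal sequence O, I-ENT, O. The transition and start scores steer decoding to the valid O, B-ENT, I-ENT. This is the structural constraint a CRF layer adds on top of a BiLSTM.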

Findings

Accordingly, 11,933 entities and 2,051 relationships were identified in the standard specification texts used in this paper, and a language model suited to coal mine construction safety management was proposed. The experiments showed that the model's overall F1 value for entity extraction exceeded 60%, as did the F1 values for each of the nine entity types, such as security management. The model identified and extracted entities more accurately than conventional methods.
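F1 values for entity recognition are commonly computed at the entity level. A predicted entity counts as correct only if both its span and its type match the gold annotation. The abstract does not state which matching rule was used, so the sketch below assumes this common exact-match convention over BIO tags. The entity type names `MGMT` and `EQUIP` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Collect (type, start, end) entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        # an open entity ends at "O", at a new "B-", or at an "I-" of another type
        if etype is not None and (tag == "O" or tag.startswith("B-") or tag[2:] != etype):
            spans.add((etype, start, i))
            etype = None
        # a "B-" (or a stray "I-" with no open entity) starts a new entity
        if tag.startswith("B-") or (tag.startswith("I-") and etype is None):
            etype, start = tag[2:], i
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged exact-match entity F1 over a list of sentences."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        tp += len(gs & ps)
        n_gold += len(gs)
        n_pred += len(ps)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


gold = [["B-MGMT", "I-MGMT", "O", "B-EQUIP"]]
pred = [["B-MGMT", "I-MGMT", "O", "O"]]
print(round(entity_f1(gold, pred), 3))
```

In the toy example, precision is 1.0 (the one predicted entity is correct) and recall is 0.5 (one of two gold entities is found), giving an F1 of about 0.667.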

Originality/value

Using the entities and relationships identified from standard specifications applicable to coal mines, this work implemented domain knowledge querying and built a Q&A platform. This paper proposed a systematic framework for coal mine construction safety texts to improve the efficiency and accuracy of domain-specific entity extraction. In addition, a pre-trained language model was introduced into coal mine construction safety to realize dynamic entity recognition, which provides technical support and a theoretical reference for optimizing safety management platforms.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0969-9988

Article
Publication date: 20 September 2022

Jinzhu Zhang, Yue Liu, Linqi Jiang and Jialu Shi

Abstract

Purpose

This paper aims to propose a method for better discovering topic evolution paths and semantic relationships from the perspective of patent entity extraction and semantic representation. On the one hand, it identifies entities that share the same semantics but have different expressions, enabling accurate discovery of topic evolution paths. On the other hand, it reveals the semantic relationships behind topic evolution to better explain what drives it.

Design/methodology/approach

First, a Bi-LSTM-CRF (bidirectional long short-term memory with conditional random field) model is designed for patent entity extraction, and a representation learning method is constructed for patent entity representation. Second, a method based on knowledge outflow and inflow is proposed for discovering topic evolution paths by identifying and computing semantic common entities among topics. Finally, multiple semantic relationships among patent entities are predefined for a specific domain, and the semantic relationship between topics is then identified from the proportion of each type of semantic relationship belonging to each topic.
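The abstract does not give formulas for knowledge outflow and inflow, so the following is only one plausible sketch under stated assumptions. Two entities are treated as "semantically common" when their embeddings have cosine similarity at or above a threshold. Outflow is the share of an earlier topic's entities that have such a counterpart in a later topic. Inflow is the share of the later topic's entities that have a counterpart in the earlier one. A topic evolution link is kept when both values reach a second threshold. All thresholds, topic names and the 2-D vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Topic = Dict[str, Vector]  # entity surface form -> entity embedding


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knowledge_flow(earlier: Topic, later: Topic,
                   sim_threshold: float = 0.8) -> Tuple[float, float]:
    """Return (outflow of `earlier`, inflow of `later`) via semantic common entities."""
    def has_match(vec: Vector, other: Topic) -> bool:
        # "semantically common": a near-synonym exists in the other topic
        return any(cosine(vec, o) >= sim_threshold for o in other.values())

    outflow = sum(has_match(v, later) for v in earlier.values()) / len(earlier)
    inflow = sum(has_match(v, earlier) for v in later.values()) / len(later)
    return outflow, inflow


def evolution_links(period1: Dict[str, Topic], period2: Dict[str, Topic],
                    flow_threshold: float = 0.3) -> List[Tuple[str, str]]:
    """Link topics across periods when both outflow and inflow reach the threshold."""
    links = []
    for name_a, topic_a in period1.items():
        for name_b, topic_b in period2.items():
            out_a, in_b = knowledge_flow(topic_a, topic_b)
            if out_a >= flow_threshold and in_b >= flow_threshold:
                links.append((name_a, name_b))
    return links


p1 = {"flight control": {"flight controller": [1.0, 0.0], "gimbal": [0.0, 1.0]}}
p2 = {"autopilot": {"flight control unit": [0.99, 0.1], "battery": [-1.0, 0.0]},
      "power": {"battery pack": [-1.0, 0.05]}}
print(knowledge_flow(p1["flight control"], p2["autopilot"]))
print(evolution_links(p1, p2))
```

Here "flight controller" and "flight control unit" illustrate entities with the same semantics but different expressions. Matching them lets the "flight control" topic link to "autopilot" even though no surface string is shared.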

Findings

In the unmanned aerial vehicle (UAV) field, this method identifies semantic common entities, which have the same semantics but different expressions. In addition, it discovers topic evolution paths better than a traditional method. Finally, it identifies different semantic relationships among topics, which provides a detailed description for understanding and interpreting topic evolution. These results indicate that the proposed method is effective and useful. However, the method is a preliminary study and needs further investigation on other datasets using multiple emerging deep learning methods.

Originality/value

This work provides a new perspective for topic evolution analysis by considering semantic representation of patent entities. The authors design a method for discovering topic evolution paths by considering knowledge flow computed by semantic common entities, which can be easily extended to other patent mining-related tasks. This work is the first attempt to reveal semantic relationships among topics for a precise and detailed description of topic evolution.

Details

Aslib Journal of Information Management, vol. 75 no. 3
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 14 November 2023

Shaodan Sun, Jun Deng and Xugong Qin

Abstract

Purpose

This paper aims to improve the retrieval and use of historical newspapers through semantic organization from a fine-grained knowledge element perspective. It seeks to unlock the latent value of newspaper contents while providing methodological guidance for research in the humanities.

Design/methodology/approach

Following the semantic organization process and the knowledge element concept, this study proposes a holistic framework with four stages: knowledge element description, extraction, association and application. First, a semantic description model for knowledge elements is devised. Next, deep learning techniques are used for entity recognition and relationship extraction, identifying entities within historical newspaper contents and capturing the dependencies among them. Finally, an online platform based on Flask is developed to recognize entities and relationships within historical newspapers.

Findings

This article used the Shengjing Times·Changchun Compilation as the dataset for describing, extracting, associating and applying newspaper contents. For knowledge element extraction, BERT + BS consistently outperforms Bi-LSTM, CRF++ and even BERT in Recall and F1, making it a favorable choice for entity recognition in this context. The Bi-LSTM-Pro model stands out with the highest scores across all metrics, notably an exceptional F1 score in knowledge element relationship recognition.

Originality/value

Historical newspapers are more than mere artifacts; they are valuable repositories of societal and historical memory. Semantic organization from a fine-grained knowledge element perspective can support semantic retrieval, semantic association, information visualization and knowledge discovery services for historical newspapers. In practice, it can help researchers uncover deeper insights into historical and cultural contexts, broadening digital humanities research and its practical applications.

Details

Aslib Journal of Information Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 8 June 2022

Guo Chen, Jiabin Peng, Tianxiang Xu and Lu Xiao

Abstract

Purpose

“Problem-solving” is the most crucial insight of scientific research. This study focuses on constructing a “problem-solving” knowledge graph of scientific domains by extracting four entity relation types: problem-solving, problem hierarchy, solution hierarchy and association.

Design/methodology/approach

This paper presents a low-cost method for identifying these relationships in scientific papers based on word analogy. The problem-solving and hierarchical relations are represented as offset vectors of the head and tail entities and then classified by referencing a small set of predefined entity relations.
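The offset-vector idea can be sketched as follows. A relation instance is represented as tail embedding minus head embedding, then assigned to the predefined relation type whose reference offsets it most resembles. The scoring rule here (average cosine similarity to each type's reference offsets) is an assumption; the abstract does not specify it, and nearest-neighbour matching would be an equally plausible reading. The 2-D vectors stand in for real word embeddings and are purely illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def offset(head: Vector, tail: Vector) -> List[float]:
    """Represent a relation instance as the offset vector tail - head."""
    return [t - h for h, t in zip(head, tail)]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify_relation(head: Vector, tail: Vector,
                      references: Dict[str, List[Tuple[Vector, Vector]]]) -> str:
    """Pick the relation type whose reference offsets are, on average, most similar."""
    query = offset(head, tail)
    best_label, best_score = "", -math.inf
    for label, pairs in references.items():
        score = sum(cosine(query, offset(h, t)) for h, t in pairs) / len(pairs)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy 2-D "embeddings"; real ones would come from a trained word-embedding model.
references = {
    "problem-solving": [([0.0, 0.0], [1.0, 0.0]), ([1.0, 1.0], [2.0, 1.0])],
    "problem hierarchy": [([0.0, 0.0], [0.0, 1.0]), ([2.0, 0.0], [2.0, 1.0])],
}
print(classify_relation([5.0, 5.0], [6.0, 5.2], references))
```

Only the small reference set needs labels, which matches the abstract's point that no large manually annotated corpus is required.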

Findings

This paper reports an experiment on artificial intelligence papers from the Web of Science that achieved good performance: the F1 scores for the problem hierarchy, problem-solving and solution hierarchy relation types were 0.823, 0.815 and 0.748, respectively. This paper used computer vision as an example to demonstrate the application of the extracted relations in constructing domain knowledge graphs and revealing historical research trends.

Originality/value

This approach is highly efficient and generalizes well. Instead of relying on a large-scale manually annotated corpus, it requires only a small set of entity relations that can be easily extracted from external knowledge resources.

Details

Aslib Journal of Information Management, vol. 75 no. 3
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 22 August 2022

Tatsawan Timakum, Min Song and Giyeong Kim

Abstract

Purpose

This study aimed to examine the mental health information entities and associations between the biomedical, psychological and social domains of bipolar disorder (BD) by analyzing social media data and scientific literature.

Design/methodology/approach

Reddit posts and full-text papers from PubMed Central (PMC) were collected. Text analysis was used to create a psychological dictionary. Text mining tools were then applied to extract BD entities and their relationships from the datasets using a dictionary- and rule-based approach. Finally, social network analysis and visualization were employed to examine the associations.
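A dictionary-based extraction step feeding a co-occurrence network can be sketched in a few lines. This is an illustration, not the authors' tooling: the matching rule (case-insensitive whole-word search) and the edge weight (number of documents mentioning both entities) are assumptions. The small dictionary and example posts are invented.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def extract_entities(text: str, dictionary: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (term, category) pairs for dictionary terms found in `text`.

    Matching is case-insensitive on whole words (a simple hypothetical rule).
    """
    found = []
    for term, category in dictionary.items():
        if re.search(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE):
            found.append((term, category))
    return found


def cooccurrence_edges(documents: List[str], dictionary: Dict[str, str]) -> Counter:
    """Count how many documents mention each pair of entities together."""
    edges: Counter = Counter()
    for doc in documents:
        terms = sorted({term for term, _ in extract_entities(doc, dictionary)})
        for pair in combinations(terms, 2):  # every unordered pair, alphabetical
            edges[pair] += 1
    return edges


dictionary = {"lithium": "drug", "afraid": "affect",
              "depressed": "affect", "weight gain": "side effect"}
docs = ["I was afraid to start Lithium.",
        "Lithium caused weight gain and I felt depressed.",
        "Still afraid of lithium side effects."]
edges = cooccurrence_edges(docs, dictionary)
print(edges.most_common(2))
```

The resulting weighted pairs can be loaded into any graph library for the social network analysis and visualization step.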

Findings

Mental health information on drug side effects was detected frequently in both datasets. In the affective category, the most frequent entities were “depressed” in the social media data and “severe” in the PMC data. Entities for social and personal concerns related to friends, family, self-attitude and economy recurred in the Reddit data. Among relationships between biomedical and psychological processes, “afraid” and “Lithium” were frequently linked in the social media data, and “schizophrenia” and “suicidal” in the PMC data.

Originality/value

Mental health information is increasingly sought after, and BD is a mental illness with complicated factors in its clinical picture. This paper makes an original contribution to understanding the biological, psychological and social factors of BD. Importantly, the results highlight the benefit of mental health informatics analyses that span the laboratory and social media domains.

Details

Aslib Journal of Information Management, vol. 75 no. 3
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 3 February 2023

Huyen Nguyen, Haihua Chen, Jiangping Chen, Kate Kargozari and Junhua Ding

Abstract

Purpose

This study aims to evaluate a method of building a biomedical knowledge graph (KG).

Design/methodology/approach

This research first constructs a COVID-19 KG on the COVID-19 Open Research Data Set, covering information over six categories (i.e. disease, drug, gene, species, therapy and symptom). The construction used open-source tools to extract entities, relations and triples. Then, the COVID-19 KG is evaluated on three data-quality dimensions: correctness, relatedness and comprehensiveness, using a semiautomatic approach. Finally, this study assesses the application of the KG by building a question answering (Q&A) system. Five queries regarding COVID-19 genomes, symptoms, transmissions and therapeutics were submitted to the system and the results were analyzed.

Findings

With current extraction tools, the quality of the KG is moderate and difficult to improve unless more effort goes into improving the tools for entity extraction, relation extraction and related tasks. This study finds that comprehensiveness and relatedness positively correlate with data size. Furthermore, the results indicate that Q&A systems built on larger KGs perform better than those built on smaller ones for most queries, underscoring the importance of relatedness and comprehensiveness for the usefulness of the KG.

Originality/value

The KG construction process, data-quality-based and application-based evaluations discussed in this paper provide valuable references for KG researchers and practitioners to build high-quality domain-specific knowledge discovery systems.

Details

Information Discovery and Delivery, vol. 51 no. 4
Type: Research Article
ISSN: 2398-6247

Article
Publication date: 29 May 2023

Xiang Zheng, Mingjie Li, Ze Wan and Yan Zhang

Abstract

Purpose

This study aims to extract knowledge from bibliographic summaries of ancient Chinese scientific and technological documents (STDBS) and to build a comprehensive, systematic knowledge graph (KG). By presenting the relationships among content, discipline and author, it aims to support knowledge discovery services for ancient Chinese scientific and technological documents.

Design/methodology/approach

This study compiles ancient Chinese STDBS and designs a knowledge mining and graph visualization framework. The authors define the summaries' entities, attributes, and relationships for knowledge representation, use deep learning techniques such as BERT-BiLSTM-CRF models and rules for knowledge extraction, unify the representation of entities for knowledge fusion, and use Neo4j and other visualization techniques for KG construction and application. This study presents the generation, distribution, and evolution of ancient Chinese agricultural scientific and technological knowledge in visualization graphs.
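The knowledge fusion step, unifying different surface forms of the same entity before building the graph, can be sketched with an alias table. This is only a minimal illustration, assuming a hand-built mapping from variant names to one canonical name. The authors' actual fusion rules are not described in the abstract, and the triples below are toy data. In a Neo4j pipeline, fused triples would then typically be written with the Cypher `MERGE` clause so that repeated loads do not create duplicate nodes.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def fuse_triples(triples: List[Triple], aliases: Dict[str, str]) -> Set[Triple]:
    """Map entity surface forms to one canonical name and drop duplicate triples."""
    def canonical(name: str) -> str:
        key = name.strip()
        return aliases.get(key, key)  # names without an alias are kept unchanged

    return {(canonical(h), rel, canonical(t)) for h, rel, t in triples}


# Hypothetical alias table and toy triples for illustration only.
aliases = {"Essential Techniques for the Welfare of the People": "Qimin Yaoshu",
           "Jia Sixie (author)": "Jia Sixie"}
triples = [("Qimin Yaoshu", "written_by", "Jia Sixie"),
           ("Essential Techniques for the Welfare of the People", "written_by",
            "Jia Sixie (author)"),
           ("Qimin Yaoshu", "has_discipline", "C1-Planting and cultivation")]
for triple in sorted(fuse_triples(triples, aliases)):
    print(triple)
```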

Findings

The knowledge mining and graph visualization framework is feasible and effective, and the BERT-BiLSTM-CRF model shows domain adaptability and accuracy. Knowledge generation in ancient Chinese agricultural scientific and technological documents has distinctive temporal features. The knowledge distribution is uneven, concentrated mainly in C1-Planting and cultivation, C2-Silkworm, and C3-Mulberry and water conservancy. Knowledge evolution is apparent, with differentiation and integration coexisting.

Originality/value

This study is the first to visually present the knowledge connotation and associations of ancient Chinese STDBS. It addresses the lack of in-depth knowledge mining and connotation visualization for ancient Chinese STDBS.

Details

Library Hi Tech, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 29 March 2024

Sihao Li, Jiali Wang and Zhao Xu

Abstract

Purpose

The compliance checking of Building Information Modeling (BIM) models is crucial throughout the lifecycle of construction. The increasing amount and complexity of information carried by BIM models have made compliance checking more challenging, and manual methods are prone to errors. Therefore, this study aims to propose an integrative conceptual framework for automated compliance checking of BIM models, allowing for the identification of errors within BIM models.

Design/methodology/approach

This study first analyzed typical building standards in architecture and fire protection and developed an ontology of their elements. On this basis, a building standard corpus was built, and deep learning models were trained to label building standard texts automatically. Neo4j was used for knowledge graph construction and storage, and a Dynamo-based data extraction method was designed to obtain checking data files. A matching algorithm was then devised to express the logical rules encoded in knowledge graph triples, enabling automated compliance checking of BIM models.
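The final matching step, comparing rule triples against extracted model data, can be sketched as follows. The abstract does not describe the rule format or the Dynamo export, so this sketch makes two assumptions. Each rule is a hypothetical `(element category, property, (operator, limit))` triple. Each BIM element is a dict of exported properties. The door-width and fire-rating limits are invented and are not taken from any building code.

```python
import operator
from typing import Any, Dict, List, Tuple

# Hypothetical rule triple: (element category, property name, (comparison, limit))
Rule = Tuple[str, str, Tuple[str, float]]

OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
       "<": operator.lt, "==": operator.eq}


def check_compliance(elements: List[Dict[str, Any]], rules: List[Rule]) -> List[str]:
    """Return a human-readable message for every element that breaks a rule."""
    violations = []
    for category, prop, (op, limit) in rules:
        for el in elements:
            if el.get("category") != category:
                continue  # rule does not apply to this element type
            value = el.get(prop)
            if value is None:
                violations.append(f"{el['id']}: missing {prop}")
            elif not OPS[op](value, limit):
                violations.append(f"{el['id']}: {prop}={value} violates {op} {limit}")
    return violations


rules = [("Door", "clear_width_mm", (">=", 900)),
         ("Wall", "fire_rating_h", (">=", 1))]
elements = [{"id": "D1", "category": "Door", "clear_width_mm": 1000},
            {"id": "D2", "category": "Door", "clear_width_mm": 800},
            {"id": "D3", "category": "Door"},
            {"id": "W1", "category": "Wall", "fire_rating_h": 2}]
for message in check_compliance(elements, rules):
    print(message)
```

Reporting missing properties separately from failed comparisons reflects a design choice: incomplete model data is itself a checking finding rather than a silent pass.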

Findings

Case validation results showed that this theoretical framework can achieve the automatic construction of domain knowledge graphs and automatic checking of BIM model compliance. Compared with traditional methods, this method has a higher degree of automation and portability.

Originality/value

This study introduces knowledge graphs and natural language processing technology into BIM model checking and automates the process of constructing domain knowledge graphs and checking BIM model data. Its functionality and usability were validated through two case studies on a self-developed BIM checking platform.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0969-9988

Article
Publication date: 13 December 2022

Chengxi Yan, Xuemei Tang, Hao Yang and Jun Wang

Abstract

Purpose

Most existing studies of named entity recognition (NER) concentrate on improving the predictions of deep neural network (DNN)-based models themselves, but the scarcity of training corpora and the difficulty of controlling annotation quality remain unresolved, especially for ancient Chinese corpora. Therefore, a new integrated solution for Chinese historical NER, combining automatic entity extraction and human-machine cooperative annotation, is valuable for improving the effectiveness of Chinese historical NER and fostering the development of low-resource information extraction.

Design/methodology/approach

The research provides a systematic, three-stage framework for Chinese historical NER. After a basic preprocessing stage, the authors create, retrain and produce a high-performance NER model from limited labeled resources during the augmented deep active learning (ADAL) stage, which has three steps: DNN-based NER modeling, hybrid pool-based sampling (HPS) based on active learning (AL), and NER-oriented data augmentation (DA). ADAL is expected to keep DNN performance as high as possible under the few-shot constraint. Then, to enable machine-aided quality control in crowdsourcing settings, the authors design a globally optimized automatic label consolidation (GALC) stage. The core of GALC is a newly designed label consolidation model, simulated annealing-based automatic label aggregation (“SA-ALC”), which incorporates worker reliability and global label estimation. The model helps ensure the quality of annotations collected from a crowdsourcing annotation system.
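The idea of weighting crowd labels by worker reliability can be illustrated with a simpler, swapped-in technique: iterative reliability-weighted majority voting. This is explicitly not SA-ALC. It uses no simulated annealing or global optimization. It only alternates between voting and re-estimating each worker's reliability as their smoothed agreement with the current consensus. Worker IDs, items and labels are invented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# annotations[item_id] = [(worker_id, label), ...] collected via crowdsourcing
Annotations = Dict[str, List[Tuple[str, str]]]


def aggregate_labels(annotations: Annotations,
                     n_iter: int = 10) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Alternate reliability-weighted voting with re-estimating worker reliability."""
    workers = {w for votes in annotations.values() for w, _ in votes}
    reliability = {w: 1.0 for w in workers}  # start with equal trust in every worker
    consensus: Dict[str, str] = {}
    for _ in range(n_iter):
        # Step 1: weighted vote per item using the current reliability estimates
        for item, votes in annotations.items():
            tally: Dict[str, float] = defaultdict(float)
            for w, label in votes:
                tally[label] += reliability[w]
            # iterate labels in sorted order so ties break deterministically
            consensus[item] = max(sorted(tally), key=lambda lab: tally[lab])
        # Step 2: reliability = Laplace-smoothed share of labels matching consensus
        agree: Dict[str, int] = defaultdict(int)
        total: Dict[str, int] = defaultdict(int)
        for item, votes in annotations.items():
            for w, label in votes:
                total[w] += 1
                agree[w] += int(label == consensus[item])
        reliability = {w: (agree[w] + 1) / (total[w] + 2) for w in workers}
    return consensus, reliability


annotations = {
    "s1": [("w1", "PER"), ("w2", "PER"), ("w3", "LOC")],
    "s2": [("w1", "LOC"), ("w2", "LOC"), ("w3", "PER")],
    "s3": [("w1", "ORG"), ("w3", "PER")],
}
consensus, reliability = aggregate_labels(annotations)
print(consensus)
print({w: round(r, 2) for w, r in sorted(reliability.items())})
```

After the first round, worker `w3` disagrees with the consensus on every item, so its votes carry little weight. The two-way split on `s3` is then resolved in favour of the more reliable `w1`.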

Findings

Extensive experiments on two types of Chinese classical historical datasets show that the authors’ solution can effectively reduce the corpus dependency of a DNN-based NER model and alleviate the problem of label quality. Moreover, the results also show the superior performance of the authors’ pipeline approaches (i.e. HPS + DA and SA-ALC) compared to equivalent baselines in each stage.

Originality/value

The study sheds new light on the automatic extraction of Chinese historical entities through an integrated, end-to-end technical process. The solution helps reduce annotation cost and control labeling quality for the NER task. It can be further applied, both theoretically and practically, to similar information extraction tasks and other low-resource fields.

Details

Aslib Journal of Information Management, vol. 75 no. 3
Type: Research Article
ISSN: 2050-3806
