Search results

1 – 10 of 18
Article
Publication date: 5 May 2023

Ying Yu and Jing Ma

Abstract

Purpose

The tender documents, an essential data source for internet-based logistics tendering platforms, incorporate massive fine-grained data, ranging from information on the tenderee to shipping locations and shipping items. Automated information extraction in this area is, however, under-researched, which makes extraction a time-consuming and laborious process. For Chinese logistics tender entities in particular, existing named entity recognition (NER) solutions are mostly unsuitable, as these entities involve domain-specific terminology and possess different semantic features.

Design/methodology/approach

To tackle this problem, this paper proposes a novel lattice long short-term memory (LSTM) model, combining a variant contextual feature representation and a conditional random field (CRF) layer, for identifying valuable entities in logistic tender documents. Instead of traditional word embeddings, the proposed model uses the pretrained Bidirectional Encoder Representations from Transformers (BERT) model as input to augment the contextual feature representation. Subsequently, the Lattice-LSTM model makes effective use of both character and word information to avoid segmentation errors.
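
As a rough illustration of how a lattice model can use character and word information together, the sketch below matches a lexicon against a character sequence and records every word span it finds. The function name `build_lattice`, the toy lexicon and the example sentence are assumptions made for illustration; they are not taken from the paper, and the LSTM that consumes the lattice is not shown.

```python
def build_lattice(sentence, lexicon, max_word_len=4):
    """Return (start, end, word) spans for every lexicon word found in the
    sentence. `end` is exclusive; single characters are not listed because
    they are always part of the character sequence itself."""
    spans = []
    n = len(sentence)
    for start in range(n):
        # Try every candidate word of length 2..max_word_len starting here.
        for end in range(start + 2, min(n, start + max_word_len) + 1):
            word = sentence[start:end]
            if word in lexicon:
                spans.append((start, end, word))
    return spans


# Hypothetical toy lexicon and input, for illustration only.
lexicon = {"物流", "招标", "招标文件", "文件"}
lattice = build_lattice("物流招标文件", lexicon)
for span in lattice:
    print(span)
```

Because overlapping candidates such as "招标" and "招标文件" are both kept, the downstream model does not have to commit to a single segmentation before tagging.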

Findings

The proposed model is verified on the Chinese logistic tender named entity corpus. The results suggest that it outperforms other mainstream NER models on this corpus. The proposed model underpins the automatic extraction of logistics tender information, enabling logistic companies to perceive ever-changing market trends and make far-sighted logistic decisions.

Originality/value

(1) A practical model for logistic tender NER is proposed in the manuscript. By employing and fine-tuning BERT into the downstream task with a small amount of data, the experiment results show that the model has a better performance than other existing models. This is the first study, to the best of the authors' knowledge, to extract named entities from Chinese logistic tender documents. (2) A real logistic tender corpus for practical use is constructed and a program of the model for online-processing real logistic tender documents is developed in this work. The authors believe that the model will facilitate logistic companies in converting unstructured documents to structured data and further perceive the ever-changing market trends to make far-sighted logistic decisions.

Details

Data Technologies and Applications, vol. 58 no. 1
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 28 December 2023

Na Xu, Yanxiang Liang, Chaoran Guo, Bo Meng, Xueqing Zhou, Yuting Hu and Bo Zhang

Abstract

Purpose

Safety management plays an important part in coal mine construction. Due to complex data, the implementation of the construction safety knowledge scattered in standards poses a challenge. This paper aims to develop a knowledge extraction model to automatically and efficiently extract domain knowledge from unstructured texts.

Design/methodology/approach

A bidirectional encoder representations from transformers (BERT)-bidirectional long short-term memory (BiLSTM)-conditional random field (CRF) method based on a pre-trained language model was applied to knowledge entity recognition in the field of coal mine construction safety. First, 80 safety standards for coal mine construction were collected, sorted and annotated as a descriptive corpus. Then, the BERT pre-trained language model was used to obtain dynamic word vectors. Finally, the BiLSTM-CRF model inferred the optimal tag sequence for the entities.
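
The final step, in which a CRF layer selects the optimal tag sequence, is commonly implemented with Viterbi decoding. The following is a minimal sketch of that decoding step only, assuming per-token emission scores (which a BiLSTM would supply) and pairwise transition scores; the tag set and all numbers are invented for illustration and do not come from the paper.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag path.

    emissions:   list of {tag: score}, one dict per token
    transitions: {(prev_tag, cur_tag): score}; missing pairs score 0.0
    """
    # best[t][tag] = best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t - 1][tag] = best previous tag for `tag` at position t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Illustrative scores: the O -> I transition is heavily penalised.
tags = ["O", "B", "I"]
transitions = {("O", "I"): -10.0, ("B", "I"): 1.0, ("I", "I"): 0.5}
emissions = [
    {"O": 0.1, "B": 2.0, "I": 0.5},
    {"O": 0.2, "B": 0.1, "I": 1.5},
    {"O": 1.0, "B": 0.3, "I": 1.2},
]
decoded = viterbi(emissions, transitions, tags)
print(decoded)  # ['B', 'I', 'I']
```

The transition scores are what let a CRF layer rule out sequences that are locally plausible but structurally invalid, such as an I tag directly after O.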

Findings

Accordingly, 11,933 entities and 2,051 relationships were identified in the standard specification texts, and a language model suitable for coal mine construction safety management was proposed. The experiments showed that F1 values were above 60% for all nine entity types, such as security management, and that the F1 value of the model for entity extraction was also more than 60%. The model identified and extracted entities more accurately than conventional methods.

Originality/value

This work completed the domain knowledge query and built a Q&A platform via entities and relationships identified by the standard specifications suitable for coal mines. This paper proposed a systematic framework for texts in coal mine construction safety to improve efficiency and accuracy of domain-specific entity extraction. In addition, the pretraining language model was also introduced into the coal mine construction safety to realize dynamic entity recognition, which provides technical support and theoretical reference for the optimization of safety management platforms.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0969-9988

Article
Publication date: 14 November 2023

Shaodan Sun, Jun Deng and Xugong Qin

Abstract

Purpose

This paper aims to amplify the retrieval and utilization of historical newspapers through the application of semantic organization, all from the vantage point of a fine-grained knowledge element perspective. This endeavor seeks to unlock the latent value embedded within newspaper contents while simultaneously furnishing invaluable guidance within methodological paradigms for research in the humanities domain.

Design/methodology/approach

According to the semantic organization process and knowledge element concept, this study proposes a holistic framework, including four pivotal stages: knowledge element description, extraction, association and application. Initially, a semantic description model dedicated to knowledge elements is devised. Subsequently, harnessing the advanced deep learning techniques, the study delves into the realm of entity recognition and relationship extraction. These techniques are instrumental in identifying entities within the historical newspaper contents and capturing the interdependencies that exist among them. Finally, an online platform based on Flask is developed to enable the recognition of entities and relationships within historical newspapers.

Findings

This article used the Shengjing Times·Changchun Compilation as the dataset for describing, extracting, associating and applying newspaper contents. Regarding knowledge element extraction, the BERT + BS model consistently outperforms Bi-LSTM, CRF++ and even BERT in terms of Recall and F1 scores, making it a favorable choice for entity recognition in this context. Particularly noteworthy is the Bi-LSTM-Pro model, which achieves the highest scores across all metrics, notably an exceptional F1 score in knowledge element relationship recognition.

Originality/value

Historical newspapers transcend their status as mere artifacts, evolving into invaluable reservoirs safeguarding the societal and historical memory. Through semantic organization from a fine-grained knowledge element perspective, it can facilitate semantic retrieval, semantic association, information visualization and knowledge discovery services for historical newspapers. In practice, it can empower researchers to unearth profound insights within the historical and cultural context, broadening the landscape of digital humanities research and practical applications.

Details

Aslib Journal of Information Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 19 January 2024

Meng Zhu and Xiaolong Xu

Abstract

Purpose

Intent detection (ID) and slot filling (SF) are two important tasks in natural language understanding. ID is to identify the main intent of a paragraph of text. The goal of SF is to extract the information that is important to the intent from the input sentence. However, most of the existing methods use sentence-level intention recognition, which has the risk of error propagation, and the relationship between intention recognition and SF is not explicitly modeled. Aiming at this problem, this paper proposes a collaborative model of ID and SF for intelligent spoken language understanding called ID-SF-Fusion.

Design/methodology/approach

ID-SF-Fusion uses Bidirectional Encoder Representation from Transformers (BERT) and Bidirectional Long Short-Term Memory (BiLSTM) to extract effective word embedding and context vectors containing the whole sentence information respectively. Fusion layer is used to provide intent–slot fusion information for SF task. In this way, the relationship between ID and SF task is fully explicitly modeled. This layer takes the result of ID and slot context vectors as input to obtain the fusion information which contains both ID result and slot information. Meanwhile, to further reduce error propagation, we use word-level ID for the ID-SF-Fusion model. Finally, two tasks of ID and SF are realized by joint optimization training.

Findings

We conducted experiments on two public datasets, Airline Travel Information Systems (ATIS) and Snips. The results show that the Intent ACC score and Slot F1 score of ID-SF-Fusion on ATIS are 98.0 per cent and 95.8 per cent, respectively, and the two indicators on the Snips dataset are 98.6 per cent and 96.7 per cent, respectively. These results are superior to those of slot-gated, SF-ID Network, stack-Prop and other models. In addition, ablation experiments were performed to further analyze and discuss the proposed model.
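
For readers unfamiliar with the two indicators, the sketch below shows one common way to compute them: intent accuracy as the share of correctly classified utterances, and slot F1 over exactly matching BIO spans. The abstract does not say which scoring script was used, so span-level exact matching is an assumption here, and the labels in the example are invented.

```python
def bio_spans(tags):
    """Extract (start, end, label) spans from a BIO sequence (end exclusive).
    An I- tag that does not continue a span of the same label is ignored."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside:
            if label is not None:
                spans.add((start, i, label))
            start, label = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return spans


def slot_f1(gold_seqs, pred_seqs):
    """Micro-averaged F1 over exactly matching slot spans."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = bio_spans(gold), bio_spans(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


def intent_accuracy(gold, pred):
    """Fraction of utterances whose predicted intent equals the gold intent."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Invented example: the second utterance misses its only slot.
gold_slots = [["O", "B-city", "I-city", "O"], ["B-date", "O"]]
pred_slots = [["O", "B-city", "I-city", "O"], ["O", "O"]]
f1 = slot_f1(gold_slots, pred_slots)          # precision 1.0, recall 0.5
acc = intent_accuracy(["flight", "fare"], ["flight", "flight"])
print(round(f1, 3), acc)  # 0.667 0.5
```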

Originality/value

This paper uses word-level intent recognition and introduces intent information into the SF process, which yields a significant improvement on both data sets.

Details

Data Technologies and Applications, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 10 February 2023

Huiyong Wang, Ding Yang, Liang Guo and Xiaoming Zhang

Abstract

Purpose

Intent detection and slot filling are two important tasks in question comprehension of a question answering system. This study aims to build a joint task model with some generalization ability and benchmark its performance over other neural network models mentioned in this paper.

Design/methodology/approach

This study used a deep-learning-based approach for the joint modeling of question intent detection and slot filling. Meanwhile, the internal cell structure of the long short-term memory (LSTM) network was improved. Furthermore, the dataset Computer Science Literature Question (CSLQ) was constructed based on the Science and Technology Knowledge Graph. The datasets Airline Travel Information Systems, Snips (a natural language processing dataset of the consumer intent engine collected by Snips) and CSLQ were used for the empirical analysis. The accuracy of intent detection and F1 score of slot filling, as well as the semantic accuracy of sentences, were compared for several models.
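
The third measure, the semantic accuracy of sentences, is usually the strictest of the three. A minimal sketch follows, assuming the common definition in which a sentence counts as correct only when its intent and every slot tag are predicted correctly; the abstract does not spell out the definition, and the data below are invented.

```python
def semantic_accuracy(gold_intents, pred_intents, gold_slots, pred_slots):
    """Share of sentences with a correct intent AND a fully correct slot sequence."""
    correct = sum(
        gi == pi and gs == ps
        for gi, pi, gs, ps in zip(gold_intents, pred_intents, gold_slots, pred_slots)
    )
    return correct / len(gold_intents)


# Invented example: sentence 2 has a wrong intent, sentence 3 a wrong slot tag.
gold_intents = ["find_paper", "find_author", "find_paper"]
pred_intents = ["find_paper", "find_paper", "find_paper"]
gold_slots = [["B-topic", "O"], ["B-name", "I-name"], ["O", "B-year"]]
pred_slots = [["B-topic", "O"], ["B-name", "I-name"], ["O", "O"]]
sentence_acc = semantic_accuracy(gold_intents, pred_intents, gold_slots, pred_slots)
print(sentence_acc)
```

Under this definition a model can score well on intent accuracy and slot F1 separately and still score lower here, because a single error anywhere in a sentence makes the whole sentence count as wrong.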

Findings

The results showed that the proposed model outperformed all other benchmark methods, especially for the CSLQ dataset. This proves that the design of this study improved the comprehensive performance and generalization ability of the model to some extent.

Originality/value

This study contributes to the understanding of question sentences in a specific domain. LSTM was improved, and a computer literature domain dataset was constructed herein. This will lay the data and model foundation for the future construction of a computer literature question answering system.

Details

Data Technologies and Applications, vol. 57 no. 5
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 29 May 2023

Xiang Zheng, Mingjie Li, Ze Wan and Yan Zhang

Abstract

Purpose

This study aims to extract knowledge of ancient Chinese scientific and technological documents bibliographic summaries (STDBS) and provide the knowledge graph (KG) comprehensively and systematically. By presenting the relationship among content, discipline, and author, this study focuses on providing services for knowledge discovery of ancient Chinese scientific and technological documents.

Design/methodology/approach

This study compiles ancient Chinese STDBS and designs a knowledge mining and graph visualization framework. The authors define the summaries' entities, attributes, and relationships for knowledge representation, use deep learning techniques such as BERT-BiLSTM-CRF models and rules for knowledge extraction, unify the representation of entities for knowledge fusion, and use Neo4j and other visualization techniques for KG construction and application. This study presents the generation, distribution, and evolution of ancient Chinese agricultural scientific and technological knowledge in visualization graphs.
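
To make the step from extracted triples to a Neo4j graph concrete, the sketch below turns a (head, relation, tail) triple into a parameterised Cypher `MERGE` statement. It only builds the query text and parameters and does not connect to a database. The `Entity` label, the relation name and the helper `triple_to_cypher` are illustrative assumptions, not the schema used in the study.

```python
import re


def triple_to_cypher(head, relation, tail):
    """Build a Cypher statement and parameters that upsert one triple.

    Relationship types cannot be passed as Cypher parameters, so the
    relation name is validated and then interpolated into the query text.
    """
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", relation):
        raise ValueError(f"unsafe relationship type: {relation!r}")
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{relation}]->(t)"
    )
    return query, {"head": head, "tail": tail}


# Illustrative triple; the relation name is an assumption.
query, params = triple_to_cypher("Qimin Yaoshu", "WRITTEN_BY", "Jia Sixie")
print(query)
print(params)
```

Using `MERGE` rather than `CREATE` means that repeated mentions of the same entity map to one node, which is one simple way to realise the unified entity representation needed for knowledge fusion.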

Findings

The knowledge mining and graph visualization framework is feasible and effective. The BERT-BiLSTM-CRF model has domain adaptability and accuracy. The knowledge generation of ancient Chinese agricultural scientific and technological documents has distinctive time features. The knowledge distribution is uneven, concentrated mainly on C1-Planting and cultivation, C2-Silkworm, and C3-Mulberry and water conservancy. The knowledge evolution is apparent, and differentiation and integration coexist.

Originality/value

This study is the first to visually present the knowledge connotation and association of ancient Chinese STDBS. It solves the problems of the lack of in-depth knowledge mining and connotation visualization of ancient Chinese STDBS.

Details

Library Hi Tech, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 22 February 2024

Yuzhuo Wang, Chengzhi Zhang, Min Song, Seongdeok Kim, Youngsoo Ko and Juhee Lee

Abstract

Purpose

In the era of artificial intelligence (AI), algorithms have gained unprecedented importance. Scientific studies have shown that algorithms are frequently mentioned in papers, making mention frequency a classical indicator of their popularity and influence. However, contemporary methods for evaluating influence tend to focus solely on individual algorithms, disregarding the collective impact resulting from the interconnectedness of these algorithms, which can provide a new way to reveal their roles and importance within algorithm clusters. This paper aims to build the co-occurrence network of algorithms in the natural language processing field based on the full-text content of academic papers and analyze the academic influence of algorithms in the group based on the features of the network.

Design/methodology/approach

We use deep learning models to extract algorithm entities from articles and construct the whole, cumulative and annual co-occurrence networks. We first analyze the characteristics of algorithm networks and then use various centrality metrics to obtain the score and ranking of group influence for each algorithm in the whole domain and each year. Finally, we analyze the influence evolution of different representative algorithms.
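
The network construction and one of the simplest centrality metrics can be sketched as follows, assuming that the algorithm entities have already been extracted for each paper. The paper lists and algorithm names are invented, and degree centrality stands in for the "various centrality metrics" mentioned above, which the abstract does not enumerate.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(papers):
    """Count how many papers mention each unordered pair of algorithms."""
    edges = Counter()
    for algorithms in papers:
        for a, b in combinations(sorted(set(algorithms)), 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges):
    """Fraction of the other nodes that each node is directly connected to."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        return {}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Invented per-paper algorithm mentions.
papers = [["LSTM", "CRF", "BERT"], ["LSTM", "CRF"], ["BERT", "SVM"]]
edges = cooccurrence_edges(papers)
centrality = degree_centrality(edges)
print(edges[("CRF", "LSTM")])  # 2
print(centrality["BERT"])      # 1.0
```

Restricting `papers` to a single publication year, or to all years up to a given one, would give the annual and cumulative networks respectively.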

Findings

The results indicate that algorithm networks also have the characteristics of complex networks, with tight connections between nodes developing over approximately four decades. Algorithms that are classic, high-performing and appear at the junctions of different eras can possess high popularity, control, central position and balanced influence in the network. As an algorithm gradually loses its sway within the group, it typically loses its core position first, followed by a dwindling association with other algorithms.

Originality/value

To the best of the authors’ knowledge, this paper is the first large-scale analysis of algorithm networks. The extensive temporal coverage, spanning over four decades of academic publications, ensures the depth and integrity of the network. Our results serve as a cornerstone for constructing multifaceted networks interlinking algorithms, scholars and tasks, facilitating future exploration of their scientific roles and semantic relations.

Details

Aslib Journal of Information Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 29 May 2023

Jinxiang Zeng, Shujin Cao, Yijin Chen, Pei Pan and Yafang Cai

Abstract

Purpose

This study analyzed the interdisciplinary characteristics of Chinese research studies in library and information science (LIS) measured by knowledge elements extracted through the Lexicon-LSTM model.

Design/methodology/approach

Eight research themes were selected for the experiment, and a large-scale (N = 11,625) dataset of research papers was constructed from the China National Knowledge Infrastructure (CNKI) database and complemented with multiple corpora. Knowledge elements were extracted through a Lexicon-LSTM model. A subject knowledge graph was constructed to support the searching and classification of knowledge elements. An interdisciplinary-weighted average citation index space was constructed for measuring the interdisciplinary characteristics and contributions based on knowledge elements.

Findings

The empirical research shows that the Lexicon-LSTM model has superiority in the accuracy of extracting knowledge elements. In the field of LIS, the interdisciplinary diversity indicator showed an upward trend from 2011 to 2021, while the disciplinary balance and difference indicators showed a downward trend. The knowledge elements of theory and methodology could be used to detect and measure the interdisciplinary characteristics and contributions.

Originality/value

The extraction of knowledge elements facilitates the discovery of semantic information embedded in academic papers. The knowledge elements proved feasible for measuring interdisciplinary characteristics and exploring their changes over time, which helps provide an overview of the state of the art and the future development trends of interdisciplinarity in LIS research themes.

Details

Aslib Journal of Information Management, vol. 75 no. 3
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 29 March 2024

Sihao Li, Jiali Wang and Zhao Xu

Abstract

Purpose

The compliance checking of Building Information Modeling (BIM) models is crucial throughout the lifecycle of construction. The increasing amount and complexity of information carried by BIM models have made compliance checking more challenging, and manual methods are prone to errors. Therefore, this study aims to propose an integrative conceptual framework for automated compliance checking of BIM models, allowing for the identification of errors within BIM models.

Design/methodology/approach

This study first analyzed typical building standards in the fields of architecture and fire protection and then developed an ontology of their elements. On this basis, a building standard corpus was built, and deep learning models were trained to automatically label the building standard texts. Neo4j was used for knowledge graph construction and storage, and a data extraction method based on Dynamo was designed to obtain checking data files. After that, a matching algorithm was devised to express the logical rules of knowledge graph triples, resulting in automated compliance checking for BIM models.
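
The matching step can be pictured as comparing rule triples from the knowledge graph against property values exported from the BIM model. The sketch below is a deliberately simplified stand-in for such a matching algorithm. The rule format, the element fields and the threshold value are all hypothetical and are not taken from the study or from any building code.

```python
import operator

# Comparison operators that a rule may use.
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}


def check_compliance(rules, elements):
    """Return one violation tuple per element property that breaks a rule.

    rules:    iterable of (element_type, property, operator, limit)
    elements: iterable of dicts with at least "id" and "type" keys
    """
    violations = []
    for element in elements:
        for subject, prop, op, limit in rules:
            # A rule applies only to matching element types that carry the property.
            if element["type"] != subject or prop not in element:
                continue
            if not OPS[op](element[prop], limit):
                violations.append((element["id"], prop, element[prop], f"{op} {limit}"))
    return violations


# Hypothetical rule and exported element data, for illustration only.
rules = [("FireDoor", "width_mm", ">=", 900)]
elements = [
    {"id": "D1", "type": "FireDoor", "width_mm": 1000},
    {"id": "D2", "type": "FireDoor", "width_mm": 800},
    {"id": "W1", "type": "Wall", "thickness_mm": 200},
]
violations = check_compliance(rules, elements)
print(violations)  # [('D2', 'width_mm', 800, '>= 900')]
```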

Findings

Case validation results showed that this theoretical framework can achieve the automatic construction of domain knowledge graphs and automatic checking of BIM model compliance. Compared with traditional methods, this method has a higher degree of automation and portability.

Originality/value

This study introduces knowledge graphs and natural language processing technology into the field of BIM model checking and completes the automated process of constructing domain knowledge graphs and checking BIM model data. Its functionality and usability are validated through two case studies on a self-developed BIM checking platform.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0969-9988

Article
Publication date: 21 August 2023

Zengxin Kang, Jing Cui and Zhongyi Chu

Abstract

Purpose

Accurate segmentation of manual assembly actions is the basis of autonomous industrial assembly robots. This paper aims to study a precise segmentation method for manual assembly actions.

Design/methodology/approach

In this paper, a temporal-spatial-contact features segmentation system (TSCFSS) for manual assembly action recognition and segmentation is proposed. The system consists of three stages: spatial feature extraction, contact force feature extraction and action segmentation in the temporal dimension. In the spatial feature extraction stage, a vectors assembly graph (VAG) is proposed to precisely describe the motion state of the objects and the relative position between objects in an RGB-D video frame. Graph networks are then used to extract the spatial features from the VAG. In the contact feature extraction stage, a sliding window is used to cut the contact force features between hands and tools/parts corresponding to the video frame. Finally, in the action segmentation stage, the spatial and contact features are concatenated as the input of temporal convolution networks for action recognition and segmentation. The experiments were conducted on a new manual assembly data set containing RGB-D video and contact force.
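
The sliding-window and concatenation steps can be illustrated in a few lines. In the sketch below, the window size, the stride, the one-window-per-frame alignment and all feature values are assumptions made for illustration; the graph network that produces the spatial features and the temporal convolution network that consumes the fused features are not shown.

```python
def sliding_windows(signal, size, stride=1):
    """Cut a 1-D signal into overlapping windows of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, stride)]


def fuse_features(spatial, contact):
    """Concatenate each frame's spatial features with its contact-force window."""
    return [s + c for s, c in zip(spatial, contact)]


# Invented contact-force samples and per-frame spatial feature vectors.
force = [0.0, 0.1, 0.4, 0.9, 0.3, 0.0]
windows = sliding_windows(force, size=3)
spatial = [[1.0, 2.0], [1.1, 2.1], [1.2, 2.2], [1.3, 2.3]]
fused = fuse_features(spatial, windows)
print(len(windows), fused[0])  # 4 [1.0, 2.0, 0.0, 0.1, 0.4]
```

Each fused vector would then form one time step of the input sequence passed to the temporal model.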

Findings

In the experiments, the TSCFSS is used to recognize 11 kinds of assembly actions in demonstrations and outperforms the other comparative action identification methods.

Originality/value

A novel system for the precise segmentation of manual assembly actions, which fuses temporal features, spatial features and contact force features, has been proposed. The VAG, a symbolic knowledge representation for describing the state of the assembly scene, is proposed, making action segmentation more convenient. A data set with RGB-D video and contact force is specifically tailored for researching manual assembly actions.

Details

Robotic Intelligence and Automation, vol. 43 no. 5
Type: Research Article
ISSN: 2754-6969
