Search results

1 – 10 of 57
Article
Publication date: 7 August 2017

Eun-Suk Yang, Jong Dae Kim, Chan-Young Park, Hye-Jeong Song and Yu-Seop Kim

Abstract

Purpose

This paper addresses the problem of a nonlinear model – specifically the hidden unit conditional random fields (HUCRFs) model, which has binary stochastic hidden units between the data and the labels – exhibiting unstable performance depending on the hyperparameters under consideration.

Design/methodology/approach

There are three main search methods for hyperparameter tuning: manual search, grid search and random search. This study shows that the HUCRF's performance is unstable across hyperparameter values and tunes those values by drawing on grid and random searches. All experiments used n-gram features – specifically, unigrams, bigrams and trigrams.
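The grid-versus-random comparison described here can be sketched in a few lines of Python; the objective function and value ranges below are hypothetical stand-ins for the HUCRF training runs, not the authors' actual setup.

```python
import itertools
import random

def evaluate(lr, l2):
    """Hypothetical stand-in for training a model and returning a dev-set score."""
    return -((lr - 0.03) ** 2 + (l2 - 0.001) ** 2)

def grid_search(lrs, l2s):
    # Exhaustively score every combination on the predefined grid.
    return max(itertools.product(lrs, l2s), key=lambda p: evaluate(*p))

def random_search(lr_range, l2_range, trials, seed=0):
    # Sample independent configurations from continuous ranges instead.
    rng = random.Random(seed)
    candidates = [(rng.uniform(*lr_range), rng.uniform(*l2_range))
                  for _ in range(trials)]
    return max(candidates, key=lambda p: evaluate(*p))

best_grid = grid_search([0.01, 0.03, 0.1], [0.0001, 0.001, 0.01])
best_rand = random_search((0.01, 0.1), (0.0001, 0.01), trials=9)
```

Random search can match grid search's quality with the same trial budget because it does not waste trials on unimportant dimensions, which is the cost advantage the Findings report.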

Findings

Naturally, selecting a list of hyperparameter values based on a researcher's experience, to find the set that yields the best performance, is better than drawing values from a probability distribution. Realistically, however, it is impossible to evaluate every combination of parameter values. The present research indicates that the random search method performs better than the grid search method while requiring less computation time and cost.

Originality/value

This paper examines the issues affecting the performance of the HUCRF, a nonlinear model whose performance varies with its hyperparameters but which outperforms the CRF.

Details

Engineering Computations, vol. 34 no. 6
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 31 August 2022

Si Shen, Chuan Jiang, Haotian Hu, Youshu Ji and Dongbo Wang

Abstract

Purpose

Reorganising unstructured academic abstracts according to a certain logical structure can help scholars not only extract valid information quickly but also facilitate the faceted search of academic literature. This study aims to build a high-performance model for identifying the functional structures of unstructured abstracts in the social sciences.

Design/methodology/approach

This study first investigated the structuring of abstracts in academic articles in the field of social sciences, using large-scale statistical analyses. Then, the functional structures of sentences in a corpus of more than 3.5 million abstracts were identified via sentence classification and sequence tagging, using several models based on either machine learning or deep learning approaches, and the results were compared.

Findings

The results demonstrate that the functional structures of sentences in abstracts in social science manuscripts include the background, purpose, methods, results and conclusions. The experimental results show that the bidirectional encoder representations from transformers (BERT) model exhibited the best performance, with an overall F1 score of 86.23%.
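The reported F1 score combines precision and recall in the standard way; the per-label counts in this sketch are invented for illustration only.

```python
def f1_score(tp, fp, fn):
    # F1 is the harmonic mean of precision and recall.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for one functional-structure label, e.g. "methods":
# 90 sentences correctly labelled, 10 false positives, 20 misses.
score = f1_score(tp=90, fp=10, fn=20)
```

With these made-up counts, precision is 0.90 and recall about 0.82, giving an F1 of roughly 0.857, comparable in scale to the figure the Findings cite.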

Originality/value

A data set of annotated social science abstracts was generated, and corresponding models were trained on it; both are available on GitHub (https://github.com/Academic-Abstract-Knowledge-Mining/SSCI_Abstract_Structures_Identification). Based on the optimised model, a Web application for identifying the functional structures of abstracts and for their faceted search in the social sciences was constructed to enable rapid and convenient reading, organisation and fine-grained retrieval of academic abstracts.

Article
Publication date: 5 May 2023

Ying Yu and Jing Ma

Abstract

Purpose

Tender documents, an essential data source for internet-based logistics tendering platforms, incorporate massive fine-grained data, ranging from information on the tenderee to shipping locations and shipping items. Automated information extraction in this area is, however, under-researched, making the extraction process time- and effort-consuming. For Chinese logistics tender entities in particular, existing named entity recognition (NER) solutions are mostly unsuitable, as these entities involve domain-specific terminologies and possess different semantic features.

Design/methodology/approach

To tackle this problem, a novel lattice long short-term memory (LSTM) model, combining a variant contextual feature representation and a conditional random field (CRF) layer, is proposed in this paper for identifying valuable entities from logistic tender documents. Instead of traditional word embedding, the proposed model uses the pretrained Bidirectional Encoder Representations from Transformers (BERT) model as input to augment the contextual feature representation. Subsequently, with the Lattice-LSTM model, the information of characters and words is effectively utilized to avoid error segmentation.
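The lattice construction that lets the model use word information without committing to one segmentation can be illustrated in plain Python: every lexicon word that spans a run of characters becomes an extra path through the sentence. The sentence and lexicon below are standard illustrative examples, not the authors' corpus.

```python
def lattice_words(chars, lexicon):
    """For each character span, collect the lexicon words that match it.

    Each match (start, end, word) is an extra path a lattice LSTM can
    follow alongside the character-by-character path, so no single
    (possibly wrong) segmentation is ever committed to.
    """
    matches = []
    for i in range(len(chars)):
        for j in range(i + 1, len(chars) + 1):
            word = "".join(chars[i:j])
            if word in lexicon:
                matches.append((i, j, word))
    return matches

# Classic ambiguous example: "Nanjing City" vs "mayor of Nanjing".
chars = list("南京市长江大桥")
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
matches = lattice_words(chars, lexicon)
```

Note how both "南京市" and "市长" survive as candidate paths; a hard segmenter would have had to pick one, which is the segmentation error the lattice approach avoids.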

Findings

The proposed model is then verified on the Chinese logistic tender named entity corpus. The results suggest that the proposed model outperforms other mainstream NER models on the logistics tender corpus. The proposed model underpins the automatic extraction of logistics tender information, enabling logistic companies to perceive ever-changing market trends and make far-sighted logistic decisions.

Originality/value

(1) A practical model for logistic tender NER is proposed in the manuscript. By employing and fine-tuning BERT on the downstream task with a small amount of data, the experimental results show that the model performs better than other existing models. This is the first study, to the best of the authors' knowledge, to extract named entities from Chinese logistic tender documents. (2) A real logistic tender corpus for practical use is constructed, and a program for online processing of real logistic tender documents is developed in this work. The authors believe that the model will facilitate logistic companies in converting unstructured documents to structured data and in perceiving ever-changing market trends to make far-sighted logistic decisions.

Details

Data Technologies and Applications, vol. 58 no. 1
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 28 December 2023

Na Xu, Yanxiang Liang, Chaoran Guo, Bo Meng, Xueqing Zhou, Yuting Hu and Bo Zhang

Abstract

Purpose

Safety management plays an important part in coal mine construction. Due to complex data, the implementation of the construction safety knowledge scattered in standards poses a challenge. This paper aims to develop a knowledge extraction model to automatically and efficiently extract domain knowledge from unstructured texts.

Design/methodology/approach

A bidirectional encoder representations from transformers (BERT)-bidirectional long short-term memory (BiLSTM)-conditional random field (CRF) method based on a pre-training language model was applied to carry out knowledge entity recognition in the field of coal mine construction safety in this paper. First, 80 safety standards for coal mine construction were collected, sorted and marked up as a descriptive corpus. Then, the BERT pre-training language model was used to obtain dynamic word vectors. Finally, the BiLSTM-CRF model derived the entity's optimal tag sequence.
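The CRF layer's job of producing the optimal tag sequence comes down to Viterbi decoding over emission scores (from the BiLSTM) and learned tag-transition scores. The sketch below is a minimal pure-Python version; all score values are invented for illustration.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence for one sentence.

    emissions[i][t]   -- score of tag t at position i (e.g. from the BiLSTM)
    transitions[a][b] -- learned score for moving from tag a to tag b
    """
    # best[t] = (score of the best path ending in tag t, that path)
    best = {t: (emissions[0][t], [t]) for t in tags}
    for scores in emissions[1:]:
        step = {}
        for t in tags:
            # Pick the predecessor tag that maximises path score + transition.
            prev, (s, path) = max(
                ((a, best[a]) for a in tags),
                key=lambda item: item[1][0] + transitions[item[0]][t])
            step[t] = (s + transitions[prev][t] + scores[t], path + [t])
        best = step
    return max(best.values(), key=lambda sp: sp[0])[1]

# Toy scores for a three-token sentence over BIO tags (all values invented).
transitions = {"B": {"B": -1.0, "I": 1.0, "O": 0.0},
               "I": {"B": -1.0, "I": 0.5, "O": 0.0},
               "O": {"B": 0.4, "I": -2.0, "O": 0.5}}
emissions = [{"B": 2.0, "I": 0.0, "O": 0.5},
             {"B": 0.0, "I": 1.5, "O": 1.0},
             {"B": 0.0, "I": 0.0, "O": 2.0}]
```

The transition scores are what let the CRF rule out invalid sequences such as an I tag directly after O, which per-token classification alone cannot guarantee.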

Findings

Accordingly, 11,933 entities and 2,051 relationships were identified in the standard specification texts, and a language model suitable for coal mine construction safety management was proposed. The experiments showed that F1 values were above 60% for all nine entity types, such as security management, and the model identified and extracted entities more accurately than conventional methods.

Originality/value

This work completed the domain knowledge query and built a Q&A platform via the entities and relationships identified in the standard specifications suitable for coal mines. This paper proposed a systematic framework for texts in coal mine construction safety to improve the efficiency and accuracy of domain-specific entity extraction. In addition, the pre-training language model was introduced into coal mine construction safety to realize dynamic entity recognition, which provides technical support and a theoretical reference for the optimization of safety management platforms.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0969-9988

Article
Publication date: 3 June 2020

Christopher Enyioma Alozie

Abstract

Purpose

This paper assessed the accuracy of accounting for government funds in Nigeria's federal treasury and their faithful presentation in government financial reporting. It aimed to determine whether the reported annual balances in Nigeria's financial reporting were reliable. The data used in the analysis were obtained from secondary sources in the federal treasury.

Design/methodology/approach

An ex-post facto analysis method was adopted in the study, involving the use of statistical techniques of absolute or aggregate mean percentage error derived from differences between recomputed and published fund balances. This was augmented with interactive review meetings on the initial case research report with the management of Nigeria's audit agency.

Findings

Results distilled from the consolidated revenue fund (CRF), development fund and public debt show that recomputed values were greater than the fund balances in the gazetted financial statements. Results for the contingency fund (CTF), federation account fund (FAF), special trust fund (STF) and sundry deposit fund yielded equal and accurate figures. The paper concludes that there were serial understatements of the core public fund balances in the financial statements over the years. This trend of incorrect reporting in three core public funds rendered Nigeria's financial position unreliable for decisions in the affected years. It also facilitated fraud, mismanagement of funds and corrupt practices.

Research limitations/implications

The scope of the research is restricted to assessing the degree of accuracy in fund accounting, the faithful representation of the respective fund balances on the liabilities side of the FGN balance sheet and the reliability of the financial position. However, it did not consider or cover the implementation of International Public Sector Accounting Standards (IPSASs) in the federal treasury, since the FGN had not issued any fully IPSAS-oriented financial statements as of 2015.

Practical implications

The identification of deficiencies in fund account balances, structural defects in fund accounting and acts of understatement of carrying balances in the CRF and capital development fund (CDF) implies that the aggregate core fund liabilities reported in the financial statements of government entities, without corresponding assets, do not actually reflect a true and fair financial position in some countries. It reveals a remarkable degree of financial information asymmetry in government financial reporting. Illusionary fund accounting is directly linked to poor fiscal governance in many sovereigns, with associated sub-optimal delivery of public goods and service-level distress in many economies, leading to poverty, unemployment, crises and macroeconomic disturbances.

Social implications

The study contributes to the development of the fund accounting system, strengthening government financial reporting architecture and practices. It provides a framework for tracking financial information asymmetry in government financial reporting and the mismanagement of public funds. It provides a platform to effect the necessary adjustments (corrections) during the “first-time three-year adoption” adjustment window in Nigeria. Flowing from the findings, it advocates for the institutionalization of government fund accounting standards and provides evidence for migration to an accrual accounting system in countries that have not already implemented it. The evaluation system developed herein will improve fund management in the federal treasury and contribute to efficient public financial management, good governance and the development of public accounting practice.

Originality/value

This exploratory empirical research is the first to evaluate the accuracy of fund accounting in sovereign entities and its faithful representation in a government's financial position prior to the implementation of accrual accounting and financial reporting. The study established a substantial level of illusionary accounting for public funds and information asymmetry in published government financial reporting. It is necessary to rectify these discrepancies in fund accounting and financial reporting prior to, or during, the first three years of the IPSAS transition implementation programme. These research deliverables provide adopters with relevant data for adjustment accounting during the transition period, strengthening public financial reporting in order to realize the benefits of full IPSAS accrual accounting.

Details

Journal of Public Budgeting, Accounting & Financial Management, vol. 32 no. 3
Type: Research Article
ISSN: 1096-3367

Article
Publication date: 2 August 2022

Zhongbao Liu and Wenjuan Zhao

Abstract

Purpose

Research on structure function recognition mainly concentrates on identifying a specific part of academic literature and its applicability from a multidisciplinary perspective. A specific part of academic literature, such as sentences, paragraphs or chapter contents, is also called a level of academic literature in this paper. There are few comparative research works on the relationship between models, disciplines and levels in the process of structure function recognition. In view of this, comparative research on structure function recognition based on deep learning has been conducted in this paper.

Design/methodology/approach

An experimental corpus, including the academic literature of traditional Chinese medicine, library and information science, computer science, environmental science and phytology, was constructed. Meanwhile, deep learning models such as convolutional neural networks (CNN), long short-term memory (LSTM) and bidirectional encoder representations from transformers (BERT) were used. Comparative experiments on structure function recognition were conducted with these deep learning models from the multilevel perspective.

Findings

The experimental results showed that (1) the BERT model performed best, with F1 values of 78.02%, 89.41% and 94.88% at the sentence, paragraph and chapter-content levels, respectively. (2) The deep learning models performed better on the academic literature of traditional Chinese medicine than on other disciplines in most cases; e.g. the F1 values of CNN, LSTM and BERT reached 71.14%, 69.96% and 78.02%, respectively, at the sentence level. (3) The deep learning models performed better at the chapter-content level than at other levels, with maximum F1 values for CNN, LSTM and BERT of 91.92%, 74.90% and 94.88%, respectively. Furthermore, a confusion matrix of the recognition results was introduced to find the reasons for misrecognition.

Originality/value

This paper may inspire other research on structure function recognition, and provide a valuable reference for the analysis of influencing factors.

Details

Library Hi Tech, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 14 November 2023

Shaodan Sun, Jun Deng and Xugong Qin

Abstract

Purpose

This paper aims to amplify the retrieval and utilization of historical newspapers through the application of semantic organization, from a fine-grained knowledge element perspective. This endeavor seeks to unlock the latent value embedded within newspaper contents while furnishing invaluable methodological guidance for research in the humanities domain.

Design/methodology/approach

Following the semantic organization process and the knowledge element concept, this study proposes a holistic framework comprising four pivotal stages: knowledge element description, extraction, association and application. Initially, a semantic description model dedicated to knowledge elements is devised. Subsequently, harnessing advanced deep learning techniques, the study delves into entity recognition and relationship extraction, identifying entities within the historical newspaper contents and capturing the interdependencies among them. Finally, an online platform based on Flask is developed to enable the recognition of entities and relationships within historical newspapers.

Findings

This article utilized the Shengjing Times·Changchun Compilation as the dataset for describing, extracting, associating and applying newspaper contents. Regarding knowledge element extraction, BERT + BS consistently outperforms Bi-LSTM, CRF++ and even BERT in terms of recall and F1 scores, making it a favorable choice for entity recognition in this context. Particularly noteworthy is the Bi-LSTM-Pro model, which stands out with the highest scores across all metrics, notably achieving an exceptional F1 score in knowledge element relationship recognition.

Originality/value

Historical newspapers transcend their status as mere artifacts, evolving into invaluable reservoirs safeguarding the societal and historical memory. Through semantic organization from a fine-grained knowledge element perspective, it can facilitate semantic retrieval, semantic association, information visualization and knowledge discovery services for historical newspapers. In practice, it can empower researchers to unearth profound insights within the historical and cultural context, broadening the landscape of digital humanities research and practical applications.

Details

Aslib Journal of Information Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 19 January 2024

Meng Zhu and Xiaolong Xu

Abstract

Purpose

Intent detection (ID) and slot filling (SF) are two important tasks in natural language understanding. ID identifies the main intent of a paragraph of text, while the goal of SF is to extract the information that is important to the intent from the input sentence. However, most existing methods use sentence-level intent recognition, which carries the risk of error propagation, and the relationship between intent recognition and SF is not explicitly modeled. To address this problem, this paper proposes a collaborative model of ID and SF for intelligent spoken language understanding, called ID-SF-Fusion.

Design/methodology/approach

ID-SF-Fusion uses Bidirectional Encoder Representations from Transformers (BERT) and bidirectional long short-term memory (BiLSTM) to extract effective word embeddings and context vectors containing whole-sentence information, respectively. A fusion layer provides intent–slot fusion information for the SF task, so that the relationship between the ID and SF tasks is fully and explicitly modeled. This layer takes the ID result and the slot context vectors as input to obtain fusion information that contains both the ID result and slot information. Meanwhile, to further reduce error propagation, word-level ID is used in the ID-SF-Fusion model. Finally, the two tasks of ID and SF are realized by joint optimization training.
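The model's two outputs can be illustrated with plain Python: word-level intent labels reduced to one sentence intent by majority vote, and BIO slot tags decoded into slot spans. The ATIS-style labels and tokens below are illustrative, not drawn from the paper.

```python
from collections import Counter

def merge_intent(word_intents):
    # Word-level intent detection: take the majority label over the words.
    return Counter(word_intents).most_common(1)[0][0]

def decode_slots(tokens, tags):
    # Decode BIO slot tags into (slot_name, phrase) pairs.
    slots, current, name = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                slots.append((name, " ".join(current)))
            name, current = tag[2:], [tok]
        elif tag.startswith("I-") and current:
            current.append(tok)
        else:
            if current:
                slots.append((name, " ".join(current)))
            current, name = [], None
    if current:
        slots.append((name, " ".join(current)))
    return slots

tokens = ["flights", "from", "new", "york", "to", "boston"]
tags = ["O", "O", "B-fromloc", "I-fromloc", "O", "B-toloc"]
```

Voting over word-level intents, rather than committing to a single sentence-level prediction, is one simple way to see why the word-level scheme limits error propagation.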

Findings

We conducted experiments on two public datasets, Airline Travel Information Systems (ATIS) and Snips. The results show that the Intent ACC and Slot F1 scores of ID-SF-Fusion are 98.0 and 95.8 per cent, respectively, on ATIS, and 98.6 and 96.7 per cent, respectively, on Snips. These results are superior to slot-gated, SF-ID NetWork, Stack-Propagation and other models. In addition, ablation experiments were performed to further analyze and discuss the proposed model.

Originality/value

This paper uses word-level intent recognition and introduces intent information into the SF process, which yields a significant improvement on both datasets.

Details

Data Technologies and Applications, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 10 February 2023

Huiyong Wang, Ding Yang, Liang Guo and Xiaoming Zhang

Abstract

Purpose

Intent detection and slot filling are two important tasks in question comprehension for a question answering system. This study aims to build a joint task model with some generalization ability and benchmark its performance against the other neural network models mentioned in this paper.

Design/methodology/approach

This study used a deep-learning-based approach for the joint modeling of question intent detection and slot filling. Meanwhile, the internal cell structure of the long short-term memory (LSTM) network was improved. Furthermore, the dataset Computer Science Literature Question (CSLQ) was constructed based on the Science and Technology Knowledge Graph. The datasets Airline Travel Information Systems, Snips (a natural language processing dataset of the consumer intent engine collected by Snips) and CSLQ were used for the empirical analysis. The accuracy of intent detection and F1 score of slot filling, as well as the semantic accuracy of sentences, were compared for several models.

Findings

The results showed that the proposed model outperformed all other benchmark methods, especially for the CSLQ dataset. This proves that the design of this study improved the comprehensive performance and generalization ability of the model to some extent.

Originality/value

This study contributes to the understanding of question sentences in a specific domain. LSTM was improved, and a computer literature domain dataset was constructed herein. This will lay the data and model foundation for the future construction of a computer literature question answering system.

Details

Data Technologies and Applications, vol. 57 no. 5
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 29 May 2023

Xiang Zheng, Mingjie Li, Ze Wan and Yan Zhang

Abstract

Purpose

This study aims to extract knowledge from the bibliographic summaries of ancient Chinese scientific and technological documents (STDBS) and present the resulting knowledge graph (KG) comprehensively and systematically. By presenting the relationships among content, discipline and author, this study focuses on providing services for knowledge discovery in ancient Chinese scientific and technological documents.

Design/methodology/approach

This study compiles ancient Chinese STDBS and designs a knowledge mining and graph visualization framework. The authors define the summaries' entities, attributes and relationships for knowledge representation; use deep learning techniques such as BERT-BiLSTM-CRF models and rules for knowledge extraction; unify the representation of entities for knowledge fusion; and use Neo4j and other visualization techniques for KG construction and application. This study presents the generation, distribution and evolution of ancient Chinese agricultural scientific and technological knowledge in visualization graphs.
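The knowledge-representation step amounts to storing (entity, relation, entity) triples and querying them. The sketch below uses an in-memory list in place of Neo4j; the example entities are illustrative stand-ins, not taken from the study's corpus.

```python
# Minimal in-memory triple store standing in for the Neo4j graph.
# Each triple is (head entity, relation, tail entity).
triples = [
    ("Qimin Yaoshu", "discipline", "Planting and cultivation"),
    ("Qimin Yaoshu", "author", "Jia Sixie"),
    ("Nong Shu", "discipline", "Planting and cultivation"),
]

def neighbours(entity, relation=None):
    # All tails linked from `entity`, optionally filtered by relation type.
    return [t for h, r, t in triples
            if h == entity and (relation is None or r == relation)]

def grouped_by(relation, value):
    # All heads that share the same tail under `relation`,
    # e.g. every document assigned to one discipline.
    return [h for h, r, t in triples if r == relation and t == value]
```

Grouping documents by a shared discipline tail is the kind of query that underlies the distribution and evolution views the study reports; in Neo4j the same lookups would be expressed as Cypher pattern matches.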

Findings

The knowledge mining and graph visualization framework is feasible and effective. The BERT-BiLSTM-CRF model has domain adaptability and accuracy. The knowledge generation of ancient Chinese agricultural scientific and technological documents has distinctive time features. The knowledge distribution is uneven, concentrated mainly on C1-Planting and cultivation, C2-Silkworm, and C3-Mulberry and water conservancy. The knowledge evolution is apparent, and differentiation and integration coexist.

Originality/value

This study is the first to visually present the knowledge connotations and associations of ancient Chinese STDBS. It addresses the lack of in-depth knowledge mining and connotation visualization of ancient Chinese STDBS.

Details

Library Hi Tech, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0737-8831
