Search results

1 – 10 of over 16,000
Article
Publication date: 9 August 2021

Xintong Zhao, Jane Greenberg, Vanessa Meschke, Eric Toberer and Xiaohua Hu

Abstract

Purpose

The output of academic literature has increased significantly due to digital technology, presenting researchers in every discipline, including materials science, with a challenge: it is impossible to manually read and extract knowledge from millions of publications. The purpose of this study is to address this challenge by exploring knowledge extraction in materials science, as applied to digital scholarship. An overriding goal is to help inform readers about the status of knowledge extraction in materials science.

Design/methodology/approach

The authors conducted a two-part analysis: a comparison of knowledge extraction methods applied to materials science scholarship across a sample of 22 articles, followed by a comparison of HIVE-4-MAT, an ontology-based knowledge extraction application, and MatScholar, a named entity recognition (NER) application. This paper covers contextual background and a review of three tiers of knowledge extraction (ontology-based, NER and relation extraction), followed by the research goals and approach.
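
To make the contrast concrete, here is a minimal Python sketch of the two extraction styles compared in this study: ontology-based matching against a controlled vocabulary versus model-style NER. The vocabulary, the formula-shaped regex standing in for a trained model and the example sentence are illustrative assumptions, not reproductions of HIVE-4-MAT or MatScholar.

    import re

    # Ontology-based extraction: match text spans against a controlled vocabulary.
    ONTOLOGY_TERMS = {"thermoelectric", "band gap", "zintl phase"}  # hypothetical vocabulary

    def ontology_extract(text: str) -> set[str]:
        lowered = text.lower()
        return {term for term in ONTOLOGY_TERMS if term in lowered}

    # NER-style extraction: a trained model would label spans with entity types;
    # a crude chemical-formula regex stands in for the model here.
    FORMULA = re.compile(r"\b(?:[A-Z][a-z]?\d*){2,}\b")  # e.g. Bi2Te3, GaAs

    def ner_extract(text: str) -> list[tuple[str, str]]:
        return [(m.group(), "MATERIAL") for m in FORMULA.finditer(text)]

    sentence = "Bi2Te3 is a thermoelectric material with a narrow band gap."
    print(ontology_extract(sentence))  # {'thermoelectric', 'band gap'}
    print(ner_extract(sentence))       # [('Bi2Te3', 'MATERIAL')]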

Findings

The results indicate three key needs for researchers to consider for advancing knowledge extraction: the need for materials science-focused corpora; the need for researchers to define the scope of the research being pursued; and the need to understand the trade-offs among different knowledge extraction methods. This paper also points to future materials science research potential with relation extraction and the increased availability of ontologies.

Originality/value

To the best of the authors’ knowledge, there are very few studies examining knowledge extraction in materials science. This work makes an important contribution to this underexplored research area.

Details

The Electronic Library, vol. 39 no. 3
Type: Research Article
ISSN: 0264-0473

Open Access
Article
Publication date: 14 August 2017

Xiu Susie Fang, Quan Z. Sheng, Xianzhi Wang, Anne H.H. Ngu and Yihong Zhang

Abstract

Purpose

This paper aims to propose a system for generating actionable knowledge from Big Data and use this system to construct a comprehensive knowledge base (KB), called GrandBase.

Design/methodology/approach

In particular, this study extracts new predicates from four types of data sources, namely, Web texts, Document Object Model (DOM) trees, existing KBs and query streams, to augment the ontology of the existing KB (i.e. Freebase). In addition, a graph-based approach to better truth discovery for multi-valued predicates is also proposed.
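
To illustrate the truth discovery phase, the sketch below iterates source trust and value confidence for a single multi-valued predicate and keeps every value above a threshold rather than a single winner. The claims, threshold and update rule are simplified assumptions, not GrandBase's actual algorithm.

    from collections import defaultdict

    claims = {  # source -> values claimed for one (entity, predicate) pair
        "src_a": {"python", "java"},
        "src_b": {"python", "java", "cobol"},
        "src_c": {"python"},
    }

    trust = {s: 0.5 for s in claims}          # initial source trustworthiness
    for _ in range(10):                       # iterate toward a fixed point
        support = defaultdict(float)
        for src, values in claims.items():
            for v in values:
                support[v] += trust[src]
        top = max(support.values())
        conf = {v: s / top for v, s in support.items()}  # value confidence in [0, 1]
        # a source is as trustworthy as the average confidence of its claims
        trust = {src: sum(conf[v] for v in values) / len(values)
                 for src, values in claims.items()}

    # multi-valued predicate: accept all sufficiently supported values, not just one
    print(sorted(v for v, c in conf.items() if c >= 0.5))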

Findings

Empirical studies demonstrate the effectiveness of the approaches presented in this study and the potential of GrandBase. Future research directions regarding GrandBase construction and extension are also discussed.

Originality/value

To revolutionize our modern society by using the wisdom of Big Data, numerous KBs have been constructed to feed massive knowledge-driven applications with Resource Description Framework triples. The important challenges for KB construction include extracting information from large-scale, possibly conflicting and differently structured data sources (i.e. the knowledge extraction problem) and reconciling the conflicts that reside in those sources (i.e. the truth discovery problem). Tremendous research effort has been devoted to both problems. However, the existing KBs are far from comprehensive and accurate: first, existing knowledge extraction systems retrieve data from limited types of Web sources; second, existing truth discovery approaches commonly assume that each predicate has only one true value. In this paper, the focus is on the problem of generating actionable knowledge from Big Data. A system is proposed, which consists of two phases, namely, knowledge extraction and truth discovery, to construct a broader KB, called GrandBase.

Details

PSU Research Review, vol. 1 no. 2
Type: Research Article
ISSN: 2399-1747

Article
Publication date: 24 June 2020

Yilu Zhou and Yuan Xue

Abstract

Purpose

Strategic alliances among organizations are some of the central drivers of innovation and economic growth. However, the discovery of alliances has relied on purely manual search and has had limited scope. This paper proposes a text-mining framework, ACRank, that automatically extracts alliances from news articles. ACRank aims to provide human analysts with higher coverage of strategic alliances than existing databases while maintaining reasonable extraction precision. It has the potential to discover alliances involving less well-known companies, which commercial databases often neglect.

Design/methodology/approach

The proposed framework is a systematic process of alliance extraction and validation using natural language processing techniques and alliance domain knowledge. The process integrates news article search, entity extraction, and syntactic and semantic linguistic parsing techniques. In particular, the Alliance Discovery Template (ADT) component identifies a number of linguistic templates expanded from expert domain knowledge and extracts potential alliances at the sentence level. Alliance Confidence Ranking (ACRank) further validates each unique alliance based on multiple features at the document level. The framework is designed to deal with extremely skewed, noisy data from news articles.
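
A minimal sketch of the two stages, assuming toy sentence-level templates and an invented document-level confidence feature; the real ADT templates and ACRank feature set are far richer.

    import re

    TEMPLATES = [  # hypothetical linguistic templates
        re.compile(r"(?P<a>[A-Z]\w+) (?:signed|formed) (?:a |an )?"
                   r"(?:joint venture|alliance|partnership) with (?P<b>[A-Z]\w+)"),
        re.compile(r"(?P<a>[A-Z]\w+) partners with (?P<b>[A-Z]\w+)"),
    ]

    def extract_alliances(sentence: str) -> list[tuple[str, str]]:
        # sentence-level extraction: any matching template yields a candidate pair
        return [(m.group("a"), m.group("b"))
                for t in TEMPLATES if (m := t.search(sentence))]

    def confidence(mentions: int, distinct_templates: int) -> float:
        # document-level ranking stand-in: more mentions and more distinct
        # matching templates mean higher confidence
        return 1.0 - 1.0 / (1 + mentions * distinct_templates)

    s = "IBM signed a joint venture with Lenovo last week."
    print(extract_alliances(s), confidence(mentions=3, distinct_templates=1))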

Findings

Evaluation of ACRank on a gold standard data set of IBM alliances (2006–2008) showed that sentence-level ADT-based extraction achieved 78.1% recall and 44.7% precision and eliminated over 99% of the noise in news articles. ACRank further improved precision to 97% for the top 20% of extracted alliance instances. Further comparison with the Thomson Reuters SDC database showed that SDC covered less than 20% of total alliances, while ACRank covered 67%. When applied to Dow 30 company news articles, ACRank is estimated to achieve a recall between 0.48 and 0.95, and only 15% of the alliances appeared in SDC.

Originality/value

The research framework proposed in this paper indicates a promising direction of building a comprehensive alliance database using automatic approaches. It adds value to academic studies and business analyses that require in-depth knowledge of strategic alliances. It also encourages other innovative studies that use text mining and data analytics to study business relations.

Details

Information Technology & People, vol. 33 no. 5
Type: Research Article
ISSN: 0959-3845

Article
Publication date: 14 November 2023

Shaodan Sun, Jun Deng and Xugong Qin

Abstract

Purpose

This paper aims to amplify the retrieval and utilization of historical newspapers through semantic organization from a fine-grained knowledge element perspective. This endeavor seeks to unlock the latent value embedded within newspaper contents while providing methodological guidance for research in the humanities.

Design/methodology/approach

Following the semantic organization process and the knowledge element concept, this study proposes a holistic framework comprising four pivotal stages: knowledge element description, extraction, association and application. Initially, a semantic description model dedicated to knowledge elements is devised. Subsequently, the study applies advanced deep learning techniques to entity recognition and relationship extraction, identifying entities within the historical newspaper contents and capturing the interdependencies among them. Finally, an online platform based on Flask is developed to enable the recognition of entities and relationships within historical newspapers.
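
As a sketch of what such a Flask platform might expose, the code below wraps a placeholder recognizer behind a single endpoint; the route name, payload shape and recognize() stub are assumptions, and the paper's trained BERT+BS model would be loaded in place of the stub.

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    def recognize(text: str) -> list[dict]:
        # placeholder: a real deployment would run the trained NER model here
        return [{"span": w, "label": "ENTITY"} for w in text.split() if w.istitle()]

    @app.route("/extract", methods=["POST"])
    def extract():
        payload = request.get_json(force=True)
        return jsonify({"entities": recognize(payload.get("text", ""))})

    if __name__ == "__main__":
        app.run(port=5000)  # POST {"text": "..."} to http://localhost:5000/extract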

Findings

This article used the Shengjing Times·Changchun Compilation as the dataset for describing, extracting, associating and applying newspaper contents. Regarding knowledge element extraction, the BERT+BS model consistently outperforms Bi-LSTM, CRF++ and even BERT in terms of recall and F1 scores, making it a favorable choice for entity recognition in this context. Particularly noteworthy is the Bi-LSTM-Pro model, which stands out with the highest scores across all metrics, notably achieving an exceptional F1 score in knowledge element relationship recognition.

Originality/value

Historical newspapers transcend their status as mere artifacts, serving as invaluable reservoirs of societal and historical memory. Semantic organization from a fine-grained knowledge element perspective can facilitate semantic retrieval, semantic association, information visualization and knowledge discovery services for historical newspapers. In practice, it can empower researchers to unearth profound insights within the historical and cultural context, broadening the landscape of digital humanities research and practical applications.

Details

Aslib Journal of Information Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 29 May 2023

Jinxiang Zeng, Shujin Cao, Yijin Chen, Pei Pan and Yafang Cai

Abstract

Purpose

This study analyzed the interdisciplinary characteristics of Chinese research studies in library and information science (LIS) measured by knowledge elements extracted through the Lexicon-LSTM model.

Design/methodology/approach

Eight research themes were selected for the experiment, and a large-scale (N = 11,625) dataset of research papers was constructed from the China National Knowledge Infrastructure (CNKI) database and complemented with multiple corpora. Knowledge elements were extracted through a Lexicon-LSTM model. A subject knowledge graph was constructed to support the searching and classification of knowledge elements, and an interdisciplinary-weighted average citation index space was constructed for measuring interdisciplinary characteristics and contributions based on knowledge elements.

Findings

The empirical research shows that the Lexicon-LSTM model is superior in the accuracy of extracting knowledge elements. In the field of LIS, the interdisciplinary diversity indicator showed an upward trend from 2011 to 2021, while the disciplinary balance and difference indicators showed a downward trend. The knowledge elements of theory and methodology could be used to detect and measure interdisciplinary characteristics and contributions.
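
For orientation, the sketch below computes standard variety, balance and difference-style interdisciplinarity indicators from citation shares; the discipline proportions and pairwise distances are invented, and the paper's interdisciplinary-weighted average citation index space is not reproduced.

    import math

    citation_share = {"LIS": 0.5, "CS": 0.3, "Sociology": 0.2}  # hypothetical shares

    variety = len(citation_share)                      # number of disciplines cited
    entropy = -sum(p * math.log(p) for p in citation_share.values() if p > 0)
    balance = entropy / math.log(variety)              # evenness of shares, in [0, 1]

    # difference: mean pairwise distance between cited disciplines (toy distances)
    distance = {("LIS", "CS"): 0.4, ("LIS", "Sociology"): 0.7, ("CS", "Sociology"): 0.8}
    difference = sum(distance.values()) / len(distance)

    print(f"variety={variety}, balance={balance:.2f}, difference={difference:.2f}")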

Originality/value

The extraction of knowledge elements facilitates the discovery of semantic information embedded in academic papers. The knowledge elements proved feasible for measuring interdisciplinary characteristics and exploring changes over time, which helps to provide an overview of the state of the art and the future development trends of interdisciplinary research themes in LIS.

Details

Aslib Journal of Information Management, vol. 75 no. 3
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 20 April 2012

Mohamed Morsey, Jens Lehmann, Sören Auer, Claus Stadler and Sebastian Hellmann

Abstract

Purpose

DBpedia extracts structured information from Wikipedia, interlinks it with other knowledge bases and freely publishes the results on the web using Linked Data and SPARQL. However, the DBpedia release process is heavyweight and releases are sometimes based on several months old data. DBpedia‐Live solves this problem by providing a live synchronization method based on the update stream of Wikipedia. This paper seeks to address these issues.

Design/methodology/approach

Wikipedia provides DBpedia with a continuous stream of updates, i.e. a stream of recently updated articles. DBpedia‐Live processes that stream on the fly to obtain RDF data and stores the extracted data back to DBpedia. DBpedia‐Live publishes the newly added/deleted triples in files in order to enable synchronization between the DBpedia endpoint and other DBpedia mirrors.

Findings

During the realization of DBpedia‐Live the authors learned that it is crucial to process Wikipedia updates in a priority queue: recently updated Wikipedia articles should have the highest priority, above mapping changes and unmodified pages. An overall finding is that the emerging Web of Data presents plenty of opportunities for librarians.
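
A minimal sketch of that priority-queue idea, with three hypothetical priority levels; DBpedia‐Live's actual queueing logic is more involved.

    import heapq

    LIVE, MAPPING, UNMODIFIED = 0, 1, 2    # lower value = processed first

    queue: list[tuple[int, int, str]] = []
    counter = 0                            # tie-breaker keeps FIFO order per level

    def enqueue(priority: int, article: str) -> None:
        global counter
        heapq.heappush(queue, (priority, counter, article))
        counter += 1

    enqueue(UNMODIFIED, "Old_Article")
    enqueue(LIVE, "Recently_Edited_Article")
    enqueue(MAPPING, "Article_With_Changed_Mapping")

    while queue:
        _, _, article = heapq.heappop(queue)
        print("re-extracting", article)    # LIVE first, UNMODIFIED last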

Practical implications

DBpedia has had, and continues to have, a great effect on the Web of Data and has become a crystallization point for it. Many companies and researchers use DBpedia and its public services to improve their applications and research approaches. The DBpedia‐Live framework improves DBpedia further by synchronizing it with Wikipedia in a timely manner, which is relevant for many use cases requiring up‐to‐date information.

Originality/value

The new DBpedia‐Live framework adds new features to the old DBpedia‐Live framework, e.g. abstract extraction, ontology changes, and changesets publication.

Details

Program, vol. 46 no. 2
Type: Research Article
ISSN: 0033-0337

Article
Publication date: 28 December 2023

Na Xu, Yanxiang Liang, Chaoran Guo, Bo Meng, Xueqing Zhou, Yuting Hu and Bo Zhang

Abstract

Purpose

Safety management plays an important part in coal mine construction. Because the relevant data are complex, implementing the construction safety knowledge scattered across standards poses a challenge. This paper aims to develop a knowledge extraction model to automatically and efficiently extract domain knowledge from unstructured texts.

Design/methodology/approach

A bidirectional encoder representations from transformers (BERT)-bidirectional long short-term memory (BiLSTM)-conditional random field (CRF) method based on a pre-trained language model was applied to knowledge entity recognition in the field of coal mine construction safety. First, 80 safety standards for coal mine construction were collected, sorted and marked up as a descriptive corpus. Then, the BERT pre-trained language model was used to obtain dynamic word vectors. Finally, the BiLSTM-CRF model inferred the optimal tag sequence for each entity.
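
For readers unfamiliar with the architecture, here is a compact sketch of a BERT-BiLSTM-CRF tagger using the Hugging Face transformers and pytorch-crf packages; the hidden sizes and the checkpoint name are illustrative assumptions, not the authors' exact configuration.

    import torch.nn as nn
    from torchcrf import CRF
    from transformers import AutoModel

    class BertBiLstmCrf(nn.Module):
        def __init__(self, num_tags: int, bert_name: str = "bert-base-chinese"):
            super().__init__()
            self.bert = AutoModel.from_pretrained(bert_name)  # dynamic word vectors
            self.lstm = nn.LSTM(768, 256, batch_first=True, bidirectional=True)
            self.emit = nn.Linear(512, num_tags)              # per-token tag scores
            self.crf = CRF(num_tags, batch_first=True)        # optimal tag sequence

        def forward(self, input_ids, attention_mask, tags=None):
            hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
            scores = self.emit(self.lstm(hidden)[0])
            mask = attention_mask.bool()
            if tags is not None:                       # training: negative log-likelihood
                return -self.crf(scores, tags, mask=mask)
            return self.crf.decode(scores, mask=mask)  # inference: best tag path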

Findings

Accordingly, 11,933 entities and 2,051 relationships were identified in the standard specification texts, and a language model suitable for coal mine construction safety management was proposed. The experiments showed that F1 values were above 60% for all nine entity types, such as security management, and that the model identified and extracted entities more accurately than conventional methods.

Originality/value

This work enabled domain knowledge querying and built a Q&A platform from the entities and relationships identified in standard specifications for coal mines. The paper proposes a systematic framework for processing texts in coal mine construction safety to improve the efficiency and accuracy of domain-specific entity extraction. In addition, the pre-trained language model was introduced into coal mine construction safety to realize dynamic entity recognition, which provides technical support and a theoretical reference for the optimization of safety management platforms.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0969-9988

Article
Publication date: 12 April 2022

Jun Deng, Chuyi Zhong, Shaodan Sun and Ruan Wang

Abstract

Purpose

This paper aims to construct a spatio-temporal emotional framework (STEF) for digital humanities from a quantitative perspective, applying knowledge extraction and mining technology to promote innovation in humanities research paradigms and methods.

Design/methodology/approach

The proposed STEF uses methods of information extraction, sentiment analysis and geographic information systems to achieve knowledge extraction and mining. STEF integrates time, space and emotional elements to visualize the spatial and temporal evolution of emotions, which enriches the analytical paradigm in digital humanities.
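
A minimal sketch of the spatio-temporal emotional path idea: toy plot records carry a chapter index, a place with coordinates and a sentiment score, and are ordered into a path that a GIS layer could render. All values are invented, and STEF's actual extraction tooling is not shown.

    from dataclasses import dataclass

    @dataclass
    class PlotPoint:
        chapter: int        # narrative time
        place: str
        lon: float
        lat: float
        sentiment: float    # -1 (negative) .. 1 (positive)

    trajectory = [
        PlotPoint(1, "Beijing", 116.4, 39.9, 0.6),
        PlotPoint(5, "Jiangnan", 120.2, 30.3, -0.4),
        PlotPoint(9, "Beijing", 116.4, 39.9, 0.1),
    ]

    # order by narrative time and emit a path for a map layer to draw
    for p in sorted(trajectory, key=lambda x: x.chapter):
        mood = "positive" if p.sentiment > 0 else "negative"
        print(f"ch.{p.chapter:>2} {p.place:<9} ({p.lon}, {p.lat}) {mood} {p.sentiment:+.1f}")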

Findings

The case study shows that STEF can effectively extract knowledge from unstructured texts in the field of Chinese Qing Dynasty novels. First, STEF introduces the knowledge extraction tools MARKUS and DocuSky to profile character entities and perform plot extraction. Second, STEF extracts the characters' emotional evolutionary trajectories from the temporal and spatial perspective. Finally, the study draws a spatio-temporal emotional path figure for the leading characters and integrates the corresponding plots to analyze the causes of emotional fluctuations.

Originality/value

The STEF is constructed based on spatio-temporal narrative theory and emotional narrative theory. It is the first framework to integrate the elements of time, space and emotion to analyze the emotional evolution trajectories of characters in novels. The executability and operability of the framework are also verified with a case novel, suggesting a new path for the quantitative analysis of other novels.

Details

Aslib Journal of Information Management, vol. 74 no. 6
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 29 March 2024

Sihao Li, Jiali Wang and Zhao Xu

Abstract

Purpose

The compliance checking of Building Information Modeling (BIM) models is crucial throughout the lifecycle of construction. The increasing amount and complexity of information carried by BIM models have made compliance checking more challenging, and manual methods are prone to errors. Therefore, this study aims to propose an integrative conceptual framework for automated compliance checking of BIM models, allowing for the identification of errors within BIM models.

Design/methodology/approach

This study first analyzed typical building standards in the fields of architecture and fire protection, and an ontology of these elements was developed. Based on this, a building standard corpus was built, and deep learning models were trained to automatically label the building standard texts. Neo4j was utilized for knowledge graph construction and storage, and a data extraction method based on Dynamo was designed to obtain checking data files. After that, a matching algorithm was devised to express the logical rules of knowledge graph triples, resulting in automated compliance checking for BIM models.
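
As a sketch of the rule-matching step, the code below checks one extracted BIM property against a rule stored as a knowledge graph triple in Neo4j; the node labels, rule schema and connection details are assumptions, and the Dynamo extraction step is not shown.

    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    RULE_QUERY = """
    MATCH (e:Element {type: $elem_type})-[:HAS_RULE]->(r:Rule {property: $prop})
    RETURN r.operator AS op, r.threshold AS threshold
    """

    def check(elem_type: str, prop: str, value: float) -> bool:
        with driver.session() as session:
            record = session.run(RULE_QUERY, elem_type=elem_type, prop=prop).single()
        if record is None:
            return True                        # no rule constrains this property
        if record["op"] == ">=":               # one operator shown for brevity
            return value >= record["threshold"]
        raise ValueError(f"unhandled operator {record['op']}")

    # e.g. a fire door extracted from the model with a width of 0.8 m
    print(check("FireDoor", "width", 0.8))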

Findings

Case validation results showed that this theoretical framework can achieve the automatic construction of domain knowledge graphs and automatic checking of BIM model compliance. Compared with traditional methods, this method has a higher degree of automation and portability.

Originality/value

This study introduces knowledge graphs and natural language processing technology into the field of BIM model checking and completes the automated process of constructing domain knowledge graphs and checking BIM model data. Its functionality and usability were validated through two case studies on a self-developed BIM checking platform.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0969-9988

Article
Publication date: 14 May 2019

Ahsan Mahmood, Hikmat Ullah Khan, Zahoor Ur Rehman, Khalid Iqbal and Ch. Muhammad Shahzad Faisal

Abstract

Purpose

The purpose of this research study is to extract and identify named entities from Hadith literature. Named entity recognition (NER) refers to the identification of named entities in a computer-readable text annotated with categorization tags for information extraction. NER is an active research area in information management and information retrieval systems. NER serves as a baseline for machines to understand the context of a given content and helps in knowledge extraction. Although NER is considered a solved task in major languages such as English, in languages such as Urdu it is still challenging. Moreover, NER depends on the language and domain of study; thus, it is gaining the attention of researchers in different domains.

Design/methodology/approach

This paper proposes a knowledge extraction framework using finite-state transducers (FSTs) – KEFST – to extract named entities. KEFST consists of five steps: content extraction, tokenization, part of speech tagging, multi-word detection and NER. An extensive empirical analysis using the data corpus of the Urdu translation of Sahih Al-Bukhari, a widely known hadith book, reveals that the proposed method effectively recognizes entities and obtains better results.
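
A minimal sketch of the finite-state idea behind the NER step, compiling a toy gazetteer into a token-level automaton; the entries are hypothetical, and KEFST's transducers over Urdu text are much richer.

    def build_fst(entities: list[list[str]]) -> dict:
        # compile each multi-word entity into a path through a token automaton
        root: dict = {}
        for tokens in entities:
            state = root
            for tok in tokens:
                state = state.setdefault(tok, {})
            state["<accept>"] = True           # final state emits an entity
        return root

    def run_fst(fst: dict, tokens: list[str]) -> list[str]:
        found = []
        for i in range(len(tokens)):           # try to start a match at every token
            state, j = fst, i
            while j < len(tokens) and tokens[j] in state:
                state = state[tokens[j]]
                j += 1
                if state.get("<accept>"):
                    found.append(" ".join(tokens[i:j]))
        return found

    fst = build_fst([["Abu", "Hurairah"], ["Sahih", "Al-Bukhari"]])  # hypothetical entries
    print(run_fst(fst, "narrated by Abu Hurairah in Sahih Al-Bukhari".split()))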

Findings

Significant performance in terms of F-measure, precision and recall validates that the proposed model outperforms the existing methods for NER in the relevant literature.

Originality/value

This research is novel in that no previous work has extracted named entities in the Urdu language using FSTs, and no previous work has addressed NER for Urdu hadith data.

Details

The Electronic Library, vol. 37 no. 2
Type: Research Article
ISSN: 0264-0473
