Search results

1 – 10 of over 13000
Article
Publication date: 9 August 2021

Xintong Zhao, Jane Greenberg, Vanessa Meschke, Eric Toberer and Xiaohua Hu

Abstract

Purpose

The output of academic literature has increased significantly due to digital technology, presenting researchers in every discipline, including materials science, with a challenge: it is impossible to manually read and extract knowledge from millions of published articles. The purpose of this study is to address this challenge by exploring knowledge extraction in materials science, as applied to digital scholarship. An overriding goal is to help inform readers about the status of knowledge extraction in materials science.

Design/methodology/approach

The authors conducted a two-part analysis: a comparison of knowledge extraction methods applied to materials science scholarship across a sample of 22 articles, followed by a comparison of HIVE-4-MAT, an ontology-based knowledge extraction application, and MatScholar, a named entity recognition (NER) application. This paper covers contextual background and a review of three tiers of knowledge extraction (ontology-based, NER and relation extraction), followed by the research goals and approach.
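
The two extraction styles being compared can be reduced to a toy sketch. Everything below (the controlled vocabulary, the sentence and the formula pattern) is invented for illustration; HIVE-4-MAT and MatScholar are far more sophisticated systems.

```python
import re

# Ontology-based extraction: match text against a controlled vocabulary.
ONTOLOGY_TERMS = {"thermoelectric", "band gap", "zinc oxide"}

def ontology_extract(text):
    """Return controlled-vocabulary terms found in the text."""
    lowered = text.lower()
    return sorted(t for t in ONTOLOGY_TERMS if t in lowered)

# NER-style extraction: label spans with entity types. Real NER systems use
# trained statistical or neural models; a crude chemical-formula regex stands
# in for one here.
def ner_extract(text):
    """Return (span, label) pairs for tokens that look like formulas."""
    formula = re.compile(r"\b(?:[A-Z][a-z]?\d*){2,}\b")
    return [(m.group(), "MATERIAL") for m in formula.finditer(text)]

sentence = "ZnO is a wide band gap thermoelectric candidate."
print(ontology_extract(sentence))  # vocabulary hits
print(ner_extract(sentence))       # pattern-labelled entity spans
```

The tradeoff the article examines is visible even here: the ontology matcher only finds what its vocabulary already names, while the NER-style matcher generalizes to unseen strings at the cost of precision.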

Findings

The results indicate three key needs for researchers to consider for advancing knowledge extraction: the need for materials science-focused corpora; the need for researchers to define the scope of the research being pursued; and the need to understand the tradeoffs among different knowledge extraction methods. This paper also points to future materials science research potential with relation extraction and increased availability of ontologies.

Originality/value

To the best of the authors’ knowledge, there are very few studies examining knowledge extraction in materials science. This work makes an important contribution to this underexplored research area.

Details

The Electronic Library, vol. 39 no. 3
Type: Research Article
ISSN: 0264-0473

Open Access
Article
Publication date: 14 August 2017

Xiu Susie Fang, Quan Z. Sheng, Xianzhi Wang, Anne H.H. Ngu and Yihong Zhang

Abstract

Purpose

This paper aims to propose a system for generating actionable knowledge from Big Data and use this system to construct a comprehensive knowledge base (KB), called GrandBase.

Design/methodology/approach

In particular, this study extracts new predicates from four types of data sources, namely, Web texts, Document Object Model (DOM) trees, existing KBs and query stream to augment the ontology of the existing KB (i.e. Freebase). In addition, a graph-based approach to conduct better truth discovery for multi-valued predicates is also proposed.

Findings

Empirical studies demonstrate the effectiveness of the approaches presented in this study and the potential of GrandBase. Future research directions regarding GrandBase construction and extension are also discussed.

Originality/value

To revolutionize our modern society by using the wisdom of Big Data, numerous KBs have been constructed to feed the massive knowledge-driven applications with Resource Description Framework triples. The important challenges for KB construction include extracting information from large-scale, possibly conflicting and differently structured data sources (i.e. the knowledge extraction problem) and reconciling the conflicts that reside in the sources (i.e. the truth discovery problem). Tremendous research efforts have been devoted to both problems. However, the existing KBs are far from being comprehensive and accurate: first, existing knowledge extraction systems retrieve data from limited types of Web sources; second, existing truth discovery approaches commonly assume each predicate has only one true value. In this paper, the focus is on the problem of generating actionable knowledge from Big Data. A system is proposed, which consists of two phases, namely, knowledge extraction and truth discovery, to construct a broader KB, called GrandBase.
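
The multi-valued truth discovery problem the abstract highlights can be sketched in miniature: several sources claim values for one predicate (say, the children of a person), and more than one value may be true. The paper's graph-based method is considerably richer; this toy version, with invented sources and weights, simply keeps every value whose weighted source support clears a threshold.

```python
from collections import defaultdict

def discover_truths(claims, source_weight, threshold=0.5):
    """claims: list of (source, value) pairs. Keep values whose weighted
    support is at least `threshold` of the total weight of all sources."""
    total = sum(source_weight[s] for s in {s for s, _ in claims})
    support = defaultdict(int)
    for source, value in claims:
        support[value] += source_weight[source]
    return sorted(v for v, w in support.items() if w / total >= threshold)

# Three sources report children of the same (fictional) person.
claims = [("web", "Alice"), ("web", "Bob"),
          ("dom", "Alice"), ("kb", "Alice"), ("kb", "Carol")]
weights = {"web": 5, "dom": 3, "kb": 2}
print(discover_truths(claims, weights))  # both Alice and Bob survive
```

Unlike single-truth voting, nothing here forces exactly one winner per predicate, which is the relaxation the abstract argues existing approaches lack.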

Details

PSU Research Review, vol. 1 no. 2
Type: Research Article
ISSN: 2399-1747

Article
Publication date: 24 June 2020

Yilu Zhou and Yuan Xue

Abstract

Purpose

Strategic alliances among organizations are some of the central drivers of innovation and economic growth. However, the discovery of alliances has relied on pure manual search and has limited scope. This paper proposes a text-mining framework, ACRank, that automatically extracts alliances from news articles. ACRank aims to provide human analysts with a higher coverage of strategic alliances compared to existing databases, yet maintain a reasonable extraction precision. It has the potential to discover alliances involving less well-known companies, a situation often neglected by commercial databases.

Design/methodology/approach

The proposed framework is a systematic process of alliance extraction and validation using natural language processing techniques and alliance domain knowledge. The process integrates news article search, entity extraction, and syntactic and semantic linguistic parsing techniques. In particular, the Alliance Discovery Template (ADT) component identifies a number of linguistic templates expanded from expert domain knowledge and extracts potential alliances at sentence level. Alliance Confidence Ranking (ACRank) further validates each unique alliance based on multiple features at document level. The framework is designed to deal with extremely skewed, noisy data from news articles.
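
Sentence-level, template-based extraction in the spirit of ADT can be sketched with a couple of patterns. The regexes and the example sentence below are invented; the actual framework uses syntactic and semantic parsing and a much larger set of expert-derived templates.

```python
import re

# Each template captures two company names around an alliance trigger phrase.
TEMPLATES = [
    re.compile(r"(?P<a>[A-Z]\w+) (?:announced|formed) (?:a|an) "
               r"(?:alliance|partnership|joint venture) with (?P<b>[A-Z]\w+)"),
    re.compile(r"(?P<a>[A-Z]\w+) (?:teamed up|partnered) with (?P<b>[A-Z]\w+)"),
]

def extract_alliances(sentence):
    """Return (company_a, company_b) pairs matched by any template."""
    pairs = []
    for template in TEMPLATES:
        for m in template.finditer(sentence):
            pairs.append((m.group("a"), m.group("b")))
    return pairs

print(extract_alliances("IBM formed a partnership with Lenovo last year."))
```

The document-level confidence ranking that follows this step (ACRank proper) would then score each candidate pair across all articles mentioning it, which is what pushes precision up from the raw template matches.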

Findings

Evaluation of ACRank on a gold standard data set of IBM alliances (2006–2008) showed that sentence-level ADT-based extraction achieved 78.1% recall and 44.7% precision and eliminated over 99% of the noise in news articles. ACRank further improved precision to 97% for the top 20% of extracted alliance instances. A comparison with the Thomson Reuters SDC database showed that SDC covered less than 20% of total alliances, while ACRank covered 67%. When applied to Dow 30 company news articles, ACRank is estimated to achieve a recall between 0.48 and 0.95, and only 15% of the alliances appeared in SDC.

Originality/value

The research framework proposed in this paper indicates a promising direction of building a comprehensive alliance database using automatic approaches. It adds value to academic studies and business analyses that require in-depth knowledge of strategic alliances. It also encourages other innovative studies that use text mining and data analytics to study business relations.

Details

Information Technology & People, vol. 33 no. 5
Type: Research Article
ISSN: 0959-3845

Article
Publication date: 20 April 2012

Mohamed Morsey, Jens Lehmann, Sören Auer, Claus Stadler and Sebastian Hellmann

Abstract

Purpose

DBpedia extracts structured information from Wikipedia, interlinks it with other knowledge bases and freely publishes the results on the web using Linked Data and SPARQL. However, the DBpedia release process is heavyweight and releases are sometimes based on several months old data. DBpedia‐Live solves this problem by providing a live synchronization method based on the update stream of Wikipedia. This paper seeks to address these issues.

Design/methodology/approach

Wikipedia provides DBpedia with a continuous stream of updates, i.e. a stream of articles, which were recently updated. DBpedia‐Live processes that stream on the fly to obtain RDF data and stores the extracted data back to DBpedia. DBpedia‐Live publishes the newly added/deleted triples in files, in order to enable synchronization between the DBpedia endpoint and other DBpedia mirrors.

Findings

During the realization of DBpedia‐Live the authors learned that it is crucial to process Wikipedia updates in a priority queue: recently updated Wikipedia articles should have the highest priority, over mapping changes and unmodified pages. An overall finding is that the emerging Web of Data offers plenty of opportunities for librarians.
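
The priority ordering described here can be sketched with a standard heap. The priority values, update kinds and page names below are illustrative, not the actual DBpedia-Live implementation or API.

```python
import heapq

# Lower number = processed first: fresh article edits beat mapping changes,
# which beat re-checks of unmodified pages.
PRIORITY = {"article-edit": 0, "mapping-change": 1, "unmodified-page": 2}

def process_in_priority_order(updates):
    """updates: list of (kind, page); return pages in processing order.
    The arrival index breaks ties so equal-priority items stay FIFO."""
    heap = [(PRIORITY[kind], i, page) for i, (kind, page) in enumerate(updates)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, page = heapq.heappop(heap)
        order.append(page)
    return order

stream = [("unmodified-page", "Oslo"), ("article-edit", "Berlin"),
          ("mapping-change", "infobox-city"), ("article-edit", "Paris")]
print(process_in_priority_order(stream))  # edits first, Oslo last
```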

Practical implications

DBpedia had and has a great effect on the Web of Data and became a crystallization point for it. Many companies and researchers use DBpedia and its public services to improve their applications and research approaches. The DBpedia‐Live framework improves DBpedia further by timely synchronizing it with Wikipedia, which is relevant for many use cases requiring up‐to‐date information.

Originality/value

The new DBpedia‐Live framework adds new features to the old DBpedia‐Live framework, e.g. abstract extraction, ontology changes, and changesets publication.

Details

Program, vol. 46 no. 2
Type: Research Article
ISSN: 0033-0337

Article
Publication date: 12 April 2022

Jun Deng, Chuyi Zhong, Shaodan Sun and Ruan Wang

Abstract

Purpose

This paper aims to construct a spatio-temporal emotional framework (STEF) for digital humanities from a quantitative perspective, applying knowledge extraction and mining technology to promote innovation in humanities research paradigms and methods.

Design/methodology/approach

The proposed STEF uses methods of information extraction, sentiment analysis and geographic information systems (GIS) to achieve knowledge extraction and mining. STEF integrates time, space and emotional elements to visualize the spatial and temporal evolution of emotions, which thus enriches the analytical paradigm in digital humanities.

Findings

The case study shows that STEF can effectively extract knowledge from unstructured texts in the field of Chinese Qing Dynasty novels. First, STEF introduces the knowledge extraction tools – MARKUS and DocuSky – to profile character entities and perform plot extraction. Second, STEF extracts the characters' emotional evolutionary trajectories from temporal and spatial perspectives. Finally, the study draws a spatio-temporal emotional path figure of the leading characters and integrates the corresponding plots to analyze the causes of emotion fluctuations.

Originality/value

The STEF is constructed based on the “spatio-temporal narrative theory” and “emotional narrative theory”. It is the first framework to integrate elements of time, space and emotion to analyze the emotional evolution trajectories of characters in novels. The executability and operability of the framework are also verified with a case novel to suggest a new path for quantitative analysis of other novels.

Details

Aslib Journal of Information Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 14 May 2019

Ahsan Mahmood, Hikmat Ullah Khan, Zahoor Ur Rehman, Khalid Iqbal and Ch. Muhmmad Shahzad Faisal

Abstract

Purpose

The purpose of this research study is to extract and identify named entities from Hadith literature. Named entity recognition (NER) refers to the identification of named entities in a computer-readable text, annotated with categorization tags for information extraction. NER is an active research area in information management and information retrieval systems. NER serves as a baseline for machines to understand the context of given content and helps in knowledge extraction. Although NER is considered a solved task in major languages such as English, in languages such as Urdu it is still challenging. Moreover, NER depends on the language and domain of study; thus, it is gaining the attention of researchers in different domains.

Design/methodology/approach

This paper proposes a knowledge extraction framework using finite-state transducers (FSTs) – KEFST – to extract the named entities. KEFST consists of five steps: content extraction, tokenization, part of speech tagging, multi-word detection and NER. An extensive empirical analysis using the data corpus of Urdu translation of Sahih Al-Bukhari, a widely known hadith book, reveals that the proposed method effectively recognizes the entities to obtain better results.
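
The FST mechanism that KEFST builds on can be illustrated with a minimal transducer that recognizes one multi-word entity. The states, transitions and sample entity below are invented for illustration; the real system operates over Urdu hadith text with a full five-step pipeline.

```python
def make_fst(entity_tokens, tag):
    """Build transitions for one entity: (state, token) -> next state."""
    transitions = {}
    for i, token in enumerate(entity_tokens):
        transitions[(i, token)] = i + 1
    return transitions, len(entity_tokens), tag

def tag_entities(tokens, fst):
    """Greedily run the FST over the token stream, emitting (span, tag)."""
    transitions, accept, tag = fst
    tagged, i = [], 0
    while i < len(tokens):
        state, j = 0, i
        while j < len(tokens) and (state, tokens[j]) in transitions:
            state = transitions[(state, tokens[j])]
            j += 1
        if state == accept:                 # reached the accepting state
            tagged.append((" ".join(tokens[i:j]), tag))
            i = j
        else:                               # no match: emit outside tag
            tagged.append((tokens[i], "O"))
            i += 1
    return tagged

fst = make_fst(["Imam", "Bukhari"], "PERSON")
print(tag_entities(["narrated", "by", "Imam", "Bukhari", "."], fst))
```

The appeal of FSTs for this task is that many such entity automata can be unioned into one deterministic machine, so tagging stays linear in the text length regardless of lexicon size.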

Findings

The significant performance in terms of f-measure, precision and recall validates that the proposed model outperforms the existing methods for NER in the relevant literature.

Originality/value

This research is novel in that no previous work has been proposed to extract named entities in the Urdu language using FSTs, and no previous work has addressed NER for Urdu hadith data.

Details

The Electronic Library, vol. 37 no. 2
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 5 February 2020

Mona Mohamed, Sharma Pillutla and Stella Tomasi

Abstract

Purpose

The purpose of this paper is to establish a new conceptual iterative framework for extracting knowledge from open government data (OGD). OGD is becoming a major source of knowledge and innovation that can generate economic value, if properly used. However, there are currently no standards or frameworks for applying knowledge continuum tactics, techniques and procedures (TTPs) to improve knowledge extraction from OGD in a consistent manner.

Design/methodology/approach

This paper is based on a comprehensive review of the literature on both OGD and knowledge management (KM) frameworks. It provides insights into the extraction of knowledge from OGD by incorporating a vast array of phased KM TTPs into the OGD lifecycle phases.

Findings

The paper proposes a knowledge iterative value network (KIVN) as a new conceptual model that applies the principles of KM to OGD. KIVN operates by applying KM TTPs to transfer and transform discrete data into valuable knowledge.

Research limitations/implications

This model covers the most important knowledge elicitation steps; however, users interested in applying the KIVN phases may need to customize it slightly based on their environment and OGD policies and procedures.

Practical implications

After its validation, the model facilitates systematic manipulation of OGD, allowing both data-consuming industries and data-producing governments to establish new business models and governance schemes that make better use of OGD.

Originality/value

This paper offers new perspectives on eliciting knowledge from OGD and discusses a crucial but overlooked area of the OGD arena, namely, knowledge extraction through KM principles.

Details

VINE Journal of Information and Knowledge Management Systems, vol. 50 no. 3
Type: Research Article
ISSN: 2059-5891

Article
Publication date: 24 September 2021

Nina Rizun, Aleksandra Revina and Vera G. Meister

Abstract

Purpose

This study aims to draw the attention of business process management (BPM) research and practice to the textual data generated in processes and the potential of extracting meaningful insights from it. The authors apply standard natural language processing (NLP) approaches to gain valuable knowledge in the form of the business process (BP) complexity concept suggested in the study. It is built on the objective, subjective and meta-knowledge extracted from BP textual data, encompassing semantics, syntax and stylistics. As a result, the authors aim to create awareness about the cognitive, attention and reading efforts forming the textual data-based BP complexity. The concept serves as a basis for the development of various decision-support solutions for BP workers.

Design/methodology/approach

The starting point is an investigation of the complexity concept in the BPM literature to develop an understanding of the related complexity research and to put the textual data-based BP complexity in its context. Afterward, utilizing the linguistic foundations and the theory of situation awareness (SA), the concept is empirically developed and evaluated in a real-world application case using qualitative interview-based and quantitative data-based methods.

Findings

In the practical, real-world application, the authors confirmed that BP textual data could be used to predict BP complexity from the semantic, syntactic and stylistic viewpoints. The authors were able to prove the value of this knowledge about the BP complexity formed based on the (1) professional contextual experience of the BP worker enriched by the awareness of cognitive efforts required for BP execution (objective knowledge), (2) business emotions enriched by attention efforts (subjective knowledge) and (3) quality of the text, i.e. professionalism, expertise and stress level of the text author, enriched by reading efforts (meta-knowledge). In particular, the BP complexity concept has been applied to an industrial example of Information Technology Infrastructure Library (ITIL) change management (CHM) Information Technology (IT) ticket processing. The authors used IT ticket texts from two samples of 28,157 and 4,625 tickets as the basis for the analysis. The authors evaluated the concept with the help of manually labeled tickets and a rule-based approach using historical ticket execution data. The results, which have a recommendation character, proved useful in creating awareness regarding cognitive, attention and reading efforts for ITIL CHM BP workers coordinating IT ticket processing.
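
Two of the cheap textual signals that could feed a complexity score of this kind are easy to sketch: average sentence length as a syntactic proxy for reading effort, and type-token ratio as a stylistic proxy for vocabulary variety. The function, the sample ticket and the choice of signals are illustrative assumptions; the authors' full concept combines many more features across the semantic, syntactic and stylistic dimensions.

```python
import re

def text_signals(text):
    """Return simple syntactic and stylistic signals for a ticket text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    avg_sentence_len = len(tokens) / len(sentences)   # syntactic proxy
    type_token_ratio = len(set(tokens)) / len(tokens)  # stylistic proxy
    return {"avg_sentence_len": avg_sentence_len,
            "type_token_ratio": round(type_token_ratio, 2)}

ticket = "Server down. Restart the server. Escalate if the restart fails."
print(text_signals(ticket))
```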

Originality/value

While aiming to draw attention to those valuable insights inherent in BP textual data, the authors propose an unconventional approach to BP complexity definition through the lens of textual data. Hereby, the authors address the challenges specified by BPM researchers, i.e. focus on semantics in the development of vocabularies and organization- and sector-specific adaptation of standard NLP techniques.

Details

Business Process Management Journal, vol. 27 no. 7
Type: Research Article
ISSN: 1463-7154

Article
Publication date: 1 December 1998

Ramana Rao and Ralph H. Sprague

Abstract

This paper “looks” into one of the most novel knowledge management technology products brought to the market in recent years. The authors describe two technologies, information visualization and knowledge extraction, for leveraging our natural abilities of vision, language and memory. They discuss a way of exploiting structure that is available in the information system in one case (traditionally called structured) and easily perceived by humans in the other (traditionally called unstructured); the two technologies focus on these two sides of the goal, respectively. They demonstrate the value of these technologies in supporting interaction with much larger amounts of information than was possible with previous graphical interfaces, in guiding access and use of the information, and often in automating portions of the work.

Details

Journal of Knowledge Management, vol. 2 no. 2
Type: Research Article
ISSN: 1367-3270

Article
Publication date: 1 November 2006

Chyan Yang, Liang‐Chu Chen and Chun‐Yen Peng

Abstract

Purpose

This paper seeks to establish an extraction system for information technology (IT) product specifications, named ITSIES, which combines natural language processing (NLP) with the ontology concept, and to evaluate the system's effectiveness.

Design/methodology/approach

The development of the system is based on a prototype design and performance validation. This study adopts four classes of IT specification (PC, Unix server, monitor and printer) that follow IBM's and HP's product lines as baseline information in order to construct the extraction system with GATE (General Architecture for Text Engineering) tools and to examine IT product specifications from other brands and patterns. Additionally, indices such as precision, recall and F-measure are adopted as the metrics for evaluating system performance.

Findings

The performance shows that the average recall, precision, and F‐measure are all over 90 per cent, revealing that the JAPE (Java Annotation Patterns Engine) grammar rules in the IT domain are reasonably good and generally in line with expectations.
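
The three metrics named in these findings can be computed in a few lines. The extracted and gold specification fields below are invented for illustration; they are not from the ITSIES evaluation.

```python
def prf(extracted, gold):
    """Precision, recall and F-measure of an extracted set against a gold set."""
    tp = len(extracted & gold)            # true positives
    precision = tp / len(extracted)       # correct share of what we extracted
    recall = tp / len(gold)               # share of the gold set we found
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

extracted = {"cpu", "ram", "screen", "weight"}   # hypothetical system output
gold = {"cpu", "ram", "screen", "price"}         # hypothetical gold fields
p, r, f = prf(extracted, gold)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.75 0.75 0.75
```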

Originality/value

The paper proposes an integrative framework to examine IT product specification information and demonstrates that the system is effective for IT application.

Details

The Electronic Library, vol. 24 no. 6
Type: Research Article
ISSN: 0264-0473
