Search results

1 – 10 of over 3000
Article
Publication date: 1 May 2020

Qihang Wu, Daifeng Li, Lu Huang and Biyun Ye

Abstract

Purpose

Entity relation extraction is an important research direction for obtaining structured information. However, most current methods determine the relations between entities in a given sentence in a stepwise manner and seldom consider entities and relations within a unified framework. The joint learning method, which combines relations and entities, is an effective solution. This paper aims to optimize a hierarchical reinforcement learning framework and provide an efficient model for entity relation extraction.

Design/methodology/approach

This paper builds on a hierarchical reinforcement learning framework for joint learning and combines the model with BERT, a state-of-the-art language representation model, to improve the word embedding and encoding process. In addition, this paper adjusts some punctuation marks to make the data set more standardized and introduces positional information to improve the performance of the model.
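As a rough illustration of the encoding step described above, the following sketch obtains contextual token embeddings from a pretrained BERT model and appends a simple relative-position feature to each token; the model name, feature design and example sentence are assumptions for illustration, not the authors' exact implementation.

```python
# Minimal sketch: BERT token embeddings plus a relative-position feature.
# Assumes the Hugging Face transformers library; the position feature is a
# hypothetical stand-in for the positional information described above.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertModel.from_pretrained("bert-base-cased")

sentence = "Steve Jobs co-founded Apple in Cupertino."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # shape: (1, seq_len, 768)

seq_len = hidden.size(1)
# Relative position of each token, scaled to [0, 1].
positions = torch.arange(seq_len, dtype=torch.float32) / max(seq_len - 1, 1)
position_feature = positions.view(1, seq_len, 1)

# Token representation = contextual embedding plus positional feature.
token_repr = torch.cat([hidden, position_feature], dim=-1)  # (1, seq_len, 769)
print(token_repr.shape)
```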

Findings

Experiments show that the model proposed in this paper outperforms the baseline model by 13% and achieves an F1 score of 0.742 on the NYT10 data set. The model can effectively extract entities and relations from large-scale unstructured text and can be applied to multi-domain information retrieval, intelligent understanding and intelligent interaction.

Originality/value

The research provides an efficient solution for researchers in different domains to use artificial intelligence (AI) technologies to process their unstructured text more accurately.

Details

Information Discovery and Delivery, vol. 48 no. 3
Type: Research Article
ISSN: 2398-6247

Article
Publication date: 14 May 2019

Ahsan Mahmood, Hikmat Ullah Khan, Zahoor Ur Rehman, Khalid Iqbal and Ch. Muhmmad Shahzad Faisal

Abstract

Purpose

The purpose of this research study is to extract and identify named entities from Hadith literature. Named entity recognition (NER) refers to the identification of named entities in computer-readable text and their annotation with categorization tags for information extraction. NER is an active research area in information management and information retrieval systems. NER serves as a baseline for machines to understand the context of given content and helps in knowledge extraction. Although NER is considered a solved task in major languages such as English, in languages such as Urdu it is still a challenging task. Moreover, NER depends on the language and domain of study; thus, it is gaining the attention of researchers in different domains.

Design/methodology/approach

This paper proposes KEFST, a knowledge extraction framework using finite-state transducers (FSTs), to extract named entities. KEFST consists of five steps: content extraction, tokenization, part-of-speech tagging, multi-word detection and NER. An extensive empirical analysis on a corpus of the Urdu translation of Sahih Al-Bukhari, a widely known hadith book, reveals that the proposed method recognizes entities effectively and obtains better results.
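As a rough sketch of the FST-based NER step, the following toy transducer scans a token stream and emits PERSON spans for a pattern of an optional honorific followed by capitalized tokens; the states, honorific list and example are illustrative assumptions, not KEFST's actual rules or Urdu patterns.

```python
# Minimal sketch of a finite-state matcher for NER over a token stream.
# States and the honorific list are hypothetical stand-ins, not KEFST's rules.
from typing import List, Tuple

HONORIFICS = {"Hazrat", "Imam", "Abu"}

def is_capitalized(tok: str) -> bool:
    return tok[:1].isupper()

def extract_person_names(tokens: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) spans for a toy PERSON pattern:
    an optional honorific followed by one or more capitalized tokens."""
    spans = []
    i = 0
    while i < len(tokens):
        start = i
        state = "START"
        if tokens[i] in HONORIFICS:
            state, i = "AFTER_HONORIFIC", i + 1
        # Consume capitalized tokens (the "name" part of the pattern).
        while i < len(tokens) and is_capitalized(tokens[i]) and tokens[i] not in HONORIFICS:
            state, i = "IN_NAME", i + 1
        if state == "IN_NAME":
            spans.append((start, i, "PERSON"))
        else:
            i = start + 1
    return spans

print(extract_person_names("Imam Bukhari narrated the hadith".split()))  # [(0, 2, 'PERSON')]
```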

Findings

Strong performance in terms of F-measure, precision and recall validates that the proposed model outperforms existing methods for NER in the relevant literature.

Originality/value

This research is novel in that no previous work has used FSTs to extract named entities in the Urdu language, and no previous work has addressed NER for Urdu hadith data.

Details

The Electronic Library, vol. 37 no. 2
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 24 June 2020

Yilu Zhou and Yuan Xue

Abstract

Purpose

Strategic alliances among organizations are among the central drivers of innovation and economic growth. However, the discovery of alliances has relied on purely manual search and has limited scope. This paper proposes a text-mining framework, ACRank, that automatically extracts alliances from news articles. ACRank aims to provide human analysts with higher coverage of strategic alliances than existing databases while maintaining reasonable extraction precision. It has the potential to discover alliances involving less well-known companies, which are often neglected by commercial databases.

Design/methodology/approach

The proposed framework is a systematic process of alliance extraction and validation using natural language processing techniques and alliance domain knowledge. The process integrates news article search, entity extraction, and syntactic and semantic linguistic parsing techniques. In particular, the Alliance Discovery Template (ADT) component defines a number of linguistic templates expanded from expert domain knowledge and extracts potential alliances at the sentence level. Alliance Confidence Ranking (ACRank) further validates each unique alliance based on multiple features at the document level. The framework is designed to deal with extremely skewed, noisy data from news articles.
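To make the template idea concrete, here is a rough sentence-level sketch that matches a single hypothetical linguistic template with regular expressions; the pattern and the example sentence are illustrative assumptions, not the paper's actual ADT templates or ranking features.

```python
# Minimal sketch of sentence-level alliance extraction with a linguistic template.
# The template below is a hypothetical illustration, not an actual ADT template.
import re

ALLIANCE_TEMPLATES = [
    r"(?P<a>[A-Z][\w.&-]*(?:\s[A-Z][\w.&-]*)*)\s+(?:and|with)\s+"
    r"(?P<b>[A-Z][\w.&-]*(?:\s[A-Z][\w.&-]*)*)\s+"
    r"(?:announced|formed|signed)\s+(?:a|an)\s+(?:partnership|alliance|joint venture)",
]

def extract_candidate_alliances(sentence):
    """Return (company_a, company_b) pairs matched by any template."""
    pairs = []
    for pattern in ALLIANCE_TEMPLATES:
        for m in re.finditer(pattern, sentence):
            pairs.append((m.group("a"), m.group("b")))
    return pairs

print(extract_candidate_alliances(
    "IBM and Lenovo announced a partnership to develop new servers."))
# [('IBM', 'Lenovo')]
```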

Findings

Evaluation of ACRank on a gold-standard data set of IBM alliances (2006–2008) showed that sentence-level ADT-based extraction achieved 78.1% recall and 44.7% precision and eliminated over 99% of the noise in news articles. ACRank further improved precision to 97% for the top 20% of extracted alliance instances. Comparison with the Thomson Reuters SDC database showed that SDC covered less than 20% of total alliances, while ACRank covered 67%. When applied to Dow 30 company news articles, ACRank is estimated to achieve a recall between 0.48 and 0.95, and only 15% of the alliances appeared in SDC.

Originality/value

The research framework proposed in this paper indicates a promising direction for building a comprehensive alliance database using automatic approaches. It adds value to academic studies and business analyses that require in-depth knowledge of strategic alliances. It also encourages other innovative studies that use text mining and data analytics to study business relations.

Details

Information Technology & People, vol. 33 no. 5
Type: Research Article
ISSN: 0959-3845

Article
Publication date: 18 June 2018

Chao Dong and Chongchong Zhao

Abstract

Purpose

Online encyclopedias have made it easy for users to access interesting knowledge and find solutions to daily problems. However, for staff in specific domains, especially secrecy-related domains, a domain micropedia is still necessary for their work.

Design/methodology/approach

In this paper, the authors propose an approach to extract entities from DBpedia and construct SDPedia for the space debris mitigation domain. First, the authors manually select the root categories for the space debris mitigation domain. Subsequently, they propose the Distance of Electrical Resistance, Pages Common Words and AVDP algorithms to implement the extraction. The authors also visualize the data by generating SWF files and embedding them in web pages.
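As a rough illustration of pulling candidate entities under a root category from DBpedia, the following sketch sends a SPARQL query to the public endpoint; the category and query shape are assumptions for illustration and are not the authors' selection procedure or similarity algorithms.

```python
# Minimal sketch: list DBpedia pages under a root category via SPARQL.
# The category below is a hypothetical example, not the authors' root category.
import requests

SPARQL_ENDPOINT = "https://dbpedia.org/sparql"
QUERY = """
SELECT ?page WHERE {
  ?page <http://purl.org/dc/terms/subject>
        <http://dbpedia.org/resource/Category:Space_debris> .
} LIMIT 20
"""

resp = requests.get(
    SPARQL_ENDPOINT,
    params={"query": QUERY, "format": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()
for binding in resp.json()["results"]["bindings"]:
    print(binding["page"]["value"])
```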

Findings

In the experiments, precision, recall and F1-measure are used to evaluate the proposed algorithms. The authors set a series of thresholds to pursue the highest F1-measure. The experimental data indicate that the AVDP algorithm achieves the highest F1-measure and is effective for entity extraction from DBpedia.

Originality/value

The authors propose an approach for deriving linked data from DBpedia and construct their own SDPedia, which is currently applied in the space debris mitigation domain. Compared with DBpedia, they also add linked data visualization. Moreover, the methodology can be applied to many other domains in the future.

Details

International Journal of Web Information Systems, vol. 14 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 11 July 2019

Yazhong Deng

Abstract

Purpose

The purpose of this study was to establish a massive open online course (MOOC)-based map of higher education knowledge and apply it to university libraries, with the aim of providing more targeted and personalized learning services for every learner.

Design/methodology/approach

In this study, MOOCs and university library information services were outlined, the development status of MOOCs at home and abroad and the development of university library information services were introduced, and the necessity and significance of MOOCs for developing information services in university libraries were analyzed. The knowledge map of university libraries was then explored through four modules: the construction of data sets, the identification of related entities from plain text, the extraction of entity relationships and the practical application of knowledge maps. For the logical relationships between courses, a combination of a knowledge base and machine learning was adopted. In the knowledge map application module, the knowledge map was visualized. To generate personalized learning schemes, a prior data set was constructed from the knowledge base and the task was treated as a multi-classification problem: a k-nearest neighbor classifier divided all courses into four academic years. Based on the course stage, personalized learning schemes for some majors in higher education were obtained.
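A rough sketch of the classification step described above, using scikit-learn's k-nearest neighbor classifier to assign courses to one of four academic years; the feature vectors and labels are toy assumptions, not the study's actual prior data set.

```python
# Minimal sketch: classify courses into four academic years with k-NN.
# Features and labels are toy stand-ins for the study's prior data set.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical course features, e.g. [difficulty, prerequisite_depth].
X_train = np.array([
    [1.0, 0.0], [1.2, 0.0],   # year 1
    [2.0, 1.0], [2.3, 1.0],   # year 2
    [3.1, 2.0], [3.0, 2.0],   # year 3
    [4.0, 3.0], [3.9, 3.0],   # year 4
])
y_train = np.array([1, 1, 2, 2, 3, 3, 4, 4])

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)

new_courses = np.array([[1.1, 0.0], [3.8, 2.9]])
print(clf.predict(new_courses))  # [1 4]
```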

Findings

The experiment showed that it was feasible to apply the MOOC-based higher education knowledge map to university libraries. In addition, dividing the courses into four stages with the classifier was effective. In this way, a specific professional training program can be obtained, the information service of the university library can be improved, and the accuracy and richness of the entire learning program can be increased.

Research limitations/implications

Due to limitations of conditions, time and other factors, there were few opportunities to visit libraries in the field, which limited the depth of the research. Foreign references contained many proper nouns and professional terms, and the author's English translation ability was limited. The investigation of foreign studies may therefore not be sufficiently detailed and comprehensive, and the analysis and synthesis of the factors influencing university library information services may not be sufficiently rigorous and concise.

Practical implications

As the base of information dissemination in a university, the university library is a source of knowledge. It is also a venue for students' independent learning and a medium of mainstream culture, and improving its information services is in line with the trend of the times. Against this background, this research studied the influence of MOOCs on university library information services and focused on the challenges and opportunities that these services face in the MOOC environment, so as to continuously improve their cultural serviceability and better serve teachers and students.

Originality/value

Since the birth of MOOCs, they have, within a few years, exerted great influence on universities and related educational institutions. European and American universities take an active part in building MOOC platforms and explore in practice how to make better use of the library to build MOOC resources. How university libraries can participate in the construction of MOOC information resources is also a hot topic. Therefore, the study of this topic has both theoretical and practical significance.

Details

The Electronic Library, vol. 37 no. 5
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 21 December 2020

Sudha Cheerkoot-Jalim and Kavi Kumar Khedo

Abstract

Purpose

This work shows the results of a systematic literature review on biomedical text mining. The purpose of this study is to identify the different text mining approaches used in different application areas of the biomedical domain, the common tools used and the challenges of biomedical text mining as compared to generic text mining algorithms. This study will be of value to biomedical researchers by allowing them to correlate text mining approaches to specific biomedical application areas. Implications for future research are also discussed.

Design/methodology/approach

The review was conducted following the principles of the Kitchenham method. A number of research questions were first formulated, followed by the definition of the search strategy. The papers were then selected based on a list of assessment criteria. Each paper was analyzed, and information relevant to the research questions was extracted.

Findings

It was found that researchers have mostly harnessed data sources such as electronic health records, biomedical literature, social media and health-related forums. The most common text mining technique was natural language processing using tools such as MetaMap and the Unstructured Information Management Architecture, alongside medical terminologies such as the Unified Medical Language System (UMLS). The main application area was the detection of adverse drug events. Challenges identified included the need to deal with huge amounts of text, the heterogeneity of the different data sources, the ambiguity of word meanings in biomedical text and the amount of noise introduced mainly by social media and health-related forums.
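To give a flavor of the terminology-mapping step, here is a toy dictionary-based concept matcher; it is a simplified stand-in for tools such as MetaMap, and the lexicon entries and concept identifiers are illustrative placeholders rather than a real UMLS subset.

```python
# Toy sketch of dictionary-based concept matching, a simplified stand-in for
# terminology-mapping tools such as MetaMap. Lexicon entries and identifiers
# are illustrative placeholders, not a real UMLS subset.
TOY_LEXICON = {
    "myocardial infarction": "C-0001",
    "heart attack": "C-0001",
    "nausea": "C-0002",
}

def map_concepts(text: str):
    """Return (phrase, concept_id) pairs for lexicon phrases found in the text."""
    lowered = text.lower()
    return [(phrase, cid) for phrase, cid in TOY_LEXICON.items() if phrase in lowered]

print(map_concepts("Patient reported nausea after a suspected heart attack."))
# [('heart attack', 'C-0001'), ('nausea', 'C-0002')]
```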

Originality/value

To the best of the authors’ knowledge, other reviews in this area have focused on either specific techniques, specific application areas or specific data sources. The results of this review will help researchers to correlate most relevant and recent advances in text mining approaches to specific biomedical application areas by providing an up-to-date and holistic view of work done in this research area. The use of emerging text mining techniques has great potential to spur the development of innovative applications, thus considerably impacting on the advancement of biomedical research.

Details

Journal of Knowledge Management, vol. 25 no. 3
Type: Research Article
ISSN: 1367-3270

Article
Publication date: 14 June 2019

Nora Madi, Rawan Al-Matham and Hend Al-Khalifa

Abstract

Purpose

The purpose of this paper is to provide an overall review of the grammar checking and relation extraction (RE) literature, their techniques and the open challenges associated with them and, finally, to suggest future directions.

Design/methodology/approach

The review of grammar checking and RE was carried out using the following protocol: we prepared research questions, planned the search strategy, defined paper selection criteria to identify relevant works, extracted data from these works and, finally, analyzed and synthesized the data.

Findings

The output of error detection models could be used to create a profile of a certain writer. Such profiles can be used for author identification, native language identification or even estimating the writer's level of education, to name a few applications. The automatic extraction of relations could be used to build or complete electronic lexical thesauri and knowledge bases.

Originality/value

Grammar checking is the process of detecting and sometimes correcting erroneous words in the text, while RE is the process of detecting and categorizing predefined relationships between entities or words that were identified in the text. The authors found that the most obvious challenge is the lack of data sets, especially for low-resource languages. Also, the lack of unified evaluation methods hinders the ability to compare results.
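As a toy illustration of RE in the sense defined above, the following sketch categorizes one predefined relationship between entities already identified in the text; the single pattern and relation label are illustrative assumptions, not drawn from the reviewed systems.

```python
# Toy sketch of pattern-based relation extraction between identified entities.
# The single pattern and relation label are illustrative assumptions.
import re

PATTERN = re.compile(
    r"(?P<head>[A-Z]\w+),?\s+(?:the\s+)?capital of\s+(?P<tail>[A-Z]\w+(?:\s[A-Z]\w+)*)"
)

def extract_capital_relations(sentence: str):
    """Return (head, 'capital_of', tail) triples found in the sentence."""
    return [(m.group("head"), "capital_of", m.group("tail"))
            for m in PATTERN.finditer(sentence)]

print(extract_capital_relations("Riyadh, the capital of Saudi Arabia, hosted the summit."))
# [('Riyadh', 'capital_of', 'Saudi Arabia')]
```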

Details

Data Technologies and Applications, vol. 53 no. 3
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 14 October 2013

Trond Aalberg and Maja Žumer

Abstract

Purpose

Bibliographic records should now be used in innovative end-user applications that enable users to learn about, discover and exploit available content, and this information should also be interpreted and reused beyond the library domain. New conceptual models such as FRBR offer the foundation for such developments. The main motivation for this research is to contribute to the adoption of the FRBR model in future bibliographic standards and systems by analysing limitations in existing bibliographic information and looking for short- and long-term solutions that can improve data quality in terms of expressing the FRBR model.

Design/methodology/approach

MARC records in three collections (the BIBSYS catalogue, the Slovenian National Bibliography and the BTJ catalogue) were first analysed by looking at statistics of field and subfield usage to determine common patterns that express FRBR. Based on this, different rules for interpreting the information were developed. Finally, typical problems and errors found in MARC records were analysed.
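A rough sketch of the field and subfield usage statistics described above, counting occurrences across a file of MARC records with the pymarc library (version 5 or later assumed for the subfield objects); the file name and counting logic are placeholders for one way such statistics could be gathered, not the authors' tooling.

```python
# Minimal sketch: count MARC field and subfield usage with pymarc (>= 5 assumed).
# "records.mrc" is a placeholder path; the counting is illustrative only.
from collections import Counter
from pymarc import MARCReader

field_counts = Counter()
subfield_counts = Counter()

with open("records.mrc", "rb") as fh:
    for record in MARCReader(fh):
        for field in record.get_fields():
            field_counts[field.tag] += 1
            if not field.is_control_field():
                for sf in field.subfields:  # Subfield(code=..., value=...)
                    subfield_counts[f"{field.tag}${sf.code}"] += 1

for tag, count in field_counts.most_common(10):
    print(tag, count)
```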

Findings

Different types of FRBR entity-relationship structures that can typically be found in bibliographic records are identified, and problems related to interpreting these from bibliographic records are analysed. FRBRisation of consistent and complete MARC records is relatively successful, particularly if all entities are systematically described and the relationships among them are clearly indicated.

Research limitations/implications

Advanced matching was not used for clustering of identical entities.

Practical implications

Cataloguing guidelines are proposed to enable better FRBRisation of MARC records in the interim period, before new formats are developed and implemented.

Originality/value

This is the first in-depth analysis of manifestations embodying several expressions and of works and agents as subjects.

Details

Journal of Documentation, vol. 69 no. 6
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 20 April 2015

Mahmoud Rammal, Zeinab Bahsoun and Mona Al Achkar Jabbour

Abstract

Purpose

The purpose of this paper is to apply local grammar (LG) to develop an indexing system which automatically extracts keywords from titles of Lebanese official journals.

Design/methodology/approach

To build the LGs for our system, the first word, which plays the determining role in understanding the meaning of a title, is analyzed and grouped as the initial state. These steps are repeated recursively for the remaining words. When a new title is introduced, its first word determines which LG should be applied to suggest or generate further potential keywords, based on a set of features calculated for each node of the title.
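A rough sketch of the idea, representing each local grammar as a prefix tree selected by a title's first word, with subsequent words extending or traversing its states; the titles and the keyword rule are illustrative assumptions, not the system's actual grammars or node features.

```python
# Minimal sketch: local grammars as prefix trees indexed by a title's first word.
# Titles and the keyword-suggestion rule are illustrative stand-ins.
from collections import defaultdict

def build_local_grammars(titles):
    """Map each first word to a prefix tree (nested dicts) over the remaining words."""
    grammars = defaultdict(dict)
    for title in titles:
        words = title.lower().split()
        node = grammars[words[0]]
        for word in words[1:]:
            node = node.setdefault(word, {})
    return grammars

def suggest_keywords(title, grammars):
    """Follow the grammar selected by the first word and return the words it accepts."""
    words = title.lower().split()
    node = grammars.get(words[0], {})
    keywords = []
    for word in words[1:]:
        if word not in node:
            break
        keywords.append(word)
        node = node[word]
    return keywords

titles = ["Decree amending the customs law", "Decree amending the labour law"]
grammars = build_local_grammars(titles)
print(suggest_keywords("Decree amending the customs code", grammars))
# ['amending', 'the', 'customs']
```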

Findings

The overall performance of our system is 67 per cent, which means that 67 per cent of the keywords extracted manually have been extracted by our system. This empirical result shows the validity of this study’s approach after taking into consideration the below-mentioned limitations.

Research limitations/implications

The system has two limitations. First, it is applied to a sample of 5,747 titles; it could be extended to generate finite-state automata for all titles. Second, named entities are not processed, because their variety requires a specific ontology.

Originality/value

Almost all keyword extraction systems apply statistical, linguistic or hybrid approaches to extract keywords from texts. This paper contributes to the development of an automatic indexing system to replace expensive human indexing by taking advantage of LG, which is mainly applied to extract times, dates and proper names from texts.

Details

Interactive Technology and Smart Education, vol. 12 no. 1
Type: Research Article
ISSN: 1741-5659

Article
Publication date: 1 March 2013

M. Rivette, P. Mognol and J.Y. Hascoet

Abstract

Purpose

The purpose of this paper is to propose a method to obtain hybrid rapid tools with elementary component assembly.

Design/methodology/approach

The authors' method proposes a functional representational model, starting with the product features, analyzed from three points of view: a feasibility analysis; a manufacturing analysis; and an assembly and synthesis analysis. This method, based on CAD STEP AP‐224 data, makes it possible to obtain an exhaustive list of solutions for the module. The work is illustrated with an industrial example. To construct the Assembly Identity Card (AIC) and test the various parameters that influence the quality of the injected parts, a hybrid injection mold has been produced. The methodology associated with the use of this AIC uses a “representation graph”, which makes it possible to propose a set of valid solutions for assembling the various tooling modules. This method is validated by industrial example.

Findings

The product part is decomposed into a multi‐component prototype (MCP), instead of being made as a single part, which optimizes the manufacturing process and enables greater reactivity during the development of the product.

Research limitations/implications

The final goal is to propose a software assistant used in association with a CAD system during the design of hybrid rapid tooling. Substantial work on feature recognition must still be carried out. The assembly of the different parts of the hybrid rapid tooling must also be considered and optimized.

Practical implications

This method allows the selection of the best process technologies for manufacturing tools.

Originality/value

The manufacture of hybrid rapid tooling has not been analyzed previously.

Details

Rapid Prototyping Journal, vol. 19 no. 2
Type: Research Article
ISSN: 1355-2546
