
Search results

1 – 10 of 29
Article
Publication date: 5 September 2017

Subject-based retrieval of scientific documents, case study: Retrieval of Information Technology scientific articles

Azadeh Mohebi, Mehri Sedighi and Zahra Zargaran



Abstract

Purpose

The purpose of this paper is to introduce an approach for retrieving a set of scientific articles in the field of Information Technology (IT) from a scientific database such as Web of Science (WoS), to apply scientometrics indices and compare them with other fields.

Design/methodology/approach

The authors propose to apply a statistical classification-based approach for extracting IT-related articles. In this approach, first, a probabilistic model of the subject IT is built using keyphrase extraction techniques. Then, IT-related articles are retrieved from all Iranian papers in WoS based on a Bayesian classification scheme: using the probabilistic IT model, an IT membership probability is assigned to each article in the database, and the articles with the highest probabilities are retrieved.
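The membership-probability ranking described above can be sketched as follows; the phrase probabilities, function names and the naive independence assumption over keyphrases are illustrative, not the authors' actual model:

```python
import math

def it_membership_score(article_phrases, it_model, default_prob=1e-6):
    """Log-probability score that an article belongs to the IT topic,
    treating its keyphrases as conditionally independent (naive Bayes).
    `it_model` maps keyphrase -> P(phrase | IT)."""
    score = 0.0
    for phrase in article_phrases:
        # Unseen phrases get a small smoothing probability.
        score += math.log(it_model.get(phrase, default_prob))
    return score

# Toy phrase probabilities, invented for illustration.
it_model = {"cloud computing": 0.05, "data mining": 0.04, "soil erosion": 1e-7}

it_paper = it_membership_score(["cloud computing", "data mining"], it_model)
agri_paper = it_membership_score(["soil erosion"], it_model)
# Articles are ranked by score; the IT paper outranks the agronomy one.
```

Ranking all articles by this score and keeping the top of the list corresponds to "retrieving the articles with the highest probabilities".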

Findings

The authors extracted a set of 1,497 IT keyphrases through the keyphrase extraction process for the probabilistic model. They evaluated the proposed retrieval approach against two alternatives: the query-based approach, in which articles are retrieved from WoS using a set of queries composed of a limited number of IT keywords, and the research area-based approach, which retrieves articles using WoS categorizations and research areas. The evaluation and comparison results show that the proposed approach generates more accurate results while retrieving more articles related to IT.

Research limitations/implications

Although this research is limited to the IT subject, it can be generalized to any subject. However, for multidisciplinary topics such as IT, special attention should be given to the keyphrase extraction phase. In this research, a bigram model is used; it could be extended to trigrams as well.

Originality/value

This paper introduces an integrated approach for retrieving IT-related documents from a collection of scientific documents. The approach has two main phases: building a model to represent the topic IT, and retrieving documents based on that model. The model is based on a set of keyphrases extracted from a collection of IT articles. However, the extraction technique does not rely on term frequency-inverse document frequency (TF-IDF), since almost all of the articles in the collection share the same set of keyphrases. In addition, a probabilistic membership score is defined to retrieve the IT articles from a collection of scientific articles.

Details

Library Review, vol. 66 no. 6/7
Type: Research Article
DOI: https://doi.org/10.1108/LR-10-2016-0090
ISSN: 0024-2535

Keywords

  • Information technology
  • Information retrieval
  • Scientometrics
  • Document retrieval
  • Keyphrase extraction
  • Probabilistic modeling

Article
Publication date: 6 May 2014

HIVEing: the effect of a semantic web technology on inter-indexer consistency

Hollie White, Craig Willis and Jane Greenberg



Abstract

Purpose

The purpose of this paper is to examine the effect of the Helping Interdisciplinary Vocabulary Engineering (HIVE) system on the inter-indexer consistency of information professionals when assigning keywords to a scientific abstract. This study examined first, the inter-indexer consistency of potential HIVE users; second, the impact HIVE had on consistency; and third, challenges associated with using HIVE.

Design/methodology/approach

A within-subjects quasi-experimental research design was used for this study. Data were collected using a task-scenario-based questionnaire. Consistency results were analysed using Hooper's and Rolling's inter-indexer consistency measures, and a series of t-tests was used to judge the significance of differences between the consistency results.
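Hooper's and Rolling's measures are simple set-overlap formulas (equivalent to the Jaccard and Dice coefficients, respectively). A minimal sketch, with made-up term lists:

```python
def hoopers(terms_a, terms_b):
    """Hooper's consistency: C / (A + B - C), where C is the number of
    terms both indexers assigned and A, B are each indexer's term counts."""
    a, b = set(terms_a), set(terms_b)
    common = len(a & b)
    return common / (len(a) + len(b) - common)

def rollings(terms_a, terms_b):
    """Rolling's consistency: 2C / (A + B)."""
    a, b = set(terms_a), set(terms_b)
    common = len(a & b)
    return 2 * common / (len(a) + len(b))

indexer1 = ["indexing", "vocabulary", "semantic web"]
indexer2 = ["indexing", "vocabulary", "thesauri", "metadata"]
# Two shared terms out of 3 and 4 assigned:
# hoopers -> 2 / (3 + 4 - 2) = 0.4, rollings -> 4 / 7 ≈ 0.571
```

Rolling's measure always scores at least as high as Hooper's on the same pair, which is worth remembering when comparing the 22 and 25 per cent gains reported below.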

Findings

Results suggest that HIVE improves inter-indexer consistency. Working with HIVE increased consistency rates by 22 percent (Rolling's) and 25 percent (Hooper's) when selecting relevant terms from all vocabularies. A statistically significant difference exists between the assignment of free-text keywords and machine-aided keywords. Issues with homographs, disambiguation, vocabulary choice, and document structure were all identified as potential challenges.

Research limitations/implications

The main limitation of this study is the small number of vocabularies used. Future research will include implementing HIVE in the Dryad Repository and studying its application in a repository system.

Originality/value

This paper showcases several features of the HIVE system. By using traditional consistency measures to evaluate a semantic web technology, this paper emphasizes the link between traditional indexing and next-generation machine-aided indexing (MAI) tools.

Details

Journal of Documentation, vol. 70 no. 3
Type: Research Article
DOI: https://doi.org/10.1108/JD-07-2012-0083
ISSN: 0022-0418

Keywords

  • Inter-indexer consistency
  • Indexing
  • Helping Interdisciplinary Vocabulary Engineering (HIVE)

Article
Publication date: 5 August 2019

Semantic key phrase-based model for document management

Prafulla Bafna, Dhanya Pramod, Shailaja Shrwaikar and Atiya Hassan



Abstract

Purpose

Document management is growing in importance in proportion to the growth of unstructured data, and its applications are expanding from process benchmarking to customer relationship management and beyond. The purpose of this paper is to improve two important components of document management, namely keyword extraction and document clustering. This is achieved through knowledge extraction by updating the phrase document matrix. The objective is to manage documents by extending the phrase document matrix and thereby obtain refined clusters. The study achieves consistent cluster quality in spite of the increasing size of the data set. The domain independence of the proposed method is tested and compared with other methods.

Design/methodology/approach

In this paper, a synset-based phrase document matrix construction method is proposed in which semantically similar phrases are grouped together to mitigate the curse of dimensionality. When a large collection of documents is processed, it includes some documents that are closely related to the topic of interest, known as model documents, as well as documents that deviate from that topic. These non-relevant documents may affect cluster quality. The first step in knowledge extraction from unstructured textual data is converting it into structured form, either as a term frequency-inverse document frequency matrix or as a phrase document matrix. Once in structured form, a range of mining algorithms, from classification to clustering, can be applied.
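The synset grouping step can be illustrated with a toy sketch; the synonym map and function name here are invented for illustration, whereas the paper derives its synset groups from a thesaurus:

```python
from collections import Counter

# Hypothetical synset map (phrase -> canonical label).
SYNSETS = {
    "car": "vehicle", "automobile": "vehicle", "vehicle": "vehicle",
    "profit": "earnings", "earnings": "earnings", "income": "earnings",
}

def synset_vector(phrases):
    """Collapse phrases onto synset labels before counting, so synonymous
    phrases share a single dimension of the phrase document matrix."""
    return Counter(SYNSETS.get(p, p) for p in phrases)

doc1 = synset_vector(["car", "automobile", "profit"])
doc2 = synset_vector(["vehicle", "income", "income"])
# Both documents now occupy the same two dimensions: vehicle, earnings.
```

Without the grouping, the two documents would share no dimensions at all; with it, their similarity (and hence cluster quality) can be measured meaningfully in a much smaller feature space.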

Findings

In the enhanced approach, the model documents are used to extract key phrases with synset groups, whereas the other documents participate in the construction of the feature matrix. It gives a better feature vector representation and improved cluster quality.

Research limitations/implications

Various applications that require managing of unstructured documents can use this approach by specifically incorporating the domain knowledge with a thesaurus.

Practical implications

Experiment pertaining to the academic domain is presented that categorizes research papers according to the context and topic, and this will help academicians to organize and build knowledge in a better way. The grouping and feature extraction for resume data can facilitate the candidate selection process.

Social implications

Applications such as knowledge management, clustering of search engine results, and different recommender systems (hotel recommenders, task recommenders, and so on) will benefit from this study. Hence, the study contributes to improving document management in business domains or areas of interest to its users from various strata of society.

Originality/value

The study proposed an improvement to document management approach that can be applied in various domains. The efficacy of the proposed approach and its enhancement is validated on three different data sets of well-articulated documents from data sets such as biography, resume and research papers. These results can be used for benchmarking further work carried out in these areas.

Details

Benchmarking: An International Journal, vol. 26 no. 6
Type: Research Article
DOI: https://doi.org/10.1108/BIJ-04-2018-0102
ISSN: 1463-5771

Keywords

  • Clustering
  • Feature vector
  • Phrase
  • Silhouette coefficient
  • Synset group

Article
Publication date: 17 July 2020

Graph node rank based important keyword detection from Twitter

Mukesh Kumar and Palak Rehan


Open Access

Abstract

Social media networks like Twitter, Facebook and WhatsApp are among the most commonly used media for sharing news and opinions and for staying in touch with peers. Messages on Twitter are limited to 140 characters, which has led users to create their own novel syntax in tweets to express more in fewer words. Free writing style, use of URLs, markup syntax, inappropriate punctuation, ungrammatical structures, abbreviations and so on make it harder to mine useful information from tweets. For each tweet, we can obtain an explicit time stamp, the name of the user, the social network the user belongs to, or even the GPS coordinates if the tweet was created with a GPS-enabled mobile device. With these features, Twitter is by nature a good resource for detecting and analyzing real-time events happening around the world. By exploiting the speed and coverage of Twitter, we can detect events, i.e. sequences of important keywords being talked about, in a timely manner, which can be used in applications such as natural calamity relief support, earthquake relief support, product launches and suspicious activity detection.

The keyword detection process on Twitter can be seen as a two-step process: detection of keywords in raw text form (words as posted by the users), and keyword normalization (reforming the users' unstructured words into complete, meaningful English words). In this paper a keyword detection technique based upon graphs, spanning trees and the PageRank algorithm is proposed. A text normalization technique based upon a hybrid approach using Levenshtein distance, the demetaphone algorithm and dictionary mapping is proposed to work upon the unstructured keywords produced by the proposed keyword detector. The proposed normalization technique is validated using the standard lexnorm 1.2 dataset. The proposed system is used to detect keywords from Twitter text posted in real time. The detected and normalized keywords are further validated against search engine results at a later time for the detection of events.
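The normalization step built on Levenshtein distance and dictionary mapping can be sketched as follows; the vocabulary and distance threshold are assumptions, and the paper's full hybrid also includes a phonetic (demetaphone) component not shown here:

```python
def levenshtein(s, t):
    """Edit distance by dynamic programming over a rolling row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution
        prev = cur
    return prev[-1]

# Hypothetical in-vocabulary words for the dictionary-mapping step.
VOCAB = ["earthquake", "tomorrow", "relief"]

def normalize(token, max_dist=2):
    """Map a noisy token to its closest vocabulary word if within max_dist,
    otherwise leave it unchanged."""
    best = min(VOCAB, key=lambda w: levenshtein(token, w))
    return best if levenshtein(token, best) <= max_dist else token
```

So a misspelled detector output like "erthquake" snaps to "earthquake", while a token far from every vocabulary word passes through untouched for the phonetic stage to handle.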

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
DOI: https://doi.org/10.1016/j.aci.2018.08.002
ISSN: 2634-1964

Keywords

  • Graphs
  • Normalization
  • Social media
  • Spanning trees

Article
Publication date: 14 January 2019

Wisdom extraction in knowledge-based information systems

Zaki Malik, Khayyam Hashmi, Erfan Najmi and Abdelmounaam Rezgui



Abstract

Purpose

For centuries, humans have gathered and generated data to study the different phenomena around them. Consequently, a variety of information repositories are available in many different fields of study. However, the ability to access, integrate and properly interpret the relevant data sets in these repositories has mainly been limited by their ever-expanding volumes. The goal of translating the available data into knowledge, eventually leading to wisdom, requires an understanding of the relations, ordering and associations among the data sets. This paper aims to provide a number of distinct approaches towards this goal, i.e. to translate the information contained in the repositories into knowledge.

Design/methodology/approach

While the existing information repositories are rich in content, there are no easy means of understanding the relevance or influence of the different facts contained therein. As a result, what matters to the general populace, i.e. the prioritization of some data items (or facts) over others, is usually lost. In this paper, the goal is to provide approaches for transforming the available facts in the information repositories into wisdom. The authors target the lack of order in the facts presented in the repositories to create a hierarchical distribution based on the common understanding, expectations, opinions and judgments of the different users.

Findings

The authors present multiple approaches to extract and order the facts related to each concept, using both automatic and semi-automatic methods. The experiments show that the results of these approaches are similar and very close to the instinctive ordering of facts by users.

Originality/value

The authors believe that the work presented in this paper, with some additions, can be a feasible step to convert the available knowledge to wisdom and a step towards the future of online information systems.

Details

Journal of Knowledge Management, vol. 23 no. 1
Type: Research Article
DOI: https://doi.org/10.1108/JKM-05-2018-0288
ISSN: 1367-3270

Keywords

  • Information technology
  • Knowledge management systems
  • Knowledge transfer

Article
Publication date: 23 November 2010

Topic‐based web site summarization

Yongzheng Zhang, Evangelos Milios and Nur Zincir‐Heywood



Abstract

Purpose

Summarization of an entire web site with diverse content may lead to a summary heavily biased towards the site's dominant topics. The purpose of this paper is to present a novel topic‐based framework to address this problem.

Design/methodology/approach

A two‐stage framework is proposed. The first stage identifies the main topics covered in a web site via clustering and the second stage summarizes each topic separately. The proposed system is evaluated by a user study and compared with the single‐topic summarization approach.

Findings

The user study demonstrates that the clustering‐summarization approach statistically significantly outperforms the plain summarization approach in the multi‐topic web site summarization task. Text‐based clustering based on selecting features with high variance over web pages is reliable; outgoing links are useful if a rich set of cross links is available.
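The variance-based feature selection mentioned in the findings can be sketched as follows; the matrix layout (rows as pages, columns as terms), the counts and the function name are assumptions for illustration:

```python
import statistics

def top_variance_terms(matrix, k):
    """Rank term columns of a page-by-term frequency matrix by variance
    across pages; high-variance terms best discriminate between topics."""
    n_terms = len(matrix[0])
    var = [statistics.pvariance([row[j] for row in matrix])
           for j in range(n_terms)]
    return sorted(range(n_terms), key=lambda j: var[j], reverse=True)[:k]

# Rows are pages, columns are terms (counts invented for illustration).
pages = [
    [5, 1, 0],   # a "products" page
    [0, 1, 4],   # a "research" page
    [5, 1, 4],   # a mixed page
]
# Column 1 is flat across pages (variance 0), so it carries no topical
# signal and is ranked last.
```

Terms that appear uniformly across the whole site are useless for separating topics; variance filtering keeps the terms that concentrate in particular pages, which is what the clustering stage needs.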

Research limitations/implications

More sophisticated clustering methods than those used in this study are worth investigating. The proposed method should be tested on web content that is less structured than organizational web sites, for example blogs.

Practical implications

The proposed summarization framework can be applied to the effective organization of search engine results and faceted or topical browsing of large web sites.

Originality/value

Several key components are integrated for web site summarization for the first time, including feature selection, link analysis, and key phrase and key sentence extraction. Insight was gained into the contributions of links and content to topic-based summarization. A classification approach is used to minimize the number of parameters.

Details

International Journal of Web Information Systems, vol. 6 no. 4
Type: Research Article
DOI: https://doi.org/10.1108/17440081011090220
ISSN: 1744-0084

Keywords

  • Programming and algorithm theory
  • Internet
  • Cluster analysis
  • Data handling

Article
Publication date: 1 November 1999

Resource sharing for Chinese language materials and what technologies can offer in the digital age: A Hong Kong experience of worldwide issues

Lawrence W.H. Tam



Abstract

In Hong Kong, resource sharing for cooperative cataloguing of Chinese language materials started in the 1990s with an infrastructure of a Z39.50-based distributed system under the auspices of JULAC of the University Grants Committee. The advantages and limitations of the distributed approach to resource sharing are considered. Problems such as variant MARC formats, romanisation, and codes for information exchange are examined. Unresolved practice issues specific to Chinese language materials are discussed. Resource descriptions for resource sharing, especially cataloguing, are introduced.

Details

Asian Libraries, vol. 8 no. 11
Type: Research Article
DOI: https://doi.org/10.1108/10176749910302046
ISSN: 1017-6748

Keywords

  • Library materials
  • Purchasing techniques
  • Hong Kong
  • Information technology
  • China
  • Language
  • Cataloguing

Article
Publication date: 9 October 2017

Evaluating the concept specialization distance from an end-user perspective: The case of AGROVOC

David Martín-Moncunill, Miguel Angel Sicilia-Urban, Elena García-Barriocanal and Christian M. Stracke



Abstract

Purpose

The common understanding of generalization/specialization relations assumes the relation to be equally strong between a classifier and any of its related classifiers, and at every level of the hierarchy. Assigning a grade of relative distance to represent the level of similarity between related pairs of classifiers could correct this situation, which has been considered an oversimplification of the psychological account of the real-world relations. The paper aims to discuss these issues.

Design/methodology/approach

The evaluation followed an end-user perspective. In order to obtain a consistent data set of specialization distances, a group of 21 persons was asked to assign values to a set of relations for a selection of terms from the AGROVOC thesaurus. Two sets of representations of the relations between the terms were then built, one according to the calculated concept specialization weights and the other following the original order of the thesaurus. In total, 40 persons were asked to choose between the two sets in an A/B test-like experiment. Finally, short interviews were carried out after the test to inquire about their decisions.

Findings

The results show that the use of this information could be a valuable tool for search and information retrieval purposes and for the visual representation of knowledge organization systems (KOS). Furthermore, the methodology followed in the study turned out to be useful for detecting inconsistencies in the thesaurus and could thus be used for quality control and optimization of the hierarchical relations.

Originality/value

The use of this relative distance information, namely, “concept specialization distance,” has been proposed mainly at a theoretical level. In the current experiment, the authors evaluate the potential use of this information from an end-user perspective, not only for text-based interfaces but also its application for the visual representation of KOS. Finally, the methodology followed for the elaboration of the concept specialization distance data set showed potential for detecting possible inconsistencies in KOS.

Details

Online Information Review, vol. 41 no. 6
Type: Research Article
DOI: https://doi.org/10.1108/OIR-03-2016-0094
ISSN: 1468-4527

Keywords

  • Information seeking
  • User interfaces
  • Knowledge organization systems
  • Search tactics
  • Concept specialization distance
  • Gen-spec

Article
Publication date: 20 April 2015

Keyword extraction from Arabic legal texts

Mahmoud Rammal, Zeinab Bahsoun and Mona Al Achkar Jabbour



Abstract

Purpose

The purpose of this paper is to apply local grammar (LG) to develop an indexing system which automatically extracts keywords from titles of Lebanese official journals.

Design/methodology/approach

To build the LG for our system, the first word, which plays the determinant role in understanding the meaning of a title, is analyzed and grouped as the initial state. These steps are repeated recursively for the remaining words. When a new title is introduced, its first word determines which LG should be applied to suggest or generate further potential keywords, based on a set of features calculated for each node of the title.
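A drastically simplified stand-in for this idea, where a title's first word selects which pattern to apply; the grammars and keyword positions below are invented, and the real system builds finite state automata over richer per-node features rather than fixed positions:

```python
# Invented local grammars, keyed by a title's first (determinant) word;
# each maps to the word positions retained as keywords.
LOCAL_GRAMMARS = {
    "decree": [1, 3],
    "appointment": [2, 4],
}

def extract_keywords(title):
    """Look up the grammar selected by the first word and keep the words
    at the positions that grammar marks as keyword-bearing."""
    words = title.lower().split()
    positions = LOCAL_GRAMMARS.get(words[0], [])
    return [words[i] for i in positions if i < len(words)]
```

For example, `extract_keywords("Decree amending the customs law")` yields `["amending", "customs"]`, while a title whose first word matches no grammar yields nothing, mirroring how an unrecognized determinant word produces no suggestions.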

Findings

The overall performance of our system is 67 per cent, meaning that 67 per cent of the keywords extracted manually were also extracted by our system. This empirical result shows the validity of this study's approach, taking into consideration the limitations mentioned below.

Research limitations/implications

The system has two limitations. First, it is applied to a sample of 5,747 titles; it could be extended to generate finite state automata for all titles. Second, named entities are not processed, due to their variety, which requires a specific ontology.

Originality/value

Almost all keyword extraction systems apply statistical, linguistic or hybrid approaches to extract keywords from texts. This paper contributes to the development of an automatic indexing system to replace expensive human indexing by taking advantage of LG, which is mainly applied to extract times, dates and proper names from texts.

Details

Interactive Technology and Smart Education, vol. 12 no. 1
Type: Research Article
DOI: https://doi.org/10.1108/ITSE-11-2013-0030
ISSN: 1741-5659

Keywords

  • Knowledge management systems
  • Information systems
  • Information extraction
  • Local grammar
  • Human indexing
  • Keyword extraction

Article
Publication date: 15 June 2015

A semi-automatic indexing system based on embedded information in HTML documents

Mari Vállez, Rafael Pedraza-Jiménez, Lluís Codina, Saúl Blanco and Cristòfol Rovira



Abstract

Purpose

The purpose of this paper is to describe and evaluate the tool DigiDoc MetaEdit which allows the semi-automatic indexing of HTML documents. The tool works by identifying and suggesting keywords from a thesaurus according to the embedded information in HTML documents. This enables the parameterization of keyword assignment based on how frequently the terms appear in the document, the relevance of their position, and the combination of both.
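The frequency-and-position weighting described above might look like this in outline; the tag weights, regex-based parsing and function name are illustrative assumptions, not DigiDoc MetaEdit's actual implementation:

```python
import re

# Illustrative position weights: a term in <title> counts three times
# as much as the same term in the body text.
WEIGHTS = {"title": 3.0, "h1": 2.0, "body": 1.0}

def score_terms(html):
    """Score each term by its frequency weighted by where it appears."""
    scores = {}
    for tag, weight in WEIGHTS.items():
        pattern = rf"<{tag}[^>]*>(.*?)</{tag}>"
        for match in re.finditer(pattern, html, re.S | re.I):
            for term in re.findall(r"[a-z]+", match.group(1).lower()):
                scores[term] = scores.get(term, 0.0) + weight
    return scores

page = "<title>library indexing</title><body>indexing of digital documents</body>"
scores = score_terms(page)
# "indexing" scores 3.0 (title) + 1.0 (body) = 4.0, topping the list.
```

Candidate keywords would then be the highest-scoring terms that also match thesaurus entries, combining frequency and positional relevance as the tool's parameterization allows.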

Design/methodology/approach

In order to evaluate the efficiency of the indexing tool, the descriptors/keywords suggested by the indexing tool are compared to the keywords indexed manually by human experts. For this comparison, a corpus of HTML documents is randomly selected from a journal devoted to Library and Information Science.

Findings

The results of the evaluation show that, first, there is close to a 50 per cent match or overlap between the two indexing systems; however, if related terms and narrower terms are taken into consideration, the matches can reach 73 per cent. Second, the first terms identified by the tool are the most relevant.

Originality/value

The tool presented identifies the most important keywords in an HTML document based on the embedded information in HTML documents. Nowadays, representing the contents of documents with keywords is an essential practice in areas such as information retrieval and e-commerce.

Details

Library Hi Tech, vol. 33 no. 2
Type: Research Article
DOI: https://doi.org/10.1108/LHT-12-2014-0114
ISSN: 0737-8831

Keywords

  • Digital documents
  • Information retrieval
  • Indexing
  • Search engines
  • Hypertext markup language

© 2021 Emerald Publishing Limited
