Search results

1 – 10 of 47
Article
Publication date: 14 July 2023

Hamid Hassani, Azadeh Mohebi, M.J. Ershadi and Ammar Jalalimanesh

The purpose of this research is to provide a framework in which new data quality dimensions are defined. The new dimensions provide new metrics for the assessment of lecture video…

Abstract

Purpose

The purpose of this research is to provide a framework in which new data quality dimensions are defined. The new dimensions provide new metrics for the assessment of lecture video indexing. As lecture video indexing involves various steps, the proposed framework, built on these new dimensions, introduces an integrated approach for evaluating an indexing method or algorithm from beginning to end.

Design/methodology/approach

The emphasis in this study is on the fifth step of the design science research methodology (DSRM), known as evaluation. That is, the methods developed in the field of lecture video indexing, as an artifact, should be evaluated from different aspects. In this research, nine dimensions of data quality, namely accuracy, value-added, relevancy, completeness, appropriate amount of data, conciseness, consistency, interpretability and accessibility, have been redefined based on previous studies and the nominal group technique (NGT).
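
To make the idea of a dimension-as-metric concrete, here is a minimal sketch of how one such dimension might be operationalized as a number; the completeness formula and function name below are illustrative assumptions, not the paper's actual definitions.

```python
# Hypothetical sketch: one data quality dimension ("completeness")
# expressed as a numeric metric for lecture video indexing. The formula
# is an assumption for illustration, not the paper's definition.

def completeness(indexed_segments: set, all_segments: set) -> float:
    """Fraction of lecture segments covered by the index."""
    if not all_segments:
        return 0.0
    return len(indexed_segments & all_segments) / len(all_segments)

# Example: an index covering 45 of 60 lecture segments scores 0.75.
print(completeness(set(range(45)), set(range(60))))
```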

Findings

The proposed dimensions are implemented as new metrics to evaluate a newly developed lecture video indexing algorithm, LVTIA, and numerical values have been obtained based on the proposed definitions for each dimension. In addition, the new dimensions are compared with each other in terms of various aspects. The comparison shows that each dimension used for assessing lecture video indexing is able to reflect a different weakness or strength of an indexing method or algorithm.

Originality/value

Despite the development of different methods for indexing lecture videos, the issue of data quality and its various dimensions has not been studied. Since low-quality data can affect the process of scientific lecture video indexing, the issue of data quality in this process requires special attention.

Details

Library Hi Tech, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 14 June 2021

Farnoush Bayatmakou, Azadeh Mohebi and Abbas Ahmadi

Query-based summarization approaches might not be able to provide summaries compatible with the user’s information need, as they mostly rely on a limited source of information…

Abstract

Purpose

Query-based summarization approaches might not be able to provide summaries compatible with the user's information need, as they mostly rely on a limited source of information, usually represented as a single query from the user. This issue becomes even more challenging when dealing with scientific documents, as these contain more specific subject-related terms, while the user may not be able to express his/her specific information need in a query with limited terms. This study aims to propose an interactive multi-document text summarization approach that generates a summary more compatible with the user's information need, allowing the user to interactively specify the composition of a multi-document summary.

Design/methodology/approach

This approach exploits the user's opinion in two stages. The initial query is refined by user-selected keywords/keyphrases and complete sentences extracted from the set of retrieved documents. This is followed by a novel sentence-expansion method based on a genetic algorithm, with the final set of sentences ranked using the maximal marginal relevance (MMR) method. For implementation, the Web of Science data set in the artificial intelligence (AI) category is used.
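
The MMR criterion itself is a standard ranking rule: each step selects the sentence with the best trade-off between relevance to the query and redundancy with the sentences already chosen. A minimal sketch follows, assuming cosine similarity over TF-IDF vectors; the paper's exact similarity measure and parameter values are not specified here.

```python
# Sketch of maximal marginal relevance (MMR) ranking, the final step
# described above. TF-IDF cosine similarity is an assumption.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mmr_rank(query, sentences, k=5, lam=0.7):
    mat = TfidfVectorizer().fit_transform([query] + sentences)
    rel = cosine_similarity(mat[1:], mat[0]).ravel()  # relevance to the query
    sim = cosine_similarity(mat[1:])                  # sentence-sentence similarity
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * rel[i] - (1 - lam) * redundancy  # relevance vs redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

Raising `lam` favors query relevance; lowering it favors diversity among the selected sentences.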

Findings

The proposed approach receives feedback from the user in terms of favorable keywords and sentences, and this feedback ultimately improves the summary. To assess the performance of the proposed system, 45 users, all graduate students in the field of AI, were asked to fill out a questionnaire. The quality of the final summary was also evaluated from the user's perspective and in terms of information redundancy. The results show that the proposed approach leads to higher degrees of user satisfaction compared with versions involving no interaction or only one step of interaction.

Originality/value

The interactive summarization approach goes beyond the initial user query by incorporating the user's preferred keywords/keyphrases and sentences through a systematic interaction. Through these interactions, the system gives the user a clearer idea of the information he/she is looking for and consequently adjusts the final result to the ultimate information need. Such interaction allows the summarization system to achieve a comprehensive understanding of the user's information needs while expanding context-based knowledge and guiding the user through his/her information journey.

Details

Information Discovery and Delivery, vol. 50 no. 2
Type: Research Article
ISSN: 2398-6247

Article
Publication date: 5 September 2017

Azadeh Mohebi, Mehri Sedighi and Zahra Zargaran

The purpose of this paper is to introduce an approach for retrieving a set of scientific articles in the field of Information Technology (IT) from a scientific database such as…

Abstract

Purpose

The purpose of this paper is to introduce an approach for retrieving a set of scientific articles in the field of Information Technology (IT) from a scientific database such as Web of Science (WoS), in order to apply scientometric indices and compare them with those of other fields.

Design/methodology/approach

The authors propose a statistical classification-based approach for extracting IT-related articles. First, a probabilistic model of the subject IT is built using keyphrase extraction techniques. Then, IT-related articles are retrieved from all Iranian papers in WoS based on a Bayesian classification scheme: using the probabilistic IT model, an IT membership probability is assigned to each article in the database, and the articles with the highest probabilities are retrieved.
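
As a rough illustration of the membership scoring, a naive Bayes-style sketch follows. The `it_model` dictionary (keyphrase probabilities estimated from an IT corpus), the prior and the floor value are all assumptions made for illustration, not the paper's exact formulation.

```python
# Hedged sketch of Bayesian IT-membership scoring for one article,
# given its extracted keyphrases. All names and constants here are
# illustrative assumptions.
import math

def it_membership_log_score(doc_phrases, it_model, prior=0.5, floor=1e-6):
    """Log-scale IT membership score for one article's keyphrases."""
    score = math.log(prior)                             # class prior P(IT)
    for phrase in doc_phrases:
        score += math.log(it_model.get(phrase, floor))  # P(phrase | IT), floored
    return score

# Articles would then be ranked by score, with the highest-scoring
# ones retrieved as IT-related.
```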

Findings

The authors have extracted a set of 1,497 IT keyphrases through the keyphrase extraction process for the probabilistic model. They have evaluated the proposed retrieval approach against two baselines: a query-based approach, in which articles are retrieved from WoS using queries composed of a limited set of IT keywords, and a research area-based approach, which retrieves articles using WoS categorizations and research areas. The evaluation and comparison results show that the proposed approach generates more accurate results while retrieving more articles related to IT.

Research limitations/implications

Although this research is limited to the IT subject, it can be generalized to any subject. However, for multidisciplinary topics such as IT, special attention should be given to the keyphrase extraction phase. In this research, a bigram model is used; however, it can be extended to trigrams as well.

Originality/value

This paper introduces an integrated approach for retrieving IT-related documents from a collection of scientific documents. The approach has two main phases: building a model to represent the topic IT, and retrieving documents based on that model. The model is based on a set of keyphrases extracted from a collection of IT articles. However, the extraction technique does not rely on term frequency-inverse document frequency, since almost all of the articles in the collection share the same keyphrases. In addition, a probabilistic membership score is defined to retrieve the IT articles from a collection of scientific articles.

Article
Publication date: 14 December 2021

Claudia Lanza, Antonietta Folino, Erika Pasceri and Anna Perri

The aim of this study is a semantic comparative analysis between the current pandemic and the Spanish flu. It is based on a bilingual terminological perspective oriented to…

Abstract

Purpose

The aim of this study is a semantic comparative analysis between the current pandemic and the Spanish flu. It is based on a bilingual terminological perspective oriented to evaluating and comparing the terms used to describe and communicate the pandemics' issues both to biomedical experts and to a non-specialist public.

Design/methodology/approach

The analysis is a comparative terminological investigation of two corpora, the first containing English scientific articles and the second Italian national newspaper coverage, on two pandemics, the Spanish flu and the current Covid-19 disease. Semantic similarities and differences between them are detected through computational tasks and corpus linguistics methodologies.
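
One standard corpus-linguistics statistic for this kind of cross-corpus term comparison is Dunning's log-likelihood keyness; the abstract does not name the exact measure used, so the sketch below is an assumption for illustration.

```python
# Hedged sketch: scoring how distinctively a term is used in one corpus
# versus another with Dunning's log-likelihood keyness.
import math

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Keyness of a term given its frequency in two corpora (in tokens)."""
    e_a = size_a * (freq_a + freq_b) / (size_a + size_b)  # expected count, corpus A
    e_b = size_b * (freq_a + freq_b) / (size_a + size_b)  # expected count, corpus B
    ll = 0.0
    if freq_a:
        ll += freq_a * math.log(freq_a / e_a)
    if freq_b:
        ll += freq_b * math.log(freq_b / e_b)
    return 2 * ll

# Made-up example: a term appearing 120 times per million tokens in one
# corpus and 10 times per million in the other.
print(round(log_likelihood(120, 1_000_000, 10, 1_000_000), 1))
```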

Findings

Given the cross-field representativeness of terms and their relevance within specific historical eras, the study is conducted on both a synchronic and a diachronic level to discover the common lexical usages in the dissemination of the pandemics' issues.

Originality/value

The study presents the extraction of the main terms representing the two pandemics and their usage in sharing news about pandemic trends with the population, together with a topic modeling procedure that maps the main categories of the pandemic lexicon to a list of classes drawn from external thesauri and ontologies on pandemics. As a result, a detailed overview is provided of the discrepancies, as well as the similarities, found in two historical corpora dealing with a common subject, i.e. the pandemics' terminology.
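
For the topic-modeling step, one plausible implementation is latent Dirichlet allocation (LDA) as provided by gensim; the abstract does not specify the tool actually used, so this is an assumed stand-in.

```python
# Minimal sketch of topic-modeling over a toy tokenized pandemic corpus,
# using gensim's LDA as one plausible (assumed) implementation.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [["pandemic", "influenza", "quarantine"],
        ["vaccine", "virus", "pandemic"],
        ["lockdown", "virus", "quarantine"]]

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)

# The discovered topics can then be mapped onto classes from external
# thesauri/ontologies on pandemics, as the study describes.
for topic_id, terms in lda.print_topics():
    print(topic_id, terms)
```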

Details

Journal of Documentation, vol. 78 no. 4
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 6 May 2014

Hollie White, Craig Willis and Jane Greenberg

The purpose of this paper is to examine the effect of the Helping Interdisciplinary Vocabulary Engineering (HIVE) system on the inter-indexer consistency of information…

Abstract

Purpose

The purpose of this paper is to examine the effect of the Helping Interdisciplinary Vocabulary Engineering (HIVE) system on the inter-indexer consistency of information professionals when assigning keywords to a scientific abstract. This study examined first, the inter-indexer consistency of potential HIVE users; second, the impact HIVE had on consistency; and third, challenges associated with using HIVE.

Design/methodology/approach

A within-subjects quasi-experimental research design was used for this study. Data were collected using a task-scenario-based questionnaire. Consistency was analyzed using Hooper's and Rolling's inter-indexer consistency measures, and a series of t-tests was used to test the significance of differences between the consistency results.
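
Both measures have standard set-based definitions: with A the number of terms both indexers assigned and M, N the terms unique to each indexer, Hooper's measure is A/(A + M + N) and Rolling's is 2A/(2A + M + N). A small sketch, with made-up term sets:

```python
# The two inter-indexer consistency measures, implemented from their
# standard definitions.

def hooper(terms_1: set, terms_2: set) -> float:
    a = len(terms_1 & terms_2)                     # terms both indexers chose
    m, n = len(terms_1 - terms_2), len(terms_2 - terms_1)
    return a / (a + m + n) if (a + m + n) else 0.0

def rolling(terms_1: set, terms_2: set) -> float:
    a = len(terms_1 & terms_2)
    total = len(terms_1) + len(terms_2)
    return 2 * a / total if total else 0.0

t1 = {"indexing", "vocabulary", "HIVE"}
t2 = {"indexing", "HIVE", "semantics"}
print(hooper(t1, t2))   # 0.5
print(rolling(t1, t2))  # about 0.667
```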

Findings

Results suggest that HIVE improves inter-indexer consistency. Working with HIVE increased consistency rates by 22 percent (Rolling's) and 25 percent (Hooper's) when selecting relevant terms from all vocabularies. A statistically significant difference exists between the assignment of free-text keywords and machine-aided keywords. Issues with homographs, disambiguation, vocabulary choice, and document structure were all identified as potential challenges.

Research limitations/implications

This study is limited by the small number of vocabularies used. Future research will include implementing HIVE in the Dryad Repository and studying its application in a repository system.

Originality/value

This paper showcases several features of the HIVE system. By using traditional consistency measures to evaluate a semantic web technology, it emphasizes the link between traditional indexing and next-generation machine-aided indexing (MAI) tools.

Details

Journal of Documentation, vol. 70 no. 3
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 19 June 2019

Prafulla Bafna, Dhanya Pramod, Shailaja Shrwaikar and Atiya Hassan

Document management is growing in importance proportionate to the growth of unstructured data, and its applications are increasing from process benchmarking to customer…

Abstract

Purpose

Document management is growing in importance proportionate to the growth of unstructured data, and its applications are expanding from process benchmarking to customer relationship management and beyond. The purpose of this paper is to improve two important components of document management, keyword extraction and document clustering, through knowledge extraction by updating the phrase document matrix. The objective is to manage documents by extending the phrase document matrix and so achieve refined clusters. The study achieves consistency in cluster quality in spite of the increasing size of the data set. The domain independence of the proposed method is tested and compared with other methods.

Design/methodology/approach

In this paper, a synset-based phrase document matrix construction method is proposed in which semantically similar phrases are grouped to mitigate the curse of dimensionality. When a large collection of documents is processed, it includes some documents closely related to the topic of interest, known as model documents, as well as documents that deviate from the topic of interest; these non-relevant documents may affect cluster quality. The first step in knowledge extraction from unstructured textual data is converting it into structured form, either as a term frequency-inverse document frequency matrix or as a phrase document matrix. Once in structured form, a range of mining algorithms, from classification to clustering, can be applied.
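
A minimal sketch of the synset-grouping idea, using WordNet via NLTK as one plausible lexical resource; the paper's own grouping procedure and thesaurus may differ in detail.

```python
# Illustrative sketch: grouping semantically similar phrases by WordNet
# synset before building the phrase document matrix, so that each group
# becomes a single matrix column. Assumes nltk.download("wordnet").
from collections import defaultdict
from nltk.corpus import wordnet as wn

def synset_key(phrase: str) -> str:
    """Map a phrase to its first WordNet synset name, or to itself."""
    synsets = wn.synsets(phrase.replace(" ", "_"))
    return synsets[0].name() if synsets else phrase

def group_phrases(phrases):
    groups = defaultdict(list)
    for p in phrases:
        groups[synset_key(p)].append(p)
    return dict(groups)

# "car" and "automobile" share a synset, so they fall into one group.
print(group_phrases(["car", "automobile", "vehicle"]))
```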

Findings

In the enhanced approach, the model documents are used to extract key phrases with synset groups, whereas the other documents participate in the construction of the feature matrix. It gives a better feature vector representation and improved cluster quality.

Research limitations/implications

Various applications that require managing of unstructured documents can use this approach by specifically incorporating the domain knowledge with a thesaurus.

Practical implications

An experiment pertaining to the academic domain is presented that categorizes research papers according to context and topic, which will help academicians organize and build knowledge in a better way. The grouping and feature extraction for resume data can also facilitate the candidate selection process.

Social implications

Applications like knowledge management, clustering of search engine results and various recommender systems (e.g. hotel or task recommenders) will benefit from this study. Hence, the study contributes to improving document management in business domains or areas of interest to its users from various strata of society.

Originality/value

The study proposes an improved document management approach that can be applied in various domains. The efficacy of the proposed approach and its enhancement are validated on three different data sets of well-articulated documents: biographies, resumes and research papers. These results can be used for benchmarking further work carried out in these areas.

Details

Benchmarking: An International Journal, vol. 26 no. 6
Type: Research Article
ISSN: 1463-5771

Article
Publication date: 9 May 2022

Lei Zhao, Yingyi Zhang and Chengzhi Zhang

To understand the meaning of a sentence, humans can focus on important words in the sentence, which reflects our eyes staying on each word in different gaze time or times. Thus…

Abstract

Purpose

To understand the meaning of a sentence, humans can focus on its important words, which is reflected in our eyes dwelling on each word for different gaze durations or numbers of fixations. Accordingly, some studies utilize eye-tracking values to optimize the attention mechanism in deep learning models, but these studies do not explain the rationale for this approach. Whether the attention mechanism actually possesses this feature of human reading needs to be explored.

Design/methodology/approach

The authors conducted experiments on a sentiment classification task. Firstly, they obtained eye-tracking values from two open-source eye-tracking corpora to describe the feature of human reading. Then, the machine attention values of each sentence were learned from a sentiment classification model. Finally, a comparison was conducted to analyze machine attention values and eye-tracking values.
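
The comparison step could, for instance, correlate per-word attention weights with per-word eye-tracking values; the sketch below assumes Spearman correlation and total fixation duration as the gaze signal, neither of which is specified in the abstract.

```python
# Hedged sketch: comparing machine attention values with eye-tracking
# values for the words of one sentence. The numbers are made up.
from scipy.stats import spearmanr

words     = ["the", "movie", "was", "surprisingly", "good"]
attention = [0.05, 0.20, 0.05, 0.30, 0.40]  # weights from the classifier
gaze_ms   = [80, 210, 90, 260, 330]         # total fixation duration (ms)

rho, p = spearmanr(attention, gaze_ms)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```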

Findings

Through experiments, the authors found that the attention mechanism can focus on important words, such as adjectives, adverbs and sentiment words, which are valuable for judging the sentiment of sentences in the sentiment classification task. It thus possesses the feature of human reading of focusing on important words in a sentence. However, owing to insufficient learning, the attention mechanism sometimes focuses on the wrong words; the eye-tracking values can help correct this error and improve model performance.

Originality/value

This research not only provides a reasonable explanation for studies that use eye-tracking values to optimize the attention mechanism but also provides new inspiration for work on the interpretability of the attention mechanism.

Details

Aslib Journal of Information Management, vol. 75 no. 1
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 11 December 2018

Zaki Malik, Khayyam Hashmi, Erfan Najmi and Abdelmounaam Rezgui

This paper aims to provide a number of distinct approaches towards this goal, i.e. to translate the information contained in the repositories into knowledge. For centuries, humans…

Abstract

Purpose

This paper aims to provide a number of distinct approaches for translating the information contained in information repositories into knowledge. For centuries, humans have gathered and generated data to study the different phenomena around them. Consequently, a variety of information repositories are available in many different fields of study. However, the ability to access, integrate and properly interpret the relevant data sets in these repositories has mainly been limited by their ever-expanding volumes. The goal of translating the available data into knowledge, eventually leading to wisdom, requires an understanding of the relations, ordering and associations among the data sets.

Design/methodology/approach

While existing information repositories are rich in content, there are no easy means of understanding the relevance or influence of the different facts contained therein, so the general populace's interest in prioritizing some data items (or facts) over others is usually lost. In this paper, the goal is to provide approaches for transforming the available facts in the information repositories into wisdom. The authors target the lack of order in the facts presented in the repositories, creating a hierarchical distribution based on the common understanding, expectations, opinions and judgments of different users.

Findings

The authors present multiple approaches to extract and order the facts related to each concept, using both automatic and semi-automatic methods. The experiments show that the results of these approaches are similar and very close to the instinctive ordering of facts by users.
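
The semi-automatic path could plausibly aggregate several users' orderings of facts into one consensus ranking; the Borda count below is an assumed stand-in for illustration, not the paper's actual aggregation method.

```python
# Hypothetical sketch: aggregating users' fact orderings with a Borda
# count. The fact names and orderings are made-up examples.
from collections import defaultdict

def borda(rankings):
    """Each ranking lists facts most-important-first."""
    scores = defaultdict(int)
    for ranking in rankings:
        for position, fact in enumerate(ranking):
            scores[fact] += len(ranking) - position  # higher rank, more points
    return sorted(scores, key=scores.get, reverse=True)

users = [["capital", "population", "area"],
         ["population", "capital", "area"],
         ["capital", "area", "population"]]
print(borda(users))  # ['capital', 'population', 'area']
```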

Originality/value

The authors believe that the work presented in this paper, with some additions, can be a feasible step to convert the available knowledge to wisdom and a step towards the future of online information systems.

Details

Journal of Knowledge Management, vol. 23 no. 1
Type: Research Article
ISSN: 1367-3270

Open Access
Article
Publication date: 17 July 2020

Mukesh Kumar and Palak Rehan

Social media networks like Twitter, Facebook, WhatsApp etc. are most commonly used medium for sharing news, opinions and to stay in touch with peers. Messages on twitter are…

Abstract

Social media networks like Twitter, Facebook and WhatsApp are among the most commonly used media for sharing news and opinions and for staying in touch with peers. Messages on Twitter are limited to 140 characters, which has led users to create their own novel syntax in tweets to express more in fewer words. Free writing style, use of URLs, markup syntax, inappropriate punctuation, ungrammatical structures and abbreviations make it harder to mine useful information from them. For each tweet, we can get an explicit time stamp, the name of the user, the social network the user belongs to, or even the GPS coordinates if the tweet is created with a GPS-enabled mobile device. With these features, Twitter is, by nature, a good resource for detecting and analyzing real-time events happening around the world. By using the speed and coverage of Twitter, we can detect events, i.e. sequences of important keywords being talked about, in a timely manner, which can be used in applications such as natural calamity relief support, earthquake relief support, product launches and suspicious activity detection.

The keyword detection process from Twitter can be seen as a two-step process: detection of keywords in raw text form (words as posted by the users) and keyword normalization (reforming the users' unstructured words into complete, meaningful English words). In this paper, a keyword detection technique based upon graphs, spanning trees and the PageRank algorithm is proposed. A text normalization technique based upon a hybrid approach using Levenshtein distance, the demetaphone algorithm and dictionary mapping is proposed to work upon the unstructured keywords produced by the proposed keyword detector. The proposed normalization technique is validated using the standard LexNorm 1.2 dataset. The proposed system is used to detect keywords from Twitter text posted in real time; the detected and normalized keywords are later validated against search engine results for the detection of events.
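
A minimal sketch in the spirit of the graph-based detection step: build a co-occurrence graph over tweet tokens and rank nodes with PageRank. The paper's actual construction also involves spanning trees and is more involved; this is an illustrative simplification using networkx.

```python
# Hedged sketch: PageRank over a token co-occurrence graph as a simple
# stand-in for the paper's graph-based keyword detector.
import networkx as nx

def keywords(tweets, window=2, top_k=5):
    g = nx.Graph()
    for tweet in tweets:
        tokens = tweet.lower().split()
        for i, tok in enumerate(tokens):
            for other in tokens[i + 1:i + window]:
                g.add_edge(tok, other)  # co-occurrence edge within the window
    ranks = nx.pagerank(g)
    return sorted(ranks, key=ranks.get, reverse=True)[:top_k]

tweets = ["earthquake hits city center",
          "city center earthquake relief started",
          "relief teams reach earthquake zone"]
print(keywords(tweets))
```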

Details

Applied Computing and Informatics, vol. 17 no. 2
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 23 November 2010

Yongzheng Zhang, Evangelos Milios and Nur Zincir‐Heywood

Summarization of an entire web site with diverse content may lead to a summary heavily biased towards the site's dominant topics. The purpose of this paper is to present a novel…

Abstract

Purpose

Summarization of an entire web site with diverse content may lead to a summary heavily biased towards the site's dominant topics. The purpose of this paper is to present a novel topic‐based framework to address this problem.

Design/methodology/approach

A two‐stage framework is proposed. The first stage identifies the main topics covered in a web site via clustering and the second stage summarizes each topic separately. The proposed system is evaluated by a user study and compared with the single‐topic summarization approach.
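
A compact sketch of the two-stage idea: cluster pages into topics, then summarize each topic separately. KMeans over TF-IDF features stands in for the paper's clustering here, and top centroid terms stand in for its key phrase extraction; the real system also uses link analysis and key sentence extraction.

```python
# Hedged sketch of the cluster-then-summarize framework for a web site.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def summarize_site(pages, n_topics=2, terms_per_topic=3):
    vec = TfidfVectorizer(stop_words="english")
    x = vec.fit_transform(pages)                    # pages as TF-IDF vectors
    labels = KMeans(n_clusters=n_topics, n_init=10,
                    random_state=0).fit_predict(x)  # stage 1: topic clusters
    vocab = vec.get_feature_names_out()
    summaries = {}
    for t in range(n_topics):
        centroid = x[labels == t].mean(axis=0).A1   # mean TF-IDF per term
        top = centroid.argsort()[::-1][:terms_per_topic]
        summaries[t] = [vocab[i] for i in top]      # stage 2: per-topic summary
    return summaries
```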

Findings

The user study demonstrates that the clustering‐summarization approach statistically significantly outperforms the plain summarization approach in the multi‐topic web site summarization task. Text‐based clustering based on selecting features with high variance over web pages is reliable; outgoing links are useful if a rich set of cross links is available.

Research limitations/implications

More sophisticated clustering methods than those used in this study are worth investigating. The proposed method should be tested on web content that is less structured than organizational web sites, for example blogs.

Practical implications

The proposed summarization framework can be applied to the effective organization of search engine results and faceted or topical browsing of large web sites.

Originality/value

Several key components are integrated for web site summarization for the first time, including feature selection, link analysis, and key phrase and key sentence extraction. Insight was gained into the contributions of links and content to topic-based summarization. A classification approach is used to minimize the number of parameters.

Details

International Journal of Web Information Systems, vol. 6 no. 4
Type: Research Article
ISSN: 1744-0084
