Search results

1 – 10 of 335
Article
Publication date: 8 September 2023

Oussama Ayoub, Christophe Rodrigues and Nicolas Travers

Abstract

Purpose

This paper aims to address the word gap in information retrieval (IR), especially for long documents from specific domains. With the continuous growth of the text data that modern IR systems have to manage, solutions are needed that efficiently find the best set of documents for a given request. The words used in a query can differ from those used in related documents, and even when the meanings are close, nonoverlapping words are challenging for IR systems. This word gap becomes significant for long documents from specific domains.

Design/methodology/approach

To generate new words for a document, a deep learning (DL) masked language model is used to infer related words. The DL models used are pretrained on massive text data and carry general or domain-specific knowledge, which allows them to propose a better document representation.

Findings

The authors evaluate the approach on specific IR domains with long documents to show the genericity of the proposed model, and they achieve encouraging results.

Originality/value

In this paper, to the best of the authors’ knowledge, an original unsupervised and modular IR system based on recent DL methods is introduced.

Details

International Journal of Web Information Systems, vol. 19 no. 5/6
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 11 November 2014

Mihaela Dinsoreanu and Rodica Potolea

Abstract

Purpose

The purpose of this paper is to address the challenge of opinion mining in text documents to perform further analysis such as community detection and consistency control. More specifically, we aim to identify and extract opinions from natural language documents and to represent them in a structured manner to identify communities of opinion holders based on their common opinions. Another goal is to rapidly identify similar or contradictory opinions on a target issued by different holders.

Design/methodology/approach

For the opinion extraction problem, we opted for a supervised approach focusing on the feature selection problem to improve our classification results. For the community detection problem, we rely on the Infomap community detection algorithm and the multi-scale community detection framework, applied to a graph representation based on the available opinions and social data.

Findings

The classification performance in terms of precision and recall was significantly improved by adding a set of “meta-features” based on rules that group certain parts of speech (POS) instead of the actual words. To evaluate the community detection feature, we used two quality metrics: network modularity and normalized mutual information (NMI). We evaluated seven one-target similarity functions and ten multi-target aggregation functions and concluded that linear functions perform poorly for data sets with multiple targets, while functions that calculate the average similarity have greater resilience to noise.
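As an illustration of the NMI metric named above, the following is a minimal sketch rather than the authors' evaluation code. It assumes the arithmetic-mean normalization, NMI = 2·I(A;B) / (H(A) + H(B)); the abstract does not say which normalization was used, and the function names are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a community assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two assignments of the same nodes."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (n_ij / n) * log(n_ij * n / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )


def nmi(a, b):
    """NMI with arithmetic-mean normalization (an assumption, see lead-in)."""
    if len(a) != len(b):
        raise ValueError("assignments must cover the same nodes")
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:  # both assignments put every node in one community
        return 1.0
    return 2 * mutual_information(a, b) / (h_a + h_b)


# Identical partitions under different label names score 1,
# statistically independent partitions score 0.
same = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

Because NMI compares partitions rather than label names, relabelling the detected communities does not change the score.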

Originality/value

Although our solution relies on existing approaches, we managed to adapt and integrate them in an efficient manner. Based on the initial experimental results, we integrated original enhancements that improve performance.

Details

International Journal of Web Information Systems, vol. 10 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 3 August 2021

Chuanming Yu, Haodong Xue, Manyi Wang and Lu An

Abstract

Purpose

Owing to the uneven distribution of annotated corpus among different languages, it is necessary to bridge the gap between low resource languages and high resource languages. From the perspective of entity relation extraction, this paper aims to extend the knowledge acquisition task from a single language context to a cross-lingual context, and to improve the relation extraction performance for low resource languages.

Design/methodology/approach

This paper proposes a cross-lingual adversarial relation extraction (CLARE) framework, which decomposes cross-lingual relation extraction into parallel corpus acquisition and adversarial adaptation relation extraction. Based on the proposed framework, this paper conducts extensive experiments in two tasks, i.e. the English-to-Chinese and the English-to-Arabic cross-lingual entity relation extraction.

Findings

The Macro-F1 values of the optimal models in the two tasks are 0.8801 and 0.7899, respectively, indicating that the proposed CLARE framework can significantly improve entity relation extraction for low resource languages. The experimental results suggest that the proposed framework can effectively transfer the corpus as well as the annotated tags from English to Chinese and Arabic. This study reveals that the proposed approach is less labour-intensive and more effective in cross-lingual entity relation extraction than the manual method. It also shows that the approach generalizes well across different languages.
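For readers unfamiliar with the Macro-F1 metric reported above, here is a minimal sketch of how it is commonly computed: F1 is calculated per class and then averaged with equal weight. The relation labels in the example are hypothetical and are not taken from the paper's data sets.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical relation labels, for illustration only.
gold = ["works_for", "born_in", "born_in", "none"]
pred = ["works_for", "born_in", "none", "none"]
value = macro_f1(gold, pred)  # per-class F1: 1, 2/3, 2/3
```

Because every class counts equally, Macro-F1 penalizes poor performance on rare relation types more than accuracy or micro-averaged F1 would.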

Originality/value

The research results are of great significance for improving the performance of the cross-lingual knowledge acquisition. The cross-lingual transfer may greatly reduce the time and cost of the manual construction of the multi-lingual corpus. It sheds light on the knowledge acquisition and organization from the unstructured text in the era of big data.

Details

The Electronic Library, vol. 39 no. 3
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 28 October 2019

Farshid Mirzaalian and Elizabeth Halpenny

Abstract

Purpose

The purpose of this paper is to provide a review of hospitality and tourism studies that have used social media analytics to collect, examine, summarize and interpret “big data” derived from social media. It proposes improved approaches by documenting past and current analytic practice addressed by the selected studies in social media analytics.

Design/methodology/approach

Studies from the past 18 years were identified and collected from five international electronic bibliographic databases. Social media analytics-related terms and keywords in the titles, keywords or abstracts were used to identify relevant articles. Book chapters, conference papers and articles not written in English were excluded from analysis. The preferred reporting items for systematic reviews and meta-analyses (PRISMA) guided the search, and Stieglitz and Dang-Xuan’s (2013) social media analytics framework was adapted to categorize methods reported in each article.

Findings

The research purpose of each study was identified and categorized to better understand the questions social media analytics were being used to address, as well as the frequency of each method’s use. Since 2014, rapid growth of social media analytics was observed, along with an expanded use of multiple analytic methods, including accuracy testing. These factors suggest an increased commitment to and competency in conducting comprehensive and robust social media data analyses. Improved use of methods such as social network analysis, comparative analysis and trend analysis is recommended. Consumer-review networks and social networking sites were the main social media platforms from which data were gathered; simultaneous analysis of multi-platform/sources of data is recommended to improve validity and comprehensive understanding.

Originality/value

This is the first systematic literature review of the application of social media analytics in hospitality and tourism research. The study highlights advancements in social media analytics and recommends an expansion of approaches; common analytical methods such as text analysis and sentiment analysis should be supplemented by infrequently used approaches such as comparative analysis and spatial analysis.

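Research purpose

This paper surveys the hospitality and tourism literature that uses social media analytics. By reviewing the literature on the relevant analytic methods, it proposes strategies for improving those methods.

Research design/methodology/approach

The sample comprises literature from the past 18 years drawn from five international online bibliographic databases. Articles were retrieved when social media analytics-related terms appeared in the title, keywords or abstract. Book chapters, conference papers and non-English articles were not included. The preferred reporting items for systematic reviews and meta-analyses (PRISMA) guided the literature search, and Stieglitz and Dang-Xuan’s (2013) social media analytics framework served as the method for classifying the literature.

Research findings

This paper reports the research purpose of each study and categorizes the studies systematically, to better understand the research questions addressed by social media analytics and the frequency of use of each method. Since 2014, social media analytics has grown rapidly, along with other related analytic methods, including accuracy testing. These results indicate growing demand for more comprehensive and robust analytic methods, as well as intense competition. The paper recommends improved methods such as social network analysis, comparative analysis and trend analysis. Consumer-review networks and social networking sites were the main platforms providing social media data. The paper recommends analysing multi-source data simultaneously to improve validity and comprehensive understanding.

Research originality/value

This is the first systematic literature review of social media analytics in the hospitality and tourism field. It highlights advances in social media analytics and the broadening of its methods; common analytic methods such as text analysis and sentiment analysis should be combined with less common methods such as comparative analysis and spatial analysis.

Keywords: Comparative analysis, Sentiment analysis, User-generated content, Social media analytics, Topic modelling, Spatial analysis, Text analysis

Article type: Literature review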

Article
Publication date: 21 January 2019

Issa Alsmadi and Keng Hoon Gan

Abstract

Purpose

Rapid developments in social networks and their use in everyday life have caused an explosion in the number of short electronic documents. The need to classify such documents into relevant classes according to their text content therefore has significant implications for many applications. Short-text classification is an essential step in applications such as spam filtering, sentiment analysis, Twitter personalization, customer reviews and many others related to social networks. Reviews on short text and its applications are limited. Thus, this paper aims to discuss the characteristics of short text and the challenges and difficulties of classifying it. The paper attempts to introduce all stages of the classification process, the techniques used in each stage and the possible development trends in each stage.

Design/methodology/approach

The paper is a review of the main aspects of short-text classification. It is structured according to the stages of the classification task.

Findings

This paper discusses related issues and approaches to these problems. Further research could be conducted to address the challenges in short texts and avoid poor classification accuracy. Low performance can be addressed with optimization-based solutions such as genetic algorithms, which are powerful in enhancing the quality of the selected features. Soft computing solutions, such as fuzzy logic, make short-text problems a promising area of research.

Originality/value

Using a powerful short-text classification method significantly affects many applications in terms of efficiency enhancement. Current solutions still have low performance, implying the need for improvement. This paper discusses related issues and approaches to these problems.

Details

International Journal of Web Information Systems, vol. 15 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 20 August 2018

Dharini Ramachandran and Parvathi Ramasubramanian

Abstract

Purpose

“What’s happening?” around you can be spread to everybody through today’s prominent social media. Social media provide a powerful platform that brings to light the latest news, trends and happenings around the world in “near instant” time. Microblogging is a popular Web service that enables users to post small pieces of digital content, such as text, pictures, videos and links to external resources. The raw data from microblogs are indispensable for information extraction, offering a way to single out the physical events and popular topics prevalent in social media. This study aims to present and review the varied methods used for event detection from microblogs. An event is an activity or action with a clear finite duration in which the target entity plays a key role. Event detection helps in the timely understanding of people’s opinions and the actual condition of the detected events.

Design/methodology/approach

This paper presents a study of various approaches adopted for event detection from microblogs. The approaches are reviewed according to the techniques used, applications and the element detected (event or topic).

Findings

The ideas explored, the important observations inferred, the corresponding outcomes and the assessment of results from those approaches are discussed.

Originality/value

The approaches and techniques for event detection are studied in two categories: first, based on the kind of event being detected (physical occurrence or emerging/popular topic) and second, within each category, the approaches are further categorized into supervised and unsupervised techniques.

Article
Publication date: 8 August 2008

Alexander Ivanyukovich, Maurizio Marchese and Fausto Giunchiglia

Abstract

Purpose

The purpose of this paper is to provide support for automation of the annotation process of large corpora of digital content.

Design/methodology/approach

The paper presents and discusses an information extraction pipeline from digital document acquisition to information extraction, processing and management. An overall architecture that supports such an extraction pipeline is detailed and discussed.

Findings

The proposed pipeline is implemented in a working prototype of an autonomous digital library (A‐DL) system called ScienceTreks that: supports a broad range of methods for document acquisition; does not rely on any external information sources and is solely based on the existing information in the document itself and in the overall set in a given digital archive; and provides application programming interfaces (API) to support easy integration of external systems and tools in the existing pipeline.

Practical implications

The proposed A‐DL system can be used in automating end‐to‐end information retrieval and processing, supporting the control and elimination of error‐prone human intervention in the process.

Originality/value

High quality automatic metadata extraction is a crucial step in the move from linguistic entities to logical entities, relation information and logical relations, and therefore to the semantic level of digital library usability. This in turn creates the opportunity for value‐added services within existing and future semantic‐enabled digital library systems.

Details

Online Information Review, vol. 32 no. 4
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 21 June 2023

Debasis Majhi and Bhaskar Mukherjee

Abstract

Purpose

The purpose of this study is to identify research fronts in library and information science (LIS) where natural language processing (NLP) is being applied significantly, by analysing highly cited core papers adjusted for the age of each paper.

Design/methodology/approach

By mining international databases, 3,087 core papers that received at least 5% of the total citations were identified. From the mean age of these core papers in years and the total citations received, a CPT (citation/publication/time) value was calculated for all 20 fronts to understand how much attention each front is receiving among peers relative to the others over time. Finally, one theme article was identified from each of these 20 fronts.
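The abstract does not give the exact CPT formula. The sketch below reads the name literally, as citations per publication per year of age, which is an assumption and may differ from the authors' definition. The function name and the sample figures are hypothetical.

```python
def cpt(citations, ages_in_years):
    """Citations per publication per year, read literally from the name
    'citation/publication/time'. ASSUMPTION: the paper's exact formula
    is not stated in the abstract and may differ."""
    if len(citations) != len(ages_in_years) or not citations:
        raise ValueError("need one age per paper and at least one paper")
    total_citations = sum(citations)
    publications = len(citations)
    mean_age = sum(ages_in_years) / publications
    return total_citations / publications / mean_age


# Hypothetical front with three core papers (made-up numbers).
front_value = cpt(citations=[10, 20, 30], ages_in_years=[2, 4, 6])
```

Under this reading, a front whose core papers are young but already well cited gets a higher value than an older front with the same citation count, which matches the age adjustment described in the abstract.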

Findings

Bidirectional encoder representations from transformers (CPT value 1.608), followed by sentiment analysis (CPT 1.292), received the highest attention in NLP research. The top contributors in these fronts are Columbia University, New York, among universities; the Journal of the American Medical Informatics Association among journals; the USA, followed by the People’s Republic of China, among countries; and Xu, H. of the University of Texas among authors. NLP applications are found to boost the performance of digital libraries and automated library systems in the digital environment.

Practical implications

Any research front identified in the findings of this paper may be used as a basis by researchers who intend to perform extensive research on NLP.

Originality/value

To the best of the authors’ knowledge, the methodology adopted in this paper is the first of its kind in which a meta-analysis approach has been used to understand the research fronts in a subfield like NLP for a broad domain like LIS.

Details

Digital Library Perspectives, vol. 39 no. 3
Type: Research Article
ISSN: 2059-5816

Article
Publication date: 14 August 2017

Wei Xu, Lingyu Liu and Wei Shang

Abstract

Purpose

Timely detection of emergency events and effective tracking of corresponding public opinions are critical in emergency management. As media are immediate sources of information on emergencies, the purpose of this paper is to propose cross-media analytics to detect and track emergency events and provide decision support for government and emergency management departments.

Design/methodology/approach

In this paper, a novel emergency event detection and opinion mining method is proposed for emergency management using cross-media analytics. In the proposed approach, an event detection module is constructed to discover emergency events based on cross-media analytics, and after the detected event is confirmed as an emergency event, an opinion mining module is used to analyze public sentiments and then generate public sentiment time series for early warning via a semantic expansion technique.
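The sentiment time series step can be pictured with a small lexicon-based sketch. It is an illustration only: the word lists and posts are invented, and the domain entries are added by hand, whereas the paper derives them through its own semantic expansion technique.

```python
from collections import defaultdict


def score(text, lexicon):
    """Sum of lexicon polarities for the whitespace-separated tokens."""
    return sum(lexicon.get(token, 0) for token in text.lower().split())


def sentiment_series(posts, lexicon):
    """Mean sentiment per day for (day, text) pairs, sorted by day."""
    by_day = defaultdict(list)
    for day, text in posts:
        by_day[day].append(score(text, lexicon))
    return {day: sum(v) / len(v) for day, v in sorted(by_day.items())}


# Hypothetical general lexicon and hand-written domain extension.
general = {"good": 1, "bad": -1}
domain = {"evacuated": -1, "contained": 1}
posts = [
    ("2017-08-01", "fire bad residents evacuated"),
    ("2017-08-01", "fire contained good"),
    ("2017-08-02", "area evacuated"),
]

baseline = sentiment_series(posts, general)
expanded = sentiment_series(posts, {**general, **domain})
```

With the general lexicon alone, the second day scores as neutral. Once the domain terms are included it scores as negative, which is the kind of misclassification a domain dictionary is meant to remove.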

Findings

Empirical results indicate that a specific emergency can be detected and that public opinion can be tracked effectively and efficiently using cross-media analytics. In addition, the proposed system can be used for decision support and real-time response for government and emergency management departments.

Research limitations/implications

This paper takes full advantage of cross-media information and proposes novel emergency event detection and opinion mining methods for emergency management using cross-media analytics. The empirical analysis results illustrate the efficiency of the proposed method.

Practical implications

The proposed method can be applied for detection of emergency events and tracking of public opinions for emergency decision support and governmental real-time response.

Originality/value

This research work contributes to the design of a decision support system for emergency event detection and opinion mining. In the proposed approaches, emergency events are detected by leveraging cross-media analytics, and public sentiments are measured using an auto-expansion of the domain dictionary in the field of emergency management to eliminate the misclassification of the general dictionary and to make the quantization more accurate.

Details

Online Information Review, vol. 41 no. 4
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 15 January 2018

Wei Lu, Heng Ding and Jiepu Jiang

Abstract

Purpose

The purpose of this paper is to utilize document expansion techniques for improving image representation and retrieval. This paper proposes a concise framework for tag-based image retrieval (TBIR).

Design/methodology/approach

The proposed approach includes three core components: a strategy of selecting expansion (similar) images from the whole corpus (e.g. cluster-based or nearest neighbor-based); a technique for assessing image similarity, which is adopted for selecting expansion images (text, image, or mixed); and a model for matching the expanded image representation with the search query (merging or separate).
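A toy version of the nearest neighbor-based variant may help make the three components concrete. This is a sketch under stated assumptions, not the paper's model: similarity is Jaccard overlap of tag sets (a text-only choice), the expanded representation simply adds neighbour tags weighted by similarity, and the corpus and all names are hypothetical.

```python
def jaccard(a, b):
    """Tag-set overlap, used here as a stand-in similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand(image_id, corpus, k=2):
    """Return tag weights for an image after adding its k nearest neighbours."""
    tags = corpus[image_id]
    neighbours = sorted(
        ((jaccard(tags, other), other_id)
         for other_id, other in corpus.items() if other_id != image_id),
        reverse=True,
    )[:k]
    weights = {tag: 1.0 for tag in tags}  # the image's own tags
    for sim, other_id in neighbours:
        if sim > 0:  # ignore neighbours with no tag overlap
            for tag in corpus[other_id]:
                weights[tag] = weights.get(tag, 0.0) + sim
    return weights


def match(query_tags, weights):
    """Score a query against an (expanded) image representation."""
    return sum(weights.get(tag, 0.0) for tag in query_tags)


# Hypothetical three-image corpus.
corpus = {
    "img1": {"beach", "sea"},
    "img2": {"beach", "sea", "sunset"},
    "img3": {"city", "night"},
}
expanded_img1 = expand("img1", corpus, k=1)
sunset_score = match({"sunset"}, expanded_img1)
```

Before expansion, "img1" cannot match the query "sunset" at all. After borrowing tags from its nearest neighbour it receives a non-zero score, which illustrates how expansion can help with tag sparsity and vocabulary mismatch.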

Findings

The results show that applying the proposed method yields significant improvements in effectiveness: the method performs better at the top of the ranking and greatly improves some topics that score zero in the baseline. Moreover, the nearest neighbor-based expansion strategy outperforms the cluster-based expansion strategy, and using image features to select expansion images is better than using text features in most cases. The separate method for calculating the augmented probability P(q|RD) is able to remove the negative influence of erroneous images in RD.

Research limitations/implications

Although these methods outperform the baseline only at the top of the ranking rather than across the entire ranked list, TBIR on mobile platforms can still benefit from this approach.

Originality/value

Unlike former studies addressing the sparsity, vocabulary mismatch, and tag relatedness in TBIR individually, the approach proposed by this paper addresses all these issues with a single document expansion framework. It is a comprehensive investigation of document expansion techniques in TBIR.

Details

Aslib Journal of Information Management, vol. 70 no. 1
Type: Research Article
ISSN: 2050-3806
