Search results
1 – 10 of 335
Oussama Ayoub, Christophe Rodrigues and Nicolas Travers
Abstract
Purpose
This paper aims to manage the word gap in information retrieval (IR), especially for long documents belonging to specific domains. With the continuous growth of text data that modern IR systems have to manage, solutions are needed to efficiently find the best set of documents for a given request. The words used to describe a query can differ from those used in related documents. Despite their closeness in meaning, nonoverlapping words are challenging for IR systems. This word gap becomes significant for long documents from specific domains.
Design/methodology/approach
To generate new words for a document, a deep learning (DL) masked language model is used to infer related words. The DL models used are pretrained on massive text data and carry common or domain-specific knowledge to propose a better document representation.
Findings
The authors evaluate the approach of this study on specific IR domains with long documents to show the genericity of the proposed model and achieve encouraging results.
Originality/value
In this paper, to the best of the authors’ knowledge, an original unsupervised and modular IR system based on recent DL methods is introduced.
Mihaela Dinsoreanu and Rodica Potolea
Abstract
Purpose
The purpose of this paper is to address the challenge of opinion mining in text documents to perform further analysis such as community detection and consistency control. More specifically, we aim to identify and extract opinions from natural language documents and to represent them in a structured manner to identify communities of opinion holders based on their common opinions. Another goal is to rapidly identify similar or contradictory opinions on a target issued by different holders.
Design/methodology/approach
For the opinion extraction problem, we opted for a supervised approach focusing on the feature selection problem to improve our classification results. For the community detection problem, we rely on the Infomap community detection algorithm and the multi-scale community detection framework, applied to a graph representation based on the available opinions and social data.
Findings
The classification performance in terms of precision and recall was significantly improved by adding a set of “meta-features” based on rules that group certain parts of speech (POS) instead of the actual words. To evaluate the community detection feature, we used two quality metrics: network modularity and normalized mutual information (NMI). We evaluated seven one-target similarity functions and ten multi-target aggregation functions and concluded that linear functions perform poorly for data sets with multiple targets, while functions that calculate the average similarity have greater resilience to noise.
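As an illustration of the NMI metric named above, the following is a minimal sketch, not the authors' implementation. It assumes that each partition is given as a list of community labels (one per node) and that the mutual information is normalized by the arithmetic mean of the two entropies, which is one common convention; the function name is hypothetical.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two partitions of the same nodes.

    Assumption: normalization by the arithmetic mean of the entropies,
    i.e. NMI = 2 * I(A; B) / (H(A) + H(B)).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same nodes")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    # I(A; B) = sum over co-occurring labels of p(a, b) * log(p(a, b) / (p(a) * p(b)))
    mutual_info = sum(
        (c / n) * math.log((c * n) / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )
    if h_a == 0 and h_b == 0:
        return 1.0  # both partitions place every node in a single community
    return 2 * mutual_info / (h_a + h_b)
```

Two partitions that differ only in how the communities are named score 1, while statistically independent partitions score 0.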
Originality/value
Although our solution relies on existing approaches, we managed to adapt and integrate them in an efficient manner. Based on the initial experimental results obtained, we managed to integrate original enhancements to improve the performance of the obtained results.
Chuanming Yu, Haodong Xue, Manyi Wang and Lu An
Abstract
Purpose
Owing to the uneven distribution of annotated corpus among different languages, it is necessary to bridge the gap between low resource languages and high resource languages. From the perspective of entity relation extraction, this paper aims to extend the knowledge acquisition task from a single language context to a cross-lingual context, and to improve the relation extraction performance for low resource languages.
Design/methodology/approach
This paper proposes a cross-lingual adversarial relation extraction (CLARE) framework, which decomposes cross-lingual relation extraction into parallel corpus acquisition and adversarial adaptation relation extraction. Based on the proposed framework, this paper conducts extensive experiments in two tasks, i.e. the English-to-Chinese and the English-to-Arabic cross-lingual entity relation extraction.
Findings
The Macro-F1 values of the optimal models in the two tasks are 0.8801 and 0.7899, respectively, indicating that the proposed CLARE framework can significantly improve entity relation extraction for low resource languages. The experimental results suggest that the proposed framework can effectively transfer the corpus as well as the annotated tags from English to Chinese and Arabic. This study reveals that the proposed approach is less labour intensive and more effective in cross-lingual entity relation extraction than the manual method. It shows that this approach has high generalizability among different languages.
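For readers unfamiliar with the metric reported above, Macro-F1 is the unweighted mean of the per-class F1 scores. The sketch below shows the standard definition only; the abstract does not describe the paper's exact evaluation protocol, and the function name and the relation labels in the usage note are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    labels = set(y_true) | set(y_pred)
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); a class with no hits at all scores 0
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

For example, with gold labels `["founder", "founder", "employee", "none"]` and predictions `["founder", "employee", "employee", "none"]`, the per-class F1 scores are 2/3, 2/3 and 1, so the Macro-F1 is 7/9.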
Originality/value
The research results are of great significance for improving the performance of the cross-lingual knowledge acquisition. The cross-lingual transfer may greatly reduce the time and cost of the manual construction of the multi-lingual corpus. It sheds light on the knowledge acquisition and organization from the unstructured text in the era of big data.
Farshid Mirzaalian and Elizabeth Halpenny
Abstract
Purpose
The purpose of this paper is to provide a review of hospitality and tourism studies that have used social media analytics to collect, examine, summarize and interpret “big data” derived from social media. It proposes improved approaches by documenting past and current analytic practice addressed by the selected studies in social media analytics.
Design/methodology/approach
Studies from the past 18 years were identified and collected from five international electronic bibliographic databases. Social media analytics-related terms and keywords in the titles, keywords or abstracts were used to identify relevant articles. Book chapters, conference papers and articles not written in English were excluded from analysis. The preferred reporting items for systematic reviews and meta-analyses (PRISMA) guided the search, and Stieglitz and Dang-Xuan’s (2013) social media analytics framework was adapted to categorize methods reported in each article.
Findings
The research purpose of each study was identified and categorized to better understand the questions social media analytics were being used to address, as well as the frequency of each method’s use. Since 2014, rapid growth of social media analytics was observed, along with an expanded use of multiple analytic methods, including accuracy testing. These factors suggest an increased commitment to and competency in conducting comprehensive and robust social media data analyses. Improved use of methods such as social network analysis, comparative analysis and trend analysis is recommended. Consumer-review networks and social networking sites were the main social media platforms from which data were gathered; simultaneous analysis of multi-platform/sources of data is recommended to improve validity and comprehensive understanding.
Originality/value
This is the first systematic literature review of the application of social media analytics in hospitality and tourism research. The study highlights advancements in social media analytics and recommends an expansion of approaches; common analytical methods such as text analysis and sentiment analysis should be supplemented by infrequently used approaches such as comparative analysis and spatial analysis.
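Purpose
This paper reviews the hospitality and tourism literature that uses social media data analytics. It proposes strategies for improving analytic methods by examining the literature on the related methods.
Design/methodology/approach
The sample comprises literature from the past 18 years drawn from five international online bibliographic databases. The search targeted articles whose titles, keywords or abstracts contain terms related to social media data analytics. Book chapters, conference papers and non-English articles were not included. The preferred reporting items for systematic reviews and meta-analyses (PRISMA) guided the literature search, and the social media analytics framework of Stieglitz and Dang-Xuan (2013) served as the method for classifying the literature.
Findings
This paper reports the research purpose of each study and categorizes the studies systematically to better understand the research questions addressed by social media data analytics and the frequency of use of each method. Since 2014, social media data analytics has grown rapidly, along with other related analytic methods, including accuracy testing. These results indicate growing demand for more comprehensive and robust analytic methods, as well as intense competition. The paper recommends improved methods such as social network analysis, comparative analysis and trend analysis. Consumer-review networks and social networking sites were the main platforms supplying social media data. The paper recommends that multi-source data be analyzed simultaneously to improve validity and comprehensive understanding.
Originality/value
This is the first systematic literature review of social media data analytics in the field of hospitality and tourism. It highlights the advancement of social media data analytics and the comprehensiveness gained by extending its methods; common analytic methods such as text analysis and sentiment analysis should be combined with less common ones such as comparative analysis and spatial analysis.
Keywords – Comparative analysis, Sentiment analysis, User-generated content, Social media analytics, Topic modeling, Spatial analysis, Text analysis
Article type – Literature review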
Issa Alsmadi and Keng Hoon Gan
Abstract
Purpose
Rapid developments in social networks and their usage in everyday life have caused an explosion in the number of short electronic documents. Thus, the need to classify this type of document based on its content has significant implications for many applications. Classifying these documents into relevant classes according to their text content is of interest for many practical reasons. Short-text classification is an essential step in many applications, such as spam filtering, sentiment analysis, Twitter personalization, customer review and many other applications related to social networks. Reviews on short text and its applications are limited. Thus, this paper aims to discuss the characteristics of short text and its challenges and difficulties in classification. The paper attempts to introduce all stages of the classification process, the techniques used in each stage and the possible development trend in each stage.
Design/methodology/approach
The paper is a review of the main aspects of short-text classification. It is structured according to the stages of the classification task.
Findings
This paper discusses related issues and approaches to these problems. Further research could be conducted to address the challenges in short texts and avoid poor accuracy in classification. Low performance can be addressed by using optimized solutions, such as genetic algorithms, which are powerful in enhancing the quality of selected features. Soft computing solutions include fuzzy logic, which makes short-text problems a promising area of research.
Originality/value
Using a powerful short-text classification method significantly affects many applications in terms of efficiency enhancement. Current solutions still have low performance, implying the need for improvement. This paper discusses related issues and approaches to these problems.
Dharini Ramachandran and Parvathi Ramasubramanian
Abstract
Purpose
“What’s happening?” around you can be spread to everybody through today’s highly prominent social media. Social media provide a powerful platform that brings to light the latest news, trends and happenings around the world in “near instant” time. Microblogging is a popular Web service that enables users to post small pieces of digital content, such as text, pictures, videos and links to external resources. Raw microblog data are indispensable for information extraction, offering a way to single out the physical events and popular topics prevalent in social media. This study aims to present and review the varied methods used for event detection from microblogs. An event is an activity or action with a clear finite duration in which the target entity plays a key role. Event detection helps in the timely understanding of people’s opinions and the actual conditions of the detected events.
Design/methodology/approach
This paper presents a study of various approaches adopted for event detection from microblogs. The approaches are reviewed according to the techniques used, applications and the element detected (event or topic).
Findings
Various ideas explored, important observations inferred, corresponding outcomes and assessment of results from those approaches are discussed.
Originality/value
The approaches and techniques for event detection are studied in two categories: first, based on the kind of event being detected (physical occurrence or emerging/popular topic) and second, within each category, the approaches are further categorized into supervised and unsupervised techniques.
Alexander Ivanyukovich, Maurizio Marchese and Fausto Giunchiglia
Abstract
Purpose
The purpose of this paper is to provide support for automation of the annotation process of large corpora of digital content.
Design/methodology/approach
The paper presents and discusses an information extraction pipeline from digital document acquisition to information extraction, processing and management. An overall architecture that supports such an extraction pipeline is detailed and discussed.
Findings
The proposed pipeline is implemented in a working prototype of an autonomous digital library (A‐DL) system called ScienceTreks that: supports a broad range of methods for document acquisition; does not rely on any external information sources and is solely based on the existing information in the document itself and in the overall set in a given digital archive; and provides application programming interfaces (API) to support easy integration of external systems and tools in the existing pipeline.
Practical implications
The proposed A‐DL system can be used in automating end‐to‐end information retrieval and processing, supporting the control and elimination of error‐prone human intervention in the process.
Originality/value
High quality automatic metadata extraction is a crucial step in the move from linguistic entities to logical entities, relation information and logical relations, and therefore to the semantic level of digital library usability. This in turn creates the opportunity for value‐added services within existing and future semantic‐enabled digital library systems.
Debasis Majhi and Bhaskar Mukherjee
Abstract
Purpose
The purpose of this study is to identify the research fronts in library and information science (LIS) where natural language processing (NLP) is being applied significantly, by analysing highly cited core papers adjusted for the age of each paper.
Design/methodology/approach
By mining international databases, 3,087 core papers that received at least 5% of the total citations have been identified. By calculating the mean years of these core papers and the total citations received, a CPT (citation/publication/time) value was calculated for each of 20 fronts to understand how a front is receiving relatively greater attention among peers over time. One theme article was finally identified from each of these 20 fronts.
Findings
Bidirectional encoder representations from transformers (CPT value 1.608), followed by sentiment analysis (CPT value 1.292), received the highest attention in NLP research. The leaders in these fronts are Columbia University, New York, among universities; the Journal of the American Medical Informatics Association among journals; the USA, followed by the People's Republic of China, among countries; and Xu, H., University of Texas, among authors. NLP applications are found to boost the performance of digital libraries and automated library systems in the digital environment.
Practical implications
Any research front identified in the findings of this paper may be used as a base by researchers who intend to perform extensive research on NLP.
Originality/value
To the best of the authors’ knowledge, the methodology adopted in this paper is the first of its kind in which a meta-analysis approach has been used to understand the research fronts in a subfield such as NLP for a broad domain such as LIS.
Wei Xu, Lingyu Liu and Wei Shang
Abstract
Purpose
Timely detection of emergency events and effective tracking of corresponding public opinions are critical in emergency management. As media are immediate sources of information on emergencies, the purpose of this paper is to propose cross-media analytics to detect and track emergency events and provide decision support for government and emergency management departments.
Design/methodology/approach
In this paper, a novel emergency event detection and opinion mining method is proposed for emergency management using cross-media analytics. In the proposed approach, an event detection module is constructed to discover emergency events based on cross-media analytics, and after the detected event is confirmed as an emergency event, an opinion mining module is used to analyze public sentiments and then generate public sentiment time series for early warning via a semantic expansion technique.
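The opinion mining step described above can be pictured with a minimal dictionary-based sketch. This is not the authors' method: the tiny lexicon, the whitespace tokenization, the per-day averaging and all function names are assumptions made for illustration, and the semantic expansion of the domain dictionary is omitted.

```python
from collections import defaultdict

# Hypothetical toy lexicon; a real system would use a domain dictionary.
LEXICON = {
    "safe": 1, "rescued": 1, "relief": 1,
    "panic": -1, "damage": -1, "trapped": -1,
}


def sentiment_score(text, lexicon):
    """Sum the polarity of every lexicon word found in the text."""
    tokens = (token.strip(".,!?") for token in text.lower().split())
    return sum(lexicon.get(token, 0) for token in tokens)


def sentiment_time_series(posts, lexicon):
    """Average the sentiment of posts per day.

    posts: iterable of (date_string, text) pairs.
    Returns a list of (date_string, mean_score) sorted by date string.
    """
    scores_by_day = defaultdict(list)
    for day, text in posts:
        scores_by_day[day].append(sentiment_score(text, lexicon))
    return [
        (day, sum(scores) / len(scores))
        for day, scores in sorted(scores_by_day.items())
    ]
```

A sustained drop in the daily mean could then serve as a simple early-warning signal, although the thresholds and the warning rule are outside the scope of this sketch.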
Findings
Empirical results indicate that a specific emergency can be detected and that public opinion can be tracked effectively and efficiently using cross-media analytics. In addition, the proposed system can be used for decision support and real-time response for government and emergency management departments.
Research limitations/implications
This paper takes full advantage of cross-media information and proposes novel emergency event detection and opinion mining methods for emergency management using cross-media analytics. The empirical analysis results illustrate the efficiency of the proposed method.
Practical implications
The proposed method can be applied for detection of emergency events and tracking of public opinions for emergency decision support and governmental real-time response.
Originality/value
This research work contributes to the design of a decision support system for emergency event detection and opinion mining. In the proposed approaches, emergency events are detected by leveraging cross-media analytics, and public sentiments are measured using an auto-expansion of the domain dictionary in the field of emergency management to eliminate the misclassification of the general dictionary and to make the quantization more accurate.
Wei Lu, Heng Ding and Jiepu Jiang
Abstract
Purpose
The purpose of this paper is to utilize document expansion techniques for improving image representation and retrieval. This paper proposes a concise framework for tag-based image retrieval (TBIR).
Design/methodology/approach
The proposed approach includes three core components: a strategy of selecting expansion (similar) images from the whole corpus (e.g. cluster-based or nearest neighbor-based); a technique for assessing image similarity, which is adopted for selecting expansion images (text, image, or mixed); and a model for matching the expanded image representation with the search query (merging or separate).
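The three components can be sketched in a simplified form. The code below is an illustrative assumption rather than the paper's model: it selects expansion images with a nearest-neighbor strategy, measures similarity with the Jaccard overlap of tags (a text feature), and merges neighbor tags into the representation with a hypothetical `neighbor_weight`. It does not implement the probabilistic matching model.

```python
from collections import Counter


def jaccard(tags_a, tags_b):
    """Jaccard overlap between two tag collections."""
    a, b = set(tags_a), set(tags_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand_image_tags(image_id, corpus, k=2, neighbor_weight=0.5):
    """Weighted tags for one image, merged with tags of its k nearest neighbors.

    corpus: dict mapping image id -> list of tags (hypothetical data layout).
    """
    own_tags = corpus[image_id]
    # Rank every other image by its tag similarity to the target image.
    neighbors = sorted(
        (other for other in corpus if other != image_id),
        key=lambda other: jaccard(own_tags, corpus[other]),
        reverse=True,
    )[:k]
    expanded = Counter({tag: 1.0 for tag in own_tags})
    for other in neighbors:
        similarity = jaccard(own_tags, corpus[other])
        if similarity == 0:
            continue  # unrelated images contribute nothing
        for tag in corpus[other]:
            expanded[tag] += neighbor_weight * similarity
    return dict(expanded)
```

With a toy corpus in which `img1` is tagged `beach, sunset`, expanding `img1` adds tags such as `sea` from similar images at a lower weight than its own tags, which is how expansion can address sparse tagging and vocabulary mismatch.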
Findings
The results show that applying the proposed method yields significant improvements in effectiveness. The method performs better at the top of the ranking and greatly improves some topics that score zero in the baseline. Moreover, the nearest neighbor-based expansion strategy outperforms the cluster-based expansion strategy; using image features to select expansion images is better than using text features in most cases; and the separate method for calculating the augmented probability P(q|RD) is able to remove the negative influence of erroneous images in RD.
Research limitations/implications
Although these methods outperform the baseline only at the top of the ranking rather than across the entire ranked list, TBIR on mobile platforms can still benefit from this approach.
Originality/value
Unlike former studies addressing the sparsity, vocabulary mismatch, and tag relatedness in TBIR individually, the approach proposed by this paper addresses all these issues with a single document expansion framework. It is a comprehensive investigation of document expansion techniques in TBIR.