Search results

1 – 10 of over 4000
Article
Publication date: 6 February 2017

Aytug Onan

Abstract

Purpose

The immense quantity of available unstructured text documents serves as one of the largest sources of information. Text classification is an essential task for many purposes in information retrieval, such as document organization, text filtering and sentiment analysis. Ensemble learning has been extensively studied to construct efficient text classification schemes with higher predictive performance and generalization ability. The purpose of this paper is to provide diversity among the classification algorithms of the ensemble, which is a key issue in ensemble design.

Design/methodology/approach

An ensemble scheme based on hybrid supervised clustering is presented for text classification. In the presented scheme, supervised hybrid clustering, which is based on the cuckoo search algorithm and k-means, is introduced to partition the data samples of each class into clusters so that training subsets with higher diversity can be provided. Each classifier is trained on the diversified training subsets, and the predictions of the individual classifiers are combined by the majority voting rule. The predictive performance of the proposed classifier ensemble is compared to conventional classification algorithms (such as Naïve Bayes, logistic regression, support vector machines and the C4.5 algorithm) and ensemble learning methods (such as AdaBoost, bagging and random subspace) using 11 text benchmarks.
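
The scheme described above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: plain k-means stands in for the hybrid cuckoo search and k-means clustering, a nearest-centroid rule stands in for the base classifiers, and the way subsets are assembled (the i-th cluster of every class forms the i-th training subset) is an assumption, since the abstract does not specify it. All function names and the toy data are hypothetical.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with a deterministic, evenly spaced start.

    Assumption: this replaces the paper's cuckoo search + k-means hybrid.
    """
    centers = [points[i * len(points) // k] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return groups


def train_nearest_centroid(subset):
    """Stand-in base classifier: one centroid per class label."""
    by_label = {}
    for point, label in subset:
        by_label.setdefault(label, []).append(point)
    return {label: mean(pts) for label, pts in by_label.items()}


def predict_one(model, point):
    """Predict the label whose centroid is closest to the point."""
    return min(model, key=lambda label: sq_dist(point, model[label]))


def build_ensemble(data, k):
    """Cluster each class separately, then train one model per subset."""
    by_label = {}
    for point, label in data:
        by_label.setdefault(label, []).append(point)
    clusters = {label: kmeans(pts, k) for label, pts in by_label.items()}
    models = []
    for i in range(k):
        # Assumed subset rule: i-th cluster of every class.
        subset = [(p, label) for label in clusters for p in clusters[label][i]]
        models.append(train_nearest_centroid(subset))
    return models


def ensemble_predict(models, point):
    """Combine the individual predictions by majority vote."""
    votes = Counter(predict_one(m, point) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two classes, each with three natural groups.
TOY_DATA = [((x, y), "a") for y in (0, 5, 10) for x in (0, 1)] + [
    ((x, y), "b") for y in (0, 5, 10) for x in (9, 10)
]

models = build_ensemble(TOY_DATA, k=3)
print(ensemble_predict(models, (2, 4)), ensemble_predict(models, (8, 7)))
```

An odd number of classifiers is used here so that a two-class majority vote cannot tie.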

Findings

The experimental results indicate that the presented classifier ensemble outperforms the conventional classification algorithms and ensemble learning methods for text classification.

Originality/value

The presented ensemble scheme is the first to use supervised clustering to obtain a diverse ensemble for text classification.

Details

Kybernetes, vol. 46 no. 2
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 19 June 2017

Qingchen Qiu, Xuelian Wu, Zhi Liu, Bo Tang, Yuefeng Zhao, Xinyi Wu, Hongliang Zhu and Yang Xin

Abstract

Purpose

This paper aims to provide a framework for supervised hyperspectral classification and to study the traditional flowchart of hyperspectral image (HSI) analysis and processing. HSI technology was proposed many years ago, and its applications have been promoted by technical advancements.

Design/methodology/approach

First, the properties and current situation of hyperspectral technology are summarized. Then, this paper introduces a series of common classification approaches. In addition, a comparison of different classification approaches on real hyperspectral data is conducted. Finally, this survey presents a discussion on the classification results and points out the classification development tendency.

Findings

The core of this survey is to review the state of the art of classification for hyperspectral images, to study the performance and efficiency of certain implementation measures and to point out the challenges that still exist.

Originality/value

The study categorized the supervised classification for hyperspectral images, demonstrated the comparisons among these methods and pointed out the challenges that still exist.

Details

Sensor Review, vol. 37 no. 3
Type: Research Article
ISSN: 0260-2288

Article
Publication date: 1 October 2018

Maha Al-Yahya

Abstract

Purpose

In the context of information retrieval, text genre is as important as its content, and knowledge of the text genre enhances the search engine features by providing customized retrieval. The purpose of this study is to explore and evaluate the use of stylometric analysis, a quantitative analysis of the linguistic features of text, to support the task of automated text genre detection for Classical Arabic text.

Design/methodology/approach

Unsupervised clustering and supervised classification were applied to the King Saud University Corpus of Classical Arabic texts (KSUCCA), using the most frequent words (MFWs) in the corpus as stylometric features. Four popular distance measures established in stylometric research were evaluated for the genre detection task.
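
As a rough illustration of MFW-based stylometric features, the sketch below builds relative-frequency profiles over the most frequent words and compares texts with two distance measures. The abstract does not name the four measures evaluated, so Manhattan and cosine distance are assumptions chosen for illustration. The function names and the short English toy texts are hypothetical and are not drawn from KSUCCA.

```python
import math
from collections import Counter


def most_frequent_words(texts, n):
    """Return the n most frequent words across all texts."""
    counts = Counter(word for text in texts for word in text.split())
    return [word for word, _ in counts.most_common(n)]


def mfw_profile(text, vocabulary):
    """Relative frequency of each vocabulary word in one text."""
    words = text.split()
    counts = Counter(words)
    return [counts[w] / len(words) for w in vocabulary]


def manhattan(u, v):
    """Sum of absolute differences between two profiles."""
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_distance(u, v):
    """One minus the cosine similarity; 1.0 if either profile is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0 else 1.0 - dot / norm


# Hypothetical toy texts: two narrative-style texts and one legal-style text.
TEXTS = {
    "narrative_1": "and he said and he went and he saw",
    "narrative_2": "and she said and she went and she came",
    "legal_1": "the law of the land is the rule of the court",
}

vocab = most_frequent_words(TEXTS.values(), n=4)
profiles = {name: mfw_profile(text, vocab) for name, text in TEXTS.items()}

for other in ("narrative_2", "legal_1"):
    print(
        other,
        round(manhattan(profiles["narrative_1"], profiles[other]), 3),
        round(cosine_distance(profiles["narrative_1"], profiles[other]), 3),
    )
```

In this toy example the two narrative-style texts end up closer to each other than to the legal-style text under both measures. Genre clustering relies on the same signal at corpus scale.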

Findings

The results of the experiments show that stylometry-based genre clustering and classification align well with human-defined genre. The evidence suggests that genre style signals exist for Classical Arabic and can be used to support the task of automated genre detection.

Originality/value

This work targets the task of genre detection in Classical Arabic text using stylometric features, an approach that has previously been applied only to Arabic authorship attribution. The study also provides a comparison of four distance measures used in stylometric analysis on the KSUCCA, a corpus with over 50 million words of Classical Arabic, using clustering and classification.

Details

The Electronic Library, vol. 36 no. 5
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 27 November 2020

Hoda Daou

Abstract

Purpose

Social media is characterized by its volume, its speed of generation and its easy and open access, all of which make it an important source of information that provides valuable insights. Content characteristics such as valence and emotions play an important role in the diffusion of information; in fact, emotions can shape the virality of topics in social media. The purpose of this research is to fill the gap in event detection applied to online content by incorporating sentiment, more specifically strong sentiment, as the main attribute in identifying relevant content.

Design/methodology/approach

The study proposes a methodology based on strong sentiment classification using machine learning and an advanced scoring technique.

Findings

The results show the following key findings: the proposed methodology is able to automatically capture trending topics and achieve better classification compared to state-of-the-art topic detection algorithms. In addition, the methodology is not context specific; it is able to successfully identify important events from various datasets within the context of politics, rallies, various news and real tragedies.

Originality/value

This study fills the gap in topic detection applied to online content by building on the assumption that important events trigger strong sentiment in society. In addition, classic topic detection algorithms require tuning in terms of the number of topics to search for. This methodology involves scoring the posts and, thus, does not require limiting the number of topics; it also allows ordering the topics by relevance based on the value of the score.

Peer review

The peer review history for this article is available at: https://publons.com/publon/10.1108/OIR-12-2019-0373

Details

Online Information Review, vol. 45 no. 1
Type: Research Article
ISSN: 1468-4527

Content available
Article
Publication date: 29 November 2019

Francisco Villarroel Ordenes and Shunyuan Zhang

Abstract

Purpose

The purpose of this paper is to describe and position the state-of-the-art of text and image mining methods in business research. By providing a detailed conceptual and technical review of both methods, it aims to increase their utilization in service research.

Design/methodology/approach

In a first stage, the authors review business literature in marketing, operations and management concerning the use of text and image mining methods. In a second stage, the authors identify and analyze empirical papers that used text and image mining methods in services journals and premier business journals. Finally, avenues for further research in services are provided.

Findings

The manuscript identifies seven text mining methods and describes the approaches, processes, techniques and algorithms involved in their implementation. Four of these methods are positioned similarly for image mining. There are 39 papers using text mining in service research, with a focus on measuring consumer sentiment, experiences and service quality. Due to the nonexistent use of image mining in service journals, the authors review its application in marketing and management and suggest ideas for further research in services.

Research limitations/implications

This manuscript focuses on the different methods and their implementation in service research, but it does not offer a complete review of business literature using text and image mining methods.

Practical implications

The results have a number of implications for the discipline that are presented and discussed. The authors provide research directions using text and image mining methods in service priority areas such as artificial intelligence, frontline employees, transformative consumer research and customer experience.

Originality/value

The manuscript provides an introduction to text and image mining methods to service researchers and practitioners interested in the analysis of unstructured data. This paper provides several suggestions concerning the use of new sources of data (e.g. customer reviews, social media images, employee reviews and emails), measurement of new constructs (beyond sentiment and valence) and the use of more recent methods (e.g. deep learning).

Details

Journal of Service Management, vol. 30 no. 5
Type: Research Article
ISSN: 1757-5818

Article
Publication date: 27 November 2019

Farshid Mirzaalian and Elizabeth Halpenny

Abstract

Purpose

The purpose of this paper is to provide a review of hospitality and tourism studies that have used social media analytics to collect, examine, summarize and interpret “big data” derived from social media. It proposes improved approaches by documenting past and current analytic practice addressed by the selected studies in social media analytics.

Design/methodology/approach

Studies from the past 18 years were identified and collected from five international electronic bibliographic databases. Social media analytics-related terms and keywords in the titles, keywords or abstracts were used to identify relevant articles. Book chapters, conference papers and articles not written in English were excluded from analysis. The preferred reporting items for systematic reviews and meta-analyses (PRISMA) guided the search, and Stieglitz and Dang-Xuan’s (2013) social media analytics framework was adapted to categorize methods reported in each article.

Findings

The research purpose of each study was identified and categorized to better understand the questions social media analytics were being used to address, as well as the frequency of each method’s use. Since 2014, rapid growth of social media analytics was observed, along with an expanded use of multiple analytic methods, including accuracy testing. These factors suggest an increased commitment to and competency in conducting comprehensive and robust social media data analyses. Improved use of methods such as social network analysis, comparative analysis and trend analysis is recommended. Consumer-review networks and social networking sites were the main social media platforms from which data were gathered; simultaneous analysis of multi-platform/sources of data is recommended to improve validity and comprehensive understanding.

Originality/value

This is the first systematic literature review of the application of social media analytics in hospitality and tourism research. The study highlights advancements in social media analytics and recommends an expansion of approaches; common analytical methods such as text analysis and sentiment analysis should be supplemented by infrequently used approaches such as comparative analysis and spatial analysis.

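Research purpose

This paper reviews the hospitality and tourism literature that uses social media data analytics. By examining the literature on the related analytic methods, it proposes strategies for improving them.

Research design/methodology/approach

The sample comprises literature from the past 18 years drawn from five international online bibliographic databases. The search targeted articles whose titles, keywords or abstracts contained terms related to social media data analytics. Book chapters, conference papers and non-English articles were excluded. The preferred reporting items for systematic reviews and meta-analyses (PRISMA) guided the literature search, and Stieglitz and Dang-Xuan’s (2013) social media analytics framework served as the method for categorizing the literature.

Research findings

This paper reports the research purpose of each study and categorizes the studies systematically to better understand the research questions addressed by social media data analytics and how frequently each method is used. Since 2014, social media data analytics has grown rapidly, along with other related analytic methods, including accuracy testing. These results indicate growing demand for more comprehensive and robust analytic methods, as well as intense competition. The paper recommends improved methods such as social network analysis, comparative analysis and trend analysis. Consumer-review networks and social networking sites have become the main platforms supplying social media data. The paper recommends analyzing data from multiple sources simultaneously to improve validity and comprehensive understanding.

Research originality/value

This is the first systematic literature review of social media data analytics in the hospitality and tourism field. It highlights the advancement of social media data analytics and the need to broaden its methods; common analytic methods such as text analysis and sentiment analysis should be combined with less common ones such as comparative analysis and spatial analysis.

Keywords: comparative analysis, sentiment analysis, user-generated content, social media analytics, topic modeling, spatial analysis, text analysis

Paper type: Literature review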

Article
Publication date: 4 June 2020

Antonia Michael and Jan Eloff

Abstract

Purpose

Malicious activities conducted by disgruntled employees via an email platform can cause profound damage to an organization such as financial and reputational losses. This threat is known as an “Insider IT Sabotage” threat. This involves employees misusing their access rights to harm the organization. Events leading up to the attack are not technical but rather behavioural. The problem is that owing to the high volume and complexity of emails, the risk of insider IT sabotage cannot be diminished with rule-based approaches.

Design/methodology/approach

Malicious human behaviours that insiders within the insider IT sabotage category would possess are studied and mapped to phrases that would appear in email communications. A large email data set is classified according to behavioural characteristics of these employees. Machine learning algorithms are used to identify occurrences of this insider threat type. The accuracy of these approaches is measured.
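
The mapping from behaviours to email phrases can be pictured as a simple lookup that labels each message. The sketch below is only illustrative: the behaviour categories, the phrases and the function name are hypothetical, and the abstract does not state which phrases or which machine learning algorithms the authors used. Labels produced this way could serve as training targets for a classifier.

```python
# Hypothetical mapping from behavioural characteristics to indicative phrases.
BEHAVIOUR_PHRASES = {
    "disgruntlement": ["fed up with", "not paid enough"],
    "threat": ["they will regret", "get back at"],
}


def label_email(body, phrase_map=BEHAVIOUR_PHRASES):
    """Return the sorted list of behaviours whose phrases occur in the email."""
    text = body.lower()
    return sorted(
        behaviour
        for behaviour, phrases in phrase_map.items()
        if any(phrase in text for phrase in phrases)
    )


print(label_email("I am fed up with this place and they will regret it."))
print(label_email("Lunch at noon?"))
```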

Findings

It is shown in this paper that suspicious behaviour of disgruntled employees can be discovered by means of machine intelligence techniques. The output of the machine learning classifier depends mainly on the depth and quality of the phrases and behaviour analysis, cleansing and number of email attributes examined. This process of labelling content in isolation could be improved if other attributes of the email data are included, such that a confidence score can be computed for each user.

Originality/value

This research presents a novel approach, showing that a prototype can be created to automate the detection of insider IT sabotage within email systems and so mitigate the risk within organizations.

Details

Information & Computer Security, vol. 28 no. 4
Type: Research Article
ISSN: 2056-4961

Book part
Publication date: 13 June 2013

Li Xiao, Hye-jin Kim and Min Ding

Abstract

Purpose – The advancement of multimedia technology has spurred the use of multimedia in business practice. The adoption of audio and visual data will accelerate as marketing scholars become more aware of the value of audio and visual data and the technologies required to reveal insights into marketing problems. This chapter aims to introduce marketing scholars to this field of research.

Design/methodology/approach – This chapter reviews the current technology in audio and visual data analysis and discusses rewarding research opportunities in marketing using these data.

Findings – Compared with traditional data like survey and scanner data, audio and visual data provide richer information and are easier to collect. Given this superiority, data availability, feasibility of storage and increasing computational power, we believe that these data will contribute to better marketing practices with the help of marketing scholars in the near future.

Practical implications – The adoption of audio and visual data in marketing practices will help practitioners to get better insights into marketing problems and thus make better decisions.

Value/originality – This chapter makes a first attempt in the marketing literature to review the current technology in audio and visual data analysis and proposes promising applications of such technology. We hope it will inspire scholars to utilize audio and visual data in marketing research.

Details

Review of Marketing Research
Type: Book
ISBN: 978-1-78190-761-0

Article
Publication date: 1 July 2019

Nora Madi, Rawan Al-Matham and Hend Al-Khalifa

Abstract

Purpose

The purpose of this paper is to provide an overall review of grammar checking and relation extraction (RE) literature, their techniques and the open challenges associated with them; and, finally, suggest future directions.

Design/methodology/approach

The review of grammar checking and RE was carried out using the following protocol: we prepared research questions, planned the search strategy, defined paper selection criteria to distinguish relevant works, extracted data from these works and, finally, analyzed and synthesized the data.

Findings

The output of error detection models could be used for creating a profile of a certain writer. Such profiles can be used for author identification, native language identification or even estimating the level of education, to name a few. The automatic extraction of relations could be used to build or complete electronic lexical thesauri and knowledge bases.

Originality/value

Grammar checking is the process of detecting and sometimes correcting erroneous words in the text, while RE is the process of detecting and categorizing predefined relationships between entities or words that were identified in the text. The authors found that the most obvious challenge is the lack of data sets, especially for low-resource languages. Also, the lack of unified evaluation methods hinders the ability to compare results.

Details

Data Technologies and Applications, vol. 53 no. 3
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 20 December 2007

Isak Taksa, Sarah Zelikovitz and Amanda Spink

Abstract

Purpose

The work presented in this paper aims to provide an approach to classifying web logs by personal properties of users.

Design/methodology/approach

The authors describe an iterative system that begins with a small set of manually labeled terms, which are used to label queries from the log. A set of background knowledge related to these labeled queries is acquired by combining web search results on these queries. This background set is used to obtain many terms that are related to the classification task. The system then ranks each of the related terms, choosing those that most fit the personal properties of the users. These terms are then used to begin the next iteration.
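
The iterative loop described above can be sketched as follows. This is an assumption-laden illustration rather than the authors' system: web search results are replaced by a hypothetical in-memory `background` dictionary, related terms are ranked by raw frequency (the abstract does not state the ranking criterion), and only a single class of users is modelled. All names and the toy data are hypothetical.

```python
from collections import Counter


def label_queries(terms, queries):
    """Label every query that shares at least one word with the term set."""
    return [q for q in queries if terms & set(q.split())]


def bootstrap_terms(seed_terms, queries, background, rounds=2, top_n=2):
    """Grow a term set iteratively and return it with the labeled queries."""
    terms = set(seed_terms)
    for _ in range(rounds):
        labeled = label_queries(terms, queries)
        # Gather candidate terms from the background text of labeled queries.
        counts = Counter(
            word
            for q in labeled
            for word in background.get(q, "").split()
            if word not in terms
        )
        # Assumed ranking: keep the most frequent new terms for the next round.
        terms.update(word for word, _ in counts.most_common(top_n))
    return terms, label_queries(terms, queries)


# Hypothetical query log and stand-in "search results" for each query.
QUERIES = [
    "homework help",
    "prom dress",
    "algebra homework",
    "mortgage rates",
    "prom ideas",
]
BACKGROUND = {
    "homework help": "school algebra prom school",
    "algebra homework": "school algebra exam",
    "prom dress": "prom school dance",
    "prom ideas": "prom dance",
    "mortgage rates": "bank loan interest",
}

terms, labeled = bootstrap_terms({"homework"}, QUERIES, BACKGROUND)
print(sorted(terms))
print(labeled)
```

Starting from the single seed term, the loop picks up related terms from the background text and, in this toy log, ends up labeling the prom-related queries as well while leaving the unrelated query unlabeled.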

Findings

The authors identify the difficulties of classifying web logs, by approaching this problem from a machine learning perspective. By applying the approach developed, the authors are able to show that many queries in a large query log can be classified.

Research limitations/implications

Testing results in this type of classification work is difficult, as the true personal properties of web users are unknown. Evaluating the classification results by comparing classified queries to well-known age‐related sites is a direction that is currently being explored.

Practical implications

This research is background work that can be incorporated in search engines or other web‐based applications, to help marketing companies and advertisers.

Originality/value

This research enhances the current state of knowledge in short‐text classification and query log learning.

Details

International Journal of Web Information Systems, vol. 3 no. 4
Type: Research Article
ISSN: 1744-0084
