Search results

1 – 10 of over 6000
Article
Publication date: 6 February 2017

Aytug Onan

Abstract

Purpose

The immense quantity of available unstructured text documents serves as one of the largest sources of information. Text classification can be an essential task for many purposes in information retrieval, such as document organization, text filtering and sentiment analysis. Ensemble learning has been extensively studied to construct efficient text classification schemes with higher predictive performance and generalization ability. The purpose of this paper is to provide diversity among the classification algorithms of the ensemble, which is a key issue in ensemble design.

Design/methodology/approach

An ensemble scheme based on hybrid supervised clustering is presented for text classification. In the presented scheme, supervised hybrid clustering, which is based on the cuckoo search algorithm and k-means, is introduced to partition the data samples of each class into clusters so that training subsets with higher diversity can be provided. Each classifier is trained on the diversified training subsets, and the predictions of individual classifiers are combined by the majority voting rule. The predictive performance of the proposed classifier ensemble is compared to conventional classification algorithms (such as Naïve Bayes, logistic regression, support vector machines and the C4.5 algorithm) and ensemble learning methods (such as AdaBoost, bagging and random subspace) using 11 text benchmarks.
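
The combination step described above, majority voting across ensemble members, can be sketched as follows. This is a minimal illustration rather than the paper's implementation; the function name `majority_vote` and the tie-breaking rule (the label seen first wins) are assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> list:
    """Combine per-classifier label predictions by majority vote.

    predictions[i][j] is the label classifier i assigns to sample j.
    Ties go to the label encountered first (an assumption, not from the paper).
    """
    if not predictions:
        return []
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every classifier must predict every sample")
    combined = []
    for j in range(n_samples):
        votes = Counter(p[j] for p in predictions)
        # most_common lists equal counts in first-encountered order
        combined.append(votes.most_common(1)[0][0])
    return combined
```

In the scheme described, each row of `predictions` would come from a classifier trained on a different cluster-derived training subset.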

Findings

The experimental results indicate that the presented classifier ensemble outperforms the conventional classification algorithms and ensemble learning methods for text classification.

Originality/value

The presented ensemble scheme is the first to use supervised clustering to obtain a diverse ensemble for text classification.

Details

Kybernetes, vol. 46 no. 2
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 19 June 2017

Qingchen Qiu, Xuelian Wu, Zhi Liu, Bo Tang, Yuefeng Zhao, Xinyi Wu, Hongliang Zhu and Yang Xin

Abstract

Purpose

This paper aims to provide a framework of the supervised hyperspectral classification, to study the traditional flowchart of hyperspectral image (HSI) analysis and processing. HSI technology has been proposed for many years, and the applications of this technology were promoted by technical advancements.

Design/methodology/approach

First, the properties and current situation of hyperspectral technology are summarized. Then, this paper introduces a series of common classification approaches. In addition, a comparison of different classification approaches on real hyperspectral data is conducted. Finally, this survey presents a discussion on the classification results and points out the classification development tendency.

Findings

The core of this survey is to review the state of the art of classification for hyperspectral images, to study the performance and efficiency of certain implementation measures and to point out the challenges that still exist.

Originality/value

The study categorized the supervised classification for hyperspectral images, demonstrated the comparisons among these methods and pointed out the challenges that still exist.

Details

Sensor Review, vol. 37 no. 3
Type: Research Article
ISSN: 0260-2288

Article
Publication date: 10 June 2021

Rebecca Tonietto, Lara O’Brien, Cyrus Van Haitsma, Chenyang Su, Nicole Blankertz, Hannah Grace Shaheen Mosiniak, Caleb Short and Heather Ann Dawson

Abstract

Purpose

The University of Michigan (U-M) is planning its course toward carbon neutrality. A key component in U-M carbon accounting is the calculation of carbon sinks via estimation of carbon storage and biosequestration on U-M landholdings. Here, this paper aims to compare multiple remote sensing methods across U-M natural lands and urban campuses to determine an accurate and efficient protocol for land assessment and ecosystem service valuation that other institutions may scale as relevant.

Design/methodology/approach

This paper tested three remote sensing methods to determine land use and land cover (LULC), namely, unsupervised classification, supervised classification and supervised classification incorporating delineated wetlands. Using confusion matrices, this paper tested the remote sensing approaches against ground-truthed data obtained via field-based vegetation surveys across a subset of U-M landholdings.
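
A confusion matrix of the kind mentioned above tallies how often each ground-truthed class was mapped to each classified class. The sketch below shows one way to build it and derive overall accuracy; the function names and the class labels in the usage note are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def confusion_matrix(reference, predicted):
    """Count (reference label, predicted label) pairs."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have equal length")
    matrix = defaultdict(int)
    for ref, pred in zip(reference, predicted):
        matrix[(ref, pred)] += 1
    return dict(matrix)


def overall_accuracy(matrix):
    """Share of samples on the diagonal (reference == predicted)."""
    total = sum(matrix.values())
    correct = sum(n for (ref, pred), n in matrix.items() if ref == pred)
    return correct / total if total else 0.0
```

For example, with field labels `["forest", "forest", "wetland", "urban"]` and mapped labels `["forest", "wetland", "wetland", "urban"]`, three of four samples agree, so `overall_accuracy` returns 0.75.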

Findings

In natural areas, supervised classification incorporating delineated wetlands was the most accurate and efficient approach. In urban settings, maps incorporating institutional knowledge and campus tree surveys better estimated LULC. Using LULC and literature-based carbon data, this paper estimated that U-M lands store 1.37–3.68 million metric tons of carbon and sequester 45,000–86,000 Mt CO2e/yr, valued at $2.2m–$4.3m annually ($50/metric ton, social cost of carbon).
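
The annual valuation above follows from multiplying sequestration by the stated social cost of carbon. The sketch below reproduces that arithmetic, assuming "Mt" in the abstract denotes metric tons (the reading consistent with the dollar figures); the names are illustrative only.

```python
SOCIAL_COST_USD_PER_TON = 50  # $50 per metric ton, as stated in the abstract


def annual_value_usd(tons_co2e_per_year, price=SOCIAL_COST_USD_PER_TON):
    """Annual value of sequestered carbon at a fixed price per metric ton."""
    return tons_co2e_per_year * price


low = annual_value_usd(45_000)   # lower sequestration estimate
high = annual_value_usd(86_000)  # upper sequestration estimate
print(f"${low / 1e6:.2f}m to ${high / 1e6:.2f}m per year")
```

The result is about $2.25m to $4.30m per year. The upper bound matches the reported figure; the reported lower bound of $2.2m presumably reflects an unrounded sequestration estimate.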

Originality/value

This paper compared methods to identify an efficient and accurate remote sensing methodology to identify LULC and estimate carbon storage, biosequestration rates and economic values of ecosystem services provided.

Details

International Journal of Sustainability in Higher Education, vol. 22 no. 5
Type: Research Article
ISSN: 1467-6370

Article
Publication date: 5 April 2021

Nasser Assery, Yuan (Dorothy) Xiaohong, Qu Xiuli, Roy Kaushik and Sultan Almalki

Abstract

Purpose

This study aims to propose an unsupervised learning model to evaluate the credibility of disaster-related Twitter data and present a performance comparison with commonly used supervised machine learning models.

Design/methodology/approach

First, historical tweets on two recent hurricane events are collected via the Twitter API. Then a credibility scoring system is implemented in which the tweet features are analyzed to give a credibility score and credibility label to the tweet. After that, supervised machine learning classification is implemented using various classification algorithms, and their performances are compared.

Findings

The proposed unsupervised learning model could enhance the emergency response by providing a fast way to determine the credibility of disaster-related tweets. Additionally, the comparison of the supervised classification models reveals that the Random Forest classifier performs significantly better than the SVM and Logistic Regression classifiers in classifying the credibility of disaster-related tweets.

Originality/value

In this paper, an unsupervised 10-point scoring model is proposed to evaluate the tweets’ credibility based on user-based and content-based features. This technique could be used to evaluate the credibility of disaster-related tweets on future hurricanes and would have the potential to enhance emergency response during critical events. The comparative study of different supervised learning methods has revealed effective supervised learning methods for evaluating the credibility of Twitter data.

Details

Information Discovery and Delivery, vol. 50 no. 1
Type: Research Article
ISSN: 2398-6247

Article
Publication date: 4 October 2018

Maha Al-Yahya

Abstract

Purpose

In the context of information retrieval, text genre is as important as its content, and knowledge of the text genre enhances the search engine features by providing customized retrieval. The purpose of this study is to explore and evaluate the use of stylometric analysis, a quantitative analysis of the linguistic features of text, to support the task of automated text genre detection for Classical Arabic text.

Design/methodology/approach

Unsupervised clustering and supervised classification were applied on the King Saud University Corpus of Classical Arabic texts (KSUCCA) using the most frequent words in the corpus (MFWs) as stylometric features. Four popular distance measures established in stylometric research are evaluated for the genre detection task.
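
Using most frequent words as stylometric features amounts to representing each text by the relative frequencies of the corpus-wide top words and then comparing those profiles with a distance measure. The sketch below illustrates the idea. The abstract does not name the four distance measures, so Manhattan distance is used purely as an assumed example, and whitespace tokenization is a simplification.

```python
from collections import Counter


def most_frequent_words(texts, n):
    """Return the n most frequent words across all texts."""
    counts = Counter(word for text in texts for word in text.split())
    return [word for word, _ in counts.most_common(n)]


def mfw_profile(text, vocabulary):
    """Relative frequency of each vocabulary word within one text."""
    words = text.split()
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text
    return [counts[word] / total for word in vocabulary]


def manhattan(a, b):
    """Sum of absolute differences between two equal-length profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

Pairwise distances between profiles could then feed a clustering step or a nearest-neighbour classifier.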

Findings

The results of the experiments show that stylometry-based genre clustering and classification align well with human-defined genre. The evidence suggests that genre style signals exist for Classical Arabic and can be used to support the task of automated genre detection.

Originality/value

This work targets the task of genre detection in Classical Arabic text using stylometric features, an approach that has only been previously applied to Arabic authorship attribution. The study also provides a comparison of four distance measures used in stylometric analysis on the KSUCCA, a corpus with over 50 million words of Classical Arabic, using clustering and classification.

Details

The Electronic Library, vol. 36 no. 5
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 7 February 2023

Riju Bhattacharya, Naresh Kumar Nagwani and Sarsij Tripathi

Abstract

Purpose

A community demonstrates the unique qualities and relationships between its members that distinguish it from other communities within a network. Network analysis relies heavily on community detection. Despite the availability of traditional spectral clustering and statistical inference methods, deep learning techniques for community detection have grown in popularity due to their ease of processing high-dimensional network data. Graph convolutional neural networks (GCNNs) have received much attention recently and have developed into a promising and ubiquitous method for directly detecting communities on graphs. Inspired by the promising results of graph convolutional networks (GCNs) in analyzing graph-structured data, a novel community graph convolutional network (CommunityGCN) has been proposed as a semi-supervised node classification model and compared with recent baseline methods: graph attention network (GAT), a GCN-based technique for unsupervised community detection and Markov random fields combined with graph convolutional network (MRFasGCN).

Design/methodology/approach

This work presents the method for identifying communities that combines the notion of node classification via message passing with the architecture of a semi-supervised graph neural network. Six benchmark datasets, namely, Cora, CiteSeer, ACM, Karate, IMDB and Facebook, have been used in the experimentation.

Findings

In the first set of experiments, the scaled normalized average matrix of all neighbors' features, including the node itself, was obtained, followed by obtaining the weighted average matrix of low-dimensional nodes. In the second set of experiments, the average weighted matrix was forwarded to the GCN with two layers and the activation function for predicting the node class was applied. The results demonstrate that node classification with GCN can improve the performance of identifying communities on graph datasets.
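
The neighbor-averaging step described above can be illustrated with a simple mean aggregation over each node and its neighbors. This is only a sketch: the abstract does not give the exact normalization, so a plain row-wise mean with self-loops is assumed, and the function name is hypothetical.

```python
def mean_aggregate(adjacency, features):
    """Average each node's feature vector with those of its neighbors.

    adjacency is a square 0/1 matrix (list of lists); features[i] is the
    feature vector of node i. Each node is always included in its own
    average, which is equivalent to adding a self-loop.
    """
    n = len(adjacency)
    aggregated = []
    for i in range(n):
        group = [j for j in range(n) if adjacency[i][j] or j == i]
        dim = len(features[i])
        aggregated.append(
            [sum(features[j][d] for j in group) / len(group) for d in range(dim)]
        )
    return aggregated
```

In a GCN layer, the aggregated matrix would additionally be multiplied by learned weights and passed through an activation function before predicting node classes.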

Originality/value

The experiment reveals that the CommunityGCN approach has given better results with accuracy, normalized mutual information, F1 and modularity scores of 91.26, 79.9, 92.58 and 70.5 per cent, respectively, for detecting communities in the graph network, which is much greater than the range of 55.7–87.07 per cent reported in previous literature. Thus, it has been concluded that the GCN with node classification models has improved the accuracy.

Details

Data Technologies and Applications, vol. 57 no. 4
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 27 November 2020

Hoda Daou

Abstract

Purpose

Social media is characterized by its volume, its speed of generation and its easy and open access, all of which make it an important source of information that provides valuable insights. Content characteristics such as valence and emotions play an important role in the diffusion of information; in fact, emotions can shape the virality of topics in social media. The purpose of this research is to fill the gap in event detection applied to online content by incorporating sentiment, more specifically strong sentiment, as the main attribute in identifying relevant content.

Design/methodology/approach

The study proposes a methodology based on strong sentiment classification using machine learning and an advanced scoring technique.

Findings

The results show the following key findings: the proposed methodology is able to automatically capture trending topics and achieve better classification compared to state-of-the-art topic detection algorithms. In addition, the methodology is not context specific; it is able to successfully identify important events from various datasets within the context of politics, rallies, various news and real tragedies.

Originality/value

This study fills the gap of topic detection applied to online content by building on the assumption that important events trigger strong sentiment in society. In addition, classic topic detection algorithms require tuning in terms of the number of topics to search for. This methodology involves scoring the posts and, thus, does not require limiting the number of topics; it also allows ordering the topics by relevance based on the value of the score.

Peer review

The peer review history for this article is available at: https://publons.com/publon/10.1108/OIR-12-2019-0373

Details

Online Information Review, vol. 45 no. 1
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 9 October 2019

Francisco Villarroel Ordenes and Shunyuan Zhang

Abstract

Purpose

The purpose of this paper is to describe and position the state-of-the-art of text and image mining methods in business research. By providing a detailed conceptual and technical review of both methods, it aims to increase their utilization in service research.

Design/methodology/approach

In the first stage, the authors review business literature in marketing, operations and management concerning the use of text and image mining methods. In the second stage, the authors identify and analyze empirical papers that used text and image mining methods in services journals and premier business journals. Finally, avenues for further research in services are provided.

Findings

The manuscript identifies seven text mining methods and describes the approaches, processes, techniques and algorithms involved in their implementation. Four of these methods are positioned similarly for image mining. There are 39 papers using text mining in service research, with a focus on measuring consumer sentiment, experiences and service quality. Due to the nonexistent use of image mining in service journals, the authors review its application in marketing and management, and suggest ideas for further research in services.

Research limitations/implications

This manuscript focuses on the different methods and their implementation in service research, but it does not offer a complete review of business literature using text and image mining methods.

Practical implications

The results have a number of implications for the discipline that are presented and discussed. The authors provide research directions using text and image mining methods in service priority areas such as artificial intelligence, frontline employees, transformative consumer research and customer experience.

Originality/value

The manuscript provides an introduction to text and image mining methods to service researchers and practitioners interested in the analysis of unstructured data. This paper provides several suggestions concerning the use of new sources of data (e.g. customer reviews, social media images, employee reviews and emails), measurement of new constructs (beyond sentiment and valence) and the use of more recent methods (e.g. deep learning).

Details

Journal of Service Management, vol. 30 no. 5
Type: Research Article
ISSN: 1757-5818

Article
Publication date: 28 October 2019

Farshid Mirzaalian and Elizabeth Halpenny

Abstract

Purpose

The purpose of this paper is to provide a review of hospitality and tourism studies that have used social media analytics to collect, examine, summarize and interpret “big data” derived from social media. It proposes improved approaches by documenting past and current analytic practice addressed by the selected studies in social media analytics.

Design/methodology/approach

Studies from the past 18 years were identified and collected from five international electronic bibliographic databases. Social media analytics-related terms and keywords in the titles, keywords or abstracts were used to identify relevant articles. Book chapters, conference papers and articles not written in English were excluded from analysis. The preferred reporting items for systematic reviews and meta-analyses (PRISMA) guided the search, and Stieglitz and Dang-Xuan’s (2013) social media analytics framework was adapted to categorize methods reported in each article.

Findings

The research purpose of each study was identified and categorized to better understand the questions social media analytics were being used to address, as well as the frequency of each method’s use. Since 2014, rapid growth of social media analytics was observed, along with an expanded use of multiple analytic methods, including accuracy testing. These factors suggest an increased commitment to and competency in conducting comprehensive and robust social media data analyses. Improved use of methods such as social network analysis, comparative analysis and trend analysis is recommended. Consumer-review networks and social networking sites were the main social media platforms from which data were gathered; simultaneous analysis of multi-platform/sources of data is recommended to improve validity and comprehensive understanding.

Originality/value

This is the first systematic literature review of the application of social media analytics in hospitality and tourism research. The study highlights advancements in social media analytics and recommends an expansion of approaches; common analytical methods such as text analysis and sentiment analysis should be supplemented by infrequently used approaches such as comparative analysis and spatial analysis.

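Research purpose

This paper reviews the hospitality and tourism literature that uses social media data analytics. By examining the literature on the analytic methods involved, it proposes strategies for improving those methods.

Research design/methodology/approach

The sample comprises literature from the past 18 years drawn from five international online bibliographic databases. The search targeted articles whose titles, keywords or abstracts contained terms related to social media data analytics. Book chapters, conference papers and non-English articles were not included. The PRISMA method for systematic reviews guided the literature search, and Stieglitz and Dang-Xuan's (2013) social media analytics framework was used to classify the literature.

Research findings

This paper reports the research purpose of each study and categorizes the studies systematically to better understand the research questions addressed by social media data analytics and how often each method is used. Since 2014, social media data analytics has grown rapidly, along with other related analytic methods, including accuracy testing. These results indicate growing demand for more comprehensive and robust analytic methods, as well as intense competition. The paper recommends improved methods such as social network analysis, comparative analysis and trend analysis. Consumer-review networks and social networking sites were the main platforms providing social media data. The paper recommends analyzing multi-source data simultaneously to improve validity and comprehensive understanding.

Research originality/value

This is the first systematic literature review of social media data analytics in the hospitality and tourism field. It highlights advances in social media data analytics and the need to broaden its methods; common analytic methods such as text analysis and sentiment analysis should be combined with less common ones such as comparative analysis and spatial analysis.

Keywords – Comparative analysis, Sentiment analysis, User-generated content, Social media analytics, Topic modeling, Spatial analysis, Text analysis

Paper type – Literature review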

Article
Publication date: 4 June 2020

Antonia Michael and Jan Eloff

Malicious activities conducted by disgruntled employees via an email platform can cause profound damage to an organization such as financial and reputational losses. This threat…

Abstract

Purpose

Malicious activities conducted by disgruntled employees via an email platform can cause profound damage to an organization such as financial and reputational losses. This threat is known as an “Insider IT Sabotage” threat. This involves employees misusing their access rights to harm the organization. Events leading up to the attack are not technical but rather behavioural. The problem is that owing to the high volume and complexity of emails, the risk of insider IT sabotage cannot be diminished with rule-based approaches.

Design/methodology/approach

Malicious human behaviours that insiders within the insider IT sabotage category would possess are studied and mapped to phrases that would appear in email communications. A large email data set is classified according to behavioural characteristics of these employees. Machine learning algorithms are used to identify occurrences of this insider threat type. The accuracy of these approaches is measured.

Findings

It is shown in this paper that suspicious behaviour of disgruntled employees can be discovered by means of machine intelligence techniques. The output of the machine learning classifier depends mainly on the depth and quality of the phrases and behaviour analysis, cleansing and number of email attributes examined. This process of labelling content in isolation could be improved if other attributes of the email data are included, such that a confidence score can be computed for each user.

Originality/value

This research presents a novel approach showing that a prototype can be created to automate the detection of insider IT sabotage within email systems and so mitigate the risk within organizations.

Details

Information & Computer Security, vol. 28 no. 4
Type: Research Article
ISSN: 2056-4961
