Search results

1 – 10 of 174
Article
Publication date: 7 November 2016

Ismail Hmeidi, Mahmoud Al-Ayyoub, Nizar A. Mahyoub and Mohammed A. Shehab

Abstract

Purpose

Multi-label Text Classification (MTC) is one of the most recent research trends in the data mining and information retrieval domains, for reasons such as the rapid growth of online data and the increasing tendency of internet users to assign multiple labels/tags to describe documents, emails, posts, etc. The dimensionality of the label space makes MTC more challenging than traditional single-label text classification (TC). Because MTC is a natural extension of TC, several ways have been proposed to benefit from the rich TC literature through what are called problem transformation (PT) methods. Basically, PT methods transform multi-label data into single-label data suitable for traditional single-label classification algorithms. Another approach is to design novel classification algorithms customized for MTC. Over the past decade, several works have appeared on both approaches, focusing mainly on the English language. This work aims to present an elaborate study of MTC of Arabic articles.
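
As an illustration of what a problem transformation looks like, the sketch below shows two widely used PT schemes, binary relevance and label powerset. The abstract does not say which PT methods the authors evaluated, so the choice of schemes, the function names and the toy labels are all assumptions made for illustration.

```python
def binary_relevance(labels_per_doc, label_set):
    """One binary target vector per label: 1 if the document carries that label."""
    return {
        label: [1 if label in labels else 0 for labels in labels_per_doc]
        for label in sorted(label_set)
    }


def label_powerset(labels_per_doc):
    """Treat every distinct combination of labels as one single-label class."""
    return ["+".join(sorted(labels)) for labels in labels_per_doc]


# Hypothetical multi-label annotations for four documents.
docs_labels = [
    {"sports"},
    {"politics", "economy"},
    {"economy"},
    {"politics", "economy"},
]
all_labels = set().union(*docs_labels)

br_targets = binary_relevance(docs_labels, all_labels)
lp_targets = label_powerset(docs_labels)

print(br_targets["economy"])  # [0, 1, 1, 1]
print(lp_targets)             # ['sports', 'economy+politics', 'economy', 'economy+politics']
```

Either output can be handed to an ordinary single-label classifier: binary relevance trains one binary model per label, while label powerset trains one multi-class model over label combinations.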

Design/methodology/approach

This paper presents a novel lexicon-based method for MTC, where the keywords that are most associated with each label are extracted from the training data along with a threshold that can later be used to determine whether each test document belongs to a certain label.
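
A minimal sketch of the general idea follows: build a keyword lexicon per label from training data, derive a per-label threshold, and assign every label whose lexicon overlap reaches that threshold. The abstract does not specify how keywords are scored or how the threshold is chosen, so the scoring rule (document frequency inside the label minus outside), the threshold rule (smallest overlap seen on training documents of that label), and all names and data here are assumptions, not the authors' method.

```python
from collections import Counter


def build_lexicons(train_docs, top_k=3):
    """train_docs: list of (tokens, labels). Returns {label: set of keywords}."""
    labels = set().union(*(lbls for _, lbls in train_docs))
    lexicons = {}
    for label in labels:
        inside, outside = Counter(), Counter()
        for tokens, lbls in train_docs:
            (inside if label in lbls else outside).update(set(tokens))
        # Assumed association score: document frequency inside minus outside.
        scores = {term: inside[term] - outside[term] for term in inside}
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        lexicons[label] = set(ranked[:top_k])
    return lexicons


def overlap(tokens, lexicon):
    """Fraction of the lexicon's keywords that occur in the document."""
    return len(set(tokens) & lexicon) / len(lexicon) if lexicon else 0.0


def fit_thresholds(train_docs, lexicons):
    """Assumed rule: smallest overlap among training documents carrying the label."""
    return {
        label: min(overlap(tokens, lex) for tokens, lbls in train_docs if label in lbls)
        for label, lex in lexicons.items()
    }


def predict(tokens, lexicons, thresholds):
    """Assign every label whose lexicon overlap reaches its threshold."""
    return {lbl for lbl, lex in lexicons.items() if overlap(tokens, lex) >= thresholds[lbl]}


# Hypothetical toy training set (English tokens used for readability).
train = [
    ("goal match team".split(), {"sports"}),
    ("team coach match".split(), {"sports"}),
    ("vote election party".split(), {"politics"}),
    ("election budget vote".split(), {"politics", "economy"}),
    ("budget market price".split(), {"economy"}),
]
lexicons = build_lexicons(train)
thresholds = fit_thresholds(train, lexicons)
print(predict("election vote market budget".split(), lexicons, thresholds))
```

Because each label is tested independently against its own threshold, a document can receive several labels, one label or none.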

Findings

The experiments show that the presented approach outperforms the currently available approaches. Specifically, the best accuracy obtained from existing approaches is only 18 per cent, whereas the presented lexicon-based approach reaches an accuracy of up to 31 per cent.

Originality/value

Although there exist some tools that can be customized to address the MTC problem for Arabic text, their accuracies are very low when applied to Arabic articles. This paper presents a novel method for MTC. The experiments show that the presented approach outperforms the currently available approaches.

Details

International Journal of Web Information Systems, vol. 12 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 8 April 2021

Mariem Bounabi, Karim Elmoutaouakil and Khalid Satori

Abstract

Purpose

This paper aims to present a new term weighting approach for text classification as a text mining task. The original method, neutrosophic term frequency – inverse document frequency (NTF-IDF), is an extended version of the popular fuzzy TF-IDF (FTF-IDF) and uses neutrosophic reasoning to analyze and generate weights for terms in natural languages. The paper also proposes a comparative study of the popular FTF-IDF and NTF-IDF and their impacts on different machine learning (ML) classifiers for document categorization.

Design/methodology/approach

After preprocessing the textual data, the original Neutrosophic TF-IDF applies the neutrosophic inference system (NIS) to produce weights for the terms representing a document. Using the local frequency TF, the global frequency IDF and the text length N as NIS inputs, this study generates two neutrosophic weights for a given term. The first measure provides information on the relevance degree of a word, and the second represents its ambiguity degree. Next, the Zhang combination function is applied to combine the neutrosophic weight outputs and produce the final term weight, which is inserted in the document's representative vector. To analyze the impact of NTF-IDF on the classification phase, this study uses a set of ML algorithms.
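
The three inputs named above can be computed as in the following sketch. It covers only the crisp inputs (TF, IDF and text length); the neutrosophic inference system and the Zhang combination function are not reproduced because the abstract does not define them. The normalized-count TF, the `log(N_docs / df)` IDF variant and the function name `nis_inputs` are assumptions.

```python
import math
from collections import Counter


def nis_inputs(docs):
    """For each document, map each term to its TF, IDF and the document length."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    rows = []
    for tokens in docs:
        counts = Counter(tokens)
        length = len(tokens)
        rows.append({
            term: {
                "tf": count / length,                       # local frequency
                "idf": math.log(n_docs / doc_freq[term]),   # global frequency
                "length": length,                           # text length
            }
            for term, count in counts.items()
        })
    return rows


# Hypothetical preprocessed corpus of two tiny documents.
corpus = [["data", "mining", "data"], ["text", "mining"]]
inputs = nis_inputs(corpus)
print(inputs[0]["data"])
```

A term that appears in every document, such as "mining" here, gets an IDF of zero under this variant, which is the usual signal that it carries little discriminative information.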

Findings

By exploiting the characteristics of neutrosophic logic (NL), the authors were able to study the ambiguity of terms and their degree of relevance in representing a document. The choice of NL proved effective in defining significant text vectorization weights, especially for text classification tasks. The experiments demonstrate that the new method positively impacts categorization. Moreover, the adopted system's recognition rate is higher than 91%, an accuracy score not attained using FTF-IDF. Also, on benchmark data sets from different text mining fields and with several ML classifiers, such as SVM and feed-forward networks, applying the proposed NTF-IDF term scores improves accuracy by 10%.

Originality/value

The novelty of this paper lies in two aspects. First, it introduces a new term weighting method that uses term frequencies as components to define the relevance and ambiguity of a term. Second, the application of NL to infer weights is an original model, which also aims to correct the shortcomings of FTF-IDF, which relies on fuzzy logic and inherits its drawbacks. The introduced technique was combined with different ML models to improve the accuracy and relevance of the feature vectors fed to the classification mechanism.

Details

International Journal of Web Information Systems, vol. 17 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 4 December 2017

Fuzan Chen, Harris Wu, Runliang Dou and Minqiang Li

Abstract

Purpose

The purpose of this paper is to build a compact and accurate classifier for high-dimensional classification.

Design/methodology/approach

A classification approach based on class-dependent feature subspace (CFS) is proposed. CFS is a class-dependent integration of a support vector machine (SVM) classifier and associated discriminative features. For each class, our genetic algorithm (GA)-based approach evolves the best subset of discriminative features and SVM classifier simultaneously. To guarantee convergence and efficiency, the authors customize the GA in terms of encoding strategy, fitness evaluation, and genetic operators.

Findings

Experimental studies demonstrated that the proposed CFS-based approach is superior to other state-of-the-art classification algorithms on UCI data sets in terms of both concise interpretation and predictive power for high-dimensional data.

Research limitations/implications

UCI data sets rather than real industrial data are used to evaluate the proposed approach. In addition, only single-label classification is addressed in the study.

Practical implications

The proposed method not only constructs an accurate classification model but also obtains a compact combination of discriminative features. It helps business decision makers gain a concise understanding of high-dimensional data.

Originality/value

The authors propose a compact and effective classification approach for high-dimensional data. Instead of using the same feature subset for all classes, the proposed CFS-based approach obtains the optimal subset of discriminative features and the SVM classifier for each class. The proposed approach enhances both interpretability and predictive power for high-dimensional data.

Details

Industrial Management & Data Systems, vol. 117 no. 10
Type: Research Article
ISSN: 0263-5577

Article
Publication date: 22 November 2019

Shuo Xu and Xin An

Abstract

Purpose

Image classification is becoming a supporting technology in several image-processing tasks. Due to rich semantic information contained in the images, it is very popular for an image to have several labels or tags. This paper aims to develop a novel multi-label classification approach with superior performance.

Design/methodology/approach

Many multi-label classification problems share two main characteristics: label correlations and label imbalance. However, most current methods are devoted either to modeling label relationships or to dealing only with the imbalance problem using traditional single-label methods. In this paper, the multi-label classification problem is regarded as an unbalanced multi-task learning problem. The multi-task least-squares support vector machine (MTLS-SVM) is generalized for this problem and renamed multi-label LS-SVM (ML2S-SVM).

Findings

Experimental results on the emotions, scene, yeast and bibtex data sets indicate that the ML2S-SVM is competitive with respect to the state-of-the-art methods in terms of Hamming loss and instance-based F1 score. The values of resulting parameters largely influence the performance of ML2S-SVM, so it is necessary for users to identify proper parameters in advance.
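
For reference, the two evaluation measures named above can be computed as in this sketch. Hamming loss is the fraction of label slots predicted wrongly, and instance-based F1 averages a per-example F1 over all examples. Scoring an example as 1.0 when both its true and predicted label sets are empty is a convention assumed here; the label names and predictions are invented for illustration.

```python
def hamming_loss(true_sets, pred_sets, n_labels):
    """Fraction of (example, label) slots where prediction and truth disagree."""
    wrong = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return wrong / (len(true_sets) * n_labels)


def instance_f1(true_sets, pred_sets):
    """Mean over examples of 2|T ∩ P| / (|T| + |P|)."""
    scores = []
    for t, p in zip(true_sets, pred_sets):
        if not t and not p:
            scores.append(1.0)  # assumed convention for two empty label sets
        else:
            scores.append(2 * len(t & p) / (len(t) + len(p)))
    return sum(scores) / len(scores)


# Hypothetical ground truth and predictions over three labels: a, b, c.
y_true = [{"a", "b"}, {"c"}, {"a"}]
y_pred = [{"a"}, {"c"}, {"b"}]

print(round(hamming_loss(y_true, y_pred, n_labels=3), 4))  # 0.3333
print(round(instance_f1(y_true, y_pred), 4))               # 0.5556
```

Lower is better for Hamming loss and higher is better for instance-based F1, so the two measures are usually reported together.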

Originality/value

On the basis of MTLS-SVM, a novel multi-label classification approach, ML2S-SVM, is put forward. This method not only overcomes the imbalance problem but also explicitly models arbitrary-order correlations among labels by allowing multiple labels to share a subspace. In addition, the approach has a wide range of applications; it is not limited to the field of image classification.

Details

The Electronic Library, vol. 37 no. 6
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 21 September 2012

Ahmet Soylu, Felix Mödritscher, Fridolin Wild, Patrick De Causmaecker and Piet Desmet

Abstract

Purpose

Mashups have been studied extensively in the literature; nevertheless, the large body of work in this area focuses on service/data level integration and leaves UI level integration, hence UI mashups, almost unexplored. The latter generates digital environments in which participating sources exist as individual entities; member applications and data sources share the same graphical space particularly in the form of widgets. However, the true integration can only be realized through enabling widgets to be responsive to the events happening in each other. The authors call such an integration “widget orchestration” and the resulting application “mashup by orchestration”. This article aims to explore and address challenges regarding the realization of widget‐based UI mashups and UI level integration, prominently in terms of widget orchestration, and to assess their suitability for building web‐based personal environments.

Design/methodology/approach

The authors provide a holistic view on mashups and a theoretical grounding for widget‐based personal environments. The authors identify the following challenges: widget interoperability, end‐user data mobility as a basis for manual widget orchestration, user behavior mining – for extracting behavioral patterns – as a basis for automated widget orchestration, and infrastructure. The authors introduce functional widget interfaces for application interoperability, exploit semantic web technologies for data interoperability, and realize end‐user data mobility on top of this interoperability framework. The authors employ semantically enhanced workflow/process mining techniques, along with Petri nets as a formal ground, for user behavior mining. The authors outline a reference platform and architecture that is compliant with the authors' strategies, and extend W3C widget specification respectively – prominently with a communication channel – to foster standardization. The authors evaluate their solution approaches regarding interoperability and infrastructure through a qualitative comparison with respect to existing literature, and provide a computational evaluation of the behavior mining approach. The authors realize a prototype for a widget‐based personal learning environment for foreign language learning to demonstrate the feasibility of their solution strategies. The prototype is also used as a basis for the end‐user assessment of widget‐based personal environments and widget orchestration.

Findings

The evaluation results suggest that the interoperability framework, platform, and architecture have certain advantages over existing approaches, and the proposed behavior mining techniques are adequate for the extraction of behavioral patterns. User assessments show that widget‐based UI mashups with orchestration (i.e. mashups by orchestration) are promising for the creation of personal environments as well as for an enhanced user experience.

Originality/value

This article provides an extensive exploration of mashups by orchestration and their role in the creation of personal environments. Key challenges are described, along with novel solution strategies to meet them.

Article
Publication date: 6 February 2020

Diana Olivia, Ashalatha Nayak, Mamatha Balachandra and Jaison John

Abstract

Purpose

The purpose of this study is to develop an efficient prediction model using vital signs and standard medical score systems, which predicts the clinical severity level of the patient in advance based on the quick sequential organ failure assessment (qSOFA) medical score method.

Design/methodology/approach

To predict the clinical severity level of the patient in advance, the authors formulate a training dataset constructed according to the qSOFA medical score method. Further, along with multiple vital signs, different standard medical scores and their correlation features are used to build the prediction model and improve its accuracy. The constructed training set is verified to be suitable for severity level prediction, because the formulated dataset has distinct clusters, each corresponding to a different severity level according to the qSOFA score.

Findings

From the experimental result, it is found that the inclusion of the standard medical scores and their correlation along with multiple vital signs improves the accuracy of the clinical severity level prediction model. In addition, the authors showed that the training dataset formulated from the temporal data (which includes vital signs and medical scores) based on the qSOFA medical scoring system has the clusters which correspond to each severity level in qSOFA score. Finally, it is found that RAndom k-labELsets multi-label classification performs better prediction of severity level compared to neural network-based multi-label classification.
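
The sketch below outlines the ensemble logic of RAndom k-labELsets (RAkEL) as it is commonly described: several members are each assigned a random subset of k labels, and a label is predicted when the average vote from the members covering it exceeds a threshold. Member classifiers are not trained here; their predictions are supplied by hand, and the label names, the 0.5 threshold and the function names are assumptions rather than details from the study.

```python
import random
from itertools import combinations


def random_k_labelsets(labels, k, m, seed=0):
    """Draw up to m distinct label subsets of size k."""
    rng = random.Random(seed)
    candidates = list(combinations(sorted(labels), k))
    return [set(c) for c in rng.sample(candidates, min(m, len(candidates)))]


def rakel_vote(labelsets, member_predictions, threshold=0.5):
    """Average per-label votes over the members whose labelset contains the label."""
    votes, covered = {}, {}
    for labelset, predicted in zip(labelsets, member_predictions):
        for label in labelset:
            covered[label] = covered.get(label, 0) + 1
            votes[label] = votes.get(label, 0) + (1 if label in predicted else 0)
    return {lbl for lbl in covered if votes[lbl] / covered[lbl] > threshold}


# Sampling step: three random 2-labelsets over four hypothetical labels.
sampled = random_k_labelsets({"A", "B", "C", "D"}, k=2, m=3, seed=42)

# Voting step on hand-written labelsets and member outputs for one example.
labelsets = [{"A", "B"}, {"B", "C"}, {"A", "C"}]
member_predictions = [{"B"}, {"B", "C"}, set()]
final = rakel_vote(labelsets, member_predictions)
print(final)  # {'B'}
```

In this example label B receives two of two votes and is kept, while C receives one of two and A none, so neither clears the threshold.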

Originality/value

This paper helps in identifying a patient's clinical status.

Details

Information Discovery and Delivery, vol. 48 no. 1
Type: Research Article
ISSN: 2398-6247

Article
Publication date: 25 January 2024

Kuan-Cheng Lin, Nien-Tzu Li and Mu-Yen Chen

Abstract

Purpose

As global issues such as climate change, economic growth, social equality and the wealth gap are widely discussed, education for sustainable development (ESD) allows every human being to acquire the knowledge, skills, attitudes and values necessary to shape a sustainable future. It also requires participatory teaching and learning methods that motivate and empower learners to change their behavior and take action for sustainable development. Teachers have begun rating pupils based on peer assessment for open evaluation. Peer assessment enables students to transition from passive to active feedback recipients. The assessors improve their critical thinking and are encouraged to reflect, resulting in more significant recommendations. However, the quality of peer assessment is variable, so reviewers may not recognize the remarks of other reviewers and the benefits of peer assessment cannot be realized. In the past, researchers frequently employed post-event questionnaires to examine the effects of peer assessment on learning effectiveness, which did not accurately reflect the quality of peer assessment in real time.

Design/methodology/approach

This study employs a multi-label model and develops a self-feedback system in order to use the AIOLPA system in the classroom to enhance students' learning efficacy and the validity of peer assessment.

Findings

The findings indicate that improved peer assessment through the rapid feedback system encourages evaluators to reflect more and to offer more ideas, bringing peer ratings closer to the instructor's rating and improving evaluators' self-evaluation and critical thinking. Peers' suggestions and comments help improve the work and support growth in students' learning effectiveness, which can lead to further suggestions and higher-quality work.

Originality/value

ESD consequently promotes competencies like critical thinking, imagining future scenarios and making decisions in a collaborative way. This study builds an online peer assessment system with a self-feedback mechanism capable of classifying peer comments, comparing them with scores in a consistent manner and providing prompt feedback to critics.

Details

Library Hi Tech, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 23 November 2021

Feifei Sun and Guohong Shi

Abstract

Purpose

This paper aims to explore the application effect of big data techniques based on an α-support vector machine-stochastic gradient descent (SVMSGD) algorithm in third-party logistics, to obtain the valuable information hidden in logistics big data and to help logistics enterprises make more reasonable planning schemes.

Design/methodology/approach

In this paper, a forgetting factor is introduced without changing the algorithm's complexity, yielding the proposed forgetting-factor-based algorithm, called the α-SVMSGD algorithm. The algorithm selectively deletes or retains historical data, which improves the adaptability of the classifier to new real-time logistics data. The simulation results verify the application effect of the algorithm.

Findings

As the number of training iterations increases, the test error percentages of the gradient descent (GD) algorithm, the stochastic gradient descent (SGD) algorithm and the α-SVMSGD algorithm decrease gradually. In logistics big data processing, the α-SVMSGD algorithm retains the efficiency of the SGD algorithm while ensuring that the GD direction approaches the optimal solution direction, and it can use a small amount of data to obtain more accurate results and enhance convergence accuracy.

Research limitations/implications

The threshold setting of the forgetting factor still needs to be improved. Setting thresholds for different data types in self-learning has become a research direction. The number of forgotten data can be effectively controlled through big data processing technology to improve data support for the normal operation of third-party logistics.

Practical implications

The approach can effectively reduce the time consumed by data mining, realize rapid and accurate convergence on sample data without increasing sample complexity, improve the efficiency of logistics big data mining and reduce the redundancy of historical data. It has a certain reference value in promoting the development of the logistics industry.

Originality/value

The classification algorithm proposed in this paper has feasibility and high convergence in third-party logistics big data mining. The α-SVMSGD algorithm proposed in this paper has a certain application value in real-time logistics data mining, but the design of the forgetting factor threshold needs to be improved. In the future, the authors will continue to study how to set different data type thresholds in self-learning.

Details

Journal of Enterprise Information Management, vol. 35 no. 4/5
Type: Research Article
ISSN: 1741-0398

Article
Publication date: 3 October 2023

Abid Iqbal, Khurram Shahzad, Shakeel Ahmad Khan and Muhammad Shahzad Chaudhry

Abstract

Purpose

The purpose of this study is to identify the relationship between artificial intelligence (AI) and fake news detection. It also aims to explore the negative effects of fake news on society and to identify trending techniques for fake news detection.

Design/methodology/approach

The “Preferred Reporting Items for Systematic Reviews and Meta-Analyses” (PRISMA) guidelines were applied as the research methodology. The 25 most relevant peer-reviewed core studies were included in the systematic literature review.

Findings

The findings show that AI has a strong positive relationship with the detection of fake news. The study shows that fake news causes emotional problems, threatens important state institutions and harms culture. The results also reveal that big data analytics, fact-checking websites, automatic detection tools and digital literacy are effective in identifying fake news.

Originality/value

The study offers theoretical implications for the researchers to further explore the area of AI in relation to fake news detection. It also provides managerial implications for educationists, IT experts and policymakers. This study is an important benchmark to control the generation and dissemination of fake news on social media platforms.

Details

Global Knowledge, Memory and Communication, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2514-9342

Details

Big Data Analytics for the Prediction of Tourist Preferences Worldwide
Type: Book
ISBN: 978-1-83549-339-7