Search results

1 – 10 of 147
Article
Publication date: 8 April 2021

Mariem Bounabi, Karim Elmoutaouakil and Khalid Satori

Abstract

Purpose

This paper aims to present a new term weighting approach for text classification as a text mining task. The original method, neutrosophic term frequency – inverse document frequency (NTF-IDF), is an extended version of the popular fuzzy TF-IDF (FTF-IDF) and uses neutrosophic reasoning to analyze and generate weights for terms in natural languages. The paper also proposes a comparative study of FTF-IDF and NTF-IDF and their impact on different machine learning (ML) classifiers for document categorization.

Design/methodology/approach

After preprocessing the textual data, the Neutrosophic TF-IDF applies a neutrosophic inference system (NIS) to produce weights for the terms representing a document. Using the local frequency TF, the global frequency IDF and the text length N as NIS inputs, this study generates two neutrosophic weights for a given term. The first measure gives the relevance degree of a word, and the second represents its ambiguity degree. Next, the Zhang combination function is applied to combine the two neutrosophic weights into the final term weight, which is inserted into the document's representative vector. To analyze the impact of NTF-IDF on the classification phase, this study uses a set of ML algorithms.
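
As a rough illustration of the three crisp inputs named above (local frequency TF, global frequency IDF and text length N), the following Python sketch computes them for each term of a toy corpus. Whitespace tokenization, relative-frequency TF and an unsmoothed logarithmic IDF are assumptions made here, since the abstract does not specify them. The neutrosophic inference system and the Zhang combination function are not reproduced, and the function name nis_inputs is hypothetical.

```python
import math
from collections import Counter


def nis_inputs(docs):
    """Return, per document, {term: (tf, idf, n)}.

    tf  : relative frequency of the term in the document (assumption)
    idf : log(number of documents / document frequency) (assumption)
    n   : document length in tokens
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    features = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens)
        features.append({
            term: (count / length, math.log(n_docs / df[term]), length)
            for term, count in counts.items()
        })
    return features


# Toy corpus, invented for illustration only.
docs = [
    "fuzzy logic weights terms",
    "neutrosophic logic weights ambiguity",
    "terms terms terms",
]
feats = nis_inputs(docs)
```

Each triple would then be passed to whatever inference system produces the relevance and ambiguity weights; that step depends on membership functions and rules the abstract does not give.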

Findings

Using the characteristics of neutrosophic logic (NL), the authors were able to study the ambiguity of terms and their degree of relevance in representing a document. The choice of NL has proven effective in defining significant text vectorization weights, especially for text classification tasks. The experiments demonstrate that the new method positively impacts categorization. Moreover, the adopted system's recognition rate is higher than 91%, an accuracy score not attained using FTF-IDF. Also, on benchmark data sets from different text mining fields and with several ML classifiers, e.g. SVM and a feed-forward network, applying the proposed NTF-IDF term scores improves accuracy by 10%.

Originality/value

The novelty of this paper lies in two aspects. First, it introduces a new term weighting method that uses term frequencies as components to define the relevance and the ambiguity of a term. Second, it applies NL to infer the weights, an original model that also aims to correct the shortcomings of FTF-IDF, which relies on fuzzy logic and inherits its drawbacks. The introduced technique was combined with different ML models to improve the accuracy and relevance of the feature vectors fed to the classification mechanism.

Details

International Journal of Web Information Systems, vol. 17 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 28 February 2023

Meltem Aksoy, Seda Yanık and Mehmet Fatih Amasyali

Abstract

Purpose

When a large number of project proposals are evaluated to allocate available funds, grouping them based on their similarities is beneficial. Current approaches to group proposals are primarily based on manual matching of similar topics, discipline areas and keywords declared by project applicants. When the number of proposals increases, this task becomes complex and requires excessive time. This paper aims to demonstrate how to effectively use the rich information in the titles and abstracts of Turkish project proposals to group them automatically.

Design/methodology/approach

This study proposes a model that effectively groups Turkish project proposals by combining word embedding, clustering and classification techniques. The proposed model uses FastText, BERT and term frequency/inverse document frequency (TF/IDF) word-embedding techniques to extract terms from the titles and abstracts of project proposals in Turkish. The extracted terms were grouped using both the clustering and classification techniques. Natural groups contained within the corpus were discovered using k-means, k-means++, k-medoids and agglomerative clustering algorithms. Additionally, this study employs classification approaches to predict the target class for each document in the corpus. To classify project proposals, various classifiers, including k-nearest neighbors (KNN), support vector machines (SVM), artificial neural networks (ANN), classification and regression trees (CART) and random forest (RF), are used. Empirical experiments were conducted to validate the effectiveness of the proposed method by using real data from the Istanbul Development Agency.
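
The vectorize-then-classify flow described above can be sketched in a few lines of Python. This is only a minimal stand-in: it uses TF-IDF vectors with a cosine-similarity KNN classifier, not the FastText or BERT embeddings or the other classifiers evaluated in the study. The smoothed IDF formula, the function names and the English toy proposals are all assumptions made for illustration.

```python
import math
from collections import Counter


def fit_idf(texts):
    """Smoothed inverse document frequency for every term in the training texts."""
    tokenized = [set(t.lower().split()) for t in texts]
    n = len(tokenized)
    df = Counter(term for terms in tokenized for term in terms)
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def vectorize(text, idf):
    """Sparse TF-IDF vector; terms unseen during fitting are dropped."""
    counts = Counter(tok for tok in text.lower().split() if tok in idf)
    return {term: c * idf[term] for term, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def knn_predict(query, vectors, labels, k=3):
    """Majority label among the k training vectors most similar to the query."""
    ranked = sorted(zip(vectors, labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical proposal titles and categories, invented for illustration.
train = [
    ("solar energy storage for municipal buildings", "energy"),
    ("wind energy turbine maintenance", "energy"),
    ("vocational training for young job seekers", "education"),
    ("digital skills training in schools", "education"),
]
idf = fit_idf([text for text, _ in train])
vectors = [vectorize(text, idf) for text, _ in train]
labels = [label for _, label in train]

pred_energy = knn_predict(vectorize("energy storage pilot", idf), vectors, labels)
pred_education = knn_predict(vectorize("training for schools", idf), vectors, labels)
```

Swapping in a different embedding only changes vectorize; the classifier sees vectors either way, which is the property the study relies on when it compares embedding methods.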

Findings

The results show that the generated word embeddings can effectively represent proposal texts as vectors, and can be used as inputs for clustering or classification algorithms. Using clustering algorithms, the document corpus is divided into five groups. In addition, the results demonstrate that the proposals can easily be categorized into predefined categories using classification algorithms. SVM-Linear achieved the highest prediction accuracy (89.2%) with the FastText word embedding method. A comparison of manual grouping with automatic classification and clustering results revealed that both classification and clustering techniques have a high success rate.

Research limitations/implications

The proposed model automatically benefits from the rich information in project proposals and significantly reduces numerous time-consuming tasks that managers must perform manually. Thus, it eliminates the drawbacks of the current manual methods and yields significantly more accurate results. In the future, additional experiments should be conducted to validate the proposed method using data from other funding organizations.

Originality/value

This study presents the application of word embedding methods to effectively use the rich information in the titles and abstracts of Turkish project proposals. Existing studies on the automatic grouping of proposals use traditional frequency-based methods for feature extraction to represent project proposals. Unlike previous research, this study employs two high-performing neural network-based textual feature extraction techniques to obtain terms representing the proposals: BERT as a contextual word embedding method and FastText as a static word embedding method. Moreover, to the best of our knowledge, there has been no research conducted on the grouping of project proposals in Turkish.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 16 no. 3
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 29 April 2021

Hossein Toosi, Mohammad Amin Ghaaderi and Zahra Shokrani

Abstract

Purpose

The purpose of this study is to compare the trend of academic project management research in Iran and the World in five-year periods with a text mining approach and TF–IDF method.

Design/methodology/approach

The research population consists of 1205 theses presented between 2000 and 2019 in Iranian universities. The central library website of the mentioned universities was used for data collection, and the text mining approach with the TF–IDF method was used for data analysis.
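
One plausible way to apply TF-IDF to five-year periods is sketched below in Python. It pools the thesis titles of each period into a single document and ranks terms by TF-IDF within that period. The abstract does not describe the actual procedure, so the pooling step, the bucket boundaries starting at 2000, the function names and the toy records are assumptions.

```python
import math
from collections import Counter, defaultdict


def period_of(year, start=2000, width=5):
    """Map a year to the first year of its five-year bucket, e.g. 2007 -> 2005."""
    return start + ((year - start) // width) * width


def top_terms_by_period(records, k=3):
    """records: iterable of (year, title). Returns {period: top-k terms by TF-IDF}."""
    pooled = defaultdict(Counter)
    for year, title in records:
        pooled[period_of(year)].update(title.lower().split())
    n_periods = len(pooled)
    # Number of periods in which each term occurs at least once.
    df = Counter(term for counts in pooled.values() for term in counts)
    result = {}
    for period, counts in pooled.items():
        total = sum(counts.values())
        scores = {t: (c / total) * math.log(n_periods / df[t])
                  for t, c in counts.items()}
        # Highest score first; ties broken alphabetically for determinism.
        result[period] = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    return result


# Invented records for illustration only.
records = [
    (2001, "risk management concrete"),
    (2003, "risk management"),
    (2007, "fuzzy optimization management"),
    (2008, "fuzzy logic"),
]
top = top_terms_by_period(records, k=2)
```

A term that occurs in every period, such as "management" here, receives an IDF of zero, so the ranking highlights what distinguishes one period from the others.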

Findings

The notable results of this study are the most frequent items in each category: concrete structures among structural systems; Risk Management among PMBOK Knowledge Areas; the design-build (DB) system among Project Delivery Systems; engineering, procurement and construction (EPC) among DB Project Delivery Systems; Financial Management among specialized construction knowledge areas; Soft Skills among Global Trends; Contracting Companies among Project Parties; Construction Projects among Project Areas; Power Plant and Refinery among Project Subjects; Optimization among Problem-Solving Approaches; Fuzzy Logic among Novel Algorithms; and Motivation among Soft Skills.

Originality/value

The innovative aspect of this research is that for the first time, text mining has been used to analyze academic research on project and construction management, and also for the first time, academic research on construction industry in Iran has been compared with global research.

Details

Engineering, Construction and Architectural Management, vol. 29 no. 3
Type: Research Article
ISSN: 0969-9988

Article
Publication date: 9 November 2021

Yuyan Luo, Tao Tong, Xiaoxu Zhang, Zheng Yang and Ling Li

Abstract

Purpose

In the era of information overload, the density of tourism information and the increasingly sophisticated information needs of consumers have created information confusion for tourists and scenic-area managers. The study aims to help scenic-area managers determine the strengths and weaknesses in the development process of scenic areas and to solve the practical problem of tourists' difficulty in quickly and accurately obtaining the destination image of a scenic area and finding a scenic area that meets their needs.

Design/methodology/approach

The study uses a variety of machine learning methods, namely, the latent Dirichlet allocation (LDA) theme extraction model, term frequency-inverse document frequency (TF-IDF) weighting method and sentiment analysis. This work also incorporates probabilistic hesitant fuzzy algorithm (PHFA) in multi-attribute decision-making to form an enhanced tourism destination image mining and analysis model based on visitor expression information. The model is intended to help managers and visitors identify the strengths and weaknesses in the development of scenic areas. Jiuzhaigou is used as an example for empirical analysis.

Findings

In the study, a complete model for mining and analyzing tourism destination image was constructed, and 24,222 online reviews of Jiuzhaigou, China were analyzed. The results revealed a total of 10 attributes and 100 attribute elements. Three of the attributes were negative, namely, crowdedness, tourism cost and accommodation environment. The study provides suggestions for tourists to select attractions and offers recommendations and improvement measures for Jiuzhaigou in terms of crowd control and post-disaster reconstruction.

Originality/value

Previous research in this area has used small sample data for qualitative analysis. Thus, the current study fills this gap in the literature by proposing a machine learning method that incorporates PHFA through the combination of the ideas of management and multi-attribute decision theory. In addition, the study considers visitors' emotions and thematic preferences from the perspective of their expressed information, based on which the tourism destination image is analyzed. Optimization strategies are provided to help managers of scenic spots in their decision-making.

Details

Kybernetes, vol. 52 no. 3
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 2 July 2020

N. Venkata Sailaja, L. Padmasree and N. Mangathayaru

Abstract

Purpose

Text mining has been used for various knowledge discovery-based applications, and thus a lot of research has been devoted to it. A recent trend in text mining research is the adoption of incremental learning, as it is economical when dealing with large volumes of information.

Design/methodology/approach

The primary intention of this research is to design and develop a technique for incremental text categorization using an optimized Support Vector Neural Network (SVNN). The proposed technique involves four major steps: pre-processing, feature extraction, feature selection and classification. Initially, the data is pre-processed by stop word removal and stemming. Then, feature extraction is done by extracting semantic word-based features and Term Frequency and Inverse Document Frequency (TF-IDF). From the extracted features, the important features are selected using the Bhattacharyya distance measure, and the selected features are given as input to the proposed classifier. The proposed classifier performs incremental learning using SVNN, wherein the weights are bounded using rough set theory. Moreover, the Moth Search (MS) algorithm is used for the optimal selection of weights in SVNN. Thus, the proposed classifier, named Rough set MS-SVNN, performs text categorization on the incremental data given as input.
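
The distance-based feature selection step can be illustrated with a short Python sketch. It uses the standard Bhattacharyya distance between two discrete distributions, D = -ln(sum of sqrt(p_i * q_i)), and keeps the features whose two class-conditional distributions are at least a minimum distance apart. How the study builds those distributions is not stated in the abstract, so the two-class setup, the function names and the toy values are assumptions.

```python
import math


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete probability distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Bhattacharyya coefficient: overlap between the two distributions.
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    if coefficient == 0:
        return math.inf  # no overlap at all
    return -math.log(coefficient)


def select_features(class_distributions, min_distance):
    """Keep features whose two class-conditional distributions differ enough.

    class_distributions maps a feature name to a pair (p, q), one
    distribution per class (hypothetical layout).
    """
    return [name for name, (p, q) in class_distributions.items()
            if bhattacharyya_distance(p, q) >= min_distance]


# Invented distributions over (absent, present) for two classes.
candidates = {
    "goal": ([0.9, 0.1], [0.1, 0.9]),  # behaves differently per class
    "the": ([0.5, 0.5], [0.5, 0.5]),   # identical in both classes
}
selected = select_features(candidates, min_distance=0.1)
```

A feature distributed identically in both classes has distance zero and is dropped, which is why a minimum-distance threshold acts as a filter for discriminative features.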

Findings

For the experimentation, the 20 Newsgroups dataset and the Reuters dataset are used. Simulation results indicate that the proposed Rough set based MS-SVNN achieved 0.7743, 0.7774 and 0.7745 for precision, recall and F-measure, respectively.
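
For readers unfamiliar with the three reported metrics, the Python sketch below computes them from confusion counts for a single class. The counts are invented for illustration and are not taken from the study, and the abstract does not say how its figures were averaged across classes or data sets.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F-measure (F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 correct hits, 2 false alarms, 2 misses.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```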

Originality/value

In this paper, an online incremental learner is developed for text categorization. The text categorization is done by the Rough set MS-SVNN classifier, which classifies incoming texts based on the boundary condition evaluated by rough set theory and the optimal weights from MS. The proposed online text categorization scheme consists of four basic steps: pre-processing, feature extraction, feature selection and classification. Pre-processing identifies the unique words in the dataset, and features such as semantic word-based features and TF-IDF are obtained from the keyword set. Feature selection is done by setting a minimum Bhattacharyya distance measure, and the selected features are provided to the proposed Rough set MS-SVNN for classification.

Details

Data Technologies and Applications, vol. 54 no. 5
Type: Research Article
ISSN: 2514-9288

Book part
Publication date: 26 October 2017

Sudhanshu Joshi, Manu Sharma and Shalu Rathi

Abstract

The chapter presents a comprehensive review of cross-disciplinary literature in the domain of supply chain forecasting during the research period 1991–2017, with the primary aim of exploring the growth of literature from operational to demand-centric forecasting and decision making in service supply chain systems. A list of 15,000 articles from journals and search results in academic databases (viz. Science Direct, Web of Science) is used. Out of various content analysis techniques (Seuring & Gold, 2012), latent semantic analysis (LSA) is used as a content analysis tool (Wei, Yang, & Lin, 2008; Kundu et al., 2015). The reason for adopting LSA over existing bibliometric techniques is to use the combination of text analysis and mining methods to formulate latent factors. LSA creates the scientific grounding to understand the trends. Understanding future research trends through LSA will assist researchers in the area of service supply chain forecasting. The study will be beneficial for practitioners of the strategic and operational aspects of service supply chain decision making. The chapter comprises four sections. The first section introduces service supply chain management and research development in this domain. The second section describes the use of LSA in the current study. The third section describes the findings and results. The fourth and final section concludes the chapter with a brief discussion of the research findings, their limitations and the implications for future research. The outcomes of the analysis presented in this chapter also provide opportunities for researchers and professionals to position their future service supply chain research and/or implementation strategies.

Article
Publication date: 7 July 2023

Rongying Zhao and Weijie Zhu

Abstract

Purpose

This paper aims to conduct a comprehensive analysis to evaluate the current situation of journals, examine the factors that influence their development, and establish an evaluation index system and model. The objective is to enhance the theory and methodologies used for journal evaluation and provide guidance for their positive development.

Design/methodology/approach

This study uses empirical data from economics journals to analyse their evaluation dimensions, methods, index system and evaluation framework. This study then assigns weights to journal data using single and combined evaluations in three dimensions: influence, communication and novelty. It calculates several evaluation metrics, including the explanation rate, information entropy value, difference coefficient and novelty degree. Finally, this study applies the concept of fuzzy mathematics to measure the final results.
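
The information entropy value and difference coefficient mentioned above are the usual ingredients of the entropy weight method, which the following Python sketch illustrates. Whether the study uses exactly this formulation is an assumption. The function name, the clamping of tiny negative values and the toy indicator matrix are hypothetical, and the fuzzy Borda combination is not reproduced.

```python
import math


def entropy_weights(matrix):
    """Entropy weight method.

    matrix[i][j] is the non-negative score of journal i on indicator j.
    Returns (entropies, difference_coefficients, weights), one entry per indicator.
    """
    n = len(matrix)
    if n < 2:
        raise ValueError("need at least two journals")
    entropies = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total <= 0:
            raise ValueError("each indicator needs a positive column sum")
        shares = [x / total for x in column]
        # Normalised Shannon entropy in [0, 1]; 1 means all journals score alike.
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n)
        entropies.append(entropy)
    # Difference coefficient: how much an indicator discriminates between journals.
    diffs = [max(0.0, 1 - e) for e in entropies]
    spread = sum(diffs)
    if spread == 0:
        weights = [1 / len(diffs)] * len(diffs)
    else:
        weights = [d / spread for d in diffs]
    return entropies, diffs, weights


# Invented scores for three journals on influence, communication and novelty.
scores = [
    [0.9, 10, 0.2],
    [0.9, 40, 0.3],
    [0.9, 50, 0.5],
]
entropies, diffs, weights = entropy_weights(scores)
```

An indicator on which every journal scores the same carries no information for ranking and receives a weight close to zero, while indicators with more uneven scores receive larger weights.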

Findings

The use of affiliation degree and fuzzy Borda number can synthesize ranking and score differences among evaluation methods. It combines internal objective information and improves model accuracy. The novelty of journal topics positively correlates with both the journal impact factor and social media mentions. In addition, journal communication power indicators compensate for the shortcomings of traditional citation analysis. Finally, the three-dimensional representative evaluation index serves as a reminder to academic journals to avoid the vortex of the Matthew effect.

Originality/value

This paper proposes a journal evaluation model comprising academic influence, communication power and novelty dimensions. It uses fuzzy Borda evaluation to address issues related to the weighing of single evaluation methods. This study also analyses the relationship of the three dimensions and offers insights for journal development in the new media era.

Details

The Electronic Library, vol. 41 no. 4
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 21 January 2019

Issa Alsmadi and Keng Hoon Gan

Abstract

Purpose

Rapid developments in social networks and their usage in everyday life have caused an explosion in the amount of short electronic documents. Thus, the need to classify this type of document based on its content has significant implications for many applications and is of interest for many practical reasons. Short-text classification is an essential step in many applications, such as spam filtering, sentiment analysis, Twitter personalization, customer review and many other applications related to social networks. Reviews on short text and its application are limited. Thus, this paper aims to discuss the characteristics of short text and its challenges and difficulties in classification. The paper attempts to introduce all stages of the classification process, the technique used in each stage and the possible development trend in each stage.

Design/methodology/approach

The paper is a review of the main aspects of short-text classification. It is structured according to the stages of the classification task.

Findings

This paper discusses related issues and approaches to these problems. Further research could be conducted to address the challenges in short texts and avoid poor accuracy in classification. Low performance can be addressed by using optimized solutions, such as genetic algorithms, which are powerful in enhancing the quality of selected features. Soft computing solutions, such as fuzzy logic, make short-text problems a promising area of research.

Originality/value

Using a powerful short-text classification method significantly affects many applications in terms of efficiency enhancement. Current solutions still have low performance, implying the need for improvement. This paper discusses related issues and approaches to these problems.

Details

International Journal of Web Information Systems, vol. 15 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 3 November 2020

Femi Emmanuel Ayo, Olusegun Folorunso, Friday Thomas Ibharalu and Idowu Ademola Osinuga

Abstract

Purpose

Hate speech is an expression of intense hatred. Twitter has become a popular analytical tool for the prediction and monitoring of abusive behaviors. Hate speech detection with social media data has received special research attention in recent studies; hence the need to design a generic metadata architecture and an efficient feature extraction technique to enhance hate speech detection.

Design/methodology/approach

This study proposes hybrid embeddings enhanced with a topic inference method and an improved cuckoo search neural network for hate speech detection in Twitter data. The proposed method uses a hybrid embeddings technique that includes Term Frequency-Inverse Document Frequency (TF-IDF) for word-level feature extraction and Long Short-Term Memory (LSTM), a variant of the recurrent neural network architecture, for sentence-level feature extraction. The features extracted from the hybrid embeddings then serve as input to the improved cuckoo search neural network, which predicts whether a tweet is hate speech, offensive language or neither.

Findings

The proposed method showed better results when tested on the collected Twitter datasets compared to other related methods. In order to validate the performances of the proposed method, t-test and post hoc multiple comparisons were used to compare the significance and means of the proposed method with other related methods for hate speech detection. Furthermore, Paired Sample t-Test was also conducted to validate the performances of the proposed method with other related methods.
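
The paired sample t-test mentioned above compares two methods on the same test splits by examining the per-split differences. The Python sketch below computes the test statistic and its degrees of freedom with the standard library only. The per-fold scores are invented for illustration and are not results from the study, and obtaining a p-value would still require a t-distribution table or a statistics package.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """t statistic and degrees of freedom for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    spread = stdev(diffs)  # sample standard deviation of the differences
    if spread == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (spread / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical F1 scores of two methods on the same four folds.
proposed = [0.91, 0.92, 0.90, 0.93]
baseline = [0.88, 0.90, 0.89, 0.89]
t_value, dof = paired_t_statistic(proposed, baseline)
```

Pairing matters because both methods are scored on identical folds, so fold-to-fold difficulty cancels out in the differences.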

Research limitations/implications

Finally, the evaluation results showed that the proposed method outperforms other related methods with mean F1-score of 91.3.

Originality/value

The main novelty of this study is the use of an automatic topic spotting measure based on a naïve Bayes model to improve feature representation.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 13 no. 4
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 3 October 2019

Thara Angskun and Jitimon Angskun

Abstract

Purpose

This paper aims to introduce a hierarchical fuzzy system for online review analysis named FLORA. FLORA enables tourists to decide on their destination without reading numerous reviews from experienced tourists. It summarizes reviews and visualizes them through a hierarchical structure. The visualization not only presents the overall quality of an accommodation but also shows the condition of the bed, the hospitality of the front desk receptionist and much more in a snap.

Design/methodology/approach

FLORA is a complete system that acquires online reviews, analyzes sentiments, computes feature scores and summarizes results in a hierarchical view. FLORA is designed to use an overall score, rated by real tourists, as a baseline for accuracy comparison. The accuracy of FLORA is achieved by a novel sentiment analysis process (as part of a knowledge acquisition engine) based on semantic analysis and a novel rating technique, called hierarchical fuzzy calculation, in the knowledge inference engine.

Findings

The performance of FLORA against related work has been assessed in two aspects. The first aspect focuses on review analysis with binary format representation. The results reveal that the hierarchical fuzzy method of FLORA, with probability weighting, achieves the highest values in precision, recall and F-measure. The second aspect looks at review analysis with a five-point rating scale, comparing FLORA with one of the most advanced research methods, called fuzzy domain ontology. The results reveal that the hierarchical fuzzy method of FLORA, with probability weighting, returns the results closest to the tourist-defined rating.

Research limitations/implications

This research advances knowledge of online review analysis by contributing a novel sentiment analysis process and a novel rating technique. The FLORA system has two limitations. First, the reviews are based on individual expression, which is an arbitrary distinction and not always grammatically correct. Consequently, some opinions may not be extracted because the context free grammar rules are insufficient. Second, natural languages evolve and diversify all the time. Many emerging words or phrases, including idioms, proverbs and slang, are often used in online reviews. Thus, those words or phrases need to be manually updated in the knowledge base.

Practical implications

This research contributes to the tourism business and assists travelers by introducing comprehensive and easy to understand information about each accommodation to travelers. Although the FLORA system was originally designed and tested with accommodation reviews, it can also be used with reviews of any products or services by updating data in the knowledge base. Thus, businesses, which have online reviews for their products or services, can benefit from the FLORA system.

Originality/value

This research proposes a FLORA system which analyzes sentiments from online reviews, computes feature scores and summarizes results in a hierarchical view. Moreover, this work is able to use the overall score, rated by real tourists, as a baseline for accuracy comparison. The main theoretical implication is a novel sentiment analysis process based on semantic analysis and a novel rating technique called hierarchical fuzzy calculation.

Details

Journal of Systems and Information Technology, vol. 21 no. 3
Type: Research Article
ISSN: 1328-7265
