Search results
1 – 10 of over 75,000 results
Abstract
Purpose
To provide an integrated perspective on the similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and to point to problems with these approaches and with automated classification as such.
Design/methodology/approach
A range of works dealing with automated classification of full‐text web documents is discussed. Individual approaches are explored in the following sections: special features (description, differences, evaluation), applications and characteristics of web pages.
Findings
The paper identifies major similarities and differences between the three approaches: document pre‐processing and the use of web‐specific document characteristics are common to all of them, while the major differences lie in the algorithms applied and in whether the vector space model and controlled vocabularies are employed. Problems of automated classification as such are also recognized.
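The vector space model mentioned in the findings can be illustrated with a minimal sketch: documents become term-weight vectors, and similarity is the cosine of the angle between them. The toy documents and the raw term-frequency weighting are illustrative assumptions, not drawn from the reviewed works.

```python
import math
from collections import Counter

def vectorize(text):
    # Raw term frequencies as vector components (a deliberate simplification;
    # real systems typically use tf-idf weighting).
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

a = vectorize("web page classification")
b = vectorize("classification of web documents")
print(round(cosine(a, b), 3))  # shared terms "web" and "classification" → 0.577
```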
Research limitations/implications
The paper does not attempt to provide an exhaustive bibliography of related resources.
Practical implications
As an integrated overview of approaches from different research communities, with application examples, the paper is very useful for students in library and information science and computer science, as well as for practitioners. Researchers from one community gain insight into how similar tasks are conducted in other communities.
Originality/value
To the author's knowledge, no review paper on automated text classification has attempted to discuss more than one community's approach from an integrated perspective.
Lars Witell and Martin Löfgren
Abstract
Purpose
The purpose of this paper is to investigate whether the different approaches to the classification of quality attributes deliver consistent results.
Design/methodology/approach
The theory of attractive quality rests on a solid theoretical foundation and a methodological approach to classifying quality attributes. Recently, various authors have suggested alternative approaches to the traditional five‐level Kano questionnaire, including a three‐level Kano questionnaire, direct classification, and a dual‐importance grid. The investigation includes four such approaches and enables comparisons from both a methodological and an output perspective. The approaches are described, analyzed, and discussed in the context of an empirical study in which 430 respondents rated the performance of an e‐service.
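The five-level Kano questionnaire referred to above classifies an attribute by looking up each respondent's (functional, dysfunctional) answer pair in the standard Kano evaluation table and aggregating across respondents. A minimal sketch, with only an abbreviated table (a full table covers all 25 answer pairs):

```python
from collections import Counter

# Abbreviated Kano evaluation table: (functional answer, dysfunctional answer)
# -> category. A = attractive, M = must-be, O = one-dimensional, I = indifferent.
# Pairs missing from this illustrative subset default to I here.
EVAL_TABLE = {
    ("like", "dislike"): "O",
    ("like", "neutral"): "A",
    ("neutral", "dislike"): "M",
    ("neutral", "neutral"): "I",
}

def classify_attribute(answer_pairs):
    # Majority vote over the per-respondent categories.
    votes = Counter(EVAL_TABLE.get(pair, "I") for pair in answer_pairs)
    return votes.most_common(1)[0][0]

answers = [("like", "dislike"), ("like", "dislike"), ("neutral", "dislike")]
print(classify_attribute(answers))  # → O (one-dimensional wins the vote)
```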
Findings
The classification of quality attributes is found to depend on the approach that is used. The development of new ways to classify quality attributes should follow rigid procedures in order to provide reliable and consistent results.
Originality/value
This is the first attempt to compare alternative approaches to classifying quality attributes. For managers, the results provide guidance on which approach to choose, based on the strengths and weaknesses of the different approaches.
Fuzan Chen, Harris Wu, Runliang Dou and Minqiang Li
Abstract
Purpose
The purpose of this paper is to build a compact and accurate classifier for high-dimensional classification.
Design/methodology/approach
A classification approach based on class-dependent feature subspaces (CFS) is proposed. A CFS is a class-dependent integration of a support vector machine (SVM) classifier and its associated discriminative features. For each class, the authors' genetic algorithm (GA)-based approach evolves the best subset of discriminative features and the SVM classifier simultaneously. To guarantee convergence and efficiency, the authors customize the GA in terms of encoding strategy, fitness evaluation, and genetic operators.
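A rough, hypothetical sketch of the encoding idea: each chromosome is a binary feature mask, and selection, crossover, and mutation search for a compact, discriminative subset. The toy fitness function stands in for the SVM cross-validation accuracy the authors actually use; all names and parameters are assumptions.

```python
import random

random.seed(0)

N_FEATURES = 10
RELEVANT = {1, 3, 7}  # pretend only these features discriminate the class

def fitness(mask):
    # Toy stand-in for per-class SVM accuracy: reward masks that keep the
    # relevant features while staying compact.
    chosen = {i for i, bit in enumerate(mask) if bit}
    return len(chosen & RELEVANT) - 0.1 * len(chosen)

def evolve(pop_size=20, generations=40):
    pop = [[random.randint(0, 1) for _ in range(N_FEATURES)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]            # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, N_FEATURES)  # one-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(N_FEATURES)       # point mutation
            child[i] = 1 - child[i]
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print(sorted(i for i, bit in enumerate(best) if bit))
```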
Findings
Experimental studies demonstrated that the proposed CFS-based approach is superior to other state-of-the-art classification algorithms on UCI data sets in terms of both concise interpretation and predictive power for high-dimensional data.
Research limitations/implications
UCI data sets rather than real industrial data are used to evaluate the proposed approach. In addition, only single-label classification is addressed in the study.
Practical implications
The proposed method not only constructs an accurate classification model but also obtains a compact combination of discriminative features. It helps business decision makers gain a concise understanding of high-dimensional data.
Originality/value
The authors propose a compact and effective classification approach for high-dimensional data. Instead of using the same feature subset for all classes, the proposed CFS-based approach obtains the optimal subset of discriminative features and an SVM classifier for each class. The approach enhances both interpretability and predictive power for high-dimensional data.
Erik Bergström, Fredrik Karlsson and Rose-Mharie Åhlfeldt
Abstract
Purpose
The purpose of this paper is to develop a method for information classification. The proposed method draws on established standards, such as the ISO/IEC 27002 and information classification practices. The long-term goal of the method is to decrease the subjective judgement in the implementation of information classification in organisations, which can lead to information security breaches because the information is under- or over-classified.
Design/methodology/approach
The results are based on a design science research approach, implemented as five iterations spanning the years 2013 to 2019.
Findings
The paper presents a method for information classification and the design principles underpinning the method. The empirical demonstration shows that senior and novice information security managers perceive the method as a useful tool for classifying information assets in an organisation.
Research limitations/implications
Existing research has provided only limited advice on how to approach information classification in organisations systematically. The method presented in this paper can act as a starting point for further research in this area, aiming at decreasing subjectivity in the information classification process. Additional research is needed to fully validate the proposed method and its potential to reduce subjective judgement.
Practical implications
The research contributes to practice by offering a method for information classification, providing a hands-on tool for implementing an information classification process. In addition, it shows that it is possible to devise such a method; this is important because, even if an organisation chooses not to adopt the proposed method, the fact that the method has proved useful should encourage similar endeavours.
Originality/value
The proposed method offers a detailed and well-elaborated tool for information classification. The method is generic and adaptable, depending on organisational needs.
Ismail Hmeidi, Mahmoud Al-Ayyoub, Nizar A. Mahyoub and Mohammed A. Shehab
Abstract
Purpose
Multi-label text classification (MTC) is one of the most recent research trends in the data mining and information retrieval domains, owing among other things to the rapid growth of online data and the increasing tendency of internet users to assign multiple labels/tags to describe documents, emails, posts, etc. The dimensionality of the label space makes MTC more difficult and challenging than traditional single-label text classification (TC). Because MTC is a natural extension of TC, several ways have been proposed to benefit from the rich TC literature through what are called problem transformation (PT) methods. Basically, PT methods transform multi-label data into single-label data suitable for traditional single-label classification algorithms. Another approach is to design novel classification algorithms customized for MTC. Over the past decade, several works have appeared on both approaches, focusing mainly on the English language. This work aims to present an elaborate study of the MTC of Arabic articles.
Design/methodology/approach
This paper presents a novel lexicon-based method for MTC, where the keywords that are most associated with each label are extracted from the training data along with a threshold that can later be used to determine whether each test document belongs to a certain label.
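The described lexicon-based method might be sketched as follows: for each label, keep the words most associated with that label in the training data, then tag a test document with every label whose lexicon overlap clears a threshold. The tiny corpus, the `top_k` frequency heuristic, and the threshold value are invented for illustration; the paper's keyword extraction is more refined.

```python
from collections import Counter

# Tiny labeled corpus; each document carries one or more labels.
train = [
    ("economy market trade growth", {"economy"}),
    ("football match goal team", {"sports"}),
    ("market team sponsorship deal", {"economy", "sports"}),
]

def build_lexicons(corpus, top_k=3):
    # For each label, keep the top_k most frequent words in its documents
    # (a crude association measure standing in for the paper's extraction).
    lexicons = {}
    for label in {l for _, labels in corpus for l in labels}:
        counts = Counter()
        for text, labels in corpus:
            if label in labels:
                counts.update(text.split())
        lexicons[label] = {w for w, _ in counts.most_common(top_k)}
    return lexicons

def classify(text, lexicons, threshold=0.2):
    # Assign every label whose lexicon overlap clears the threshold.
    words = set(text.split())
    return {label for label, lex in lexicons.items()
            if lex and len(words & lex) / len(lex) >= threshold}

lexicons = build_lexicons(train)
print(classify("goal team market", lexicons))  # both labels fire
```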
Findings
The experiments show that the presented approach outperforms the currently available approaches. Specifically, the best accuracy obtained from existing approaches is only 18 per cent, whereas the presented lexicon-based approach reaches an accuracy of 31 per cent.
Originality/value
Although some existing tools can be customized to address the MTC problem for Arabic text, their accuracy is very low when applied to Arabic articles. This paper presents a novel lexicon-based method for MTC that outperforms the currently available approaches in the experiments.
Vinod Kumar, Zillur Rahman and A. A. Kazmi
Abstract
Purpose
This paper aims to review the literature on stakeholder identification and classification related to sustainability marketing from 1998 to 2012 and provides a generalized approach to stakeholder identification and classification in the field of sustainability marketing.
Design/methodology/approach
Beginning with brief introductions to the key concepts, the research discusses landmark studies on the subject in detail. The review then identifies and selects relevant research papers from various online databases. Finally, 60 research papers are found suitable for the review and are examined to theoretically analyze the stakeholder identification and classification schemes used in the sustainability marketing literature.
Findings
This study identifies trends of growth in stakeholder identification and classification literature. In addition, there are two major findings. First, stakeholder identification can be done with the help of previous studies, with support from managers or via a combination of both. Second, future research can adopt generic stakeholder classification schemes or relative classification schemes based on dimensions of sustainability to classify stakeholders in relation to sustainability marketing. In relative stakeholder classification, regulatory stakeholders may be considered separately.
Research limitations/implications
While the literature review may be incomplete, as it uses only a title-based advanced search, researchers and practitioners can still benefit from this simplified approach to manage stakeholders.
Originality/value
The study introduces a generalized approach to stakeholder identification and classification related to sustainability marketing and provides a bibliography from 1998 to 2012 that can be used by academics and managers.
Michael John Khoo, Jae-wook Ahn, Ceri Binding, Hilary Jane Jones, Xia Lin, Diana Massam and Douglas Tudhope
Abstract
Purpose
The purpose of this paper is to describe a new approach to a well-known problem for digital libraries, how to search across multiple unrelated libraries with a single query.
Design/methodology/approach
The approach involves creating new Dewey Decimal Classification terms and numbers from existing Dublin Core records. In total, 263,550 records were harvested from three digital libraries. Weighted key terms were extracted from the title, description and subject fields of each record. Ranked DDC classes were automatically generated from these key terms by considering DDC hierarchies via a series of filtering and aggregation stages. A mean reciprocal ranking evaluation compared a sample of 49 generated classes against DDC classes created by a trained librarian for the same records.
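The hierarchy-aware aggregation step can be sketched in miniature: weighted key terms vote for DDC classes, and each vote also credits the class's broader ancestors (shorter DDC prefixes), approximating the paper's filtering-and-aggregation stages. The term-to-class lexicon below is invented for illustration.

```python
from collections import defaultdict

TERM_TO_DDC = {      # hypothetical term-to-class mapping
    "algebra": "512",
    "geometry": "516",
    "calculus": "515",
}

def rank_classes(weighted_terms):
    scores = defaultdict(float)
    for term, weight in weighted_terms.items():
        ddc = TERM_TO_DDC.get(term)
        if ddc is None:
            continue  # terms outside the lexicon cast no vote
        # Credit the full class and every broader ancestor (prefix),
        # so lower-level matches aggregate to broader parents.
        for depth in range(1, len(ddc) + 1):
            scores[ddc[:depth]] += weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranked = rank_classes({"algebra": 2.0, "geometry": 1.0, "poetry": 0.5})
print(ranked)  # broad class "51" outranks either specific class
```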
Findings
The best results combined weighted key terms from the title, description and subject fields. Performance declines with increased specificity of DDC level. The results compare favorably with similar studies.
Research limitations/implications
The metadata harvest required manual intervention and the evaluation was resource intensive. Future research will look at evaluation methodologies that take account of issues of consistency and ecological validity.
Practical implications
The method does not require training data and is easily scalable. The pipeline can be customized for individual use cases, for example to enhance recall or precision.
Social implications
The approach can provide centralized access to information from multiple domains currently provided by individual digital libraries.
Originality/value
The approach addresses metadata normalization in the context of web resources. The automatic classification approach accounts for matches within hierarchies, aggregating lower level matches to broader parents and thus approximates the practices of a human cataloger.
Peter J. Wild, Matt D. Giess and Chris A. McMahon
Abstract
Purpose
The purpose of this paper is to highlight the difficulty of applying faceted classification outside of library contexts and also to indicate that faceted approaches are poorly expressed to non‐experts.
Design/methodology/approach
The faceted approach is being applied outside of its “home” community, with mixed results. The paper's approach is based in part on an examination of a broad base of literature and in part on the results of, and reflections on, a case study applying faceted notions to “real world” engineering documentation.
Findings
The paper identifies a number of pragmatic and theoretical issues, namely: differing interpretations of the facet notion; confusion between facet analysis and faceted classification; lack of methodological guidance; the use of simplistic domains as exemplars; description versus analysis; the assumption that facet recognition is unproblematic; and whether the process is purely top‐down or bottom‐up.
Research limitations/implications
Facet analysis is not inherently associated with a particular epistemology; greater guidance about the derivation of facets is needed; and greater realism is needed when teaching faceted approaches.
Practical implications
Experiences of applying faceted classifications are presented that can be drawn upon to guide future work in the area.
Originality/value
No previous work has reflected on the actual empirical experience used to create a faceted description, especially with reference to engineering documents.
Chedia Dhaoui, Cynthia M. Webster and Lay Peng Tan
Abstract
Purpose
With the soaring volumes of brand-related social media conversations, digital marketers have extensive opportunities to track and analyse consumers’ feelings and opinions about brands, products or services embedded within consumer-generated content (CGC). These “Big Data” opportunities render manual approaches to sentiment analysis impractical and raise the need to develop automated tools to analyse consumer sentiment expressed in text format. This paper aims to evaluate and compare the performance of two prominent approaches to automated sentiment analysis applied to CGC on social media and explores the benefits of combining them.
Design/methodology/approach
A sample of 850 consumer comments from 83 Facebook brand pages is used to test and compare lexicon-based and machine learning approaches to sentiment analysis, as well as their combination, using the LIWC2015 lexicon and the RTextTools machine learning package.
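A minimal sketch of such a combination, assuming a tiny word-list lexicon and a stubbed stand-in for the trained classifier (not LIWC2015 or RTextTools): when the two methods disagree, the non-neutral call wins.

```python
# Illustrative word lists; real lexicons are far larger.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"bad", "hate", "awful"}

def lexicon_sentiment(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def ml_sentiment(text):
    # Stand-in for a trained classifier's prediction.
    return "positive" if "recommend" in text.lower() else "neutral"

def combined_sentiment(text):
    lex, ml = lexicon_sentiment(text), ml_sentiment(text)
    if lex == ml:
        return lex
    # Disagreement: prefer whichever method made a non-neutral call.
    return lex if lex != "neutral" else ml

print(combined_sentiment("I love it and recommend it"))  # → positive
```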
Findings
Results show the two approaches are similar in accuracy, both achieving higher accuracy when classifying positive sentiment than negative sentiment. However, they differ substantially in their classification ensembles. The combined approach demonstrates significantly improved performance in classifying positive sentiment.
Research limitations/implications
Further research is required to improve the accuracy of negative sentiment classification. The combined approach needs to be applied to other kinds of CGCs on social media such as tweets.
Practical implications
The findings inform decision-making around which sentiment analysis approaches (or a combination thereof) is best to analyse CGC on social media.
Originality/value
This study combines two sentiment analysis approaches and demonstrates significantly improved performance.
S. P. Sarmah and U. C. Moharana
Abstract
Purpose
The purpose of this paper is to present a fuzzy-rule-based model that classifies spare parts inventories considering multiple criteria, for better management of maintenance activities and to avoid production-down situations.
Design/methodology/approach
A fuzzy-rule-based approach for multi-criteria decision making is used to classify the spare parts inventories. The total cost is computed for each group under suitable inventory policies and compared with that of other existing models.
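A hedged sketch of how a fuzzy rule base might map two criteria to the A/B/C classes: each rule fires with the minimum of its antecedent memberships, and the strongest rule wins. The membership shapes, rule base, and criteria are invented for illustration, not taken from the paper.

```python
def high(x):
    # Membership in "high" on a 0-1 scale, ramping up from 0.5 to 1.0.
    return max(0.0, min(1.0, (x - 0.5) / 0.5))

def low(x):
    return 1.0 - high(x)

def classify_part(criticality, cost):
    # Each rule fires with the min of its antecedent memberships.
    rule_a = min(high(criticality), high(cost))   # IF both high THEN class A
    rule_c = min(low(criticality), low(cost))     # IF both low  THEN class C
    rule_b = 1.0 - max(rule_a, rule_c)            # otherwise lean toward B
    winner = max((rule_a, "A"), (rule_b, "B"), (rule_c, "C"))
    return winner[1]

print(classify_part(0.9, 0.8))  # → A
```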
Findings
The fuzzy-rule-based multi-criteria classification model provides better results than aggregate scoring and traditional ABC classification. The model also offers inventory management experts the flexibility to provide their subjective inputs.
Practical implications
The web-based model developed in this paper can be implemented in industries such as manufacturing, chemical plants, and mining, which deal with large numbers of spares. The method classifies spares into three categories, A, B and C, considering multiple criteria and the relationships among those criteria. The framework is flexible enough for decision makers to add criteria and to modify the fuzzy rule base at any point of time, and the model can be easily integrated into any customized Enterprise Resource Planning application.
Originality/value
The value of this paper lies in applying a fuzzy-rule-based approach to the multi-criteria inventory classification of spare parts; such a rule-based approach considering multiple criteria is uncommon in spare parts classification. A total cost comparison shows that the proposed fuzzy-rule-based approach performs better than traditional ABC classification and yields almost the same cost as the aggregate scoring model. Hence, the method is valid and adds new value to spare parts classification for better management decisions.