Search results

1 – 10 of over 59000
Article
Publication date: 1 November 2005

Mohamed Hammami, Youssef Chahir and Liming Chen

Abstract

Along with the ever-growing Web comes the proliferation of objectionable content, such as sex, violence, racism, etc. We need efficient tools for classifying and filtering undesirable web content. In this paper, we investigate this problem through WebGuard, our automatic machine learning based pornographic website classification and filtering system. As the Internet becomes more and more visual and multimedia-oriented, as exemplified by pornographic websites, we focus our attention here on the use of skin color related visual content based analysis along with textual and structural content based analysis for improving pornographic website filtering. While most commercial filtering products on the marketplace are mainly based on textual content‐based analysis such as indicative keyword detection or manually collected black list checking, the originality of our work resides in the addition of structural and visual content‐based analysis to the classical textual content‐based analysis, along with several major data mining techniques for learning and classifying. In experiments on a testbed of 400 websites including 200 adult sites and 200 non-pornographic ones, WebGuard, our Web filtering engine, scored a 96.1% classification accuracy rate when only textual and structural content based analysis was used, and a 97.4% classification accuracy rate when skin color related visual content based analysis was added. Further experiments on a black list of 12 311 adult websites manually collected and classified by the French Ministry of Education showed that WebGuard scored an 87.82% classification accuracy rate when using only textual and structural content‐based analysis, and a 95.62% classification accuracy rate when the visual content‐based analysis was added. The basic framework of WebGuard can apply to other categorization problems of websites which combine, as most of them do today, textual and visual content.

Details

International Journal of Web Information Systems, vol. 1 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 9 July 2020

James Wakiru, Liliane Pintelon, Peter Muchiri and Peter Chemweno

Abstract

Purpose

The purpose of this paper is to develop a maintenance decision support system (DSS) framework using in-service lubricant data for fault diagnosis. The DSS reveals embedded patterns in the data (knowledge discovery) and automatically quantifies the influence of lubricant parameters on the unhealthy state of the machine using alternative classifiers. The classifiers are compared for robustness from which decision-makers select an appropriate classifier given a specific lubricant data set.

Design/methodology/approach

The DSS embeds a framework integrating cluster and principal component analysis for feature extraction, and eight classifiers, among them extreme gradient boosting (XGB), random forest (RF), decision trees (DT) and logistic regression (LR). Qualitative and quantitative criteria are developed in conjunction with practitioners for comparing the classifier models.

Findings

The results show the importance of embedded knowledge, explored via a knowledge discovery approach. Moreover, the efficacy of the embedded knowledge on maintenance DSS is emphasized. Importantly, the proposed framework is demonstrated as plausible for decision support due to its high accuracy and consideration of practitioners' needs.

Practical implications

The proposed framework will potentially assist maintenance managers in accurately exploiting lubricant data for maintenance DSS, while offering insights with reduced time and errors.

Originality/value

Advances in lubricant-based intelligent approaches for fault diagnosis are seldom utilized in practice; however, they may be incorporated in information management systems, offering high predictive accuracy. The classification model comparison approach will inevitably assist the industry in selecting among divergent models for DSS.

Details

Journal of Quality in Maintenance Engineering, vol. 27 no. 2
Type: Research Article
ISSN: 1355-2511

Article
Publication date: 13 August 2019

Hongshan Xiao and Yu Wang

Abstract

Purpose

Feature space heterogeneity exists widely in various application fields of classification techniques, such as customs inspection decision, credit scoring and medical diagnosis. This paper aims to study the relationship between feature space heterogeneity and classification performance.

Design/methodology/approach

A measurement is first developed for measuring and identifying any significant heterogeneity that exists in the feature space of a data set. The main idea of this measurement is derived from a meta-analysis. For the data set with significant feature space heterogeneity, a classification algorithm based on factor analysis and clustering is proposed to learn the data patterns, which, in turn, are used for data classification.

Findings

The proposed approach has two main advantages over the previous methods. The first advantage lies in feature transformation using orthogonal factor analysis, which results in new features without redundancy and irrelevance. The second advantage rests on sample partitioning to capture the feature space heterogeneity reflected by differences in factor scores. The validity and effectiveness of the proposed approach are verified on a number of benchmark data sets.

Research limitations/implications

The measurement should be used to guide the heterogeneity elimination process, which is an interesting topic for future research. In addition, developing a classification algorithm that enables scalable and incremental learning for large data sets with significant feature space heterogeneity is also an important issue.

Practical implications

Measuring and eliminating the feature space heterogeneity possibly existing in the data are important for accurate classification. This study provides a systematic approach to feature space heterogeneity measurement and elimination for better classification performance, which is favorable for applications of classification techniques in real-world problems.

Originality/value

A measurement based on meta-analysis for measuring and identifying any significant feature space heterogeneity in a classification problem is developed, and an ensemble classification framework is proposed to deal with the feature space heterogeneity and improve the classification accuracy.

Details

Kybernetes, vol. 48 no. 9
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 16 March 2010

Cataldo Zuccaro

Downloads
1824

Abstract

Purpose

The purpose of this paper is to discuss and assess the structural characteristics (conceptual utility) of the most popular classification and predictive techniques employed in customer relationship management and customer scoring and to evaluate their classification and predictive precision.

Design/methodology/approach

A sample of customers' credit rating and socio‐demographic profiles are employed to evaluate the analytic and classification properties of discriminant analysis, binary logistic regression, artificial neural networks, C5 algorithm, and regression trees employing Chi‐squared Automatic Interaction Detector (CHAID).

Findings

With regard to interpretability and the conceptual utility of the parameters generated by the five techniques, logistic regression provides easily interpretable parameters through its logit. The logits can be interpreted in the same way as regression slopes. In addition, the logits can be converted to odds, providing a common sense evaluation of the relative importance of each independent variable. Moreover, the technique provides robust statistical tests to evaluate the model parameters. Finally, both CHAID and the C5 algorithm provide visual tools (regression tree) and semantic rules (rule set for classification) to facilitate the interpretation of the model parameters. These can be highly desirable properties when the researcher attempts to explain the conceptual and operational foundations of the model.
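The conversion from logits to odds mentioned in these findings can be illustrated with a minimal sketch. It is not taken from the paper: the function names and the example coefficient values are hypothetical, and only the standard relationship between a logistic regression coefficient (a log-odds value) and its odds ratio is assumed.

```python
import math


def odds_ratio(logit_coefficient: float) -> float:
    """Convert a logistic regression coefficient (log-odds) to an odds ratio."""
    return math.exp(logit_coefficient)


def predicted_probability(intercept: float, coefficients, features) -> float:
    """Apply the logistic function to the linear predictor (the logit)."""
    logit = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical coefficients, for illustration only (not from the paper).
example_coefficients = {"income": 0.8, "age": -0.1}
example_odds = {name: odds_ratio(b) for name, b in example_coefficients.items()}
```

An odds ratio above 1 means that a one-unit increase in the variable raises the odds of the outcome, and a ratio below 1 means it lowers them, which is what makes the converted values easy to compare across variables.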

Originality/value

Most treatments of complex classification procedures have been undertaken idiosyncratically, that is, evaluating only one technique. This paper evaluates and compares the conceptual utility and predictive precision of five different classification techniques on a moderate sample size and provides clear guidelines in technique selection when undertaking customer scoring and classification.

Details

Journal of Modelling in Management, vol. 5 no. 1
Type: Research Article
ISSN: 1746-5664

Article
Publication date: 11 September 2017

Chedia Dhaoui, Cynthia M. Webster and Lay Peng Tan

Downloads
5418

Abstract

Purpose

With the soaring volumes of brand-related social media conversations, digital marketers have extensive opportunities to track and analyse consumers’ feelings and opinions about brands, products or services embedded within consumer-generated content (CGC). These “Big Data” opportunities render manual approaches to sentiment analysis impractical and raise the need to develop automated tools to analyse consumer sentiment expressed in text format. This paper aims to evaluate and compare the performance of two prominent approaches to automated sentiment analysis applied to CGC on social media and explores the benefits of combining them.

Design/methodology/approach

A sample of 850 consumer comments from 83 Facebook brand pages are used to test and compare lexicon-based and machine learning approaches to sentiment analysis, as well as their combination, using the LIWC2015 lexicon and RTextTools machine learning package.

Findings

Results show the two approaches are similar in accuracy, both achieving higher accuracy when classifying positive sentiment than negative sentiment. However, they differ substantially in their classification ensembles. The combined approach demonstrates significantly improved performance in classifying positive sentiment.

Research limitations/implications

Further research is required to improve the accuracy of negative sentiment classification. The combined approach needs to be applied to other kinds of CGCs on social media such as tweets.

Practical implications

The findings inform decision-making around which sentiment analysis approach (or combination thereof) is best for analysing CGC on social media.

Originality/value

This study combines two sentiment analysis approaches and demonstrates significantly improved performance.

Details

Journal of Consumer Marketing, vol. 34 no. 6
Type: Research Article
ISSN: 0736-3761

Article
Publication date: 1 March 1995

Clare Beghtol

Abstract

Undiscovered public knowledge is a relatively unstudied phenomenon, and the few extended examples that have been published are intradisciplinary. This paper presents the concept of ‘facet’ as an example of interdisciplinary undiscovered public knowledge. ‘Facets’ were central to the bibliographic classification theory of S.R. Ranganathan in India and to the behavioural research of L. Guttman in Israel. The term had the same meaning in both fields, and the concept was developed and exploited at about the same time in both, but two separate, unconnected literatures grew up around the term and its associated concepts. This paper examines the origins and parallel uses of the concept and the term in both fields as a case study of interdisciplinary knowledge that could have been, but was apparently not, discovered any time between the early 1950s and the present using simple, readily available information retrieval techniques.

Details

Journal of Documentation, vol. 51 no. 3
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 January 2006

Vanda Broughton

Downloads
9491

Abstract

Purpose

The aim of this article is to estimate the impact of faceted classification and the faceted analytical method on the development of various information retrieval tools over the latter part of the twentieth and early twenty‐first centuries.

Design/methodology/approach

The article presents an examination of various subject access tools intended for retrieval of both print and digital materials to determine whether they exhibit features of faceted systems. Some attention is paid to use of the faceted approach as a means of structuring information on commercial web sites. The secondary and research literature is also surveyed for commentary on and evaluation of facet analysis as a basis for the building of vocabulary and conceptual tools.

Findings

The study finds that faceted systems are now very common, with a major increase in their use over the last 15 years. Most LIS subject indexing tools (classifications, subject heading lists and thesauri) now demonstrate features of facet analysis to a greater or lesser degree. A faceted approach is frequently taken to the presentation of product information on commercial web sites, and there is an independent strand of theory and documentation related to this application. There is some significant research on semi‐automatic indexing and retrieval (query expansion and query formulation) using facet analytical techniques.

Originality/value

This article provides an overview of an important conceptual approach to information retrieval, and compares different understandings and applications of this methodology.

Details

Aslib Proceedings, vol. 58 no. 1/2
Type: Research Article
ISSN: 0001-253X

Article
Publication date: 11 November 2013

Nina Preschitschek, Helen Niemann, Jens Leker and Martin G. Moehrle

Downloads
2970

Abstract

Purpose

The convergence of industries exposes the involved firms to various challenges. In such a setting, a firm's response time becomes key to its future success. Hence, different approaches to anticipating convergence have been developed in the recent past. So far, especially IPC co-classification patent analyses have been successfully applied in different industry settings to anticipate convergence on a broader industry/technology level. Here, the aim is to develop a concept to anticipate convergence even in small samples, simultaneously providing more detailed information on its origin and direction.

Design/methodology/approach

The authors assigned 326 US-patents on phytosterols to four different technological fields and measured the semantic similarity of the patents from the different technological fields. Finally, they compared these results to those of an IPC co-classification analysis of the same patent sample.
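The abstract does not state which semantic similarity measure the authors used. As a rough illustration of the general idea only, the sketch below assumes a simple bag-of-words cosine similarity between two texts; the function names and the whitespace tokenization are hypothetical simplifications, not the authors' method.

```python
import math
from collections import Counter


def term_counts(text: str) -> Counter:
    """Count lower-cased, whitespace-separated terms (a deliberately naive tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the term-count vectors of two texts."""
    counts_a, counts_b = term_counts(text_a), term_counts(text_b)
    shared_terms = counts_a.keys() & counts_b.keys()
    dot_product = sum(counts_a[t] * counts_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text is treated as sharing nothing with any other text
    return dot_product / (norm_a * norm_b)
```

Under this assumption, tracking the average similarity between patents from two technological fields year by year would give a time series whose upward trend could be read as a sign of convergence.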

Findings

An increasing semantic similarity of food and pharmaceutical patents and personal care and pharmaceutical patents over time could be regarded as an indicator of convergence. The IPC co-classification analyses proved to be unsuitable for finding evidence for convergence here.

Originality/value

Semantic analyses provide the opportunity to analyze convergence processes in greater detail, even if only limited data are available. However, IPC co-classification analyses are still relevant in analyzing large amounts of data. The appropriateness of the semantic similarity approach requires verification, e.g. by applying it to other convergence settings.

Content available
Article
Publication date: 9 December 2019

Zhiwen Pan, Jiangtian Li, Yiqiang Chen, Jesus Pacheco, Lianjun Dai and Jun Zhang

Abstract

Purpose

The General Society Survey (GSS) is a kind of government-funded survey which aims at examining the socio-economic status, quality of life, and structure of contemporary society. The GSS data set is regarded as one of the authoritative sources for government and organization practitioners to make data-driven policies. The previous analytic approaches for the GSS data set were designed by combining expert knowledge and simple statistics. By utilizing emerging data mining algorithms, we propose a comprehensive data management and data mining approach for GSS data sets.

Design/methodology/approach

The approach is designed to operate in two phases: a data management phase, which improves the quality of GSS data by performing attribute pre-processing and filter-based attribute selection; and a data mining phase, which extracts hidden knowledge from the data set by performing data mining analysis including prediction analysis, classification analysis, association analysis and clustering analysis.
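The abstract does not say which filter the authors applied for attribute selection. As a minimal sketch of what a filter-based step can look like, the example below assumes a variance filter that drops attributes whose values barely change across respondents; the function name, the threshold and the toy data are all hypothetical.

```python
from statistics import pvariance


def select_by_variance(rows, attribute_names, threshold=0.0):
    """Keep attributes whose population variance exceeds the threshold.

    rows: list of equal-length numeric tuples, one per survey respondent.
    attribute_names: column names in the same order as the tuple positions.
    """
    columns = list(zip(*rows))  # transpose rows into per-attribute columns
    return [
        name
        for name, column in zip(attribute_names, columns)
        if pvariance(column) > threshold
    ]


# Toy data, for illustration only: the first attribute is constant and carries no signal.
toy_rows = [(1, 0, 5), (1, 1, 7), (1, 0, 9)]
toy_names = ["constant", "flag", "score"]
selected = select_by_variance(toy_rows, toy_names)
```

A filter of this kind scores each attribute independently of any downstream model, which is what distinguishes filter-based selection from wrapper-based selection.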

Findings

According to the experimental evaluation results, the paper has the following findings: performing attribute selection on the GSS data set can increase the performance of both classification analysis and clustering analysis; all the data mining analyses can effectively extract hidden knowledge from the GSS data set; and the knowledge generated by the different data mining analyses can, to some extent, cross-validate each other.

Originality/value

By leveraging the power of data mining techniques, the proposed approach can explore knowledge in a fine-grained manner with minimum human interference. Experiments on the Chinese General Social Survey data set are conducted to evaluate the performance of our approach.

Details

International Journal of Crowd Science, vol. 3 no. 3
Type: Research Article
ISSN: 2398-7294

Article
Publication date: 29 April 2014

Irene Roda, Marco Macchi, Luca Fumagalli and Pablo Viveros

Downloads
2165

Abstract

Purpose

Spare parts management plays a relevant role for equipment-intensive companies. An important step of this process is spare parts classification, which enables different items to be managed properly by taking into account their peculiarities. The purpose of this paper is to review the state of the art of classification of spare parts for manufacturing equipment by presenting an extensive literature analysis followed by an industrial assessment, with the final aim of identifying possible discrepancies.

Design/methodology/approach

Attention is paid not only to the literature on the subject but also to an on-field analysis, comprising an extensive survey and two in-depth exploratory case studies. The copper mining sector was chosen as representative of capital-intensive plants, where the cost of maintenance has a relevant weight in the total operating cost.

Findings

The paper highlights the status of the scientific literature on spare parts classification by showing the current situation in the real industrial world. The paper depicts the existing barriers that leave gaps between theory and real practice for the application of an effective multi-criteria spare parts classification.

Originality/value

The paper provides a review of the theory on spare parts classification methods and criteria, as well as empirical evidence, especially concerning the current situation and the barriers to an effective implementation in the industrial environment. The paper should be of interest to both academics and practitioners, since it provides original insights on the discrepancies between the scientific and industrial worlds.

Details

Journal of Manufacturing Technology Management, vol. 25 no. 4
Type: Research Article
ISSN: 1741-038X
