Search results

1 – 10 of 478
Article
Publication date: 1 November 2005

Mohamed Hammami, Youssef Chahir and Liming Chen


Abstract

Along with the ever-growing Web is the proliferation of objectionable content, such as sex, violence, racism, etc. We need efficient tools for classifying and filtering undesirable web content. In this paper, we investigate this problem through WebGuard, our automatic machine-learning-based pornographic website classification and filtering system. As the Internet becomes more and more visual and multimedia-rich, as exemplified by pornographic websites, we focus our attention on the use of skin-color-related visual content-based analysis, along with textual and structural content-based analysis, to improve pornographic website filtering. While most commercial filtering products on the marketplace are mainly based on textual content-based analysis, such as indicative keyword detection or manually collected blacklist checking, the originality of our work resides in the addition of structural and visual content-based analysis to the classical textual content-based analysis, along with several major data-mining techniques for learning and classifying. Experimented on a testbed of 400 websites, including 200 adult sites and 200 non-pornographic ones, WebGuard, our Web filtering engine, scored a 96.1% classification accuracy rate when only textual and structural content-based analysis was used, and a 97.4% classification accuracy rate when skin-color-related visual content-based analysis was added. Further experiments on a blacklist of 12,311 adult websites manually collected and classified by the French Ministry of Education showed that WebGuard scored an 87.82% classification accuracy rate when using only textual and structural content-based analysis, and a 95.62% classification accuracy rate when visual content-based analysis was added. The basic framework of WebGuard can apply to other website categorization problems which combine, as most of them do today, textual and visual content.

Details

International Journal of Web Information Systems, vol. 1 no. 4
Type: Research Article
ISSN: 1744-0084


Article
Publication date: 14 August 2017

Sudeep Thepade, Rik Das and Saurav Ghosh


Abstract

Purpose

Current practices in data classification and retrieval have experienced a surge in the use of multimedia content. Identification of desired information from huge image databases has been facing increased complexity in designing an efficient feature extraction process. Conventional approaches of image classification with text-based image annotation have faced assorted limitations due to erroneous interpretation of vocabulary and the huge time consumption of manual annotation. Content-based image recognition has emerged as an alternative to combat the aforesaid limitations. However, exploring the rich feature content in an image with a single technique is less likely to extract meaningful signatures than multi-technique feature extraction. Therefore, the purpose of this paper is to explore the possibilities of enhanced content-based image recognition by fusion of classification decisions obtained using diverse feature extraction techniques.

Design/methodology/approach

Three novel techniques of feature extraction are introduced in this paper and tested with four different classifiers individually. The four classifiers used for performance testing were the K-nearest neighbor (KNN) classifier, the RIDOR classifier, an artificial neural network classifier and a support vector machine classifier. Thereafter, the classification decisions obtained using the KNN classifier for the different feature extraction techniques were integrated by Z-score normalization and feature scaling to create a fusion-based framework of image recognition. This was followed by the introduction of a fusion-based retrieval model to validate the retrieval performance with classified queries. Earlier works on content-based image identification have adopted a fusion-based approach; however, to the best of the authors' knowledge, fusion-based query classification is addressed for the first time as a precursor of retrieval in this work.
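The Z-score fusion step described above can be sketched as follows; the score values and class counts are purely illustrative, not taken from the paper:

```python
import numpy as np

def zscore_fuse(score_matrix):
    """Fuse per-technique classification scores by Z-score
    normalization, then pick the class with the highest fused score.

    score_matrix: one row of class scores per feature extraction
    technique for a single query image.
    """
    scores = np.asarray(score_matrix, dtype=float)
    # Normalize each technique's row to zero mean / unit variance so
    # that no single technique dominates the fused decision.
    mu = scores.mean(axis=1, keepdims=True)
    sigma = scores.std(axis=1, keepdims=True)
    normalized = (scores - mu) / np.where(sigma == 0, 1.0, sigma)
    fused = normalized.sum(axis=0)  # pool evidence across techniques
    return int(np.argmax(fused))

# Three hypothetical techniques scoring four candidate classes:
votes = [[0.9, 0.2, 0.1, 0.3],
         [0.4, 0.8, 0.2, 0.1],
         [0.7, 0.3, 0.2, 0.2]]
print(zscore_fuse(votes))  # class 0 wins after fusion
```

The normalization puts each technique's scores on a common scale before summing, which is the point of Z-score fusion: raw scores from different feature extractors are not directly comparable.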

Findings

The proposed fusion techniques have successfully outclassed the state-of-the-art techniques in classification and retrieval performance. Four public data sets, namely, the Wang data set, the Oliva and Torralba (OT-scene) data set, the Corel data set and the Caltech data set, comprising 22,615 images in total, were used for the evaluation.

Originality/value

To the best of the authors’ knowledge, fusion-based query classification has been addressed for the first time as a precursor of retrieval in this work. The novel idea of exploring rich image features by fusion of multiple feature extraction techniques has also encouraged further research on dimensionality reduction of feature vectors for enhanced classification results.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 10 no. 3
Type: Research Article
ISSN: 1756-378X


Article
Publication date: 9 April 2019

Aabid Hussain, Sumeer Gul, Tariq Ahmad Shah and Sheikh Shueb


Abstract

Purpose

The purpose of this study is to explore the retrieval effectiveness of three image search engines (ISEs) – Google Images, Yahoo Image Search and Picsearch – in terms of their image retrieval capability. It is an effort to carry out a Cranfield experiment to find out how efficient the commercial giants of image search are and how efficient an image-specific search engine is.

Design/methodology/approach

The keyword search feature of the three ISEs – Google Images, Yahoo Image Search and Picsearch – was exploited to search with keyword captions of photos as query terms. The top ten images selected served as a testbed for the study, as images were searched in accordance with the features of the testbed. The features looked for included size (1200 × 800), image format (JPEG/JPG) and the rank of the original image retrieved by the ISEs under study. To gauge overall retrieval effectiveness against the set standards, only the first 50 result hits were checked. The retrieval efficiency of the selected ISEs was examined with respect to precision and relative recall.
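The two measures used above can be computed as in this short sketch; the hit counts are hypothetical, not the study's data:

```python
def precision(relevant_retrieved, total_retrieved):
    """Fraction of retrieved hits that are relevant."""
    return relevant_retrieved / total_retrieved

def relative_recall(relevant_retrieved, pooled_relevant):
    """One engine's relevant hits over the pooled relevant hits of all
    engines compared; absolute recall cannot be computed on the open
    web, so the pooled set stands in for the full relevant set."""
    return relevant_retrieved / pooled_relevant

# Hypothetical counts within the first 50 hits of one engine:
p = precision(32, 50)        # 32 of the 50 checked hits were relevant
r = relative_recall(32, 80)  # 80 relevant hits pooled across the three ISEs
print(p, r)  # 0.64 0.4
```

Relative recall is the standard workaround for search-engine evaluation, where the total number of relevant images on the Web is unknowable.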

Findings

Yahoo Image Search outscores Google Images and Picsearch in terms of both precision and relative recall. Regarding the other criteria – image size, image format and image rank in search results – Google Images is ahead of the others.

Research limitations/implications

The study only takes into consideration basic image search feature, i.e. text-based search.

Practical implications

The study implies that image search engines should focus on relevant descriptions. The study evaluated text-based image retrieval facilities and thereby offers users a choice of the best among the available ISEs.

Originality/value

The study provides an insight into the effectiveness of the three ISEs and is one of the few studies to gauge the retrieval effectiveness of ISEs. The study also produced key findings that are important for all ISE users, researchers and the Web image search industry; these findings will also prove useful for search engine companies seeking to improve their services.

Details

The Electronic Library, vol. 37 no. 1
Type: Research Article
ISSN: 0264-0473


Article
Publication date: 15 March 2018

Fatemeh Alyari and Nima Jafari Navimipour



Abstract

Purpose

This paper aims to identify, evaluate and integrate the findings of all relevant and high-quality individual studies addressing one or more research questions about recommender systems, and to perform a comprehensive study of empirical research on recommender systems, which has been divided into five main categories. To achieve this aim, the authors use the systematic literature review (SLR) as a powerful method to collect and critically analyze the research papers. The authors also discuss the selected recommender systems and their main techniques, as well as their benefits and drawbacks in general.

Design/methodology/approach

In this paper, the SLR method is utilized to identify, evaluate and integrate the findings of all relevant and high-quality individual studies addressing one or more research questions about recommender systems, and to perform a comprehensive study of empirical research on recommender systems, divided into five main categories. The authors also discuss recommender systems and their techniques in general, without a specific domain.

Findings

The major developments in the categories of recommender systems are reviewed, and new challenges are outlined. Furthermore, insights into the identification of open issues and guidelines for future research are provided. This paper also presents a systematic analysis of the recommender system literature from 2005 onwards. The authors identified 536 papers, which were reduced to 51 primary studies through the paper selection process.

Originality/value

This survey will directly support academics and practitioners in their understanding of developments in recommender systems and their techniques.

Details

Kybernetes, vol. 47 no. 5
Type: Research Article
ISSN: 0368-492X


Article
Publication date: 18 October 2018

Kalyan Nagaraj, Biplab Bhattacharjee, Amulyashree Sridhar and Sharvani GS


Abstract

Purpose

Phishing is one of the major threats affecting businesses worldwide in current times. Organizations and customers face the hazards arising out of phishing attacks because of anonymous access to vulnerable details. Such attacks often result in substantial financial losses. Thus, there is a need for effective intrusion detection techniques to identify and possibly nullify the effects of phishing. Classifying phishing and non-phishing web content is a critical task in information security protocols, and foolproof mechanisms have yet to be implemented in practice. The purpose of the current study is to present an ensemble machine learning model for classifying phishing websites.

Design/methodology/approach

A publicly available data set comprising 10,068 instances of phishing and legitimate websites was used to build the classifier model. Feature extraction was performed by deploying a group of methods, and the relevant features extracted were used to build the model. A twofold ensemble learner was developed by feeding the results from a random forest (RF) classifier into a feedforward neural network (NN). The performance of the ensemble classifier was validated using k-fold cross-validation, and the twofold ensemble learner was implemented as a user-friendly, interactive decision support system for classifying websites as phishing or legitimate.
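A minimal sketch of the twofold ensemble described above: random forest class probabilities become the inputs of a feedforward neural network, validated with k-fold cross-validation. The synthetic data set and all hyperparameters are illustrative stand-ins for the paper's 10,068-instance data set and tuned models:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data standing in for the
# phishing/legitimate website feature vectors.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Stage 1: the random forest produces per-class probability estimates.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
rf_probs = rf.predict_proba(X)

# Stage 2: a small feedforward network learns from the forest's
# outputs; its performance is checked by 5-fold cross-validation.
nn = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
scores = cross_val_score(nn, rf_probs, y, cv=5)
print(round(scores.mean(), 2))
```

Note that a production pipeline would fit the forest only on each training fold to avoid leakage; this sketch keeps the two stages separate purely to make the RF-to-NN hand-off visible.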

Findings

Experimental simulations were performed to assess and compare the performance of the ensemble classifiers. The statistical tests estimated that the RF_NN model gave superior performance, with an accuracy of 93.41 per cent and a minimal mean squared error of 0.000026.

Research limitations/implications

The research data set used in this study is publicly available and easy to analyze. Comparative analysis with other real-time data sets of recent origin must be performed to ensure generalization of the model against various security breaches. Different variants of phishing threats should also be detected, rather than focusing solely on phishing website detection.

Originality/value

To the best of the authors' knowledge, the twofold ensemble model has not been applied to the classification of phishing websites in any previous study.

Details

Journal of Systems and Information Technology, vol. 20 no. 3
Type: Research Article
ISSN: 1328-7265


Article
Publication date: 12 October 2021

Didem Ölçer and Tuğba Taşkaya Temizel


Abstract

Purpose

This paper proposes a framework that automatically assesses content coverage and information quality of health websites for end-users.

Design/methodology/approach

The study investigates the impact of textual and content-based features in predicting the quality of health-related texts. Content-based features were acquired using an evidence-based practice guideline in diabetes. A set of textual features inspired by professional health literacy guidelines and the features commonly used for assessing information quality in other domains were also used. In this study, 60 websites about type 2 diabetes were methodically selected for inclusion. Two general practitioners used DISCERN to assess each website in terms of its content coverage and quality.

Findings

The proposed framework's outputs were compared with the experts' evaluation scores. For coverage assessment, the best accuracy obtained was 88% with textual features and 92% with content-based features; when both types of features were used, the proposed framework achieved 90% accuracy. For information quality assessment, the content-based features resulted in a higher accuracy of 92%, against 88% obtained using the textual features.

Research limitations/implications

The experiments were conducted for websites about type 2 diabetes. As the whole process is costly and requires extensive expert human labelling, the study was carried out in a single domain. However, the methodology is generalizable to other health domains for which evidence-based practice guidelines are available.

Practical implications

Finding high-quality online health information is becoming increasingly difficult due to the high volume of information generated by non-experts in the area. Search engines fail to rank objective health websites higher within search results. The proposed framework can aid search engine and information platform developers in implementing better retrieval techniques, in turn facilitating end-users' access to high-quality health information.

Social implications

Erroneous, biased or partial health information is a serious problem for end-users who need access to objective information on their health problems. Such information may cause patients to stop their treatments provided by professionals. It might also have adverse financial implications by causing unnecessary expenditures on ineffective treatments. The ability to access high-quality health information has a positive effect on the health of both individuals and the whole society.

Originality/value

The paper demonstrates that automatic assessment of health websites is a domain-specific problem, which cannot be addressed with the general information quality assessment methodologies in the literature. Content coverage of health websites has also been studied in the health domain for the first time in the literature.

Details

Online Information Review, vol. 46 no. 4
Type: Research Article
ISSN: 1468-4527


Article
Publication date: 16 August 2019

Neda Tadi Bani and Shervan Fekri-Ershad


Abstract

Purpose

Large amounts of data are stored in image format, and image retrieval from bulk databases has become a hot research topic. An alternative method for efficient image retrieval is proposed based on a combination of texture and colour information. The main purpose of this paper is to propose a new content-based image retrieval approach using colour and texture information jointly in the spatial and transform domains.

Design/methodology/approach

Various methods have been proposed for image retrieval that try to extract image content based on texture, colour and shape. The proposed image retrieval method extracts global and local texture and colour information in two domains: spatial and frequency. The image is first filtered by a Gaussian filter, then co-occurrence matrices are built in different directions and statistical features are extracted; the purpose of this phase is to extract noise-resistant local textures. A quantised histogram is then produced to extract global colour information in the spatial domain. Gabor filter banks are also used to extract local texture features in the frequency domain. After concatenating the extracted features, retrieval is performed using the normalised Euclidean criterion.
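A rough NumPy-only sketch of two of the feature stages described above (the co-occurrence texture statistics and the quantised colour histogram), together with the normalised Euclidean distance used for matching; the Gaussian pre-filtering and Gabor filter bank are omitted for brevity, and all parameter values are illustrative:

```python
import numpy as np

def cooccurrence_features(gray, levels=8):
    """Contrast and energy from a horizontal co-occurrence matrix,
    a simple noise-tolerant local texture signature."""
    q = (gray.astype(float) * levels / 256).astype(int).clip(0, levels - 1)
    glcm = np.zeros((levels, levels))
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):  # horizontal pairs
        glcm[a, b] += 1
    glcm /= glcm.sum()
    i, j = np.indices(glcm.shape)
    contrast = ((i - j) ** 2 * glcm).sum()
    energy = (glcm ** 2).sum()
    return np.array([contrast, energy])

def color_histogram(rgb, bins=4):
    """Quantised global colour histogram in the spatial domain."""
    h, _ = np.histogramdd(rgb.reshape(-1, 3), bins=(bins,) * 3,
                          range=((0, 256),) * 3)
    return (h / h.sum()).ravel()

def normalized_euclidean(u, v):
    """Distance criterion applied after feature concatenation."""
    s = np.std(np.stack([u, v]), axis=0)
    return np.sqrt((((u - v) / np.where(s == 0, 1, s)) ** 2).sum())

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3)).astype(np.uint8)
gray = img.mean(axis=2)
feats = np.concatenate([cooccurrence_features(gray), color_histogram(img)])
print(feats.shape)  # (66,)
```

At query time the concatenated vector of a query image would be compared against each database vector with `normalized_euclidean`, and the smallest distances returned as the retrieval result.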

Findings

The performance of the proposed method is evaluated based on precision, recall and run-time measures on the Simplicity database and compared with many efficient methods in this field. The comparison results showed that the proposed method provides higher precision than many existing methods.

Originality/value

Rotation invariance, scale invariance and low sensitivity to noise are some advantages of the proposed method, which also provides higher precision than many existing methods. The run time of the proposed method is within the usual time frame of algorithms in this domain, which indicates that it can be used online.

Details

The Electronic Library, vol. 37 no. 4
Type: Research Article
ISSN: 0264-0473


Article
Publication date: 8 January 2024

Na Ye, Dingguo Yu, Xiaoyu Ma, Yijie Zhou and Yanqin Yan


Abstract

Purpose

Fake news in cyberspace has greatly interfered with national governance, economic development and cultural communication, which has increased the demand for fake news detection and intervention. At present, recognition methods based on news content all lose part of the information to varying degrees. This paper proposes a lightweight content-based detection method to achieve early identification of false information at low computational cost.

Design/methodology/approach

The authors propose a lightweight fake news detection framework for English text, including a new textual feature extraction method: English text and symbols are mapped to 0–255 using American Standard Code for Information Interchange (ASCII) codes, the completed sequence of numbers is treated as the pixel values of an image, and a computer vision model is used for detection. The authors also compare their framework with traditional word2vec, GloVe, bidirectional encoder representations from transformers (BERT) and other methods.
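The ASCII-to-pixels mapping described above can be sketched as follows; the image width and zero-padding scheme are assumptions for illustration, not details from the paper:

```python
import numpy as np

def text_to_image(text, width=32):
    """Map characters to 0-255 via their ASCII byte codes, pad with
    zeros to a full row, and reshape into a 2D 'pixel' array that a
    computer vision model can consume."""
    codes = np.frombuffer(text.encode("ascii", errors="replace"),
                          dtype=np.uint8)
    height = -(-len(codes) // width)  # ceiling division
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[:len(codes)] = codes
    return padded.reshape(height, width)

img = text_to_image("Breaking: this headline may or may not be fake news.")
print(img.shape, img[0, 0])  # (2, 32) 66 -- first pixel is the code for 'B'
```

Because the mapping is a byte-level identity, no vocabulary or pretrained embedding is needed, which is what keeps the framework lightweight relative to word2vec- or BERT-based pipelines.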

Findings

The authors conduct experiments on the lightweight neural networks GhostNet and ShuffleNet, and the experimental results show that the proposed framework outperforms the baseline in accuracy on both lightweight networks.

Originality/value

The authors' method does not rely on additional information beyond the text data and can perform the fake news detection task efficiently, with less computational resource consumption. In addition, the feature extraction method of this framework is relatively new and enlightening for text content-based classification and detection, and it can detect fake news in time at the early stage of propagation.

Details

Online Information Review, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1468-4527


Article
Publication date: 23 August 2019

Shenlong Wang, Kaixin Han and Jiafeng Jin


Abstract

Purpose

In the past few decades, content-based image retrieval (CBIR), which focuses on the exploration of image feature extraction methods, has been widely investigated. The term feature extraction is used in two cases: application-based feature expression and mathematical approaches for dimensionality reduction. Feature expression is a technique of describing the image color, texture and shape information with feature descriptors; thus, obtaining effective image feature expression is the key to extracting high-level semantic information. However, most previous studies of image feature extraction and expression methods in CBIR have not been systematic. This paper aims to introduce the basic image low-level feature expression techniques for color, texture and shape features that have been developed in recent years.

Design/methodology/approach

First, this review outlines the development process and expounds the principle of various image feature extraction methods, such as color, texture and shape feature expression. Second, some of the most commonly used image low-level expression algorithms are implemented, and the benefits and drawbacks are summarized. Third, the effectiveness of the global and local features in image retrieval, including some classical models and their illustrations provided by part of our experiment, are analyzed. Fourth, the sparse representation and similarity measurement methods are introduced, and the retrieval performance of statistical methods is evaluated and compared.

Findings

The core of this survey is to review the state of image low-level expression methods and to study the pros and cons of each method, their applicable occasions and certain implementation measures. This review notes that single-feature descriptions may lead to unsatisfactory image retrieval capability, as they exhibit significant singularity and face considerable limitations and challenges in CBIR.

Originality/value

A comprehensive review of the latest developments in image retrieval using low-level feature expression techniques is provided in this paper. This review not only introduces the major approaches for image low-level feature expression but also supplies a pertinent reference for those engaging in research regarding image feature extraction.

Article
Publication date: 13 May 2021

Chanattra Ammatmanee and Lu Gan


Abstract

Purpose

Due to the worldwide growth of digital image sharing and the maturity of the tourism industry, the vast and growing collections of digital images have become a challenge for those who use and/or manage these image data across tourism settings. To overcome the image indexing task with less labour cost and improve the image retrieval task with fewer human errors, the content-based image retrieval (CBIR) technique has been investigated for the tourism domain in particular. This paper aims to review the relevant literature in the field to understand these previous works and identify research gaps for future directions.

Design/methodology/approach

A systematic and comprehensive review of CBIR studies in tourism from the year 2010 to 2019, focussing on journal articles and conference proceedings in reputable online databases, is conducted by taking a comparative approach to critically analyse and address the trends of each fundamental element in these research experiments.

Findings

Based on the review of the literature, the trend of CBIR studies in tourism is to improve image representation and retrieval by advancing existing feature extraction techniques, contributing novel techniques in the feature extraction process through fine-tuning fusion features and improving the image query of CBIR systems. Co-authorship, the tourist attraction sector and fusion image features have been the focus. Nonetheless, the number of studies in other tourism sectors and available image databases could be further explored.

Originality/value

The fact that no academic review of CBIR studies in tourism exists makes this paper a novel contribution.

Details

The Electronic Library, vol. 39 no. 2
Type: Research Article
ISSN: 0264-0473

