Search results

1 – 10 of over 18000
Article
Publication date: 16 October 2009

A.C.M. Fong, S.C. Hui and P.Y. Lee

Abstract

Purpose

With the proliferation of objectionable world wide web (WWW or web) materials such as pornography and violence, there is an increasing need for effective web content filtering tools to protect unsuspecting users from the harmful effects of such materials. This paper aims to discuss this issue.

Design/methodology/approach

Using pornographic web materials as a case study, the authors have developed an effective filtering solution that uses machine intelligence to perform offline web page classification into allowed and disallowed web pages.

Findings

The results are stored in a database for fast online retrieval whenever access to a web page is requested.

Practical implications

The separation between offline classification and online filtering ensures fast blocking decisions are made from the user's viewpoint.
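The offline/online separation described above can be sketched in a few lines. Everything below is a hypothetical illustration, not the authors' actual system: the classifier is a trivial keyword check, and the URLs and verdict labels are invented.

```python
# Hypothetical sketch of offline classification plus online lookup.
# The keyword "classifier", URLs, and verdict labels are illustrative
# assumptions, not the paper's implementation.

# Offline phase: classify pages ahead of time and store the verdicts.
def offline_classify(pages):
    """Toy classifier: flags a page if it contains a blocked term."""
    blocked_terms = {"porn", "violence"}  # illustrative dictionary
    verdicts = {}
    for url, text in pages.items():
        words = set(text.lower().split())
        verdicts[url] = "disallowed" if words & blocked_terms else "allowed"
    return verdicts

# Online phase: a page request becomes a single dictionary lookup,
# so the blocking decision is immediate from the user's viewpoint.
def online_filter(verdicts, url):
    return verdicts.get(url, "unclassified")

pages = {
    "http://example.com/news": "daily news and weather",
    "http://example.com/bad": "explicit violence here",
}
verdicts = offline_classify(pages)
print(online_filter(verdicts, "http://example.com/news"))  # allowed
print(online_filter(verdicts, "http://example.com/bad"))   # disallowed
```

The expensive classification work happens ahead of time; the per-request cost is a constant-time table lookup, which is what makes blocking decisions fast from the user's viewpoint.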

Originality/value

There is an urgent and continued need for effective measures against the proliferation of objectionable materials on the web. In this paper, the authors describe a possible solution in the form of a complete working system. Future research will focus on adding appropriate modules to tackle other types of objectionable materials than the type described. The basic framework, however, should be applicable to a wide range of materials.

Details

Kybernetes, vol. 38 no. 9
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 1 November 2005

Mohamed Hammami, Youssef Chahir and Liming Chen

Abstract

Along with the ever-growing Web is the proliferation of objectionable content, such as sex, violence, racism, etc. Efficient tools are needed for classifying and filtering undesirable web content. In this paper, we investigate this problem through WebGuard, our automatic machine learning based pornographic website classification and filtering system. Because the Internet is increasingly visual and multimedia-rich, as exemplified by pornographic websites, we focus our attention on the use of skin color related visual content-based analysis, along with textual and structural content-based analysis, to improve pornographic website filtering. While most commercial filtering products on the marketplace are based mainly on textual content-based analysis, such as detecting indicative keywords or checking manually collected black lists, the originality of our work resides in the addition of structural and visual content-based analysis to the classical textual content-based analysis, along with several major data mining techniques for learning and classifying. Experimented on a testbed of 400 websites, including 200 adult sites and 200 non-pornographic ones, WebGuard, our web filtering engine, scored a 96.1 per cent classification accuracy rate when only textual and structural content-based analysis was used, and a 97.4 per cent rate when skin color related visual content-based analysis was added. Further experiments on a black list of 12,311 adult websites manually collected and classified by the French Ministry of Education showed that WebGuard scored an 87.82 per cent classification accuracy rate using only textual and structural content-based analysis, and 95.62 per cent when the visual content-based analysis was added. The basic framework of WebGuard can be applied to other website categorization problems which combine, as most do today, textual and visual content.
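The idea of blending a skin-color visual cue with a textual score can be illustrated with a toy sketch. Note the RGB skin rule below is a common generic heuristic, and the weights and threshold are invented for illustration; none of this is the paper's actual model.

```python
# Illustrative blend of a textual/structural score with a skin-color
# pixel ratio, loosely in the spirit of the approach described above.
# The RGB rule, weights, and threshold are assumptions.

def skin_pixel(r, g, b):
    """A widely used simple RGB heuristic for skin-like pixels."""
    return r > 95 and g > 40 and b > 20 and r > g and r > b and (r - min(g, b)) > 15

def skin_ratio(pixels):
    """Fraction of pixels that look skin-like."""
    skin = sum(1 for p in pixels if skin_pixel(*p))
    return skin / len(pixels) if pixels else 0.0

def classify(text_score, pixels, visual_weight=0.4):
    """Blend a textual/structural score in [0, 1] with the skin-color ratio."""
    score = (1 - visual_weight) * text_score + visual_weight * skin_ratio(pixels)
    return "adult" if score > 0.5 else "non-adult"

mostly_skin = [(220, 160, 130)] * 9 + [(10, 10, 10)]
print(classify(0.6, mostly_skin))               # adult
print(classify(0.1, [(10, 10, 10)] * 10))       # non-adult
```

The point mirrors the reported results: a borderline textual score can be pushed over the decision boundary by visual evidence, which is why adding the visual channel raises accuracy.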

Details

International Journal of Web Information Systems, vol. 1 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 16 October 2009

Koraljka Golub and Marianne Lykke

Abstract

Purpose

The purpose of this study is twofold: to investigate whether it is meaningful to use the Engineering Index (Ei) classification scheme for browsing, and then, if proven useful, to investigate the performance of an automated classification algorithm based on the Ei classification scheme.

Design/methodology/approach

A user study was conducted in which users solved four controlled searching tasks. The users browsed the Ei classification scheme in order to examine the suitability of the classification systems for browsing. The classification algorithm was evaluated by the users who judged the correctness of the automatically assigned classes.

Findings

The study showed that the Ei classification scheme is suited for browsing. Automatically assigned classes were on average partly correct, with some classes working better than others. Browsing success was shown to be correlated with, and dependent on, classification correctness.

Research limitations/implications

Further research should address problems of disparate evaluations of one and the same web page. Additional reasons behind browsing failures in the Ei classification scheme also need further investigation.

Practical implications

Several improvements for browsing were identified: describing class captions and/or listing their subclasses from the start; allowing searches for words from class captions with synonym support (easily provided for Ei, since the classes are mapped to thesaurus terms); and, when searching class captions, returning the hierarchical tree expanded around the class whose caption contains the search term. The need to improve the classification schemes themselves was also indicated.

Originality/value

A user‐based evaluation of automated subject classification in the context of browsing has not been conducted before; hence the study also presents new findings concerning methodology.

Details

Journal of Documentation, vol. 65 no. 6
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 April 2004

Ben Choi and Xiaogang Peng

Abstract

Automatic classification of Web pages is an effective way to organise the vast amount of information and to assist in retrieving relevant information from the Internet. Although many automatic classification systems have been proposed, most of them ignore the conflict between the fixed number of categories and the growing number of Web pages being added into the systems. They also require searching through all existing categories to make any classification. This article proposes a dynamic and hierarchical classification system that is capable of adding new categories as required, organising the Web pages into a tree structure, and classifying Web pages by searching through only one path of the tree. The proposed single‐path search technique reduces the search complexity from θ(n) to θ(log(n)). Test results show that the system improves the accuracy of classification by 6 percent in comparison to related systems. The dynamic‐category expansion technique also achieves satisfactory results for adding new categories into the system as required.
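The single-path idea above can be sketched as a greedy descent of a category tree: at each node only the best-matching child is followed, so classification visits one root-to-leaf path rather than every category. The similarity measure and the toy category tree below are illustrative assumptions, not the article's actual algorithm.

```python
# Hypothetical sketch of single-path hierarchical classification.
# The keyword-overlap similarity and the category tree are invented
# for illustration.

def similarity(page_words, keywords):
    """Overlap between the page's words and a category's keyword set."""
    return len(page_words & keywords)

def classify(page_words, node):
    """Greedily descend the tree, always following the best-matching child."""
    while node.get("children"):
        node = max(node["children"],
                   key=lambda c: similarity(page_words, c["keywords"]))
    return node["name"]

tree = {
    "name": "root", "keywords": set(),
    "children": [
        {"name": "science", "keywords": {"physics", "energy"},
         "children": [
             {"name": "astronomy", "keywords": {"star", "galaxy"}, "children": []},
             {"name": "chemistry", "keywords": {"molecule", "acid"}, "children": []},
         ]},
        {"name": "sports", "keywords": {"match", "score"}, "children": []},
    ],
}
print(classify({"physics", "galaxy", "star"}, tree))  # astronomy
```

With a roughly balanced tree of n leaf categories, the descent touches one node per level, which is where the drop from linear to logarithmic search cost comes from.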

Details

Online Information Review, vol. 28 no. 2
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 1 December 2003

Kristin Eschenfelder

Abstract

This paper takes a social shaping of technology approach to identify and explain sources of conflict in the design or enhancement of corporate Web sites. Data from a multi‐case field study show how Web site classification schemes embedded in Web site design elements created intra‐organizational conflicts because the schemes could not equally accommodate different sub‐units' customer requirements. Interview data demonstrate Web managers' perceptions that Web classification schemes privileged certain sets of customer needs, and Web managers' actions to shape the design of classification schemes to satisfy their perceived customer needs. Data analysis identified three design elements of Web sites associated with sub‐unit conflict: classification categories, templates and tool bars, and database entities and attributes.

Details

Information Technology & People, vol. 16 no. 4
Type: Research Article
ISSN: 0959-3845

Article
Publication date: 1 October 2003

Mike Thelwall, Liwen Vaughan, Viv Cothey, Xuemei Li and Alastair G. Smith

Abstract

The use of the Web by academic researchers is discipline‐dependent and highly variable. It is increasingly central for sharing information, disseminating results and publicising research projects. This pilot study seeks to identify the subjects that have the most impact on the Web, and to look for national differences in online subject visibility. The highest impact sites were from computing, but there were major national differences in the impact of engineering and technology sites. Another difference was that Taiwan had more high impact non‐academic sites hosted by universities. As a pilot study, the classification process itself was also investigated and the problems of applying subject classification to academic Web sites discussed. The study draws out a number of issues in this regard that have no simple solutions, and points to the need to interpret the results with caution.

Details

Online Information Review, vol. 27 no. 5
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 19 October 2010

Ashish Kathuria, Bernard J. Jansen, Carolyn Hafernik and Amanda Spink

Abstract

Purpose

Web search engines are frequently used by people to locate information on the Internet. However, not all queries have an informational goal. Instead of information, some people may be looking for specific web sites or may wish to conduct transactions with web services. This paper aims to focus on automatically classifying the different user intents behind web queries.

Design/methodology/approach

For the research reported in this paper, 130,000 web search engine queries are categorized as informational, navigational, or transactional using a k‐means clustering approach based on a variety of query traits.
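The clustering step can be sketched with a miniature k-means over simple query traits. The two traits (term count and a URL-likeness flag), the seed centroids, k = 2, and the intent labels below are illustrative assumptions; the study used a richer set of query traits and eight clusters.

```python
# Toy sketch of clustering queries by traits and reading intent off the
# cluster, loosely in the spirit of the approach above. Traits, seeds,
# and labels are invented for illustration.

def traits(query):
    """Two toy traits: term count, and whether the query looks like a URL."""
    return (float(len(query.split())),
            1.0 if ("www" in query or ".com" in query) else 0.0)

def kmeans(points, seeds, iters=10):
    """Minimal k-means with fixed seeds for reproducibility."""
    centroids = list(seeds)
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[i].append(p)
        for i, c in enumerate(clusters):
            if c:  # keep the old centroid if a cluster goes empty
                centroids[i] = tuple(sum(xs) / len(c) for xs in zip(*c))
    return centroids

queries = ["history of the roman empire", "cheap flights buy tickets",
           "www.example.com", "symptoms of flu causes treatment",
           "facebook.com login"]
centroids = kmeans([traits(q) for q in queries], seeds=[(4.0, 0.0), (1.0, 1.0)])

def intent(query):
    """Label a new query by its nearest cluster centroid."""
    p = traits(query)
    i = min(range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
    return ["informational", "navigational"][i]

print(intent("www.wikipedia.org"))         # navigational
print(intent("causes of climate change"))  # informational
```

Because assigning a new query to its nearest centroid is cheap, this style of classifier lends itself to the real-time use the paper discusses.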

Findings

The research findings show that more than 75 percent of web queries are informational in nature, with about 12 percent each being navigational and transactional. Results also show that web queries fall into eight clusters: six primarily informational, and one each primarily transactional and navigational.

Research limitations/implications

This study provides an important contribution to web search literature because it provides information about the goals of searchers and a method for automatically classifying the intents of the user queries. Automatic classification of user intent can lead to improved web search engines by tailoring results to specific user needs.

Practical implications

The paper discusses how web search engines can use automatically classified user queries to provide more targeted and relevant results in web searching by implementing a real time classification method as presented in this research.

Originality/value

This research investigates a new application of a method for automatically classifying the intent of user queries. There has been limited research to date on automatically classifying the user intent of web queries, even though the pay‐off for web search engines can be quite beneficial.

Details

Internet Research, vol. 20 no. 5
Type: Research Article
ISSN: 1066-2243

Article
Publication date: 1 August 2005

Ming Yin Ming, Dion Hoe‐lian Goh, Ee‐Peng Lim and Aixin Sun

Abstract

A web site usually contains a large number of concept entities, each consisting of one or more web pages connected by hyperlinks. In order to discover these concept entities for more expressive web site queries and other applications, the web unit mining problem has been proposed. Web unit mining aims to determine the web pages that constitute a concept entity and to classify concept entities into categories. Nevertheless, the performance of an existing web unit mining algorithm, iWUM, suffers because it may create more than one web unit (incomplete web units) from a single concept entity. This paper presents two methods to solve this problem. The first introduces a more effective web fragment construction method so as to reduce later classification errors. The second incorporates site‐specific knowledge to discover and handle incomplete web units. Experiments show that incomplete web units can be removed and overall accuracy significantly improved, especially on the precision and F1 measures.

Details

International Journal of Web Information Systems, vol. 1 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 21 November 2008

Mohamed Hammami, Radhouane Guermazi and Abdelmajid Ben Hamadou

Abstract

Purpose

The growth of the web and the increasing number of documents electronically available have been paralleled by the emergence of harmful web page content such as pornography, violence, racism, etc. This emergence has made it necessary to provide filtering systems designed to secure internet access. Most of these systems deal mainly with adult content, focusing on blocking pornography and marginalizing violence. The purpose of this paper is to propose a violent web content detection and filtering system which uses textual and structural content-based analysis.

Design/methodology/approach

The violent web content detection and filtering system uses textual and structural content‐based analysis based on a violent keyword dictionary. The paper focuses on the keyword dictionary preparation, and presents a comparative study of different data mining techniques to block violent content web pages.
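Dictionary-based textual analysis of the kind described can be sketched as weighted keyword scoring against a threshold. The dictionary entries, weights, and threshold below are invented for illustration; the paper's contribution is precisely in building such a dictionary and comparing the classifiers trained on it.

```python
# Toy sketch of scoring a page against a violent-keyword dictionary.
# The keywords, weights, and threshold are illustrative assumptions.

VIOLENT_KEYWORDS = {"gore": 3, "massacre": 3, "weapon": 2, "fight": 1}

def violence_score(text):
    """Sum the dictionary weights of every matching word in the text."""
    words = text.lower().split()
    return sum(VIOLENT_KEYWORDS.get(w, 0) for w in words)

def is_violent(text, threshold=3):
    """Flag a page whose weighted keyword score reaches the threshold."""
    return violence_score(text) >= threshold

print(is_violent("graphic massacre footage with gore"))    # True
print(is_violent("a fair fight in a boxing documentary"))  # False
```

In the paper's setting, such scores (together with structural features) would feed a learned classifier such as a decision tree rather than a fixed threshold.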

Findings

The solution presented in this paper showed its effectiveness by scoring an 89 per cent classification accuracy rate on its test data set.

Research limitations/implications

Many future work directions can be considered. This paper analyzed only the web page itself, and an additional analysis of the visual content is one direction for future work. Future research is also underway to develop effective filtering tools for other types of harmful web pages, such as racist content.

Originality/value

The paper's major contributions are, first, the study and comparison of several decision tree building algorithms to build a violent web classifier based on textual and structural content-based analysis for improving web filtering; and second, easing laborious dictionary building by automatically finding discriminative indicative keywords.

Details

International Journal of Web Information Systems, vol. 4 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 14 August 2007

Yanbo Ru and Ellis Horowitz

Abstract

Purpose

Most e‐commerce web sites use HTML forms for user authentication, new user registration, newsletter subscription, and searching for products and services. The purpose of this paper is to present a method for automated classification of HTML forms, which is important for search engine applications, e.g. Yahoo Shopping and Google's Froogle, as they can be used to improve the quality of the index and accuracy of search results.

Design/methodology/approach

Describes a technique for classifying HTML forms based on their features. Develops algorithms for automatic feature generation of HTML forms and a neural network to classify them.
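Feature generation for an HTML form can be sketched as counting input types plus a few lexical cues. The feature names, the fixed linear rule standing in for the trained neural network, and the "search form" target class below are all assumptions made for illustration.

```python
# Hypothetical sketch of HTML form feature generation, with a tiny
# hand-weighted linear rule standing in for the trained neural network.
# Features, weights, and the target class are invented.

import re

def form_features(form_html):
    """Counts of input types plus a lexical cue, as a feature dict."""
    input_types = re.findall(r'<input[^>]*type="([^"]+)"', form_html)
    text = form_html.lower()
    return {
        "n_text": input_types.count("text"),
        "n_password": input_types.count("password"),
        "n_submit": input_types.count("submit"),
        "has_search_word": 1 if "search" in text else 0,
    }

def is_search_form(form_html):
    """Stand-in scorer: in the paper, a neural network plays this role."""
    f = form_features(form_html)
    score = 2 * f["has_search_word"] + f["n_text"] - 3 * f["n_password"]
    return score >= 2

search_form = ('<form action="/search">'
               '<input type="text" name="q"><input type="submit"></form>')
login_form = ('<form action="/login"><input type="text">'
              '<input type="password"><input type="submit"></form>')
print(is_search_form(search_form))  # True
print(is_search_form(login_form))   # False
```

The interesting part of the paper is that the weights are learned from labeled forms rather than hand-set, which is what lets the same feature set distinguish login, registration, subscription, and search forms.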

Findings

The authors tested their classifier on an e‐commerce data set and a randomly retrieved data set and achieved accuracy of 94.7 and 93.9 per cent respectively. Experimental results show that the classifier is effective and efficient on both test beds, suggesting that it is a promising general purpose method.

Originality/value

The paper is of value to those involved with information management and e‐commerce.

Details

Online Information Review, vol. 31 no. 4
Type: Research Article
ISSN: 1468-4527
