Search results

1 – 10 of over 1000
Article
Publication date: 1 November 2005

Mohamed Hammami, Youssef Chahir and Liming Chen

Abstract

Along with the ever-growing Web comes a proliferation of objectionable content, such as sex, violence and racism, and efficient tools are needed for classifying and filtering undesirable web content. In this paper, we investigate this problem through WebGuard, our automatic, machine-learning-based system for classifying and filtering pornographic websites. As the Internet becomes ever more visual and multimedia-rich, as exemplified by pornographic websites, we focus our attention on skin-color-related visual content-based analysis, combined with textual and structural content-based analysis, to improve pornographic website filtering. While most commercial filtering products on the marketplace rely mainly on textual content-based analysis, such as detecting indicative keywords or checking manually collected blacklists, the originality of our work resides in adding structural and visual content-based analysis to the classical textual content-based analysis, together with several major data mining techniques for learning and classification. Evaluated on a testbed of 400 websites, comprising 200 adult sites and 200 non-pornographic ones, WebGuard, our web filtering engine, scored a 96.1% classification accuracy rate when only textual and structural content-based analysis was used, and a 97.4% rate when skin-color-related visual content-based analysis was added. Further experiments on a blacklist of 12,311 adult websites manually collected and classified by the French Ministry of Education showed that WebGuard scored an 87.82% classification accuracy rate when using only textual and structural content-based analysis, and a 95.62% rate when visual content-based analysis was added. The basic framework of WebGuard can be applied to other website categorization problems that combine, as most websites do today, textual and visual content.
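The skin-color analysis described above lends itself to a compact illustration. The following Python sketch is not the authors' WebGuard implementation; it computes a skin-pixel ratio using a widely cited RGB heuristic from the skin-detection literature (the thresholds are borrowed assumptions, not the paper's model), the kind of visual feature that could be fed to a classifier alongside textual and structural features.

```python
import numpy as np

def skin_pixel_ratio(rgb_image: np.ndarray) -> float:
    """Fraction of pixels matching a classic RGB skin-color rule.

    `rgb_image` is an (H, W, 3) uint8 array. The thresholds below follow
    a common heuristic from the skin-detection literature; WebGuard's
    actual color model is described in the article itself.
    """
    r = rgb_image[..., 0].astype(int)
    g = rgb_image[..., 1].astype(int)
    b = rgb_image[..., 2].astype(int)
    spread = rgb_image.max(axis=-1).astype(int) - rgb_image.min(axis=-1).astype(int)
    mask = (
        (r > 95) & (g > 40) & (b > 20)   # bright enough in each channel
        & (spread > 15)                  # not gray: channels must differ
        & (np.abs(r - g) > 15) & (r > g) & (r > b)
    )
    return float(mask.mean())
```

A page-level feature might then be, for example, the mean skin ratio over all images on a page, appended to the textual and structural feature vector before training.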

Details

International Journal of Web Information Systems, vol. 1 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 21 November 2008

Mohamed Hammami, Radhouane Guermazi and Abdelmajid Ben Hamadou

Abstract

Purpose

The growth of the web and the increasing number of documents available electronically have been paralleled by the emergence of harmful web page content such as pornography, violence, racism, etc. This emergence has made it necessary to provide filtering systems designed to secure internet access. Most of these systems deal mainly with adult content and focus on blocking pornography, marginalizing violence. The purpose of this paper is to propose a violent web content detection and filtering system which uses textual and structural content-based analysis.

Design/methodology/approach

The violent web content detection and filtering system uses textual and structural content-based analysis built on a violent keyword dictionary. The paper focuses on the preparation of the keyword dictionary and presents a comparative study of different data mining techniques for blocking violent web pages.
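As a hedged sketch of the kind of pipeline such a comparative study evaluates (not the paper's actual implementation), the Python fragment below counts occurrences of dictionary terms per page and fits a decision tree. The dictionary entries, tree depth and function name are illustrative assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Placeholder dictionary: the paper builds its dictionary automatically,
# so these terms are purely illustrative.
VIOLENT_TERMS = ["gun", "murder", "torture", "massacre", "gore"]

def train_violence_classifier(pages, labels):
    """Fit a decision tree on per-page frequencies of dictionary terms.

    `pages` is a list of page texts; `labels` marks violent pages with 1.
    """
    vectorizer = CountVectorizer(vocabulary=VIOLENT_TERMS)
    features = vectorizer.fit_transform(pages)
    tree = DecisionTreeClassifier(max_depth=5, random_state=0)
    print("cross-validated accuracy:", cross_val_score(tree, features, labels, cv=5).mean())
    return tree.fit(features, labels)
```

Swapping DecisionTreeClassifier for other estimators reproduces, in miniature, the kind of comparison between data mining techniques the paper reports.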

Findings

The solution presented in this paper showed its effectiveness by scoring an 89 per cent classification accuracy rate on its test data set.

Research limitations/implications

Several directions for future work can be considered. This paper analyzed only the textual and structural content of web pages, so an additional analysis of the visual content is one such direction. Research is also underway to develop effective filtering tools for other types of harmful web pages, such as racist content.

Originality/value

The paper's major contributions are, first, the study and comparison of several decision tree building algorithms for constructing a violent web content classifier based on textual and structural content-based analysis, thereby improving web filtering; and second, addressing the laborious task of dictionary building by automatically finding discriminative indicative keywords.

Details

International Journal of Web Information Systems, vol. 4 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 1 March 2003

P.Y. Lee, S.C. Hui and A.C.M. Fong

Abstract

With the proliferation of objectionable materials (e.g. pornography, violence, drugs, etc.) available on the WWW, there is an urgent need for effective countermeasures to protect children and other unsuspecting users from exposure to such materials. Using pornographic Web pages as a case study, this paper presents a thorough analysis of the distinguishing features of such Web pages. The objective of the study is to gain knowledge on the structure and characteristics of typical pornographic Web pages so that effective Web filtering techniques can be developed to filter them automatically. In this paper, we first survey the existing techniques for Web content filtering. A study on the characteristics of pornographic Web pages is then presented. The implementation of a Web content filtering system that combines the use of an artificial neural network and the knowledge gained in the analysis of pornographic Web pages is also given.
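To make the combination of page analysis and an artificial neural network concrete, here is a minimal Python sketch under the assumption (not stated in the abstract) that each page has already been reduced to a numeric feature vector, e.g. indicative-keyword counts and image statistics of the kind such an analysis yields. It illustrates the general technique, not the paper's system.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_page_filter(features: np.ndarray, labels: np.ndarray) -> MLPClassifier:
    """Fit a small feed-forward network to flag objectionable pages.

    `features` has one row per page (hypothetical hand-crafted values
    such as keyword counts or image ratios); `labels` is 1 for
    objectionable pages and 0 otherwise.
    """
    net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
    return net.fit(features, labels)

# Usage sketch with toy data: three features per page.
X = np.array([[12, 30, 0.8], [0, 4, 0.1], [9, 25, 0.7], [1, 3, 0.2]])
y = np.array([1, 0, 1, 0])
print(train_page_filter(X, y).predict([[10, 28, 0.75]]))
```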

Details

Internet Research, vol. 13 no. 1
Type: Research Article
ISSN: 1066-2243

Abstract

Purpose

Ubiquitous web applications (UWA) are a new type of web application that is accessed in various contexts, i.e. through different devices, by users with various interests, at any time and from any place around the globe. For such full-fledged, complex software systems, a methodologically sound engineering approach in terms of model-driven engineering (MDE) is crucial. Several modeling approaches have already been proposed that capture the ubiquitous nature of web applications, each of them having different origins, pursuing different goals and providing a pantheon of concepts. This paper aims to give an in-depth comparison of seven modeling approaches supporting the development of UWAs.

Design/methodology/approach

The comparison is conducted by applying a detailed set of evaluation criteria and by demonstrating each approach's applicability on the basis of an exemplary tourism web application. In particular, five commonly found ubiquitous scenarios are investigated, providing initial insight into the modeling concepts of each approach and facilitating their comparability.

Findings

The results gained indicate that many modeling approaches lack a proper MDE foundation in terms of meta‐models and tool support. The proposed modeling mechanisms for ubiquity are often limited, since they neither cover all relevant context factors in an explicit, self‐contained, and extensible way, nor allow for a wide spectrum of extensible adaptation operations. The provided modeling concepts frequently do not allow dealing with all different parts of a web application in terms of its content, hypertext, and presentation levels as well as their structural and behavioral features. Finally, current modeling approaches do not reflect the crosscutting nature of ubiquity but rather intermingle context and adaptation issues with the core parts of a web application, thus hampering maintainability and extensibility.

Originality/value

Different from other surveys in the area of modeling web applications, this paper specifically considers modeling concepts for their ubiquitous nature, together with a comprehensive investigation of available support for model-driven development (MDD), using a well-defined and fine-grained catalogue of more than 30 evaluation criteria.

Details

International Journal of Web Information Systems, vol. 4 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 17 August 2015

Takahiro Komamizu, Toshiyuki Amagasa and Hiroyuki Kitagawa

Abstract

Purpose

The purpose of this paper is to extract appropriate terms that summarize the current search results in terms of the contents of textual facets. Faceted search on XML data helps users find necessary information in XML data by presenting attribute–content pairs (called facet-value pairs) describing the current search results. However, if most of the contents of a facet are long texts on average (such facets are called textual facets), it is not easy to get an overview of the current results.

Design/methodology/approach

The proposed approach is based upon subsumption relationships between terms in the contents of a facet. A subsumption relationship can be extracted from co-occurrences of terms across a number of documents (in this paper, each content of a facet is treated as a document). Subsumption relationships compose hierarchies, and the authors utilize these hierarchies to extract facet-values from textual facets. In the faceted search context, users have ambiguous search demands and therefore expect broader terms; thus, high-level terms in the hierarchies are extracted as facet-values.
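A common formulation of term subsumption from co-occurrence (the abstract does not give the paper's exact rule, so the 0.8 threshold and the whitespace tokenizer below are assumptions) is: term x subsumes term y when P(x | y) >= 0.8 while P(y | x) < 0.8. A minimal Python sketch:

```python
from collections import defaultdict
from itertools import combinations

def subsumption_pairs(documents, threshold=0.8):
    """Return (broader, narrower) term pairs from document co-occurrence.

    Rule of thumb from the subsumption literature: x subsumes y when
    P(x | y) >= threshold and P(y | x) < threshold. Each facet content
    is one document; naive whitespace tokenization for illustration.
    """
    doc_freq = defaultdict(int)   # term -> number of documents containing it
    co_freq = defaultdict(int)    # sorted term pair -> co-occurrence count
    for doc in documents:
        terms = set(doc.split())
        for term in terms:
            doc_freq[term] += 1
        for a, b in combinations(sorted(terms), 2):
            co_freq[(a, b)] += 1
    pairs = []
    for (a, b), n_ab in co_freq.items():
        p_a_given_b = n_ab / doc_freq[b]
        p_b_given_a = n_ab / doc_freq[a]
        if p_a_given_b >= threshold and p_b_given_a < threshold:
            pairs.append((a, b))  # a is the broader term
        elif p_b_given_a >= threshold and p_a_given_b < threshold:
            pairs.append((b, a))
    return pairs

# "python" appears wherever "django" does, but not vice versa, so
# "python" comes out as the broader (higher-level) term.
print(subsumption_pairs(["python django", "python flask", "python django"]))
```

High-level terms extracted this way sit near the roots of the resulting hierarchies, which is what makes them suitable facet-values for ambiguous search demands.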

Findings

The main findings of this paper are that the extracted terms improve users' search experiences, especially when the search demands are ambiguous.

Originality/value

One original aspect of this paper is the way it utilizes the textual contents of XML data to improve users' search experiences in faceted search. Another is the design of the tasks used to evaluate exploratory search approaches such as faceted search.

Details

International Journal of Web Information Systems, vol. 11 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 24 October 2008

Nicolas Virtsonis and Sally Harridge‐March

Abstract

Purpose

The purpose of this paper is to examine the way in which brand positioning elements are manifested in the business‐to‐business (B2B) online environment.

Design/methodology/approach

The UK print industry is used as the setting for investigating the web site elements used to communicate positioning, through a content analysis of the corporate web pages of 30 UK print suppliers.

Findings

A framework is developed to show how web site communications are manifested in the online B2B environment.

Research limitations/implications

Because the research vehicle is a sample of websites from only one industry, the findings may not be transferable to other industries, or even to the whole industry studied. However, the model is a useful framework for helping managers to plan their online communications.

Practical implications

The paper concludes by giving recommendations about how the framework can be used by practitioners in order to improve the linkage between communications messages and the means for transferring these messages.

Originality/value

This is a novel approach to examining branding elements in the online environment. Comparatively little literature exists which examines branding in the online B2B environment.

Details

Marketing Intelligence & Planning, vol. 26 no. 7
Type: Research Article
ISSN: 0263-4503

Article
Publication date: 8 August 2008

Giovanni Tummarello, Christian Morbidoni, Paolo Puliti and Francesco Piazza

Abstract

Purpose

The purpose of this paper is to investigate and prove the feasibility of a semantic web (SW) based approach to textual encoding. It aims to discuss benefits and novel possibilities with respect to traditional XML‐based approaches.

Design/methodology/approach

The markup process can be seen as a knowledge representation task in which elements such as words, sentences and pages are instances of conceptual classes forming a semantic network. A Web Ontology Language (OWL) ontology for textual encoding has been developed, capturing structural and grammatical aspects. Different approaches and tools for querying the encoded text are investigated.
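To illustrate the general idea (with an invented namespace and invented class/property names, since the paper's actual ontology vocabulary is not given in the abstract), the following Python sketch uses rdflib to place one word instance in two overlapping hierarchies, grammatical and physical, and answers a cross-hierarchy query in SPARQL:

```python
from rdflib import Graph, Literal, Namespace, RDF

# Hypothetical vocabulary; the paper defines its own OWL ontology.
EX = Namespace("http://example.org/text-encoding#")

g = Graph()
g.bind("ex", EX)

# One word instance, a member of two independent hierarchies at once:
# overlapping structures that are awkward to nest in a single XML tree.
word = EX["word_42"]
g.add((word, RDF.type, EX.Word))
g.add((word, EX.hasText, Literal("example")))
g.add((EX["sentence_7"], EX.contains, word))  # grammatical hierarchy
g.add((EX["page_3"], EX.contains, word))      # physical layout hierarchy

# Cross-hierarchy query: words both in sentence 7 and on page 3.
results = g.query("""
    PREFIX ex: <http://example.org/text-encoding#>
    SELECT ?text WHERE {
        ex:sentence_7 ex:contains ?w .
        ex:page_3 ex:contains ?w .
        ?w ex:hasText ?text .
    }
""")
for row in results:
    print(row.text)  # -> "example"
```

Because each containment relation is just another triple, no hierarchy is privileged, which is what sidesteps the overlapping-markup problem noted in the findings below.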

Findings

The Resource Description Framework (RDF) is powerful and expressive enough to fulfil tasks traditionally done in XML, as well as to enable new possibilities such as collaborative and distributed textual encoding and the use of ontology-based reasoning in text processing and querying. While encoding overlapping hierarchies with existing approaches is often complex and leads to idiosyncratic solutions, this problem is naturally solved using SW languages.

Research limitations/implications

To make the approach suitable for widespread adoption, further work is required both in ontology modelling and in applications (e.g. markup editing).

Practical implications

The prototype implementation imports existing encoded texts, transforms them into RDF‐based markups and uses SW query languages to answer cross‐hierarchy queries. Existing tools (reasoners, search and query engines, etc.) can be used immediately.

Originality/value

This methodology enables distributed interoperability and the reuse of previously encoded results, and opens the way to novel collaborative textual markup scenarios.

Details

Online Information Review, vol. 32 no. 4
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 8 July 2019

Mia Høj Mathiasson and Henrik Jochumsen

Abstract

Purpose

The purpose of this paper is to report on a new approach for researching public library programs through Facebook events. The term public library programs refers to publicly announced activities and events taking place within or in relation to a public library. In Denmark, programs are an important part of public library practice and have been growing in both number and variety in recent years.

Design/methodology/approach

The data for the study presented in this paper consist of Facebook events announcing public library programs. In studying these data, grounded theory is used as a research strategy, and web archiving methods are used to collect both the textual and the visual content of the Facebook events.

Findings

The combination of Facebook events as data, grounded theory as a research strategy and web archiving as the method of data collection proves useful for researching the format and content of public library programs that have already taken place.

Research limitations/implications

Only a limited number of Facebook events are examined and the context is restricted to one country.

Originality/value

This paper presents a promising approach for researching public library programs through social media content and provides new insights into methods and data as well as into the phenomenon investigated. It thereby contributes both to the conception of an under-researched area and to a new approach for studying it.

Details

Journal of Documentation, vol. 75 no. 4
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 March 2000

Christian Bauer and Arno Scharl

Abstract

Describes an approach to automatically classify and evaluate publicly accessible World Wide Web sites. The suggested methodology is equally valuable for analyzing the content and hypertext structures of commercial, educational and non-profit organizations. Outlines a research methodology for model building and validation and defines the most relevant attributes of such a process. A set of operational criteria for classifying Web sites is developed. The software tool introduced supports the automated gathering of these parameters, thereby assuring the necessary “critical mass” of empirical data. Based on the preprocessed information, a multi-methodological approach is chosen that comprises statistical clustering, textual analysis, supervised and unsupervised neural networks, and manual classification for validation purposes.
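As one hedged illustration of the statistical-clustering step (the metric set and cluster count below are assumptions; the abstract names the technique but not its parameters), a Python sketch that groups sites by simple structural metrics:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def cluster_sites(site_metrics: np.ndarray, n_clusters: int = 3) -> np.ndarray:
    """Assign each site to a cluster based on structural metrics.

    Each row describes one site, e.g. hypothetically
    [page count, mean links per page, mean text length per page].
    Standardization keeps large-valued metrics from dominating.
    """
    scaled = StandardScaler().fit_transform(site_metrics)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(scaled)

# Toy usage: six sites, three structural metrics each.
metrics = np.array([
    [500, 40, 1200], [450, 38, 1100],   # large content-heavy sites
    [20, 5, 300], [25, 6, 280],         # small brochure sites
    [120, 60, 150], [110, 55, 170],     # link-hub style sites
])
print(cluster_sites(metrics))
```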

Details

Internet Research, vol. 10 no. 1
Type: Research Article
ISSN: 1066-2243

Article
Publication date: 20 April 2010

Tai‐Li Wang

Abstract

Purpose

The blogging phenomenon has become a primary mode of mainstream communication in the Web 2.0 era. While previous studies found that campaign web sites did not realise two-way communication ideals, the current study investigates potential differences in communication patterns between campaign blogs and web sites during Taiwan's 2008 general election, exploring whether the blogging phenomenon can improve the process of online political communication.

Design/methodology/approach

The study used a content analysis approach, the web style analysis method, which was designed specifically for analysing web content, and applied it to an online campaign context in a different political culture, using Taiwan's general election as a case study.

Findings

Results indicated that the themes of both campaign blogs and web sites focused on “attacking opponents” rather than focusing on political policies or information on particular issues. However, campaign blogs and web sites significantly differed in all other dimensions, including structural features, functions, interactivity and appeal strategies. Overall, in terms of the online democratic ideal, campaign blogs appeared to allow more democratic, broader, deeper and easier two‐way communication models between candidates and voters or among voters.

Research limitations/implications

The current study focused on candidates' blogs and web sites and did not explore the other vast parts of the online political sphere, particularly independent or citizen‐based blogs, which play significant roles in the decentralised and participant‐networked public spheres.

Originality/value

The study illuminates the role of hyperlinks on campaign blogs. By providing a greater abundance of external links than campaign web sites, campaign blogs allowed more voters, especially younger ones, to share political information in a manner that is quite different from the traditional one‐way communication model. The paper also argues that interactivity measures should be incorporated into the web style analysis method.

Details

Online Information Review, vol. 34 no. 2
Type: Research Article
ISSN: 1468-4527
