Search results
1 – 10 of over 1,000
Mohamed Hammami, Youssef Chahir and Liming Chen
Abstract
Along with the ever-growing Web comes a proliferation of objectionable content, such as sex, violence, racism, etc. We need efficient tools for classifying and filtering undesirable web content. In this paper, we investigate this problem through WebGuard, our automatic, machine-learning-based pornographic website classification and filtering system. As the Internet becomes more and more visual and multimedia-rich, as exemplified by pornographic websites, we focus our attention on the use of skin-color-related visual content-based analysis, alongside textual and structural content-based analysis, to improve pornographic website filtering. While most commercial filtering products on the marketplace are based mainly on textual content-based analysis, such as indicative keyword detection or manually collected black-list checking, the originality of our work resides in the addition of structural and visual content-based analysis to the classical textual content-based analysis, together with several major data mining techniques for learning and classifying. Experimented on a testbed of 400 websites, including 200 adult sites and 200 non-pornographic ones, WebGuard, our web filtering engine, scored a 96.1% classification accuracy rate when only textual and structural content-based analysis was used, and a 97.4% classification accuracy rate when skin-color-related visual content-based analysis was applied in addition. Further experiments on a black list of 12,311 adult websites manually collected and classified by the French Ministry of Education showed that WebGuard scored an 87.82% classification accuracy rate when using only textual and structural content-based analysis, and a 95.62% classification accuracy rate when the visual content-based analysis was applied in addition. The basic framework of WebGuard can be applied to other website categorization problems that combine, as most do today, textual and visual content.
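To make the skin-color idea concrete, here is a minimal, hedged sketch of a pixel-ratio heuristic of the kind such systems build on. The RGB rule and the 0.4 cut-off are illustrative assumptions, not WebGuard's actual model, which learns its decision rules from data.

```python
# A generic skin-color-ratio heuristic in the spirit of the visual
# analysis described above. The RGB rule and the 0.4 cut-off are
# illustrative assumptions, not WebGuard's actual model.
from PIL import Image  # pip install Pillow

def skin_pixel(r: int, g: int, b: int) -> bool:
    """A classic published RGB skin rule; one of many variants."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)

def skin_ratio(path: str) -> float:
    """Fraction of pixels classified as skin in the image at `path`."""
    img = Image.open(path).convert("RGB")
    pixels = list(img.getdata())
    return sum(1 for r, g, b in pixels if skin_pixel(r, g, b)) / len(pixels)

def looks_pornographic(path: str, threshold: float = 0.4) -> bool:
    # The threshold is a placeholder; a real system would learn it.
    return skin_ratio(path) > threshold
```

In a full pipeline this ratio would be one feature among the textual and structural ones, not a classifier on its own.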
Mohamed Hammami, Radhouane Guermazi and Abdelmajid Ben Hamadou
Abstract
Purpose
The growth of the web and the increasing number of documents available electronically have been paralleled by the emergence of harmful web page content such as pornography, violence, racism, etc. This emergence has made it necessary to provide filtering systems designed to secure internet access. Most of these systems deal mainly with adult content and focus on blocking pornography, marginalizing violence. The purpose of this paper is to propose a violent web content detection and filtering system which uses textual and structural content-based analysis.
Design/methodology/approach
The violent web content detection and filtering system uses textual and structural content-based analysis built on a violent keyword dictionary. The paper focuses on the preparation of the keyword dictionary and presents a comparative study of different data mining techniques for blocking violent web pages.
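As a concrete illustration of the pipeline this abstract outlines, the following sketch counts dictionary hits in a page's body and title and trains a decision tree on them. The tiny dictionary and toy data are placeholders, not the authors' dictionary or corpus.

```python
# Minimal sketch of the described pipeline: count violent-dictionary
# hits in page text and structure, then train a decision tree.
# Dictionary and training data below are illustrative placeholders.
from sklearn.tree import DecisionTreeClassifier

VIOLENT_TERMS = {"kill", "gun", "blood", "fight"}  # illustrative only

def features(page_text: str, page_title: str) -> list:
    body = page_text.lower().split()
    title = page_title.lower().split()
    return [
        sum(w in VIOLENT_TERMS for w in body) / max(len(body), 1),   # body hit rate
        sum(w in VIOLENT_TERMS for w in title) / max(len(title), 1), # title hit rate
    ]

# Toy training set: (text, title, label) with 1 = violent.
train = [
    ("gun fight blood everywhere kill", "kill zone", 1),
    ("gardening tips for spring flowers", "garden blog", 0),
]
X = [features(text, title) for text, title, _ in train]
y = [label for _, _, label in train]
clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([features("blood and gun scenes", "fight club fan page")]))
```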
Findings
The solution presented in this paper showed its effectiveness by scoring an 89 per cent classification accuracy rate on its test data set.
Research limitations/implications
Several directions for future work can be considered. This paper analyzed only the textual and structural content of web pages; an additional analysis of visual content is one such direction. Future research is also underway to develop effective filtering tools for other types of harmful web pages, such as racist content.
Originality/value
The paper's major contributions are, first, the study and comparison of several decision tree building algorithms for building a violent web classifier based on textual and structural content-based analysis, thereby improving web filtering; and second, easing the laborious task of dictionary building by automatically finding discriminative indicative keywords.
P.Y. Lee, S.C. Hui and A.C.M. Fong
Abstract
With the proliferation of objectionable materials (e.g. pornography, violence, drugs, etc.) available on the WWW, there is an urgent need for effective countermeasures to protect children and other unsuspecting users from exposure to such materials. Using pornographic Web pages as a case study, this paper presents a thorough analysis of the distinguishing features of such Web pages. The objective of the study is to gain knowledge on the structure and characteristics of typical pornographic Web pages so that effective Web filtering techniques can be developed to filter them automatically. In this paper, we first survey the existing techniques for Web content filtering. A study on the characteristics of pornographic Web pages is then presented. The implementation of a Web content filtering system that combines the use of an artificial neural network and the knowledge gained in the analysis of pornographic Web pages is also given.
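A hedged sketch of the idea described here: hand-crafted page features (keyword density, image and link counts) fed to a small neural network. The feature set, network shape and toy data are assumptions for illustration, not the paper's actual design.

```python
# Hedged sketch: feed hand-crafted Web-page features to a small neural
# network. The feature set and toy data are illustrative assumptions,
# not the paper's actual feature design.
from sklearn.neural_network import MLPClassifier

def page_features(n_keywords: int, n_images: int, n_links: int,
                  text_len: int) -> list:
    return [n_keywords / max(text_len, 1), float(n_images), float(n_links)]

X = [page_features(40, 120, 30, 500),   # keyword-dense, image-heavy page
     page_features(0, 3, 15, 800)]      # ordinary page
y = [1, 0]                              # 1 = objectionable

net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                    random_state=0).fit(X, y)
print(net.predict([page_features(25, 80, 20, 400)]))
```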
Wieland Schwinger, Werner Retschitzegger, Andrea Schauerhuber, Gerti Kappel, Manuel Wimmer, Birgit Pröll, Cristina Cachero Castro, Sven Casteleyn, Olga De Troyer, Piero Fraternali, Irene Garrigos, Franca Garzotto, Athula Ginige, Geert‐Jan Houben, Nora Koch, Nathalie Moreno, Oscar Pastor, Paolo Paolini, Vicente Pelechano Ferragud, Gustavo Rossi, Daniel Schwabe, Massimo Tisi, Antonio Vallecillo, Kees van der Sluijs and Gefei Zhang
Abstract
Purpose
Ubiquitous web applications (UWA) are a new type of web application accessed in various contexts, i.e. through different devices, by users with various interests, at any time, from any place around the globe. For such full-fledged, complex software systems, a methodologically sound engineering approach in terms of model-driven engineering (MDE) is crucial. Several modeling approaches have already been proposed that capture the ubiquitous nature of web applications, each of them having different origins, pursuing different goals and providing a pantheon of concepts. This paper aims to give an in-depth comparison of seven modeling approaches supporting the development of UWAs.
Design/methodology/approach
The comparison is conducted by applying a detailed set of evaluation criteria and by demonstrating each approach's applicability on the basis of an exemplary tourism web application. In particular, five commonly found ubiquitous scenarios are investigated, providing initial insight into the modeling concepts of each approach and facilitating their comparability.
Findings
The results gained indicate that many modeling approaches lack a proper MDE foundation in terms of meta‐models and tool support. The proposed modeling mechanisms for ubiquity are often limited, since they neither cover all relevant context factors in an explicit, self‐contained, and extensible way, nor allow for a wide spectrum of extensible adaptation operations. The provided modeling concepts frequently do not allow dealing with all different parts of a web application in terms of its content, hypertext, and presentation levels as well as their structural and behavioral features. Finally, current modeling approaches do not reflect the crosscutting nature of ubiquity but rather intermingle context and adaptation issues with the core parts of a web application, thus hampering maintainability and extensibility.
Originality/value
Different from other surveys in the area of modeling web applications, this paper specifically considers modeling concepts for the ubiquitous nature of web applications, together with a comprehensive investigation of available MDE support, using a well-defined and fine-grained catalogue of more than 30 evaluation criteria.
Takahiro Komamizu, Toshiyuki Amagasa and Hiroyuki Kitagawa
Abstract
Purpose
The purpose of this paper is to extract appropriate terms that summarize the current search results in terms of the contents of textual facets. Faceted search on XML data helps users find necessary information by presenting attribute–content pairs (called facet-value pairs) about the current search results. However, if most of the contents of a facet consist of longer texts on average (such facets are called textual facets), it is not easy to get an overview of the current results.
Design/methodology/approach
The proposed approach is based upon subsumption relationships between terms in the contents of a facet. A subsumption relationship can be extracted from co-occurrences of terms across a number of documents (in this paper, each content of a facet is treated as a document). Subsumption relationships compose hierarchies, and the authors utilize these hierarchies to extract facet-values from textual facets. In the faceted search context, users have ambiguous search demands and therefore expect broader terms; thus, high-level terms in the hierarchies are extracted as facet-values.
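A minimal sketch of the co-occurrence subsumption test in the spirit of this description: term x subsumes term y if x appears in (almost) every document containing y, but not the other way around. The 0.8 threshold is an illustrative assumption.

```python
# Co-occurrence subsumption test: x subsumes y if P(x|y) >= t while
# P(y|x) < t. The threshold t = 0.8 is an illustrative assumption.
def doc_freq(term, docs):
    return sum(term in d for d in docs)

def co_freq(x, y, docs):
    return sum(x in d and y in d for d in docs)

def subsumes(x, y, docs, t=0.8):
    dx, dy = doc_freq(x, docs), doc_freq(y, docs)
    if dx == 0 or dy == 0:
        return False
    p_x_given_y = co_freq(x, y, docs) / dy
    p_y_given_x = co_freq(x, y, docs) / dx
    return p_x_given_y >= t and p_y_given_x < t

# Each facet content is treated as one document (a set of terms).
docs = [{"music", "jazz"}, {"music", "rock"}, {"music"}, {"sports"}]
print(subsumes("music", "jazz", docs))  # True: "music" is the broader term
```

Terms that subsume many others sit high in the resulting hierarchy, which is where the broad facet-value candidates come from.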
Findings
The main finding of this paper is that the extracted terms improve users' search experiences, especially in cases when the search demands are ambiguous.
Originality/value
One original aspect of this paper is the way the textual contents of XML data are utilized to improve users' search experiences in faceted search. The other is the design of tasks for evaluating exploratory search approaches such as faceted search.
Nicolas Virtsonis and Sally Harridge‐March
Abstract
Purpose
The purpose of this paper is to examine the way in which brand positioning elements are manifested in the business‐to‐business (B2B) online environment.
Design/methodology/approach
The UK print industry is used to investigate the web site elements that communicate positioning, through a content analysis of the corporate web pages of 30 UK print suppliers.
Findings
A framework is developed to show how web site communications are manifested in the online B2B environment.
Research limitations/implications
Because the research vehicle is a sample of websites from only one industry, the findings may not be transferable to other industries or even to the whole of that industry. However, the model is a useful framework for helping managers to plan their online communications.
Practical implications
The paper concludes by giving recommendations about how the framework can be used by practitioners in order to improve the linkage between communications messages and the means for transferring these messages.
Originality/value
This is a novel approach to examining branding elements in the online environment. Comparatively little literature exists which examines branding in the online B2B environment.
Giovanni Tummarello, Christian Morbidoni, Paolo Puliti and Francesco Piazza
Abstract
Purpose
The purpose of this paper is to investigate and prove the feasibility of a semantic web (SW) based approach to textual encoding. It aims to discuss benefits and novel possibilities with respect to traditional XML‐based approaches.
Design/methodology/approach
The markup process can be seen as a task of knowledge representation where elements such as words, sentences and pages are instances of conceptual classes forming a semantic network. A Web Ontology Language (OWL) ontology for textual encoding has been developed, capturing structural and grammatical aspects. Different approaches and tools to query the encoded text are investigated.
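A minimal rdflib sketch of this idea: words and sentences become RDF resources, and cross-hierarchy questions become SPARQL queries. The namespace, class and property names below are invented for illustration; the paper's OWL ontology will differ.

```python
# Sketch: encode text elements as RDF resources and ask a question
# that spans two hierarchies at once. Ontology names are hypothetical.
from rdflib import Graph, Namespace, Literal, RDF

TXT = Namespace("http://example.org/textenc#")  # hypothetical ontology
g = Graph()

g.add((TXT.s1, RDF.type, TXT.Sentence))
g.add((TXT.w1, RDF.type, TXT.Word))
g.add((TXT.w1, TXT.text, Literal("ship")))
g.add((TXT.w1, TXT.inSentence, TXT.s1))   # grammatical hierarchy
g.add((TXT.w1, TXT.onPage, TXT.p12))      # physical hierarchy

# "Which words of sentence s1 appear on page p12?" - a cross-hierarchy
# query that is awkward in a single XML tree.
q = """
SELECT ?t WHERE {
  ?w a <http://example.org/textenc#Word> ;
     <http://example.org/textenc#inSentence> <http://example.org/textenc#s1> ;
     <http://example.org/textenc#onPage> <http://example.org/textenc#p12> ;
     <http://example.org/textenc#text> ?t .
}
"""
for row in g.query(q):
    print(row.t)
```

Because each markup layer is just another set of triples, overlapping hierarchies coexist without the workarounds XML requires.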
Findings
The Resource Description Framework (RDF) is powerful and expressive enough to fulfil tasks traditionally done in XML, as well as to enable new possibilities such as collaborative and distributed textual encoding and the use of ontology-based reasoning in text processing and querying. While encoding overlapping hierarchies with existing approaches is often complex and leads to idiosyncratic solutions, this problem is naturally solved using SW languages.
Research limitations/implications
To make the approach suitable for widespread adoption, further work is required both in ontologies modelling and in applications (e.g. markup editing).
Practical implications
The prototype implementation imports existing encoded texts, transforms them into RDF‐based markups and uses SW query languages to answer cross‐hierarchy queries. Existing tools (reasoners, search and query engines, etc.) can be used immediately.
Originality/value
This methodology enables distributed interoperability and reuse of previous encoded results and opens the way to novel collaborative textual markup scenarios.
Mia Høj Mathiasson and Henrik Jochumsen
Abstract
Purpose
The purpose of this paper is to report on a new approach for researching public library programs through Facebook events. The term public library programs refers to publicly announced activities and events taking place within or in relation to a public library. In Denmark, programs are an important part of the practices of public libraries and have been growing in both number and variety in recent years.
Design/methodology/approach
The data for the study presented in this paper consist of Facebook events announcing public library programs. Grounded theory is used as a research strategy, and methods of web archiving are used to collect both the textual and the visual content of the Facebook events.
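A hedged sketch of the web-archiving step: save a page's visible text and its image URLs for later analysis. Real Facebook events sit behind authentication and dynamic rendering, so this generic archiver is illustrative only.

```python
# Generic page archiver: store text and image URLs as JSON. Facebook
# itself requires authenticated access; this sketch shows the idea
# on an ordinary public page.
import json
import urllib.request
from html.parser import HTMLParser

class TextAndImages(HTMLParser):
    def __init__(self):
        super().__init__()
        self.text, self.images = [], []
    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())
    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

def archive(url: str, out_path: str) -> None:
    html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
    parser = TextAndImages()
    parser.feed(html)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump({"url": url, "text": parser.text, "images": parser.images}, f)

archive("https://example.org/event", "event.json")
```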
Findings
The combination of Facebook events as data, grounded theory as a research strategy and web archiving as a method of data collection proves to be useful for researching the format and content of public library programs that have already taken place.
Research limitations/implications
Only a limited number of Facebook events are examined and the context is restricted to one country.
Originality/value
This paper presents a promising approach for researching public library programs through social media content and provides new insights into both methods and data as well as the phenomenon investigated. Thereby, this paper contributes to the conception of an under-researched area as well as a new approach for studying it.
Christian Bauer and Arno Scharl
Abstract
Describes an approach to automatically classify and evaluate publicly accessible World Wide Web sites. The suggested methodology is equally valuable for analyzing the content and hypertext structures of commercial, educational and non-profit organizations. Outlines a research methodology for model building and validation and defines the most relevant attributes of such a process. A set of operational criteria for classifying Web sites is developed. The introduced software tool supports the automated gathering of these parameters, thereby assuring the necessary “critical mass” of empirical data. Based on the preprocessed information, a multi-methodological approach is chosen that comprises statistical clustering, textual analysis, supervised and unsupervised neural networks, and manual classification for validation purposes.
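A hedged sketch of the automated attribute gathering described here: fetch a site's front page and derive simple structural criteria. The attribute set is an assumption for illustration; the paper defines its own catalogue of operational criteria.

```python
# Sketch: fetch a page and derive simple structural attributes of the
# kind a site classifier might gather. Attribute choice is illustrative.
import urllib.request
from html.parser import HTMLParser

class SiteStats(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = self.images = self.forms = 0
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += 1
        elif tag == "img":
            self.images += 1
        elif tag == "form":
            self.forms += 1

def site_attributes(url: str) -> dict:
    html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
    stats = SiteStats()
    stats.feed(html)
    return {"bytes": len(html), "links": stats.links,
            "images": stats.images, "forms": stats.forms}

print(site_attributes("https://example.org"))
```

Run over many sites, vectors like these are what the clustering and neural-network stages would consume.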
Abstract
Purpose
The blogging phenomenon has become a primary mode of mainstream communication in the Web 2.0 era. While previous studies found that campaign web sites did not realise two-way communication ideals, the current study investigates potential differences in communication patterns between campaign blogs and web sites during Taiwan's 2008 general election, exploring whether the blogging phenomenon can improve the process of online political communication.
Design/methodology/approach
The study used a content analysis approach, the web style analysis method, which was designed specifically for analysing web content, and applied it to an online campaign context in a different political culture, using Taiwan's general election as a case study.
Findings
Results indicated that the themes of both campaign blogs and web sites focused on “attacking opponents” rather than focusing on political policies or information on particular issues. However, campaign blogs and web sites significantly differed in all other dimensions, including structural features, functions, interactivity and appeal strategies. Overall, in terms of the online democratic ideal, campaign blogs appeared to allow more democratic, broader, deeper and easier two‐way communication models between candidates and voters or among voters.
Research limitations/implications
The current study focused on candidates' blogs and web sites and did not explore the other vast parts of the online political sphere, particularly independent or citizen‐based blogs, which play significant roles in the decentralised and participant‐networked public spheres.
Originality/value
The study illuminates the role of hyperlinks on campaign blogs. By providing a greater abundance of external links than campaign web sites, campaign blogs allowed more voters, especially younger ones, to share political information in a manner that is quite different from the traditional one‐way communication model. The paper also argues that interactivity measures should be incorporated into the web style analysis method.