Search results

1 – 10 of over 1000
Article
Publication date: 1 November 2005

Mohamed Hammami, Youssef Chahir and Liming Chen

Along with the ever-growing Web is the proliferation of objectionable content, such as sex, violence, racism, etc. We need efficient tools for classifying and filtering undesirable…

Abstract

Along with the ever-growing Web is the proliferation of objectionable content, such as sex, violence, racism, etc. We need efficient tools for classifying and filtering undesirable web content. In this paper, we investigate this problem through WebGuard, our automatic machine learning based pornographic website classification and filtering system. As the Internet becomes increasingly visual and multimedia-rich, as exemplified by pornographic websites, we focus our attention here on the use of skin color related visual content-based analysis along with textual and structural content-based analysis for improving pornographic website filtering. While most commercial filtering products on the marketplace are mainly based on textual content-based analysis, such as indicative keyword detection or manually collected black list checking, the originality of our work resides in the addition of structural and visual content-based analysis to the classical textual content-based analysis, along with several major data mining techniques for learning and classifying. Experimented on a testbed of 400 websites including 200 adult sites and 200 non-pornographic ones, WebGuard, our web filtering engine, scored a 96.1% classification accuracy rate when only textual and structural content-based analysis are used, and a 97.4% classification accuracy rate when skin color related visual content-based analysis is used in addition. Further experiments on a black list of 12 311 adult websites manually collected and classified by the French Ministry of Education showed that WebGuard scored an 87.82% classification accuracy rate when using only textual and structural content-based analysis, and a 95.62% classification accuracy rate when the visual content-based analysis is used in addition. The basic framework of WebGuard can apply to other website categorization problems which combine, as most websites do today, textual and visual content.
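
The abstract does not spell out the skin color rule, so the following is only a hedged Python sketch of the kind of visual feature such a system might combine with textual and structural features; the RGB thresholds and the file name are illustrative assumptions, not WebGuard's actual method.

# Hypothetical skin-color ratio feature; thresholds are illustrative only.
import numpy as np
from PIL import Image

def skin_pixel_ratio(path: str) -> float:
    """Fraction of pixels whose RGB values fall inside a crude skin-color range."""
    rgb = np.asarray(Image.open(path).convert("RGB"), dtype=np.int16)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    skin = (
        (r > 95) & (g > 40) & (b > 20)
        & ((rgb.max(axis=-1) - rgb.min(axis=-1)) > 15)
        & (np.abs(r - g) > 15) & (r > g) & (r > b)
    )
    return float(skin.mean())

# The ratio would then join textual and structural features in a classifier, e.g.
# features = [keyword_score, external_link_count, skin_pixel_ratio("banner.jpg")]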

Details

International Journal of Web Information Systems, vol. 1 no. 4
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 21 November 2008

Mohamed Hammami, Radhouane Guermazi and Abdelmajid Ben Hamadou

The growth of the web and the increasing number of documents electronically available have been paralleled by the emergence of harmful web page content such as pornography…

Abstract

Purpose

The growth of the web and the increasing number of documents electronically available have been paralleled by the emergence of harmful web page content such as pornography, violence, racism, etc. This emergence has made it necessary to provide filtering systems designed to secure internet access. Most of these systems deal mainly with adult content and focus on blocking pornography, marginalizing violence. The purpose of this paper is to propose a violent web content detection and filtering system which uses textual and structural content-based analysis.

Design/methodology/approach

The violent web content detection and filtering system uses textual and structural content-based analysis based on a violent keyword dictionary. The paper focuses on the preparation of the keyword dictionary and presents a comparative study of different data mining techniques for blocking web pages with violent content.
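
As an illustration of this kind of pipeline, here is a minimal Python sketch, assuming a hand-built violent-keyword dictionary and labelled pages; the terms, pages and splitting criteria are placeholders and do not reproduce the paper's dictionary or its exact decision tree algorithms.

# Minimal sketch: dictionary-based textual features fed to decision trees.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

violent_dictionary = ["kill", "gun", "blood", "fight", "weapon"]   # illustrative only
pages = ["breaking news of a gun fight downtown", "a recipe for chocolate cake"]
labels = [1, 0]                                                     # 1 = violent page

# Count occurrences of dictionary terms in each page.
vectorizer = CountVectorizer(vocabulary=violent_dictionary)
X = vectorizer.fit_transform(pages)

# Compare two splitting criteria, standing in for the paper's algorithm comparison.
for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0).fit(X, labels)
    print(criterion, clf.predict(vectorizer.transform(["gun and blood everywhere"])))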

Findings

The solution presented in this paper demonstrated its effectiveness by scoring an 89 per cent classification accuracy rate on its test data set.

Research limitations/implications

Several directions for future work can be considered. This paper analyzed only the textual and structural content of web pages; additional analysis of the visual content is one such direction. Future research is underway to develop effective filtering tools for other types of harmful web pages, such as racist content.

Originality/value

The paper's major contributions are, first, the study and comparison of several decision tree building algorithms for building a violent web content classifier based on textual and structural content-based analysis for improving web filtering; and second, addressing the laborious task of dictionary building by automatically finding discriminative indicative keywords.

Details

International Journal of Web Information Systems, vol. 4 no. 4
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 1 March 2003

P.Y. Lee, S.C. Hui and A.C.M. Fong

With the proliferation of objectionable materials (e.g. pornography, violence, drugs, etc.) available on the WWW, there is an urgent need for effective countermeasures to protect…


Abstract

With the proliferation of objectionable materials (e.g. pornography, violence, drugs, etc.) available on the WWW, there is an urgent need for effective countermeasures to protect children and other unsuspecting users from exposure to such materials. Using pornographic Web pages as a case study, this paper presents a thorough analysis of the distinguishing features of such Web pages. The objective of the study is to gain knowledge on the structure and characteristics of typical pornographic Web pages so that effective Web filtering techniques can be developed to filter them automatically. In this paper, we first survey the existing techniques for Web content filtering. A study on the characteristics of pornographic Web pages is then presented. The implementation of a Web content filtering system that combines the use of an artificial neural network and the knowledge gained in the analysis of pornographic Web pages is also given.
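
A rough Python sketch of the idea of feeding hand-crafted page features to a neural network is shown below; the feature set, data and network shape are assumptions for illustration, not those of the study.

# Illustrative only: three hand-crafted page features fed to a small neural network.
import numpy as np
from sklearn.neural_network import MLPClassifier

# Each row: [indicative keyword count, image count, outbound link count]
X = np.array([[12.0, 40.0, 80.0],
              [ 0.0,  3.0, 15.0],
              [ 9.0, 25.0, 60.0],
              [ 1.0,  5.0, 20.0]])
y = np.array([1, 0, 1, 0])   # 1 = objectionable page

clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(X, y)
print(clf.predict([[10.0, 30.0, 70.0]]))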

Details

Internet Research, vol. 13 no. 1
Type: Research Article
ISSN: 1066-2243

Keywords

Abstract

Purpose

Ubiquitous web applications (UWA) are a new type of web application accessed in various contexts, i.e. through different devices, by users with various interests, at any time from any place around the globe. For such full-fledged, complex software systems, a methodologically sound engineering approach in terms of model-driven engineering (MDE) is crucial. Several modeling approaches have already been proposed that capture the ubiquitous nature of web applications, each of them having different origins, pursuing different goals and providing a pantheon of concepts. This paper aims to give an in-depth comparison of seven modeling approaches supporting the development of UWAs.

Design/methodology/approach

The comparison is conducted by applying a detailed set of evaluation criteria and by demonstrating the applicability of each approach on the basis of an exemplary tourism web application. In particular, five commonly found ubiquitous scenarios are investigated, providing initial insight into the modeling concepts of each approach and facilitating their comparability.

Findings

The results gained indicate that many modeling approaches lack a proper MDE foundation in terms of meta‐models and tool support. The proposed modeling mechanisms for ubiquity are often limited, since they neither cover all relevant context factors in an explicit, self‐contained, and extensible way, nor allow for a wide spectrum of extensible adaptation operations. The provided modeling concepts frequently do not allow dealing with all different parts of a web application in terms of its content, hypertext, and presentation levels as well as their structural and behavioral features. Finally, current modeling approaches do not reflect the crosscutting nature of ubiquity but rather intermingle context and adaptation issues with the core parts of a web application, thus hampering maintainability and extensibility.

Originality/value

Different from other surveys in the area of modeling web applications, this paper specifically considers modeling concepts for the ubiquitous nature of web applications, together with a comprehensive investigation of available MDE support, using a well-defined as well as fine-grained catalogue of more than 30 evaluation criteria.

Details

International Journal of Web Information Systems, vol. 4 no. 3
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 6 January 2022

Hanan Alghamdi and Ali Selamat

With the proliferation of terrorist/extremist websites on the World Wide Web, it has become progressively more crucial to detect and analyze the content on these websites…

Abstract

Purpose

With the proliferation of terrorist/extremist websites on the World Wide Web, it has become progressively more crucial to detect and analyze the content on these websites. Accordingly, the volume of previous research focused on identifying the techniques and activities of terrorist/extremist groups, as revealed by their sites on the so-called dark web, has also grown.

Design/methodology/approach

This study presents a review of the techniques used to detect and process the content of terrorist/extremist sites on the dark web. Forty of the most relevant data sources were examined, and various techniques were identified among them.

Findings

Based on this review, it was found that feature selection and feature extraction methods can be used together with topic modeling, content analysis and text clustering.
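
To make the combination concrete, here is a minimal Python illustration of feature extraction, topic modeling and text clustering applied to a handful of placeholder documents; it is not drawn from any of the reviewed studies.

# Illustrative pipeline: feature extraction, topic modeling, text clustering.
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

docs = ["first forum post about topic a", "second post about topic a",
        "third post about topic b", "fourth post about topic b"]

tfidf = TfidfVectorizer().fit_transform(docs)                  # feature extraction
counts = CountVectorizer().fit_transform(docs)
topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(tfidf)
print(topics.round(2), clusters)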

Originality/value

At the end of the review, the current state of the art and certain open issues associated with Arabic dark web content analysis are presented.

Details

Data Technologies and Applications, vol. 56 no. 4
Type: Research Article
ISSN: 2514-9288

Keywords

Article
Publication date: 12 October 2021

Didem Ölçer and Tuğba Taşkaya Temizel

This paper proposes a framework that automatically assesses content coverage and information quality of health websites for end-users.

Abstract

Purpose

This paper proposes a framework that automatically assesses content coverage and information quality of health websites for end-users.

Design/methodology/approach

The study investigates the impact of textual and content-based features in predicting the quality of health-related texts. Content-based features were acquired using an evidence-based practice guideline in diabetes. A set of textual features inspired by professional health literacy guidelines and the features commonly used for assessing information quality in other domains were also used. In this study, 60 websites about type 2 diabetes were methodically selected for inclusion. Two general practitioners used DISCERN to assess each website in terms of its content coverage and quality.
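
The Python sketch below illustrates the feature-combination idea: textual features and guideline-derived content features are concatenated before a classifier is fitted against expert labels. Feature names, values and the classifier are assumptions for illustration, not the study's actual variables or model.

# Illustrative only: textual vs content-based vs combined feature sets.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Textual features per site, e.g. [average sentence length, readability score]
X_text = np.array([[14.0, 62.0], [22.0, 40.0], [16.0, 58.0], [25.0, 35.0]])
# Content features, e.g. coverage of guideline topics [diet, medication, monitoring]
X_content = np.array([[1.0, 1.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
y = np.array([1, 0, 1, 0])   # 1 = adequate quality according to expert DISCERN scores

for name, X in (("textual", X_text),
                ("content", X_content),
                ("both", np.hstack([X_text, X_content]))):
    model = LogisticRegression().fit(X, y)
    print(name, model.score(X, y))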

Findings

The proposed framework's outputs were compared with the experts' evaluation scores. For coverage assessment, the best accuracy obtained was 88% with textual features and 92% with content-based features; when both types of features were used, the framework achieved 90% accuracy. For information quality assessment, the content-based features gave a higher accuracy of 92%, against 88% obtained using the textual features.

Research limitations/implications

The experiments were conducted for websites about type 2 diabetes. As the whole process is costly and requires extensive expert human labelling, the study was carried out in a single domain. However, the methodology is generalizable to other health domains for which evidence-based practice guidelines are available.

Practical implications

Finding high-quality online health information is becoming increasingly difficult due to the high volume of information generated by non-experts in the area. The search engines fail to rank objective health websites higher within the search results. The proposed framework can aid search engine and information platform developers to implement better retrieval techniques, in turn, facilitating end-users' access to high-quality health information.

Social implications

Erroneous, biased or partial health information is a serious problem for end-users who need access to objective information on their health problems. Such information may cause patients to stop their treatments provided by professionals. It might also have adverse financial implications by causing unnecessary expenditures on ineffective treatments. The ability to access high-quality health information has a positive effect on the health of both individuals and the whole society.

Originality/value

The paper demonstrates that automatic assessment of health websites is a domain-specific problem, which cannot be addressed with the general information quality assessment methodologies in the literature. Content coverage of health websites has also been studied in the health domain for the first time in the literature.

Details

Online Information Review, vol. 46 no. 4
Type: Research Article
ISSN: 1468-4527

Keywords

Article
Publication date: 17 August 2015

Takahiro Komamizu, Toshiyuki Amagasa and Hiroyuki Kitagawa

The purpose of this paper is to extract appropriate terms to summarize the current results in terms of the contents of textual facets. Faceted search on XML data helps users find…

Abstract

Purpose

The purpose of this paper is to extract appropriate terms to summarize the current results in terms of the contents of textual facets. Faceted search on XML data helps users find necessary information from XML data by giving attribute–content pairs (called facet-value pairs) about the current search results. However, if most of the contents of a facet consist of longer texts on average (such facets are called textual facets), it is not easy to get an overview of the current results.

Design/methodology/approach

The proposed approach is based upon subsumption relationships between terms in the contents of a facet. A subsumption relationship can be extracted using co-occurrences of terms across a number of documents (in this paper, the content of a facet is treated as a document). Subsumption relationships compose hierarchies, and the authors utilize these hierarchies to extract facet-values from textual facets. In the faceted search context, users have ambiguous search demands and expect broader terms; thus, high-level terms in the hierarchies are extracted as facet-values.
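
A small Python sketch of co-occurrence-based subsumption is given below: term x is taken to subsume term y when x appears in (nearly) every document containing y but not vice versa. The 0.8 threshold and the toy documents are conventional illustrations, not necessarily the paper's settings.

# Toy subsumption extraction from term co-occurrences across documents.
from itertools import permutations

docs = [
    {"animal", "dog", "leash"},
    {"animal", "cat"},
    {"animal", "dog"},
    {"cat", "toy"},
]

def doc_freq(term):
    return sum(1 for d in docs if term in d)

def co_freq(x, y):
    return sum(1 for d in docs if x in d and y in d)

terms = set().union(*docs)
for x, y in permutations(terms, 2):
    p_x_given_y = co_freq(x, y) / doc_freq(y)
    p_y_given_x = co_freq(x, y) / doc_freq(x)
    if p_x_given_y >= 0.8 and p_y_given_x < 1.0:
        print(x, "subsumes", y)   # broader terms like x become candidate facet-values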

Findings

The main finding of this paper is that the extracted terms improve users' search experiences, especially in cases where the search demands are ambiguous.

Originality/value

One original aspect of this paper is the way the textual contents of XML data are utilized to improve users' search experiences in faceted search. The other is the design of tasks for evaluating exploratory search approaches such as faceted search.

Details

International Journal of Web Information Systems, vol. 11 no. 3
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 24 October 2008

Nicolas Virtsonis and Sally Harridge‐March

The purpose of this paper is to examine the way in which brand positioning elements are manifested in the business‐to‐business (B2B) online environment.


Abstract

Purpose

The purpose of this paper is to examine the way in which brand positioning elements are manifested in the business‐to‐business (B2B) online environment.

Design/methodology/approach

The UK print industry is used to investigate the web site elements employed to communicate positioning, through a content analysis of the corporate web pages of 30 UK print suppliers.

Findings

A framework is developed to show how web site communications are manifested in the online B2B environment.

Research limitations/implications

Because the research vehicle is a sample of websites from only one industry, the findings may not be transferable to other industries or even to the whole industry. However, the model is a useful framework for helping managers to plan their online communications.

Practical implications

The paper concludes by giving recommendations about how the framework can be used by practitioners in order to improve the linkage between communications messages and the means for transferring these messages.

Originality/value

This is a novel approach to examining branding elements in the online environment. Comparatively little literature exists which examines branding in the online B2B environment.

Details

Marketing Intelligence & Planning, vol. 26 no. 7
Type: Research Article
ISSN: 0263-4503

Keywords

Article
Publication date: 8 August 2008

Giovanni Tummarello, Christian Morbidoni, Paolo Puliti and Francesco Piazza

The purpose of this paper is to investigate and prove the feasibility of a semantic web (SW) based approach to textual encoding. It aims to discuss benefits and novel…

Abstract

Purpose

The purpose of this paper is to investigate and prove the feasibility of a semantic web (SW) based approach to textual encoding. It aims to discuss benefits and novel possibilities with respect to traditional XML‐based approaches.

Design/methodology/approach

The markup process can be seen as a task of knowledge representation in which elements such as words, sentences and pages are instances of conceptual classes forming a semantic network. A web ontology language (OWL) ontology for textual encoding has been developed, capturing structural and grammatical aspects. Different approaches and tools for querying the encoded text are investigated.
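
As a hedged Python sketch of the encoding idea, assuming a made-up example namespace rather than the authors' actual ontology, a word instance can be linked both to a sentence and to a page, so that two overlapping hierarchies coexist in one RDF graph and can be queried across with SPARQL.

# Illustrative RDF text encoding with two overlapping hierarchies (rdflib).
from rdflib import Graph, Literal, Namespace, RDF, URIRef

EX = Namespace("http://example.org/text#")
g = Graph()

word = URIRef(EX.w1)
g.add((word, RDF.type, EX.Word))
g.add((word, EX.text, Literal("Whereas")))
g.add((word, EX.inSentence, EX.s1))   # grammatical hierarchy
g.add((word, EX.onPage, EX.p1))       # physical (layout) hierarchy

# Cross-hierarchy query: words of sentence s1 that appear on page p1.
query = """
PREFIX ex: <http://example.org/text#>
SELECT ?t WHERE {
  ?w a ex:Word ; ex:text ?t ; ex:inSentence ex:s1 ; ex:onPage ex:p1 .
}
"""
for row in g.query(query):
    print(row.t)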

Findings

The resource description framework (RDF) is powerful and expressive enough to fulfil tasks traditionally done in XML, as well as to enable new possibilities such as collaborative and distributed textual encoding and the use of ontology-based reasoning in text processing and querying. While the encoding of overlapping hierarchies through the use of existing approaches is often complex and leads to idiosyncratic solutions, this problem is naturally solved using SW languages.

Research limitations/implications

To make the approach suitable for widespread adoption, further work is required both in ontologies modelling and in applications (e.g. markup editing).

Practical implications

The prototype implementation imports existing encoded texts, transforms them into RDF‐based markups and uses SW query languages to answer cross‐hierarchy queries. Existing tools (reasoners, search and query engines, etc.) can be used immediately.

Originality/value

This methodology enables distributed interoperability and reuse of previous encoded results and opens the way to novel collaborative textual markup scenarios.

Details

Online Information Review, vol. 32 no. 4
Type: Research Article
ISSN: 1468-4527

Keywords

Article
Publication date: 21 June 2019

Mia Høj Mathiasson and Henrik Jochumsen

The purpose of this paper is to report on a new approach for researching public library programs through Facebook events. The term public library programs refers to publicly…


Abstract

Purpose

The purpose of this paper is to report on a new approach for researching public library programs through Facebook events. The term public library programs refers to publicly announced activities and events taking place within or in relation to a public library. In Denmark, programs are an important part of public library practice and have been growing in both number and variety in recent years.

Design/methodology/approach

The data for the study presented in this paper consists of Facebook events announcing public library programs. In the study of this data, grounded theory is used as a research strategy and methods of web archiving are used for collecting both the textual and the visual content of the Facebook events.

Findings

The combination of Facebook events as data, grounded theory as a research strategy and web archiving as a method of data collection proves useful for researching the format and content of public library programs that have already taken place.

Research limitations/implications

Only a limited number of Facebook events are examined and the context is restricted to one country.

Originality/value

This paper presents a promising approach for researching public library programs through social media content and provides new insights into methods and data as well as the phenomenon investigated. Thereby, it contributes to the conception of an under-researched area as well as a new approach for studying it.

Details

Journal of Documentation, vol. 75 no. 4
Type: Research Article
ISSN: 0022-0418

Keywords
