Search results

1–10 of over 47,000
Article
Publication date: 30 March 2012

José L. Navarro‐Galindo and José Samos

Nowadays, the use of WCMS (web content management systems) is widespread. The conversion of this infrastructure into its semantic equivalent (semantic WCMS) is a critical issue…

Abstract

Purpose

Nowadays, the use of WCMS (web content management systems) is widespread. The conversion of this infrastructure into its semantic equivalent (semantic WCMS) is a critical issue, as this enables the benefits of the semantic web to be extended. The purpose of this paper is to present FLERSA (Flexible Range Semantic Annotation), a tool for the semantic annotation of web content.

Design/methodology/approach

FLERSA is presented as a user-centred annotation tool for web content expressed in natural language. The tool has been built to illustrate how a WCMS called Joomla! can be converted into its semantic equivalent.

Findings

The development of the tool shows that it is possible to build a semantic WCMS through a combination of semantic components and other resources such as ontologies and emerging technologies, including XML, RDF, RDFa and OWL.

Practical implications

The paper provides a starting‐point for further research in which the principles and techniques of the FLERSA tool can be applied to any WCMS.

Originality/value

The tool allows both manual and automatic semantic annotations, as well as providing enhanced search capabilities. For manual annotation, a new flexible range markup technique is used, based on the RDFa standard, to support the evolution of annotated Web documents more effectively than XPointer. For automatic annotation, a hybrid approach based on machine learning techniques (Vector‐Space Model + n‐grams) is used to determine the concepts that the content of a Web document deals with (from an ontology which provides a taxonomy), based on previous annotations that are used as a training corpus.
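
As a rough illustration of the hybrid automatic-annotation idea, the sketch below combines unigram and bigram features (a simple vector-space model over n-grams) and assigns a text to the ontology concept whose centroid, built from previously annotated examples, is closest. The training corpus, concept labels and tokenizer are invented for the example; FLERSA's actual feature weighting and ontology integration are not given in this abstract.

```python
import math
import re
from collections import Counter

def ngrams(text, n):
    """Lowercase word n-grams; a crude tokenizer for illustration only."""
    w = re.findall(r"[a-z]+", text.lower())
    return [" ".join(w[i:i + n]) for i in range(len(w) - n + 1)]

def features(text):
    """Bag of unigrams plus bigrams, mimicking a VSM + n-gram mix."""
    return Counter(ngrams(text, 1) + ngrams(text, 2))

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical training corpus: previously annotated documents per concept.
training = {
    "sports": ["the team won the final match", "players scored three goals"],
    "politics": ["parliament passed the new law", "voters elected a president"],
}

# One centroid vector per ontology concept, built from the annotated examples.
centroids = {c: sum((features(d) for d in docs), Counter())
             for c, docs in training.items()}

def annotate(text):
    """Return the concept whose centroid is closest to the new text."""
    v = features(text)
    return max(centroids, key=lambda c: cosine(v, centroids[c]))

print(annotate("the match ended with two goals"))  # -> sports
```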

Article
Publication date: 20 November 2009

Maria Soledad Pera and Yiu‐Kai Ng

The web provides its users with abundant information. Unfortunately, when a web search is performed, both users and search engines must deal with an annoying problem: the presence…

Abstract

Purpose

The web provides its users with abundant information. Unfortunately, when a web search is performed, both users and search engines must deal with an annoying problem: the presence of spam documents ranked among legitimate ones. The mixed results downgrade the performance of search engines and frustrate users, who are required to filter out useless information. To improve the quality of web searches, the number of spam documents on the web must be reduced, if they cannot be eradicated entirely. This paper aims to present a novel approach for identifying spam web documents, which have mismatched titles and bodies and/or a low percentage of hidden content in the markup data structure.

Design/methodology/approach

The paper shows that by considering the degree of similarity among the words in the title and body of a web document D, computed using their word-correlation factors; using the percentage of hidden content in the markup data structure within D; and/or considering the bigram or trigram phrase-similarity values of D, it is possible to determine whether D is spam with high accuracy.
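
A loose sketch of two of the signals named above, with plain word overlap standing in for the paper's pre-computed word-correlation factors and a regex standing in for a real HTML parser; the thresholds and the direction of the decision rule are illustrative assumptions, not values from the paper.

```python
import re

def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def title_body_similarity(title, body):
    # Plain word overlap (Jaccard) as a crude stand-in for the paper's
    # pre-computed word-correlation factors.
    t, b = words(title), words(body)
    return len(t & b) / len(t | b) if (t | b) else 0.0

def hidden_content_ratio(html):
    # Fraction of text inside elements styled as hidden. A real system
    # would use a proper HTML parser; this regex is only illustrative.
    hidden = re.findall(
        r"<[^>]*(?:display\s*:\s*none|visibility\s*:\s*hidden)[^>]*>([^<]*)",
        html, re.I)
    hidden_len = sum(len(h) for h in hidden)
    total_len = len(re.sub(r"<[^>]+>", "", html))
    return hidden_len / total_len if total_len else 0.0

def looks_like_spam(title, body, html, sim_min=0.05, hidden_max=0.3):
    # Both thresholds and the decision direction are assumptions
    # for this sketch, not values from the paper.
    return (title_body_similarity(title, body) < sim_min
            or hidden_content_ratio(html) > hidden_max)

page = '<p>cheap pills</p><div style="display:none">buy now buy now</div>'
print(looks_like_spam("Holiday photos", "cheap pills", page))  # True
```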

Findings

By considering the content and markup of web documents, this paper develops a spam-detection tool that is reliable, since it accurately detects 84.5 percent of spam/legitimate web documents, and computationally inexpensive, since the word-correlation factors used for content analysis are pre-computed.

Research limitations/implications

Since the bigram-correlation values employed in the spam-detection approach are computed from the unigram-correlation factors, they impose additional computational time during the spam-detection process and could yield a higher number of misclassified spam web documents.

Originality/value

The paper verifies that the spam‐detection approach outperforms existing anti‐spam methods by at least 3 percent in terms of F‐measure.

Details

International Journal of Web Information Systems, vol. 5 no. 4
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 23 November 2010

Yongzheng Zhang, Evangelos Milios and Nur Zincir‐Heywood

Summarization of an entire web site with diverse content may lead to a summary heavily biased towards the site's dominant topics. The purpose of this paper is to present a novel…

Abstract

Purpose

Summarization of an entire web site with diverse content may lead to a summary heavily biased towards the site's dominant topics. The purpose of this paper is to present a novel topic‐based framework to address this problem.

Design/methodology/approach

A two‐stage framework is proposed. The first stage identifies the main topics covered in a web site via clustering and the second stage summarizes each topic separately. The proposed system is evaluated by a user study and compared with the single‐topic summarization approach.
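
A minimal sketch of the two-stage pipeline, assuming pages are plain-text strings and the number of topics is known in advance; the naive centroid-based extraction below stands in for the paper's key-phrase and key-sentence extraction. It uses scikit-learn for TF-IDF vectors and k-means.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

pages = [
    "admissions deadlines tuition fees scholarships",
    "campus tours open house visit parking",
    "research labs publications grants faculty",
    "phd positions research funding conference papers",
]

# Stage 1: identify the main topics by clustering page vectors.
vec = TfidfVectorizer()
X = vec.fit_transform(pages)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Stage 2: summarize each topic separately. Here the page closest to its
# cluster centroid stands in for a per-topic extractive summary.
for k in range(2):
    idx = np.where(labels == k)[0]
    centroid = np.asarray(X[idx].mean(axis=0)).ravel()
    sims = X[idx].toarray() @ centroid
    best = idx[int(np.argmax(sims))]
    print(f"topic {k}: {pages[best]}")
```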

Findings

The user study demonstrates that the clustering-summarization approach outperforms the plain summarization approach by a statistically significant margin in the multi-topic web site summarization task. Text-based clustering based on selecting features with high variance over web pages is reliable; outgoing links are useful if a rich set of cross links is available.

Research limitations/implications

More sophisticated clustering methods than those used in this study are worth investigating. The proposed method should be tested on web content that is less structured than organizational web sites, for example blogs.

Practical implications

The proposed summarization framework can be applied to the effective organization of search engine results and faceted or topical browsing of large web sites.

Originality/value

Several key components are integrated for web site summarization for the first time, including feature selection, link analysis, and key phrase and key sentence extraction. Insight was gained into the contributions of links and content to topic-based summarization. A classification approach is used to minimize the number of parameters.

Details

International Journal of Web Information Systems, vol. 6 no. 4
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 7 June 2013

Marco A. Palomino, Alexandra Vincenti and Richard Owen

Web‐based information retrieval offers the potential to exploit a vast, continuously updated and widely available repository of emerging information to support horizon scanning…

Abstract

Purpose

Web‐based information retrieval offers the potential to exploit a vast, continuously updated and widely available repository of emerging information to support horizon scanning and scenario development. However, the ability to continuously retrieve the most relevant documents from a large, dynamic source of information of varying quality, relevance and credibility is a significant challenge. The purpose of this paper is to describe the initial development of an automated web‐based information retrieval system and its application within horizon scanning for risk analysis support.

Design/methodology/approach

Using an area of recent interest for the insurance industry, namely, space weather — the changing environmental conditions in near‐Earth space — and its potential risks to terrestrial and near‐Earth insurable assets, the authors benchmarked the system against current information retrieval practice within the emerging risks group of a leading global insurance company.

Findings

The results highlight the potential of web-based horizon scanning to support risk analysis, but also the challenges of undertaking this effectively. The authors addressed these challenges by introducing a process that offers a degree of automation, using an API-based approach, and improvements in retrieval precision, using keyword combinations within automated queries. This appeared to significantly increase the number of highly relevant documents retrieved and presented to risk analysts when benchmarked against current practice in an insurance context.
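
A hedged sketch of the automated, API-based retrieval step: pair core terms with context terms to form precise keyword combinations and submit them as queries. The endpoint URL, parameters, term lists and response shape are hypothetical placeholders; the paper's actual API is not reproduced here.

```python
import requests

CORE_TERMS = ["space weather", "geomagnetic storm", "solar flare"]
CONTEXT_TERMS = ["insurance", "satellite", "power grid"]

def build_queries():
    """Pair each core term with each context term to sharpen precision."""
    return [f'"{core}" {ctx}' for core in CORE_TERMS for ctx in CONTEXT_TERMS]

def retrieve(query, endpoint="https://api.example.com/search"):
    """Submit one automated query; a real system would run this on a schedule.
    The endpoint and response format are invented for the sketch."""
    resp = requests.get(endpoint, params={"q": query, "count": 20}, timeout=10)
    resp.raise_for_status()
    return resp.json().get("results", [])

for q in build_queries():
    print(q)  # nine combined queries, each narrower than its core term alone
```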

Originality/value

Despite the emergence and increasing use of web-based horizon scanning in recent years as a systematic approach for decision support, the current literature lacks research studies in which the approach is benchmarked against current practices in private and public sector organisations. This paper therefore makes an original contribution to the field, discussing the way in which web-based horizon scanning may offer significant added value for risk analysts for what may be only a modest additional investment of time.

Details

Foresight, vol. 15 no. 3
Type: Research Article
ISSN: 1463-6689

Keywords

Article
Publication date: 2 February 2015

Michael Calaresu and Ali Shiri

The purpose of this article is to explore and conceptualize the Semantic Web as a term that has been widely mentioned in the literature of library and information science. More…

Abstract

Purpose

The purpose of this article is to explore and conceptualize the Semantic Web as a term that has been widely mentioned in the literature of library and information science. More specifically, its aim is to shed light on the evolution of the Web and to highlight a previously proposed means of attempting to improve automated manipulation of Web-based data in the context of a rapidly expanding base of both users and digital content.

Design/methodology/approach

The conceptual analysis presented in this paper adopts a three-dimensional model for the discussion of the Semantic Web. The first dimension focuses on the Semantic Web's basic nature, purpose and history, as well as the current state and limitations of modern search systems and related software agents. The second dimension focuses on critical knowledge structures such as taxonomies, thesauri and ontologies, which are understood as fundamental elements in the creation of a Semantic Web architecture. In the third dimension, an alternative conceptual model is proposed, one which, unlike more commonly prevalent Semantic Web models, places greater emphasis on describing the proposed structure from an interpretive viewpoint rather than a technical one. The paper adopts an interpretive, historical and conceptual approach to the notion of the Semantic Web by reviewing the literature and by analyzing the developments associated with the Web over the past three decades. It proposes a simplified conceptual model for easy understanding.

Findings

The paper provides a conceptual model of the Semantic Web that encompasses four key strata, namely, the body of human users, the body of software applications facilitating creation and consumption of documents, the body of documents themselves and a proposed layer that would improve automated manipulation of Web-based data by the software applications.

Research limitations/implications

This paper will facilitate a better conceptual understanding of the Semantic Web, and thereby contribute, in a small way, to the larger body of discourse surrounding it. The conceptual model will provide a reference point for education and research purposes.

Originality/value

This paper provides an original analysis of both conceptual and technical aspects of the Semantic Web. The proposed conceptual model provides a new perspective on the subject.

Details

Library Review, vol. 64 no. 1/2
Type: Research Article
ISSN: 0024-2535

Keywords

Article
Publication date: 21 June 2011

Kemal Efe, Alp Asutay and Arun Lakhotia

Access to related information is a key requirement for exploratory search. The purpose of this research is to understand where related information may be found and how it may be…

Abstract

Purpose

Access to related information is a key requirement for exploratory search. The purpose of this research is to understand where related information may be found and how it may be explored by users.

Design/methodology/approach

Earlier research provides sufficient evidence that the web graph neighborhoods of returned search results may contain documents related to users' intended search topic. However, no interface mechanisms have been presented in the literature to enable users to explore these neighborhoods. Based on a modified web graph, this paper proposes tools and methods for displaying and exploring the graph neighborhood of any selected item in the search results list. Important issues that arise when implementing such an exploration model are discussed, and the utility of the proposed system is evaluated with user experiments.
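
The neighborhood-exploration idea can be sketched as a bounded breadth-first expansion around a selected result. The toy adjacency list and depth limit below are illustrative assumptions; the paper's modified web graph and interface are considerably richer.

```python
from collections import deque

# Toy adjacency list: page -> pages it links to.
web_graph = {
    "result.html": ["tutorial.html", "faq.html"],
    "tutorial.html": ["examples.html", "result.html"],
    "faq.html": ["result.html"],
    "examples.html": ["tutorial.html"],
}

def neighborhood(start, max_depth=2):
    """Breadth-first expansion around a selected search result."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if seen[page] >= max_depth:
            continue
        for nxt in web_graph.get(page, []):
            if nxt not in seen:
                seen[nxt] = seen[page] + 1
                queue.append(nxt)
    return seen  # page -> hop distance from the selected result

print(neighborhood("result.html"))
```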

Findings

The user experiments showed, first, that information related to the intended search topic was often found in the web neighborhood of search results; and second, that exploring these graph neighborhoods with the proposed tools improved users' ability to reach the information they sought.

Research limitations/implications

The test participants are computer science graduate students. Their skills may not be representative of the broad user population.

Originality/value

The lessons learned from this research point to a potentially fruitful direction for designing new search engine interfaces that support exploratory search.

Details

International Journal of Web Information Systems, vol. 7 no. 2
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 1 October 2005

Tim Finin, Li Ding, Lina Zhou and Anupam Joshi

Aims to investigate the way that the semantic web is being used to represent and process social network information.

Abstract

Purpose

Aims to investigate the way that the semantic web is being used to represent and process social network information.

Design/methodology/approach

The Swoogle semantic web search engine was used to construct several large datasets of Resource Description Framework (RDF) documents containing social network information encoded using the "Friend of a Friend" (FOAF) ontology. The datasets were analyzed to discover how FOAF is being used and to investigate the kinds of social networks found on the web.
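
The kind of analysis described can be sketched with the rdflib library: parse a harvested RDF document and tally which FOAF classes and properties appear. The input path is a placeholder, and Swoogle's crawling pipeline is out of scope for the sketch.

```python
from collections import Counter
from rdflib import Graph
from rdflib.namespace import RDF, FOAF

FOAF_NS = "http://xmlns.com/foaf/0.1/"

g = Graph()
g.parse("snapshot.rdf")  # placeholder path to one harvested RDF document

# How many foaf:Person instances, and which FOAF properties are used?
people = set(g.subjects(RDF.type, FOAF.Person))
prop_counts = Counter(p for _, p, _ in g if str(p).startswith(FOAF_NS))

print(f"{len(people)} foaf:Person instances")
for prop, n in prop_counts.most_common(5):
    print(prop, n)
```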

Findings

The FOAF ontology is the most widely used domain ontology on the semantic web. People are using it in an open and extensible manner by defining new classes and properties to use with FOAF.

Research limitations/implications

RDF data was only obtained from public RDF documents published on the web. Some RDF FOAF data may be unavailable because it is behind firewalls, on intranets or stored in private databases. The ways in which the semantic web languages RDF and OWL are being used (and abused) are dynamic and still evolving. A similar study done two years from now may show very different results.

Originality/value

This paper describes how social networks are being encoded and used on the world wide web in the form of RDF documents and the FOAF ontology. It provides data on large social networks as well as insights on how the semantic web is being used in 2005.

Details

The Learning Organization, vol. 12 no. 5
Type: Research Article
ISSN: 0969-6474

Keywords

Article
Publication date: 26 June 2007

Q.T. Tho, A.C.M. Fong and S.C. Hui

The semantic web gives meaning to information so that humans and computers can work together better. Ontology is used to represent knowledge on the semantic web. Web services have…

Abstract

Purpose

The semantic web gives meaning to information so that humans and computers can work together better. Ontology is used to represent knowledge on the semantic web. Web services have been introduced to make the knowledge conveyed by the ontology on the semantic web accessible across different applications. This paper seeks to present the use of these latest advances in the context of a scholarly semantic web (or SSWeb) system, which can support advanced search functions such as expert finding and trend detection, in addition to basic functions such as document and author search and document and author clustering.

Design/methodology/approach

A distributed architecture of the proposed SSWeb is described, as well as semantic web services that support scholarly information retrieval on the SSWeb.

Findings

Initial experimental results indicate that the proposed method is effective.

Research limitations/implications

The work reported is experimental in nature. More work is needed, but early results are encouraging and the authors wish to make their work known to the research community by publishing this paper so that further progress can be made in this area of research.

Originality/value

The work is presented in the context of scholarly document retrieval, but it could also be adapted to other types of documents, such as medical records, machine‐fault records and legal documents. This is because the basic principles are the same.

Details

Online Information Review, vol. 31 no. 3
Type: Research Article
ISSN: 1468-4527

Keywords

Article
Publication date: 30 January 2009

Bilel Elayeb, Fabrice Evrard, Montaceur Zaghdoud and Mohamed Ben Ahmed

The purpose of this paper is to make a scientific contribution to web information retrieval (IR).

Abstract

Purpose

The purpose of this paper is to make a scientific contribution to web information retrieval (IR).

Design/methodology/approach

A multiagent system for web IR is proposed, built on two new technologies: Hierarchical Small-Worlds (HSW) and Possibilistic Networks (PN). The system follows a possibilistic qualitative approach which extends the quantitative one.

Findings

The paper finds that the relevance ordering of documents changes when passing from one profile to another. Even though the selected terms tend to identify the relevant document, these terms are not the most frequent ones in the document; this criterion demonstrates the benefit of the SARIPOD system's qualitative approach in selecting relevant documents. Inserting preference factors between query terms into the possibility and necessity calculations increases the possibilistic relevance scores of documents containing those terms while penalizing the scores of documents that do not contain them. The penalization and the increase in the scores are proportional to the terms' capacity to discriminate between the documents of the collection.
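
As a very loose sketch of possibilistic scoring in the spirit of this description (not the SARIPOD formulas, which the abstract does not give), one can take possibility as the strongest weighted term match and necessity as the weakest, with preference factors boosting documents that contain preferred query terms; every choice below is an illustrative assumption.

```python
def possibility(doc_terms, query, prefs):
    """Optimistic score: the strongest weighted term match (assumed form)."""
    return max(prefs.get(t, 1.0) if t in doc_terms else 0.0 for t in query)

def necessity(doc_terms, query, prefs):
    """Pessimistic score: the weakest weighted term match (assumed form)."""
    return min(prefs.get(t, 1.0) if t in doc_terms else 0.0 for t in query)

doc = {"possibilistic", "retrieval", "network"}
query = ["possibilistic", "retrieval"]
prefs = {"possibilistic": 1.0, "retrieval": 0.6}  # preference factors (invented)

# Documents containing preferred terms score high on both measures;
# a document missing any query term is penalized through the necessity term.
score = 0.5 * possibility(doc, query, prefs) + 0.5 * necessity(doc, query, prefs)
print(score)  # 0.8 for this toy document
```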

Research limitations/implications

It is planned to extend the tests of the SARIPOD system to other grammatical categories, for example by refining the approach for substantives through consideration of verbal occurrences in noun definitions. It is also planned to carry out finer measurements of the SARIPOD system's performance by extending the tests to other types of web documents.

Practical implications

The system can be useful in helping research students find relevant scientific papers, and it can be deployed on the document server of any research laboratory.

Originality/value

The paper presents SARIPOD, a new qualitative possibilistic model for web IR using a multiagent system.

Details

Interactive Technology and Smart Education, vol. 6 no. 1
Type: Research Article
ISSN: 1741-5659

Keywords

Article
Publication date: 1 May 2006

Koraljka Golub

To provide an integrated perspective to similarities and differences between approaches to automated classification in different research communities (machine learning…

Abstract

Purpose

To provide an integrated perspective on the similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and to point out problems with these approaches and with automated classification as such.

Design/methodology/approach

A range of works dealing with automated classification of full-text web documents is discussed. Individual approaches are explored under the following headings: special features (description, differences, evaluation), application, and characteristics of web pages.

Findings

The paper identifies the major similarities and differences between the three approaches: document pre-processing and the utilization of web-specific document characteristics are common to all of them, while the major differences lie in the algorithms applied and in whether the vector space model and controlled vocabularies are employed. Problems of automated classification are also recognized.
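
The contrast between the communities' strategies can be made concrete with a toy example: a vector-space classifier that assigns the label of the most similar labelled example, next to a library-science-style mapping of document terms through a controlled vocabulary. Documents, vocabulary and labels are invented for the illustration.

```python
import math
import re
from collections import Counter

def vec(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# ML/IR style: assign the label of the most similar labelled example.
examples = {"botany": vec("plant leaf flower root"),
            "zoology": vec("animal mammal bird fish")}

def vsm_classify(text):
    v = vec(text)
    return max(examples, key=lambda lbl: cosine(v, examples[lbl]))

# Library-science style: map document terms through a controlled vocabulary.
controlled = {"flora": "botany", "leaf": "botany", "fauna": "zoology"}

def vocab_classify(text):
    hits = Counter(controlled[w] for w in vec(text) if w in controlled)
    return hits.most_common(1)[0][0] if hits else None

doc = "a study of leaf structure in flowering plants"
print(vsm_classify(doc), vocab_classify(doc))  # botany botany
```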

Research limitations/implications

The paper does not attempt to provide an exhaustive bibliography of related resources.

Practical implications

As an integrated overview of approaches from different research communities, with application examples, the paper is very useful for students in library and information science and computer science, as well as for practitioners. It also gives researchers from one community information on how similar tasks are conducted in other communities.

Originality/value

To the author's knowledge, no previous review paper on automated text classification has attempted to discuss more than one community's approach from an integrated perspective.

Details

Journal of Documentation, vol. 62 no. 3
Type: Research Article
ISSN: 0022-0418

Keywords
