Search results

1 – 10 of 347
Article
Publication date: 7 August 2009

F. Canan Pembe and Tunga Güngör

Abstract

Purpose

The purpose of this paper is to develop a new summarisation approach, namely structure‐preserving and query‐biased summarisation, to improve the effectiveness of web searching. During web searching, one aid for users is the document summaries provided in the search results. However, the summaries provided by current search engines have limitations in directing users to relevant documents.

Design/methodology/approach

The proposed system consists of two stages: document structure analysis and summarisation. In the first stage, a rule‐based approach is used to identify the sectional hierarchies of web documents. In the second stage, query‐biased summaries are created, making use of document structure both in the summarisation process and in the output summaries.
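
A minimal sketch of how query-biased, structure-preserving selection might look, assuming a document already split into a heading-to-sentences map and a simple term-overlap score; the function names, sample data and scoring are illustrative, not the authors' implementation.

# Hypothetical sketch: query-biased scoring of sentences grouped by section,
# keeping the section heading so structure is preserved in the summary.
# The overlap score and the data layout are assumptions, not the authors' method.
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def score_sentence(sentence, query_terms):
    # Count how many query-term occurrences the sentence contains.
    tokens = Counter(tokenize(sentence))
    return sum(tokens[t] for t in query_terms)

def summarise(sections, query, per_section=1):
    # 'sections' maps a heading to its sentences; the output keeps headings.
    query_terms = set(tokenize(query))
    summary = []
    for heading, sentences in sections.items():
        ranked = sorted(sentences, key=lambda s: score_sentence(s, query_terms), reverse=True)
        best = [s for s in ranked[:per_section] if score_sentence(s, query_terms) > 0]
        if best:
            summary.append((heading, best))
    return summary

if __name__ == "__main__":
    doc = {
        "Introduction": ["Web search engines return short snippets.",
                         "Snippets rarely show document structure."],
        "Method": ["A rule-based parser builds the sectional hierarchy.",
                   "Query-biased summaries are then extracted per section."],
    }
    print(summarise(doc, "query-biased summary structure"))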

Findings

In structural processing, about 70 per cent accuracy in identifying document sectional hierarchies is obtained. The summarisation method is evaluated in a task-based setting using English and Turkish document collections. The results show that the proposed method is a significant improvement over both unstructured query-biased summaries and Google snippets in terms of f-measure.

Practical implications

The proposed summarisation system can be incorporated into search engines. The structural processing technique also has applications in other information systems, such as browsing, outlining and indexing documents.

Originality/value

In the literature on summarisation, the effects of query‐biased techniques and document structure are considered in only a few works and are researched separately. The research reported here differs from traditional approaches by combining these two aspects in a coherent framework. The work is also the first automatic summarisation study for Turkish targeting web search.

Details

Online Information Review, vol. 33 no. 4
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 20 June 2016

Najd Al-Mouh and Hend S. Al-Khalifa

Abstract

Purpose

Millions of visually impaired people (VIP) in the world still face difficulties browsing the Web and accessing information. This paper aims to present a proxy service that takes advantage of context awareness to help contextualize web pages for visually impaired users.

Design/methodology/approach

The VIP-aware proxy combines five components that use the user's preferences to adapt the requested web page and reorganize its content to best match those preferences. This assists VIP in browsing the Web more effectively.
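
A rough sketch of the content-adaptation idea under assumed data structures (labelled page blocks and a keep/skip preference list); this is not the paper's five-component proxy, only an illustration of preference-driven filtering and reordering.

# Hypothetical sketch: drop blocks the user never wants and move preferred
# blocks to the front before the page is delivered. The block structure and
# preference format are assumptions, not the authors' proxy design.
def adapt_page(blocks, preferences):
    # blocks: list of (label, text); preferences: {"keep": [...], "skip": [...]}
    kept = [b for b in blocks if not any(k in b[0] for k in preferences.get("skip", []))]
    # Preferred blocks first; stable sort keeps the original order otherwise.
    return sorted(kept, key=lambda b: 0 if any(k in b[0] for k in preferences.get("keep", [])) else 1)

if __name__ == "__main__":
    page = [("advert", "Buy now!"), ("news", "Headline story"), ("weather", "Sunny, 24C")]
    prefs = {"keep": ["weather"], "skip": ["advert"]}
    print(adapt_page(page, prefs))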

Findings

A preliminary evaluation of the system resulted in general user satisfaction.

Practical implications

The VIP-aware proxy will provide users with a clean, accessible web page, save them time, as screen readers examine only content related to their preferences, and save them money, as unnecessary content is not downloaded.

Originality/value

The VIP-aware proxy presented in this paper is the first of its kind targeting VIP.

Details

International Journal of Web Information Systems, vol. 12 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 1 September 2005

Andrew Blyth and Paula Thomas

Abstract

Purpose

One of the problems facing systems administrators and security auditors is that a security test/audit can generate a vast quantity of information that needs to be stored, analysed and cross-referenced for later use. The current state-of-the-art in security audit tools does not allow information from different tools to be shared and integrated. This paper aims to develop an Extensible Markup Language (XML)-based architecture that is capable of encoding information from a variety of disparate heterogeneous sources and then unifying and integrating it into a single SQL database schema.

Design/methodology/approach

The paper demonstrates how, through the application of the architecture, large quantities of security-related information can be captured within a single database schema. This database can then be used to ensure that systems conform to an organisation's network security policy.
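
A minimal sketch of the unification idea, assuming a made-up report format and table layout: XML findings from different audit tools are parsed and loaded into one SQL table so they can be queried together. The element names and schema are illustrative, not the paper's architecture.

# Hypothetical sketch: load XML output from heterogeneous audit tools into a
# single SQL schema. The element names ('finding', 'issue', 'severity') and
# the table layout are assumptions for illustration only.
import sqlite3
import xml.etree.ElementTree as ET

SCHEMA = "CREATE TABLE IF NOT EXISTS finding (tool TEXT, host TEXT, issue TEXT, severity TEXT)"

def load_tool_report(conn, tool_name, xml_text):
    root = ET.fromstring(xml_text)
    rows = [(tool_name, f.get("host"), f.findtext("issue"), f.findtext("severity"))
            for f in root.iter("finding")]
    conn.executemany("INSERT INTO finding VALUES (?, ?, ?, ?)", rows)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    report = """<report>
      <finding host="10.0.0.5"><issue>Open telnet port</issue><severity>high</severity></finding>
    </report>"""
    load_tool_report(conn, "scanner-a", report)
    print(conn.execute("SELECT * FROM finding").fetchall())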

Findings

This type of data integration and data unification within a vulnerability assessment/security audit is currently not possible; this leads to confusion and omissions in the security audit process.

Originality/value

This paper develops a data integration and unification architecture that will allow data from multiple vulnerability assessment tools to be integrated into a single unified picture of the security state of a network of interconnected computer systems.

Details

Information Management & Computer Security, vol. 13 no. 4
Type: Research Article
ISSN: 0968-5227

Open Access
Article
Publication date: 14 August 2017

Xiu Susie Fang, Quan Z. Sheng, Xianzhi Wang, Anne H.H. Ngu and Yihong Zhang

Abstract

Purpose

This paper aims to propose a system for generating actionable knowledge from Big Data and use this system to construct a comprehensive knowledge base (KB), called GrandBase.

Design/methodology/approach

In particular, this study extracts new predicates from four types of data sources, namely, Web texts, Document Object Model (DOM) trees, existing KBs and query streams, to augment the ontology of the existing KB (i.e. Freebase). A graph-based approach to conducting better truth discovery for multi-valued predicates is also proposed.
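
A simplified, generic truth-discovery baseline for a multi-valued predicate, shown only to make the problem concrete; it iterates between source weights and value scores and is not the graph-based method proposed in the paper.

# Hypothetical sketch: each source asserts a set of values for one predicate
# (e.g. a book's authors). Source weights and value scores are refined
# iteratively. A generic illustration, not the paper's algorithm.
def truth_discovery(claims, iterations=10):
    # claims: {source: set_of_values}; returns values ranked by weighted support.
    sources = list(claims)
    weights = {s: 1.0 for s in sources}
    values = set().union(*claims.values())
    scores = {}
    for _ in range(iterations):
        # Value score = total weight of the sources asserting it.
        scores = {v: sum(weights[s] for s in sources if v in claims[s]) for v in values}
        top = max(scores.values()) or 1.0
        # Source weight = mean normalised score of the values it asserts.
        weights = {s: sum(scores[v] / top for v in claims[s]) / len(claims[s]) for s in sources}
    return sorted(scores.items(), key=lambda kv: -kv[1])

if __name__ == "__main__":
    claims = {"web": {"author_a", "author_b"}, "kb": {"author_a"}, "query_log": {"author_c"}}
    print(truth_discovery(claims))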

Findings

Empirical studies demonstrate the effectiveness of the approaches presented in this study and the potential of GrandBase. Future research directions regarding GrandBase construction and extension are also discussed.

Originality/value

To revolutionize our modern society by using the wisdom of Big Data, numerous KBs have been constructed to feed the massive knowledge-driven applications with Resource Description Framework triples. The important challenges for KB construction include extracting information from large-scale, possibly conflicting and differently structured data sources (i.e. the knowledge extraction problem) and reconciling the conflicts that reside in the sources (i.e. the truth discovery problem). Tremendous research efforts have been devoted to both problems. However, the existing KBs are far from being comprehensive and accurate: first, existing knowledge extraction systems retrieve data from limited types of Web sources; second, existing truth discovery approaches commonly assume each predicate has only one true value. In this paper, the focus is on the problem of generating actionable knowledge from Big Data. A system is proposed, which consists of two phases, namely, knowledge extraction and truth discovery, to construct a broader KB, called GrandBase.

Details

PSU Research Review, vol. 1 no. 2
Type: Research Article
ISSN: 2399-1747

Article
Publication date: 29 August 2008

Wilma Penzo

Abstract

Purpose

The semantic and structural heterogeneity of large Extensible Markup Language (XML) digital libraries emphasizes the need to support approximate queries, i.e. queries where the matching conditions are relaxed so as to retrieve results that may only partially satisfy the user's requests. The paper aims to propose a flexible query answering framework which efficiently supports complex approximate queries on XML data.

Design/methodology/approach

To reduce the number of relaxations applicable to a query, the paper relies on the specification of user preferences about the types of approximations allowed. A specifically devised index structure which efficiently supports both semantic and structural approximations, according to the specified user preferences, is proposed. Also, a ranking model to quantify approximations in the results is presented.
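
A small sketch of relaxation-aware ranking under assumed penalty values and preference flags: answers that required disallowed approximations are filtered out, and the rest are ordered by how much relaxation they needed. This is illustrative, not the paper's index structure or ranking model.

# Hypothetical sketch: rank query answers by the cost of the approximations
# they required, respecting the user's preferences about which relaxation
# types are allowed. Penalty values and relaxation names are assumptions.
RELAXATION_PENALTY = {"exact": 0.0, "synonym": 0.2, "ancestor_tag": 0.3, "drop_edge": 0.5}

def rank_answers(answers, allowed):
    # answers: list of (answer_id, [relaxations applied]); lower cost ranks first.
    ranked = []
    for answer_id, relaxations in answers:
        if all(r in allowed for r in relaxations):
            ranked.append((sum(RELAXATION_PENALTY[r] for r in relaxations), answer_id))
    return [a for _, a in sorted(ranked)]

if __name__ == "__main__":
    candidates = [("doc1", ["exact"]), ("doc2", ["synonym", "ancestor_tag"]), ("doc3", ["drop_edge"])]
    print(rank_answers(candidates, allowed={"exact", "synonym", "ancestor_tag"}))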

Findings

Personalized queries, on the one hand, effectively narrow the space of query reformulations and, on the other hand, enhance the user's query capabilities with a great deal of flexibility and control over requests. As to the quality of results, the retrieval process benefits considerably from the presence of user preferences in the queries. Experiments demonstrate the effectiveness and the efficiency of the proposal, as well as its scalability.

Research limitations/implications

Future work concerns evaluating the effectiveness of query personalization by further examining how varying the parameters that express user preferences affects the results.

Originality/value

The paper is intended for the research community and proposes a novel query model which incorporates user preferences about query relaxations on large heterogeneous XML data collections.

Details

International Journal of Web Information Systems, vol. 4 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 14 August 2007

Yanbo Ru and Ellis Horowitz

Abstract

Purpose

Most e‐commerce web sites use HTML forms for user authentication, new user registration, newsletter subscription, and searching for products and services. The purpose of this paper is to present a method for automated classification of HTML forms, which is important for search engine applications, e.g. Yahoo Shopping and Google's Froogle, as they can be used to improve the quality of the index and accuracy of search results.

Design/methodology/approach

Describes a technique for classifying HTML forms based on their features. Develops algorithms for automatic feature generation from HTML forms and a neural network to classify them.
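
A minimal sketch of form feature generation, assuming a hand-picked feature set (input-type counts and a password-field flag); the resulting vectors are the kind of input a classifier such as a neural network could consume. The features are illustrative, not the authors' feature set or network.

# Hypothetical sketch: turn an HTML form into a numeric feature vector.
# The chosen features are assumptions for illustration only.
from html.parser import HTMLParser

class FormFeatures(HTMLParser):
    def __init__(self):
        super().__init__()
        self.counts = {"text": 0, "password": 0, "checkbox": 0, "submit": 0, "select": 0}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input":
            kind = attrs.get("type", "text")
            if kind in self.counts:
                self.counts[kind] += 1
        elif tag == "select":
            self.counts["select"] += 1

def form_features(html):
    parser = FormFeatures()
    parser.feed(html)
    c = parser.counts
    # Raw counts plus a flag hinting at a login/registration form.
    return [c["text"], c["password"], c["checkbox"], c["select"], c["submit"], int(c["password"] > 0)]

if __name__ == "__main__":
    form = '<form><input type="text" name="q"><input type="submit" value="Search"></form>'
    print(form_features(form))  # e.g. a search form: no password field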

Findings

The authors tested their classifier on an e-commerce data set and a randomly retrieved data set and achieved accuracies of 94.7 and 93.9 per cent, respectively. Experimental results show that the classifier is effective and efficient on both test beds, suggesting that it is a promising general-purpose method.

Originality/value

The paper is of value to those involved with information management and e‐commerce.

Details

Online Information Review, vol. 31 no. 4
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 1 March 2005

Naicheng Chang

Abstract

Purpose

To help to clarify the role of XML tools and standards in supporting transition and migration towards a fully XML‐based environment for managing access to information.

Design/methodology/approach

The Ching Digital Image Library, built on a three‐tier architecture, is used as a source of examples to illustrate a number of methods of data manipulation for presentation processing. An SQL relational database is implemented in the data tier and Microsoft Internet Information Server (IIS) is used to manage processes and sessions in the middle tier. Extensible Markup Language (XML) is used in the data tier to represent offers and in the presentation tier to represent screen displays that can be manipulated using the XML Document Object Model (DOM), XML Data Islands, and XSL (eXtensible Stylesheet Language), before being delivered to the web browser as HTML.
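
A small sketch of the presentation-tier step, assuming lxml is available (pip install lxml) and using a made-up image record and stylesheet: an XML record is transformed to HTML with XSL before delivery to the browser. The record fields and stylesheet are illustrative, not the library's actual markup.

# Hypothetical sketch: XSL transformation of an XML record into HTML.
from lxml import etree

RECORD = etree.XML("<image><title>Temple gate</title><year>1890</year></image>")

STYLESHEET = etree.XML("""
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/image">
    <html><body>
      <h1><xsl:value-of select="title"/></h1>
      <p>Year: <xsl:value-of select="year"/></p>
    </body></html>
  </xsl:template>
</xsl:stylesheet>
""")

if __name__ == "__main__":
    transform = etree.XSLT(STYLESHEET)
    print(str(transform(RECORD)))  # HTML ready for the web browser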

Findings

It is demonstrated that, although XML itself is not a database, the XML family provides many, though not all, of the components found in databases. XML coupled with a database gives greater power than the sum of the parts in a web application.

Originality/value

This paper is a digital image library case study with practical generic tutorial elements about the role and function of XML in modern database‐backed web sites.

Details

Program, vol. 39 no. 1
Type: Research Article
ISSN: 0033-0337

Article
Publication date: 1 December 2001

Ron Gilmour

Abstract

The self‐describing nature of data marked up using extensible markup language (XML) allows the XML document itself to act in a manner similar to a database, but without the large file sizes and proprietary software generally associated with database applications. XML data can be made directly available to users using a variety of methods. This paper explores methods for both server‐side and client‐side processing and display of XML‐encoded data, using an annotated bibliography as an example.
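
A minimal server-side sketch, assuming a made-up bibliography markup: an XML-encoded annotated bibliography is transformed into an HTML list for the browser, with no database involved. The element names and the tiny record are illustrative; the paper's own markup will differ.

# Hypothetical sketch: server-side conversion of an XML annotated
# bibliography into HTML.
import xml.etree.ElementTree as ET

BIBLIOGRAPHY = """<bibliography>
  <entry>
    <author>Gilmour, R.</author><title>XML and databases</title>
    <annotation>Overview of serving XML without a DBMS.</annotation>
  </entry>
</bibliography>"""

def to_html(xml_text):
    root = ET.fromstring(xml_text)
    items = []
    for entry in root.findall("entry"):
        items.append("<li><b>{}</b> {} <i>{}</i></li>".format(
            entry.findtext("author"), entry.findtext("title"), entry.findtext("annotation")))
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

if __name__ == "__main__":
    print(to_html(BIBLIOGRAPHY))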

Details

Library Hi Tech, vol. 19 no. 4
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 1 April 2003

Jason Underwood and Alastair Watson

Abstract

A three-year Esprit project, ProCure, ultimately aims to take a significant but achievable step forward in the application of available information and communication technology (ICT) to the large-scale engineering (LSE) construction industry. The ProCure consortium consists of five industrial partners supported by four associated research and expert partners. The project combines leading expertise from three member states to support ICT deployment by three industrial collaborative groups, in the UK, Germany and Finland. The basis of the project is the partners' belief that sufficient ICT is now available to achieve deployment, with care, in real projects, with an acceptable risk of failure. This paper presents work undertaken within the project to investigate the existing metadata standards and to define, based on them, a minimum metadata set for the implementation of two demonstrators of XML-based automated document exchange between a simulation of a corporate document management system and a simulation of a collaborative construction project Web site.
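
A small sketch of one exchange-side check, assuming a Dublin-Core-like field set: the receiving system verifies that an exchanged document's XML metadata carries the agreed minimum fields before accepting it. The field names are illustrative assumptions, not the ProCure minimum metadata set.

# Hypothetical sketch: validate a document's XML metadata against a minimum
# required field set during automated document exchange.
import xml.etree.ElementTree as ET

REQUIRED = {"title", "creator", "date", "identifier"}

def missing_fields(xml_text):
    root = ET.fromstring(xml_text)
    present = {child.tag for child in root}
    return REQUIRED - present

if __name__ == "__main__":
    doc = """<metadata>
      <title>Steelwork layout, level 2</title>
      <creator>Site engineer</creator>
      <date>2003-04-01</date>
    </metadata>"""
    print(missing_fields(doc))  # {'identifier'} -> reject or request the missing field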

Details

Engineering, Construction and Architectural Management, vol. 10 no. 2
Type: Research Article
ISSN: 0969-9988

Article
Publication date: 1 May 2006

Srinivas Vadrevu, Fatih Gelgi, Saravanakumar Nagarajan and Hasan Davulcu

Abstract

Purpose

The purpose of this research is to automatically separate and extract meta‐data and instance information from various link pages in the web, by utilizing presentation and linkage regularities on the web.

Design/methodology/approach

Research objectives have been achieved through an information extraction system called semantic partitioner that automatically organizes the content in each web page into a hierarchical structure, and an algorithm that interprets and translates these hierarchical structures into logical statements by distinguishing and representing the meta‐data and their individual data instances.
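
A toy sketch of the output side of such a system, assuming a page already organised into a hierarchy: inner labels are treated as concept names, leaf labels as attribute names (the meta-data) and leaf strings as their instances, emitted as simple statements. This is an illustration, not the semantic partitioner algorithm itself.

# Hypothetical sketch: translate a page's hierarchical structure into
# (concept, attribute, value) statements. The sample hierarchy and output
# format are assumptions for illustration only.
def to_statements(node, concept=None, out=None):
    out = [] if out is None else out
    for label, value in node.items():
        if isinstance(value, dict):
            # An inner node names a concept; recurse into its attributes.
            to_statements(value, concept=label, out=out)
        else:
            out.append((concept or "page", label, value))
    return out

if __name__ == "__main__":
    faculty_page = {
        "Faculty member": {"Name": "J. Smith", "Office": "BY 560", "Email": "jsmith@example.edu"},
        "Course": {"Title": "Databases", "Semester": "Fall"},
    }
    for stmt in to_statements(faculty_page):
        print(stmt)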

Findings

Experimental results for the university domain, with 12 computer science department web sites comprising 361 individual faculty and course home pages, indicate that the performance of the meta-data and instance extraction averages 85 and 88 per cent F-measure, respectively. The METEOR system achieves this performance without any domain-specific engineering requirement.

Originality/value

Important contributions of the METEOR system presented in this paper are: it performs extraction without the assumption that the object instance pages are template‐driven; it is domain independent and does not require any previously engineered domain ontology; and by interpreting the link pages, it can extract both meta‐data, such as concept and attribute names and their relationships, as well as their instances with high accuracy.

Details

Online Information Review, vol. 30 no. 3
Type: Research Article
ISSN: 1468-4527
