Structure‐preserving and query‐biased document summarisation for web searching

F. Canan Pembe (Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey and Department of Computer Engineering, İstanbul Kültür University, Istanbul, Turkey)
Tunga Güngör (Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey)

Online Information Review

ISSN: 1468-4527

Publication date: 7 August 2009

Abstract

Purpose

The purpose of this paper is to develop a new summarisation approach, namely structure‐preserving and query‐biased summarisation, to improve the effectiveness of web searching. During web searching, one aid for users is the document summaries provided in the search results. However, the summaries provided by current search engines have limitations in directing users to relevant documents.

Design/methodology/approach

The proposed system consists of two stages: document structure analysis and summarisation. In the first stage, a rule‐based approach is used to identify the sectional hierarchies of web documents. In the second stage, query‐biased summaries are created, making use of document structure both in the summarisation process and in the output summaries.
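
As a rough illustration of the two stages only, and not the authors' implementation, the following Python sketch assumes that HTML heading tags (h1 to h6) mark the sectional hierarchy and that a sentence's relevance is the number of words it shares with the query. The names SectionParser, tokens and summarise are hypothetical, and both the single heading rule and the overlap score are simplifying assumptions that stand in for the paper's rule set and scoring method.

```python
import re
from html.parser import HTMLParser

# Assumption: heading tags alone define the section hierarchy.
HEADING_TAGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class SectionParser(HTMLParser):
    """Stage 1 (toy rule): treat <h1>..<h6> as section headings."""

    def __init__(self):
        super().__init__()
        self.sections = []  # one dict per section: level, heading, text
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.sections.append(
                {"level": HEADING_TAGS[tag], "heading": "", "text": ""}
            )
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if not self.sections:
            return  # ignore text that appears before the first heading
        key = "heading" if self._in_heading else "text"
        self.sections[-1][key] += data


def tokens(text):
    """Lower-cased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def summarise(html, query, per_section=1):
    """Stage 2 (toy scoring): keep the sentences that share most words with the query."""
    parser = SectionParser()
    parser.feed(html)
    q = tokens(query)
    lines = []
    for sec in parser.sections:
        sentences = [
            s.strip()
            for s in re.split(r"(?<=[.!?])\s+", sec["text"])
            if s.strip()
        ]
        scored = [(len(q & tokens(s)), s) for s in sentences]
        ranked = sorted(scored, key=lambda pair: -pair[0])[:per_section]
        best = [s for score, s in ranked if score > 0]
        if best:
            # Indent by heading level so the summary keeps the document structure.
            indent = "  " * (sec["level"] - 1)
            lines.append(f"{indent}{sec['heading'].strip()}")
            lines.extend(f"{indent}  {s}" for s in best)
    return "\n".join(lines)
```

Calling summarise(page_html, "query terms") returns an indented outline in which each retained heading is followed by its best-matching sentences; sections with no query overlap are omitted.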

Findings

In structural processing, about 70 per cent accuracy in identifying document sectional hierarchies is obtained. The summarisation method is evaluated with a task‐based method using English and Turkish document collections. The results show that the proposed method significantly improves on both unstructured query‐biased summaries and Google snippets in terms of F‐measure.

Practical implications

The proposed summarisation system can be incorporated into search engines. The structural processing technique also has applications in other information systems, such as browsing, outlining and indexing documents.

Originality/value

In the literature on summarisation, the effects of query‐biased techniques and document structure are considered in only a few works and are researched separately. The research reported here differs from traditional approaches by combining these two aspects in a coherent framework. The work is also the first automatic summarisation study for Turkish targeting web search.

Citation

Canan Pembe, F. and Güngör, T. (2009), "Structure‐preserving and query‐biased document summarisation for web searching", Online Information Review, Vol. 33 No. 4, pp. 696-719. https://doi.org/10.1108/14684520910985684

Publisher: Emerald Group Publishing Limited

Copyright © 2009, Emerald Group Publishing Limited
