Search results

1 – 10 of 970
Article
Publication date: 23 November 2010

Yongzheng Zhang, Evangelos Milios and Nur Zincir‐Heywood

Abstract

Purpose

Summarization of an entire web site with diverse content may lead to a summary heavily biased towards the site's dominant topics. The purpose of this paper is to present a novel topic‐based framework to address this problem.

Design/methodology/approach

A two‐stage framework is proposed. The first stage identifies the main topics covered in a web site via clustering and the second stage summarizes each topic separately. The proposed system is evaluated by a user study and compared with the single‐topic summarization approach.
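The two-stage idea (cluster pages into topics, then summarize each topic on its own) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' system: the bag-of-words vectors, the greedy threshold clustering, the centroid-based sentence pick, and the names `cluster_pages` and `summarize_cluster` are all hypothetical stand-ins for the feature selection, clustering and key sentence extraction described in the abstract.

```python
import math
import re
from collections import Counter


def vec(text):
    """Bag-of-words term counts over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_pages(pages, threshold=0.2):
    """Stage 1 (toy version): put each page in the first cluster whose
    centroid is similar enough, otherwise start a new cluster."""
    clusters = []  # each cluster: {"centroid": Counter, "pages": [str]}
    for page in pages:
        v = vec(page)
        for c in clusters:
            if cosine(v, c["centroid"]) >= threshold:
                c["pages"].append(page)
                c["centroid"].update(v)  # add this page's counts to the centroid
                break
        else:
            clusters.append({"centroid": Counter(v), "pages": [page]})
    return clusters


def summarize_cluster(cluster):
    """Stage 2 (toy version): return the sentence closest to the centroid."""
    sentences = [s.strip() for p in cluster["pages"]
                 for s in p.split(".") if s.strip()]
    return max(sentences, key=lambda s: cosine(vec(s), cluster["centroid"]))


pages = [
    "Admissions deadlines for students. Students apply online",
    "Students apply for admissions online",
    "Research labs publish papers. Labs study robotics",
]
clusters = cluster_pages(pages)
summaries = [summarize_cluster(c) for c in clusters]
```

Summarizing each cluster separately is what keeps a small topic (here, the single research page) from being crowded out by the dominant one.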

Findings

The user study demonstrates that the clustering‐summarization approach statistically significantly outperforms the plain summarization approach in the multi‐topic web site summarization task. Text‐based clustering based on selecting features with high variance over web pages is reliable; outgoing links are useful if a rich set of cross links is available.

Research limitations/implications

More sophisticated clustering methods than those used in this study are worth investigating. The proposed method should be tested on web content that is less structured than organizational web sites, for example blogs.

Practical implications

The proposed summarization framework can be applied to the effective organization of search engine results and faceted or topical browsing of large web sites.

Originality/value

Several key components are integrated for web site summarization for the first time, including feature selection, link analysis, and key phrase and key sentence extraction. Insight into the contributions of links and content to topic‐based summarization was gained. A classification approach is used to minimize the number of parameters.

Details

International Journal of Web Information Systems, vol. 6 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 4 September 2019

Yi-Hung Liu, Xiaolong Song and Sheng-Fong Chen

Abstract

Purpose

Whether automatically generated summaries of health social media can aid users in managing their diseases appropriately is an important question. The purpose of this paper is to introduce a novel text summarization approach for acquiring the most informative summaries from online patient posts accurately and effectively.

Design/methodology/approach

Posts on diabetes and HIV were collected from two online disease forums, one for each condition. The proposed summarizer uses a graph-based method that generates summaries by considering social network features, text sentiment and sentence features. Representative health-related summaries were identified, and summarization performance as well as user judgments were analyzed.

Findings

The findings show that awarding sentences without using all the incorporated features decreases summarization performance compared with the classic summarization method and comparison approaches. The proposed summarizer significantly outperformed the comparison baseline.

Originality/value

This study contributes to the literature on health knowledge management by analyzing patients’ experiences and opinions through the health summarization model. The research additionally develops a new mindset to design abstractive summarization weighting schemes from the health user-generated content.

Details

Aslib Journal of Information Management, vol. 71 no. 6
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 15 August 2023

Yi-Hung Liu and Sheng-Fong Chen

Abstract

Purpose

Whether automatically generated summaries of health social media can assist users in appropriately managing their diseases and ensuring better communication with health professionals has become an important issue. This paper aims to develop a novel deep learning-based summarization approach for obtaining the most informative summaries from online patient reviews accurately and effectively.

Design/methodology/approach

This paper proposes a framework to generate summaries that integrates a domain-specific pre-trained embedding model and a deep neural extractive summary approach by considering content features, text sentiment, review influence and readability features. Representative health-related summaries were identified, and user judgements were analysed.

Findings

Experimental results on the three real-world health forum data sets indicate that awarding sentences without incorporating all the adopted features leads to declining summarization performance. The proposed summarizer significantly outperformed the comparison baseline. User judgement through the questionnaire provides realistic and concrete evidence of crucial features that remarkably influence patient forum review summaries.

Originality/value

This study contributes to health analytics and management literature by exploring users’ expressions and opinions through the health deep learning summarization model. The research also developed an innovative mindset to design summarization weighting methods from user-created content on health topics.

Details

The Electronic Library, vol. 41 no. 5
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 10 June 2014

Pedro Hípola, José A. Senso, Amed Leiva-Mederos and Sandor Domínguez-Velasco

Abstract

Purpose

The purpose of this paper is to look into the latest advances in ontology-based text summarization systems, with emphasis on the methodologies of a socio-cognitive approach, the structural discourse models and the ontology-based text summarization systems.

Design/methodology/approach

The paper analyzes the main literature in this field and presents the structure and features of Texminer, a software tool that facilitates summarization of texts on Port and Coastal Engineering. Texminer combines several techniques, including socio-cognitive user models, Natural Language Processing, disambiguation and ontologies. After processing a corpus, the system was evaluated using as a reference various clustering evaluation experiments conducted by Arco (2008) and Hennig et al. (2008). The results were checked with a support vector machine, ROUGE metrics, the F-measure and calculation of precision and recall.

Findings

The experiment illustrates the superiority of abstracts obtained through the assistance of ontology-based techniques.

Originality/value

The authors were able to corroborate that the summaries obtained using Texminer are more efficient than those derived through other systems whose summarization models do not use ontologies to summarize texts. Thanks to ontologies, main sentences can be selected with a broad rhetorical structure, especially for a specific knowledge domain.

Details

Library Hi Tech, vol. 32 no. 2
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 21 November 2018

Ahmed Amir Tazibt and Farida Aoughlis

Abstract

Purpose

During crises such as accidents or disasters, an enormous volume of information is generated on the Web. Both people and decision-makers often need to identify relevant and timely content that can help them understand what is happening and make the right decisions, as soon as it appears online. However, relevant content can be disseminated in document streams. The available information can also contain redundant content published by different sources. Therefore, the need for automatic construction of summaries that aggregate important, non-redundant and non-outdated pieces of information is becoming critical.

Design/methodology/approach

The aim of this paper is to present a new temporal summarization approach based on a popular topic model in the information retrieval field, the Latent Dirichlet Allocation. The approach consists of filtering documents over streams, extracting relevant parts of information and then using topic modeling to reveal their underlying aspects to extract the most relevant and novel pieces of information to be added to the summary.

Findings

The performance evaluation of the proposed temporal summarization approach based on Latent Dirichlet Allocation, performed on the TREC Temporal Summarization 2014 framework, clearly demonstrates its effectiveness in providing short and precise summaries of events.

Originality/value

Unlike most state-of-the-art approaches, the proposed method determines the importance of the pieces of information to be added to the summaries by relying solely on their representation in the topic space provided by Latent Dirichlet Allocation, without the use of any external source of evidence.

Details

International Journal of Web Information Systems, vol. 15 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 14 June 2021

Farnoush Bayatmakou, Azadeh Mohebi and Abbas Ahmadi

Abstract

Purpose

Query-based summarization approaches might not be able to provide summaries compatible with the user’s information need, as they mostly rely on a limited source of information, usually represented as a single query by the user. This issue becomes even more challenging when dealing with scientific documents, as they contain more specific subject-related terms, while the user may not be able to express his/her specific information need in a query with limited terms. This study aims to propose an interactive multi-document text summarization approach that generates an eligible summary that is more compatible with the user’s information need. This approach allows the user to interactively specify the composition of a multi-document summary.

Design/methodology/approach

This approach exploits the user’s opinion in two stages. The initial query is refined by user-selected keywords/keyphrases and complete sentences extracted from the set of retrieved documents. It is followed by a novel method for sentence expansion using a genetic algorithm, and by ranking the final set of sentences using the maximal marginal relevance method. For the implementation, the Web of Science data set in the artificial intelligence (AI) category is used.
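The final ranking step, maximal marginal relevance (MMR), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it covers only the MMR ranking (not the genetic-algorithm expansion), uses plain bag-of-words cosine similarity, and the function name `mmr_rank` and the parameters `k` and `lam` are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words term counts for one sentence."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_rank(query, sentences, k=2, lam=0.7):
    """Greedily pick k sentences, trading query relevance (weight lam)
    against similarity to sentences already selected (weight 1 - lam)."""
    q = bow(query)
    vecs = [bow(s) for s in sentences]
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vecs[i], q)
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]


sentences = [
    "neural networks learn representations",
    "neural networks learn representations",
    "genetic algorithms evolve solutions for neural search",
]
diverse = mmr_rank("neural networks", sentences, k=2, lam=0.5)
relevance_only = mmr_rank("neural networks", sentences, k=2, lam=1.0)
```

With `lam=0.5` the exact duplicate is penalized and the second pick is the less relevant but novel sentence; with `lam=1.0` redundancy is ignored and the duplicate is selected.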

Findings

The proposed approach receives feedback from the user in terms of favorable keywords and sentences. This feedback ultimately improves the summary. To assess the performance of the proposed system, 45 graduate students in the field of AI were asked to fill out a questionnaire. The quality of the final summary has also been evaluated from the user’s perspective and in terms of information redundancy. The results show that the proposed approach leads to higher degrees of user satisfaction compared to approaches with no or only one step of interaction.

Originality/value

The interactive summarization approach goes beyond the user’s initial query by including the user’s preferred keywords/keyphrases and sentences through a systematic interaction. Through these interactions, the system gives the user a clearer idea of the information he/she is looking for and consequently adjusts the final result to the ultimate information need. Such interaction allows the summarization system to achieve a comprehensive understanding of the user’s information needs while expanding context-based knowledge and guiding the user through his/her information journey.

Details

Information Discovery and Delivery, vol. 50 no. 2
Type: Research Article
ISSN: 2398-6247

Article
Publication date: 1 October 2000

Marie‐Francine Moens and Jos Dumortier

Abstract

Browsing a database of article abstracts is one way to select and buy relevant magazine articles online. Our research contributes to the design and development of text grammars for abstracting texts in unlimited subject domains. We developed a system that parses texts based on the text grammar of a specific text type and that extracts sentences and statements which are relevant for inclusion in the abstracts. The system employs knowledge of the discourse patterns that are typical of news stories. The results are encouraging and demonstrate the importance of discourse structures in text summarisation.

Details

Journal of Documentation, vol. 56 no. 5
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 25 October 2022

Victor Diogho Heuer de Carvalho and Ana Paula Cabral Seixas Costa

Abstract

Purpose

This article presents two Brazilian Portuguese corpora collected from different media concerning public security issues in a specific location. The primary motivation is supporting analyses, so security authorities can make appropriate decisions about their actions.

Design/methodology/approach

The corpora were obtained through web scraping from a newspaper's website and tweets from a Brazilian metropolitan region. Natural language processing was applied, covering text cleaning, lemmatization, summarization, part-of-speech and dependency parsing, named entity recognition, and topic modeling.

Findings

Several results were obtained with this methodology, including: an example of summarization using an automated process; dependency parsing; the most common topics in each corpus; and the extraction of the forty named entities and the most common slogans, highlighting those linked to public security.

Research limitations/implications

Some critical tasks were identified for the research perspective, related to the applied methodology: the treatment of noise from obtaining news on their source websites, passing through textual elements quite present in social network posts such as abbreviations, emojis/emoticons, and even writing errors; the treatment of subjectivity, to eliminate noise from irony and sarcasm; the search for authentic news of issues within the target domain. All these tasks aim to improve the process to enable interested authorities to perform accurate analyses.

Practical implications

The corpora dedicated to the public security domain enable several analyses, such as mining public opinion on security actions in a given location; understanding criminals' behaviors reported in the news or even on social networks and drawing a timeline of their attitudes; detecting movements that may cause damage to public property and people's welfare through texts from social networks; extracting the history and repercussions of police actions, crossing news with records on social networks; among many other possibilities.

Originality/value

The work on behalf of the corpora reported in this text represents one of the first initiatives to create textual bases in Portuguese, dedicated to Brazil's specific public security domain.

Details

Library Hi Tech, vol. 42 no. 4
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 1 May 1978

W.J. Hutchins

Abstract

The common view of the ‘aboutness’ of documents is that the index entries (or classifications) assigned to documents represent or indicate in some way the total contents of documents; indexing and classifying are seen as processes involving the ‘summarization’ of the texts of documents. In this paper an alternative concept of ‘aboutness’ is proposed based on an analysis of the linguistic organization of texts, which is felt to be more appropriate in many indexing environments (particularly in non‐specialized libraries and information services) and which has implications for the evaluation of the effectiveness of indexing systems.

Details

Aslib Proceedings, vol. 30 no. 5
Type: Research Article
ISSN: 0001-253X

Article
Publication date: 15 August 2018

Heng-Li Yang and August F.Y. Chao

Abstract

Purpose

The purpose of this paper is to propose sentiment annotation at sentence level to reduce information overload while reading product/service reviews on the internet.

Design/methodology/approach

Keyword-based sentiment analysis is applied to highlight review sentences. An experiment is conducted to demonstrate its effectiveness.
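Sentence-level annotation with a keyword lexicon can be sketched as below. This is only an illustration under assumptions: the study works on Chinese tourism reviews, whereas this toy uses English text, and the lexicons `POSITIVE` and `NEGATIVE` and the function `annotate` are hypothetical, not taken from the paper.

```python
import re

# Hypothetical toy lexicons; a real system would use a curated sentiment dictionary.
POSITIVE = {"great", "clean", "friendly", "beautiful"}
NEGATIVE = {"dirty", "rude", "crowded", "expensive"}


def annotate(review):
    """Split a review into sentences and tag each with a polarity label."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]
    tagged = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        # Polarity score: distinct positive keywords minus distinct negative ones.
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        tagged.append((label, sentence))
    return tagged


tagged = annotate(
    "The staff were friendly. The room was dirty and expensive. We arrived at noon."
)
```

A reader-facing prototype could then highlight the sentences labeled positive or negative and leave neutral ones unmarked.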

Findings

A prototype is built for highlighting tourism review sentences in Chinese with positive or negative sentiment polarity. The experimental results indicate that sentiment annotation can increase information quality and users’ intention to read tourism reviews.

Research limitations/implications

This study has made two major contributions: proposing the approach of adding sentiment annotation at sentence level of review texts for assisting decision-making; and validating the relationships among the information quality constructs. However, in this study, sentiment analysis was conducted on a limited corpus; future research may try a larger corpus. In addition, the annotation system was built on tourism data. Future studies might try to apply it to other areas.

Practical implications

If the proposed annotation systems become popular, both tourists and attraction providers would benefit. In this era of smart tourism, tourists could browse through the huge amount of internet information more quickly. Attraction providers could more easily understand the strengths and weaknesses of their facilities. The application of this sentiment analysis is possible for other languages, especially for non-spaced languages.

Originality/value

Facing large amounts of data, past researchers were engaged in automatically constructing a compact yet meaningful abstraction of the texts. However, users have different positions and purposes. This study proposes an alternative approach to add sentiment annotation at sentence level for assisting users.

Details

Online Information Review, vol. 42 no. 5
Type: Research Article
ISSN: 1468-4527
