Search results

1 – 10 of 44
Article
Publication date: 18 April 2016

Rani Qumsiyeh and Yiu-Kai Ng

Abstract

Purpose

The purpose of this paper is to introduce a summarization method to enhance the current web-search approaches by offering a summary of each clustered set of web-search results with contents addressing the same topic, which should allow the user to quickly identify the information covered in the clustered search results. Web search engines, such as Google, Bing and Yahoo!, rank the set of documents S retrieved in response to a user query and represent each document D in S using a title and a snippet, which serves as an abstract of D. Snippets, however, are not as useful as they are designed to be, i.e. in helping users to quickly identify results of interest. They are inadequate in providing distinct information and capturing the main contents of the corresponding documents. Moreover, when the intended information need specified in a search query is ambiguous, it is very difficult, if not impossible, for a search engine to identify precisely the set of documents that satisfy the user’s intended request without requiring additional information. Furthermore, a document title is not always a good indicator of the content of the corresponding document either.

Design/methodology/approach

The authors propose a query-based summarizer, called QSum, to solve the existing problems of Web search engines, which use titles and abstracts to capture the contents of retrieved documents. QSum generates a concise and comprehensive summary for each cluster of documents retrieved in response to a user query, which saves the user’s time and effort in searching for specific information of interest by removing the need to browse through the retrieved documents one by one.

Findings

Experimental results show that QSum is effective and efficient in creating a high-quality summary for each cluster to enhance Web search.

Originality/value

The proposed query-based summarizer, QSum, is unique in its searching approach. QSum is also a significant contribution to the Web search community, as it handles the ambiguity of a search query by creating summaries in response to different interpretations of the search, which offer a “road map” to help users quickly identify information of interest.

Details

International Journal of Web Information Systems, vol. 12 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 15 August 2023

Yi-Hung Liu and Sheng-Fong Chen

Abstract

Purpose

Whether automatically generated summaries of health social media can assist users in appropriately managing their diseases and ensuring better communication with health professionals has become an important issue. This paper aims to develop a novel deep learning-based summarization approach for obtaining the most informative summaries from online patient reviews accurately and effectively.

Design/methodology/approach

This paper proposes a framework to generate summaries that integrates a domain-specific pre-trained embedding model and a deep neural extractive summary approach by considering content features, text sentiment, review influence and readability features. Representative health-related summaries were identified, and user judgements were analysed.

Findings

Experimental results on three real-world health forum data sets indicate that awarding sentences without incorporating all the adopted features leads to declining summarization performance. The proposed summarizer significantly outperformed the comparison baseline. User judgement through the questionnaire provides realistic and concrete evidence of crucial features that remarkably influence patient forum review summaries.

Originality/value

This study contributes to health analytics and management literature by exploring users’ expressions and opinions through the health deep learning summarization model. The research also developed an innovative mindset to design summarization weighting methods from user-created content on health topics.

Details

The Electronic Library, vol. 41 no. 5
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 1 May 2006

Shiyan Ou, Christopher S.G. Khoo and Dion H. Goh

Abstract

Purpose

The purpose of this research is to develop a method for automatic construction of multi‐document summaries of sets of news articles that might be retrieved by a web search engine in response to a user query.

Design/methodology/approach

Based on the cross‐document discourse analysis, an event‐based framework is proposed for integrating and organizing information extracted from different news articles. It has a hierarchical structure in which the summarized information is presented at the top level and more detailed information given at the lower levels. A tree‐view interface was implemented for displaying a multi‐document summary based on the framework. A preliminary user evaluation was performed by comparing the framework‐based summaries against the sentence‐based summaries.

Findings

In a small evaluation, all the human subjects preferred the framework‐based summaries to the sentence‐based summaries. This indicates that the event‐based framework is an effective way to summarize a set of news articles reporting an event or a series of relevant events.

Research limitations/implications

The framework is limited to event‐based news articles and is not applicable to news critiques and other kinds of news articles. A summarization system based on the event‐based framework is being implemented.

Practical implications

Multi‐document summarization of news articles can adopt the proposed event‐based framework.

Originality/value

An event‐based framework for summarizing sets of news articles was developed and evaluated using a tree‐view interface for displaying such summaries.

Details

Aslib Proceedings, vol. 58 no. 3
Type: Research Article
ISSN: 0001-253X

Article
Publication date: 15 February 2024

Songlin Bao, Tiantian Li and Bin Cao

Abstract

Purpose

In the era of big data, various industries are generating large amounts of text data every day. Simplifying and summarizing these data can effectively serve users and improve efficiency. Recently, zero-shot prompting in large language models (LLMs) has demonstrated remarkable performance on various language tasks. However, generating a very “concise” multi-document summary remains difficult for such models. When conciseness is specified in a zero-shot prompt, the generated multi-document summary still contains some unimportant information, and this persists even with few-shot prompting. This paper aims to propose an LLM prompting method for the multi-document summarization task.

Design/methodology/approach

To overcome this challenge, the authors propose chain-of-event (CoE) prompting for the multi-document summarization (MDS) task. In this prompting, the authors take events as the center and propose a four-step summary reasoning process: specific event extraction; event abstraction and generalization; common event statistics; and summary generation. To further improve the performance of LLMs, the authors extend CoE prompting with an example of summary reasoning.
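The four-step reasoning process described above can be illustrated with a small prompt builder. This is only a sketch: the step names come from the abstract, but the instruction wording, the `build_coe_prompt` function and its `example` parameter are assumptions made for illustration, not the authors' actual prompt.

```python
# Step names follow the abstract; the instruction text after each colon is assumed.
COE_STEPS = [
    "Specific event extraction: list the concrete events reported in each document.",
    "Event abstraction and generalization: rewrite each event in general terms.",
    "Common event statistics: count how many documents mention each generalized event.",
    "Summary generation: write a concise summary built from the most common events.",
]


def build_coe_prompt(documents, example=None):
    """Assemble a chain-of-event style prompt (hypothetical wording)."""
    parts = []
    if example:
        # Optional worked example of summary reasoning, placed before the documents.
        parts.append("Example of summary reasoning:\n" + example.strip())
    for i, doc in enumerate(documents, start=1):
        parts.append(f"Document {i}:\n{doc.strip()}")
    steps = "\n".join(f"Step {n}. {text}" for n, text in enumerate(COE_STEPS, start=1))
    parts.append("Reason through the following steps in order:\n" + steps)
    return "\n\n".join(parts)
```

Calling `build_coe_prompt(docs)` returns a single string that could be sent to any chat-style model; passing `example=` mimics the extension with an example of summary reasoning.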

Findings

Summaries generated by CoE prompting are more abstractive, concise and accurate. The authors evaluate the proposed prompting on two data sets. The experimental results on ChatGLM2-6b show that the proposed CoE prompting consistently outperforms other typical prompting methods across all data sets.

Originality/value

This paper proposes CoE prompting to solve MDS tasks with LLMs. CoE prompting can not only identify the key events but also ensure the conciseness of the summary. With this method, users can access the most relevant and important information quickly, improving their decision-making processes.

Details

International Journal of Web Information Systems, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 14 June 2021

Farnoush Bayatmakou, Azadeh Mohebi and Abbas Ahmadi

Abstract

Purpose

Query-based summarization approaches might not be able to provide summaries compatible with the user’s information need, as they mostly rely on a limited source of information, usually represented as a single query by the user. This issue becomes even more challenging when dealing with scientific documents, as they contain more specific subject-related terms, while the user may not be able to express his/her specific information need in a query with limited terms. This study aims to propose an interactive multi-document text summarization approach that generates a summary that is more compatible with the user’s information need. This approach allows the user to interactively specify the composition of a multi-document summary.

Design/methodology/approach

This approach exploits the user’s opinion in two stages. The initial query is refined by user-selected keywords/keyphrases and complete sentences extracted from the set of retrieved documents. It is followed by a novel method for sentence expansion using a genetic algorithm, and ranking the final set of sentences using the maximal marginal relevance method. For implementation, the Web of Science data set in the artificial intelligence (AI) category is used.
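The maximal marginal relevance (MMR) ranking step can be sketched as follows. This is a generic illustration of MMR, not the authors' implementation: the bag-of-words cosine similarity, the function names and the default trade-off weight `lam=0.7` are all assumptions.

```python
import math
from collections import Counter


def bow(text):
    """Lower-cased bag-of-words vector for a piece of text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mmr_rank(query, sentences, k=3, lam=0.7):
    """Greedily pick k sentences, trading query relevance against redundancy."""
    q = bow(query)
    vecs = [bow(s) for s in sentences]
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * cosine(vecs[i], q) - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

A higher `lam` favors relevance to the query, while a lower value penalizes sentences that repeat what has already been selected.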

Findings

The proposed approach receives feedback from the user in terms of favorable keywords and sentences. This feedback ultimately improves the final summary. To assess the performance of the proposed system, 45 users who were graduate students in the field of AI were asked to fill out a questionnaire. The quality of the final summary has also been evaluated in terms of the user’s perspective and information redundancy. The results show that the proposed approach leads to higher degrees of user satisfaction compared to approaches with no or only one step of interaction.

Originality/value

The interactive summarization approach goes beyond the initial user’s query, as it includes the user’s preferred keywords/keyphrases and sentences through a systematic interaction. Through these interactions, the system gives the user a clearer idea of the information he/she is looking for and consequently adjusts the final result to the ultimate information need. Such interaction allows the summarization system to achieve a comprehensive understanding of the user’s information needs while expanding context-based knowledge and guiding the user through his/her information journey.

Details

Information Discovery and Delivery, vol. 50 no. 2
Type: Research Article
ISSN: 2398-6247

Article
Publication date: 23 November 2010

Yongzheng Zhang, Evangelos Milios and Nur Zincir‐Heywood

Abstract

Purpose

Summarization of an entire web site with diverse content may lead to a summary heavily biased towards the site's dominant topics. The purpose of this paper is to present a novel topic‐based framework to address this problem.

Design/methodology/approach

A two‐stage framework is proposed. The first stage identifies the main topics covered in a web site via clustering and the second stage summarizes each topic separately. The proposed system is evaluated by a user study and compared with the single‐topic summarization approach.
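The two-stage idea (cluster first, then summarize each topic separately) can be sketched as below. The paper's actual components (feature selection, link analysis, key phrase and key sentence extraction) are not reproduced here; this sketch swaps in a simple single-pass clustering and centroid-based sentence selection, and the function names and the `threshold=0.3` default are assumptions.

```python
import math
import re
from collections import Counter


def vec(text):
    """Bag-of-words vector over lower-cased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_pages(pages, threshold=0.3):
    """Stage 1: a page joins the first cluster whose centroid is similar enough."""
    clusters = []  # each cluster: {"centroid": Counter, "pages": [str]}
    for page in pages:
        v = vec(page)
        for c in clusters:
            if cosine(v, c["centroid"]) >= threshold:
                c["pages"].append(page)
                c["centroid"] += v
                break
        else:
            # No existing cluster was close enough, so start a new topic.
            clusters.append({"centroid": Counter(v), "pages": [page]})
    return clusters


def summarize_cluster(cluster):
    """Stage 2: return the sentence closest to the cluster centroid."""
    sentences = [s.strip() for p in cluster["pages"] for s in p.split(".") if s.strip()]
    return max(sentences, key=lambda s: cosine(vec(s), cluster["centroid"]), default="")


def summarize_site(pages, threshold=0.3):
    """Produce one short summary per detected topic."""
    return [summarize_cluster(c) for c in cluster_pages(pages, threshold)]
```

Because each topic is summarized on its own, a dominant topic with many pages cannot crowd smaller topics out of the result, which is the bias the abstract describes.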

Findings

The user study demonstrates that the clustering‐summarization approach statistically significantly outperforms the plain summarization approach in the multi‐topic web site summarization task. Text‐based clustering based on selecting features with high variance over web pages is reliable; outgoing links are useful if a rich set of cross links is available.

Research limitations/implications

More sophisticated clustering methods than those used in this study are worth investigating. The proposed method should be tested on web content that is less structured than organizational web sites, for example blogs.

Practical implications

The proposed summarization framework can be applied to the effective organization of search engine results and faceted or topical browsing of large web sites.

Originality/value

Several key components are integrated for web site summarization for the first time, including feature selection and link analysis, key phrase and key sentence extraction. Insight into the contributions of links and content to topic‐based summarization was gained. A classification approach is used to minimize the number of parameters.

Details

International Journal of Web Information Systems, vol. 6 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 4 September 2019

Yi-Hung Liu, Xiaolong Song and Sheng-Fong Chen

Abstract

Purpose

Whether automatically generated summaries of health social media can aid users in managing their diseases appropriately is an important question. The purpose of this paper is to introduce a novel text summarization approach for acquiring the most informative summaries from online patient posts accurately and effectively.

Design/methodology/approach

Data sets of diabetes and HIV posts were collected from two online disease forums, respectively. The proposed summarizer uses a graph-based method to generate summaries by considering social network features, text sentiment and sentence features. Representative health-related summaries were identified, and summarization performance as well as user judgments were analyzed.

Findings

The findings show that awarding sentences without using all the incorporated features decreases summarization performance compared with the classic summarization method and comparison approaches. The proposed summarizer significantly outperformed the comparison baseline.

Originality/value

This study contributes to the literature on health knowledge management by analyzing patients’ experiences and opinions through the health summarization model. The research additionally develops a new mindset to design abstractive summarization weighting schemes from user-generated health content.

Details

Aslib Journal of Information Management, vol. 71 no. 6
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 1 October 1999

G.G. Chowdhury and Sudatta Chowdhury

Abstract

Digital library research has attracted much attention in the most developed, and in a number of developing, countries. While many digital library research projects are funded by government agencies and national and international bodies, some are run by specific academic and research institutions and libraries, either individually or collaboratively. While some digital library projects, such as the ELINOR project in the UK, the first two phases of the eLib (Electronic Libraries) Programme in the UK, and the first phase of DLI (Digital Library Initiative) in the US, are now over, a number of other projects are currently under way in different parts of the world. Beginning with the definitions and characteristics of digital libraries, as proposed by various researchers, this paper provides brief accounts of some major digital library projects that are currently in progress, or are just completed, in different parts of the world. There follows a review of digital library research under sixteen major headings. Literature for this review has been identified through a search on the LISA CD‐ROM database, and a Dialog search on library and information science databases, and the resulting output has been supplemented by a scan of the various issues of D‐Lib Magazine and Ariadne, and the websites of various organisations and institutions engaged in digital library research. The review indicates that we have learned a lot through digital library research within a short span of time. However, a number of issues are yet to be resolved. The paper ends with an indication of the research issues that need to be addressed and resolved in the near future in order to bring the digital library from the researcher’s laboratory to the real-life environment.

Details

Journal of Documentation, vol. 55 no. 4
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 19 June 2019

Prafulla Bafna, Dhanya Pramod, Shailaja Shrwaikar and Atiya Hassan

Abstract

Purpose

Document management is growing in importance proportionate to the growth of unstructured data, and its applications are increasing from process benchmarking to customer relationship management and so on. The purpose of this paper is to improve two important components of document management, namely keyword extraction and document clustering. It is achieved through knowledge extraction by updating the phrase document matrix. The objective is to manage documents by extending the phrase document matrix and achieve refined clusters. The study achieves consistency in cluster quality in spite of the increasing size of the data set. Domain independence of the proposed method is tested and compared with other methods.

Design/methodology/approach

In this paper, a synset-based phrase document matrix construction method is proposed where semantically similar phrases are grouped to mitigate the curse of dimensionality. When a large collection of documents is to be processed, it includes some documents that are closely related to the topic of interest, known as model documents, and also documents that deviate from the topic of interest. These non-relevant documents may affect the cluster quality. The first step in knowledge extraction from the unstructured textual data is converting it into structured form either as a term frequency-inverse document frequency matrix or as a phrase document matrix. Once in structured form, a range of mining algorithms from classification to clustering can be applied.
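The conversion of unstructured text into a term frequency-inverse document frequency matrix can be sketched as follows. This covers only the plain word-level matrix mentioned above, not the paper's synset-based phrase grouping; the function name, whitespace tokenization and the unsmoothed `log(N / df)` weighting are assumptions.

```python
import math
from collections import Counter


def tfidf_matrix(documents):
    """Return (vocabulary, rows); rows[i][j] is the TF-IDF weight of term j in document i."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({t for tokens in tokenized for t in tokens})
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        rows.append([
            (counts[t] / total) * math.log(n_docs / doc_freq[t])
            for t in vocabulary
        ])
    return vocabulary, rows
```

Each row is a fixed-length numeric vector, so standard clustering or classification routines can be applied to the result. Terms that occur in every document receive weight zero under this unsmoothed weighting.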

Findings

In the enhanced approach, the model documents are used to extract key phrases with synset groups, whereas the other documents participate in the construction of the feature matrix. It gives a better feature vector representation and improved cluster quality.

Research limitations/implications

Various applications that require managing of unstructured documents can use this approach by specifically incorporating the domain knowledge with a thesaurus.

Practical implications

An experiment pertaining to the academic domain is presented that categorizes research papers according to context and topic, which will help academicians to organize and build knowledge in a better way. The grouping and feature extraction for resume data can facilitate the candidate selection process.

Social implications

Applications like knowledge management, clustering of search engine results, different recommender systems like hotel recommender, task recommender, and so on, will benefit from this study. Hence, the study contributes to improving document management in business domains or areas of interest of its users from various strata of society.

Originality/value

The study proposes an improvement to the document management approach that can be applied in various domains. The efficacy of the proposed approach and its enhancement is validated on three different data sets of well-articulated documents, such as biographies, resumes and research papers. These results can be used for benchmarking further work carried out in these areas.

Details

Benchmarking: An International Journal, vol. 26 no. 6
Type: Research Article
ISSN: 1463-5771

Article
Publication date: 27 April 2018

Tai-Chia Huang, Chia-Hsuan Hsieh and Hei-Chia Wang

Abstract

Purpose

Producing meeting documents requires an instantaneous recorder during meetings, which costs extra human resources and takes time to amend the file. However, a high-quality meeting document can enable users to recall the meeting content efficiently. The paper aims to discuss these issues.

Design/methodology/approach

An application based on this framework is developed to help the users find topics and obtain summarizations of meeting contents without extra effort. This app uses the Bluemix speech recognizer to obtain speech transcripts. It then combines latent Dirichlet allocation and a TextTiling algorithm with the speech script of meetings to detect boundaries between different topics and evaluate the topics in each segment. TextTeaser, an open API based on a feature-based approach, is then used to summarize the speech transcripts.

Findings

The results indicate that the summaries generated by the machine are 85 percent similar to the records written by humans.

Originality/value

To reduce the human effort in generating meeting reports, this paper presents a framework to record and analyze meeting contents automatically by voice recognition, topic detection, and extractive summarization.

Details

Data Technologies and Applications, vol. 52 no. 3
Type: Research Article
ISSN: 2514-9288
