Search results

1 – 10 of over 21000
Article
Publication date: 23 November 2010

Yongzheng Zhang, Evangelos Milios and Nur Zincir‐Heywood

Abstract

Purpose

Summarization of an entire web site with diverse content may lead to a summary heavily biased towards the site's dominant topics. The purpose of this paper is to present a novel topic‐based framework to address this problem.

Design/methodology/approach

A two‐stage framework is proposed. The first stage identifies the main topics covered in a web site via clustering and the second stage summarizes each topic separately. The proposed system is evaluated by a user study and compared with the single‐topic summarization approach.

Findings

The user study demonstrates that the clustering‐summarization approach statistically significantly outperforms the plain summarization approach in the multi‐topic web site summarization task. Text‐based clustering based on selecting features with high variance over web pages is reliable; outgoing links are useful if a rich set of cross links is available.
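The variance-based feature selection mentioned in the findings can be illustrated with a minimal sketch. The abstract does not give the exact weighting or threshold, so the relative term frequency, the whitespace tokenizer and the function name `select_high_variance_terms` are all assumptions made for illustration only.

```python
from collections import Counter
from statistics import pvariance


def select_high_variance_terms(pages, top_k):
    """Return the top_k terms whose relative frequency varies most across pages."""
    # Assumption: whitespace tokenization and relative term frequency as the feature value.
    counts = [Counter(page.lower().split()) for page in pages]
    totals = [sum(c.values()) or 1 for c in counts]
    vocab = sorted(set().union(*counts))
    variance = {
        term: pvariance([c[term] / t for c, t in zip(counts, totals)])
        for term in vocab
    }
    # Sort by descending variance, breaking ties alphabetically for determinism.
    ranked = sorted(vocab, key=lambda term: (-variance[term], term))
    return ranked[:top_k]


# Hypothetical page texts from a multi-topic site.
pages = [
    "research grants research labs",
    "admissions courses admissions fees",
    "research courses news",
]
print(select_high_variance_terms(pages, 2))
```

Terms that are concentrated on a few pages score higher than terms spread evenly over the site, which is why such features may help separate topics before each topic is summarized.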

Research limitations/implications

More sophisticated clustering methods than those used in this study are worth investigating. The proposed method should be tested on web content that is less structured than organizational web sites, for example blogs.

Practical implications

The proposed summarization framework can be applied to the effective organization of search engine results and faceted or topical browsing of large web sites.

Originality/value

Several key components are integrated for web site summarization for the first time, including feature selection and link analysis, key phrase and key sentence extraction. Insight into the contributions of links and content to topic‐based summarization was gained. A classification approach is used to minimize the number of parameters.

Details

International Journal of Web Information Systems, vol. 6 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 1 February 2021

Narasimhulu K, Meena Abarna KT and Sivakumar B

Abstract

Purpose

The purpose of the paper is to study the multiple viewpoints required to obtain more informative similarity features among tweet documents, which is useful for achieving robust tweet clustering results.

Design/methodology/approach

Let “N” be the number of tweet documents used for topic extraction. In the initial pre-processing step, unwanted text, punctuation and other symbols are removed, and tokenization and stemming are performed. A bag of features is determined for the tweets, and the tweets are then modelled with this bag of features during topic extraction. Approximate topic features are extracted for every tweet document. These sets of topic features for the N documents are treated as multi-viewpoints. The key idea of the proposed work is to use multi-viewpoints when computing similarity features. For example, with five tweet documents (N = 5), the documents are defined in a projected space with five viewpoints, v1, v2, v3, v4 and v5. The similarity features between two documents (viewpoints v1 and v2) are computed with respect to the other three viewpoints (v3, v4 and v5), unlike the single viewpoint used in the traditional cosine metric.
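The multi-viewpoint idea can be sketched as follows. The abstract does not state the exact formula, so this is one plausible reading: each of the remaining documents acts as a viewpoint, both documents are re-expressed relative to it, and the resulting cosine values are averaged. The function names and the topic vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Traditional cosine similarity (single viewpoint at the origin)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def multi_viewpoint_similarity(docs, i, j):
    """Average cosine between docs[i] and docs[j] as seen from every other document."""
    viewpoints = [h for h in range(len(docs)) if h not in (i, j)]
    if not viewpoints:
        raise ValueError("need at least one viewpoint besides the two documents")
    total = 0.0
    for h in viewpoints:
        # Re-express both documents relative to viewpoint h before comparing them.
        u = [a - b for a, b in zip(docs[i], docs[h])]
        v = [a - b for a, b in zip(docs[j], docs[h])]
        total += cosine(u, v)
    return total / len(viewpoints)


# Hypothetical topic-feature vectors for five tweet documents (N = 5).
tweets = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
    [0.3, 0.3, 0.4],
]
# Similarity of documents 0 and 1 seen from documents 2, 3 and 4.
print(multi_viewpoint_similarity(tweets, 0, 1))
```

With five documents, the pair (0, 1) is compared from three viewpoints rather than from the origin alone, which is the contrast with the traditional cosine metric drawn in the abstract.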

Findings

The work addresses healthcare problems using tweet data. Topic models play a crucial role in classifying health-related tweets by finding topics (or health clusters), rather than computing term frequency and inverse document frequency (TF–IDF), for unlabelled tweets.

Originality/value

Topic models play a crucial role in classifying health-related tweets by finding topics (or health clusters), rather than computing TF–IDF, for unlabelled tweets.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 14 no. 2
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 3 April 2009

Maria Soledad Pera and Yiu‐Kai Ng

Abstract

Purpose

Tens of thousands of news articles are posted online each day, covering topics from politics to science to current events. To better cope with this overwhelming volume of information, RSS (news) feeds are used to categorize newly posted articles. Nonetheless, most RSS users must filter through many articles within the same or different RSS feeds to locate articles pertaining to their particular interests. Due to the large number of news articles in individual RSS feeds, there is a need for further organizing articles to aid users in locating non‐redundant, informative, and related articles of interest quickly. This paper aims to address these issues.

Design/methodology/approach

The paper presents a novel approach which uses the word‐correlation factors in a fuzzy set information retrieval model to: filter out redundant news articles from RSS feeds; shed less‐informative articles from the non‐redundant ones; and cluster the remaining informative articles according to the fuzzy equivalence classes on the news articles.
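A rough sketch of how word-correlation factors can feed a fuzzy set retrieval model is given below. It uses the textbook fuzzy-set membership formula (one minus the product of one minus each correlation); the abstract does not state the paper's exact formula, so treat this as an assumption. The correlation values, the word lists and the function names are hypothetical.

```python
def membership(word, document_words, correlation):
    """Fuzzy degree to which `word` belongs to a document, from pairwise word correlations."""
    not_related = 1.0
    for other in document_words:
        # A word is fully correlated with itself; unknown pairs default to 0.
        if other == word:
            c = 1.0
        else:
            c = correlation.get(frozenset((word, other)), 0.0)
        not_related *= 1.0 - c
    return 1.0 - not_related


def article_similarity(a_words, b_words, correlation):
    """Average membership of article a's words in article b (values near 1 suggest redundancy)."""
    if not a_words:
        return 0.0
    return sum(membership(w, b_words, correlation) for w in a_words) / len(a_words)


# Hypothetical word-correlation factors, keyed by unordered word pair.
correlation = {
    frozenset(("election", "vote")): 0.8,
    frozenset(("election", "ballot")): 0.5,
}
print(membership("election", ["vote", "ballot"], correlation))
print(article_similarity(["election", "vote"], ["vote", "ballot"], correlation))
```

Under this reading, an article whose words all have high membership in an already-seen article could be filtered out as redundant, and the same pairwise degrees could serve as the relation from which fuzzy equivalence classes are formed.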

Findings

The clustering approach requires little overhead or computational costs, and experimental results have shown that it outperforms other existing, well‐known clustering approaches.

Research limitations/implications

The clustering approach as proposed in this paper applies only to RSS news articles; however, it can be extended to other application domains.

Originality/value

The developed clustering tool is highly efficient and effective in filtering and classifying RSS news articles and does not employ any labor‐intensive user‐feedback strategy. Therefore, it can be implemented in real‐world RSS feeds to aid users in locating RSS news articles of interest.

Details

International Journal of Web Information Systems, vol. 5 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 21 October 2021

Noorullah Renigunta Mohammed and Moulana Mohammed

Abstract

Purpose

In eHealth text mining domains, cosine-based visual methods (VM) assess clusters more accurately than Euclidean-based ones and are recommended for cluster assessment of tweet data models. Such VM determine the clusters with respect to a single viewpoint or none, which is less informative. The purpose of this study is to use multi-viewpoints (MVP) for a more informative cluster assessment of health-care tweet documents and to demonstrate visual analysis of cluster tendency.

Design/methodology/approach

In this paper, the authors propose MVP-based VM that combine traditional topic models with visual techniques to find cluster tendency and partitioning for cluster validity, in order to propose health-care recommendations based on tweets. The authors demonstrate the effectiveness of the proposed methods on different real-time Twitter health-care data sets in the experimental study. They also compare the proposed models with the existing visual assessment of tendency (VAT) and cVAT models using cluster validity indices and computational complexities; the examples suggest that the MVP VM are more informative.

Findings

The authors propose MVP-based VM that combine traditional topic models with visual techniques to find cluster tendency and partitioning for cluster validity, in order to propose health-care recommendations based on tweets.

Originality/value

In this paper, the authors propose, for the first time, a multi-viewpoint distance metric for topic model cluster tendency, together with a visual representation based on VAT images and hybrid topic models, to find cluster tendency and partitioning for cluster validity and to propose health-care recommendations based on tweets.

Details

International Journal of Pervasive Computing and Communications, vol. 18 no. 1
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 7 August 2017

Daniel Carnerud

Abstract

Purpose

The purpose of this paper is to explore and describe research presented in the International Journal of Quality & Reliability Management (IJQRM), thereby creating an increased understanding of how the areas of research have evolved through the years. An additional purpose is to show how text mining methodology can be used as a tool for exploration and description of research publications.

Design/methodology/approach

The study applies text mining methodologies to explore and describe the digital library of IJQRM from 1984 up to 2014. To structure and condense the data, k-means clustering and probabilistic topic modeling with latent Dirichlet allocation are applied. The data set consists of research paper abstracts.
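For readers unfamiliar with k-means, the clustering step can be sketched with plain Lloyd's algorithm. The abstract does not say how the abstracts were vectorized or how centroids were initialized, so the two-dimensional points and the "first k points" seeding below are assumptions chosen to keep the example small and deterministic; topic modeling with latent Dirichlet allocation is not shown.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical 2-D feature vectors standing in for vectorized abstracts.
points = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

In practice each abstract would be a high-dimensional term vector, and the number of clusters k would need to be chosen by the analyst.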

Findings

The results support the suggestion of the occurrence of trends, fads and fashion in research publications. Research on quality function deployment (QFD) and reliability management is noted to be on the downturn, whereas research on Six Sigma with a focus on lean, innovation, performance and improvement is on the rise. Furthermore, the study confirms IJQRM as a scientific journal with quality and reliability management as primary areas of coverage, accompanied by specific topics such as total quality management, service quality, process management, ISO, QFD and Six Sigma. The study also gives an insight into how text mining can be used as a way to efficiently explore and describe large quantities of research paper abstracts.

Research limitations/implications

The study focuses on abstracts of research papers, thus topics and categories that could be identified via other journal publications, such as book reviews; general reviews; secondary articles; editorials; guest editorials; awards for excellence (notifications); introductions or summaries from conferences; notes from the publisher; and articles without an abstract, are excluded.

Originality/value

There do not seem to be any prior text mining studies that apply cluster modeling and probabilistic topic modeling to research article abstracts in the IJQRM. This study therefore offers a unique perspective on the journal’s content.

Details

International Journal of Quality & Reliability Management, vol. 34 no. 7
Type: Research Article
ISSN: 0265-671X

Article
Publication date: 6 November 2017

Akiyo Nadamoto and Keigo Sakai

Abstract

Purpose

People now usually use the internet to obtain travel information when they plan a trip. They especially want sightseeing spot information from reviews, but the number of such reviews is huge, so users cannot easily obtain the important information. This paper aims to propose a system that automatically extracts and presents Welcome-news for sightseeing spots from reviews. Welcome-news is information related to travel that is both “useful information” and “unexpected information”.

Design/methodology/approach

The flow for extracting Welcome-news from reviews is simple: a user inputs the sightseeing spot about which to get information; the system obtains reviews of that sightseeing spot and divides each review into sentences; the system extracts the sentences that include Welcome-news keyword(s), and these sentences become the useful information; the system extracts unexpected information from the useful information based on clustering, and it becomes the Welcome-news; and the system presents all Welcome-news to the user.
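The sentence-splitting and keyword-filtering steps of this flow can be sketched as below. This is only an illustration: the regular-expression sentence splitter, the substring match, the sample reviews and the keyword list are assumptions, and the later clustering step that isolates unexpected information is deliberately left out because the abstract does not describe it in enough detail.

```python
import re


def split_sentences(review):
    """Split a review into sentences at ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", review.strip())
    return [p for p in parts if p]


def extract_useful_sentences(reviews, keywords):
    """Keep the sentences that contain at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    useful = []
    for review in reviews:
        for sentence in split_sentences(review):
            text = sentence.lower()
            if any(k in text for k in lowered):
                useful.append(sentence)
    return useful


# Hypothetical reviews of one sightseeing spot and hypothetical keywords.
reviews = [
    "The temple was crowded. Entry is free on Mondays!",
    "Nice view.",
]
keywords = ["free", "discount"]
print(extract_useful_sentences(reviews, keywords))
```

The sentences returned here correspond to the "useful information" stage; a clustering pass over them would then be needed to single out the unexpected items.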

Findings

This paper reports three findings: extraction of useful information for sightseeing spots based on Welcome-news keywords extracted by our experiment and using support vector machine (SVM); extraction of unexpected information for sightseeing spots by clustering; and automatic presentation of Welcome-news.

Originality/value

Numerous studies have extracted information from reviews based on keywords. The proposed extraction of Welcome-news for travel uses not only keywords but also clusters based on topics. Furthermore, the proposed keywords include general keywords and unique keywords. The former appear for all kinds of sightseeing spots; the latter appear only for a particular sightseeing spot. The authors extracted general keywords manually, and unique keywords using SVM.

Details

International Journal of Web Information Systems, vol. 13 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 3 April 2018

Yu Suzuki, Hiromitsu Ohara and Akiyo Nadamoto

Abstract

Purpose

This paper aims to propose a method for summarizing the topics of tweets using the Wikipedia category structure as common knowledge for supplementing the understanding of a Twitter user’s interests. Tweets cover many topics, and the topics can be treated as a tree structure. However, when the topic hierarchy is constructed using an existing hierarchical clustering approach, the granularity of tweet groups differs for each user. To summarize the topics, it is necessary to identify which topics are heterogeneous and which are not, because the result is easier to understand when several groups are categorized into parent groups. If the group units differ for each user, the interests of multiple users cannot be summarized. For example, if some tweets are grouped under the presidential election and others under Donald Trump, it is not possible to count how many users are interested in Donald Trump.

Design/methodology/approach

One solution to this issue is to construct topic structures by mapping them onto one common tree structure. In this paper, a method is proposed for constructing the topic structure using the Wikipedia category tree as the common tree structure. The tweets are categorized, mapped to titles of articles in the Wikipedia category tree and then presented to users as a hierarchical structure.

Findings

The effectiveness of the proposed hierarchical topic structure is confirmed. For the theme “politics”, the proposed method works well. The main reason is that there are many technical terms about politics in the Wikipedia categories and articles, and many of these terms do not have multiple meanings. However, for the theme “sports”, the proposed method does not perform well. The main reason is that many of the topic names are names of people.

Originality/value

One important feature of the proposed method is that it makes it easy to grasp which topics are heterogeneous or homogeneous with each other, and that it also considers the missing time when extracting topics. Another feature is that the topic structures for multiple users are easy to compare with each other.

Details

International Journal of Pervasive Computing and Communications, vol. 14 no. 1
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 23 August 2011

Xudong Zhu and Zhi‐Jing Liu

Abstract

Purpose

The purpose of this paper is to address the problem of profiling human behaviour patterns captured in surveillance videos for the application of online normal behaviour recognition and anomaly detection.

Design/methodology/approach

A novel framework is developed for automatic behaviour profiling and online anomaly detection without any manual labeling of the training dataset.

Findings

Experimental results demonstrate the effectiveness and robustness of the authors' approach using noisy and sparse datasets collected from one real surveillance scenario.

Originality/value

To discover the topics, co‐clustering topic model not only captures the correlation between words, but also models the correlations between topics. The major difference between the conventional co‐clustering algorithms and the proposed CCMT is that CCMT shows a major improvement in terms of recall, i.e. interpretability.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 4 no. 3
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 29 March 2013

Yuki Hattori and Akiyo Nadamoto

Abstract

Purpose

Information found on social media is often not written on ordinary web pages. Nevertheless, it is difficult to extract such information from social media because these services contain so much information. Furthermore, various topics are mixed in social media communities. The authors designate such important and unique information related to social media as tip information. In this paper, they aim to propose a method to extract tip information that has been classified by topic from social networking services, as a first step in extracting tip information from social media.

Design/methodology/approach

Themes of many kinds exist in a social media community because users write content freely. The authors therefore first detect the topics in the community and cluster the comments based on the topics. Subsequently, they extract tip information from each cluster. Here, tip information includes a user's experience and contains common important words.

Findings

The authors used an experiment to confirm that their proposed method can extract appropriate tip information from a community that a user specifies. The average precision is 69 per cent. Compared with a baseline that omits topic detection and clustering, the average precision obtained using the authors' proposed method is 18 per cent greater.

Originality/value

The authors contribute three elements for extracting tip information from social media: topic detection and clustering of social media content using the LDA method; extraction of an author's actual experiences; and creation of a tip keyword dictionary from user experiments.

Details

International Journal of Web Information Systems, vol. 9 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 20 July 2021

Eymen Çağatay Bilge and Hakan Yaman

Abstract

Purpose

This study aims to identify the trends that have changed in the field of construction management over the last 20 years.

Design/methodology/approach

In this study, 3,335 journal articles published in the years 2000–2020 were collected from the Web of Science database in construction management. The authors applied bibliometric analysis first and then detected topics with the latent Dirichlet allocation (LDA) topic detection method.

Findings

Cluster analysis yielded 20 clusters, and the topics within the clusters were extracted with the LDA topic detection method. The results show that “building information modeling” and “information management” are the most studied subjects, even though they have emerged only in the last 15 years. “Building information modeling,” “information management,” “scheduling and cost optimization,” “lean construction,” “agile approach” and “megaprojects” are the trend topics in the construction management literature.

Research limitations/implications

This study uses bibliometric analysis. The authors accept that the co-citation and co-authorship relationship in the data is ethical. They accept that honorary authorship, self-citation or honorary citation do not change the pattern of the construction management research domain.

Originality/value

There has been no study conducted in the last 20 years to examine research trends in construction management. Although bibliometric analysis, systematic literature reviews and text mining methods are used separately as a methodology for extracting research trends, no study has used enhanced bibliometric analysis and the LDA topic detection text mining method.

Details

Engineering, Construction and Architectural Management, vol. 29 no. 8
Type: Research Article
ISSN: 0969-9988
