Search results

Article
Publication date: 9 August 2011

Lin‐Chih Chen

Abstract

Purpose

Web‐snippet clustering has recently attracted a lot of attention as a means to provide users with a succinct overview of relevant results compared with traditional search results. This paper investigates how to build a web‐snippet clustering system based on a mixed clustering method.

Design/methodology/approach

This paper proposes a mixed clustering method to organise all returned snippets into a hierarchical tree. The method accomplishes two main tasks: one is to construct the cluster labels and the other is to build a hierarchical tree.

Findings

Five measures were used to evaluate the quality of the clustering results. The experiments indicate that the system performs better than current commercial and academic systems.

Originality/value

A high performance system is presented, based on the clustering method. A divisive hierarchical clustering algorithm is also developed to organise all returned snippets into a hierarchical tree.
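As a rough illustration of the two tasks named above (choosing cluster labels and building a hierarchical tree top-down), the following is a minimal sketch of a divisive split. It is not the paper's mixed clustering method: the stopword list, the `build_tree` function, the "most frequent unused term" labelling rule and the `(other)` bucket are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "for", "is", "on"}


def tokens(snippet):
    """Lowercase a snippet and return its set of non-stopword terms."""
    return {w for w in snippet.lower().split() if w not in STOPWORDS}


def build_tree(snippets, used=frozenset(), min_size=2):
    """Divisively split snippets on the most frequent unused term.

    Returns a nested dict {label: subtree}; leaves are lists of snippets.
    """
    counts = Counter(t for s in snippets for t in tokens(s) if t not in used)
    # A useful label covers at least min_size snippets but not all of them.
    candidates = [(n, t) for t, n in counts.items()
                  if min_size <= n < len(snippets)]
    if not candidates:
        return list(snippets)
    # Highest count wins; ties are broken alphabetically for determinism.
    _, label = min(candidates, key=lambda c: (-c[0], c[1]))
    inside = [s for s in snippets if label in tokens(s)]
    outside = [s for s in snippets if label not in tokens(s)]
    return {
        label: build_tree(inside, used | {label}, min_size),
        "(other)": build_tree(outside, used | {label}, min_size),
    }
```

Calling `build_tree` on a handful of snippets for an ambiguous query groups them under the shared term first and then refines each group, which is the general shape of a divisive hierarchy.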

Article
Publication date: 22 February 2011

Lin‐Chih Chen

Abstract

Purpose

Term suggestion is a very useful information retrieval technique that tries to suggest relevant terms for users' queries, to help advertisers find more appropriate terms relevant to their target market. This paper focuses on using several semantic analysis methods to implement a term suggestion system.

Design/methodology/approach

Three semantic analysis techniques are adopted – latent semantic indexing (LSI), probabilistic latent semantic indexing (PLSI), and a keyword relationship graph (KRG) – to implement a term suggestion system.

Findings

This paper shows that using multiple semantic analysis techniques can give significant performance improvements.

Research limitations/implications

The suggested terms returned from the system may be out of date, since the system uses a batch processing mode to update the training parameter.

Originality/value

The paper shows that these techniques help overcome the problems of synonymy and polysemy in information retrieval by using a vector space model. Moreover, an intelligent stopping strategy is proposed to reduce the number of iterations required for probabilistic latent semantic indexing.
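To make the vector space idea concrete, here is a minimal sketch of term suggestion by cosine similarity. It uses a plain term-by-document vector space only; it does not implement LSI, PLSI or the keyword relationship graph, and the function names and the toy query log in the usage note are assumptions for illustration.

```python
import math
from collections import defaultdict


def term_vectors(documents):
    """Map each term to a sparse {document_index: count} vector."""
    vectors = defaultdict(lambda: defaultdict(int))
    for i, doc in enumerate(documents):
        for term in doc.lower().split():
            vectors[term][i] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def suggest(term, documents, top_n=3):
    """Return up to top_n terms whose document vectors are closest to term."""
    vectors = term_vectors(documents)
    if term not in vectors:
        return []
    scored = [(cosine(vectors[term], vec), other)
              for other, vec in vectors.items() if other != term]
    scored = [(s, t) for s, t in scored if s > 0]
    # Sort by descending similarity, then alphabetically for determinism.
    scored.sort(key=lambda p: (-p[0], p[1]))
    return [t for _, t in scored[:top_n]]
```

For a toy log such as `["cheap flights paris", "cheap hotels paris", "cheap flights rome", "garden tools"]`, `suggest("flights", log)` ranks terms that co-occur with "flights" ahead of unrelated ones. A raw vector space like this cannot relate terms that never co-occur, which is the gap latent methods such as LSI and PLSI are meant to address.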

Details

Online Information Review, vol. 35 no. 1
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 24 July 2020

Thanh-Tho Quan, Duc-Trung Mai and Thanh-Duy Tran

Abstract

Purpose

This paper proposes an approach to identify categorical influencers (i.e. people who are active in the targeted categories) in social media channels. Categorical influencers are important for media marketing, but detecting them automatically remains a challenge.

Design/methodology/approach

We applied emerging deep learning approaches. Specifically, we used word embeddings to encode the semantic information of words occurring in the common microtext of social media and used a variational autoencoder (VAE) to approximate the topic modeling process, through which the active categories of influencers are automatically detected. We developed a system known as Categorical Influencer Detection (CID) to realize those ideas.

Findings

The approach of using VAE to simulate the Latent Dirichlet Allocation (LDA) process can effectively handle the task of topic modeling on the vast dataset of microtext on social media channels.

Research limitations/implications

This work makes two major contributions. The first is the detection of topics in microtexts using a deep learning approach. The second is the identification of categorical influencers in social media.

Practical implications

This work can help brands to do digital marketing on social media effectively by approaching appropriate influencers. A real case study is given to illustrate it.

Originality/value

In this paper, we discuss an approach to automatically identify the active categories of influencers by performing topic detection from the microtext related to the influencers in social media channels. To do so, we use deep learning to approximate the topic modeling process of the conventional approaches (such as LDA).

Details

Online Information Review, vol. 44 no. 5
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 21 January 2019

Issa Alsmadi and Keng Hoon Gan

Abstract

Purpose

Rapid developments in social networks and their usage in everyday life have caused an explosion in the amount of short electronic documents. Classifying such documents into relevant classes according to their text content is therefore important for many applications and of practical interest. Short-text classification is an essential step in many applications, such as spam filtering, sentiment analysis, Twitter personalization, customer review and many other applications related to social networks. Reviews of short text and its applications are limited. Thus, this paper aims to discuss the characteristics of short text and the challenges and difficulties of classifying it. The paper attempts to introduce all stages of the classification process, the techniques used in each stage and the possible development trends in each stage.

Design/methodology/approach

The paper is a review of the main aspects of short-text classification. It is structured according to the stages of the classification task.

Findings

This paper discusses related issues and approaches to these problems. Further research could be conducted to address the challenges in short texts and avoid poor classification accuracy. Low performance can be addressed by using optimization techniques, such as genetic algorithms, which are powerful in enhancing the quality of selected features. Soft computing solutions, such as fuzzy logic, make short-text problems a promising area of research.

Originality/value

A powerful short-text classification method can significantly improve the efficiency of many applications. Current solutions still have low performance, implying the need for improvement. This paper discusses related issues and approaches to these problems.

Details

International Journal of Web Information Systems, vol. 15 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 29 April 2021

Heng-Yang Lu, Yi Zhang and Yuntao Du

Abstract

Purpose

Topic models have been widely applied to discover important information from vast amounts of unstructured data. Traditional long-text topic models such as Latent Dirichlet Allocation may suffer from the sparsity problem when dealing with short texts, which mostly come from the Web. These models also suffer from a readability problem when displaying the discovered topics. The purpose of this paper is to propose a novel model called the Sense Unit based Phrase Topic Model (SenU-PTM) to address both the sparsity and readability problems.

Design/methodology/approach

SenU-PTM is a novel phrase-based short-text topic model under a two-phase framework. The first phase introduces a phrase-generation algorithm that exploits word embeddings to generate phrases from the original corpus. The second phase introduces a new concept, the sense unit, which consists of a set of semantically similar tokens and is used to model topics with the token vectors generated in the first phase. Finally, SenU-PTM infers topics based on these two phases.
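The abstract does not describe the phrase-generation algorithm itself, so the following is only a hedged sketch of one way word embeddings could drive phrase generation: neighbouring words are merged when their vectors are similar enough. The `generate_phrases` function, the greedy merging rule, the threshold and the toy two-dimensional vectors in the usage note are all assumptions, not SenU-PTM's actual procedure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (math.sqrt(sum(a * a for a in u))
            * math.sqrt(sum(b * b for b in v)))
    return dot / norm if norm else 0.0


def generate_phrases(words, embeddings, threshold=0.8):
    """Greedily merge neighbouring words with similar vectors into phrases.

    words: list of tokens in corpus order.
    embeddings: dict mapping a token to its vector (hypothetical input).
    Words without a vector are never merged.
    """
    phrases = []
    for word in words:
        # Compare against the last word of the phrase built so far.
        prev = phrases[-1][-1] if phrases else None
        if (prev in embeddings and word in embeddings
                and cosine(embeddings[prev], embeddings[word]) >= threshold):
            phrases[-1].append(word)
        else:
            phrases.append([word])
    return ["_".join(p) for p in phrases]
```

With toy vectors in which "machine" and "learning" point in nearly the same direction, `generate_phrases(["machine", "learning", "is", "fun"], vectors)` joins the first two tokens into a single phrase token and leaves the rest alone. A real system would use trained embeddings and likely a more careful merging criterion.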

Findings

Experimental results on two real-world and publicly available datasets show the effectiveness of SenU-PTM from the perspectives of topical quality and document characterization. The results reveal that modeling topics on sense units can address the sparsity of short texts and improve the readability of topics at the same time.

Originality/value

The originality of SenU-PTM lies in the new procedure of modeling topics on the proposed sense units with word embeddings for short-text topic discovery.

Details

Data Technologies and Applications, vol. 55 no. 5
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 14 April 2014

Valentina Franzoni and Alfredo Milani

Abstract

Purpose

In this work, a new general framework is proposed to guide navigation over a collaborative concept network, in order to discover paths between concepts. Finding semantic chains between concepts over a semantic network is an issue of great interest for many applications, such as explanation generation and query expansion. Collaborative concept networks on the web tend to have features such as large dimensions, a high degree of connectivity and dynamic evolution over time. These features pose special challenges for efficient graph search methods, since they result in huge memory requirements, high branching factors, unknown dimensions and a high cost for accessing nodes. The paper aims to discuss these issues.

Design/methodology/approach

The proposed framework is based on the novel notion of heuristic semantic walk (HSW). In the HSW framework, a semantic proximity measure among concepts, reflecting the collective knowledge embedded in search engines or other statistical sources, is used as a heuristic to guide the search in the collaborative network. Different search strategies, information sources and proximity measures can be used to adapt HSW to the collaborative semantic network under consideration.
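The idea of a walk guided by a proximity heuristic can be sketched as follows. This is a simplified illustration, not the HSW implementation: the graph is an in-memory dictionary, `proximity` is a hypothetical callable standing in for a search-engine-based measure, and the walk neither backtracks nor revisits nodes.

```python
import random


def heuristic_walk(graph, proximity, start, goal, max_steps=100, seed=0):
    """Weighted randomized walk from start toward goal.

    graph: dict mapping a node to a list of neighbouring nodes.
    proximity: function (node, goal) -> non-negative score, where a
        higher score means the node is considered closer to the goal.
    Returns the path as a list of nodes, or None if the walk hits a
    dead end or runs out of steps.
    """
    rng = random.Random(seed)
    path = [start]
    visited = {start}
    current = start
    for _ in range(max_steps):
        if current == goal:
            return path
        options = [n for n in graph.get(current, []) if n not in visited]
        if not options:
            return None  # dead end; this sketch does not backtrack
        if goal in options:
            nxt = goal
        else:
            # Sample the next node with probability proportional to its
            # proximity to the goal; the small constant avoids all-zero
            # weights, which random.choices rejects.
            weights = [proximity(n, goal) + 1e-9 for n in options]
            nxt = rng.choices(options, weights=weights, k=1)[0]
        path.append(nxt)
        visited.add(nxt)
        current = nxt
    return path if current == goal else None
```

Swapping the sampling step for `max(options, key=...)` would give a purely greedy strategy, while uniform weights would give an unguided random walk; the weighted variant sits between the two.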

Findings

Experiments conducted on the Wikipedia network and the Bing search engine with a range of different semantic measures show that the proposed HSW approach with a weighted randomized walk strategy outperforms state-of-the-art search methods.

Originality/value

To the best of the authors' knowledge, the proposed HSW model is the first approach that uses search engine-based proximity measures as a heuristic for semantic search.
