THC-DAT: a document analysis tool based on topic hierarchy and context information

Jing Chen (School of Information Management, Central China Normal University, Wuhan, P.R. China)
Tian Tian Wang (School of Information Management, Central China Normal University, Wuhan, P.R. China)
Quan Lu (Center for Studies of Information Resources, Wuhan University, Wuhan, P.R. China)

Library Hi Tech

ISSN: 0737-8831

Article publication date: 21 March 2016

Abstract

Purpose

The purpose of this paper is to propose a novel within-document analysis tool (DAT), the topic hierarchy and context-based document analysis tool (THC-DAT), which enables users to interactively analyze any multi-topic document based on fine-grained, hierarchical topics automatically extracted from it. THC-DAT uses the hierarchical latent Dirichlet allocation method and takes context information into account so that it can reveal the relationships between latent topics and related texts in a document.

Design/methodology/approach

The methodology is a case study. The authors first reviewed the related literature, then followed a general “build and test” research model. After explaining the model, interface and functions of THC-DAT, the authors present a case study in which a scholarly paper is analyzed with the tool.

Findings

THC-DAT can organize and serve document topics and texts hierarchically and in context, which overcomes the drawbacks of traditional DATs. The navigation, browsing, search and comparison functions of THC-DAT enable users to read, search and analyze multi-topic documents efficiently and effectively.

Practical implications

THC-DAT can improve document organization and services in digital libraries or e-readers by helping users to interactively read, search and analyze documents efficiently and effectively, to learn about unfamiliar topics through exploration with little cognitive burden, or to deepen their understanding of a document.

Originality/value

This paper designs a tool, THC-DAT, that analyzes documents based on topic hierarchy and context (THC). It contributes to overcoming the coarse-grained analysis drawbacks of existing within-document analysis tools.

Acknowledgements

The authors gratefully acknowledge the financial support for this work provided by the National Natural Science Foundation of China (Nos 71303089, 71273195 and 71420107026) and the National Basic Research Program of China (973 Program, No. 904171200).

Citation

Chen, J., Wang, T.T. and Lu, Q. (2016), "THC-DAT: a document analysis tool based on topic hierarchy and context information", Library Hi Tech, Vol. 34 No. 1, pp. 64-86. https://doi.org/10.1108/LHT-07-2015-0074

Publisher

Emerald Group Publishing Limited

Copyright © 2016, Emerald Group Publishing Limited
