
Facet-value extraction scheme from textual contents in XML data

Takahiro Komamizu (Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan)
Toshiyuki Amagasa (Faculty of Engineering, Information and Systems, University of Tsukuba, Tsukuba, Japan)
Hiroyuki Kitagawa (Faculty of Engineering, Information and Systems, University of Tsukuba, Tsukuba, Japan)

International Journal of Web Information Systems

ISSN: 1744-0084

Article publication date: 17 August 2015




The purpose of this paper is to extract appropriate terms that summarize the current search results from the contents of textual facets. Faceted search on XML data helps users find the information they need by presenting attribute–content pairs (called facet-value pairs) that describe the current search results. However, when most of the contents of a facet are long texts on average (such facets are called textual facets), it is difficult to get an overview of the current results.


The proposed approach is based on subsumption relationships between terms in the contents of a facet. Subsumption relationships can be extracted from co-occurrences of terms across a number of documents (in this paper, each content of a facet is treated as a document). Subsumption relationships form hierarchies, and the authors use these hierarchies to extract facet-values from textual facets. In the faceted search context, users often have ambiguous search demands and therefore expect broader terms. Thus, the authors extract high-level terms in the hierarchies as facet-values.
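The idea of deriving subsumption relationships from term co-occurrence and then keeping the high-level terms can be illustrated with a minimal sketch. This abstract does not state the exact criterion the authors use, so the rule below is an assumption: term x is taken to subsume term y when x appears in at least a threshold fraction of the documents containing y, while y does not appear in every document containing x. The function names, the whitespace tokenizer, and the default threshold of 0.8 are all hypothetical choices made for illustration, not the paper's method.

```python
from collections import defaultdict
from itertools import permutations


def doc_frequencies(docs):
    """Map each term to the set of document indices that contain it.

    Each facet content is treated as one document. Tokenization is a plain
    lowercase whitespace split (an assumption made for brevity).
    """
    postings = defaultdict(set)
    for i, text in enumerate(docs):
        for term in set(text.lower().split()):
            postings[term].add(i)
    return postings


def subsumption_pairs(docs, threshold=0.8):
    """Return (x, y) pairs where term x is judged to subsume term y.

    Assumed rule (not taken from the paper): P(x | y) >= threshold and
    P(y | x) < 1, with probabilities estimated from document co-occurrence.
    """
    postings = doc_frequencies(docs)
    pairs = set()
    for x, y in permutations(postings, 2):
        both = len(postings[x] & postings[y])
        p_x_given_y = both / len(postings[y])
        p_y_given_x = both / len(postings[x])
        if p_x_given_y >= threshold and p_y_given_x < 1.0:
            pairs.add((x, y))
    return pairs


def top_level_terms(docs, threshold=0.8):
    """Return terms that subsume others but are subsumed by no term.

    These are the roots of the subsumption hierarchies, used here as
    candidate facet-values.
    """
    pairs = subsumption_pairs(docs, threshold)
    parents = {x for x, _ in pairs}
    children = {y for _, y in pairs}
    return sorted(parents - children)
```

For example, given the four contents "xml query processing", "xml keyword search", "xml faceted search" and "xml query optimization", the term "query" subsumes "processing" and "optimization", while "xml" subsumes every other term and is the only root. A real system would also need stop-word removal and a limit on how many candidate values are shown, neither of which is covered here.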


The main finding of this paper is that the extracted terms improve users’ search experiences, especially when the search demands are ambiguous.


One original contribution of this paper is its use of the textual contents of XML data to improve users’ search experiences in faceted search. The other is the design of tasks for evaluating exploratory search such as faceted search.



This research was partly supported by the Grant-in-Aid for Scientific Research (B) (#26280037) and the program Research and Development on Real World Big Data Integration and Analysis of the Ministry of Education, Culture, Sports, Science and Technology, Japan.


Komamizu, T., Amagasa, T. and Kitagawa, H. (2015), "Facet-value extraction scheme from textual contents in XML data", International Journal of Web Information Systems, Vol. 11 No. 3, pp. 270-290.



Emerald Group Publishing Limited

Copyright © 2015, Emerald Group Publishing Limited
