The purpose of this paper is to extract appropriate terms that summarize the current search results in terms of the contents of textual facets. Faceted search on XML data helps users find necessary information by presenting attribute–content pairs (called facet-value pairs) about the current search results. However, if the contents of a facet are, on average, long texts (such facets are called textual facets), it is not easy to get an overview of the current results.
The proposed approach is based on subsumption relationships of terms among the contents of a facet. Subsumption relationships can be extracted using co-occurrences of terms across a number of documents (in this paper, each content of a facet is treated as a document). Subsumption relationships form hierarchies, and the authors use these hierarchies to extract facet-values from textual facets. In the faceted search context, users have ambiguous search demands and therefore expect broader terms. Thus, the authors extract high-level terms in the hierarchies as facet-values.
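As a rough illustration of the idea, the following sketch derives subsumption pairs from term co-occurrence and then picks the top-level terms of the resulting hierarchy. It is not the authors' implementation. It assumes one common co-occurrence-based formulation, in which a term x subsumes a term y when P(x | y) is at least some threshold and P(y | x) < 1. The threshold value of 0.8, the whitespace tokenization, and all function names are assumptions made for this example only.

```python
from collections import defaultdict
from itertools import permutations


def document_frequencies(docs):
    """Count documents per term and per ordered pair of co-occurring terms.

    Each facet content is treated as one document (as in the abstract).
    Tokenization by lowercased whitespace split is an assumption.
    """
    df = defaultdict(int)
    co = defaultdict(int)
    for doc in docs:
        terms = set(doc.lower().split())
        for term in terms:
            df[term] += 1
        for x, y in permutations(terms, 2):
            co[(x, y)] += 1
    return df, co


def subsumptions(docs, threshold=0.8):
    """Return (broader, narrower) pairs.

    Assumed rule: x subsumes y if P(x | y) >= threshold and P(y | x) < 1.
    The default threshold is illustrative, not taken from the paper.
    """
    df, co = document_frequencies(docs)
    pairs = set()
    for (x, y), n in co.items():
        p_x_given_y = n / df[y]
        p_y_given_x = n / df[x]
        if p_x_given_y >= threshold and p_y_given_x < 1.0:
            pairs.add((x, y))
    return pairs


def top_level_terms(pairs):
    """Terms that subsume at least one term and are subsumed by none."""
    broader = {x for x, _ in pairs}
    narrower = {y for _, y in pairs}
    return sorted(broader - narrower)
```

Calling `top_level_terms(subsumptions(contents))` on a list of facet contents yields candidate facet-values in this simplified setting. A realistic version would also need stop-word removal and a way to rank or limit the candidates.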
The main finding of this paper is that the extracted terms improve users’ search experiences, especially when the search demands are ambiguous.
One original contribution of this paper is the way it utilizes the textual contents of XML data to improve users’ search experiences in faceted search. The other is the design of tasks for evaluating exploratory search such as faceted search.
This research was partly supported by the Grant-in-Aid for Scientific Research (B) (#26280037) and the program Research and Development on Real World Big Data Integration and Analysis of the Ministry of Education, Culture, Sports, Science and Technology, Japan.
Komamizu, T., Amagasa, T. and Kitagawa, H. (2015), "Facet-value extraction scheme from textual contents in XML data", International Journal of Web Information Systems, Vol. 11 No. 3, pp. 270-290. https://doi.org/10.1108/IJWIS-04-2015-0012
Emerald Group Publishing Limited
Copyright © 2015, Emerald Group Publishing Limited