
Large-scale analysis of query logs to profile users for dataset search

Romina Sharifpour (School of Computing Technologies, RMIT University, Melbourne, Australia)
Mingfang Wu (ARDC, Caulfield East, Australia)
Xiuzhen Zhang (School of Computing Technologies, RMIT University, Melbourne, Australia)

Journal of Documentation

ISSN: 0022-0418

Article publication date: 27 April 2022

Issue publication date: 10 January 2023




Purpose

With an explosion of datasets available on the Web, dataset search has gained attention as an emerging research domain. Understanding users' dataset search behaviour is imperative for providing effective data discovery services. In this paper, the authors present a study of users' dataset search behaviour through the analysis of search logs from a research data discovery portal.


Design/methodology/approach

Using query- and session-based features, the authors apply cluster analysis to discover distinct user profiles with different search behaviours. One behavioural construct of particular interest is users' expertise, which the authors estimate by computing the semantic similarity between users' search queries and the titles of the metadata records in the displayed search results.
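As a rough illustration of this kind of pipeline, the sketch below scores a query against the titles of its displayed results and then clusters sessions on a small feature vector. It is not the authors' implementation: the bag-of-words cosine similarity stands in for whatever semantic similarity measure the study used, k-means stands in for the unspecified cluster analysis, and the function names, feature choice (expertise score, queries per session) and sample data are all hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens (assumption: a crude stand-in for a semantic model)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expertise_score(query, result_titles):
    """Mean similarity between one query and the titles displayed for it."""
    if not result_titles:
        return 0.0
    q = Counter(tokens(query))
    sims = [cosine(q, Counter(tokens(title))) for title in result_titles]
    return sum(sims) / len(sims)


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical sessions: (expertise score, number of queries in the session).
sessions = [(0.9, 2), (0.1, 10), (0.85, 3), (0.15, 9)]
print(kmeans(sessions, k=2))
print(expertise_score("soil moisture", ["Soil moisture grids", "Rainfall records"]))
```

In practice the features would need scaling before clustering, since the raw query count dominates the distance here, and the number of clusters would be chosen by a validity measure rather than fixed in advance.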


Findings

The findings reveal six distinct classes of user behaviour for dataset search, namely Expert Research, Expert Search, Expert Explore, Novice Research, Novice Search and Novice Explore.

Research limitations/implications

The user profiles in this study are derived from the search log of a single research data catalogue. Further research is needed to generalise the user profiles to other dataset search settings. Future research can take a confirmatory approach to verify these user groups and establish a deeper understanding of their information needs.

Practical implications

The findings in this paper have implications for designing search systems that tailor search results to the diverse information needs of different user groups.


Originality/value

The authors propose, for the first time, a taxonomy of users for dataset search based on their domain expertise and search behaviour.



The authors thank the Australian Research Data Commons for making its search log dataset available for the study, with special thanks to Mr. Joel Benn for extracting and helping to clean up the search log dataset.


Sharifpour, R., Wu, M. and Zhang, X. (2023), "Large-scale analysis of query logs to profile users for dataset search", Journal of Documentation, Vol. 79 No. 1, pp. 66-85.



Emerald Publishing Limited

Copyright © 2022, Emerald Publishing Limited
