Search results
Xiaojuan Zhang, Shuguang Han and Wei Lu
Abstract
Purpose
The purpose of this paper is to predict news intent by exploring contextual and temporal features directly mined from a general search engine query log.
Design/methodology/approach
First, a ground-truth data set with correctly marked news and non-news queries was built. Second, a detailed analysis of the search goals and topic distribution of news/non-news queries was conducted. Third, three new features were obtained: the relationship between entity and contextual words extended from query sessions, the topical similarity among clicked results, and the temporal burst point. Finally, to understand the utility of the new features and prior features, extensive prediction experiments on SogouQ (a Chinese search engine query log) were conducted.
Findings
News intent can be predicted with high accuracy by using the proposed contextual and temporal features, and the macro average F1 of classification is around 0.8677. Contextual features are more effective than temporal features. All three new features are useful and significant in improving the accuracy of news intent prediction.
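For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a macro average F1 is computed for a two-class news/non-news task. The labels and predictions are made-up illustrations, not data from the paper, and the function name `macro_f1` is hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Return the unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Illustrative labels only (assumption): three news and three non-news queries.
y_true = ["news", "news", "news", "other", "other", "other"]
y_pred = ["news", "news", "other", "other", "other", "news"]
print(macro_f1(y_true, y_pred))
```

Because each class contributes equally to the mean, the macro average does not let a large non-news class mask poor performance on the news class.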
Originality/value
This paper provides a new perspective on recognizing queries with news intent without the use of large corpora such as social media (e.g. Wikipedia, Twitter and blogs) and news data sets. The research will be helpful for general-purpose search engines to address search intents for news events. In addition, the authors believe that the approaches described in this paper are general enough to apply to other verticals with dynamic content and interest, such as blog or financial data.
Ashish Kathuria, Bernard J. Jansen, Carolyn Hafernik and Amanda Spink
Abstract
Purpose
Web search engines are frequently used by people to locate information on the Internet. However, not all queries have an informational goal. Instead of information, some people may be looking for specific web sites or may wish to conduct transactions with web services. This paper aims to focus on automatically classifying the different user intents behind web queries.
Design/methodology/approach
For the research reported in this paper, 130,000 web search engine queries are categorized as informational, navigational, or transactional using a k‐means clustering approach based on a variety of query traits.
Findings
The research findings show that more than 75 percent of web queries (clustered into eight classifications) are informational in nature, with about 12 percent each for navigational and transactional. Results also show that web queries fall into eight clusters, six primarily informational, and one each of primarily transactional and navigational.
Research limitations/implications
This study provides an important contribution to web search literature because it provides information about the goals of searchers and a method for automatically classifying the intents of the user queries. Automatic classification of user intent can lead to improved web search engines by tailoring results to specific user needs.
Practical implications
The paper discusses how web search engines can use automatically classified user queries to provide more targeted and relevant results in web searching by implementing a real time classification method as presented in this research.
Originality/value
This research investigates a new application of a method for automatically classifying the intent of user queries. There has been limited research to date on automatically classifying the user intent of web queries, even though the pay‐off for web search engines can be quite beneficial.
Xiaojuan Zhang, Xixi Jiang and Jiewen Qin
Abstract
Purpose
The purpose of this study is to generate diversified results for temporally ambiguous queries, ensuring that the candidate queries have a high coverage of subtopics derived from different temporal periods.
Design/methodology/approach
Two novel time-aware query suggestion diversification models are developed by integrating the semantic and temporal information in queries into two state-of-the-art explicit diversification algorithms (i.e. IA-select and xQuaD), respectively, and then specifying the components on which these two models rely. Most importantly, the study first explores how to explicitly determine query subtopics for each unique query from the query log or clicked documents, and then how to model those subtopics in query suggestion diversification. A discussion of how to mine the temporal intent behind a query from the query log follows. Finally, to verify the effectiveness of the proposal, experiments on a real-world query log are conducted.
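As background, the sketch below shows the plain xQuaD greedy selection step that the abstract names as one of its two base algorithms. It is not the authors' time-aware model: the function name `xquad_select`, the candidate suggestions, the two time-period subtopics and all scores are illustrative assumptions.

```python
def xquad_select(candidates, relevance, subtopic_weight, coverage, k, lam=0.5):
    """Greedily pick k candidates, trading off relevance against novelty.

    relevance[c]        -- relevance of candidate c to the query
    subtopic_weight[s]  -- importance of subtopic s for the query
    coverage[c][s]      -- how well candidate c covers subtopic s
    lam                 -- weight of the diversity term (0 = relevance only)
    """
    selected = []
    remaining = list(candidates)
    # not_covered[s]: product of (1 - coverage) over already selected items.
    not_covered = {s: 1.0 for s in subtopic_weight}

    def score(c):
        diversity = sum(
            subtopic_weight[s] * coverage[c].get(s, 0.0) * not_covered[s]
            for s in subtopic_weight
        )
        return (1 - lam) * relevance[c] + lam * diversity

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
        for s in subtopic_weight:
            not_covered[s] *= 1.0 - coverage[best].get(s, 0.0)
    return selected


# Hypothetical suggestions for a temporally ambiguous query (assumption).
candidates = ["world cup 2014", "world cup 2014 final", "world cup 2018"]
relevance = {"world cup 2014": 0.9, "world cup 2014 final": 0.8, "world cup 2018": 0.6}
subtopic_weight = {"2014": 0.5, "2018": 0.5}
coverage = {
    "world cup 2014": {"2014": 1.0},
    "world cup 2014 final": {"2014": 1.0},
    "world cup 2018": {"2018": 1.0},
}
print(xquad_select(candidates, relevance, subtopic_weight, coverage, k=2))
```

With `lam=0.5` the second pick favors the suggestion for the uncovered period over the more relevant but redundant one; with `lam=0.0` the selection falls back to pure relevance ordering.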
Findings
Preliminary experiments demonstrate that the proposed method can significantly outperform the existing state-of-the-art methods in terms of producing the candidate query suggestion for temporally ambiguous queries.
Originality/value
This study reports the first attempt to generate query suggestions that indicate the diverse time points of interest for temporally ambiguous (input) queries. The research will be useful in enhancing users’ search experience by helping them formulate accurate queries for their search tasks. In addition, the approaches investigated in the paper are general enough to be used in many domains, namely experimental information retrieval systems, Web search engines, document archives and digital libraries.
Abstract
Purpose
For several years, we have been confronted with the phenomenon of information overload. In particular, the web provides a rich source of a variety of information, mainly in textual, i.e. unstructured, form. Thus, web search faces new challenges: how to make the user aware of the variety of content available and how best to satisfy users with such manifold content.
Methodology
This variety of content is considered as diversity, i.e. the reflection of a result set's coverage of multiple interpretations of a query. On the one hand, diversification within web search aims at adapting the ranking so that the top results are diverse. On the other hand, the organization and classification of content within diversification is becoming increasingly important.
Findings
Various approaches to diversification are available or are currently the focus of research activities. They range from an adapted ranking by means of similarity measures or diversity scores to a comprehensive diversity analysis that determines topics and classifies text according to opinions etc.
Implications
Given the high diversity of web content, approaches for diversification are extremely important. Web search tries to address this problem from different perspectives. In the future, combination with image search result diversification will be important. Further, benchmarks and standard data sets for evaluations need to be established to ensure comparability of results from various approaches.
Originality/value
This chapter provides an overview of diversity in web search from two directions: (a) diversity is introduced with its notions and dimensions; (b) methods to assess diversity within web search are presented.
Jana Besser, Martha Larson and Katja Hofmann
Abstract
Purpose
This research aims to identify users' goals and strategies when searching for podcasts and their impact on the design of podcast retrieval technology. In particular, the paper seeks to explore the potential to address user goals with indexing based on podcast metadata and automatic speech recognition (ASR) transcripts.
Design/methodology/approach
The paper conducted a user study to obtain an overview of podcast search behaviour and goals, using a multi‐method approach of an online survey, a diary study, and contextual interviews. In a subsequent podcast retrieval experiment, the paper investigated the retrieval performance of the two choices of indexing features for search goals identified during the study.
Findings
The paper found that study participants used a variety of search strategies, partially influenced by available tools and their perceptions of these tools. Furthermore the experimental results revealed that retrieval using ASR transcripts performed significantly better than metadata‐based searching. However, a detailed result analysis suggested that the efficacy of the indexing methods was search‐goal dependent.
Research limitations/implications
The research constitutes a step towards a future framework for investigating user needs and addressing them in an experimental set‐up. It was primarily qualitative and exploratory in nature.
Practical implications
Podcast search engines require evidence about suitable indexing methods in order to make an informed decision concerning whether it is worth the resources to generate speech recognition transcripts.
Originality/value
Systematic studies of podcast searching have not previously been reported. Investigations of this kind hold the potential to optimise podcast retrieval in the long term.
Abstract
Purpose
The purpose of this paper is to test major web search engines on their performance on navigational queries, i.e. searches for homepages.
Design/methodology/approach
In total, 100 user queries are posed to six search engines (Google, Yahoo!, MSN, Ask, Seekport, and Exalead). Users described the desired pages, and the result positions of these pages were recorded. Measured success and mean reciprocal rank are calculated.
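The two measures mentioned above can be sketched as follows, assuming each query has exactly one desired page and that "success" means finding it within the top k results. The cut-off k, the function names and the rank values are illustrative assumptions, not figures from the study.

```python
def mean_reciprocal_rank(ranks):
    """ranks: 1-based position of the desired page per query, None if missing."""
    if not ranks:
        raise ValueError("at least one query is required")
    return sum(1.0 / r if r is not None else 0.0 for r in ranks) / len(ranks)


def success_at_k(ranks, k):
    """Share of queries whose desired page appears within the top k results."""
    if not ranks:
        raise ValueError("at least one query is required")
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


# Hypothetical positions of the desired homepage for four queries (assumption).
ranks = [1, 2, None, 1]
print(mean_reciprocal_rank(ranks))  # (1 + 0.5 + 0 + 1) / 4 = 0.625
print(success_at_k(ranks, k=1))     # 2 of 4 queries answered at rank 1 = 0.5
```

Mean reciprocal rank suits navigational queries because only the position of the single correct result matters.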
Findings
The performance of the major search engines Google, Yahoo!, and MSN was found to be the best, with around 90 per cent of queries answered correctly. Ask and Exalead performed worse but received good scores as well.
Research limitations/implications
All queries were in German, and the German‐language interfaces of the search engines were used. Therefore, the results are only valid for German queries.
Practical implications
When designing a search engine to compete with the major search engines, attention should be paid to performance on navigational queries. Users' quality ratings of search engines can easily be influenced by this performance.
Originality/value
This study systematically compares the major search engines on navigational queries and compares the findings with studies on the retrieval effectiveness of the engines on informational queries.
Abstract
Purpose
Search engines and web applications have evolved to be more tailored toward individual users’ needs, including the individual’s personal preferences and geographic location. By integrating the free Google Maps Application Program Interface with locally stored metadata, the author created an interactive map search for users to locate, and navigate to, destinations on the University of New Mexico (UNM) campus. The purpose of this paper is to identify the characteristics of UNM map search queries, the options and prioritization of the metadata augmentation, and the usefulness and possible improvement of the interface.
Design/methodology/approach
Queries, search date/time, and the number of results found were logged and examined. Queries’ search frequency and characteristics were analyzed and categorized.
Findings
From November 1, 2012 to September 15, 2013, the SearchUNM Maps page (http://search.unm.edu/maps/) received a total of 14,097 visits. There were a total of 5,868 searches (41 percent of all the page visits), and 2,297 of these search instances (39 percent) did not retrieve any results. By analyzing the failed queries, the author was able to develop a strategy to increase successful searches.
Originality/value
Many academic institutions have implemented interactive map searches for users to find locations and navigate on campus. However, to date there is no related research on how users conduct their searches in such a scope. Based on the query analysis, this paper identifies users’ search behavior and discusses strategies for improving the search results of campus interactive maps.
Ahmet Uyar and Farouk Musa Aliyu
Abstract
Purpose
The purpose of this paper is to better understand three main aspects of the semantic web search engines Google Knowledge Graph and Bing Satori. The authors investigated: coverage of entity types, the extent of their support for list search services and the capabilities of their natural language query interfaces.
Design/methodology/approach
The authors manually submitted selected queries to these two semantic web search engines and evaluated the returned results. To test the coverage of entity types, the authors selected the entity types from the Freebase database. To test the capabilities of natural language query interfaces, the authors used a manually developed query data set about US geography.
Findings
The results indicate that both semantic search engines cover only the very common entity types. In addition, the list search service is provided for a small percentage of entity types. Moreover, both search engines support queries with very limited complexity and with a limited set of recognised terms.
Research limitations/implications
Both companies are continually working to improve their semantic web search engines. Therefore, the findings show their capabilities at the time of conducting this research.
Practical implications
The results show that in the near future the authors can expect both semantic search engines to expand their entity databases and improve their natural language interfaces.
Originality/value
As far as the authors know, this is the first study evaluating any aspect of newly developing semantic web search engines. It shows the current capabilities and limitations of these semantic web search engines. It provides directions to researchers by pointing out the main problems for semantic web search engines.
Abstract
Purpose
Users' search logs are implicit feedback on how searchers interact with online information retrieval (IR) systems. The purpose of this paper is to analyze search query reformulation (SQR) patterns of University of Dar es Salaam remote OPAC users.
Design/methodology/approach
Qualitative and quantitative analyses of transaction logs were employed to ascertain the characteristics of search queries and the patterns in which remote OPAC users reformulate their search queries. The study covered a period of six months, from January to June 2019.
Findings
A total of 30,474 search hits were submitted by remote OPAC users during the period under study. Individuals from academic and research institutions, computing consortia, and telecommunication companies are the main users of the system. Most of the searches originated from North America and Europe, with few searches coming from China and India. Besides improving search results, SQRs are linked with the existence of multiple information demands as manifested by the use of heterogeneous headwords within individual search episodes.
Research limitations/implications
The data collected covered only six months. In addition, it was not possible to analyze users' search query formulation within specific contexts such as task-based information searching.
Practical implications
A query recommendation system should be integrated into the OPAC functionalities to improve users' search experiences. Alternatively, there should be a migration to a new system that offers more advanced search features and functionalities.
Originality/value
The study has contributed new insights to SQR studies, particularly on how non-institutionally affiliated users translate their information needs into search queries during information searching processes.
Peer review
The peer review history for this article is available at: https://publons.com/publon/10.1108/OIR-09-2020-0389
Abstract
Purpose
Web search is increasingly moving into mobile contexts. However, the screen size of mobile devices is limited, and search engine result pages face a trade-off between offering informative snippets and optimal use of space. One factor clearly influencing this trade-off is snippet length. The purpose of this paper is to find out what snippet size to use in mobile web search.
Design/methodology/approach
For this purpose, an eye-tracking experiment was conducted showing participants search interfaces with snippets of one, three or five lines on a mobile device to analyze 17 dependent variables. In total, 31 participants took part in the study. Each of the participants solved informational and navigational tasks.
Findings
Results indicate a strong influence of page fold on scrolling behavior and attention distribution across search results. Regardless of query type, short snippets seem to provide too little information about the result, so that search performance and subjective measures are negatively affected. Long snippets of five lines lead to better performance than medium snippets for navigational queries, but to worse performance for informational queries.
Originality/value
Although space in mobile search is limited, this study shows that longer snippets improve usability and user experience. It further emphasizes that page fold plays a stronger role in mobile than in desktop search for attention distribution.