Web Search Engine Research: Volume 4

Purpose — The purpose of this chapter is to give an overview of the context of Web search and search engine-related research, as well as to introduce the reader to the sections and chapters of the book.

Methodology/approach — We review literature dealing with various aspects of search engines, with special emphasis on emerging areas of Web searching, search engine evaluation going beyond traditional methods, and new perspectives on Web searching.

Findings — The approaches to studying Web search engines are manifold. Given the importance of Web search engines for knowledge acquisition, research from these different perspectives needs to be integrated into a more cohesive whole.

Research limitations/implications — The chapter suggests a basis for research in the field and also introduces further research directions.

Originality/value of paper — The chapter gives a concise overview of the topics dealt with in the book and also shows directions for researchers interested in Web search engines.

Purpose — This chapter illustrates and explains the ambiguity and vagueness of the term social search and aims at describing and classifying the heterogeneous landscape of social search implementations on the WWW.

Methodology/approach — Through an extensive literature review, we examined different definitions and the context of social search, and tried to unify and enhance existing ideas and concepts. Our definition of social search is illustrated by a general review of existing social search engines, which are analyzed and described in terms of their specific features and social aspects.

Findings — The chapter presents a discussion of social search as well as a comparison of existing social search engines.

Social implications — The definition of social search and the comparison of social search engines summarize the many ways people can search the web together and allow for an assessment of future developments in this area.

Originality/value of paper — Although different attempts to define social search have been made in the past, we present an argument that unifies some existing definitions and differs from other interpretations of the social search concept. We present an overview and a comparison of the different genres of social search engines.

Purpose — To provide a theoretical background for understanding current local search engines as an aspect of specialized search, and to understand their data sources and the technologies they use.

Design/methodology/approach — Selected local search engines are examined and compared with respect to their use of geographic information retrieval (GIR) technologies, data sources, available entity information, processing, and interfaces. An introduction to the field of GIR is given, and its use in the selected systems is discussed.
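
Ranking geocoded entities by distance from the searcher is one elementary GIR building block behind local search. The following is a minimal sketch, assuming that entities have already been geocoded to latitude/longitude and that a plain radius filter with distance ordering is enough; the function names, entity names, and coordinates are hypothetical, and real engines may combine distance with many other signals.

```python
from math import asin, cos, radians, sin, sqrt
from typing import List, Sequence, Tuple

# Mean Earth radius in km; treating the Earth as a sphere is an approximation.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = radians(lat2 - lat1)
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    # min() guards against rounding pushing a slightly above 1.
    return 2 * EARTH_RADIUS_KM * asin(sqrt(min(1.0, a)))


def nearby(
    entities: Sequence[Tuple[str, float, float]],
    user_lat: float,
    user_lon: float,
    radius_km: float,
) -> List[Tuple[str, float]]:
    """Return (name, distance_km) for entities within radius_km, closest first."""
    hits = []
    for name, lat, lon in entities:
        distance = haversine_km(user_lat, user_lon, lat, lon)
        if distance <= radius_km:
            hits.append((name, distance))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical geocoded entities: (name, latitude, longitude).
places = [("cafe", 0.01, 0.0), ("museum", 0.05, 0.0), ("airport", 1.0, 0.0)]
print([name for name, _ in nearby(places, 0.0, 0.0, radius_km=10.0)])  # ['cafe', 'museum']
```

The airport, roughly 111 km away, falls outside the 10 km radius; the hard part in practice is the geocoding that produces these coordinates in the first place.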

Findings — All selected commercial local search engines use GIR technology to varying degrees for information preparation and presentation. GIR is also beginning to be used in regular Web search. However, there are major differences between the search engines.

Research limitations/implications — This study is not exhaustive and uses only informal comparisons without a definitive ranking. Because hard data were unavailable, informed guesses were made based on the public interfaces and the available literature.

Practical implications — A source of background information for understanding the results of local search engines, their provenance, and their potential.

Originality/value — An overview of GIR technology in the context of commercial search engines integrates research efforts and commercial systems and helps to understand both sides better.

Purpose — The chapter presents the practical applications of web search statistics analysis. The process description highlights how search queries and their statistics could be used in various forecasting situations. The presented case is an example of applied computational intelligence, and its main focus is the decision support offered by the software mechanism and its capabilities to automatically gather, process and analyse data.

Methodology/approach — The statistics of the search queries as a source of prognostic information are analysed in a step-by-step process, starting from their content and scope, their processing and applications, and concluding with usage in a software-based intelligent framework.
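
The chapter does not spell out a statistical model here, so the following is only a generic illustration and not WebPerceiver's method. It is a minimal sketch that checks whether query volume leads a target series by computing the Pearson correlation at several time lags; the function names and the synthetic series are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


def lagged_correlation(queries: Sequence[float], target: Sequence[float], lag: int) -> float:
    """Correlate query volume at time t with the target series at time t + lag."""
    if len(queries) != len(target):
        raise ValueError("series must be aligned and of equal length")
    if not 0 <= lag <= len(queries) - 2:
        raise ValueError("lag must leave at least two overlapping points")
    return pearson(queries[: len(queries) - lag], target[lag:])


# Synthetic toy data: the target simply repeats query volume one period later.
query_volume = [10, 12, 15, 11, 18, 20, 17]
target_series = [5, 10, 12, 15, 11, 18, 20]
best_lag = max(range(4), key=lambda lag: lagged_correlation(query_volume, target_series, lag))
print(best_lag)  # 1
```

A strong lagged correlation in historical data is necessary but not sufficient for a useful forecast: query series can correlate spuriously with unrelated targets, so out-of-sample validation would still be needed.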

Research implications — The analysis of search engine trends offers a great opportunity for many areas of research. In the future, using this information for forecasting will further advance intelligent data processing.

Practical implications — This functionality offers a unique, previously unavailable possibility to observe, estimate and predict various processes using broad, precise and accurate observations of behaviour. The scope and quality of the data allow practitioners to use it successfully in various forecasting problems (e.g. political, medical, or economic).

Originality/value of paper — The chapter presents practical applications of the technology and highlights potential areas that would benefit from the analysis of query statistics. Moreover, it introduces ‘WebPerceiver’, an intelligent platform built to make the analysis and use of search trends easier and more widely available to a broad audience, including non-expert users.

Purpose — The overall quality of an information retrieval system depends on many different aspects of the system and its users' information-seeking behaviour, such as the speed of the system, the user interface, the query language and the features provided by the engine. One of the most important aspects is the effectiveness of the retrieval system, i.e. its ability to retrieve items that are relevant to the information need of an end user. This chapter focuses on methods for measuring effectiveness, in particular on recent work that more directly models the utility of an engine to its users.

Methodology/approach — We discuss traditional approaches to effectiveness evaluation based on test collections, then transition to approaches based on test collections along with explicit models of user interaction with search results. We contrast this with approaches for which the user is ‘in the loop’, such as user studies and online evaluations.
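
Rank-biased precision (RBP) is one example of a metric that builds an explicit user model into test-collection evaluation; it is offered here only as an illustration, since the chapter's survey is not limited to it. The minimal sketch below assumes per-rank relevance values in [0, 1] and a user who always reads the first result and then continues to each next result with a fixed persistence probability p; the function name and the default p = 0.8 are our assumptions.

```python
from typing import Sequence


def rank_biased_precision(relevances: Sequence[float], p: float = 0.8) -> float:
    """Rank-biased precision (RBP) of a ranked list.

    relevances[i] is the relevance (0..1) of the result at rank i + 1. User
    model: the searcher always views rank 1 and moves on to the next result
    with persistence probability p, so rank k is viewed with probability p**(k-1).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("persistence p must lie strictly between 0 and 1")
    return (1.0 - p) * sum(rel * p ** i for i, rel in enumerate(relevances))


# An impatient user (p = 0.5) versus a patient one (p = 0.9) on the same ranking.
ranking = [1, 0, 1, 0, 0]
print(rank_biased_precision(ranking, p=0.5))  # 0.625
print(rank_biased_precision(ranking, p=0.9))
```

A larger p models a more patient user, so relevant results deeper in the list contribute relatively more to the score; the (1 - p) factor keeps the score between 0 and 1.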

Research limitations/implications — If it were possible to model users perfectly, we could directly estimate the utility of a search engine to its users; this would undoubtedly have a transformative effect on information retrieval and web search research. In practice, this goal will never be achievable, because users exhibit far too much variability in how they approach the search engine and provide valuable feedback that models and simulations cannot. Nevertheless, better models of user interaction will help develop better web search engines for a wider variety of tasks more rapidly.

Originality/value of paper — This is the first survey of recent work on user model-based evaluation that places it in the context of traditional evaluation based on the Cranfield paradigm.

Purpose — For some years now, we have been confronted with the phenomenon of information overload. In particular, the web provides a rich source of a variety of information, mainly in textual, i.e. unstructured, form. Web search therefore faces new challenges: how to make users aware of the variety of content available, and how best to satisfy them with such manifold content.

Methodology — This variety of content is referred to as diversity, i.e. the extent to which a result set covers the multiple interpretations of a query. On the one hand, diversification in web search aims at adapting the ranking so that the top results are diverse. On the other hand, the organization and classification of content is becoming increasingly important within diversification.

Findings — Various approaches to diversification are already available or are the focus of current research. They range from adapted rankings based on similarity measures or diversity scores to comprehensive diversity analyses that determine topics and classify text according to opinions and other criteria.
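
An adapted ranking based on similarity measures can be illustrated with maximal marginal relevance (MMR), one well-known greedy re-ranking scheme; it is our choice of example, not necessarily one the chapter presents in this form. The sketch assumes precomputed relevance scores and a pairwise similarity function in [0, 1]; the document IDs, topic labels, scores, and the trade-off parameter lambda_ are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def mmr_rerank(
    doc_ids: Sequence[str],
    relevance: Dict[str, float],
    similarity: Callable[[str, str], float],
    k: int,
    lambda_: float = 0.5,
) -> List[str]:
    """Greedy maximal marginal relevance (MMR) re-ranking.

    Each step picks the candidate maximising
        lambda_ * relevance - (1 - lambda_) * (max similarity to already-picked results),
    so lambda_ = 1 reproduces the plain relevance ranking.
    """
    selected: List[str] = []
    candidates = list(doc_ids)

    def marginal_score(doc: str) -> float:
        # Redundancy = similarity to the closest result already selected.
        redundancy = max((similarity(doc, s) for s in selected), default=0.0)
        return lambda_ * relevance[doc] - (1.0 - lambda_) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=marginal_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical results for the ambiguous query "jaguar": two car pages, one animal page.
topics = {"car1": "car", "car2": "car", "cat1": "animal"}
scores = {"car1": 0.9, "car2": 0.85, "cat1": 0.7}


def same_topic(a: str, b: str) -> float:
    return 1.0 if topics[a] == topics[b] else 0.0


print(mmr_rerank(["car1", "car2", "cat1"], scores, same_topic, k=2))  # ['car1', 'cat1']
```

A pure relevance ranking would show two car pages in the top two; MMR trades a little relevance for covering the second interpretation of the query.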

Implications — Given the high diversity of web content, approaches to diversification are extremely important. Web search tries to address this problem from different perspectives. In the future, combining these approaches with image search result diversification will be important. Further, benchmarks and standard data sets for evaluation need to be established to ensure that results from different approaches are comparable.

Originality/value — This chapter provides an overview of diversity in web search from two directions: (a) diversity is introduced with its notions and dimensions; (b) methods to assess diversity within web search are presented.

Purpose — To develop methodologies to evaluate search engines according to an individual's preference in an easy and reliable manner, and to formulate user-oriented metrics to compare freshness and duplication in search results.

Design/methodology/approach — A personalised evaluation model for comparing search engines is designed as a hierarchy of weighted parameters. These commonly found search engine features and performance measures are given quantitative and qualitative ratings by an individual user. Furthermore, three performance measurement metrics are formulated and presented as histograms for visual inspection. A methodology is introduced to quantitatively compare and recognise the different histogram patterns within the context of search engine performance.
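
A hierarchy of weighted parameters can be scored by normalising the weights of sibling parameters and aggregating the user's ratings bottom-up as weighted averages. The sketch below is one plausible reading of such a model, not the chapter's exact formulation; the class and function names, the criterion names, the 0-10 rating scale, and the weights are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Criterion:
    """One node in a (hypothetical) hierarchy of weighted evaluation parameters."""
    name: str
    weight: float                   # user-assigned importance relative to siblings
    rating: Optional[float] = None  # user rating for leaf criteria, assumed on a 0-10 scale
    children: List["Criterion"] = field(default_factory=list)


def overall_score(node: Criterion) -> float:
    """Aggregate ratings bottom-up; sibling weights are normalised to sum to 1."""
    if not node.children:
        if node.rating is None:
            raise ValueError(f"leaf criterion {node.name!r} has no rating")
        return node.rating
    total_weight = sum(child.weight for child in node.children)
    if total_weight <= 0:
        raise ValueError(f"children of {node.name!r} need a positive total weight")
    return sum(child.weight / total_weight * overall_score(child) for child in node.children)


# One user's weights and ratings for one engine (all values hypothetical).
engine_a = Criterion("engine A", 1.0, children=[
    Criterion("performance", 3.0, children=[
        Criterion("precision", 2.0, rating=8.0),
        Criterion("freshness", 1.0, rating=5.0),
    ]),
    Criterion("interface", 1.0, children=[Criterion("ease of use", 1.0, rating=9.0)]),
])
print(round(overall_score(engine_a), 2))  # 7.5
```

Re-personalising the comparison only requires changing weights or adding and removing criteria; scoring several engines with the same hierarchy and weights yields directly comparable numbers.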

Findings — Precision and recall are the fundamental measures used in many search engine evaluations due to their simplicity, fairness and reliability. Most recent evaluation models are user oriented and focus on relevance issues. Identifiable statistical patterns are found in performance measures of search engines.
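
For reference, set-based precision and recall, plus the precision-at-k variant often used for ranked Web results, can be computed as follows. This is a minimal sketch with hypothetical document IDs and relevance judgements.

```python
from typing import Iterable, Sequence, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Set-based precision and recall.

    precision = |retrieved intersect relevant| / |retrieved|
    recall    = |retrieved intersect relevant| / |relevant|
    """
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def precision_at_k(ranked: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """Fraction of the top k ranked results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant_set = set(relevant)
    return sum(1 for doc in ranked[:k] if doc in relevant_set) / k


results = ["d1", "d2", "d3", "d4"]
judged_relevant = {"d1", "d3", "d7"}
print(precision_recall(results, judged_relevant))
print(precision_at_k(results, judged_relevant, k=2))  # 0.5
```

On the Web the complete set of relevant documents is unknown, so recall is usually approximated, for instance by pooling the results of several engines and measuring relative recall.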

Research limitations/implications — The specific parameters used in the evaluation model could be further refined. A larger scale user study would confirm the validity and usefulness of the model. The three performance measures presented give a reasonably informative overview of the characteristics of a search engine. However, additional performance parameters and their resulting statistical patterns would make the methodology more valuable to the users.

Practical implications — The easy-to-use personalised search engine evaluation model can be tailored to an individual's preference and needs simply by changing the weights and modifying the features considered. A user is able to get an idea of the characteristics of a search engine quickly using the quantitative measure of histogram patterns that represent the search performance metrics introduced.

Originality/value — The presented work is considered original as one of the first search engine evaluation models that can be personalised. This enables Web searchers to choose an appropriate search engine for their needs and hence to find the right information in the shortest time with the least effort.

Purpose — Ranking is a natural task for a search engine; a search engine result page is the most common example. This chapter aims to illustrate the motivations for and concepts of rank correlation in a practical way for researchers active in the different domains of search engines.

Methodology/approach — To this end, the chapter provides a topic-oriented survey of the search engine evaluation literature specifically devoted to or based on rank correlation, and explains and illustrates how statistics is the only approach to rank correlation.

Findings/research limitations/implications — The chapter introduces the pros and cons of rank correlation measures through a lightweight formal description and a number of concrete examples, to help find the measure that best fits a given context.
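
Kendall's tau is one of the classic rank correlation measures. The minimal sketch below computes tau-a for two tie-free rankings of the same items by counting concordant and discordant pairs in O(n^2) time; the choice of tau-a, the function name, and the example rankings are our assumptions.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(ranking_a: Sequence[str], ranking_b: Sequence[str]) -> float:
    """Kendall's tau-a between two tie-free rankings (best first) of the same items.

    tau = (concordant pairs - discordant pairs) / (n * (n - 1) / 2),
    ranging from +1 (identical order) to -1 (reversed order).
    """
    n = len(ranking_a)
    if n != len(ranking_b) or set(ranking_a) != set(ranking_b) or len(set(ranking_a)) != n:
        raise ValueError("rankings must be permutations of the same distinct items")
    if n < 2:
        raise ValueError("need at least two items")
    position_in_b = {item: i for i, item in enumerate(ranking_b)}
    concordant = discordant = 0
    # combinations() keeps ranking_a's order, so x is ranked above y in ranking_a.
    for x, y in combinations(ranking_a, 2):
        if position_in_b[x] < position_in_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Two hypothetical engines ranking the same four pages; only b and c are swapped.
engine_1 = ["a", "b", "c", "d"]
engine_2 = ["a", "c", "b", "d"]
print(kendall_tau(engine_1, engine_2))  # (5 - 1) / 6, about 0.667
```

Tau weighs a swap at ranks 1 and 2 the same as one at ranks 99 and 100, which is one reason top-weighted variants are often considered for comparing result pages. For data with ties, `scipy.stats.kendalltau` computes the tie-adjusted tau-b.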

Practical implications — This chapter provides a blueprint for the application of rank correlation within scientific experimentation or item/service recommendation.

Social implications — Rank correlation analyses bear on the success or failure of a search engine in performing the tasks for which it was designed, and hence on people's daily activities.

Originality/value of paper — This chapter places rank correlation within a scientific research perspective and in particular connects to and complements documentation on search engine evaluation.

Purpose — We assert that researchers developing new web interaction tools should consider an array of user motives beyond query-based information retrieval. This chapter reports on two probes used to investigate user activities that go beyond search as traditionally conceived.

Design/methodology — This chapter reviews research on user experiences with search engines and general web use. It then describes the design of cards and pebbles, two search engine-based probes developed to help elicit new concepts for web-based experiences, together with a case study of their use.

Findings — Participants reflect on their experiences with the probes and offer ideas regarding how to incrementally shift the traditional search paradigm and conceive of the web in new ways.

Implications/value — This investigation serves as a starting point by offering criteria that should be considered when designing new ‘beyond search’ tools.

Purpose — To provide an overview of recent research that examined how search engine users evaluate and select Web search results and how alternative search engine interfaces can support Web users' credibility assessment of Web search results.

Design/methodology/approach — As theoretical background, Information Foraging Theory (Pirolli, 2007; Pirolli & Card, 1999) from cognitive science and Prominence-Interpretation Theory (Fogg, 2003) from communication and persuasion research are presented. Furthermore, a range of recent empirical studies that investigated the effects of alternative search engine results page (SERP) layouts on searchers' assessments of the information quality or credibility of search results are reviewed, and approaches that aim to automatically classify search results into specific genre categories are reported.

Findings — The chapter reports on findings that Web users often rely heavily on the ranking provided by the search engines without paying much attention to the reliability or trustworthiness of the Web pages. Furthermore, the chapter outlines how alternative search engine interfaces that display search results in a format different from a list and/or provide prominent quality-related cues in the SERPs can foster searchers' credibility evaluations.

Research limitations/implications — The reported empirical studies, search engine interfaces, and Web page classification systems do not constitute an exhaustive list.

Originality/value — The chapter provides insights for researchers, search engine developers, educators, and students on how the development and use of alternative search engine interfaces might affect Web users' search and evaluation strategies during Web search as well as their search outcomes in terms of retrieving high-quality, credible information.

Purpose — The purpose of this discussion is, first, to review the concept of truth claim and how it forms the framework for four research traditions: science, social science, law, and judgments of excellence. Then, the operational mechanisms of networks are reviewed. The discussion concludes by introducing three philosophic perspectives that might deepen the meanings nascent in the concept of “search.”

Methodology/approach — The methodology includes a historical approach to outline brief but sufficient definitions for how truth claims are built in four established research traditions. Each tradition is then analyzed with a view to testing its methods. The tests suggest a number of pathways to reframe search engine results in order to evaluate their relationship to the previously established types of truth claims.

Findings — The findings constitute an outline of the research traditions in the four areas of science, social science, law, and judgments of excellence. These are followed by a review of the current configurations of networks, their infrastructures, and their capabilities, including a brief section on the importance of search engine mechanisms. Crawling, indexing, and then ranking form the operational mechanisms that search engines employ in delivering search results. It is clear that each operation introduces logical problems. Then, the final sections outline three widely ranging philosophic perspectives on the nature of search: (1) an aesthetic theory of indexing, (2) understanding search from the psychology of learning, and (3) an exploration of the relationship between performativity and recent economic models of how data accumulates in today's world.
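
The indexing and ranking stages can be made concrete with a toy inverted index. This is a minimal sketch under loud simplifications: crawling is replaced by an in-memory dictionary of pages, and ranking by summed term frequency stands in for the far richer signals real engines use. All function names and documents are hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case alphanumeric tokens; deliberately naive."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Indexing step: map each term to {doc_id: term frequency}."""
    index: Dict[str, Dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return dict(index)


def rank(index: Dict[str, Dict[str, int]], query: str) -> List[Tuple[str, int]]:
    """Ranking step: score documents by summed frequency of the query terms."""
    scores: Counter = Counter()
    for term in tokenize(query):
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    # Sort by descending score, breaking ties by doc_id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Stand-in for crawled pages (hypothetical content).
pages = {
    "d1": "Search engines crawl the Web",
    "d2": "Engines index and rank Web pages for Web search",
    "d3": "Truth claims in law",
}
print(rank(build_index(pages), "web search"))  # [('d2', 3), ('d1', 2)]
```

Even this toy shows where design choices enter each stage: which pages are collected, what the tokenizer counts as a term, and what the scoring function treats as relevant.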

Research implications — It is suggested that exploration of a deeper philosophical perspective will assist library and information science (LIS) scholars to reframe Web search in ways that allow linkages to the established research traditions.

Originality/value of the paper — The idea of testing the “truth claim” as connected to traditional research methods was presented initially by Rall (2002, 2004). This area has been neglected in the literature, as many Internet scholars find that the philosophy of research methodologies remains outside their knowledge base. Overall, LIS scholars have focused on information seekers and on the politics of search engines, and have documented the computational problems present in search engine results. Considering how truth claims are formed and subsequently tested will allow LIS researchers to explore the linkages between their current studies and the established frameworks of scholarly research.

Dirk Ahlers studied computer science at the Carl-von-Ossietzky-University Oldenburg, Germany. In 2005, he started working as a research assistant at the OFFIS Institute for Information Technology, Oldenburg, where he conducted projects in mobility and geospatial retrieval. While working at OFFIS, he also pursued his PhD on a topic in geographic information retrieval. He is currently working at UNITEC, a private university in Tegucigalpa, Honduras, where he researches the potential of local search in a country with little Web coverage, challenging informal address schemes, and uncertain location data. His research interests are geospatial Web information retrieval, search engines, location-based services, Web technology, mobility, and everything geo. E-mail: dirk@dhere.de

