Search results
Isak Taksa, Sarah Zelikovitz and Amanda Spink
Abstract
Purpose
The work presented in this paper aims to provide an approach to classifying web logs by personal properties of users.
Design/methodology/approach
The authors describe an iterative system that begins with a small set of manually labeled terms, which are used to label queries from the log. A set of background knowledge related to these labeled queries is acquired by combining web search results on these queries. This background set is used to obtain many terms that are related to the classification task. The system then ranks each of the related terms, choosing those that most fit the personal properties of the users. These terms are then used to begin the next iteration.
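The iterative loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' system: the function names, the `fetch_background` callback (standing in for combined web search results) and the frequency-based ranking are all assumptions made for the example.

```python
from collections import Counter


def label_queries(queries, terms):
    """Return the queries that contain at least one labeled term."""
    return [q for q in queries if any(t in q.lower().split() for t in terms)]


def expand_terms(queries, seed_terms, fetch_background, top_k=2, iterations=2):
    """Grow a set of labeled terms over several iterations.

    fetch_background is a hypothetical callback that returns a list of
    text snippets for a query (a stand-in for web search results).
    """
    terms = set(seed_terms)
    for _ in range(iterations):
        labeled = label_queries(queries, terms)
        # Background knowledge: all text retrieved for the labeled queries.
        background = [doc for q in labeled for doc in fetch_background(q)]
        counts = Counter(
            w for doc in background for w in doc.lower().split() if w not in terms
        )
        # Assumption: rank candidates by raw frequency, ties broken
        # alphabetically. The paper's own ranking criterion is not given here.
        ranked = sorted(counts, key=lambda w: (-counts[w], w))
        terms.update(ranked[:top_k])
    return terms
```

Each pass labels more queries, which in turn yields more background text and more candidate terms for the next pass.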
Findings
The authors identify the difficulties of classifying web logs by approaching the problem from a machine learning perspective. By applying the approach developed, the authors are able to show that many queries in a large query log can be classified.
Research limitations/implications
Testing results in this type of classification work is difficult, as the true personal properties of web users are unknown. Evaluating the classification results by comparing classified queries to well‐known age‐related sites is a direction that is currently being explored.
Practical implications
This research is background work that can be incorporated in search engines or other web‐based applications, to help marketing companies and advertisers.
Originality/value
This research enhances the current state of knowledge in short‐text classification and query log learning.
Ashish Kathuria, Bernard J. Jansen, Carolyn Hafernik and Amanda Spink
Abstract
Purpose
Web search engines are frequently used by people to locate information on the Internet. However, not all queries have an informational goal. Instead of information, some people may be looking for specific web sites or may wish to conduct transactions with web services. This paper aims to focus on automatically classifying the different user intents behind web queries.
Design/methodology/approach
For the research reported in this paper, 130,000 web search engine queries are categorized as informational, navigational, or transactional using a k‐means clustering approach based on a variety of query traits.
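As a rough illustration of k‐means clustering over query traits, the sketch below uses only the standard library. The three traits in `features` (term count, URL-like token, transactional keyword) and the deterministic initialisation are assumptions chosen for the example; the abstract does not list the traits actually used.

```python
import math


def features(query):
    """Map a query to a small numeric trait vector (hypothetical traits)."""
    words = query.lower().split()
    return (
        float(len(words)),
        1.0 if any("." in w or w.startswith("www") for w in words) else 0.0,
        1.0 if any(w in {"buy", "download", "order"} for w in words) else 0.0,
    )


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points are used as initial centroids."""
    centroids = list(points[:k])
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids
```

Calling `kmeans([features(q) for q in queries], k=8)` would group queries into eight clusters, which could then be inspected and labeled as informational, navigational, or transactional.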
Findings
The research findings show that more than 75 percent of web queries (clustered into eight classifications) are informational in nature, with about 12 percent each for navigational and transactional. Results also show that web queries fall into eight clusters, six primarily informational, and one each of primarily transactional and navigational.
Research limitations/implications
This study provides an important contribution to web search literature because it provides information about the goals of searchers and a method for automatically classifying the intents of the user queries. Automatic classification of user intent can lead to improved web search engines by tailoring results to specific user needs.
Practical implications
The paper discusses how web search engines can use automatically classified user queries to provide more targeted and relevant results in web searching by implementing a real time classification method as presented in this research.
Originality/value
This research investigates a new application of a method for automatically classifying the intent of user queries. There has been limited research to date on automatically classifying the user intent of web queries, even though the pay‐off for web search engines can be substantial.
Abstract
Purpose
The purpose of this paper is to examine the way in which end user searching on the web has become the primary method of locating digital images for many people. This paper seeks to investigate how users structure these image queries.
Design/methodology/approach
This study investigates the structure and formation of image queries on the web by mapping a sample of web queries to three known query classification schemes for image searching (i.e. Enser and McGregor, Jörgensen, and Chen).
Findings
The results indicate that the features and attributes of web image queries differ relative to image queries utilized on other information retrieval systems and by other user populations. This research points to the need for five additional attributes (i.e. collections, pornography, presentation, URL, and cost) in order to classify web image queries, which were not present in any of the three prior classification schemes.
Research limitations/implications
Patterns in web searching for image content do emerge that inform the design of web‐based multimedia systems, namely, that there is a high interest in locating image collections by web searchers. Objects and people images are the predominant interest for web searchers. Cost is a factor for web searching. This knowledge of the structure of web image queries has implications for the design of image information retrieval systems and repositories, especially in the area of automatic tagging of images with metadata.
Originality/value
This is the first research that examines whether or not one can apply image query classification schemes to web image queries.
Abstract
Purpose
Engines have been built that execute queries against XML data. The aim of this paper is to describe a novel technique that can be used to improve the speed of execution of the queries based on semantics of the data in the XML document.
Design/methodology/approach
The paper formally introduces algorithms for optimizing XML queries, implements the algorithms, and demonstrates the improvement in speed through experimentation.
Findings
Three possible semantic query optimizations based on the values of elements are introduced. Experiments show that two of the three optimizations improve query performance but the third does not, and the paper hypothesizes why this is the case.
Research limitations/implications
One limitation is the dependence on the query engine and how it works. Future work includes executing the experiments on a different engine and comparing the results, building a system to automatically generate the characteristics that are necessary for the optimization, describing the best way to represent and maintain the characteristics once they are found, and comparing the results of optimizations based on content with those of optimizations based on structure.
Practical implications
The optimizations could be incorporated into new query engines.
Originality/value
Novel algorithms for query optimization have been developed and proven to work. They are of value to people who are building database systems for XML data.
Majdi A. Maabreh, Mohammed N. Al‐Kabi and Izzat M. Alsmadi
Abstract
Purpose
This study is an attempt to develop an automatic identification method for Arabic web queries and divide them into several query types using data mining. In addition, it seeks to evaluate the impact of the academic environment on using the internet.
Design/methodology/approach
The web log files were collected from one of the servers of a higher education institute over a one‐month period. A special program was designed and implemented to extract web search queries from these files and also to automatically classify Arabic queries into three query types (i.e. Navigational, Transactional, and Informational queries) based on predefined specifications for each type.
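A rule-based classifier of this kind might look like the sketch below. The abstract does not state the predefined specifications, so the URL pattern and the transactional term list here are purely hypothetical placeholders.

```python
import re

# Hypothetical rule sets; the paper's actual specifications are not given.
# "تحميل" is the Arabic word for "download".
TRANSACTIONAL_TERMS = {"download", "buy", "order", "تحميل"}
URL_PATTERN = re.compile(r"(^www\.|\.(com|org|net|edu)$)")


def classify(query):
    """Assign a query to one of the three query types using simple rules."""
    tokens = query.lower().split()
    # Rule 1: a URL-like token suggests the user wants a specific site.
    if any(URL_PATTERN.search(t) for t in tokens):
        return "Navigational"
    # Rule 2: an action word suggests the user wants to obtain something.
    if any(t in TRANSACTIONAL_TERMS for t in tokens):
        return "Transactional"
    # Default: everything else is treated as a request for information.
    return "Informational"
```

The rules are checked in a fixed order, so a query matching both a URL pattern and a transactional term is labeled Navigational in this sketch.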
Findings
The results indicate that students are gradually using the internet for more relevant academic purposes. Tests showed that it is possible to automatically classify Arabic queries based on query terms, with 80.6 per cent and 80.2 per cent accuracy for the two phases of the test respectively. In their future strategies, Jordanian universities should apply methods to encourage university students to use the internet for academic purposes. Web search engines in general and Arabic search engines in particular may benefit from the proposed classification method in order to improve the effectiveness and relevancy of their results in accordance with users' needs.
Originality/value
Internet web logs have been the subject of many papers. However, the particular domain and the specific focus of this research distinguish it from the others.
Xiaojuan Zhang, Shuguang Han and Wei Lu
Abstract
Purpose
The purpose of this paper is to predict news intent by exploring contextual and temporal features directly mined from a general search engine query log.
Design/methodology/approach
First, a ground-truth data set with correctly marked news and non-news queries was built. Second, a detailed analysis of the search goals and topics distribution of news/non-news queries was conducted. Third, three news features, that is, the relationship between entity and contextual words extended from query sessions, topical similarity among clicked results and temporal burst point were obtained. Finally, to understand the utilities of the new features and prior features, extensive prediction experiments on SogouQ (a Chinese search engine query log) were conducted.
Findings
News intent can be predicted with high accuracy by using the proposed contextual and temporal features, and the macro average F1 of classification is around 0.8677. Contextual features are more effective than temporal features. All the three new features are useful and significant in improving the accuracy of news intent prediction.
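For reference, macro average F1 is the unweighted mean of the per-class F1 scores. The sketch below computes it from scratch; the label names in the usage note are illustrative and not taken from the paper's data set.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels that occur."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, l) for l in labels) / len(labels)
```

For example, with true labels `["news", "news", "other", "other"]` and predictions `["news", "other", "other", "other"]`, the per-class F1 scores are 2/3 and 4/5, so the macro average is 11/15.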
Originality/value
This paper provides a new perspective on recognizing queries with news intent without the use of large corpora such as social media (e.g. Wikipedia, Twitter and blogs) and news data sets. The research will be helpful for general-purpose search engines to address search intents for news events. In addition, the authors believe that the approaches described in this paper are general enough to apply to other verticals with dynamic content and interest, such as blog or financial data.
Abstract
A research project is reported in which techniques for the automatic classification of book material were investigated. Attention was focussed on three fundamental issues, namely: the computer‐based surrogation of monographic material, the clustering of book surrogates on the basis of content association, and the evaluation of the resultant classifications. A test collection of 250 books, which was assembled on behalf of the project, is described together with its surrogation by means of the complete back‐of‐the‐book index, table of contents, title and Dewey classification code(s) of each volume. Some properties of hierarchic and non‐hierarchic automatic classifications of the test collection are discussed, followed by their evaluation with reference to a small set of queries and relevance judgements. Finally, a less formal evaluation of the classifications in terms of the logical appeal of the cluster membership is reported. The work has shown that, on a small experimental scale and in the context of the test data used, automatic classifications of book material represented by index list can be produced which are superior, on the basis of a generalized measure of effectiveness, to a conventional library classification of the same material.
Anne Chardonnens, Ettore Rizza, Mathias Coeckelbergs and Seth van Hooland
Abstract
Purpose
Advanced usage of web analytics tools makes it possible to capture the content of user queries. Although these queries are valuable, the manual analysis of large volumes of them is problematic. The purpose of this paper is to address the problem of named entity recognition in digital library user queries.
Design/methodology/approach
The paper presents a large-scale case study conducted at the Royal Library of Belgium on its online historical newspapers platform BelgicaPress. The object of the study is a data set of 83,854 queries resulting from 29,812 visits over a 12-month period. By making use of information extraction methods, knowledge bases (KBs) and various authority files, this paper presents the possibilities and limits of identifying what percentage of end users are looking for person and place names.
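One very simple way to estimate such percentages is exact lookup against name lists, sketched below. The `persons` and `places` sets stand in for KB and authority-file entries and are hypothetical; the paper's actual pipeline uses information extraction methods that this sketch does not reproduce.

```python
from collections import Counter


def tag_query(query, persons, places):
    """Tag a query by exact match against two name lists."""
    normalized = " ".join(query.lower().split())
    in_persons = normalized in persons
    in_places = normalized in places
    if in_persons and in_places:
        return "ambiguous"  # the same string names both a person and a place
    if in_persons:
        return "person"
    if in_places:
        return "place"
    return "unknown"


def tag_shares(queries, persons, places):
    """Return the percentage of queries that received each tag."""
    counts = Counter(tag_query(q, persons, places) for q in queries)
    return {tag: 100.0 * n / len(queries) for tag, n in counts.items()}
```

Queries that match both lists are kept apart as ambiguous rather than forced into one category.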
Findings
Based on a quantitative assessment, the method can successfully identify the majority of person and place names from user queries. Due to the specific character of user queries and the nature of the KBs used, a limited number of queries remained too ambiguous to be treated in an automated manner.
Originality/value
This paper demonstrates in an empirical manner how user queries can be extracted from a web analytics tool and how named entities can then be mapped with KBs and authority files, in order to facilitate automated analysis of their content. Methods and tools used are generalisable and can be reused by other collection holders.
Bernard J. Jansen, Mimi Zhang and Amanda Spink
Abstract
Purpose
To investigate and identify the patterns of interaction between searchers and search engine during web searching.
Design/methodology/approach
The authors examined 2,465,145 interactions from 534,507 users of Dogpile.com submitted on May 6, 2005, and compared query reformulation patterns. They investigated the type of query modifications and query modification transitions within sessions.
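Query modifications between consecutive queries in a session can be detected with simple term-set comparisons, as in the sketch below. The category names and rules are assumptions for illustration; the paper's own coding scheme is not spelled out in the abstract.

```python
def classify_transition(previous, current):
    """Label how the current query modifies the previous one."""
    prev = set(previous.lower().split())
    cur = set(current.lower().split())
    if prev == cur:
        return "same"
    if prev < cur:
        return "specialization"  # terms were added to narrow the query
    if cur < prev:
        return "generalization"  # terms were removed to broaden the query
    if prev & cur:
        return "reformulation"   # some terms kept, others replaced
    return "new"                 # no terms in common


def session_transitions(session):
    """Label every consecutive pair of queries in a session."""
    return [classify_transition(a, b) for a, b in zip(session, session[1:])]
```

Counting how often one label follows another across many sessions would give the kind of transition patterns the study reports.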
Findings
The paper identifies three strong query reformulation transition patterns: between specialization and generalization; between video and audio; and between content change and system assistance. In addition, the findings show that web and image content were the most popular media collections.
Originality/value
This research sheds light on the more complex aspects of web searching involving query modifications.
Abstract
Purpose
This work aims to investigate the sensitivity of ranking performance with respect to the topic distribution of queries selected for ranking evaluation.
Design/methodology/approach
The authors reweight queries used in two TREC tasks to make them match three real background topic distributions, and show that the performance rankings of retrieval systems are quite different.
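The reweighting idea can be illustrated with a weighted mean of per-query scores, where each query's weight is set so that the total weight per topic matches a target distribution. This is a sketch under that assumption; the paper's exact weighting scheme is not described in the abstract, and all names and numbers below are made up.

```python
from collections import Counter


def reweighted_mean(scores, topics, target_dist):
    """Mean per-query score after reweighting queries to a target topic mix.

    scores:      per-query effectiveness values for one retrieval system
    topics:      topic label of each query, in the same order as scores
    target_dist: mapping from topic to its share in the target distribution
    """
    counts = Counter(topics)
    # Split each topic's target share equally among that topic's queries.
    weights = [target_dist.get(t, 0.0) / counts[t] for t in topics]
    total = sum(weights)
    if total == 0:
        raise ValueError("target distribution covers none of the query topics")
    return sum(w * s for w, s in zip(weights, scores)) / total
```

With topics `["health", "health", "sports"]`, a system scoring `[0.2, 0.4, 0.9]` beats one scoring `[0.5, 0.5, 0.5]` under an even topic split (0.6 versus 0.5) but loses under a 90/10 split favouring health (0.36 versus 0.5), which shows how a system ranking can flip with the topic distribution.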
Findings
It is found that search engines tend to perform similarly on queries about the same topic, and that search engine performance is sensitive to the topic distribution of queries used in evaluation.
Originality/value
Using experiments with multiple real‐world query logs, the paper demonstrates weaknesses in the current evaluation model of retrieval systems.