Search results

1 – 10 of over 6000
Article
Publication date: 20 December 2007

Isak Taksa, Sarah Zelikovitz and Amanda Spink

Abstract

Purpose

The work presented in this paper aims to provide an approach to classifying web logs by personal properties of users.

Design/methodology/approach

The authors describe an iterative system that begins with a small set of manually labeled terms, which are used to label queries from the log. A set of background knowledge related to these labeled queries is acquired by combining web search results on these queries. This background set is used to obtain many terms that are related to the classification task. The system then ranks each of the related terms, choosing those that most fit the personal properties of the users. These terms are then used to begin the next iteration.

Findings

The authors identify the difficulties of classifying web logs by approaching the problem from a machine learning perspective. By applying the approach developed, the authors are able to show that many queries in a large query log can be classified.

Research limitations/implications

Testing results in this type of classification work is difficult, as the true personal properties of web users are unknown. Evaluating the classification results by comparing classified queries to well-known age‐related sites is a direction that is currently being explored.

Practical implications

This research is background work that can be incorporated in search engines or other web‐based applications, to help marketing companies and advertisers.

Originality/value

This research enhances the current state of knowledge in short‐text classification and query log learning.

Details

International Journal of Web Information Systems, vol. 3 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 19 October 2010

Ashish Kathuria, Bernard J. Jansen, Carolyn Hafernik and Amanda Spink

Abstract

Purpose

Web search engines are frequently used by people to locate information on the Internet. However, not all queries have an informational goal. Instead of information, some people may be looking for specific web sites or may wish to conduct transactions with web services. This paper aims to focus on automatically classifying the different user intents behind web queries.

Design/methodology/approach

For the research reported in this paper, 130,000 web search engine queries are categorized as informational, navigational, or transactional using a k‐means clustering approach based on a variety of query traits.
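As a rough illustration of clustering queries on numeric traits, the sketch below runs a plain k‐means loop over small feature vectors. It is not the paper's implementation: the `featurize` traits (term count, URL-like token, action word), the word list, and the choice to seed centroids with the first k points are all assumptions made for the example.

```python
import math

# Hypothetical action words; the abstract does not list the traits actually used.
ACTION_WORDS = {"buy", "download", "order"}

def featurize(query):
    """Map a query to [term count, has URL-like token, has action word]."""
    terms = query.lower().split()
    has_url = any("." in t or t.startswith("www") for t in terms)
    has_action = any(t in ACTION_WORDS for t in terms)
    return [float(len(terms)), float(has_url), float(has_action)]

def kmeans(points, k, iterations=20):
    """Basic k-means; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

queries = ["www.example.com", "buy cheap laptop", "history of the roman empire"]
assignment, centroids = kmeans([featurize(q) for q in queries], k=2)
```

Mapping each resulting cluster to informational, navigational, or transactional intent would still need a separate labeling step, which this sketch leaves out.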

Findings

The research findings show that more than 75 percent of web queries (clustered into eight classifications) are informational in nature, with about 12 percent each for navigational and transactional. Results also show that web queries fall into eight clusters, six primarily informational, and one each of primarily transactional and navigational.

Research limitations/implications

This study provides an important contribution to web search literature because it provides information about the goals of searchers and a method for automatically classifying the intents of the user queries. Automatic classification of user intent can lead to improved web search engines by tailoring results to specific user needs.

Practical implications

The paper discusses how web search engines can use automatically classified user queries to provide more targeted and relevant results in web searching by implementing a real time classification method as presented in this research.

Originality/value

This research investigates a new application of a method for automatically classifying the intent of user queries. There has been limited research to date on automatically classifying the user intent of web queries, even though the pay‐off for web search engines can be quite beneficial.

Details

Internet Research, vol. 20 no. 5
Type: Research Article
ISSN: 1066-2243

Article
Publication date: 18 January 2008

Bernard J. Jansen

Abstract

Purpose

The purpose of this paper is to examine the way in which end user searching on the web has become the primary method of locating digital images for many people. This paper seeks to investigate how users structure these image queries.

Design/methodology/approach

This study investigates the structure and formation of image queries on the web by mapping a sample of web queries to three known query classification schemes for image searching (i.e. Enser and McGregor, Jörgensen, and Chen).

Findings

The results indicate that the features and attributes of web image queries differ relative to image queries utilized on other information retrieval systems and by other user populations. This research points to the need for five additional attributes (i.e. collections, pornography, presentation, URL, and cost) in order to classify web image queries, which were not present in any of the three prior classification schemes.

Research limitations/implications

Patterns in web searching for image content do emerge that inform the design of web‐based multimedia systems, namely, that there is a high interest in locating image collections by web searchers. Objects and people images are the predominant interest for web searchers. Cost is a factor for web searching. This knowledge of the structure of web image queries has implications for the design of image information retrieval systems and repositories, especially in the area of automatic tagging of images with metadata.

Originality/value

This is the first research that examines whether or not one can apply image query classification schemes to web image queries.

Details

Journal of Documentation, vol. 64 no. 1
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 29 August 2008

Ke Geng and Gillian Dobbie

Abstract

Purpose

Engines have been built that execute queries against XML data. The aim of this paper is to describe a novel technique that can be used to improve the speed of execution of the queries based on semantics of the data in the XML document.

Design/methodology/approach

The paper formally introduces algorithms for optimizing XML queries, implements the algorithms, and demonstrates the improvement in speed through experimentation.

Findings

Three possible semantic query optimizations based on the values of elements were introduced and these demonstrate that two of the three optimizations improve query performance but the third does not. It is hypothesized why this is the case.

Research limitations/implications

A limitation is the query engine and how it works. Future work includes executing the experiments on a different engine and comparing results, building a system to automatically generate the characteristics that are necessary to do the optimization, describing the best way to represent and maintain the characteristics once they are found, and comparing the results of optimizations based on content with optimizations based on structure.

Practical implications

The optimizations could be incorporated into new query engines.

Originality/value

Novel algorithms for query optimization have been developed and proven to work. They are of value to people who are building database systems for XML data.

Details

International Journal of Web Information Systems, vol. 4 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 20 April 2012

Majdi A. Maabreh, Mohammed N. Al‐Kabi and Izzat M. Alsmadi

Abstract

Purpose

This study is an attempt to develop an automatic identification method for Arabic web queries and divide them into several query types using data mining. In addition, it seeks to evaluate the impact of the academic environment on using the internet.

Design/methodology/approach

The web log files were collected from one of the higher institute's servers over a one‐month period. A special program was designed and implemented to extract web search queries from these files and also to automatically classify Arabic queries into three query types (i.e. Navigational, Transactional, and Informational queries) based on predefined specifications for each type.
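Classification by predefined specifications can be pictured as an ordered list of rules. The sketch below is only an assumption about what such rules might look like: the URL pattern and the English action words are hypothetical placeholders, whereas the study worked with Arabic queries and its actual specifications are not given in the abstract.

```python
import re

# Hypothetical cue for navigational intent: the query looks like a site address.
NAVIGATIONAL_PATTERN = re.compile(r"(https?://|www\.|\.com|\.org|\.net|\.edu)")

# Hypothetical cue words for transactional intent.
TRANSACTIONAL_TERMS = {"download", "buy", "order", "install", "watch"}

def classify_query(query):
    """Return 'navigational', 'transactional', or 'informational'."""
    text = query.lower().strip()
    # Rule 1: URL-like queries are treated as navigational.
    if NAVIGATIONAL_PATTERN.search(text):
        return "navigational"
    # Rule 2: queries containing an action word are treated as transactional.
    if any(term in TRANSACTIONAL_TERMS for term in text.split()):
        return "transactional"
    # Rule 3: everything else falls back to informational.
    return "informational"

for q in ["www.example.edu", "download lecture notes", "causes of inflation"]:
    print(q, "->", classify_query(q))
```

Rule order matters in a scheme like this: a query that matches both the URL pattern and an action word is labeled navigational because that rule is checked first.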

Findings

The results indicate that students are slowly and gradually using the internet for more relevant academic purposes. Tests showed that it is possible to automatically classify Arabic queries based on query terms, with 80.6 per cent and 80.2 per cent accuracy for the two phases of the test respectively. In their future strategies, Jordanian universities should apply methods to encourage university students to use the internet for academic purposes. Web search engines in general and Arabic search engines in particular may benefit from the proposed classification method in order to improve the effectiveness and relevancy of their results in accordance with users' needs.

Originality/value

Internet web logs have been the subject of many papers. However, the particular domain and the specific focus of this research distinguish it from the others.

Details

Program, vol. 46 no. 2
Type: Research Article
ISSN: 0033-0337

Article
Publication date: 5 November 2018

Xiaojuan Zhang, Shuguang Han and Wei Lu

Abstract

Purpose

The purpose of this paper is to predict news intent by exploring contextual and temporal features directly mined from a general search engine query log.

Design/methodology/approach

First, a ground-truth data set with correctly marked news and non-news queries was built. Second, a detailed analysis of the search goals and topics distribution of news/non-news queries was conducted. Third, three news features, that is, the relationship between entity and contextual words extended from query sessions, topical similarity among clicked results and temporal burst point were obtained. Finally, to understand the utilities of the new features and prior features, extensive prediction experiments on SogouQ (a Chinese search engine query log) were conducted.

Findings

News intent can be predicted with high accuracy by using the proposed contextual and temporal features, and the macro average F1 of classification is around 0.8677. Contextual features are more effective than temporal features. All three new features are useful and significant in improving the accuracy of news intent prediction.
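For readers unfamiliar with the metric, macro average F1 is the unweighted mean of the per-class F1 scores. The sketch below computes it from scratch; the labels and predictions are made-up toy data, not results from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)

# Toy example: two news queries and two non-news queries.
truth = ["news", "news", "other", "other"]
predicted = ["news", "other", "other", "other"]
score = macro_f1(truth, predicted)
```

Because every class counts equally, a classifier cannot reach a high macro F1 by doing well only on the more frequent class.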

Originality/value

This paper provides a new and different perspective in recognizing queries with news intent without use of such large corpora as social media (e.g. Wikipedia, Twitter and blogs) and news data sets. The research will be helpful for general-purpose search engines to address search intents for news events. In addition, the authors believe that the approaches described in this paper are general enough to apply to other verticals with dynamic content and interest, such as blog or financial data.

Details

The Electronic Library, vol. 36 no. 5
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 1 March 1985

P.G.B. ENSER

Abstract

A research project is reported in which techniques for the automatic classification of book material were investigated. Attention was focussed on three fundamental issues, namely: the computer‐based surrogation of monographic material, the clustering of book surrogates on the basis of content association, and the evaluation of the resultant classifications. A test collection of 250 books, which was assembled on behalf of the project, is described together with its surrogation by means of the complete back‐of‐the‐book index, table of contents, title and Dewey classification code(s) of each volume. Some properties of hierarchic and non‐hierarchic automatic classifications of the test collection are discussed, followed by their evaluation with reference to a small set of queries and relevance judgements. Finally, a less formal evaluation of the classifications in terms of the logical appeal of the cluster membership is reported. The work has shown that, on a small experimental scale and in the context of the test data used, automatic classifications of book material represented by index list can be produced which are superior, on the basis of a generalized measure of effectiveness, to a conventional library classification of the same material.

Details

Journal of Documentation, vol. 41 no. 3
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 14 May 2018

Anne Chardonnens, Ettore Rizza, Mathias Coeckelbergs and Seth van Hooland

Abstract

Purpose

Advanced usage of web analytics tools makes it possible to capture the content of user queries. Despite their relevant nature, the manual analysis of large volumes of user queries is problematic. The purpose of this paper is to address the problem of named entity recognition in digital library user queries.

Design/methodology/approach

The paper presents a large-scale case study conducted at the Royal Library of Belgium in its online historical newspapers platform BelgicaPress. The object of the study is a data set of 83,854 queries resulting from 29,812 visits over a 12-month period. By making use of information extraction methods, knowledge bases (KBs) and various authority files, this paper presents the possibilities and limits of identifying what percentage of end users are looking for person and place names.

Findings

Based on a quantitative assessment, the method can successfully identify the majority of person and place names from user queries. Due to the specific character of user queries and the nature of the KBs used, a limited number of queries remained too ambiguous to be treated in an automated manner.

Originality/value

This paper demonstrates in an empirical manner how user queries can be extracted from a web analytics tool and how named entities can then be mapped with KBs and authority files, in order to facilitate automated analysis of their content. Methods and tools used are generalisable and can be reused by other collection holders.

Details

Journal of Documentation, vol. 74 no. 5
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 20 December 2007

Bernard J. Jansen, Mimi Zhang and Amanda Spink

Abstract

Purpose

To investigate and identify the patterns of interaction between searchers and search engine during web searching.

Design/methodology/approach

The authors examined 2,465,145 interactions from 534,507 users of Dogpile.com submitted on May 6, 2005, and compared query reformulation patterns. They investigated the type of query modifications and query modification transitions within sessions.
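One simple way to picture query modification transitions within a session is to compare the term sets of consecutive queries. The sketch below does that with a simplified, hypothetical label set; the paper's own taxonomy also covers categories such as content change and system assistance, which need log fields this example does not model.

```python
def classify_transition(previous, current):
    """Label the change from one query to the next by comparing term sets."""
    prev_terms = set(previous.lower().split())
    cur_terms = set(current.lower().split())
    if prev_terms == cur_terms:
        return "repeat"
    if prev_terms < cur_terms:
        return "specialization"   # terms were only added
    if cur_terms < prev_terms:
        return "generalization"   # terms were only removed
    if prev_terms & cur_terms:
        return "reformulation"    # some terms kept, some replaced
    return "new"                  # no shared terms

def session_transitions(queries):
    """Classify each consecutive pair of queries in one session."""
    return [classify_transition(a, b) for a, b in zip(queries, queries[1:])]

session = ["jaguar", "jaguar car", "jaguar price", "price", "weather"]
labels = session_transitions(session)
# labels == ["specialization", "reformulation", "generalization", "new"]
```

Counting how often one label follows another across many sessions would give a transition table of the kind the abstract refers to.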

Findings

The paper identifies three strong query reformulation transition patterns: between specialization and generalization, between video and audio, and between content change and system assistance. In addition, the findings show that web and image content were the most popular media collections.

Originality/value

This research sheds light on the more complex aspects of web searching involving query modifications.

Details

International Journal of Web Information Systems, vol. 3 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 29 November 2011

Na Dai and Brian D. Davison

Abstract

Purpose

This work aims to investigate the sensitivity of ranking performance with respect to the topic distribution of queries selected for ranking evaluation.

Design/methodology/approach

The authors reweight queries used in two TREC tasks to make them match three real background topic distributions, and show that the performance rankings of retrieval systems are quite different.
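The effect of reweighting can be sketched as a weighted mean of per-query scores, where each query's weight is its topic's target probability divided by the number of sampled queries on that topic. This is one plausible scheme assumed for illustration, not necessarily the authors' procedure, and the scores, topics, and target distribution below are made up.

```python
from collections import Counter

def reweighted_mean(scores, topics, target_distribution):
    """Mean per-query score after reweighting topics to a target distribution."""
    counts = Counter(topics)
    # Each topic's total weight equals its target probability,
    # split evenly among the sampled queries on that topic.
    weights = [target_distribution.get(t, 0.0) / counts[t] for t in topics]
    total = sum(weights)
    if total == 0:
        raise ValueError("target distribution gives no weight to any sampled topic")
    return sum(w * s for w, s in zip(weights, scores)) / total

topics = ["sports", "health", "health", "health"]
system_a = [0.9, 0.2, 0.2, 0.2]
system_b = [0.3, 0.5, 0.5, 0.5]
target = {"sports": 0.5, "health": 0.5}

uniform_a = sum(system_a) / len(system_a)   # 0.375
uniform_b = sum(system_b) / len(system_b)   # 0.45
weighted_a = reweighted_mean(system_a, topics, target)
weighted_b = reweighted_mean(system_b, topics, target)
```

In this toy data, system B leads under the unweighted mean, while system A leads once sports queries carry half of the weight, which is the kind of ranking change the abstract describes.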

Findings

It is found that search engines tend to perform similarly on queries about the same topic; and search engine performance is sensitive to the topic distribution of queries used in evaluation.

Originality/value

Using experiments with multiple real‐world query logs, the paper demonstrates weaknesses in the current evaluation model of retrieval systems.
