Search results

1 – 10 of over 14000
Article
Publication date: 14 November 2016

Bahjat Fatima, Huma Ramzan and Sohail Asghar


Abstract

Purpose

The purpose of this paper is to critically analyze the state-of-the-art session identification techniques used in the web usage mining (WUM) process in terms of their limitations, features, and methodologies.

Design/methodology/approach

In this research, a systematic literature review was conducted using a review protocol approach. The methodology consisted of a comprehensive search for relevant literature published in the period 2005-2015, using four online database repositories (i.e. IEEE, Springer, ACM Digital Library, and ScienceDirect).

Findings

The findings revealed that this research area is still immature and that the existing literature lacks a critical review of recent session identification techniques used in the WUM process.

Originality/value

The contribution of this study is to provide a structured overview of the research developments, to critically review the existing session identification techniques, highlight their limitations and associated challenges and identify areas where further improvements are required so as to complement the performance of existing techniques.
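For orientation, one widely used family of session identification techniques is time-oriented: a user's requests are split into sessions wherever the gap between consecutive requests exceeds an inactivity threshold. The sketch below is a minimal illustration of that heuristic, not a method taken from the paper. The 30-minute threshold, the `identify_sessions` name, and the sample log entries are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity threshold; 30 minutes is a common convention,
# not a value reported in the abstract above.
SESSION_TIMEOUT = timedelta(minutes=30)


def identify_sessions(requests, timeout=SESSION_TIMEOUT):
    """Group (user_key, timestamp, url) tuples into per-user sessions.

    A new session starts whenever the gap since the same user's
    previous request exceeds `timeout`.
    """
    by_user = defaultdict(list)
    for user_key, timestamp, url in requests:
        by_user[user_key].append((timestamp, url))

    sessions = []
    for user_key, hits in by_user.items():
        hits.sort(key=lambda hit: hit[0])  # order each user's hits by time
        current = [hits[0]]
        for previous, hit in zip(hits, hits[1:]):
            if hit[0] - previous[0] > timeout:
                # Gap too long: close the running session, start a new one.
                sessions.append((user_key, current))
                current = []
            current.append(hit)
        sessions.append((user_key, current))
    return sessions


# Hypothetical log entries, keyed by IP address for simplicity.
log = [
    ("10.0.0.1", datetime(2016, 11, 14, 9, 0), "/index.html"),
    ("10.0.0.1", datetime(2016, 11, 14, 9, 10), "/products.html"),
    ("10.0.0.1", datetime(2016, 11, 14, 10, 30), "/index.html"),
    ("10.0.0.2", datetime(2016, 11, 14, 9, 5), "/about.html"),
]
sessions = identify_sessions(log)
for user_key, hits in sessions:
    print(user_key, [url for _, url in hits])
```

Keying users by IP address alone is itself a simplification; proxies and shared machines blur user identity, which is one kind of limitation a review of such techniques would be expected to weigh.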

Details

Online Information Review, vol. 40 no. 7
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 5 September 2008

Seda Ozmutlu and Gencer C. Cosar


Abstract

Purpose

Identification of topic changes within a user search session is a key issue in content analysis of search engine user queries. Recently, various studies have focused on new topic identification/session identification of search engine transaction logs, and several problems regarding the estimation of topic shifts and continuations were observed in these studies. This study aims to analyze the reasons for the problems that were encountered as a result of applying automatic new topic identification.

Design/methodology/approach

Measures, such as cleaning the data of common words and analyzing the errors of automatic new topic identification, are applied to eliminate the problems in estimating topic shifts and continuations.
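To illustrate why cleaning common words matters, the sketch below labels each query in a session as a topic shift or continuation using a simple shared-term rule. This rule is a stand-in chosen for brevity and is not the automatic new topic identification algorithm studied in the paper. The `COMMON_WORDS` list, the function names, and the sample queries are hypothetical.

```python
# Hypothetical list of common words to remove before comparing queries.
COMMON_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def content_terms(query):
    """Return the set of lower-cased query terms minus common words."""
    return {term for term in query.lower().split() if term not in COMMON_WORDS}


def label_topic_changes(queries):
    """Label every query after the first as 'continuation' or 'shift'.

    Two consecutive queries count as a continuation when they share
    at least one content term after common-word removal.
    """
    labels = []
    for previous, current in zip(queries, queries[1:]):
        shared = content_terms(previous) & content_terms(current)
        labels.append("continuation" if shared else "shift")
    return labels


session_queries = [
    "history of the jazz",
    "jazz festivals in the summer",
    "cheap flights in the winter",
]
labels = label_topic_changes(session_queries)
print(labels)
```

Without the cleaning step, the second and third queries would share "in" and "the" and be misread as a continuation, which is the kind of estimation error that common-word removal is meant to reduce.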

Findings

The findings show that the resulting errors of automatic new topic identification have a pattern, and further research is required to improve the performance of automatic new topic identification.

Originality/value

Improving the performance of automatic new topic identification would be valuable to search engine designers, so that they can develop new clustering and query recommendation algorithms, as well as custom‐tailored graphical user interfaces for search engine users.

Details

Library Hi Tech, vol. 26 no. 3
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 8 June 2010

Guillermo Navarro‐Arribas and Vicenç Torra


Downloads
1443

Abstract

Purpose

The purpose of this paper is to anonymize web server log files used in e‐commerce web mining processes.

Design/methodology/approach

The paper has applied statistical disclosure control (SDC) techniques to achieve its goal. More precisely, it has introduced the micro‐aggregation of web access logs.

Findings

The experiments show that the proposed technique provides good results in general, but it is especially outstanding when dealing with relatively small websites.

Research limitations/implications

As in all SDC techniques, there is always a trade‐off between privacy and utility or, in other words, between disclosure risk and information loss. The proposal bears this issue in mind, providing k‐anonymity while preserving acceptable information accuracy.
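The general idea of micro‐aggregation can be sketched on a single numeric attribute: sort the records, form groups of at least k, and replace each value with its group mean, so that every published value is shared by at least k records. The code below is a minimal fixed-size univariate variant under that assumption. It is not the paper's method for web access logs, and the function name and sample data are hypothetical.

```python
def microaggregate(values, k):
    """Replace each value with the mean of its group of at least k neighbours.

    Values are sorted, cut into consecutive groups of size k, and a
    trailing remainder smaller than k is merged into the last group.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start + k
        if len(order) - end < k:
            end = len(order)  # absorb the short remainder
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean  # write back at the original position
        start = end
    return result


# Hypothetical per-visitor session durations in seconds.
visit_durations = [12, 300, 15, 280, 14, 310, 290]
anonymized = microaggregate(visit_durations, k=3)
print(anonymized)
```

Larger k lowers disclosure risk but pulls more distinct values onto the same mean, which is the privacy/utility trade‐off described above.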

Practical implications

Web server logs are valuable information used nowadays for user profiling and general data‐mining analysis of a website in e‐commerce and e‐services. This proposal allows anonymizing such logs, so they can be safely outsourced to other companies for marketing purposes, stored for further analysis, or made publicly available, without risking customer privacy.

Originality/value

Current solutions to the problem presented here are poor and scarce. They are normally reduced to the elimination of sensitive information from the query strings of URLs. Moreover, to the authors' knowledge, SDC techniques have never been applied to the anonymization of web logs.

Details

Internet Research, vol. 20 no. 3
Type: Research Article
ISSN: 1066-2243

Article
Publication date: 6 September 2021

Sivaraman Eswaran, Vakula Rani, Daniel D., Jayabrabu Ramakrishnan and Sadhana Selvakumar


Abstract

Purpose

In recent years, banks have built various remotely accessed platforms for their users. However, the security risk to the banking sector has also risen, as is evident from the rising number of reported attacks against these systems. Intelligence shows that crawler-based cyberattacks are increasing. Malicious crawlers can crawl web pages, crack passwords and harvest users' private data. In addition, intrusion detection systems in a dynamic environment produce more false positives. The purpose of this research paper is to propose an efficient methodology for detecting attacks with a low rate of false positives.

Design/methodology/approach

In this research, the authors have developed an efficient approach for malicious crawler detection and correlated the security alerts. The behavioral features of the crawlers are examined to recognize malicious crawlers, and a novel methodology is proposed to improve the security of the bank user portal. The authors have compared various machine learning strategies, including Bayesian network, support vector machine (SVM) and decision tree.

Findings

The proposed work covers several aspects. First, results are reported for a mixture of different kinds of log files. Then, log files from distinct sites are selected to construct suitable data sets. Session identification, attribute extraction, session labeling and classification were performed. Moreover, the approach clustered the meta-alerts into higher-level meta-alerts to fuse multistage attacks and the various types of attacks.

Originality/value

This methodology used incremental clustering techniques and analyzed the probability of existing topologies in SVM classifiers for more deterministic classification. It also enhanced the taxonomy for various domains.

Details

International Journal of Pervasive Computing and Communications, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 6 March 2009

Christine Merk, Frank Scholze and Nils Windisch


Downloads
857

Abstract

Purpose

The purpose of this paper is to present how the JISC Usage Statistics Review Project aims to formulate a fundamental scheme for recording usage data and to propose a standard for its aggregation to provide meaningful and comparable item‐level usage statistics for electronic documents such as, for example, research papers and scientific resources.

Design/methodology/approach

A core element of the project has been a stakeholder workshop. This workshop was held in Berlin, 7/8 July 2008. Representatives of key stakeholder groups (repositories, libraries, COUNTER, IRStats, JISC, LogEc, MESUR, OA‐Statistics and other Open Access projects) were invited. During the workshop a fundamental scheme for the recording and the exchange of log files was discussed as well as the normalization of data collected.

Findings

The following mandatory elements describing usage events were agreed during the stakeholder workshop: Who – identification of user/session, What – item identification and type of request performed (e.g. full‐text, front‐page, including failed/partially fulfilled requests), When – date and time, usage event ID. The following elements were regarded as optional: From where – referrer/the referring entity and identity of the service. Usage events should be exchanged in the form of OpenURL Context Objects using OAI. Automated access (e.g. robots) should be tagged. The definition of automated access has to be straightforward with an option of gradual refinement. Users have to be identified unambiguously, but without recording personal data to avoid conflicts with privacy laws. Policies on statistics should be formulated for the repository community as well as the publishing community. Information about statistics policies should be available on services like OpenDOAR and RoMEO.
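The agreed elements can be pictured as a small record type. The sketch below is only an illustration of the mandatory and optional fields listed above. It does not implement OpenURL Context Objects or OAI exchange, and the field names, the salted-hash pseudonymization, and the sample values are assumptions.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class UsageEvent:
    # Mandatory elements
    event_id: str          # usage event ID
    who: str               # pseudonymous user/session identifier
    item_id: str           # what: item identification
    request_type: str      # what: e.g. "full-text" or "front-page"
    when: datetime         # date and time of the request
    # Optional elements
    referrer: Optional[str] = None   # from where: the referring entity
    service: Optional[str] = None    # identity of the service
    # Tag for automated access such as robots
    automated: bool = False


def pseudonymize(ip_address, user_agent, salt):
    """Derive a stable identifier without storing the raw IP address.

    Salted hashing is an assumption of this sketch; whether it meets a
    particular privacy law is not something the sketch can establish.
    """
    digest = hashlib.sha256(f"{salt}|{ip_address}|{user_agent}".encode("utf-8"))
    return digest.hexdigest()[:16]


# Hypothetical event; all values are invented for illustration.
event = UsageEvent(
    event_id="evt-0001",
    who=pseudonymize("192.0.2.10", "ExampleBrowser/1.0", salt="example-salt"),
    item_id="oai:repository.example.org:250",
    request_type="full-text",
    when=datetime(2008, 7, 8, 10, 30, tzinfo=timezone.utc),
)
print(event)
```

Keeping `automated` as an explicit flag, rather than dropping robot traffic at collection time, leaves room for the gradual refinement of the robot definition that the workshop called for.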

Originality/value

The paper is based on the detailed project report to the JISC, available at http://ie‐repository.jisc.ac.uk/250/

Details

Library Hi Tech, vol. 27 no. 1
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 1 August 2005

Baoyao Zhou, Siu Cheung Hui and Alvis C. M. Fong


Abstract

With the explosive growth of information available on the World Wide Web, it has become much more difficult to access relevant information from the Web. One possible approach to solve this problem is web personalization. In this paper, we propose a novel WUL (Web Usage Lattice) based mining approach for mining association access pattern rules for personalized web recommendations. The proposed approach aims to mine a reduced set of effective association pattern rules for enhancing the online performance of web recommendations. We have incorporated the proposed approach into a personalized web recommender system known as AWARS. The performance of the proposed approach is evaluated in terms of efficiency and quality. In the efficiency evaluation, we measure the number of generated rules and the runtime for online recommendations. In the quality evaluation, we measure the quality of the recommendation service based on precision, satisfaction and applicability. This paper discusses the proposed WUL‐based mining approach and compares its performance with that of Apriori‐based algorithms.
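As background for the comparison with Apriori‐based algorithms, association rules over access sessions are conventionally scored by support and confidence. The sketch below computes both for a rule of the form "visitors of page A also visit page B". It illustrates these standard measures only and does not reproduce the WUL‐based approach or AWARS. The session data and function names are hypothetical.

```python
def support(sessions, pages):
    """Fraction of sessions that contain every page in `pages`."""
    pages = set(pages)
    return sum(1 for session in sessions if pages <= set(session)) / len(sessions)


def confidence(sessions, antecedent, consequent):
    """Support of antecedent and consequent together, divided by
    the support of the antecedent alone."""
    antecedent_support = support(sessions, antecedent)
    if antecedent_support == 0:
        return 0.0  # the rule never fires in this data
    both = set(antecedent) | set(consequent)
    return support(sessions, both) / antecedent_support


# Hypothetical access sessions, each a list of visited pages.
sessions = [
    ["/home", "/laptops", "/cart"],
    ["/home", "/laptops"],
    ["/home", "/phones", "/cart"],
    ["/laptops", "/cart"],
]
rule_support = support(sessions, ["/laptops", "/cart"])
rule_confidence = confidence(sessions, ["/laptops"], ["/cart"])
print(rule_support, rule_confidence)
```

A recommender would typically keep only rules above chosen support and confidence thresholds; fewer retained rules means less work per online recommendation, which is the efficiency concern the abstract raises.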

Details

International Journal of Web Information Systems, vol. 1 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 1 August 1998

Shahana Sen, Balaji Padmanabhan, Alexander Tuzhilin, Norman H. White and Roger Stein


Downloads
2563

Abstract

Investigates the information needs of marketers on the WWW for consumer analysis purposes, and examines how these can be met. Begins by briefly highlighting the attractiveness of the Web as a medium for communication. Next, summarizes the bases for consumer analysis and segmentation in the context of the traditional marketing communication media. Based on this framework, examines consumer analysis needs in the context of the WWW medium and proposes analysis variables relevant to this medium. Finally, discusses how the information needs for consumer analysis may be met from the different information sources available to a marketer, including the Web logfiles which are generated as a result of tracking the interactions of visitors accessing information from a company’s Web site.

Details

European Journal of Marketing, vol. 32 no. 7/8
Type: Research Article
ISSN: 0309-0566

Article
Publication date: 23 October 2007

David Nicholas, Paul Huntington and Hamid R. Jamali


Downloads
1368

Abstract

Purpose

The purpose of this research is to examine the impact on usage of the journal Nucleic Acids Research (NAR) moving to an open access model. A major objective was to examine the impact of open access in the context of other initiatives that have improved accessibility to scholarly journals. The study also aims to demonstrate the potential of deep log analysis for monitoring change in usage over time.

Design/methodology/approach

Data were gathered from the logs for the period 2003‐June 2005 and analysed using deep log methods. The data were analysed to provide the following information on use: type of item viewed; usage over time; usage for individual journal issues; usage per type of article; age of article. Usage analyses were further examined with regard to the following user characteristics: subscriber/non‐subscriber; referrer link employed, organisational affiliation; geographical location.

Findings

The analysis showed that the rise in use of NAR over the survey period (140 per cent) could largely be attributed to the opening up of the site to search engines and that the move to OA had a relatively small influence on driving usage up further (less than 10 per cent).

Originality/value

The study for the first time thoroughly analyses the usage data of a significant experimental open access journal and reveals the huge impact of search engines on driving up usage.

Details

Journal of Documentation, vol. 63 no. 6
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 20 February 2009

Jin Zhang and Dietmar Wolfram


Abstract

Purpose

The purpose of this article is to investigate obesity‐related queries from a public health portal (HealthLink) transaction log.

Design/methodology/approach

Multidimensional scaling (MDS) was applied to each of five obesity‐related focus keywords and their co‐occurring terms in submitted queries. After the transaction log data were collected and cleaned, and query terms were extracted and parsed, relationships between a focus keyword and its co‐occurring terms were established. Clustering relationships between focus keywords and their co‐occurring terms were identified and analysed in the MDS visual context.
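The step that relates a focus keyword to its co‐occurring terms can be sketched as a simple count over cleaned queries. The code below covers only that counting step, under the assumption that queries are already cleaned and whitespace-separated. The MDS projection itself is omitted, and the function name and sample queries are invented for illustration.

```python
from collections import Counter


def cooccurring_terms(queries, focus_keyword):
    """Count terms that appear in the same query as `focus_keyword`.

    Each term is counted at most once per query, and the focus keyword
    itself is excluded from the result.
    """
    counts = Counter()
    for query in queries:
        terms = set(query.lower().split())
        if focus_keyword in terms:
            counts.update(terms - {focus_keyword})
    return counts


# Hypothetical cleaned queries from a transaction log.
queries = [
    "obesity diet plan",
    "childhood obesity causes",
    "obesity diet surgery",
    "diabetes diet",
]
counts = cooccurring_terms(queries, "obesity")
print(counts.most_common(3))
```

Counts like these could then be turned into a term-by-term proximity matrix, which is the kind of input an MDS routine expects.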

Findings

The MDS analysis produced satisfactory outcomes for all five focus keywords. The term placements in the visual configurations revealed strong grouping tendencies of three to five clusters for each focus keyword.

Originality/value

The findings of this study provide insights into health consumers' internet‐based information‐seeking behaviour on obesity‐related topics. These findings could be used to enhance online search system design and health‐related thesaurus construction.

Details

Online Information Review, vol. 33 no. 1
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 9 August 2013

Vinodh Krishnaraju and Saji K. Mathew


Abstract

Purpose

Web personalization has been studied in different streams of research such as Marketing, Human Computer Interaction and Computer Science. However, an information systems perspective on web personalization research is scarcely visible in this body of knowledge. This research review seeks to address two important questions: how has web personalization evolved as an integrative discipline? How has web personalization been treated in IS literature and where should researchers focus next?

Design/methodology/approach

The paper intently follows an information systems perspective in its thematic classification of web personalization research which is consistent with the early conceptualization of information systems by logically mapping IS categories into web personalization research streams. Articles from 100+ journals were analyzed and important concepts related to web personalization were classified from an information systems perspective.

Findings

Two parallel streams of research evolved around the theme of web personalization. The first stream consisted of internet business models, computer science algorithms and web mining. The second stream focussed on human computer interaction studies, user modelling and targeted marketing. Future information systems researchers in web personalization should focus on four important areas: social media, web development methodologies, emerging internet access devices and domains other than e‐Commerce.

Originality/value

Web personalization has been studied previously in separate research streams, but no integrated view across those streams exists. Although research interest in web mining has been growing, as evidenced by the growing number of publications, an information systems perspective on web personalization research is scarcely visible in the body of knowledge. The authors intently follow an information systems perspective in their thematic classification of web personalization research, which is consistent with the early conceptualization of information systems, by logically mapping IS categories into web personalization research streams. This thematic segregation of different research streams into an IS framework makes the study distinct from earlier reviews. The authors also identify four important areas on which future IS researchers should focus.

Details

Journal of Systems and Information Technology, vol. 15 no. 3
Type: Research Article
ISSN: 1328-7265
