Search results

1 – 10 of over 17000
Article
Publication date: 14 November 2016

Bahjat Fatima, Huma Ramzan and Sohail Asghar

Abstract

Purpose

The purpose of this paper is to critically analyze the state-of-the-art session identification techniques used in web usage mining (WUM) process in terms of their limitations, features, and methodologies.

Design/methodology/approach

In this research, a systematic literature review was conducted using a review protocol approach. The methodology consisted of a comprehensive search for relevant literature published over the period 2005-2015 in four online database repositories (i.e. IEEE, Springer, ACM Digital Library and ScienceDirect).

Findings

The findings revealed that this research area is still immature and that the existing literature lacks a critical review of recent session identification techniques used in the WUM process.

Originality/value

The contribution of this study is to provide a structured overview of the research developments, to critically review the existing session identification techniques, highlight their limitations and associated challenges and identify areas where further improvements are required so as to complement the performance of existing techniques.

Details

Online Information Review, vol. 40 no. 7
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 5 September 2008

Seda Ozmutlu and Gencer C. Cosar

Abstract

Purpose

Identification of topic changes within a user search session is a key issue in content analysis of search engine user queries. Recently, various studies have focused on new topic identification/session identification of search engine transaction logs, and several problems regarding the estimation of topic shifts and continuations were observed in these studies. This study aims to analyze the reasons for the problems that were encountered as a result of applying automatic new topic identification.

Design/methodology/approach

Measures, such as cleaning the data of common words and analyzing the errors of automatic new topic identification, are applied to eliminate the problems in estimating topic shifts and continuations.

Findings

The findings show that the resulting errors of automatic new topic identification have a pattern, and further research is required to improve the performance of automatic new topic identification.

Originality/value

Improving the performance of automatic new topic identification would be valuable to search engine designers, so that they can develop new clustering and query recommendation algorithms, as well as custom‐tailored graphical user interfaces for search engine users.

Details

Library Hi Tech, vol. 26 no. 3
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 8 June 2010

Guillermo Navarro‐Arribas and Vicenç Torra

Abstract

Purpose

The purpose of this paper is to anonymize web server log files used in e‐commerce web mining processes.

Design/methodology/approach

The paper has applied statistical disclosure control (SDC) techniques to achieve its goal. More precisely, it has introduced the micro‐aggregation of web access logs.

Findings

The experiments show that the proposed technique provides good results in general, but it is especially outstanding when dealing with relatively small websites.

Research limitations/implications

As with all SDC techniques, there is always a trade‐off between privacy and utility or, in other words, between disclosure risk and information loss. The proposal takes this issue into account, providing k‐anonymity while preserving acceptable information accuracy.
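For readers unfamiliar with micro-aggregation, the minimal Python sketch below shows the general idea on a single numeric attribute. It is not the authors' algorithm for web access logs, whose details this abstract does not give. It assumes fixed-size groups: values are sorted, split into consecutive groups of at least k records, and each value is replaced by its group mean, so every released value is shared by at least k records. The function name `microaggregate` is hypothetical.

```python
from statistics import mean


def microaggregate(values: list[float], k: int) -> list[float]:
    """Hypothetical fixed-size univariate micro-aggregation.

    Sorts the values, splits them into consecutive groups of size k
    (the last group absorbs any remainder, so no group is smaller than k),
    and replaces each value by its group mean. The masked values are
    returned in the original record order.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(values) < k:
        raise ValueError("need at least k records")
    # Record indices ordered by value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    masked = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        # The last group takes all remaining records (between k and 2k-1).
        end = len(values) if g == n_groups - 1 else start + k
        group = order[start:end]
        centroid = mean(values[i] for i in group)
        for i in group:
            masked[i] = centroid
    return masked
```

A larger k lowers disclosure risk but replaces values with coarser means, which is the privacy/utility trade-off described above.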

Practical implications

Web server logs are valuable information used nowadays for user profiling and general data‐mining analysis of a website in e‐commerce and e‐services. This proposal allows anonymizing such logs, so they can be safely outsourced to other companies for marketing purposes, stored for further analysis, or made publicly available, without risking customer privacy.

Originality/value

Current solutions to the problem presented here are scarce and very poor. They are normally limited to removing sensitive information from the query strings of URLs. Moreover, to the authors' knowledge, SDC techniques have never before been applied to the anonymization of web logs.

Details

Internet Research, vol. 20 no. 3
Type: Research Article
ISSN: 1066-2243

Article
Publication date: 6 September 2021

Sivaraman Eswaran, Vakula Rani, Daniel D., Jayabrabu Ramakrishnan and Sadhana Selvakumar

Abstract

Purpose

In recent years, banks have built various remotely accessed platforms for their users. However, the security risk to the banking sector has also grown, as is evident from the rising number of reported attacks against these security systems. Intelligence reports show that crawler-based cyberattacks are increasing: malicious crawlers can crawl web pages, crack passwords and harvest users' private data. Moreover, intrusion detection systems produce more false positives in dynamic environments. The purpose of this research paper is to propose an efficient methodology for detecting such attacks with a low false-positive rate.

Design/methodology/approach

In this research, the authors developed an efficient approach for detecting malicious crawlers and correlating the resulting security alerts. The behavioral features of crawlers are examined to recognize malicious ones, and a novel methodology is proposed to improve the security of the bank user portal. The authors compared various machine learning strategies, including Bayesian networks, support vector machines (SVM) and decision trees.

Findings

The proposed work covers several aspects. First, results are reported for a mixture of different kinds of log files. Then, distinct sites from the various log files are selected to construct suitable data sets. Session identification, attribute extraction, session labeling and classification are then performed. Moreover, the approach clusters the meta-alerts into higher-level meta-alerts to fuse multistage attacks and different types of attacks.

Originality/value

This methodology used incremental clustering techniques and analyzed the probability of existing topologies in SVM classifiers for more deterministic classification. It also enhanced the taxonomy for various domains.

Details

International Journal of Pervasive Computing and Communications, vol. 18 no. 1
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 6 March 2009

Christine Merk, Frank Scholze and Nils Windisch

Abstract

Purpose

The purpose of this paper is to present how the JISC Usage Statistics Review Project aims to formulate a fundamental scheme for recording usage data and to propose a standard for its aggregation, in order to provide meaningful and comparable item‐level usage statistics for electronic documents such as research papers and scientific resources.

Design/methodology/approach

A core element of the project has been a stakeholder workshop, held in Berlin on 7–8 July 2008. Representatives of key stakeholder groups (repositories, libraries, COUNTER, IRStats, JISC, LogEc, MESUR, OA‐Statistics and other Open Access projects) were invited. During the workshop, a fundamental scheme for recording and exchanging log files was discussed, as was the normalization of the collected data.

Findings

The following mandatory elements describing usage events were agreed during the stakeholder workshop: Who – identification of user/session, What – item identification and type of request performed (e.g. full‐text, front‐page, including failed/partially fulfilled requests), When – date and time, usage event ID. The following elements were regarded as optional: From where – referrer/the referring entity and identity of the service. Usage events should be exchanged in the form of OpenURL Context Objects using OAI. Automated access (e.g. robots) should be tagged. The definition of automated access has to be straightforward with an option of gradual refinement. Users have to be identified unambiguously, but without recording personal data to avoid conflicts with privacy laws. Policies on statistics should be formulated for the repository community as well as the publishing community. Information about statistics policies should be available on services like OpenDOAR and RoMEO.
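The agreed elements can be pictured as a simple record. The Python sketch below is illustrative only: the `UsageEvent` class, its field names and the `pseudonymize` helper are hypothetical, and it does not implement the OpenURL Context Object/OAI exchange format the workshop recommends. Keyed hashing (HMAC-SHA256) is shown as one possible way to identify users consistently without storing a raw identifier; the report summarized here does not prescribe it.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class UsageEvent:
    # Mandatory elements (hypothetical field names).
    event_id: str                   # usage event ID
    user_key: str                   # Who: pseudonymous user/session identifier
    item_id: str                    # What: item identification
    request_type: str               # What: e.g. "fulltext", "frontpage", "failed"
    timestamp: datetime             # When: date and time
    # Optional elements.
    referrer: Optional[str] = None  # From where: the referring entity
    service: Optional[str] = None   # identity of the service
    # Automated access (e.g. robots) is tagged rather than dropped.
    automated: bool = False


def pseudonymize(raw_id: str, secret: bytes) -> str:
    """Map a raw identifier (e.g. an IP address) to a stable pseudonym.

    The same raw identifier and key always give the same 64-character
    hex digest, so events can be linked to one user without keeping the
    raw value in the exchanged record.
    """
    return hmac.new(secret, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Example: build one event with a pseudonymous user key.
event = UsageEvent(
    event_id="e1",
    user_key=pseudonymize("192.0.2.1", b"example-secret"),
    item_id="item-42",
    request_type="fulltext",
    timestamp=datetime(2008, 7, 7, 12, 0, tzinfo=timezone.utc),
)
```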

Originality/value

The paper is based on the detailed project report to the JISC, available at http://ie‐repository.jisc.ac.uk/250/

Details

Library Hi Tech, vol. 27 no. 1
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 1 August 2005

Baoyao Zhou, Siu Cheung Hui and Alvis C. M. Fong

Abstract

With the explosive growth of information available on the World Wide Web, it has become much more difficult to access relevant information from the Web. One possible approach to solving this problem is web personalization. In this paper, we propose a novel WUL (Web Usage Lattice) based mining approach for mining association access pattern rules for personalized web recommendations. The proposed approach aims to mine a reduced set of effective association pattern rules to enhance the online performance of web recommendations. We have incorporated the proposed approach into a personalized web recommender system known as AWARS. The performance of the proposed approach is evaluated in terms of efficiency and quality. In the efficiency evaluation, we measure the number of generated rules and the runtime for online recommendations. In the quality evaluation, we measure the quality of the recommendation service based on precision, satisfaction and applicability. This paper discusses the proposed WUL‐based mining approach and compares its performance with that of Apriori‐based algorithms.

Details

International Journal of Web Information Systems, vol. 1 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 28 October 2021

Husna Sarirah Husin, James Thom and Xiuzhen Zhang

Abstract

Purpose

The purpose of the study is to use web server logs to analyze changes in the online news reading behavior of desktop and mobile users. Advances in mobile technology and social media have paved the way for online news consumption to evolve. There is a lack of research into changes in user behavior between desktop and mobile users, particularly research based on analyzing server logs.

Design/methodology/approach

In this paper, the authors investigate the evolution of user behavior using logs from the Malaysian newspaper Berita Harian Online in April 2012 and April 2017. Web usage mining techniques were used for pre-processing the logs and identifying user sessions. A Markov model is used to analyze navigation flows, and association rule mining is used to analyze user behavior within sessions.
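The abstract does not state which session identification heuristic or Markov model order was used. As a hedged illustration, the Python sketch below assumes a common time-out heuristic (a new session starts after 30 minutes of inactivity) and estimates first-order transition probabilities between pages. Artificial START and END states are added so that entry pages, such as the main page versus an article, show up as transitions from START. All function names and the threshold are assumptions.

```python
from collections import defaultdict

# Assumed inactivity threshold: 30 minutes is a widely used heuristic in
# web usage mining, not necessarily the value used in this study.
TIMEOUT_SECONDS = 30 * 60


def sessionize(requests, timeout=TIMEOUT_SECONDS):
    """Split (user_id, timestamp_in_seconds, page) tuples into sessions.

    Each user's requests are ordered by time, and a new session starts
    whenever the gap since that user's previous request exceeds `timeout`.
    Returns a list of sessions, each a list of pages in visit order.
    """
    by_user = defaultdict(list)
    for user, ts, page in requests:
        by_user[user].append((ts, page))

    sessions = []
    for user in sorted(by_user):
        current, last_ts = [], None
        for ts, page in sorted(by_user[user]):
            if last_ts is not None and ts - last_ts > timeout:
                sessions.append(current)
                current = []
            current.append(page)
            last_ts = ts
        if current:
            sessions.append(current)
    return sessions


def transition_probabilities(sessions):
    """Estimate first-order Markov transition probabilities P(next | current).

    Each session is wrapped in START and END states, so P(page | START)
    describes entry pages and P(END | page) describes exit pages.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        path = ["START", *session, "END"]
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1

    probs = {}
    for state, nexts in counts.items():
        total = sum(nexts.values())
        probs[state] = {nxt: c / total for nxt, c in nexts.items()}
    return probs
```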

Findings

It was found that page accesses have increased tremendously, particularly from Android phones, and about half of the requests in 2017 are referred from Facebook. Navigation flow between the main page, articles and section pages has changed from 2012 to 2017; while most users started navigation with the main page in 2012, readers often started with an article in 2017. Based on association rules, National and Sports are the most frequent section pages in 2012 and 2017 for desktop and mobile. However, based on the lift and conviction, these two sections are not read together in the same session as frequently as might be expected. Other, less popular items have a higher probability of being read together in a session.
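Lift and conviction have standard definitions: for a rule X → Y, lift = supp(X ∪ Y) / (supp(X) · supp(Y)) and conviction = (1 − supp(Y)) / (1 − conf(X → Y)). A lift below 1 means X and Y co-occur in sessions less often than they would if they were independent, which is the sense in which two frequent sections can be read together less often than expected. The minimal Python sketch below computes these measures over sessions represented as sets of items; the function name and the toy data are invented for illustration and are not from the study.

```python
import math


def rule_metrics(sessions, antecedent, consequent):
    """Support, confidence, lift and conviction of the rule
    antecedent -> consequent, where each session is a set of items."""
    n = len(sessions)
    x, y = set(antecedent), set(consequent)
    n_x = sum(1 for s in sessions if x <= s)         # sessions containing X
    n_y = sum(1 for s in sessions if y <= s)         # sessions containing Y
    n_xy = sum(1 for s in sessions if (x | y) <= s)  # sessions containing both
    if n_x == 0 or n_y == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = n_xy / n
    confidence = n_xy / n_x
    supp_y = n_y / n
    lift = confidence / supp_y
    # Conviction is infinite when the rule never fails (confidence == 1).
    conviction = math.inf if n_xy == n_x else (1 - supp_y) / (1 - confidence)
    return {"support": support, "confidence": confidence,
            "lift": lift, "conviction": conviction}
```

For example, over five toy sessions {a, b}, {a}, {a}, {b}, {b}, the rule a → b has confidence 1/3 and lift 5/9, which is below 1 even though a and b are each the most frequent items.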

Research limitations/implications

The localized data set is from Berita Harian Online; although it is unique to this particular newspaper, the findings and the methodology for investigating user behavior can be applied to other online news sources. In addition, the data set could be extended to cover more than one month. Although data for the year 2012 were initially collected, only the data for April 2012 are complete; the other months have missing days. Therefore, to make an impartial comparison of the evolution of user behavior over five years, the web server logs for April 2017 were used.

Originality/value

The user behavior in 2012 and 2017 was compared using association rules and Markov flow. Different from existing studies analyzing online newspaper Web server logs, this paper uniquely investigates changes in user behavior as a result of mobile phones becoming a mainstream technology for accessing the Web.

Details

International Journal of Web Information Systems, vol. 18 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 1 August 1998

Shahana Sen, Balaji Padmanabhan, Alexander Tuzhilin, Norman H. White and Roger Stein

Abstract

Investigates the information needs of marketers on the WWW for consumer analysis purposes, and examines how these can be met. Begins by briefly highlighting the attractiveness of the Web as a medium for communication. Next, summarizes the bases for consumer analysis and segmentation in the context of the traditional marketing communication media. Based on this framework, examines consumer analysis needs in the context of the WWW medium and proposes analysis variables relevant to this medium. Finally, discusses how the information needs for consumer analysis may be met from the different information sources available to a marketer, including the Web log files which are generated as a result of tracking the interactions of visitors accessing information from a company’s Web site.

Details

European Journal of Marketing, vol. 32 no. 7/8
Type: Research Article
ISSN: 0309-0566

Article
Publication date: 23 October 2007

David Nicholas, Paul Huntington and Hamid R. Jamali

Abstract

Purpose

The purpose of this research is to examine the impact on usage of the journal Nucleic Acids Research (NAR) moving to an open access model. A major objective was to examine the impact of open access in the context of other initiatives that have improved accessibility to scholarly journals. The study also aims to demonstrate the potential of deep log analysis for monitoring change in usage over time.

Design/methodology/approach

Data were gathered from the logs for the period 2003‐June 2005 and analysed using deep log methods. The data were analysed to provide the following information on use: type of item viewed; usage over time; usage for individual journal issues; usage per type of article; age of article. Usage analyses were further examined with regard to the following user characteristics: subscriber/non‐subscriber; referrer link employed; organisational affiliation; geographical location.

Findings

The analysis showed that the rise in use of NAR over the survey period (140 per cent) could largely be attributed to the opening up of the site to search engines and that the move to OA had a relatively small influence on driving usage up further (less than 10 per cent).

Originality/value

For the first time, the study thoroughly analyses the usage data of a significant experimental open access journal, and it reveals the huge impact of search engines in driving up usage.

Details

Journal of Documentation, vol. 63 no. 6
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 20 February 2009

Jin Zhang and Dietmar Wolfram

Abstract

Purpose

The purpose of this article is to investigate obesity‐related queries from a public health portal (HealthLink) transaction log.

Design/methodology/approach

Multidimensional scaling (MDS) was applied to each of five obesity‐related focus keywords and their co‐occurring terms in submitted queries. After the transaction log data were collected and cleaned, and query terms were extracted and parsed, relationships between a focus keyword and its co‐occurring terms were established. Clustering relationships between focus keywords and their co‐occurring terms were identified and analysed in the MDS visual context.

Findings

The MDS analysis produced satisfactory outcomes for all five focus keywords. The placement of terms in the visual configurations revealed strong grouping tendencies, with three to five clusters for each focus keyword.

Originality/value

The findings of this study provide insights into health consumers' internet‐based information‐seeking behaviour on obesity‐related topics. These findings could be used to enhance online search system design and health‐related thesaurus construction.

Details

Online Information Review, vol. 33 no. 1
Type: Research Article
ISSN: 1468-4527
