Search results

1 – 10 of over 3000
Article
Publication date: 8 June 2010

Guillermo Navarro‐Arribas and Vicenç Torra

Abstract

Purpose

The purpose of this paper is to anonymize web server log files used in e‐commerce web mining processes.

Design/methodology/approach

The paper has applied statistical disclosure control (SDC) techniques to achieve its goal. More precisely, it has introduced the micro‐aggregation of web access logs.
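As a rough illustration of the micro‐aggregation idea, the Python sketch below handles a single numeric attribute per record (for example, a hypothetical per‐visitor page‐view count). It is a univariate, fixed‐group‐size simplification and not the authors' method for web access logs: values are sorted, split into groups of at least k, and each value is replaced by its group mean, so every released value is shared by at least k records. The function name `microaggregate` and the sample data are assumptions made for illustration.

```python
from statistics import mean


def microaggregate(values, k):
    """Replace each value with the mean of its group of at least k sorted neighbours."""
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")
    # Indices of the records, ordered by attribute value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    n = len(order)
    start = 0
    while start < n:
        end = start + k
        # Fold a trailing remainder smaller than k into the current group.
        if n - end < k:
            end = n
        group = order[start:end]
        centroid = mean(values[i] for i in group)
        for i in group:
            result[i] = centroid
        start = end
    return result


# Hypothetical page-view counts for seven visitors.
masked = microaggregate([1, 2, 3, 10, 11, 12, 13], k=3)
```

Larger values of k give stronger anonymity but move each masked value further from the original, which is the privacy and utility trade‐off the abstract refers to.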

Findings

The experiments show that the proposed technique provides good results in general, but it is especially outstanding when dealing with relatively small websites.

Research limitations/implications

As in all SDC techniques, there is always a trade‐off between privacy and utility or, in other words, between disclosure risk and information loss. This proposal bears this issue in mind, providing k‐anonymity while preserving acceptable information accuracy.

Practical implications

Web server logs are valuable information used nowadays for user profiling and general data‐mining analysis of a website in e‐commerce and e‐services. This proposal allows anonymizing such logs, so they can be safely outsourced to other companies for marketing purposes, stored for further analysis, or made publicly available, without risking customer privacy.

Originality/value

Current solutions to the problem presented here are poor and scarce, and are normally limited to eliminating sensitive information from the query strings of URLs. Moreover, to the authors' knowledge, SDC techniques have never been applied to the anonymization of web logs.

Details

Internet Research, vol. 20 no. 3
Type: Research Article
ISSN: 1066-2243

Article
Publication date: 31 July 2007

Alesia Zuccala, Mike Thelwall, Charles Oppenheim and Rajveen Dhiensa

Abstract

Purpose

The purpose of this paper is to explore the use of LexiURL as a Web intelligence tool for collecting and analysing links to digital libraries, focusing specifically on the National electronic Library for Health (NeLH).

Design/methodology/approach

The Web intelligence techniques in this study are a combination of link analysis (web structure mining), web server log file analysis (web usage mining), and text analysis (web content mining), utilizing the power of commercial search engines and drawing upon the information science fields of bibliometrics and webometrics. LexiURL is a computer program designed to calculate summary statistics for lists of links or URLs. Its output is a series of standard reports, for example listing and counting all of the different domain names in the data.
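The kind of report described above, listing and counting the different domain names in a list of URLs, can be sketched in a few lines of Python. This is only an illustration using the standard library, not LexiURL's implementation; the function name `count_domains` and the example URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_domains(urls):
    """Count how many URLs in the list point at each host name."""
    counts = Counter()
    for url in urls:
        host = urlsplit(url).hostname  # None when the string has no network location
        if host:
            counts[host] += 1
    return counts


# Hypothetical list of pages linking to a digital library.
links = [
    "http://a.example.edu/x",
    "http://a.example.edu/y",
    "https://b.example.org/",
    "not a url",
]
report = count_domains(links)
```

Calling `report.most_common()` then gives the hosts ordered by link count.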

Findings

Link data, when analysed together with user transaction log files (i.e. Web referring domains) can provide insights into who is using a digital library and when, and who could be using the digital library if they are “surfing” a particular part of the Web; in this case any site that is linked to or colinked with the NeLH. This study found that the NeLH was embedded in a multifaceted Web context, including many governmental, educational, commercial and organisational sites, with the most interesting being sites from the .edu domain, representing American universities. Not many links directed to the NeLH were followed on September 25, 2005 (the date of the log file analysis and link extraction analysis), which means that users who access the digital library have been arriving at the site via only a few select links, bookmarks and search engine searches, or non‐electronic sources.

Originality/value

A number of studies concerning digital library users have been carried out using log file analysis as a research tool. Log files focus on real‐time user transactions; while LexiURL can be used to extract links and colinks associated with a digital library's growing Web network. This Web network is not recognized often enough, and can be a useful indication of where potential users are surfing, even if they have not yet specifically visited the NeLH site.

Article
Publication date: 26 September 2008

Khaled A. Mohamed and Ahmed Hassan

Abstract

Purpose

This paper aims to examine the behaviour of the Egyptian scholars while accessing electronic resources through two federated search tools. The main purpose of this article is to provide guidance for federated search tool technicians and support teams about user issues, including the need for training.

Design/methodology/approach

Log files were exploited to examine the behaviour of users of information retrieval systems. This study examined two log files extracted from federated search tools available to the Egyptian scholars' community for accessing electronic resources. A data mining approach was implemented to investigate user behaviour through deep analysis of these logs.

Findings

Results show that: none of the available tools provide error messages for dummy queries; most of the Egyptian scholars had short queries; Boolean operators are not used in about 50 per cent of the queries; federated search tools do not provide techniques for query reformation; the optimal days for system maintenance are the non‐weekend vacations; and early morning is the best time for maintenance.

Practical implications

The value of federated search tools can be maximised by understanding user trends in their use. The study shows that more attention should be given to search capabilities through ongoing training and awareness, in order to maximise the benefit from the available resources and tools.

Originality/value

The hypothetical value of the federated search tools has not been previously examined and analysed to understand user trends.

Details

Program, vol. 42 no. 4
Type: Research Article
ISSN: 0033-0337

Article
Publication date: 1 October 2006

Gi Woong Yun, Jay Ford, Robert P. Hawkins, Suzanne Pingree, Fiona McTavish, David Gustafson and Haile Berhe

Abstract

Purpose

This paper seeks to discuss measurement units by comparing the internet use and the traditional media use, and to understand internet use from the traditional media use perspective.

Design/methodology/approach

Benefits and shortcomings of two log file types will be carefully and exhaustively examined. Client‐side and server‐side log files will be analyzed and compared with proposed units of analysis.

Findings

Server‐side session time calculation was remarkably reliable and valid based on the high correlation with the client‐side time calculation. The analysis result revealed that the server‐side log file session time measurement seems more promising than the researchers previously speculated.
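The comparison behind this finding is a correlation between two measurements of the same sessions. A minimal Python sketch of the Pearson correlation coefficient follows; the session times are invented for illustration and are not data from the paper, and the function name `pearson` is an assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical session times in seconds for the same five sessions.
server_side = [120, 300, 45, 610, 230]
client_side = [131, 322, 50, 640, 245]
r = pearson(server_side, client_side)
```

A value of `r` close to 1 would indicate that the server‐side measurement tracks the client‐side one closely.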

Practical implications

The ability to identify each individual user and a low incidence of caching problems were strong advantages for the analysis. These web design implementations and web log data analysis schemes are recommended for future web log analysis research.

Originality/value

This paper examined the validity of the client‐side and the server‐side web log data. As a result of the triangulation of two datasets, research designs and proposed analysis schemes could be recommended.

Details

Internet Research, vol. 16 no. 5
Type: Research Article
ISSN: 1066-2243

Article
Publication date: 28 September 2012

Jiann‐Cherng Shieh

Abstract

Purpose

In the digital library era, library websites are recognized as an extension of library services. The usability and findability of library websites are increasingly important to patrons. However these websites are built, they should let patrons find the information they need quickly and intuitively. Website logs keep track of users' actual behavior when looking for that information. Based on this evidence, the author attempts to reconstruct websites to improve their internal findability.

Design/methodology/approach

In the past, the card sorting method has generally been applied to reconstruct websites to improve their internal findability. Alternatively, this research makes a first attempt to use website log data for website reconstruction. The website log data was cleaned and user sub‐sessions were extracted according to their respective critical time of session navigation. Each sub‐session's threshold time of target page was then calculated with different weights to determine its navigating parent pages. The different weighted parent pages were utilized to reconstruct various websites. A task‐oriented experiment of four tasks and 25 participants was conducted to measure the effects of findability between the constructed websites.

Findings

An analysis of variance of the time to complete the tasks shows that the reconstructed websites have better findability performance, in terms of time spent to complete the tasks, than the current one, provided the reconstruction focuses much more on the target pages. The results clearly show that, when the parent pages of a page are selected, whether it is a target page is the most important factor in improving website findability. The target page plays a critical role in website reconstruction. Furthermore, the traditional card sorting method is applied to the case website to reconstruct it. The findability experiment is then conducted and its time to complete the tasks is compared to those of the websites reconstructed from log data. The approach proposed here has better effects than card sorting.

Originality/value

Mining web log data to discover user behaviors on the library website, this research applies a heuristic method to analyze the data collected to reconstruct websites. Focusing on the target pages, the reconstructed websites will have better findability. Besides traditional card sorting techniques, this paper provides an alternative way to reconstruct websites such that users can find what they need more conveniently and intuitively.

Details

The Electronic Library, vol. 30 no. 5
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 1 August 2005

Baoyao Zhou, Siu Cheung Hui and Alvis C. M. Fong

Abstract

With the explosive growth of information available on the World Wide Web, it has become much more difficult to access relevant information from the Web. One possible approach to solve this problem is web personalization. In this paper, we propose a novel WUL (Web Usage Lattice) based mining approach for mining association access pattern rules for personalized web recommendations. The proposed approach aims to mine a reduced set of effective association pattern rules for enhancing the online performance of web recommendations. We have incorporated the proposed approach into a personalized web recommender system known as AWARS. The performance of the proposed approach is evaluated based on the efficiency and the quality. In the efficiency evaluation, we measure the number of generated rules and the runtime for online recommendations. In the quality evaluation, we measure the quality of the recommendation service based on precision, satisfactory and applicability. This paper will discuss the proposed WUL‐based mining approach, and give the performance of the proposed approach in comparison with the Apriori‐based algorithms.
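To make the notion of an association access pattern rule concrete, the Python sketch below derives single‐antecedent rules "visitors of page A also visit page B" from a list of sessions, with the usual support and confidence measures. It is a brute‐force pair count for illustration only, neither the WUL‐based approach nor an Apriori implementation; the function name `pair_rules`, the thresholds and the sessions are hypothetical.

```python
from collections import Counter
from itertools import permutations


def pair_rules(sessions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for page pairs above both thresholds."""
    n = len(sessions)
    page_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        pages = set(session)  # count each page once per session
        page_counts.update(pages)
        pair_counts.update(permutations(sorted(pages), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                 # share of sessions containing both pages
        confidence = count / page_counts[a]  # share of sessions with a that also contain b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


# Hypothetical sessions, each a list of visited pages.
sessions = [["home", "cart"], ["home", "cart", "pay"], ["home"], ["cart", "pay"]]
rules = pair_rules(sessions, min_support=0.5, min_confidence=0.9)
```

Raising either threshold shrinks the rule set, which is the lever a recommender can use to keep online lookups fast.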

Details

International Journal of Web Information Systems, vol. 1 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 1 July 2003

Jeong Yong Ahn, Seok Ki Kim and Kyung Soo Han

Abstract

In the past few years, information technology has stimulated several innovations in the business and marketing fields, and advances in the technology are changing the research surrounding those fields. Recent focal topics in the management and marketing field are electronic customer relationship management (CRM) and the practical use of marketing data and information technology. The goal of this article is not to provide an all‐inclusive tutorial on CRM but rather to provide fundamental concepts behind CRM and some aspects of the system development process. This article provides a comprehensive review of CRM and marketing data sources, and considers some design concepts for creating an effective CRM system from the viewpoint of practical use of the data sources.

Details

Industrial Management & Data Systems, vol. 103 no. 5
Type: Research Article
ISSN: 0263-5577

Article
Publication date: 14 November 2016

Bahjat Fatima, Huma Ramzan and Sohail Asghar

Abstract

Purpose

The purpose of this paper is to critically analyze the state-of-the-art session identification techniques used in web usage mining (WUM) process in terms of their limitations, features, and methodologies.
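For readers unfamiliar with session identification, a minimal Python sketch of one widely used time‐oriented heuristic is given here: a user's requests are split into a new session whenever the gap between consecutive requests exceeds a timeout. The 30‐minute default, the `(user, timestamp, url)` record layout and the function name `sessionize` are assumptions for illustration and are not taken from the reviewed paper.

```python
from collections import defaultdict


def sessionize(hits, timeout=1800):
    """Split (user, timestamp_in_seconds, url) hits into per-user sessions by inactivity gap."""
    by_user = defaultdict(list)
    for user, ts, url in hits:
        by_user[user].append((ts, url))
    sessions = []
    for user, events in by_user.items():
        events.sort()  # order each user's requests by time
        current = []
        last_ts = None
        for ts, url in events:
            # A gap longer than the timeout closes the current session.
            if last_ts is not None and ts - last_ts > timeout:
                sessions.append((user, current))
                current = []
            current.append(url)
            last_ts = ts
        sessions.append((user, current))
    return sessions


# Hypothetical log records.
hits = [("u1", 0, "/a"), ("u1", 600, "/b"), ("u1", 4000, "/c"), ("u2", 100, "/a")]
found = sessionize(hits)
```

Identifying users by a single key is itself a simplification; shared machines and proxies make that step harder in practice.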

Design/methodology/approach

In this research, a systematic literature review was conducted using a review protocol approach. The methodology consisted of a comprehensive search for relevant literature over the period 2005-2015, using four online database repositories (i.e. IEEE, Springer, ACM Digital Library, and ScienceDirect).

Findings

The findings revealed that this research area is still immature and that the existing literature lacks a critical review of recent session identification techniques used in the WUM process.

Originality/value

The contribution of this study is to provide a structured overview of the research developments, to critically review the existing session identification techniques, highlight their limitations and associated challenges and identify areas where further improvements are required so as to complement the performance of existing techniques.

Details

Online Information Review, vol. 40 no. 7
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 1 June 2003

San‐Yih Hwang, Wen‐Chiang Hsiung and Wan‐Shiou Yang

Abstract

This article describes a service for providing literature recommendations, which is part of a networked digital library project whose principal goal is to develop technologies for supporting digital services. The proposed literature recommendation system makes use of the Web usage logs of a literature digital library. The recommendation framework consists of three sequential steps: data preparation of the Web usage log, discovery of article associations, and article recommendations. We discuss several design alternatives for conducting these steps. These alternatives are evaluated using the Web logs of our university’s electronic thesis and dissertation (ETD) system. The proposed literature recommendation system has been incorporated into our university’s ETD system, and is currently operational.
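The three sequential steps named above can be sketched as a small Python pipeline. The design choice shown here, plain co‐access counting per user, is only one possible alternative and is an assumption, not necessarily one the article evaluates; the function names, record layout and article identifiers are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def prepare(log_rows):
    """Step 1, data preparation: group (user, article) rows into one set of articles per user."""
    baskets = defaultdict(set)
    for user, article in log_rows:
        baskets[user].add(article)
    return list(baskets.values())


def associate(baskets):
    """Step 2, association discovery: count how often two articles are accessed by the same user."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, article, n=3):
    """Step 3, recommendation: up to n articles most often co-accessed with the given one."""
    ranked = sorted(co.get(article, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:n]]


# Hypothetical usage-log rows: (user, thesis identifier).
rows = [("u1", "t1"), ("u1", "t2"), ("u2", "t1"), ("u2", "t2"), ("u2", "t3"), ("u3", "t1"), ("u3", "t2")]
co = associate(prepare(rows))
suggestions = recommend(co, "t1", n=2)
```

Ties are broken alphabetically so that the output is deterministic.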

Details

Online Information Review, vol. 27 no. 3
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 13 February 2009

B.S. Sirisha, V.K.J. Jeevan, R.V. Raja Kumar and A. Goswami

Abstract

Purpose

The purpose of this paper is to describe the development of a personalised information support system to help faculty members to search various portals and e‐resources without typing the search terms in different interfaces and to obtain results re‐ordered without human intervention.

Design/methodology/approach

After a careful survey of various tools and techniques available for computerised client‐centred information services, the study chose to apply web usage mining, proxy level data collection and a vector space retrieval model to develop the personalised information support for teaching and research in a higher education institution.
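A vector space retrieval model of the kind mentioned here can be illustrated with a short Python sketch that re‐orders result titles by cosine similarity to a user interest profile. It uses raw term counts with whitespace tokenisation and no term weighting, which is a simplifying assumption; the function names, the profile text and the result titles are hypothetical and do not come from the system described.

```python
from collections import Counter
from math import sqrt


def vectorize(text):
    """Turn text into a bag-of-words vector of raw term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank(profile_text, results):
    """Order results from most to least similar to the user's interest profile."""
    profile = vectorize(profile_text)
    return sorted(results, key=lambda r: cosine(profile, vectorize(r)), reverse=True)


# Hypothetical profile and search results.
titles = ["history of printing", "sensor networks survey", "wireless sensor networks routing"]
ordered = rerank("wireless sensor networks", titles)
```

In a deployed system the profile would be built from mined usage data rather than typed in, and terms would normally be weighted, for example by tf‐idf.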

Findings

There are practical constraints in the implementation stage. There is considerable difficulty in getting real and correct user interests and mapping them effectively into the products and services offered by the library. Also the interests of users change continuously. If multiple users share the same PC, it is difficult to identify the user as there is no one‐to‐one mapping between user and IP address.

Research limitations/implications

The paper has not considered cases for all the faculty members due to time constraints. The results obtained from the system also need correlation with the sources actually used by the faculty to test its efficacy in a highly fluid research situation like higher academics.

Practical implications

A pragmatic client‐centred information support prototype described in this paper may find use in other institutions needing similar information support.

Originality/value

This paper demonstrates the pragmatic application of ICT for linking users and e‐resources in an academic library.

Details

Program, vol. 43 no. 1
Type: Research Article
ISSN: 0033-0337
