Search results
1 – 10 of over 2000

Gi Woong Yun, Jay Ford, Robert P. Hawkins, Suzanne Pingree, Fiona McTavish, David Gustafson and Haile Berhe
Abstract
Purpose
This paper seeks to discuss measurement units by comparing the internet use and the traditional media use, and to understand internet use from the traditional media use perspective.
Design/methodology/approach
Benefits and shortcomings of two log file types will be carefully and exhaustively examined. Client‐side and server‐side log files will be analyzed and compared with proposed units of analysis.
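As a rough illustration of the unit being compared, a server-side session time can be estimated from request timestamps. This is a minimal sketch under common assumptions (a single identified user, a 30-minute idle cutoff), not the authors' actual procedure:

```python
# Illustrative sketch, not the authors' method: estimate server-side
# "session time" by summing gaps between consecutive requests, skipping
# long gaps that likely mark idle periods rather than active reading.
from datetime import datetime, timedelta

IDLE_CUTOFF = timedelta(minutes=30)  # assumed timeout, a common convention

def session_time(timestamps):
    """timestamps: time-sorted request times for one identified user."""
    total = timedelta()
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap <= IDLE_CUTOFF:   # count only gaps within the idle cutoff
            total += gap
    return total

visits = [
    datetime(2000, 1, 1, 10, 0),
    datetime(2000, 1, 1, 10, 5),
    datetime(2000, 1, 1, 10, 20),
    datetime(2000, 1, 1, 12, 0),   # 100-minute gap, treated as idle
]
print(session_time(visits))
```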
Findings
Server-side session time calculation was remarkably reliable and valid, based on its high correlation with the client-side time calculation. The analysis revealed that server-side log file session time measurement is more promising than researchers had previously speculated.
Practical implications
The ability to identify each individual user and a low incidence of caching problems were strong advantages for the analysis. These web design implementations and the web log data analysis scheme are recommended for future web log analysis research.
Originality/value
This paper examined the validity of client-side and server-side web log data. Based on the triangulation of the two datasets, research designs and analysis schemes could be recommended.
C.I. Ezeife, Jingyu Dong and A.K. Aggarwal
Abstract
Purpose
The purpose of this paper is to propose a web intrusion detection system (IDS), SensorWebIDS, which applies data mining, anomaly and misuse intrusion detection on web environment.
Design/methodology/approach
SensorWebIDS has three main components: the network sensor for extracting parameters from real-time network traffic, the log digger for extracting parameters from web log files, and the audit engine for analyzing all web request parameters for intrusion detection. To combat web intrusions such as buffer-overflow attacks, SensorWebIDS utilizes an algorithm based on the empirical rule of the standard deviation (σ), that 99.7 percent of data lie within 3σ of the mean, to calculate the possible maximum value length of input parameters. An association rule mining technique is employed to mine frequent parameter lists and their sequential order in order to identify intrusions.
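The length threshold derived from the empirical rule can be sketched as follows; the function name and sample lengths are illustrative, not taken from SensorWebIDS:

```python
# Illustrative sketch (not the SensorWebIDS code): derive a maximum
# allowed length for a web request parameter from the empirical rule,
# under which ~99.7% of normally distributed values fall within
# 3 standard deviations of the mean.
from statistics import mean, pstdev

def max_value_length(observed_lengths):
    """Return mean + 3*sigma as a length threshold; inputs longer than
    this are flagged as possible buffer-overflow attempts."""
    mu = mean(observed_lengths)
    sigma = pstdev(observed_lengths)
    return mu + 3 * sigma

# Hypothetical lengths of a "username" parameter in clean training traffic
lengths = [8, 10, 9, 12, 7, 11, 10, 9]
print(max_value_length(lengths))
```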
Findings
Experiments show that the proposed system has a higher detection rate than SNORT and ModSecurity for classes of web intrusions such as cross-site scripting, SQL injection, session hijacking, cookie poisoning, denial of service, buffer overflow, and probe attacks.
Research limitations/implications
Future work may extend the system to detect intrusions implanted with hacking tools and not through straight HTTP requests or intrusions embedded in non‐basic resources like multimedia files and others, track illegal web users with their prior web‐access sequences, implement minimum and maximum values for integer data, and automate the process of pre‐processing training data so that it is clean and free of intrusion for accurate detection results.
Practical implications
Web service security, as a branch of network security, is becoming more important as more business and social activities are moved online to the web.
Originality/value
Existing network IDSs are not directly applicable to web intrusion detection, because they mostly sit at the lower (network/transport) layers of the network model, while web services run at the higher (application) layer. The proposed SensorWebIDS detects XSS and SQL injection attacks through signatures, while other types of attacks are detected using association rule mining and statistics to compute frequent parameter list orders and their maximum value lengths.
Ka I. Pun, Yain Whar Si and Kin Chan Pau
Abstract
Purpose
Intensive traffic often occurs in web-enabled business processes hosted by travel industry and government portals. An extreme case of intensive traffic is a flash crowd situation, when the number of web users spikes within a short time due to unexpected events such as political unrest or extreme weather. As a result, the servers hosting these business processes can no longer handle the overwhelming volume of service requests. To alleviate this problem, process engineers usually analyze audit trail data collected from the application server and reengineer their business processes to withstand an unexpected surge in visitors. However, such analysis reveals the performance of the application server only from an internal perspective. This paper aims to investigate this issue.
Design/methodology/approach
This paper proposes an approach for analyzing key performance indicators of traffic intensive web‐enabled business processes from audit trail data, web server logs, and stress testing logs.
Findings
The key performance indicators identified in the study's approach can be used to understand the behavior of traffic intensive web‐enabled business processes and the underlying factors that affect the stability of the web server.
Originality/value
The proposed analysis also provides an internal as well as an external view of the performance. Moreover, the calculated key performance indicators can be used by the process engineers for locating potential bottlenecks, reengineering business processes, and implementing contingency measures for traffic intensive situations.
Abstract
Most electronic journals are now Web‐based. This paper introduces the method of WWW server log file analysis and its application to evaluating electronic journals services and in monitoring their usage. Following a short description on the method and its possible application, the main results of a study of WWW server log file analysis of the electronic journal “Review of Information Science” will be presented and discussed. Finally, several concluding remarks will be given.
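The kind of server log file analysis described can be sketched as a small parser over Apache-style common log lines; the pattern and sample entries below are illustrative assumptions, not data from the study:

```python
# Illustrative sketch of WWW server log file analysis: parse lines in the
# common Apache log format and tally successful requests per URL path.
# The sample lines and journal paths are hypothetical.
import re
from collections import Counter

LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

def count_article_requests(lines):
    """Count successful (2xx) requests per requested path."""
    hits = Counter()
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m and m.group("status").startswith("2"):
            hits[m.group("path")] += 1
    return hits

sample = [
    '10.0.0.1 - - [01/Jan/2000:10:00:00 +0000] "GET /review/issue1.html HTTP/1.0" 200 1024',
    '10.0.0.2 - - [01/Jan/2000:10:05:00 +0000] "GET /review/issue1.html HTTP/1.0" 200 1024',
    '10.0.0.1 - - [01/Jan/2000:10:06:00 +0000] "GET /missing.html HTTP/1.0" 404 512',
]
print(count_article_requests(sample))
```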
S.E. Kruck, Faye Teer and William A. Christian
The purpose of this paper is to describe a new software tool that graphically depicts analysis of visitor traffic. This new tool is the graph‐based server log analysis program…
Abstract
Purpose
The purpose of this paper is to describe a new software tool that graphically depicts analysis of visitor traffic. This new tool is the graph‐based server log analysis program (GSLAP).
Design/methodology/approach
Discovering hidden and meaningful information about web users' patterns of usage is critical to optimization of the web server. The authors designed and developed GSLAP. Presented in this paper is an example of GSLAP in the context of an analysis of the web site of a small fictitious company. Also included is an explanation of current literature that supports graphical display of data as a cognitive aid to understanding data.
Findings
GSLAP is shown to provide a visual server log analysis that is a great improvement on the textual server log.
Research limitations/implications
The benefits of the output from GSLAP are compared with the typical textual output.
Originality/value
The paper describes a software tool that helps the analysis of usage patterns of web traffic.
Abstract
Purpose
To study the use of “Quick Links”, a common navigational element, in the context of an academic library website.
Design/methodology/approach
Transaction log files and web server logs are analyzed over a four‐year period to detect patterns in Quick Link usage.
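A tally of Quick Link usage by referring page might be sketched as follows; the link targets and log records are hypothetical, not the study's data:

```python
# Illustrative sketch (not the study's actual analysis): count Quick Link
# clicks by referring page from simplified (requested_path, referrer) records.
from collections import Counter

QUICK_LINKS = {"/catalog", "/databases", "/hours"}  # hypothetical targets

def quicklink_referrers(records):
    """records: iterable of (requested_path, referrer_path) pairs."""
    return Counter(ref for path, ref in records if path in QUICK_LINKS)

log = [
    ("/catalog", "/"),           # clicked from the homepage
    ("/databases", "/"),
    ("/catalog", "/services"),
    ("/about", "/"),             # not a Quick Link -> ignored
]
print(quicklink_referrers(log))
```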
Findings
Provides information about what Quick Links have been used over time, as well as the relationship of Quick Link usage to the rest of the library website. Finds generally that Quick Link usage is prevalent, tilted toward a few of the choices, and is drawn largely from the library homepage as referral source.
Research limitations/implications
Log analysis does not include IP referral data, which limits the ability to determine different patterns of use by specific locations including services desks, off‐campus, and in‐house library usage.
Practical implications
This paper is useful for website usability in terms of design decisions and log analysis.
Originality/value
This paper targets a specific website usability issue over time.
David Nicholas, Paul Huntington, Peter Williams, Nat Lievesley, Tom Dobrowolski and Richard Withey
Abstract
There is a general dearth of trustworthy information on who is using the web and how they use it. Such information is of vital concern to web managers and their advertisers yet the systems for delivering such data, where in place, generally cannot supply accurate enough data. Nor have web managers the expertise or time to evaluate the enormous amounts of information that are generated by web sites. The article, based on the experience of evaluating The Times web server access logs, describes the methodological problems that lie at the heart of web log analysis, evaluates a range of use measures (visits, page impressions, hits) and provides some advice on what analyses are worth conducting.
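The three use measures evaluated (hits, page impressions, visits) can be illustrated on simplified request records; the 30-minute session timeout and the page-suffix rule are common conventions assumed here, not taken from the article:

```python
# Illustrative sketch of three web-use measures: hits (all requests),
# page impressions (requests for pages rather than embedded objects),
# and visits (sessions per host, split on a 30-minute idle gap).
from datetime import datetime, timedelta

PAGE_SUFFIXES = (".html", "/")        # assumed rule for "page" requests
SESSION_GAP = timedelta(minutes=30)   # assumed session timeout

def use_measures(requests):
    """requests: list of (host, datetime, path), time-sorted per host."""
    hits = len(requests)
    pages = sum(1 for _, _, p in requests if p.endswith(PAGE_SUFFIXES))
    visits, last_seen = 0, {}
    for host, ts, _ in requests:
        if host not in last_seen or ts - last_seen[host] > SESSION_GAP:
            visits += 1               # first request, or idle gap exceeded
        last_seen[host] = ts
    return {"hits": hits, "page_impressions": pages, "visits": visits}

reqs = [
    ("a", datetime(2000, 1, 1, 10, 0), "/index.html"),
    ("a", datetime(2000, 1, 1, 10, 1), "/logo.gif"),   # embedded object
    ("a", datetime(2000, 1, 1, 11, 0), "/news.html"),  # gap > 30 min: new visit
    ("b", datetime(2000, 1, 1, 10, 5), "/"),
]
print(use_measures(reqs))
```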
Hamid R. Jamali, David Nicholas and Paul Huntington
Abstract
Purpose
To provide a review of the log analysis studies of use and users of scholarly electronic journals.
Design/methodology/approach
The advantages and limitations of log analysis are described, and past studies of e-journals' use and users that applied this methodology are critiqued. The results of these studies are briefly compared with some survey studies. Those aspects of online journals' use and users that log analysis can investigate well, and those about which it cannot disclose enough information, are highlighted.
Findings
The review indicates that although there is a debate about reliability of the results of log analysis, this methodology has great potential for studying online journals' use and their users' information seeking behaviour.
Originality/value
This paper highlights the strengths and weaknesses of log analysis for studying digital journals and raises a couple of questions to be investigated by further studies.
Guillermo Navarro‐Arribas and Vicenç Torra
Abstract
Purpose
The purpose of this paper is to anonymize web server log files used in e‐commerce web mining processes.
Design/methodology/approach
The paper has applied statistical disclosure control (SDC) techniques to achieve its goal. More precisely, it has introduced the micro‐aggregation of web access logs.
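Micro-aggregation can be illustrated with a toy univariate sketch (not the authors' algorithm): values are sorted, partitioned into groups of at least k, and each value is replaced by its group mean, so every released value is shared by at least k records:

```python
# Toy univariate micro-aggregation sketch (not the paper's algorithm):
# each released value is the mean of its group of >= k records, giving a
# k-anonymity-style guarantee at the cost of some information loss.
from statistics import mean

def microaggregate(values, k=3):
    ordered = sorted(values)
    out, i = [], 0
    while i < len(ordered):
        # the last group absorbs the remainder so no group is smaller than k
        if len(ordered) - i < 2 * k:
            group = ordered[i:]
        else:
            group = ordered[i:i + k]
        out.extend([mean(group)] * len(group))
        i += len(group)
    return out

print(microaggregate([1, 2, 3, 10, 11, 12, 50], k=3))
```

Real SDC methods choose groups to minimize within-group variance; sorting first is the simplest univariate approximation of that idea.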
Findings
The experiments show that the proposed technique provides good results in general, but it is especially outstanding when dealing with relatively small websites.
Research limitations/implications
As in all SDC techniques, there is always a trade-off between privacy and utility or, in other words, between disclosure risk and information loss. This proposal bears this issue in mind, providing k-anonymity while preserving acceptable information accuracy.
Practical implications
Web server logs are valuable information used nowadays for user profiling and general data‐mining analysis of a website in e‐commerce and e‐services. This proposal allows anonymizing such logs, so they can be safely outsourced to other companies for marketing purposes, stored for further analysis, or made publicly available, without risking customer privacy.
Originality/value
Current solutions to the problem presented here are poor and scarce, normally reduced to the elimination of sensitive information from the query strings of URLs. Moreover, to the authors' knowledge, SDC techniques have never before been applied to the anonymization of web logs.
Alesia Zuccala, Mike Thelwall, Charles Oppenheim and Rajveen Dhiensa
Abstract
Purpose
The purpose of this paper is to explore the use of LexiURL as a Web intelligence tool for collecting and analysing links to digital libraries, focusing specifically on the National electronic Library for Health (NeLH).
Design/methodology/approach
The Web intelligence techniques in this study are a combination of link analysis (web structure mining), web server log file analysis (web usage mining), and text analysis (web content mining), utilizing the power of commercial search engines and drawing upon the information science fields of bibliometrics and webometrics. LexiURL is a computer program designed to calculate summary statistics for lists of links or URLs. Its output is a series of standard reports, for example listing and counting all of the different domain names in the data.
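A summary statistic of the kind LexiURL produces, counting the different domain names in a list of links, can be sketched as follows (illustrative Python, not LexiURL itself; the sample URLs are assumptions):

```python
# Illustrative sketch of a LexiURL-style summary report (not LexiURL
# itself): count how often each domain name appears in a list of links.
from collections import Counter
from urllib.parse import urlparse

def domain_counts(urls):
    """Tally domain names (netloc), case-insensitively."""
    return Counter(urlparse(u).netloc.lower() for u in urls)

links = [
    "http://www.nelh.nhs.uk/",
    "http://example.edu/library",    # hypothetical .edu linking site
    "http://EXAMPLE.edu/catalog",
]
print(domain_counts(links))
```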
Findings
Link data, when analysed together with user transaction log files (i.e. Web referring domains), can provide insights into who is using a digital library and when, and who could be using the digital library if they are "surfing" a particular part of the Web; in this case, any site that is linked to or colinked with the NeLH. This study found that the NeLH was embedded in a multifaceted Web context, including many governmental, educational, commercial and organisational sites, the most interesting being sites from the .edu domain, representing American universities. Not many links directed to the NeLH were followed on September 25, 2005 (the date of the log file analysis and link extraction analysis), which means that users who access the digital library have been arriving at the site via only a few select links, bookmarks, search engine searches, or non-electronic sources.
Originality/value
A number of studies concerning digital library users have been carried out using log file analysis as a research tool. Log files focus on real-time user transactions, while LexiURL can be used to extract links and colinks associated with a digital library's growing Web network. This Web network is not recognized often enough, and can be a useful indication of where potential users are surfing, even if they have not yet visited the NeLH site.