Search results

1 – 10 of 337
Article
Publication date: 1 April 2003

Mike Thelwall

Downloads
1405

Abstract

Google's PageRank is an influential algorithm that uses a model of Web use that is dominated by its link structure in order to rank pages by their estimated value to the Web community. This paper reports on the outcome of applying the algorithm to the Web sites of three national university systems in order to test whether it is capable of identifying the most important Web pages. The results are also compared with simple inlink counts. It was discovered that the most highly inlinked pages do not always have the highest PageRank, indicating that the two metrics are genuinely different, even for the top pages. More significantly, however, internal links dominated external links for the high ranks in either method and superficial reasons accounted for high scores in both cases. It is concluded that PageRank is not useful for identifying the top pages in a site and that it must be combined with powerful text matching techniques in order to get the quality of information retrieval results provided by Google.
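The two metrics this abstract compares can be reproduced on a toy graph. The sketch below is illustrative only: the page names, the link graph and the damping factor of 0.85 are assumptions, not data from the paper.

```python
# PageRank by power iteration versus raw inlink counts on a toy graph.
# All page names and links are hypothetical.

def pagerank(links, damping=0.85, iterations=100):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                for t in targets:
                    new[t] += damping * rank[page] / len(targets)
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank

links = {
    "home": ["about", "paper"],
    "about": ["home"],
    "news": ["home"],
    "hub": ["home"],
    "paper": [],
}
ranks = pagerank(links)
inlinks = {p: sum(p in targets for targets in links.values()) for p in links}
```

The ranks sum to one, so the two orderings can be compared directly. As the paper observes, the page with the most inlinks need not be the page with the highest PageRank, because PageRank also weights where each inlink comes from.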

Details

Journal of Documentation, vol. 59 no. 2
Type: Research Article
ISSN: 0022-0418

Book part
Publication date: 14 December 2004

Mike Thelwall

Details

Link Analysis: An Information Science Approach
Type: Book
ISBN: 978-0-12-088553-4

Article
Publication date: 4 July 2016

Adam Jachimczyk, Magdalena Chrapek and Zbigniew Chrapek

Abstract

Purpose

The purpose of this paper is a statistical examination of nearly 7,000 web directories and an analysis of the factors which affect their quality as measured by PageRank.

Design/methodology/approach

The authors analysed 6,821 directories registered at www.katalogiseo.info/. The following information about the directories was examined: the year of registration on the website, the directory’s PageRank value, the existence of an active IP address, backlink requests, a fee charged for submission to the directory, as well as directory moderation and subject. Statistical analyses were performed with the use of Microsoft Excel, version 2010, and R software, version 3.0.0. The PageRank values were collected with software written in Python.

Findings

The study has shown a gradual increase in the popularity of directories as one of the basic tools in search engine optimisation. The analysis has indicated a relatively high percentage of spam web directories; evidence of this is the number of directories with undetermined PageRank values. The study revealed that careful management of a directory and its subject have a key impact on directory quality as measured by PageRank.

Originality/value

Relatively few publications focus on the problem of web directories, which represent a very large group of websites created solely to manipulate web search engine rankings. This paper discusses the phenomenon of web directories and reveals the percentage of spam directories and the factors which affect their quality as measured by PageRank.

Details

Program, vol. 50 no. 3
Type: Research Article
ISSN: 0033-0337

Article
Publication date: 1 April 2002

Mike Thelwall

Downloads
1241

Abstract

The spread of subject gateway sites can have an impact on the other major Web information retrieval tool: the commercial search engine. This is because gateway sites perturb the link structure of the Web, something used to rank matches in search engine results pages. The success of Google means that its PageRank algorithm for ranking the importance of Web pages is an object of particular interest, and it is one of the few published ranking algorithms. Although highly mathematical, PageRank admits a simple underlying explanation that allows an analysis of its impact on Web spaces. It is shown that under certain stated assumptions gateway sites can actually decrease the PageRank of their targets. Suggestions are made for gateway site designers and other Web authors to minimise this.
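The perturbation argument can be illustrated numerically. In the hypothetical setup below, two source pages either link straight to a target or route through a gateway page that also lists several other resources; the graph, names and damping factor are assumptions made for illustration, not the paper's stated model.

```python
# Effect of routing links through a gateway page on a target's PageRank.
# Both graphs share the same eight nodes; only the edges differ.

def pagerank(links, damping=0.85, iterations=100):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            share = damping * rank[page]
            if targets:
                for t in targets:
                    new[t] += share / len(targets)
            else:  # dangling page: spread its rank evenly
                for t in pages:
                    new[t] += share / n
        rank = new
    return rank

others = ["o1", "o2", "o3", "o4"]
# Scenario A: sources link directly to the target; the gateway is idle.
direct = {"s1": ["target"], "s2": ["target"], "gateway": [],
          "target": [], **{o: [] for o in others}}
# Scenario B: sources link to the gateway, which lists the target
# among five resources.
via_gateway = {"s1": ["gateway"], "s2": ["gateway"],
               "gateway": ["target"] + others,
               "target": [], **{o: [] for o in others}}

direct_rank = pagerank(direct)["target"]
gateway_rank = pagerank(via_gateway)["target"]
```

Because the gateway splits its outflow five ways and adds a second damping step, the target's PageRank in this sketch drops relative to the direct-link case, consistent with the abstract's claim that, under certain assumptions, gateway sites can decrease the PageRank of their targets.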

Details

Online Information Review, vol. 26 no. 2
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 1 February 2004

Mike Thelwall and Liwen Vaughan

Downloads
544

Abstract

Introduces several new versions of PageRank (the link based Web page ranking algorithm), based on an information science perspective on the concept of the Web document. Although the Web page is the typical indivisible unit of information in search engine results and most Web information retrieval algorithms, other research has suggested that aggregating pages based on directories and domains gives promising alternatives, particularly when Web links are the object of study. The new algorithms introduced based on these alternatives were used to rank four sets of Web pages. The ranking results were compared with human subjects’ rankings. The results of the tests were somewhat inconclusive: the new approach worked well for the set that includes pages from different Web sites; however, it did not work well in ranking pages that are from the same site. It seems that the new algorithms may be effective for some tasks but not for others, especially when only low numbers of links are involved or the pages to be ranked are from the same site or directory.

Article
Publication date: 5 January 2018

Tehmina Amjad, Ali Daud and Naif Radi Aljohani

Downloads
1235

Abstract

Purpose

This study reviews the methods found in the literature for the ranking of authors, identifies the pros and cons of these methods, and discusses and compares them. The purpose of this paper is to identify the challenges and future directions in the ranking of academic objects, especially authors, for future researchers.

Design/methodology/approach

This study reviews the methods found in the literature for the ranking of authors, classifies them into subcategories by studying and analyzing how they achieve their objectives, and discusses and compares them. The data sets used in the literature and the evaluation measures applicable in the domain are also presented.

Findings

The survey identifies the challenges involved in the ranking of authors and outlines future directions.

Originality/value

To the best of the authors' knowledge, this is the first survey that studies the author ranking problem in detail and classifies the methods according to their key functionalities, features and ways of achieving the objectives of the problem.

Details

Library Hi Tech, vol. 36 no. 1
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 6 February 2007

Michael P. Evans

Downloads
10652

Abstract

Purpose

The purpose of this paper is to identify the most popular techniques used to rank a web page highly in Google.

Design/methodology/approach

The paper presents the results of a study into 50 highly optimized web pages that were created as part of a Search Engine Optimization competition. The study focuses on the most popular techniques that were used to rank highest in this competition, and includes an analysis on the use of PageRank, number of pages, number of in‐links, domain age and the use of third party sites such as directories and social bookmarking sites. A separate study was made into 50 non‐optimized web pages for comparison.

Findings

The paper provides insight into the techniques that successful Search Engine Optimizers use to ensure a page ranks highly in Google. It recognizes the importance of PageRank and links, as well as of directories and social bookmarking sites.

Research limitations/implications

Only the top 50 web sites for a specific query were analyzed. Analyzing more web sites and comparing them with similar studies of different competitions would provide more concrete results.

Practical implications

The paper offers a revealing insight into the techniques used by industry experts to rank highly in Google, and the success or otherwise of those techniques.

Originality/value

This paper fulfils an identified need for web sites and e‐commerce sites keen to attract a wider web audience.

Details

Internet Research, vol. 17 no. 1
Type: Research Article
ISSN: 1066-2243

Article
Publication date: 21 November 2008

Ola Ågren

Downloads
1137

Abstract

Purpose

The purpose of this paper is to assign topic‐specific ratings to web pages.

Design/methodology/approach

The paper uses power iteration to assign topic‐specific rating values (called relevance) to web pages, creating a ranking or partial order among these pages for each topic. This approach depends on a set of pages that are initially assumed to be relevant for a specific topic; the spatial link structure of the web pages; and a net‐specific decay factor designated ξ.
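A minimal reading of this design can be sketched as follows. The update rule, the seed set and the value of ξ below are illustrative assumptions; the paper's actual algorithm may differ in detail.

```python
# Topic-specific relevance by power iteration: seed pages are pinned
# at relevance 1, and relevance flows along links damped by a
# net-specific decay factor xi. Illustrative sketch only.

def topic_relevance(links, seeds, xi=0.5, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rel = {p: (1.0 if p in seeds else 0.0) for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 if p in seeds else 0.0) for p in pages}
        for page, targets in links.items():
            for t in targets:
                # Decayed, out-degree-normalised flow along each link.
                new[t] += xi * rel[page] / len(targets)
        rel = new
    return rel

links = {"seed_page": ["linked"], "linked": ["twice_removed"],
         "twice_removed": []}
rel = topic_relevance(links, seeds={"seed_page"})
```

With ξ < 1 and out-degree normalisation the iteration is a contraction, which is one way to obtain the fast, guaranteed convergence the findings mention; unlike a Markov chain approach, the resulting scores are not a probability distribution.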

Findings

The paper finds that this approach exhibits desirable properties such as fast convergence and stability, and yields relevant answer sets. The first property is shown using theoretical proofs, while the others are evaluated through stability experiments and assessments of real world data in comparison with already established algorithms.

Research limitations/implications

In the assessment, all pages that a web spider was able to find in the Nordic countries were used. It is also important to note that entities that use domains outside the Nordic countries (e.g. .com or .org) are not present in the paper's datasets even though they reside logically within one or more of the Nordic countries. This is quite a large dataset, but still small in comparison with the entire World Wide Web. Moreover, the execution speed of some of the algorithms unfortunately prohibited the use of a large test dataset in the stability tests.

Practical implications

It is not only possible, but also reasonable, to perform ranking of web pages without using Markov chain approaches. This means that the work of generating answer sets for complex questions could (at least in theory) be divided into smaller parts that are later summed up to give the final answer.

Originality/value

This paper contributes to the research on internet search engines.

Details

International Journal of Web Information Systems, vol. 4 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 13 November 2009

Jiang Li and Peter Willett

Abstract

Purpose

The purpose of this paper is to suggest an alternative to the widely used Times Cited criterion for analysing citation networks. The approach involves taking account of the natures of the papers that cite a given paper, so as to differentiate between papers that attract the same number of citations.

Design/methodology/approach

ArticleRank is an algorithm that has been derived from Google's PageRank algorithm to measure the influence of journal articles. ArticleRank is applied to two datasets – a citation network based on an early paper on webometrics, and a self‐citation network based on the 19 most cited papers in the Journal of Documentation – using citation data taken from the Web of Knowledge database.

Findings

ArticleRank values provide a different ranking of a set of papers from that provided by the corresponding Times Cited values, and overcome the inability of the latter to differentiate between papers with the same numbers of citations. The difference in rankings between Times Cited and ArticleRank is greatest for the most heavily cited articles in a dataset.
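The differentiation effect described here can be demonstrated with a PageRank-style computation on a small synthetic citation network. The exact ArticleRank formula in the paper may weight terms differently; this is a generic PageRank run with edges pointing from citing to cited papers, and entirely made-up paper labels.

```python
# Two papers with identical citation counts receive different
# PageRank-style scores when their citers differ in influence.

def pagerank(links, damping=0.85, iterations=100):
    """links maps each paper to the list of papers it cites."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                for t in targets:
                    new[t] += damping * rank[page] / len(targets)
            else:
                # A paper citing nothing in the set: spread evenly.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank

# Edges point from citing paper to cited paper.
citations = {
    "p": ["x"], "q": ["x"],   # x is itself well cited
    "x": ["a"], "y": ["a"],   # a: cited twice, once by x
    "z": ["b"], "w": ["b"],   # b: cited twice, by uncited papers
    "a": [], "b": [],
}
rank = pagerank(citations)
times_cited = {p: sum(p in t for t in citations.values()) for p in citations}
```

Papers a and b are tied on Times Cited (two citations each), but a outranks b because one of its citations comes from the heavily cited paper x, exactly the distinction the abstract says Times Cited cannot make.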

Originality/value

This is a novel application of the PageRank algorithm.

Details

Aslib Proceedings, vol. 61 no. 6
Type: Research Article
ISSN: 0001-253X

Article
Publication date: 6 November 2017

Ngurah Agus Sanjaya Er, Mouhamadou Lamine Ba, Talel Abdessalem and Stéphane Bressan

Abstract

Purpose

This paper aims to focus on the design of algorithms and techniques for effective set expansion. A tool that finds and extracts candidate sets of tuples from the World Wide Web was designed and implemented. For instance, when a user provides <Indonesia, Jakarta, Indonesian Rupiah>, <China, Beijing, Yuan Renminbi>, <Canada, Ottawa, Canadian Dollar> as seeds, the system returns tuples of countries with their corresponding capital cities and currency names, constructed from content extracted from the retrieved Web pages.

Design/methodology/approach

The seeds are used to query a search engine and to retrieve relevant Web pages. The seeds are also used to infer wrappers from the retrieved pages. The wrappers, in turn, are used to extract candidates. The Web pages, wrappers, seeds and candidates, as well as their relationships, are vertices and edges of a heterogeneous graph. Several options for ranking candidates from PageRank to truth finding algorithms were evaluated and compared. Remarkably, all vertices are ranked, thus providing an integrated approach to not only answer direct set expansion questions but also find the most relevant pages to expand a given set of seeds.

Findings

The experimental results show that leveraging the truth finding algorithm can indeed improve the level of confidence in the extracted candidates and the sources.

Originality/value

Current approaches to set expansion mostly support the expansion of sets of atomic data. This idea can be extended to sets of tuples, extracting relation instances from the Web given a handful of tuple seeds. A truth finding algorithm is also incorporated into the approach, and it is shown that it can improve the confidence level in the ranking of both candidates and sources in the expansion of sets of tuples.

Details

International Journal of Web Information Systems, vol. 13 no. 4
Type: Research Article
ISSN: 1744-0084
