Search results
Khuram Ali Khan, Tasadduq Niaz, Đilda Pečarić and Josip Pečarić
Abstract
In this work, we estimated different entropies, such as the Shannon entropy, Rényi divergence and Csiszár divergence, by using Jensen-type functionals. The Zipf’s–Mandelbrot law and hybrid Zipf’s–Mandelbrot law are used to estimate the Shannon entropy. The Abel–Gontscharoff Green functions and Fink’s identity are used to construct new inequalities and to generalize them for
Abstract
An exact, discrete formulation of Bradford's law describing the distribution of articles in journals is derived by showing that Bradford's law is a special case of the Zipf‐Mandelbrot ‘rank frequency’ law. A relatively simple method is presented for fitting the model to empirical data and estimating the number of journals and articles in a subject collection. This method is demonstrated with an example application.
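The Zipf‐Mandelbrot ‘rank frequency’ law referenced above can be sketched numerically as follows. This is a minimal illustration of the law itself, not the paper’s fitting method, and the parameter values (exponent `s`, offset `q`, number of ranks `n`) are hypothetical:

```python
def zipf_mandelbrot(rank, s, q, n):
    """Expected relative frequency of the item at a given rank under the
    Zipf-Mandelbrot law f(r) ~ 1 / (r + q)^s, normalized over n ranks."""
    norm = sum(1.0 / (r + q) ** s for r in range(1, n + 1))
    return (1.0 / (rank + q) ** s) / norm

# Hypothetical parameters; real values would be estimated from data.
freqs = [zipf_mandelbrot(r, s=1.0, q=2.7, n=100) for r in range(1, 101)]
```

The offset `q` flattens the head of the distribution; setting `q = 0` recovers plain Zipf’s law.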
Felipe Mata, José Luis García‐Dorado, Javier Aracil and Jorge E. López de Vergara
Abstract
Purpose
This study aims to assess whether similar user populations in the Internet produce similar geographical traffic destination patterns on a per‐country basis.
Design/methodology/approach
The authors collected a country‐wide NetFlow trace, which encompasses the whole Spanish academic network. Such a trace comprises several similar campus networks in terms of population size and structure. To compare their behaviors, the authors propose a mixture model, primarily based on the Zipf‐Mandelbrot power law, to capture the heavy‐tailed nature of the per‐country traffic distribution. Factor analysis is then performed to understand the relation between the response variable, the number of bytes or packets per day, and explanatory variables such as the source IP network, traffic direction and country.
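As a rough sketch of how such a heavy‐tailed fit can be obtained, the Zipf‐Mandelbrot exponent can be estimated by ordinary least squares on log‐transformed ranks. This is a simple stand‐in on hypothetical per‐country byte counts, not the authors’ mixture‐model procedure:

```python
import math

# Hypothetical per-country byte counts, sorted by rank (largest first).
bytes_per_country = [10_000_000, 3_200_000, 1_500_000, 900_000,
                     400_000, 150_000, 60_000, 25_000]

# Fit log f(r) = log C - s * log(r + q) for a fixed offset q by
# ordinary least squares on the log-log rank/volume pairs.
q = 1.0
xs = [math.log(r + q) for r in range(1, len(bytes_per_country) + 1)]
ys = [math.log(v) for v in bytes_per_country]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
s_hat = -slope  # estimated Zipf-Mandelbrot exponent
```

In practice the offset `q` would be fitted as well, e.g. by a grid search minimizing the residual sum of squares.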
Findings
Surprisingly, the results show that the geographical distribution is strongly dependent on the source IP network. Furthermore, even though there are thousands of users in a typical campus network, it turns out that the aggregation level which is required to observe a stable geographical pattern is even larger.
Practical implications
Based on these findings, conclusions drawn for one network cannot be directly extrapolated to different ones. Therefore, ISPs' traffic measurement campaigns should include an extensive set of networks to cope with the space diversity, and also encompass a significant period of time due to the large transient time.
Originality/value
Current state of the art includes some analysis of geographical patterns, but not comparisons between networks with similar populations. Such comparison can be useful for the design of content distribution networks and the cost‐optimization of peering agreements.
Abstract
Since 1960, and especially during the past three years, many papers have appeared about particular manifestations and applications of a certain class of empirical laws to a field that may be labelled conveniently ‘Bibliometrics’. This term, resuscitated by Alan Pritchard (see page 348), denotes, in my paraphrase, quantitative treatment of the properties of recorded discourse and behaviour appertaining to it.
Abstract
Purpose
Aims to review Fairthorne's classic article “Empirical hyperbolic distributions (Bradford‐Zipf‐Mandelbrot) for bibliometric description and prediction” (Journal of Documentation, Vol. 25, pp. 319‐343, 1969), as part of a series marking the Journal of Documentation's 60th anniversary.
Design/methodology/approach
Analysis of article content, qualitative evaluation of its subsequent impact, citation analysis, and diffusion analysis.
Findings
The content, further developments and influence on the field of informetrics of this landmark paper are explained.
Originality/value
A review is given of the contents of Fairthorne's original article and its influence on the field of informetrics. Its transdisciplinary reception is measured through a diffusion analysis.
Abstract
Purpose
Aims to build on the work of Buckland and Hindle regarding statistical distribution as applied to the field of bibliometrics, particularly the use of empirical laws.
Design/methodology/approach
Gives examples of hyperbolic distributions that have a bearing on the bibliometric application, and discusses the characteristics of hyperbolic distributions and the Bradford distribution.
Findings
Hyperbolic distributions are the inevitable result of combinatorial necessity and a tendency to short‐term rational behaviour.
Originality/value
Supports Bradford's conclusion from his law, i.e. that to know about one's speciality, one must go outside it.
P. Sastre‐Vazquez, J.L. Usó‐Domènech and J. Mateu
It is known that a mathematical ecological model and, in general, a particular methodology of modelling, can be considered a literary text written in a formal mathematical…
Abstract
It is known that a mathematical ecological model and, in general, a particular methodology of modelling, can be considered a literary text written in a formal mathematical language. In this context, stylometric mathematical laws such as Zipf’s (rank‐frequency and number‐frequency) can be applied to obtain information parameters at different semantic levels within the same model. Adapts several of these laws and introduces new elements (lexical units, operating units and separating units) to carry out several statistical analyses of two models, or texts. The estimated slopes of the regression equations obtained in the present work are compared with the results of previous papers in which Mandelbrot’s law was applied, and comparisons between them are shown.
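The rank‐frequency analysis described here amounts to counting unit occurrences and ranking them by frequency. A minimal sketch on a hypothetical token sequence (the paper applies this to the symbol sequences of ecological models, not to plain words):

```python
from collections import Counter

# A toy "text": in a Zipf-style rank-frequency analysis, each unit is
# counted and units are ranked by descending frequency.
text = "a b a c a b d a c a b e"
counts = Counter(text.split())
ranked = counts.most_common()  # [(unit, frequency), ...] by rank

# Rank-frequency pairs, ready for a log-log regression of
# log f against log r to estimate the Zipf slope.
pairs = [(rank, freq) for rank, (unit, freq) in enumerate(ranked, start=1)]
```

The regression slope on these pairs is what the paper compares across semantic levels and models.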
Abstract
There are two fundamental facts about programming languages: there are lots of them; all but a handful are never used beyond the immediate circle of friends of the inventor. An exhaustive survey of all languages used over the past twenty years in Western Europe and the US would be time‐consuming and of questionable utility; however, it seems safe to suggest that the number is considerably in excess of 1,000. Sammet's latest annual survey lists 132 languages currently in use in the United States, and this can only be a minor fraction of those that have been constructed at one time or another.
Nicolas Travers, Zeinab Hmedeh, Nelly Vouzoukidou, Cedric du Mouza, Vassilis Christophides and Michel Scholl
Abstract
Purpose
The purpose of this paper is to present a thorough analysis of three complementary features of real-scale really simple syndication (RSS)/Atom feeds, namely, publication activity, item characteristics and their textual vocabulary, which the authors believe are crucial for emerging Web 2.0 applications. Previous works on RSS/Atom statistical characteristics do not provide a precise and updated characterization of feeds’ behavior and content, a characterization that can be used to successfully benchmark the effectiveness and efficiency of various Web syndication processing/analysis techniques.
Design/methodology/approach
The authors’ empirical study relies on a large-scale testbed acquired over an eight-month campaign in 2010. They collected a total of 10,794,285 items originating from 8,155 productive feeds. The authors analyze in depth the feeds’ productivity (types and bandwidth), content (XML, text and duplicates) and textual content (vocabulary and buzz-words).
Findings
The findings of the study are as follows: 17 per cent of feeds produce 97 per cent of the items; the feeds’ publication rate is formally characterized using a modified power law; the most popular textual elements are the title and description, with an average size of 52 terms; cumulative item size follows a lognormal distribution, varying greatly with feed type; 47 per cent of the feed-published items share the same description; the vocabulary does not belong to WordNet terms (4 per cent); vocabulary growth is characterized using Heaps’ law and the number of term occurrences by a stretched exponential distribution; and the ranking of terms does not vary significantly for frequent terms.
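Heaps’ law, used above to characterize vocabulary growth, says the vocabulary size grows as V(n) = K·n^β in the number of tokens n. A minimal sketch of recovering the exponent from two measurement points; the checkpoint figures are hypothetical, not the paper’s:

```python
import math

# Two hypothetical corpus checkpoints: (tokens seen, distinct terms seen).
n1, v1 = 100_000, 12_600
n2, v2 = 10_000_000, 126_000

# Heaps' law V(n) = K * n^beta: beta is the log-log slope between
# the two checkpoints, and K follows from either point.
beta = math.log(v2 / v1) / math.log(n2 / n1)
k = v1 / n1 ** beta

def heaps_vocabulary(n_tokens):
    """Expected vocabulary size after n_tokens tokens."""
    return k * n_tokens ** beta
```

Typical natural-language corpora yield β between roughly 0.4 and 0.6; a fit over many checkpoints (rather than two) would be used in practice.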
Research limitations/implications
Modeling the capacity of dedicated Web applications, defining benchmarks and optimizing publish/subscribe index structures.
Practical implications
It especially opens many possibilities for tuning Web applications, such as: an RSS crawler designed with a resource allocator and a refreshing strategy, based on the Gini values and their evolution, to predict bursts for each feed according to its category and class; an indexing structure that matches textual item content, taking into account item size for targeted feeds, vocabulary size and term occurrences, vocabulary updates and the evolution of term ranks, and typo and misspelling correction; and filtering that prunes items whose content duplicates that of other feeds and correlates terms to easily detect replicates.
Originality/value
A content-oriented analysis of dynamic Web information.