Semiparametric tail-index estimation for randomly right-truncated heavy-tailed data

Saida Mancer (Universite Mohamed Khider de Biskra, Biskra, Algeria)

Abdelhakim Necir (Department of Mathematics, University of Biskra, Biskra, Algeria)

Souad Benchaira (Universite Mohamed Khider de Biskra, Biskra, Algeria)

Arab Journal of Mathematical Sciences

ISSN: 1319-5166

Article publication date: 27 June 2022

Issue publication date: 2 July 2024

Abstract

Purpose

The purpose of this paper is to propose a semiparametric estimator for the tail index of Pareto-type randomly truncated data that improves on the existing ones in terms of mean squared error. Moreover, we establish its consistency and asymptotic normality.

Design/methodology/approach

To construct a root mean squared error (RMSE)-reduced estimator of the tail index, the authors used the semiparametric estimator of the underlying distribution function given by Wang (1989). This allows us to define the corresponding tail process and provide a weak approximation to this one. By means of a functional representation of the given estimator of the tail index and by using this weak approximation, the authors establish the asymptotic normality of the aforementioned RMSE-reduced estimator.

Findings

On the basis of a semiparametric estimator of the underlying distribution function, the authors proposed a new estimation method for the tail index of Pareto-type distributions for randomly right-truncated data. Compared with the existing ones, this estimator behaves well both in terms of bias and RMSE. A useful weak approximation of the corresponding tail empirical process allowed us to establish both the consistency and asymptotic normality of the proposed estimator.

Originality/value

A new tail semiparametric (empirical) process for truncated data is introduced, a new estimator for the tail index of Pareto-type truncated data is introduced and asymptotic normality of the proposed estimator is established.

Citation

Mancer, S., Necir, A. and Benchaira, S. (2024), "Semiparametric tail-index estimation for randomly right-truncated heavy-tailed data", Arab Journal of Mathematical Sciences, Vol. 30 No. 2, pp. 171-196. https://doi.org/10.1108/AJMS-02-2022-0033

Publisher

Emerald Publishing Limited

License

Published in the Arab Journal of Mathematical Sciences. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) license. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this license may be seen at http://creativecommons.org/licences/by/4.0/legalcode

1. Introduction

Let $(\mathbf{X}_i, \mathbf{Y}_i)$, i = 1, …, N ≥ 1, be a sample from a couple $(\mathbf{X}, \mathbf{Y})$ of independent positive random variables (rv’s) defined over a probability space $(\Omega, \mathcal{A}, \mathbf{P})$, with continuous distribution functions (df’s) F and G, respectively. Suppose that $\mathbf{X}$ is right-truncated by $\mathbf{Y}$, in the sense that $\mathbf{X}_i$ is only observed when $\mathbf{X}_i \le \mathbf{Y}_i$. Let $(X_i, Y_i)$, i = 1, …, n, denote the observed data, as copies of a couple of dependent rv’s $(X, Y)$ corresponding to the truncated sample $(\mathbf{X}_i, \mathbf{Y}_i)$, i = 1, …, N, where n = n_N is a random sequence of discrete rv’s. By the weak law of large numbers, we have

$$\frac{n}{N} \xrightarrow{P} p := \mathbf{P}(\mathbf{X} \le \mathbf{Y}) = \int_0^\infty F(w)\,dG(w), \quad \text{as } N \to \infty, \tag{1.1}$$

where $\xrightarrow{P}$ stands for convergence in probability. The constant p is the probability that a pair is observed, which is assumed to be non-null; otherwise nothing is observed. Truncation frequently occurs in medical studies, when one wants to study the length of survival after the start of the disease: if Y denotes the elapsed time between the onset of the disease and death, and if the follow-up period starts X units of time after the onset of the disease then, clearly, X is right-truncated by Y. For concrete examples of truncated data in medical treatments we refer, among others, to Refs. [1, 2]. Truncated data schemes also occur in many other fields, namely actuarial sciences, astronomy, demography and epidemiology; see for instance the textbook [3].

From [4], the marginal df’s $F^*$ and $G^*$ corresponding to the joint df of $(X, Y)$ are given by

$$F^*(x) := p^{-1}\int_0^x \overline{G}(w)\,dF(w) \quad\text{and}\quad G^*(x) := p^{-1}\int_0^x F(w)\,dG(w).$$

From the first of these equations, we derive a representation of the underlying df F as follows:

$$F(x) = p\int_0^x \frac{dF^*(w)}{\overline{G}(w)}, \tag{1.2}$$

which will be of great interest in what follows. In the sequel, we deal with the concept of regular variation. A function φ is said to be regularly varying at infinity with negative index −1/η, denoted $\varphi \in RV_{(-1/\eta)}$, if

$$\frac{\varphi(st)}{\varphi(t)} \to s^{-1/\eta}, \quad \text{as } t \to \infty, \tag{1.3}$$

for s > 0. This relation is known as the first-order condition of regular variation and the corresponding uniform convergence is formulated in terms of “Potter’s inequalities” as follows: for any small ϵ > 0, there exists t₀ > 0 such that for any t ≥ t₀ and s ≥ 1, we have

$$(1-\epsilon)\,s^{-1/\eta-\epsilon} < \frac{\varphi(st)}{\varphi(t)} < (1+\epsilon)\,s^{-1/\eta+\epsilon}. \tag{1.4}$$

See for instance Proposition B.1.9 (assertion 5, page 367) in Ref. [5]. The second-order condition (see Ref. [6]) expresses the rate of the convergence in (1.3). For any x > 0, we have

$$\frac{\varphi(tx)/\varphi(t) - x^{-1/\eta}}{A(t)} \to x^{-1/\eta}\,\frac{x^{\tau/\eta}-1}{\tau\eta}, \quad \text{as } t \to \infty, \tag{1.5}$$

where τ < 0 denotes the second-order parameter and A is a function tending to zero, not changing sign near infinity, whose absolute value is regularly varying with negative index τ/η. A function φ that satisfies assumption (1.5) is denoted $\varphi \in RV_2(-1/\eta; \tau, A)$. We now have enough material to tackle the main goal of the paper. To begin, let us assume that the tails of both df’s F and G are regularly varying, that is,

$$\overline{F} \in RV_{(-1/\gamma_1)} \quad\text{and}\quad \overline{G} \in RV_{(-1/\gamma_2)}, \quad\text{with } \gamma_1, \gamma_2 > 0. \tag{1.6}$$

Under this assumption, [4] showed that

$$\overline{F}^* \in RV_{(-1/\gamma)} \quad\text{and}\quad \overline{G}^* \in RV_{(-1/\gamma_2)}, \tag{1.7}$$

where

$$\gamma := \frac{\gamma_1\gamma_2}{\gamma_1+\gamma_2}. \tag{1.8}$$

For further details on the proof of this statement we refer to Ref. [7] (Lemma A1). The estimation of the tail index γ₁ was addressed for the first time in Ref. [4], where the authors used equation (1.8) to propose an estimator of γ₁ as a ratio of Hill estimators [8] of the tail indices γ and γ₂. These estimators are based on the top order statistics $X_{n-k:n} \le \dots \le X_{n:n}$ and $Y_{n-k:n} \le \dots \le Y_{n:n}$ pertaining to the samples $(X_1, \dots, X_n)$ and $(Y_1, \dots, Y_n)$, respectively. The sample fraction k = k_n is a sequence of integers such that k_n → ∞ and k_n/n → 0 as n → ∞. The asymptotic normality of the given estimator is established in Ref. [9]. By using a Lynden-Bell integral, [10] proposed the following estimator for the tail index γ₁:

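$$\widehat{\gamma}_1^{(W)}(u) := \frac{1}{\overline{F}_n^{(1)}(u)}\sum_{i=1}^{n} \mathbf{1}(X_i > u)\,\frac{F_n^{(1)}(X_i)}{C_n(X_i)}\,\log\frac{X_i}{u},$$

for a given deterministic threshold u > 0, where

$$F_n^{(1)}(x) := \prod_{X_i > x}\left[1 - \frac{1}{nC_n(X_i)}\right],$$

is the popular nonparametric maximum likelihood estimator of the df F introduced in the well-known work [11], with

$$C_n(x) := \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}(X_i \le x \le Y_i).$$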

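Independently, [7] used a Woodroofe integral with a random threshold to derive the following estimator

$$\widehat{\gamma}_1^{(BMN)} := \frac{1}{\overline{F}_n^{(2)}(X_{n-k:n})}\sum_{i=1}^{k}\frac{F_n^{(2)}(X_{n-i+1:n})}{C_n(X_{n-i+1:n})}\,\log\frac{X_{n-i+1:n}}{X_{n-k:n}}, \tag{1.9}$$

where

$$F_n^{(2)}(x) := \prod_{X_i > x}\exp\left\{-\frac{1}{nC_n(X_i)}\right\},$$

is the so-called Woodroofe nonparametric estimator [12] of the df F. To improve the performance of $\widehat{\gamma}_1^{(BMN)}$, [13, 14] proposed, respectively, a kernel-smoothed and a reduced-bias version of this estimator and established their consistency and asymptotic normality. It is worth mentioning that the Lynden-Bell integral estimator $\widehat{\gamma}_1^{(W)}(u)$ with a random threshold $u = X_{n-k:n}$ becomes

$$\widehat{\gamma}_1^{(W)} := \frac{1}{\overline{F}_n^{(1)}(X_{n-k:n})}\sum_{i=1}^{k}\frac{F_n^{(1)}(X_{n-i+1:n})}{C_n(X_{n-i+1:n})}\,\log\frac{X_{n-i+1:n}}{X_{n-k:n}}. \tag{1.10}$$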

In a simulation study, [15] compared this estimator with γ^1BMN. They pointed out that both estimators have similar behaviors in terms of biases and mean squared errors.
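The two product-limit estimators of F used above differ only in how each factor is formed. The following is a minimal sketch of C_n, the Lynden-Bell estimator and the Woodroofe estimator, written directly from their definitions; the function names are illustrative and not taken from any package, and the O(n²) loops are kept for readability rather than speed.

```python
import math

def c_n(x, xs, ys):
    """C_n(x) = (1/n) * #{i : X_i <= x <= Y_i} for the observed pairs."""
    n = len(xs)
    return sum(1 for xi, yi in zip(xs, ys) if xi <= x <= yi) / n

def lynden_bell(x, xs, ys):
    """Lynden-Bell estimator: product over X_i > x of (1 - 1/(n C_n(X_i)))."""
    n = len(xs)
    prod = 1.0
    for xi in xs:
        if xi > x:
            # n * C_n(X_i) >= 1 because the pair (X_i, Y_i) always counts itself
            prod *= 1.0 - 1.0 / (n * c_n(xi, xs, ys))
    return prod

def woodroofe(x, xs, ys):
    """Woodroofe estimator: product over X_i > x of exp(-1/(n C_n(X_i)))."""
    n = len(xs)
    total = sum(1.0 / (n * c_n(xi, xs, ys)) for xi in xs if xi > x)
    return math.exp(-total)
```

Since exp(−u) ≥ 1 − u, the Woodroofe estimate is never below the Lynden-Bell one at the same point.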

Recall that the nonparametric Lynden-Bell estimator $F_n^{(1)}$ was constructed for the situation where F and G are both unknown. In this paper, we deal with the situation where F is unknown but G is parametrized by a known model $\{G_\theta, \theta \in \Theta \subset \mathbb{R}^d\}$, d ≥ 1, having a density $g_\theta$ with respect to Lebesgue measure. [2] considered this assumption and introduced a semiparametric estimator for the df F defined by

$$F_n(x;\widehat{\theta}_n) := P_n(\widehat{\theta}_n)\,\frac{1}{n}\sum_{i=1}^{n}\frac{\mathbf{1}(X_i \le x)}{\overline{G}_{\widehat{\theta}_n}(X_i)}, \tag{1.11}$$

where $1/P_n(\widehat{\theta}_n) := n^{-1}\sum_{i=1}^{n} 1/\overline{G}_{\widehat{\theta}_n}(X_i)$ and

$$\widehat{\theta}_n := \arg\max_{\theta \in \Theta}\prod_{i=1}^{n}\frac{g_\theta(Y_i)}{\overline{G}_\theta(X_i)}, \tag{1.12}$$

denotes the conditional maximum likelihood estimator (CMLE) of θ, which is consistent and asymptotically normal; see for instance Ref. [16]. On the other hand, [2] showed that $F_n(x;\widehat{\theta}_n)$ is a uniformly consistent estimator over the x-axis and established, under suitable regularity assumptions, its asymptotic normality. [2, 17] pointed out that the semiparametric estimate has greater efficiency uniformly over the x-axis. In the light of a simulation study, the authors suggest that the semiparametric estimate is a better choice when parametric information on the truncation distribution is available. Since the introduction of this estimation method, many papers have been devoted to statistical inference with truncated data; see for instance Refs. [18–22] and [23].
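As an illustration of (1.11), the estimator is a weighted empirical df in which each observation X_i receives weight proportional to 1/Ḡ(X_i). The sketch below assumes the truncation tail Ḡ has already been fitted (θ̂_n is treated as given, the CMLE step is not shown); `frechet_tail` and `semiparametric_df` are hypothetical helper names, and the Fréchet form of Ḡ is an assumption chosen only for the example.

```python
import math

def frechet_tail(x, gamma2):
    """Right tail 1 - exp(-x^(-1/gamma2)) of a Frechet(gamma2) df (assumed model for G)."""
    return -math.expm1(-x ** (-1.0 / gamma2))

def semiparametric_df(x, xs, g_bar):
    """F_n(x; theta_hat) of (1.11): normalised sum of 1/G_bar(X_i) over X_i <= x.

    g_bar is the fitted truncation tail, i.e. a callable w -> G_bar_theta_hat(w).
    Multiplying by P_n(theta_hat) is the same as dividing by the total weight.
    """
    weights = [1.0 / g_bar(xi) for xi in xs]
    total = sum(weights)
    return sum(w for xi, w in zip(xs, weights) if xi <= x) / total
```

For instance, `semiparametric_df(2.0, xs, lambda w: frechet_tail(w, 1.4))` evaluates the estimate at x = 2 under a Fréchet(1.4) truncation model.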

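Motivated by the features of semiparametric estimation, we next propose a new estimator for γ₁ by means of a suitable functional of $F_n(x;\widehat{\theta}_n)$. We start our construction by noting that, from Theorem 1.2.2 in Ref. [5], the first-order condition (1.6) (for F) implies that

$$\lim_{t\to\infty}\frac{1}{\overline{F}(t)}\int_t^\infty \log(x/t)\,dF(x) = \gamma_1. \tag{1.13}$$

In other words, γ₁ may be viewed as a functional $\psi_t(F)$, for large t, where

$$\psi_t(F) := \frac{1}{\overline{F}(t)}\int_t^\infty \log(x/t)\,dF(x).$$

Replacing F by $F_n(\cdot;\widehat{\theta}_n)$ and letting $t = X_{n-k:n}$ yields

$$\widehat{\gamma}_1 = \psi_{X_{n-k:n}}\big(F_n(\cdot;\widehat{\theta}_n)\big) = \frac{1}{\overline{F}_n(X_{n-k:n};\widehat{\theta}_n)}\int_{X_{n-k:n}}^\infty \log(x/X_{n-k:n})\,dF_n(x;\widehat{\theta}_n), \tag{1.14}$$

as a new estimator for γ₁. Observe that

$$\int_{X_{n-k:n}}^\infty \log(x/X_{n-k:n})\,dF_n(x;\widehat{\theta}_n) = \int_0^\infty \log(x/X_{n-k:n})\,\mathbf{1}(x \ge X_{n-k:n})\,dF_n(x;\widehat{\theta}_n),$$

which may be rewritten into

$$P_n(\widehat{\theta}_n)\,\frac{1}{n}\sum_{i=1}^{n}\int_0^\infty \frac{\log(x/X_{n-k:n})\,\mathbf{1}(x \ge X_{n-k:n})}{\overline{G}_{\widehat{\theta}_n}(X_i)}\,d\mathbf{1}(X_i \le x) = P_n(\widehat{\theta}_n)\,\frac{1}{n}\sum_{i=1}^{k}\frac{\log(X_{n-i+1:n}/X_{n-k:n})}{\overline{G}_{\widehat{\theta}_n}(X_{n-i+1:n})}.$$

On the other hand, $F_n(X_{n-k:n};\widehat{\theta}_n)$ equals

$$P_n(\widehat{\theta}_n)\,\frac{1}{n}\sum_{i=1}^{n}\frac{\mathbf{1}(X_{i:n} \le X_{n-k:n})}{\overline{G}_{\widehat{\theta}_n}(X_{i:n})} = P_n(\widehat{\theta}_n)\,\frac{1}{n}\sum_{i=1}^{n-k}\frac{1}{\overline{G}_{\widehat{\theta}_n}(X_{i:n})}.$$

Hence,

$$\overline{F}_n(X_{n-k:n};\widehat{\theta}_n) = \frac{\frac{1}{n}\sum_{i=1}^{n} 1/\overline{G}_{\widehat{\theta}_n}(X_{i:n}) - \frac{1}{n}\sum_{i=1}^{n-k} 1/\overline{G}_{\widehat{\theta}_n}(X_{i:n})}{\frac{1}{n}\sum_{i=1}^{n} 1/\overline{G}_{\widehat{\theta}_n}(X_{i:n})} = P_n(\widehat{\theta}_n)\,\frac{1}{n}\sum_{i=1}^{k}\frac{1}{\overline{G}_{\widehat{\theta}_n}(X_{n-i+1:n})}.$$

Thereby, the form of our new estimator is

$$\widehat{\gamma}_1 = \frac{\sum_{i=1}^{k}\big[\overline{G}_{\widehat{\theta}_n}(X_{n-i+1:n})\big]^{-1}\log(X_{n-i+1:n}/X_{n-k:n})}{\sum_{i=1}^{k}\big[\overline{G}_{\widehat{\theta}_n}(X_{n-i+1:n})\big]^{-1}}. \tag{1.15}$$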
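Formula (1.15) is a weighted average of log-spacings over the k largest observations, with weights 1/Ḡ. A minimal sketch follows, assuming the fitted truncation tail is passed in as a callable `g_bar` (a hypothetical argument name) and that all observations are positive; when `g_bar` is constant the weights cancel and the expression reduces to Hill's estimator.

```python
import math

def tail_index_semiparametric(xs, k, g_bar):
    """Estimator (1.15) computed from the k upper order statistics of xs.

    g_bar: callable returning the fitted truncation tail G_bar_theta_hat(x).
    """
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    order = sorted(xs)
    threshold = order[n - k - 1]   # X_{n-k:n}
    top = order[n - k:]            # X_{n-k+1:n}, ..., X_{n:n}
    weights = [1.0 / g_bar(x) for x in top]
    numerator = sum(w * math.log(x / threshold) for w, x in zip(weights, top))
    return numerator / sum(weights)
```

Passing `g_bar=lambda x: 1.0` gives the unweighted (Hill) version, which is a convenient sanity check.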

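The asymptotic behavior of $\widehat{\gamma}_1$ will be established by means of the following tail empirical process

$$D_n(x;\widehat{\theta}_n;\gamma_1) := \sqrt{k}\left(\frac{\overline{F}_n(xX_{n-k:n};\widehat{\theta}_n)}{\overline{F}_n(X_{n-k:n};\widehat{\theta}_n)} - x^{-1/\gamma_1}\right), \quad\text{for } x > 1.$$

This method was already used to establish the asymptotic behavior of Hill’s estimator for complete data (Ref. [5], page 162), and we adapt it to the truncation case. Indeed, by using an integration by parts and a change of variables in the integral (1.14), one gets

$$\widehat{\gamma}_1 = \int_1^\infty x^{-1}\,\frac{\overline{F}_n(xX_{n-k:n};\widehat{\theta}_n)}{\overline{F}_n(X_{n-k:n};\widehat{\theta}_n)}\,dx,$$

and therefore

$$\sqrt{k}\,(\widehat{\gamma}_1 - \gamma_1) = \int_1^\infty x^{-1} D_n(x;\widehat{\theta}_n;\gamma_1)\,dx. \tag{1.16}$$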

Thus, given a suitable weighted weak approximation to $D_n(\cdot;\widehat{\theta}_n;\gamma_1)$, we may easily deduce the consistency and asymptotic normality of $\widehat{\gamma}_1$. This process may also contribute to goodness-of-fit testing for heavy-tailed distributions via, among others, the Kolmogorov–Smirnov and Cramér–von Mises type statistics

$$\sup_{x>1}\big|D_n(x;\widehat{\theta}_n,\widehat{\gamma}_1)\big| \quad\text{and}\quad \int_1^\infty D_n^2(x;\widehat{\theta}_n,\widehat{\gamma}_1)\,dx^{-1/\widehat{\gamma}_1}.$$

More precisely, these statistics are used when testing the null hypothesis H₀: “both F and G are heavy-tailed” versus the alternative H₁: “at least one of F and G is not heavy-tailed”, that is H₀: “(1.6) holds” versus H₁: “(1.6) does not hold”. This problem has already been addressed by Refs. [24, 25] in the case of complete data. The (uniform) weighted weak convergence of $D_n(x;\widehat{\theta}_n,\gamma_1)$ and the asymptotic normality of $\widehat{\gamma}_1$, stated below, will be of great interest for establishing the limit distributions of the aforementioned test statistics. This is beyond the scope of this paper, whose remainder is structured as follows. In Section 2, we present our main results, namely the consistency and asymptotic normality of the estimator $\widehat{\gamma}_1$. The performance of the proposed estimator is checked by simulation in Section 3. An application to a real dataset composed of induction times of AIDS is given in Section 4. The proofs are gathered in Section 5. A useful lemma and its proof are postponed to the Appendix.
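The integration by parts and change of variables leading from (1.14) to (1.16) can be spelled out as follows. This is a sketch that writes t for X_{n−k:n} and F̄_n(·) for F̄_n(·; θ̂_n), and uses the fact that F̄_n vanishes beyond the largest observation, so the boundary term is zero.

```latex
\begin{align*}
\int_t^\infty \log(x/t)\,dF_n(x)
  &= -\int_t^\infty \log(x/t)\,d\overline{F}_n(x)
   = \Big[-\log(x/t)\,\overline{F}_n(x)\Big]_{x=t}^{\infty}
     + \int_t^\infty x^{-1}\,\overline{F}_n(x)\,dx \\
  &= \int_1^\infty u^{-1}\,\overline{F}_n(tu)\,du \qquad (x = tu), \\[4pt]
\widehat{\gamma}_1
  &= \int_1^\infty u^{-1}\,\frac{\overline{F}_n(tu)}{\overline{F}_n(t)}\,du,
  \qquad
  \gamma_1 = \int_1^\infty u^{-1}\,u^{-1/\gamma_1}\,du, \\[4pt]
\sqrt{k}\,(\widehat{\gamma}_1-\gamma_1)
  &= \int_1^\infty u^{-1}\,\sqrt{k}\left(\frac{\overline{F}_n(tu)}{\overline{F}_n(t)} - u^{-1/\gamma_1}\right)du .
\end{align*}
```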

2. Main results

The regularity assumptions, denoted A0, concerning the existence, consistency and asymptotic normality of the CMLE $\widehat{\theta}_n$, given in (1.12), are discussed in Ref. [16]. Here, we only state additional conditions on the df $G_\theta$ corresponding to Pareto-type models, which are required to establish the asymptotic behavior of our new estimator $\widehat{\gamma}_1$.

A1 For each fixed y, the function $\theta \to G_\theta(y)$ is continuously differentiable, with partial derivatives $G_\theta^{(j)} := \partial G_\theta/\partial\theta_j$, j = 1, …, d.
A2 $\overline{G}_\theta^{(j)} \in RV_{(-1/\gamma_2)}$.
A3 $y^{-\epsilon}\,\overline{G}_\theta^{(j)}(y)/\overline{G}_\theta(y) \to 0$, as y → ∞, for any ϵ > 0.

For common Pareto-type models, one may easily check that there exist some constants a_j ≥ 0, c_j and d_j, such that G¯θjy∼cjy−1/γ2+djlog⁡y, for all large x. Then one may consider that the assumptions A1−A3 are not very restrictive and they may be acceptable in the extreme value theory.

Theorem 2.1.

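Assume that $\overline{F} \in RV_2(-1/\gamma_1; \rho_1, A)$ and $\overline{G}_\theta \in RV_{(-1/\gamma_2)}$ satisfy the assumptions A0–A3, and suppose that γ₁ < γ₂. Then on the probability space $(\Omega, \mathcal{A}, \mathbf{P})$, there exists a standard Wiener process $\{W(s), 0 \le s \le 1\}$ such that, for any small 0 < ϵ < 1/2, we have

$$\sup_{x>1} x^{\epsilon}\left|D_n(x;\widehat{\theta}_n,\gamma_1) - \Gamma(x;W) - x^{-1/\gamma_1}\,\frac{x^{\rho_1/\gamma_1}-1}{\rho_1\gamma_1}\,\sqrt{k}A(a_k)\right| \xrightarrow{P} 0,$$

provided that $\sqrt{k}A(a_k) = O(1)$, where

$$\Gamma(x;W) := \frac{\gamma}{\gamma_1}\,x^{-1/\gamma_1}\left\{x^{1/\gamma}W(x^{-1/\gamma}) - W(1)\right\} + \frac{\gamma}{\gamma_1+\gamma_2}\,x^{-1/\gamma_1}\int_0^1 s^{-\gamma/\gamma_2-1}\left\{x^{1/\gamma}W(x^{-1/\gamma}s) - W(s)\right\}ds,$$

is a centered Gaussian process and $a_k := F^{*\leftarrow}(1-k/n)$, where

$$F^{*\leftarrow}(s) := \inf\{x : F^*(x) \ge s\}, \quad 0 < s < 1,$$

denotes the quantile (or generalized inverse) function pertaining to the df $F^*$.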

Applying this weak approximation, we establish both consistency and asymptotic normality of our new estimator γ^1, that we state in the following Theorem.

Theorem 2.2.

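Under the assumptions of Theorem 2.1, we have

$$\widehat{\gamma}_1 - \gamma_1 = k^{-1/2}\int_1^\infty x^{-1}\,\Gamma(x;W)\,dx + A(a_k)\int_1^\infty x^{-1/\gamma_1-1}\,\frac{x^{\rho_1/\gamma_1}-1}{\rho_1\gamma_1}\,dx + o_P(k^{-1/2}),$$

which implies that $\widehat{\gamma}_1 \xrightarrow{P} \gamma_1$. Whenever $\sqrt{k}A(a_k) \to \lambda < \infty$, we get

$$\sqrt{k}\,(\widehat{\gamma}_1 - \gamma_1) \xrightarrow{D} N\left(\frac{\lambda}{1-\rho_1},\,\sigma^2\right),$$

where $\sigma^2 := \gamma^2\,\dfrac{(1+\gamma_1/\gamma_2)\,(1+(\gamma_1/\gamma_2)^2)}{(1-\gamma_1/\gamma_2)^3}$, and $\mathbf{1}_A$ stands for the indicator function pertaining to a set A.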
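To make the limiting law in Theorem 2.2 concrete, the asymptotic variance σ² and the asymptotic bias λ/(1 − ρ₁) can be evaluated directly. The sketch below uses hypothetical function names; the plug-in interval at the end additionally assumes λ = 0 and treats γ₂ as known from the fitted truncation model, which is an illustration and not a procedure taken from the paper.

```python
from statistics import NormalDist

def asymptotic_variance(gamma1, gamma2):
    """sigma^2 of Theorem 2.2; requires 0 < gamma1 < gamma2."""
    if not 0.0 < gamma1 < gamma2:
        raise ValueError("need 0 < gamma1 < gamma2")
    gamma = gamma1 * gamma2 / (gamma1 + gamma2)   # equation (1.8)
    r = gamma1 / gamma2
    return gamma ** 2 * (1.0 + r) * (1.0 + r ** 2) / (1.0 - r) ** 3

def asymptotic_bias(lam, rho1):
    """Mean lambda / (1 - rho1) of the limiting normal law (rho1 < 0)."""
    return lam / (1.0 - rho1)

def normal_interval(gamma1_hat, gamma2, k, level=0.95):
    """Plug-in interval gamma1_hat +/- z * sigma / sqrt(k), assuming lambda = 0."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half = z * (asymptotic_variance(gamma1_hat, gamma2) / k) ** 0.5
    return gamma1_hat - half, gamma1_hat + half
```

The variance grows quickly as γ₁/γ₂ approaches 1, which is one way to read the requirement γ₁ < γ₂.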

3. Simulation study

In this section, we perform a simulation study in order to compare the finite-sample behavior of our new semiparametric estimator $\widehat{\gamma}_1$, given in (1.15), with the Woodroofe and Lynden-Bell integral estimators $\widehat{\gamma}_1^{(BMN)}$ and $\widehat{\gamma}_1^{(W)}$, given respectively in (1.9) and (1.10). The truncated and truncation distribution functions F and G are chosen among the following two models:

Burr (γ, δ) distribution with right-tail function:

$$\overline{H}(x) = \left(1 + x^{1/\delta}\right)^{-\delta/\gamma}, \quad x \ge 0,\ \delta > 0,\ \gamma > 0;$$

Fréchet (γ) distribution with right-tail function:

$$\overline{H}(x) = 1 - \exp\left(-x^{-1/\gamma}\right), \quad x > 0,\ \gamma > 0.$$

The simulation study covers four scenarios, according to the choice of the underlying df’s F and $G_\theta$:

S1 Burr (γ₁, δ) truncated by Burr (γ₂, δ), with θ = (γ₂, δ)
S2 Fréchet (γ₁) truncated by Fréchet (γ₂), with θ = γ₂
S3 Fréchet (γ₁) truncated by Burr (γ₂, δ), with θ = (γ₂, δ)
S4 Burr (γ₁, δ) truncated by Fréchet (γ₂), with θ = γ₂
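A randomly right-truncated sample for scenario S1 can be generated by drawing independent pairs and keeping only those with X ≤ Y. The sketch below inverts the Burr tail function given above; the function names and the seed handling are illustrative assumptions, and the inverse is written as u^(−γ)(1 − u^(γ/δ))^δ, which is algebraically equal to (u^(−γ/δ) − 1)^δ but avoids floating-point overflow for very small u.

```python
import random

def burr_sample(gamma, delta, rng):
    """Draw from the Burr(gamma, delta) model by inverting H_bar(x) = (1 + x^(1/delta))^(-delta/gamma)."""
    u = 1.0 - rng.random()   # uniform on (0, 1]
    return u ** (-gamma) * (1.0 - u ** (gamma / delta)) ** delta

def simulate_truncated(N, gamma1, gamma2, delta, seed=0):
    """Simulate N independent pairs and keep the observed ones (X <= Y), as in scenario S1."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(N):
        x = burr_sample(gamma1, delta, rng)
        y = burr_sample(gamma2, delta, rng)
        if x <= y:               # the pair is observed only under right truncation X <= Y
            xs.append(x)
            ys.append(y)
    return xs, ys
```

The ratio `len(xs) / N` is the empirical counterpart of the proportion p in (1.1).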

To this end, we fix δ = 1/4 and choose the values 0.6 and 0.8 for γ₁ and 55% and 90% for the proportion p of observed data given in (1.1), so that the assumption γ₁ < γ₂ stated in Theorem 2.1 holds. In other words, the values of p have to be greater than 50%. For each couple (γ₁, p), we solve equation (1.1) to get the corresponding γ₂-value, which we summarize as follows:

$$(p, \gamma_1, \gamma_2) = (55\%, 0.6, 1.4),\ (90\%, 0.6, 5.4),\ (55\%, 0.8, 1.9),\ (90\%, 0.8, 7.2). \tag{3.17}$$

For each scenario, we simulate 1000 random samples of size N = 300 and compute the root mean squared error (RMSE) and the absolute bias (ABIAS) corresponding to each of the estimators $\widehat{\gamma}_1$, $\widehat{\gamma}_1^{(BMN)}$ and $\widehat{\gamma}_1^{(W)}$. The comparison is done by plotting the ABIAS and RMSE as functions of the sample fraction k, which varies from 2 to 120. This range is chosen so that it contains the optimal number of upper extremes k* used in the computation of the tail index estimate. There are many heuristic methods to select k*, see for instance Ref. [26]; here we use the algorithm proposed by Ref. [27] on page 137, which is incorporated in the R software “Xtremes” package. Note that the CMLE of θ is computed by means of the function “maxLik” of the maxLik R package. The optimal sample fraction k* is defined, in this procedure, by

$$k^* := \arg\min_{1<k<n}\frac{1}{k}\sum_{i=1}^{k} i^{\omega}\left|\widehat{\gamma}(i) - \operatorname{median}\{\widehat{\gamma}(1), \dots, \widehat{\gamma}(k)\}\right|,$$

for a suitable constant 0 ≤ ω ≤ 1/2, where $\widehat{\gamma}(i)$ denotes an estimator of the tail index γ of a Pareto-type model, based on the i upper order statistics. We observed, in our simulation study, that ω = 0.3 gives better results both in terms of bias and RMSE. It is worth mentioning that making N vary did not provide notable findings; therefore, we kept the size N fixed. The finite-sample behaviors of the above-mentioned estimators are illustrated in Figures 1–8. The overall conclusion is that the biases of the three estimators are almost equal; however, in the case of medium truncation (p ≈ 50%), the RMSE of our new semiparametric estimator $\widehat{\gamma}_1$ is clearly the smallest compared with those of $\widehat{\gamma}_1^{(BMN)}$ and $\widehat{\gamma}_1^{(W)}$. Actually, the medium truncation situation is the one most frequently encountered in real data, while the strong truncation case (p ≫ 50%) remains, to our knowledge, theoretical. In this sense, we may consider that the semiparametric estimator is more efficient than the two other ones. We point out that the two estimators $\widehat{\gamma}_1^{(BMN)}$ and $\widehat{\gamma}_1^{(W)}$ have almost the same behavior, which was noticed before by Ref. [15]. The optimal sample fractions and estimated values of the tail index obtained through the three estimators are given in Tables 1–4.
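The selection rule for k* above can be sketched in a few lines. This is an illustrative implementation written from the displayed formula, not the code of the “Xtremes” package; the function name and the convention that `estimates[i - 1]` holds the estimate based on the i upper order statistics are assumptions made for the example.

```python
from statistics import median

def select_k(estimates, omega=0.3):
    """Return the k (2 <= k <= len(estimates)) minimising
    (1/k) * sum_{i<=k} i^omega * |estimates[i-1] - median(estimates[:k])|.
    """
    best_k, best_value = None, float("inf")
    for k in range(2, len(estimates) + 1):
        head = estimates[:k]
        med = median(head)
        value = sum(i ** omega * abs(g - med) for i, g in enumerate(head, start=1)) / k
        if value < best_value:       # strict inequality keeps the smallest minimiser
            best_k, best_value = k, value
    return best_k
```

The weight i^ω penalises deviations at larger i more heavily, so a larger ω tends to favour smaller k.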

4. Real data example

In this section, we give an application to the AIDS data set, available in the “DTDA” R package and in the textbook [28] (page 19), and already used by Ref. [1]. The data present the infection and induction times for n = 258 adults who were infected with the HIV virus and developed AIDS by June 30, 1986. The variable of interest here is the induction time T, that is, the time elapsed between the date of infection M and the date M + T of the declaration of the disease. The sample (T₁, M₁), …, (T_n, M_n) is taken between two fixed dates, “0” and “8”, i.e. between April 1, 1978, and June 30, 1986. The initial date “0” denotes an infection occurring in the three months from April 1, 1978, to June 30, 1978. Let us assume that M and T are the observed rv’s, corresponding to the underlying rv’s $\mathbf{M}$ and $\mathbf{T}$, given by the truncation scheme $0 \le \mathbf{M} + \mathbf{T} \le 8$, which in turn may be rewritten into

$$0 \le \mathbf{M} \le \mathbf{S}, \tag{4.18}$$

where $\mathbf{S} := 8 - \mathbf{T}$. To work within the framework of the present paper, let us make the following transformations:

$$\mathbf{X} := \frac{1}{\mathbf{S}+\epsilon} \quad\text{and}\quad \mathbf{Y} := \frac{1}{\mathbf{M}+\epsilon}, \tag{4.19}$$

where ϵ = 0.05, so that the two denominators are non-zero. Thus, in view of (4.18), we have $\mathbf{X} \le \mathbf{Y}$, which means that $\mathbf{X}$ is randomly right-truncated by $\mathbf{Y}$. Thereby, for the given sample (T₁, M₁), …, (T_n, M_n) from (T, M), the previous transformations produce a new one, (X₁, Y₁), …, (X_n, Y_n), from (X, Y).
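The change of variables (4.19) is straightforward to apply to the recorded induction and infection times. A minimal sketch, assuming times are expressed in years on the “0” to “8” scale described above; the function name and the `horizon` argument are hypothetical.

```python
def to_truncation_pairs(ts, ms, horizon=8.0, eps=0.05):
    """Map (T_i, M_i) to (X_i, Y_i) = (1/(S_i + eps), 1/(M_i + eps)) with S_i = horizon - T_i."""
    pairs = []
    for t, m in zip(ts, ms):
        s = horizon - t
        pairs.append((1.0 / (s + eps), 1.0 / (m + eps)))
    return pairs
```

Because x ↦ 1/(x + ϵ) is decreasing, the constraint M ≤ S from (4.18) becomes X ≤ Y, so every transformed pair satisfies the right-truncation condition.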

Let us now denote by F and G the df’s of the underlying rv’s $\mathbf{X}$ and $\mathbf{Y}$ corresponding to the truncated rv’s X and Y, respectively. By using parametric likelihood methods, [29] fits both df’s of M and S by the two-parameter Weibull model; this implies that F and G may be fitted by the two-parameter Fréchet model, namely $H_{a,r}(x) = \exp(-a^r x^{-r})$, x > 0, a > 0, r > 0, hence both F and G are heavy-tailed. The estimated parameters corresponding to the fitting of the df G are a₀ = 0.004 and r₀ = 2.1; see also [1], page 520. Thus, one may consider that the df G is known and equals $G_\theta = H_{a_0,r_0}$, where θ = (a₀, r₀). By using the Thomas and Reiss algorithm, given above, we compute the optimal sample fraction k* corresponding to the estimator $\widehat{\gamma}_1$ of the tail index γ₁ of the df F. We find

$$k^* = 19, \quad X_{n-k^*:n} = 0.356 \quad\text{and}\quad \widehat{\gamma}_1 = 0.917. \tag{4.20}$$

The well-known Weissman estimator [30] of the high quantile $q_v := F^{-1}(1-v_n)$ corresponding to the underlying df F is given by

$$\widehat{q}_v := X_{n-k:n}\left(\frac{v}{\overline{F}_n(X_{n-k:n})}\right)^{-\widehat{\gamma}_1},$$

where $v = v_n = 1/(2n)$ and $F_n$ is the semiparametric estimator of the df F given in (1.11). From the values in (4.20), we get $\widehat{q}_v = 0.061$. Let us now compute the high quantile of T based on the original data T₁, …, T_n. Recall that $\mathbf{P}(\mathbf{X} \ge q_v) = v$ and $\mathbf{X} = 1/(8 - \mathbf{T} + \epsilon)$; this implies that $\mathbf{P}(\mathbf{T} \ge 1/q_v - 8 + \epsilon) = v$, which means that $1/q_v - 8 + \epsilon$ is the high quantile of T, corresponding to the end time $t_{end}$ that we want to estimate. Thereby $\widehat{t}_{end} = 1/\widehat{q}_v - 8 + 10^{-2} = 1/0.061 - 8 + 10^{-2} = 8.40$; that is, the estimated end time of induction of AIDS is 8 years, 4 months and 24 days.

5. Proofs

5.1 Proof of Theorem 2.1

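Let us first notice that the semiparametric estimator of the df F given in (1.11) may be rewritten into

$$F_n(x;\widehat{\theta}_n) = P_n(\widehat{\theta}_n)\int_0^x \frac{dF_n^*(w)}{\overline{G}_{\widehat{\theta}_n}(w)}, \tag{5.21}$$

and $1/P_n(\widehat{\theta}_n) = \int_0^\infty dF_n^*(w)/\overline{G}_{\widehat{\theta}_n}(w)$, where $F_n^*(w) := n^{-1}\sum_{i=1}^{n}\mathbf{1}(X_i \le w)$ denotes the usual empirical df pertaining to the observed sample X₁, …, X_n. It is worth mentioning that, by the strong law of large numbers, $P_n(\widehat{\theta}_n) \to P(\theta)$ (almost surely) as n → ∞, where $P(\theta) = 1/\int_0^\infty dF^*(w)/\overline{G}_\theta(w)$ (see e.g. Lemma 3.2 in Ref. [2]). On the other hand, from equation (1.2) we deduce that $p = 1/\int_0^\infty dF^*(w)/\overline{G}(w)$; it follows that $p \equiv P(\theta)$ because we already assumed that $G \equiv G_\theta$. Next we use the distribution tail

$$\overline{F}(x) = P(\theta)\int_x^\infty \frac{dF^*(w)}{\overline{G}_\theta(w)}, \tag{5.22}$$

and its empirical counterpart

$$\overline{F}_n(x;\widehat{\theta}_n) = P_n(\widehat{\theta}_n)\int_x^\infty \frac{dF_n^*(w)}{\overline{G}_{\widehat{\theta}_n}(w)}.$$

We begin by decomposing $k^{-1/2}D_n(x;\widehat{\theta}_n)$, for x > 1, into the sum of

$$M_{n1}(x) := x^{-1/\gamma_1}\,\frac{\overline{F}_n(xX_{n-k:n};\widehat{\theta}_n) - \overline{F}_n(xX_{n-k:n};\theta)}{\overline{F}(xX_{n-k:n})},$$

$$M_{n2}(x) := x^{-1/\gamma_1}\,\frac{\overline{F}_n(xX_{n-k:n};\theta) - \overline{F}(xX_{n-k:n})}{\overline{F}(xX_{n-k:n})},$$

$$M_{n3}(x) := -\frac{\overline{F}(xX_{n-k:n})}{\overline{F}_n(X_{n-k:n};\theta)}\,\frac{\overline{F}_n(X_{n-k:n};\theta) - \overline{F}(X_{n-k:n})}{\overline{F}(X_{n-k:n})},$$

$$M_{n4}(x) := \left\{\frac{\overline{F}(xX_{n-k:n})}{\overline{F}_n(X_{n-k:n};\theta)} - x^{-1/\gamma_1}\right\}\frac{\overline{F}_n(xX_{n-k:n};\theta) - \overline{F}(xX_{n-k:n})}{\overline{F}(xX_{n-k:n})}$$

and

$$M_{n5}(x) := \frac{\overline{F}(xX_{n-k:n})}{\overline{F}(X_{n-k:n})} - x^{-1/\gamma_1}.$$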

Our goal is to provide a weighted weak approximation to the tail empirical process $D_n(x;\widehat{\theta}_n;\gamma_1)$. Let $\xi_i := \overline{F}^*(X_i)$, i = 1, …, n, be a sequence of independent and identically distributed rv’s. Recall that both df’s F and $G_\theta$ are assumed to be continuous; this implies that $F^*$ is continuous as well, therefore $\mathbf{P}(\xi_i \le u) = u$, which means that $(\xi_i)_{i=1,\dots,n}$ are uniformly distributed on (0, 1). Let us now define the corresponding uniform tail empirical process

$$\alpha_n(s) := \sqrt{k}\,\big(U_n(s) - s\big), \quad\text{for } 0 \le s \le 1, \tag{5.23}$$

where

$$U_n(s) := k^{-1}\sum_{i=1}^{n}\mathbf{1}(\xi_i < ks/n), \tag{5.24}$$

denotes the tail empirical df pertaining to the sample $(\xi_i)_{i=1,\dots,n}$. In view of Proposition 3.1 of [31], there exists a Wiener process W such that for every 0 ≤ ϵ < 1/2,

$$\sup_{0 \le s < 1} s^{-\epsilon}\,\big|\alpha_n(s) - W(s)\big| \xrightarrow{P} 0, \quad\text{as } n \to \infty. \tag{5.25}$$

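Let us fix a sufficiently small 0 < ϵ < 1/2. We will successively show that, under the first-order conditions of regular variation (1.6), we have, uniformly in x ≥ 1, for all large n:

$$\sqrt{k}M_{n2}(x) = \frac{\gamma}{\gamma_1}\,x^{1/\gamma_2}\,W(x^{-1/\gamma}) + \frac{\gamma}{\gamma_1}\int_{x^{1/\gamma_2}}^\infty W(t^{-\gamma_2/\gamma})\,dt + o_P\left(x^{\frac{1}{2}\left(\frac{1}{\gamma_2}-\frac{1}{\gamma_1}\right)+\epsilon}\right) \tag{5.26}$$

and

$$\sqrt{k}M_{n3}(x) = -x^{-1/\gamma_1}\left\{\frac{\gamma}{\gamma_1}W(1) + \frac{\gamma}{\gamma_1}\int_1^\infty W(t^{-\gamma_2/\gamma})\,dt\right\} + o_P\left(x^{-1/\gamma_1+\epsilon}\right), \tag{5.27}$$

while

$$\sqrt{k}M_{n1}(x) = o_P\left(x^{-1/\gamma_1+\epsilon}\right), \quad \sqrt{k}M_{n4}(x) = o_P\left(x^{\frac{1}{2}\left(\frac{1}{\gamma_2}-\frac{1}{\gamma_1}\right)+\epsilon}\right), \tag{5.28}$$

and

$$\sqrt{k}M_{n5}(x) = x^{-1/\gamma_1}\,\frac{x^{\rho_1/\gamma_1}-1}{\rho_1\gamma_1}\,\sqrt{k}A(a_k) + o_P\left(x^{-1/\gamma_1}\right). \tag{5.29}$$

Throughout the proof, without loss of generality, we assume that aϵ ≡ ϵ, for any constant a > 0. We point out that all the remainder terms of the previous approximations are negligible in probability, uniformly in x > 1. Let us begin with the term $M_{n1}(x)$, which may be rewritten into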

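$$\frac{x^{-1/\gamma_1}P_n(\widehat{\theta}_n)}{\overline{F}(xX_{n-k:n})}\left\{\int_x^\infty \frac{dF_n^*(X_{n-k:n}w)}{\overline{G}_{\widehat{\theta}_n}(X_{n-k:n}w)} - \int_x^\infty \frac{dF_n^*(X_{n-k:n}w)}{\overline{G}_{\theta}(X_{n-k:n}w)}\right\} = \frac{x^{-1/\gamma_1}P_n(\widehat{\theta}_n)}{\overline{F}(xX_{n-k:n})}\int_x^\infty\left\{\frac{1}{\overline{G}_{\widehat{\theta}_n}(X_{n-k:n}w)} - \frac{1}{\overline{G}_{\theta}(X_{n-k:n}w)}\right\}dF_n^*(X_{n-k:n}w).$$

Applying the mean value theorem (for several variables) to the function $\theta \to 1/\overline{G}_\theta(\cdot)$ yields

$$\frac{1}{\overline{G}_{\widehat{\theta}_n}(z)} - \frac{1}{\overline{G}_{\theta}(z)} = \sum_{i=1}^{d}\big(\widehat{\theta}_{i,n} - \theta_i\big)\frac{\overline{G}^{(i)}_{\widetilde{\theta}_n}(z)}{\overline{G}^{2}_{\widetilde{\theta}_n}(z)}, \quad\text{for any } z > 1,$$

where $\widetilde{\theta}_n$ is such that $\widetilde{\theta}_{i,n}$ is between $\theta_i$ and $\widehat{\theta}_{i,n}$, for i = 1, …, d; therefore

$$M_{n1}(x) = \frac{x^{-1/\gamma_1}P_n(\widehat{\theta}_n)}{\overline{F}(xX_{n-k:n})}\sum_{i=1}^{d}\big(\widehat{\theta}_{i,n} - \theta_i\big)\int_x^\infty \frac{\overline{G}^{(i)}_{\widetilde{\theta}_n}(X_{n-k:n}w)}{\overline{G}^{2}_{\widetilde{\theta}_n}(X_{n-k:n}w)}\,dF_n^*(X_{n-k:n}w).$$

Recall that by assumptions (1.6) and A2 both $\overline{G}_\theta$ and $\overline{G}^{(i)}_\theta$ are regularly varying with the same index −1/γ₂ and, on the other hand, $X_{n-k:n} \xrightarrow{P} \infty$ and w > 1 imply that $X_{n-k:n}w \xrightarrow{P} \infty$. Applying Potter’s inequalities (1.4), we get

$$\frac{\overline{G}_{\widetilde{\theta}_n}(X_{n-k:n}w)}{\overline{G}_{\widetilde{\theta}_n}(X_{n-k:n})} = (1+o_P(1))\,w^{-1/\gamma_2+\epsilon} = \frac{\overline{G}^{(i)}_{\widetilde{\theta}_n}(X_{n-k:n}w)}{\overline{G}^{(i)}_{\widetilde{\theta}_n}(X_{n-k:n})},$$

it follows that

$$M_{n1}(x) = (1+o_P(1))\,\frac{P_n(\widehat{\theta}_n)\,x^{-1/\gamma_1}}{\overline{G}_{\widetilde{\theta}_n}(X_{n-k:n})\,\overline{F}(xX_{n-k:n})}\times\sum_{i=1}^{d}\frac{\overline{G}^{(i)}_{\widetilde{\theta}_n}(X_{n-k:n})}{\overline{G}_{\widetilde{\theta}_n}(X_{n-k:n})}\big(\widehat{\theta}_{i,n} - \theta_i\big)\int_x^\infty w^{1/\gamma_2-\epsilon}\,dF_n^*(X_{n-k:n}w).$$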

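Under some regularity assumptions, [16] stated that $\sqrt{n}\,(\widehat{\theta}_n - \theta)$ is asymptotically a centered multivariate normal rv, which implies that $\widehat{\theta}_{i,n} - \theta_i = O_P(n^{-1/2})$ and thus $\widehat{\theta}_n \xrightarrow{P} \theta$. On the other hand, by the law of large numbers $P_n(\theta) \xrightarrow{P} P(\theta)$ as n → ∞, and we may readily show that $P_n(\widehat{\theta}_n) \xrightarrow{P} P(\theta)$ as n → ∞ as well. Note that since $\widehat{\theta}_n$ is a consistent estimator of θ, so is $\widetilde{\theta}_n$. Then, by using the fact that $X_{n-k:n} \xrightarrow{P} \infty$ and both conditions A1 and A3, we readily show that

$$X_{n-k:n}^{-\epsilon}\,\frac{\overline{G}^{(i)}_{\widetilde{\theta}_n}(X_{n-k:n})}{\overline{G}_{\widetilde{\theta}_n}(X_{n-k:n})} \xrightarrow{P} 0, \quad\text{as } n \to \infty,$$

and $\overline{G}_{\theta}(X_{n-k:n})/\overline{G}_{\widetilde{\theta}_n}(X_{n-k:n}) \xrightarrow{P} 1$. In view of Lemma A1 in Ref. [7], we infer that $X_{n-k:n} = (1+o_P(1))\,(k/n)^{-\gamma}$, thus

$$M_{n1}(x) = (k/n)^{-\epsilon\gamma}\,o_P(n^{-1/2})\,\widetilde{M}_{n1}(x),$$

where

$$\widetilde{M}_{n1}(x) := \frac{x^{-1/\gamma_1}P(\theta)}{\overline{G}_{\theta}(X_{n-k:n})\,\overline{F}(xX_{n-k:n})}\int_x^\infty w^{1/\gamma_2-\epsilon}\,dF_n^*(X_{n-k:n}w).$$

Making use of representation (5.22), we write

$$\widetilde{M}_{n1}(x) = x^{-1/\gamma_1}\left\{\int_x^\infty \frac{\overline{G}_{\theta}(X_{n-k:n})}{\overline{G}_{\theta}(X_{n-k:n}w)}\,\frac{dF^*(X_{n-k:n}w)}{\overline{F}^*(X_{n-k:n})}\right\}^{-1}\times\int_x^\infty w^{1/\gamma_2-\epsilon}\,\frac{dF_n^*(X_{n-k:n}w)}{\overline{F}^*(X_{n-k:n})}. \tag{5.30}$$

Once again, by routine manipulations of Potter’s inequalities, we show that the first integral in (5.30) is equal to

$$(1+o_P(1))\int_x^\infty w^{1/\gamma_2+\epsilon/2}\,\frac{dF^*(X_{n-k:n}w)}{\overline{F}^*(X_{n-k:n})}.$$

An integration by parts applied to the previous integral yields

$$x^{1/\gamma_2+\epsilon/2}\,\frac{\overline{F}^*(X_{n-k:n}x)}{\overline{F}^*(X_{n-k:n})} + (1/\gamma_2+\epsilon/2)\int_x^\infty w^{1/\gamma_2+\epsilon/2-1}\,\frac{\overline{F}^*(X_{n-k:n}w)}{\overline{F}^*(X_{n-k:n})}\,dw.$$

Recall that, from (1.7), we have $\overline{F}^* \in RV_{(-1/\gamma)}$, then

$$\frac{\overline{F}^*(X_{n-k:n}w)}{\overline{F}^*(X_{n-k:n})} = (1+o_P(1))\,w^{-1/\gamma+\epsilon/2},$$

uniformly in w > 1. Therefore, since 1/γ = 1/γ₁ + 1/γ₂, the previous quantity reduces to

$$(1+o_P(1))\left(1 + \frac{1/\gamma_2+\epsilon/2}{1/\gamma_1-\epsilon}\right)x^{-1/\gamma_1+\epsilon}.$$

Thereby the first factor between brackets in (5.30) equals $O_P(x^{1/\gamma_1-\epsilon})$. Let us consider the second factor in (5.30). By similar arguments as used for the first factor, we show that it equals

$$x^{1/\gamma_2+\epsilon/2}\,\frac{\overline{F}_n^*(X_{n-k:n}x)}{\overline{F}^*(X_{n-k:n})} + (1/\gamma_2+\epsilon/2)\int_x^\infty w^{1/\gamma_2+\epsilon/2-1}\,\frac{\overline{F}_n^*(X_{n-k:n}w)}{\overline{F}^*(X_{n-k:n})}\,dw,$$

multiplied by $1+o_P(1)$, uniformly in x > 1. From Lemma 7.1, we have

$$\frac{\overline{F}_n^*(X_{n-k:n}w)}{\overline{F}^*(X_{n-k:n})} = O_P\left(w^{-1/\gamma+\epsilon/2}\right),$$

which implies that the previous expression equals $O_P(x^{-1/\gamma_1+\epsilon})$, thus $\widetilde{M}_{n1}(x) = O_P(x^{-1/\gamma_1+\epsilon})$ and therefore

$$\sqrt{k}M_{n1}(x) = (k/n)^{1/2-\epsilon\gamma}\,O_P\left(x^{-1/\gamma_1+\epsilon}\right).$$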

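By assumption k/n → 0, so it follows that $\sqrt{k}M_{n1}(x) = o_P(x^{-1/\gamma_1+\epsilon})$, which is the first result in (5.28). Let us now consider the second term $M_{n2}(x)$, which may be rewritten into

$$-x^{-1/\gamma_1}\,\frac{k/n}{\overline{F}^*(X_{n-k:n})}\,\frac{\overline{F}(X_{n-k:n})}{\overline{F}(xX_{n-k:n})}\,\frac{P(\theta)}{\overline{G}_{\theta}(X_{n-k:n})\,\overline{F}(X_{n-k:n})/\overline{F}^*(X_{n-k:n})}\times\int_x^\infty \frac{\overline{G}_{\theta}(X_{n-k:n})}{\overline{G}_{\theta}(X_{n-k:n}w)}\,d\,\frac{\overline{F}_n^*(X_{n-k:n}w) - \overline{F}^*(X_{n-k:n}w)}{k/n}.$$

In view of Potter’s inequalities, it is clear that

$$\frac{\overline{F}(X_{n-k:n})}{\overline{F}^*(X_{n-k:n})/\overline{G}_{\theta}(X_{n-k:n})} \xrightarrow{P} \frac{\gamma_1}{\gamma}\,P(\theta)$$

and

$$\frac{\overline{F}(X_{n-k:n})}{\overline{F}(xX_{n-k:n})} \xrightarrow{P} x^{1/\gamma_1}.$$

Smirnov’s lemma (see, e.g., Lemma 2.2.3 in Ref. [5]) together with the fact that $\overline{F}^*(X_{n-k:n}) \overset{d}{=} \xi_{k+1:n}$ implies that $\frac{n}{k}\,\xi_{k+1:n} \xrightarrow{P} 1$, hence $\frac{n}{k}\,\overline{F}^*(X_{n-k:n}) = 1+o_P(1)$. Therefore,

$$M_{n2}(x) = -(1+o_P(1))\,\frac{\gamma}{\gamma_1}\int_x^\infty \frac{\overline{G}_{\theta}(X_{n-k:n})}{\overline{G}_{\theta}(X_{n-k:n}w)}\,d\,\frac{\overline{F}_n^*(X_{n-k:n}w) - \overline{F}^*(X_{n-k:n}w)}{k/n}.$$

On the other hand, an integration by parts yields

$$M_{n2}(x) = (1+o_P(1))\,\frac{\gamma}{\gamma_1}\,\big(M_{n2}^{(1)}(x) + M_{n2}^{(2)}(x)\big),$$

where

$$M_{n2}^{(1)}(x) := \int_x^\infty \frac{\overline{F}_n^*(X_{n-k:n}w) - \overline{F}^*(X_{n-k:n}w)}{k/n}\,d\,\frac{\overline{G}_{\theta}(X_{n-k:n})}{\overline{G}_{\theta}(X_{n-k:n}w)}$$

and

$$M_{n2}^{(2)}(x) := \frac{\overline{G}_{\theta}(X_{n-k:n})}{\overline{G}_{\theta}(X_{n-k:n}x)}\,\frac{\overline{F}_n^*(X_{n-k:n}x) - \overline{F}^*(xX_{n-k:n})}{k/n}.$$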

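By using the change of variables $t = \overline{G}_{\theta}(X_{n-k:n})/\overline{G}_{\theta}(X_{n-k:n}w)$, it is easy to verify that

$$M_{n2}^{(1)}(x) = \int_{\frac{\overline{G}_{\theta}(X_{n-k:n})}{\overline{G}_{\theta}(X_{n-k:n}x)}}^\infty \frac{n}{k}\left\{\overline{F}_n^*\Big(G_\theta^{\leftarrow}\big(1-\overline{G}_{\theta}(X_{n-k:n})\,t^{-1}\big)\Big) - \overline{F}^*\Big(G_\theta^{\leftarrow}\big(1-\overline{G}_{\theta}(X_{n-k:n})\,t^{-1}\big)\Big)\right\}dt.$$

Observe that

$$M_{n2}^{(1)}(x) = \int_{\frac{\overline{G}_{\theta}(X_{n-k:n})}{\overline{G}_{\theta}(X_{n-k:n}x)}}^\infty \big\{U_n(\vartheta_n(t;\theta)) - \vartheta_n(t;\theta)\big\}\,dt,$$

where $\vartheta_n(t;\theta) := \frac{n}{k}\,\overline{F}^*\big(G_\theta^{\leftarrow}(1-\overline{G}_{\theta}(X_{n-k:n})\,t^{-1})\big)$ and $U_n$ is the tail empirical df given in (5.24). Thereby,

$$\sqrt{k}M_{n2}^{(1)}(x) = \int_{\frac{\overline{G}_{\theta}(X_{n-k:n})}{\overline{G}_{\theta}(X_{n-k:n}x)}}^\infty \alpha_n(\vartheta_n(t;\theta))\,dt,$$

with $\alpha_n$ being the tail empirical process defined in (5.23). Let us decompose the previous integral into

$$\int_{\frac{\overline{G}_{\theta}(X_{n-k:n})}{\overline{G}_{\theta}(X_{n-k:n}x)}}^\infty \big\{\alpha_n(\vartheta_n(t;\theta)) - W(\vartheta_n(t;\theta))\big\}\,dt + \int_{\frac{\overline{G}_{\theta}(X_{n-k:n})}{\overline{G}_{\theta}(X_{n-k:n}x)}}^\infty W(\vartheta_n(t;\theta))\,dt =: S_n + R_n.$$

By applying the weak approximation (5.25), we get

$$S_n = o_P(1)\int_{\frac{\overline{G}_{\theta}(X_{n-k:n})}{\overline{G}_{\theta}(X_{n-k:n}x)}}^\infty \big(\vartheta_n(t;\theta)\big)^{1/2-\epsilon}\,dt.$$

Observe that $\overline{F}^*\big(G_\theta^{\leftarrow}(1-\overline{G}_{\theta}(X_{n-k:n}))\big) = \overline{F}^*(X_{n-k:n})$, thereby

$$\vartheta_n(t;\theta) = \frac{n}{k}\,\overline{F}^*(X_{n-k:n})\,\frac{\overline{F}^*\big(G_\theta^{\leftarrow}(1-\overline{G}_{\theta}(X_{n-k:n})\,t^{-1})\big)}{\overline{F}^*\big(G_\theta^{\leftarrow}(1-\overline{G}_{\theta}(X_{n-k:n}))\big)}.$$

It is easy to check that $\overline{F}^*\big(G_\theta^{\leftarrow}(1-\cdot)\big) \in RV_{(\gamma_2/\gamma)}$; then, once again by means of Potter’s inequalities, we show that $\vartheta_n(t;\theta) = (1+o_P(1))\,t^{-\gamma_2/\gamma+\epsilon}$, therefore

$$S_n = o_P(1)\int_{\frac{\overline{G}_{\theta}(X_{n-k:n})}{\overline{G}_{\theta}(X_{n-k:n}x)}}^\infty t^{(-\gamma_2/\gamma+\epsilon)(1/2-\epsilon)}\,dt.$$

By an elementary integration, we get

$$S_n = o_P(1)\left(\frac{\overline{G}_{\theta}(X_{n-k:n})}{\overline{G}_{\theta}(X_{n-k:n}x)}\right)^{(-\gamma_2/\gamma+\epsilon)(1/2-\epsilon)+1} = o_P\left(x^{\frac{1}{\gamma_2}-\frac{1}{2\gamma}+\epsilon}\right).$$

By replacing γ by its expression given in (1.8), we end up with

$$S_n = o_P\left(x^{\frac{1}{2}\left(\frac{1}{\gamma_2}-\frac{1}{\gamma_1}\right)+\epsilon}\right).$$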

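The term $R_n$ may be decomposed into

$$\int_{\frac{\overline{G}_{\theta}(X_{n-k:n})}{\overline{G}_{\theta}(X_{n-k:n}x)}}^{x^{1/\gamma_2}} W(\vartheta_n(t;\theta))\,dt + \int_{x^{1/\gamma_2}}^\infty W(\vartheta_n(t;\theta))\,dt =: R_{n1} + R_{n2}.$$

It is clear that

$$|R_{n1}| < \left\{\sup_{t > \frac{\overline{G}_{\theta}(X_{n-k:n})}{\overline{G}_{\theta}(X_{n-k:n}x)}}\frac{|W(\vartheta_n(t;\theta))|}{(\vartheta_n(t;\theta))^{\epsilon}}\right\}\left|\int_{\frac{\overline{G}_{\theta}(X_{n-k:n})}{\overline{G}_{\theta}(X_{n-k:n}x)}}^{x^{1/\gamma_2}} (\vartheta_n(t;\theta))^{\epsilon}\,dt\right|.$$

It is easy to check, by using the change of variables $\vartheta_n(t;\theta) = s$, that the first factor between curly brackets equals

$$\sup_{0<s<\frac{n}{k}\overline{F}^*(X_{n-k:n}x)}\frac{|W(s)|}{s^{\epsilon}} \le \sup_{0<s<\frac{n}{k}\overline{F}^*(X_{n-k:n})}\frac{|W(s)|}{s^{\epsilon}}.$$

From Lemma 3.2 in Ref. [31], $\sup_{0<s\le 1} s^{-\delta}|W(s)| = O_P(1)$, for any 0 < δ < 1/2; then, since $n\overline{F}^*(X_{n-k:n})/k \xrightarrow{P} 1$ as n → ∞, we infer that

$$\sup_{0<s<\frac{n}{k}\overline{F}^*(X_{n-k:n})} s^{-\epsilon}\,|W(s)| = O_P(1),$$

for all large n. On the other hand, we already pointed out above that

$$\vartheta_n(t;\theta) = (1+o_P(1))\,t^{-\gamma_2/\gamma+\epsilon},$$

which implies that the second factor is equal to

$$O_P(1)\int_{\frac{\overline{G}_{\theta}(X_{n-k:n})}{\overline{G}_{\theta}(X_{n-k:n}x)}}^{x^{1/\gamma_2}} t^{(-\gamma_2/\gamma+\epsilon)\epsilon}\,dt = O_P(1)\int_{\frac{\overline{G}_{\theta}(X_{n-k:n})}{\overline{G}_{\theta}(X_{n-k:n}x)}}^{x^{1/\gamma_2}} t^{-\epsilon\gamma_2/\gamma+\epsilon}\,dt,$$

which after integration yields

$$O_P(1)\left\{\left(\frac{\overline{G}_{\theta}(X_{n-k:n})}{\overline{G}_{\theta}(X_{n-k:n}x)}\right)^{-\epsilon\gamma_2/\gamma+\epsilon+1} - \big(x^{1/\gamma_2}\big)^{-\epsilon\gamma_2/\gamma+\epsilon+1}\right\}.$$

Recall that, from formula (1.8), we have γ₂/γ > 1; then, by using the mean value theorem and Potter’s inequalities, we get $R_{n1} = o_P(x^{-\epsilon})$. The second term $R_{n2}$ may be decomposed into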

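$$R_{n2} = \int_{x^{1/\gamma_2}}^\infty \big\{W(\vartheta_n(t;\theta)) - W(t^{-\gamma_2/\gamma})\big\}\,dt + \int_{x^{1/\gamma_2}}^\infty W(t^{-\gamma_2/\gamma})\,dt.$$

From Proposition B.1.10 in Ref. [5], we have, with high probability,

$$c_n(t;\theta) := \big|\vartheta_n(t;\theta) - t^{-\gamma_2/\gamma}\big| \le \epsilon\,t^{-(\gamma_2/\gamma-\epsilon)}, \quad\text{as } n \to \infty, \tag{5.31}$$

which means that $\sup_{x>1}\sup_{t>x^{1/\gamma_2}} c_n(t;\theta) \xrightarrow{P} 0$, as n → ∞. This implies, by using Lévy’s modulus of continuity of the Wiener process (see, e.g., Theorem 1.1.1 in Ref. [32]), that

$$\big|W(\vartheta_n(t;\theta)) - W(t^{-\gamma_2/\gamma})\big| \le \sqrt{2\,c_n(t;\theta)\,\log\big(1/c_n(t;\theta)\big)},$$

with high probability. By using the fact that $\log(1/s) < \epsilon s^{-\epsilon}$, for s ↓ 0, together with inequality (5.31), we show that

$$\big|W(\vartheta_n(t;\theta)) - W(t^{-\gamma_2/\gamma})\big| < 2\epsilon\,t^{-(\gamma_2/\gamma-\epsilon)/2},$$

uniformly in $t > x^{1/\gamma_2}$; it follows that

$$\int_{x^{1/\gamma_2}}^\infty \big\{W(\vartheta_n(t;\theta)) - W(t^{-\gamma_2/\gamma})\big\}\,dt = o_P(1)\int_{x^{1/\gamma_2}}^\infty t^{-(\gamma_2/\gamma-\epsilon)/2}\,dt.$$

Recall that the assumption γ₁ < γ₂ together with the equation 1/γ = 1/γ₁ + 1/γ₂ implies that γ₂/(2γ) > 1, thus $-(\gamma_2/\gamma-\epsilon)/2 + 1 < 0$, therefore $\int_{x^{1/\gamma_2}}^\infty t^{-(\gamma_2/\gamma-\epsilon)/2}\,dt = o_P(x^{-1/\gamma_1-\epsilon})$. We have thus shown that

$$R_{n1} = o_P(x^{-\epsilon}) \quad\text{and}\quad R_{n2} = \int_{x^{1/\gamma_2}}^\infty W(t^{-\gamma_2/\gamma})\,dt + o_P\left(x^{-1/\gamma_1-\epsilon}\right),$$

hence

$$\sqrt{k}M_{n2}^{(1)}(x) = R_n + S_n = \int_{x^{1/\gamma_2}}^\infty W(t^{-\gamma_2/\gamma})\,dt + o_P\left(x^{-1/\gamma_1-\epsilon}\right) + o_P\left(x^{\frac{1}{2}\left(\frac{1}{\gamma_2}-\frac{1}{\gamma_1}\right)+\epsilon}\right).$$

It is clear that

$$-\frac{1}{\gamma_1} - \epsilon - \left\{\frac{1}{2}\left(\frac{1}{\gamma_2}-\frac{1}{\gamma_1}\right)+\epsilon\right\} = -\frac{\gamma_1+\gamma_2+4\epsilon\gamma_1\gamma_2}{2\gamma_1\gamma_2} < 0,$$

then

$$\sqrt{k}M_{n2}^{(1)}(x) = \int_{x^{1/\gamma_2}}^\infty W(t^{-\gamma_2/\gamma})\,dt + o_P\left(x^{\frac{1}{2}\left(\frac{1}{\gamma_2}-\frac{1}{\gamma_1}\right)+\epsilon}\right).$$

By using similar arguments, whose details we omit, we end up with

$$\sqrt{k}M_{n2}^{(2)}(x) = x^{1/\gamma_2}\,W(x^{-1/\gamma}) + o_P\left(x^{-1/\gamma_1+\epsilon}\right).$$

Finally, we have

$$\sqrt{k}M_{n2}(x) = \frac{\gamma}{\gamma_1}\,x^{1/\gamma_2}\,W(x^{-1/\gamma}) + \frac{\gamma}{\gamma_1}\int_{x^{1/\gamma_2}}^\infty W(t^{-\gamma_2/\gamma})\,dt + o_P\left(x^{\frac{1}{2}\left(\frac{1}{\gamma_2}-\frac{1}{\gamma_1}\right)+\epsilon}\right).$$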

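Let us now focus on the term $M_{n3}(x)$. Taking x = 1 in the latter approximation, we infer that

$$\sqrt{k}M_{n2}(1) = \sqrt{k}\,\frac{\overline{F}_n(X_{n-k:n};\theta) - \overline{F}(X_{n-k:n})}{\overline{F}(X_{n-k:n})} = \frac{\gamma}{\gamma_1}W(1) + \frac{\gamma}{\gamma_1}\int_1^\infty W(t^{-\gamma_2/\gamma})\,dt + o_P(1), \tag{5.32}$$

which implies that

$$\sqrt{k}\,\frac{\overline{F}_n(X_{n-k:n};\theta) - \overline{F}(X_{n-k:n})}{\overline{F}(X_{n-k:n})} = O_P(1).$$

In other words, we have

$$\frac{\overline{F}_n(X_{n-k:n};\theta)}{\overline{F}(X_{n-k:n})} = 1 + O_P(k^{-1/2}). \tag{5.33}$$

The regular variation of $\overline{F}(\cdot)$ and (5.33) together imply that

$$\frac{\overline{F}(xX_{n-k:n})}{\overline{F}_n(X_{n-k:n};\theta)} = x^{-1/\gamma_1} + o_P\left(x^{-1/\gamma_1+\epsilon}\right). \tag{5.34}$$

By combining the results (5.32) and (5.34), we get

$$\sqrt{k}M_{n3}(x) = -x^{-1/\gamma_1}\left\{\frac{\gamma}{\gamma_1}W(1) + \frac{\gamma}{\gamma_1}\int_1^\infty W(t^{-\gamma_2/\gamma})\,dt\right\} + o_P\left(x^{-1/\gamma_1+\epsilon}\right).$$

For the fourth term $M_{n4}(x)$, we write

$$\sqrt{k}M_{n4}(x) = \left\{\frac{\overline{F}(xX_{n-k:n})}{\overline{F}_n(X_{n-k:n};\theta)} - x^{-1/\gamma_1}\right\}\sqrt{k}\,\frac{\overline{F}_n(xX_{n-k:n};\theta) - \overline{F}(xX_{n-k:n})}{\overline{F}(xX_{n-k:n})}.$$

From (5.34), the first factor of the previous equation equals $o_P(x^{-1/\gamma_1+\epsilon})$. On the other hand, the change of variables $s = t^{-\gamma_2/\gamma}$ yields

$$\int_{x^{1/\gamma_2}}^\infty W(t^{-\gamma_2/\gamma})\,dt = \frac{\gamma}{\gamma_2}\int_0^{x^{-1/\gamma}} s^{-\gamma/\gamma_2-1}\,W(s)\,ds.$$

Since $\sup_{0<s<1} s^{-1/2+\epsilon}|W(s)| = O_P(1)$, we easily show that

$$\int_{x^{1/\gamma_2}}^\infty W(t^{-\gamma_2/\gamma})\,dt = O_P\left(x^{\frac{1}{2}\left(\frac{1}{\gamma_2}-\frac{1}{\gamma_1}\right)+\epsilon}\right),$$

and it follows that $\sqrt{k}M_{n2}(x) = O_P\left(x^{\frac{1}{2}\left(\frac{1}{\gamma_2}-\frac{1}{\gamma_1}\right)+\epsilon}\right)$ as well. Therefore,

$$\sqrt{k}\,\frac{\overline{F}_n(xX_{n-k:n};\theta) - \overline{F}(xX_{n-k:n})}{\overline{F}(xX_{n-k:n})} = x^{1/\gamma_1}\,O_P\left(x^{\frac{1}{2}\left(\frac{1}{\gamma_2}-\frac{1}{\gamma_1}\right)+\epsilon}\right) = O_P\left(x^{\frac{1}{2\gamma}+\epsilon}\right).$$

Hence, we have

$$\sqrt{k}M_{n4}(x) = o_P\left(x^{-1/\gamma_1+\epsilon}\right)O_P\left(x^{\frac{1}{2\gamma}+\epsilon}\right) = o_P\left(x^{\frac{1}{2}\left(\frac{1}{\gamma_2}-\frac{1}{\gamma_1}\right)+\epsilon}\right).$$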

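By assumption, $\overline{F}$ satisfies the second-order condition of regular variation (1.5), which means that

$$\lim_{t\to\infty}\frac{\overline{F}(tx)/\overline{F}(t) - x^{-1/\gamma_1}}{A(t)} = x^{-1/\gamma_1}\,\frac{x^{\rho_1/\gamma_1}-1}{\rho_1\gamma_1}, \tag{5.35}$$

for any x > 0, where ρ₁ < 0 is the second-order parameter and $|A| \in RV_{(\rho_1/\gamma_1)}$. The uniform inequality corresponding to (5.35) says: there exists t₀ > 0 such that, for any t > t₀, we have

$$\left|\frac{\overline{F}(tx)/\overline{F}(t) - x^{-1/\gamma_1}}{A(t)} - x^{-1/\gamma_1}\,\frac{x^{\rho_1/\gamma_1}-1}{\rho_1\gamma_1}\right| < \epsilon\,x^{-1/\gamma_1+\rho_1/\gamma_1+\epsilon},$$

see for instance assertion (2.3.23) of Theorem 2.3.9 in Ref. [5]. It is easy to check that the latter inequality implies that

$$\sqrt{k}M_{n5}(x) = \sqrt{k}\left\{\frac{\overline{F}(xX_{n-k:n})}{\overline{F}(X_{n-k:n})} - x^{-1/\gamma_1}\right\} = x^{-1/\gamma_1}\,\frac{x^{\rho_1/\gamma_1}-1}{\rho_1\gamma_1}\,\sqrt{k}A(X_{n-k:n}) + o_P\left(x^{-1/\gamma_1}\,\frac{x^{\rho_1/\gamma_1}-1}{\rho_1\gamma_1}\,\sqrt{k}A(X_{n-k:n})\right).$$

Recall that $a_k = F^{*\leftarrow}(1-k/n)$ and notice that $X_{n-k:n}/a_k \xrightarrow{P} 1$ as n → ∞; then, in view of the regular variation of A, we infer that $A(X_{n-k:n}) = (1+o_P(1))\,A(a_k)$. On the other hand, by assumption $\sqrt{k}A(a_k)$ is asymptotically bounded, therefore

$$\sqrt{k}M_{n5}(x) = x^{-1/\gamma_1}\,\frac{x^{\rho_1/\gamma_1}-1}{\rho_1\gamma_1}\,\sqrt{k}A(a_k) + o_P\left(x^{-1/\gamma_1}\right).$$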

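To summarize, at this stage, we have shown that

$$D_n(x;\widehat{\theta}_n) = \frac{\gamma}{\gamma_1}\,x^{1/\gamma_2}\,W(x^{-1/\gamma}) + \frac{\gamma}{\gamma_1}\int_{x^{1/\gamma_2}}^\infty W(t^{-\gamma_2/\gamma})\,dt - x^{-1/\gamma_1}\left\{\frac{\gamma}{\gamma_1}W(1) + \frac{\gamma}{\gamma_1}\int_1^\infty W(t^{-\gamma_2/\gamma})\,dt\right\} + x^{-1/\gamma_1}\,\frac{x^{\rho_1/\gamma_1}-1}{\rho_1\gamma_1}\,\sqrt{k}A(a_k) + \varsigma(x),$$

where $\varsigma(x) := o_P\left(x^{-1/\gamma_1+\epsilon}\right) + o_P\left(x^{-1/\gamma_1}\right) + o_P\left(x^{\frac{1}{2}\left(\frac{1}{\gamma_2}-\frac{1}{\gamma_1}\right)+\epsilon}\right)$. By using a change of variables, we show that the sum of the first three terms equals the Gaussian process $\Gamma(x;W)$ stated in Theorem 2.1. Recall that γ₁ < γ₂ and

$$\frac{1}{2}\left(\frac{1}{\gamma_2}-\frac{1}{\gamma_1}\right)+\epsilon < 0,$$

then it is easy to verify that $\varsigma(x) = o_P\left(x^{\frac{1}{2}\left(\frac{1}{\gamma_2}-\frac{1}{\gamma_1}\right)+\epsilon}\right)$. It follows that

$$x^{\epsilon}\left|D_n(x;\widehat{\theta}_n) - \Gamma(x;W) - x^{-1/\gamma_1}\,\frac{x^{\rho_1/\gamma_1}-1}{\rho_1\gamma_1}\,\sqrt{k}A(a_k)\right| = o_P\left(x^{\frac{1}{2}\left(\frac{1}{\gamma_2}-\frac{1}{\gamma_1}\right)+2\epsilon}\right) = o_P(1),$$

uniformly in x > 1, therefore

$$\sup_{x>1} x^{\epsilon}\left|D_n(x;\widehat{\theta}_n) - \Gamma(x;W) - x^{-1/\gamma_1}\,\frac{x^{\rho_1/\gamma_1}-1}{\rho_1\gamma_1}\,\sqrt{k}A(a_k)\right| = o_P(1),$$

for any small 0 < ϵ < 1/2, which completes the proof of Theorem 2.1.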

5.2 Proof of Theorem 2.2

From the representation 1.16, we write

γ^1−γ1=Tn1+Tn2+Tn3,

where

Tn1≔k−1/2∫1∞x−1Dnx;θ^;γ1−Γx;W−x−1/γ1xρ1/γ1−1ρ1γ1kAakdx

Tn2≔k−1/2∫1∞x−1Γx;Wdx

and

Tn3≔−Aak∫1∞x−1/γ1−1xρ1/γ1−1ρ1γ1dx.

Using Theorem 2.1 yields Tn1=oPk−1/2∫1∞x−1+ϵdx=oPk−1/2=oP1. SinceEWs≤s1/2, then it is easy to show that ∫1∞x−1Γx;Wdx=OP1, it follows that Tn2=OPk−1/2=oP1. Using an elementary integration, we get Tn3=Aak/1−ρ1 which tends to zero as n → ∞, because a_k → ∞ and A is regularly varying with negative index. Therefore, γ^1→Pγ1, as n → ∞ which gives the first result of Theorem. To establish the asymptotic normality, we write

$$\sqrt{k}\left(\hat\gamma_1-\gamma_1\right)=\sqrt{k}\,T_{n1}+\sqrt{k}\,T_{n2}+\sqrt{k}\,T_{n3},$$

where

$$\sqrt{k}\,T_{n1}=o_P(1),\qquad\sqrt{k}\,T_{n2}=\int_{1}^{\infty}x^{-1}\,\Gamma(x;W)\,dx$$

and

$$\sqrt{k}\,T_{n3}=\frac{\sqrt{k}\,A(a_k)}{1-\rho_1}.$$

Note that $\Gamma(x;W)$ is a centered Gaussian process and, by using the assumption $\sqrt{k}\,A(a_k)\to\lambda<\infty$, we end up with

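$$\sqrt{k}\left(\hat\gamma_1-\gamma_1\right)\overset{D}{\to}\mathcal{N}\!\left(\frac{\lambda}{1-\rho_1},\;E\left[\int_{1}^{\infty}x^{-1}\,\Gamma(x;W)\,dx\right]^{2}\right).$$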

By elementary calculations (we omit the details), we show that

$$E\left[\int_{1}^{\infty}x^{-1}\,\Gamma(x;W)\,dx\right]^{2}=\sigma^{2}.$$

6. Conclusion

On the basis of a semiparametric estimator of the underlying distribution function, we proposed a new method for estimating the tail index of Pareto-type distributions from randomly right-truncated data. Compared with the existing estimators, the new one behaves well in terms of both bias and RMSE. A useful weak approximation of the corresponding tail empirical process allowed us to establish both the consistency and the asymptotic normality of the proposed estimator.

Figures

Figure 1

Absolute bias (left two panels) and RMSE (right two panels) of $\hat\gamma_1$ (black), $\hat\gamma_1^{(BMN)}$ (red) and $\hat\gamma_1^{(W)}$ (blue), corresponding to two situations of scenario S1: $\gamma_1=0.6$, $p=55\%$ (top two panels) and $\gamma_1=0.6$, $p=90\%$ (bottom two panels), based on 1,000 samples of size 300

Figure 2

Absolute bias (left two panels) and RMSE (right two panels) of $\hat\gamma_1$ (black), $\hat\gamma_1^{(BMN)}$ (red) and $\hat\gamma_1^{(W)}$ (blue), corresponding to two situations of scenario S1: $\gamma_1=0.8$, $p=55\%$ (top two panels) and $\gamma_1=0.8$, $p=90\%$ (bottom two panels), based on 1,000 samples of size 300

Figure 3

Absolute bias (left two panels) and RMSE (right two panels) of $\hat\gamma_1$ (black), $\hat\gamma_1^{(BMN)}$ (red) and $\hat\gamma_1^{(W)}$ (blue), corresponding to two situations of scenario S2: $\gamma_1=0.6$, $p=55\%$ (top two panels) and $\gamma_1=0.6$, $p=90\%$ (bottom two panels), based on 1,000 samples of size 300

Figure 4

Absolute bias (left two panels) and RMSE (right two panels) of $\hat\gamma_1$ (black), $\hat\gamma_1^{(BMN)}$ (red) and $\hat\gamma_1^{(W)}$ (blue), corresponding to two situations of scenario S2: $\gamma_1=0.8$, $p=55\%$ (top two panels) and $\gamma_1=0.8$, $p=90\%$ (bottom two panels), based on 1,000 samples of size 300

Figure 5

Absolute bias (left two panels) and RMSE (right two panels) of $\hat\gamma_1$ (black), $\hat\gamma_1^{(BMN)}$ (red) and $\hat\gamma_1^{(W)}$ (blue), corresponding to two situations of scenario S3: $\gamma_1=0.6$, $p=55\%$ (top two panels) and $\gamma_1=0.6$, $p=90\%$ (bottom two panels), based on 1,000 samples of size 300

Figure 6

Absolute bias (left two panels) and RMSE (right two panels) of $\hat\gamma_1$ (black), $\hat\gamma_1^{(BMN)}$ (red) and $\hat\gamma_1^{(W)}$ (blue), corresponding to two situations of scenario S3: $\gamma_1=0.8$, $p=55\%$ (top two panels) and $\gamma_1=0.8$, $p=90\%$ (bottom two panels), based on 1,000 samples of size 300

Figure 7

Absolute bias (left two panels) and RMSE (right two panels) of $\hat\gamma_1$ (black), $\hat\gamma_1^{(BMN)}$ (red) and $\hat\gamma_1^{(W)}$ (blue), corresponding to two situations of scenario S4: $\gamma_1=0.6$, $p=55\%$ (top two panels) and $\gamma_1=0.6$, $p=90\%$ (bottom two panels), based on 1,000 samples of size 300

Figure 8

Absolute bias (left two panels) and RMSE (right two panels) of $\hat\gamma_1$ (black), $\hat\gamma_1^{(BMN)}$ (red) and $\hat\gamma_1^{(W)}$ (blue), corresponding to two situations of scenario S4: $\gamma_1=0.8$, $p=55\%$ (top two panels) and $\gamma_1=0.8$, $p=90\%$ (bottom two panels), based on 1,000 samples of size 300

Table 1

Optimal sample fractions and estimate values of the tail index γ₁ = 0.6 based on 1,000 samples of size 300 for the four scenarios with p = 0.55

	k*	$\hat\gamma_1$	k*	$\hat\gamma_1^{(BMN)}$	k*	$\hat\gamma_1^{(W)}$
S1	44	0.600	41	0.599	40	0.600
S2	18	0.601	17	0.600	16	0.597
S3	21	0.601	20	0.601	19	0.599
S4	30	0.603	27	0.600	25	0.598

Table 2

Optimal sample fractions and estimate values of the tail index γ₁ = 0.6 based on 1,000 samples of size 300 for the four scenarios with p = 0.9

	k*	$\hat\gamma_1$	k*	$\hat\gamma_1^{(BMN)}$	k*	$\hat\gamma_1^{(W)}$
S1	82	0.610	82	0.611	82	0.611
S2	37	0.640	37	0.640	37	0.640
S3	46	0.633	37	0.625	37	0.625
S4	52	0.610	52	0.610	52	0.610

Table 3

Optimal sample fractions and estimate values of the tail index γ₁ = 0.8 based on 1,000 samples of size 300 for the four scenarios with p = 0.55

	k*	$\hat\gamma_1$	k*	$\hat\gamma_1^{(BMN)}$	k*	$\hat\gamma_1^{(W)}$
S1	59	0.799	57	0.800	54	0.799
S2	21	0.803	21	0.803	20	0.799
S3	24	0.802	22	0.798	22	0.801
S4	51	0.799	52	0.800	50	0.801

Table 4

Optimal sample fractions and estimate values of the tail index γ₁ = 0.8 based on 1,000 samples of size 300 for the four scenarios with p = 0.9

	k*	$\hat\gamma_1$	k*	$\hat\gamma_1^{(BMN)}$	k*	$\hat\gamma_1^{(W)}$
S1	90	0.804	90	0.806	90	0.807
S2	34	0.845	34	0.846	34	0.846
S3	40	0.831	40	0.831	40	0.831
S4	71	0.814	71	0.814	71	0.815

Appendix

Lemma 7.1.

For any small ϵ > 0, we have

$$\frac{\bar F_n^{*}(X_{n-k:n}w)}{\bar F^{*}(X_{n-k:n})}=O_P\!\left(w^{-1/\gamma+\epsilon/2}\right),\quad\text{uniformly on } w\ge 1.$$

Proof.

Let $V_n(t):=n^{-1}\sum_{i=1}^{n}\mathbf{1}\{\xi_i\le t\}$ be the uniform empirical df pertaining to the sample $\xi_i:=\bar F^{*}(X_i)$, $i=1,\ldots,n$, of independent and identically distributed uniform$(0,1)$ rv's. It is clear that, for an arbitrary $x$, we have $V_n(\bar F^{*}(x))=\bar F_n^{*}(x)$ almost surely. From Assertion 7 in Ref. [33] (page 415), $V_n(t)/t=O_P(1)$ uniformly on $1/n\le t\le 1$; this implies that

$$\frac{\bar F_n^{*}(X_{n-k:n}w)}{\bar F^{*}(X_{n-k:n}w)}=O_P(1),\quad\text{uniformly on } w\ge 1. \tag{7.36}$$

On the other hand, by applying Potter's inequalities (1.4) to $\bar F^{*}$, we get

$$\frac{\bar F^{*}(X_{n-k:n}w)}{\bar F^{*}(X_{n-k:n})}=O_P\!\left(w^{-1/\gamma+\epsilon/2}\right),\quad\text{uniformly on } w\ge 1. \tag{7.37}$$

Combining the two statements, (7.36) and (7.37), gives the desired result. □
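
The boundedness in probability of $V_n(t)/t$ on $1/n\le t\le 1$, used to obtain (7.36), can be illustrated by simulation. The sketch below is not part of the proof: the function names, sample sizes, number of replications and seed are all assumptions chosen for the example. It computes the supremum of $V_n(t)/t$ exactly for each simulated uniform sample and reports an empirical upper quantile for several values of $n$.

```python
import random


def sup_ratio(n, rng):
    """Supremum of V_n(t)/t over 1/n <= t <= 1 for one uniform(0,1) sample of size n."""
    xi = sorted(rng.random() for _ in range(n))
    # V_n(t)/t decreases between jumps of V_n, so the supremum is reached either
    # at the left endpoint t = 1/n or at an order statistic xi_(i) >= 1/n.
    best = float(sum(1 for u in xi if u <= 1.0 / n))  # V_n(1/n) / (1/n)
    for i, u in enumerate(xi, start=1):
        if u >= 1.0 / n:
            best = max(best, i / (n * u))  # V_n(xi_(i)) = i/n
    return best


def upper_quantile(n, reps=500, level=0.95, seed=0):
    """Empirical quantile of the supremum over `reps` simulated samples (illustrative)."""
    rng = random.Random(seed)
    values = sorted(sup_ratio(n, rng) for _ in range(reps))
    return values[int(level * reps) - 1]


for n in (50, 200, 800):
    print(n, round(upper_quantile(n), 2))
```

If the $O_P(1)$ statement holds, the reported quantiles should stay of the same order as $n$ increases rather than grow with it.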

References

[1] Lagakos SW, Barraj LM, De Gruttola V. Nonparametric analysis of truncated survival data, with applications to AIDS. Biometrika. 1988; 75: 515-23.

[2] Wang MC. A semiparametric model for randomly truncated data. J Amer Statist Assoc. 1989; 84: 742-8.

[3] Lawless JF. Statistical models and methods for lifetime data. 2nd ed. Wiley Series in Probability and Statistics; 2002.

[4] Gardes L, Stupfler G. Estimating extreme quantiles under random truncation. TEST. 2015; 24: 207-27.

[5] de Haan L, Ferreira A. Extreme value theory: an introduction. Springer; 2006.

[6] de Haan L, Stadtmüller U. Generalized regular variation of second order. J Aust Math Soc (Series A). 1996; 61: 381-95.

[7] Benchaira S, Meraghni D, Necir A. Tail product-limit process for truncated data with application to extreme value index estimation. Extremes. 2016a; 19: 219-51.

[8] Hill BM. A simple general approach to inference about the tail of a distribution. Ann Statist. 1975; 3: 1163-74.

[9] Benchaira S, Meraghni D, Necir A. On the asymptotic normality of the extreme value index for right-truncated data. Statist Probab Lett. 2015; 107: 378-84.

[10] Worms J, Worms R. A Lynden-Bell integral estimator for extremes of randomly truncated data. Statist Probab Lett. 2016; 109: 106-17.

[11] Lynden-Bell D. A method of allowing for known observational selection in small samples applied to 3CR quasars. Monthly Notices Roy Astron Soc. 1971; 155: 95-118.

[12] Woodroofe M. Estimating a distribution function with truncated data. Ann Statist. 1985; 13: 163-77.

[13] Benchaira S, Meraghni D, Necir A. Kernel estimation of the tail index of a right-truncated Pareto-type distribution. Statist Probab Lett. 2016b; 119: 186-93.

[14] Haouas N, Necir A, Brahimi B. Estimating the second-order parameter of regular variation and bias reduction in tail index estimation under random truncation. J Stat Theor Pract. 2019; 13: 110-144.

[15] Haouas N, Necir A, Meraghni D, Brahimi B. A Lynden-Bell integral estimator for the tail index of right-truncated data with a random threshold. Afr Stat. 2018; 12: 1159-70.

[16] Andersen EB. Asymptotic properties of conditional maximum-likelihood estimators. J Roy Statist Soc Ser B. 1970; 32: 283-301.

[17] Moreira C, de Uña-Álvarez J. A semiparametric estimator of survival for doubly truncated data. Stat Med. 2010; 29: 3147-59.

[18] Bilker WB, Wang MC. A semiparametric extension of the Mann–Whitney test for randomly truncated data. Biometrics. 1996; 52: 10-20.

[19] Li G, Qin J, Tiwari RC. Semiparametric likelihood ratio-based inferences for truncated data. J Amer Statist Assoc. 1997; 92: 236-45.

[20] Moreira C, de Uña-Álvarez J, Van Keilegom I. Goodness-of-fit tests for a semiparametric model under random double truncation. Comput Statist. 2014; 29: 1365-137.

[21] Qin J, Wang MC. Semiparametric analysis of truncated data. Lifetime Data Anal. 2001; 7(3): 225-42.

[22] Shen PS. Semiparametric analysis of doubly truncated data. Comm Statist Theor Methods. 2010; 39: 3178-90.

[23] Shen PS, Hsu H. Conditional maximum likelihood estimation for semiparametric transformation models with doubly truncated data. Comput Statist Data Anal. 2020; 144: 15: 106862.

[24] Drees H, de Haan L, Li D. Approximations to the tail empirical distribution function with application to testing extreme value conditions. J Statist Plann Inference. 2006; 136: 3498-538.

[25] Koning AJ, Peng L. Goodness-of-fit tests for a heavy tailed distribution. J Statist Plann Inference. 2008; 138: 3960-81.

[26] Caeiro F, Gomes MI. Threshold selection in extreme value analysis. In: Dey D, Yan J (Eds). Extreme value modeling and risk analysis: methods and applications. ISBN 9781498701297. Chapman-Hall/CRC; 2015. p. 69-87.

[27] Reiss RD, Thomas M. Statistical analysis of extreme values with applications to insurance, finance, hydrology and other fields. 3rd ed. Basel, Boston, Berlin: Birkhäuser Verlag; 2007.

[28] Klein JP, Moeschberger S. Survival analysis: techniques for censored and truncated data. Berlin: Springer; 1997. doi: 10.1007/978-1-4757-2728-9.

[29] Lui KJ, Lawrence DN, Morgan WM, Peterman TA, Haverkos HH, Breakman DJ. A model-based approach for estimating the mean incubation period of transfusion-associated acquired immunodeficiency syndrome. Proc Nat Acad Sc. 1986; 83: 2913-7.

[30] Weissman I. Estimation of parameters and large quantiles based on the k largest observations. J Am Statist Assoc. 1978; 73: 812-15.

[31] Einmahl JHJ, de Haan L, Li D. Weighted approximations of tail copula processes with application to testing the bivariate extreme value condition. Ann Statist. 2006; 34: 1987-2014.

[32] Csörgő M, Révész P. Strong approximations in probability and statistics. Probability and Mathematical Statistics. New York, London: Academic Press [Harcourt Brace Jovanovich, Publishers]; 1981.

[33] Shorack GR, Wellner JA. Empirical processes with applications to statistics. New York: Wiley; 1986.

Acknowledgements

The authors are indebted to the reviewers for their pertinent remarks and valuable suggestions that led to a real improvement of the paper.

Corresponding author

Abdelhakim Necir can be contacted at: ah.necir@univ-biskra.dz
