Search results
1 – 10 of over 60,000
Minghu Ha, Witold Pedrycz, Jiqiang Chen and Lifang Zheng
Abstract
Purpose
The purpose of this paper is to introduce some basic knowledge of statistical learning theory (SLT) based on random set samples in set‐valued probability space for the first time, and to generalize the key theorem and the bounds on the rate of uniform convergence of learning theory due to Vapnik to a key theorem and bounds on the rate of uniform convergence for random sets in set‐valued probability space. SLT based on random samples formed in probability space is considered, at present, one of the fundamental theories of statistical learning from small samples. It has become a novel and important field of machine learning, along with other concepts and architectures such as neural networks. However, the theory hardly handles statistical learning problems involving random set samples.
Design/methodology/approach
Being motivated by some applications, in this paper a SLT is developed based on random set samples. First, a certain law of large numbers for random sets is proved. Second, the definitions of the distribution function and the expectation of random sets are introduced, and the concepts of the expected risk functional and the empirical risk functional are discussed. A notion of the strict consistency of the principle of empirical risk minimization is presented.
Findings
The paper formulates and proves the key theorem and presents the bounds on the rate of uniform convergence of learning theory based on random sets in set‐valued probability space, which become cornerstones of the theoretical fundamentals of the SLT for random set samples.
Originality/value
The paper provides a detailed analysis of some theoretical results of learning theory.
Minghu Ha, Jiqiang Chen, Witold Pedrycz and Lu Sun
Abstract
Purpose
Bounds on the rate of convergence of learning processes based on random samples and probability are one of the essential components of statistical learning theory (SLT). The constructive distribution‐independent bounds on generalization are the cornerstone of constructing support vector machines. Random sets and set‐valued probability are important extensions of random variables and probability, respectively. The paper aims to address these issues.
Design/methodology/approach
In this study, the bounds on the rate of convergence of learning processes based on random sets and set‐valued probability are discussed. First, the Hoeffding inequality is enhanced based on random sets, and then making use of the key theorem the non‐constructive distribution‐dependent bounds of learning machines based on random sets in set‐valued probability space are revisited. Second, some properties of random sets and set‐valued probability are discussed.
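The classical Hoeffding inequality that the paper enhances for random sets can be illustrated numerically. A minimal sketch, using plain real-valued Bernoulli samples rather than random sets (all values illustrative), checks the empirical deviation probability of a sample mean against the bound 2·exp(−2nt²):

```python
import math
import random

random.seed(0)
n, t, p, trials = 100, 0.1, 0.5, 5000

# Empirical probability that a Bernoulli(p) sample mean deviates from p by >= t
deviations = 0
for _ in range(trials):
    mean = sum(random.random() < p for _ in range(n)) / n
    deviations += abs(mean - p) >= t
empirical = deviations / trials

# Hoeffding bound for i.i.d. variables in [0, 1]: P(|mean - p| >= t) <= 2 exp(-2 n t^2)
bound = 2 * math.exp(-2 * n * t ** 2)
print(empirical, "<=", bound)
```

The empirical tail probability stays well below the distribution-free bound, which is the property that makes such inequalities usable as generalization bounds.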
Findings
In the sequel, the concepts of the annealed entropy, the growth function, and VC dimension of a set of random sets are presented. Finally, the paper establishes the VC dimension theory of SLT based on random sets and set‐valued probability, and then develops the constructive distribution‐independent bounds on the rate of uniform convergence of learning processes. It shows that such bounds are important to the analysis of the generalization abilities of learning machines.
Originality/value
SLT is considered at present one of the fundamental theories of statistical learning from small samples.
Abstract
Purpose
This paper aims to offer a tutorial/introduction to new statistics arising from the theory of optimal transport to empirical researchers in econometrics and machine learning.
Design/methodology/approach
The material is presented in a tutorial/survey lecture style to help practitioners absorb the theoretical background.
Findings
The tutorial survey of some main statistical tools (arising from optimal transport theory) should help practitioners to understand the theoretical background in order to conduct empirical research meaningfully.
Originality/value
This study is an original presentation useful for newcomers to the field.
Nguyen Thi Dinh, Nguyen Thi Uyen Nhi, Thanh Manh Le and Thanh The Van
Abstract
Purpose
The problem of image retrieval and image description exists in various fields. In this paper, a model of content-based image retrieval and image content extraction based on the KD-Tree structure was proposed.
Design/methodology/approach
A Random Forest structure was built to classify the objects in each image on the basis of the balanced multibranch KD-Tree structure. For that purpose, a KD-Tree structure was generated by the Random Forest to retrieve a set of similar images for an input image. A KD-Tree structure is then applied to determine relationship words at the leaves, extracting the relationships between objects in an input image. The content of an input image is described on the basis of class names and the relationships between objects.
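The paper's balanced multibranch KD-Tree is not specified here; as a point of reference, the standard binary KD-Tree with nearest-neighbour search over feature vectors, which the authors' structure extends, can be sketched as follows (all names illustrative):

```python
import math

def build_kdtree(points, depth=0):
    # Recursively build a binary KD-Tree: split on one axis per level,
    # placing the median point at the node.
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def nearest(node, target, best=None):
    # Depth-first search, pruning any subtree whose splitting plane is
    # farther away than the best distance found so far.
    if node is None:
        return best
    d = math.dist(node["point"], target)
    if best is None or d < best[1]:
        best = (node["point"], d)
    diff = target[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    if abs(diff) < best[1]:  # the far side may still hold a closer point
        best = nearest(far, target, best)
    return best

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest(tree, (9, 2))[0])  # -> (8, 1)
```

In a retrieval setting, each point would be an image feature vector, and the leaf reached by an input image determines the set of similar images returned.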
Findings
A model of image retrieval and image content extraction was proposed based on the proposed theoretical basis; simultaneously, the experiment was built on multi-object image datasets including Microsoft COCO and Flickr with an average image retrieval precision of 0.9028 and 0.9163, respectively. The experimental results were compared with those of other works on the same image dataset to demonstrate the effectiveness of the proposed method.
Originality/value
A balanced multibranch KD-Tree structure was built to apply to relationship classification on the basis of the original KD-Tree structure. Then, KD-Tree Random Forest was built to improve the classifier performance and retrieve a set of similar images for an input image. Concurrently, the image content was described in the process of combining class names and relationships between objects.
Abstract
Purpose
The purpose of this study is to obtain the modified maximum likelihood estimator of the stress–strength model using ranked set sampling, to obtain the asymptotic and bootstrap confidence intervals of P[Y < X], to compare the performance of the authors' estimates with the estimates under simple random sampling and to apply the estimates to head and neck cancer data.
Design/methodology/approach
The maximum likelihood estimator of R = P[Y < X], where X and Y are two independent inverse Weibull random variables with a common shape parameter (which governs the shape of the distribution) and different scale parameters (which govern its dispersion), is derived under ranked set sampling, together with the asymptotic and bootstrap confidence intervals. Monte Carlo simulation shows that this estimator performs better than the estimator under simple random sampling, and that the asymptotic and bootstrap confidence intervals under ranked set sampling are better than the corresponding interval estimators under simple random sampling. The application to head and neck cancer data shows, via the estimator of R = P[Y < X], that treatment with radiotherapy is more efficient than treatment with combined radiotherapy and chemotherapy; here, too, the ranked-set-sampling estimators outperform their simple-random-sampling counterparts.
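Under one common parameterization of the inverse Weibull distribution, F(x) = exp(−λx^(−α)), a common shape parameter α gives the closed form R = λ_X/(λ_X + λ_Y). A sketch (plain simple random sampling, not the ranked set sampling used in the paper; the parameterization is an assumption) checks a Monte Carlo estimate against this closed form:

```python
import math
import random

random.seed(1)

def inv_weibull(alpha, lam):
    # Inverse-transform sampling for F(x) = exp(-lam * x**(-alpha))
    u = random.random()
    return (-math.log(u) / lam) ** (-1.0 / alpha)

alpha, lam_x, lam_y = 2.0, 3.0, 1.0  # common shape, different scales
n = 100000
r_mc = sum(inv_weibull(alpha, lam_y) < inv_weibull(alpha, lam_x)
           for _ in range(n)) / n
r_exact = lam_x / (lam_x + lam_y)  # R = P[Y < X] under a common shape
print(round(r_mc, 3), r_exact)  # both near 0.75
```

The agreement of the two values is what a stress–strength study exploits: the better the sampling design, the tighter the estimator concentrates around R.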
Findings
The ranked set sampling is more effective than the simple random sampling for the inference of stress-strength model based on inverse Weibull distribution.
Originality/value
This study sheds light on stress–strength estimation by applying the authors' estimates to head and neck cancer data.
M'Hamed El-Louh, Mohammed El Allali and Fatima Ezzaki
Abstract
Purpose
In this work, the authors are interested in the notion of vector valued and set valued Pettis integrable pramarts. The notion of pramart is more general than that of martingale. Every martingale is a pramart, but the converse is not generally true.
Design/methodology/approach
In this work, the authors present several properties and convergence theorems for Pettis integrable pramarts with convex weakly compact values in a separable Banach space.
Findings
The existence of the conditional expectation of Pettis integrable multifunctions indexed by bounded stopping times is established. The authors prove the almost sure convergence, in the Mosco and linear topologies, of Pettis integrable pramarts with values in cwk(E), the family of convex weakly compact subsets of a separable Banach space E.
Originality/value
The paper presents new properties and various new convergence results for convex weakly compact valued Pettis integrable pramarts in Banach spaces.
Eun-Suk Yang, Jong Dae Kim, Chan-Young Park, Hye-Jeong Song and Yu-Seop Kim
Abstract
Purpose
In this paper, the problem of a nonlinear model – specifically the hidden unit conditional random fields (HUCRFs) model, which has binary stochastic hidden units between the data and the labels – exhibiting unstable performance depending on the hyperparameters under consideration is addressed.
Design/methodology/approach
There are three main search methods for hyperparameter tuning: manual search, grid search and random search. This study shows that HUCRFs' performance is unstable depending on the hyperparameter values used, and tunes that performance by drawing on grid and random searches. All experiments used n-gram features – specifically, unigrams, bigrams and trigrams.
Findings
Naturally, selecting a list of hyperparameter values based on a researcher's experience, so as to find a set in which the best performance is exhibited, is better than drawing them from a probability distribution. Realistically, however, it is infeasible to evaluate all combinations of hyperparameter values. The present research indicates that the random search method performs better than the grid search method while requiring shorter computation time and reduced cost.
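The advantage of random over grid search can be reproduced on a toy objective in which only one hyperparameter matters, the setting in which random search is known to shine; the objective and values below are illustrative, not the paper's HUCRF experiments:

```python
import random

random.seed(0)

def score(lr, reg):
    # Toy objective: only lr really matters; the best value is lr = 0.3
    return -(lr - 0.3) ** 2 - 0.001 * (reg - 0.5) ** 2

# Grid search: 9 trials, but only 3 distinct values of the important parameter
grid = [(lr, reg) for lr in (0.1, 0.5, 0.9) for reg in (0.1, 0.5, 0.9)]
best_grid = max(score(lr, reg) for lr, reg in grid)

# Random search: 9 trials, 9 distinct values of the important parameter
best_rand = max(score(random.random(), random.random()) for _ in range(9))
print(best_grid, best_rand)  # random search finds a better configuration
```

With the same budget of nine trials, random search explores nine distinct values of the influential parameter where grid search explores only three, which is the intuition behind its shorter time-to-good-configuration.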
Originality/value
In this paper, the issues affecting the performance of HUCRFs – nonlinear models whose performance varies depending on the hyperparameters but surpasses that of CRFs – have been examined.
Abstract
Purpose
Intends to address a fundamental problem in maintenance engineering: how should the shutdown of a production system be scheduled? In this regard, intends to investigate a way to predict the next system failure time based on the system's historical performance.
Design/methodology/approach
The GM(1,1) model from grey system theory and fuzzy set statistics methodologies are used.
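As a point of reference, the standard GM(1,1) procedure (accumulate the series, fit the whitening equation by least squares, forecast and de-accumulate) can be sketched as follows; this is the textbook form, not necessarily the exact variant used in the paper:

```python
import math

def gm11_forecast(x0, steps=1):
    # GM(1,1): accumulate the series, fit dx1/dt + a*x1 = b by least squares,
    # forecast on the accumulated scale and de-accumulate.
    n = len(x0)
    x1 = [sum(x0[:i + 1]) for i in range(n)]               # accumulated series
    z1 = [0.5 * (x1[i] + x1[i - 1]) for i in range(1, n)]  # background values
    # Least squares for [a, b] in x0[k] = -a*z1[k] + b, k = 1..n-1
    m, sz, sy = n - 1, sum(z1), sum(x0[1:])
    szz = sum(z * z for z in z1)
    szy = sum(z * y for z, y in zip(z1, x0[1:]))
    det = m * szz - sz * sz
    a = (sz * sy - m * szy) / det
    b = (szz * sy - sz * szy) / det
    # Time-response function on the accumulated scale
    x1_hat = lambda k: (x0[0] - b / a) * math.exp(-a * k) + b / a
    return [x1_hat(n + i) - x1_hat(n + i - 1) for i in range(steps)]

print(gm11_forecast([10, 11, 12.1, 13.31]))  # roughly continues the growth trend
```

Applied to a sequence of observed inter-failure times, the returned value serves as the predicted next failure time for shutdown scheduling.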
Findings
It was found that the system's next unexpected failure time can be predicted by the grey system theory model as well as by the fuzzy set statistics methodology. In particular, grey modelling is more direct and mathematically less complicated.
Research implications
Many maintenance models have been developed, but most of them seek optimality from the viewpoint of probability theory. A new filtering theory based on grey system theory is introduced so that any actual system functioning (failure) time can be effectively partitioned into system characteristic functioning times and repair improvement (damage) times.
Practical implications
In today's highly competitive business world, effectively predicting a production system's next failure time can guarantee the quality of the product and safely secure on-schedule delivery under contract. The grey filters effectively predict the next system failure time as a function of the production system's chronological time; the system's near-future behaviour is clearly shown, so that management can utilize this state information for production and maintenance planning.
Originality/value
Provides a viewpoint on system failure‐repair predictions.
Chon Van Le and Uyen Hoang Pham
Abstract
Purpose
This paper aims mainly at introducing applied statisticians and econometricians to the current research methodology with non-Euclidean data sets. Specifically, it provides the basis and rationale for statistics in Wasserstein space, where the metric on probability measures is taken as a Wasserstein metric arising from optimal transport theory.
Design/methodology/approach
The authors spell out the basis and rationale for using Wasserstein metrics on the data space of (random) probability measures.
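In one dimension, the Wasserstein-1 metric between two equal-size empirical measures reduces to matching sorted samples (the monotone optimal coupling), which gives a concrete feel for the metric; a minimal illustrative sketch:

```python
def wasserstein1(xs, ys):
    # 1-D W1 between equal-size empirical samples: the optimal transport plan
    # simply matches points in sorted order (the monotone coupling).
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

print(wasserstein1([0, 1, 2], [1, 2, 3]))  # -> 1.0 (a unit shift costs 1)
```

Unlike metrics based on density comparison, this distance reflects how far mass must move, which is why it behaves well on data spaces of probability measures.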
Findings
In elaborating the new statistical analysis of non-Euclidean data sets, the paper illustrates the generalization of traditional aspects of statistical inference following Fréchet's program.
Originality/value
Besides the elaboration of research methodology for a new data analysis, the paper discusses the applications of Wasserstein metrics to the robustness of financial risk measures.