Search results
Abstract
Nearest neighbor imputation has a long tradition for handling item nonresponse in survey sampling. In this article, we study the asymptotic properties of the nearest neighbor imputation estimator for general population parameters, including population means, proportions and quantiles. For variance estimation, we propose novel replication variance estimation, which is asymptotically valid and straightforward to implement. The main idea is to construct replicates of the estimator directly based on its asymptotically linear terms, instead of individual records of variables. The simulation results show that nearest neighbor imputation and the proposed variance estimation provide valid inferences for general population parameters.
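As a rough illustration of the basic idea behind nearest neighbor imputation, the sketch below fills each missing response with the response of the closest respondent on a single auxiliary variable and then averages the completed data. The function names `nn_impute` and `imputed_mean` are hypothetical, the mean is unweighted (a real survey estimator would normally use design weights), and the replication variance estimator proposed in the article is not reproduced here.

```python
def nn_impute(x, y):
    """Replace each missing y (None) with the y of the respondent whose
    auxiliary value x is closest. Illustrative sketch only."""
    donors = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    if not donors:
        raise ValueError("at least one respondent is required")
    filled = []
    for xi, yi in zip(x, y):
        if yi is None:
            # Nearest donor by absolute difference in the auxiliary variable.
            _, yi = min(donors, key=lambda d: abs(d[0] - xi))
        filled.append(yi)
    return filled


def imputed_mean(x, y):
    """Unweighted mean of the completed data (assumes equal weights)."""
    filled = nn_impute(x, y)
    return sum(filled) / len(filled)
```

For example, with `x = [1.0, 2.2, 3.0, 10.0]` and `y = [10.0, None, 30.0, None]`, both missing values borrow the response of the unit with `x = 3.0`.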
Abstract
All the modified Nearest Neighbour methods of pattern classification [2–6] developed to reduce the amount of computer storage and time needed for the implementation of a NN classifier require prohibitively costly data preprocessing, which involves detailed examination of the neighbouring points to the elements of the reference set. In this paper a method for determining k‐nearest neighbours to a given point is described. The method uses the computationally efficient city block distance to select candidate points for the set of k‐nearest neighbours. In this way the preprocessing time is considerably reduced.
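One way a city block prefilter can work is sketched below. It is not the paper's algorithm, whose details the abstract does not give; it relies only on the standard inequality that Euclidean distance is at most the city block distance, which in turn is at most sqrt(d) times the Euclidean distance in d dimensions. The function name `knn_cityblock_prefilter` is hypothetical.

```python
import math


def knn_cityblock_prefilter(points, query, k):
    """Return indices of the k Euclidean nearest neighbours of `query`,
    using city block (L1) distance to shortlist candidates first."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    dim = len(query)
    # Cheap pass: L1 distance needs no squares or square roots.
    l1 = [sum(abs(a - b) for a, b in zip(p, query)) for p in points]
    # r is the k-th smallest L1 distance, so at least k points lie within
    # Euclidean distance r of the query (because L2 <= L1).
    r = sorted(l1)[k - 1]
    # Any point within Euclidean distance r has L1 <= sqrt(dim) * r.
    bound = math.sqrt(dim) * r
    candidates = [i for i, d in enumerate(l1) if d <= bound]
    # Exact pass: Euclidean distance on the shortlist only.
    candidates.sort(key=lambda i: math.dist(points[i], query))
    return candidates[:k]
```

The exact Euclidean distance is computed only for the shortlisted points, which is where any saving would come from when the shortlist is much smaller than the reference set.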
Dominique Guégan and Patrick Rakotomarolahy
Abstract
Purpose – The purpose of this chapter is twofold: to forecast gross domestic product (GDP) using a nonparametric method, known as the multivariate k-nearest neighbors method, and to provide asymptotic properties for this method.
Methodology/approach – We consider monthly and quarterly macroeconomic variables. To match the quarterly GDP, we estimate the missing monthly economic variables using the multivariate k-nearest neighbors method and parametric vector autoregressive (VAR) modeling. Then, linking these monthly macroeconomic variables through bridge equations, we produce nowcasts and forecasts of GDP.
Findings – Using the multivariate k-nearest neighbors method, we provide a forecast of the euro area monthly economic indicator and quarterly GDP, which is better than that obtained with a competing linear VAR model. We also establish the asymptotic normality of this k-nearest neighbors regression estimator for dependent time series, as well as a confidence interval for point forecasts in time series.
Originality/value of chapter – We provide a new theoretical result for nonparametric method and propose a novel methodology for forecasting using macroeconomic data.
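The general flavor of k-nearest neighbors forecasting can be sketched as follows. This is a simplified univariate version under stated assumptions, not the chapter's multivariate estimator: the window length `m`, the neighbor count `k`, the use of Euclidean distance, and the function name `knn_forecast` are all illustrative choices.

```python
import math


def knn_forecast(series, m=2, k=2):
    """One-step-ahead forecast: find the k past windows of length m most
    similar to the latest window and average the values that followed them."""
    if len(series) - m < k:
        raise ValueError("series too short for the chosen m and k")
    target = series[-m:]
    # Every past window of length m that has a known successor.
    history = [(series[i:i + m], series[i + m])
               for i in range(len(series) - m)]
    nearest = sorted(history, key=lambda h: math.dist(h[0], target))[:k]
    return sum(successor for _, successor in nearest) / k
```

On a perfectly periodic series such as `[1, 2, 3, 1, 2, 3, 1, 2]`, the two closest past windows both equal `[1, 2]` and were both followed by `3`, so the forecast is `3.0`.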
Abstract
Purpose
With the ever‐increasing volume of text data via the internet, it is important that documents are classified as manageable and easy to understand categories. This paper proposes the use of binary k‐nearest neighbour (BKNN) for text categorization.
Design/methodology/approach
The paper describes the traditional k‐nearest neighbor (KNN) classifier, introduces BKNN and outlines experimental results.
Findings
The experimental results indicate that BKNN requires much less CPU time than KNN, without loss of classification performance.
Originality/value
The paper demonstrates how BKNN can be an efficient and effective algorithm for text categorization.
Abstract
Purpose
This study aims to propose and evaluate a searching scheme for a bichromatic reverse k-nearest neighbor (BRkNN) that has objects and queries in spatial networks. In this proposed scheme, the authors search for the BRkNN of the query using an influence zone for each object with a network Voronoi diagram (NVD).
Design/methodology/approach
The authors analyze and evaluate the performance of the proposed searching scheme.
Findings
The contribution of this paper is that it confirmed that the proposed searching scheme gives shorter processing time than the conventional linear search.
Research limitations/implications
A future direction of this study will involve making a searching scheme that reduces the processing time when objects move automatically on spatial networks.
Practical implications
As a BRkNN example, consider convenience stores belonging to two groups, A and B, that operate in a given region. The authors can use RNN (RkNN with k = 1) effectively to site a new store, considering the Euclidean and road distances among stores and the locational relationship between Groups A and B.
Originality/value
In the proposed searching scheme, the authors search for the BRkNN of the query for each object with an NVD using the influence zone, which is the region where an object in the spatial network recognizes the nearest neighbor for the query.
William McCluskey and Sarabjot Anand
Abstract
Hybrid systems as the next generation of intelligent applications within the field of mass appraisal and valuation are investigated. Motivated by the obvious limitations of paradigms that are being used in isolation or as stand‐alone techniques such as multiple regression analysis, artificial neural networks and expert systems. Clearly, there are distinct advantages in integrating two or more information processing systems that would address some of the discrete problems of individual techniques. Examines first, the strategic development of mass appraisal approaches which have traditionally been based on “stand‐alone” techniques; second, the potential application of an intelligent hybrid system. Highlights possible solutions by investigating various hybrid systems that may be developed incorporating a nearest neighbour algorithm (k‐NN). The enhancements are aimed at two major deficiencies in traditional distance metrics; user dependence for attribute weights and biases in the distance metric towards matching categorical variables in the retrieval of neighbours. Solutions include statistical techniques: mean, coefficient of variation and significant mean. Data mining paradigms based on a loosely coupled neural network or alternatively a tight coupling with genetic algorithms are used to discover attribute weights. The hybrid architectures developed are applied to a property data set and their performance measured based on their predictive value as well as perspicuity. Concludes by considering the application and the relevance of these techniques within the field of computer assisted mass appraisal.
Wei Zhang, Xianghong Hua, Kegen Yu, Weining Qiu, Shoujian Zhang and Xiaoxing He
Abstract
Purpose
This paper aims to introduce the weighted squared Euclidean distance between points in signal space to improve the performance of Wi-Fi indoor positioning. Received signal strength-based Wi-Fi indoor positioning, a low-cost indoor positioning approach, has attracted significant attention from both academia and industry.
Design/methodology/approach
The local principal gradient direction is introduced and used to define the weighting function, and an averaging algorithm based on the k-means algorithm is used to estimate the local principal gradient direction of each access point. Then, correlation distance is used in the new method to find the k nearest calibration points. The weighted squared Euclidean distance between the nearest calibration point and the target point is calculated and used to estimate the position of the target point.
Findings
Experiments are conducted and the results indicate that the proposed Wi-Fi indoor positioning approach considerably outperforms the weighted k nearest neighbor method. The new method also outperforms support vector regression and extreme learning machine algorithms in the absence of sufficient fingerprints.
Research limitations/implications
Weighted k nearest neighbor approach, support vector regression algorithm and extreme learning machine algorithm are the three classic strategies for location determination using Wi-Fi fingerprinting. However, weighted k nearest neighbor suffers from dramatic performance degradation in the presence of multipath signal attenuation and environmental changes. More fingerprints are required for support vector regression algorithm to ensure the desirable performance; and labeling Wi-Fi fingerprints is labor-intensive. The performance of extreme learning machine algorithm may not be stable.
Practical implications
The new weighted squared Euclidean distance-based Wi-Fi indoor positioning strategy can improve the performance of Wi-Fi indoor positioning system.
Social implications
The received signal strength-based effective Wi-Fi indoor positioning system can substitute for global positioning system that does not work indoors. This effective and low-cost positioning approach would be promising for many indoor-based location services.
Originality/value
A novel Wi-Fi indoor positioning strategy based on the weighted squared Euclidean distance is proposed in this paper to improve the performance of the Wi-Fi indoor positioning, and the local principal gradient direction is introduced and used to define the weighting function.
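For orientation, the weighted k nearest neighbor baseline that the abstract compares against is commonly implemented along the following lines. This is a generic sketch, not the paper's proposed method: the inverse-distance weights, the plain Euclidean distance in signal space, the `eps` guard and the function name `wknn_position` are assumptions, and the paper's gradient-based weighting function is not reproduced.

```python
import math


def wknn_position(fingerprints, rss, k=3, eps=1e-9):
    """Estimate (x, y) from a received signal strength vector `rss`.

    fingerprints: list of (rss_vector, (x, y)) calibration points.
    The k calibration points closest in signal space are averaged,
    weighted by inverse signal-space distance (an assumed weighting).
    """
    if not 1 <= k <= len(fingerprints):
        raise ValueError("k must be between 1 and len(fingerprints)")
    ranked = sorted(fingerprints, key=lambda f: math.dist(f[0], rss))[:k]
    # eps avoids division by zero when rss matches a fingerprint exactly.
    weights = [1.0 / (math.dist(f[0], rss) + eps) for f in ranked]
    total = sum(weights)
    x = sum(w * f[1][0] for w, f in zip(weights, ranked)) / total
    y = sum(w * f[1][1] for w, f in zip(weights, ranked)) / total
    return x, y
```

A measurement that is equally distant from two calibration points in signal space is placed midway between them, which shows why the estimate depends heavily on how distance in signal space is defined.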
Song Zhang, Cong Li, Li Ma and Qi Li
Abstract
Purpose
The purpose of this paper is to introduce an improved nearest‐neighbor collaborative filtering algorithm based on rough set theory to alleviate the sparsity problem of collaborative filtering. The new algorithm is then evaluated experimentally.
Design/methodology/approach
The nearest‐neighbor algorithm is the earliest proposed and the main collaborative filtering recommendation algorithm, and its recommendation quality is seriously affected by the sparsity of user ratings. By using rough set theory, the nearest‐neighbor collaborative filtering algorithm can be improved when the rating data are sparse. The union of user rating items is used as the basis for computing similarity among users, and a rating prediction method based on rough set theory is then proposed to estimate missing values in the union of user rating items, thereby decreasing sparsity.
Findings
The sparsity problem of collaborative filtering can be alleviated by using the union of user rating items and estimating missing values based on rough set theory. The experimental results show that the new algorithm can efficiently improve recommendation quality of collaborative filtering.
Originality/value
The union of user rating items was used as the basis for computing similarity among users. A rating prediction method based on rough set theory, with an assistant method, was proposed to complete the missing values in the union of user rating items. An orthogonal list was used to store the user‐item ratings matrix.
Wei Lu, Heng Ding and Jiepu Jiang
Abstract
Purpose
The purpose of this paper is to utilize document expansion techniques for improving image representation and retrieval. This paper proposes a concise framework for tag-based image retrieval (TBIR).
Design/methodology/approach
The proposed approach includes three core components: a strategy of selecting expansion (similar) images from the whole corpus (e.g. cluster-based or nearest neighbor-based); a technique for assessing image similarity, which is adopted for selecting expansion images (text, image, or mixed); and a model for matching the expanded image representation with the search query (merging or separate).
Findings
The results show that the proposed method yields significant improvements in effectiveness: it performs better at the top of the ranking and greatly improves some topics that score zero in the baseline. Moreover, the nearest neighbor-based expansion strategy outperforms the cluster-based expansion strategy, using image features to select expansion images is better than using text features in most cases, and the separate method for calculating the augmented probability P(q|RD) is able to remove the negative influence of erroneous images in RD.
Research limitations/implications
Although these methods outperform the baseline only at the top of the ranking rather than across the entire ranked list, TBIR on mobile platforms can still benefit from this approach.
Originality/value
Unlike former studies addressing the sparsity, vocabulary mismatch, and tag relatedness in TBIR individually, the approach proposed by this paper addresses all these issues with a single document expansion framework. It is a comprehensive investigation of document expansion techniques in TBIR.
Gebeyehu Belay Gebremeskel, Chai Yi, Zhongshi He and Dawit Haile
Abstract
Purpose
Among the growing number of data mining (DM) techniques, outlier detection has gained importance in many applications and has attracted much attention in recent times. In the past, research papers on outlier detection in safety care treated the task as searching for needles in a haystack. However, outliers are not always erroneous. Therefore, the purpose of this paper is to investigate the role of outliers in healthcare services in general and in patient safety care in particular.
Design/methodology/approach
The approach is a combined DM technique (clustering and nearest neighbor) for outlier detection, which provides a clear understanding of, and meaningful insight into, data behavior for healthcare safety. The outcomes, or the implicit knowledge, are essential to a proper clinical decision-making process. The method is important semantically, and the novel treatment of patients' events and situations is shown to play a significant role in patient care safety and medication.
Findings
The paper presents a novel, integrated methodology that can be applied to the analysis of different kinds of biological data. It combines DM techniques to optimize performance in the field of health and medical science. The integrated outlier detection method can be extended to search for valuable information and implicit knowledge based on selected patient factors. On this basis, outliers are detected as clusters and point events, and novel ideas are proposed to strengthen clinical services with regard to customer satisfaction. The work can also serve as a baseline for further strategic development and research in healthcare.
Research limitations/implications
This paper focused mainly on outlier detection. Outlier isolation, which is essential for investigating why an outlier occurred and for communicating how to mitigate it, was not addressed. The research can therefore be extended to the hierarchy of patient problems.
Originality/value
DM is a dynamic and successful gateway for discovering useful knowledge to enhance healthcare performance and patient safety. Outlier detection based on clinical data is a basic task in achieving healthcare strategy. In this paper, the authors therefore focused on combined DM techniques for a deep analysis of clinical data, which supports an optimal level of clinical decision making. Proper clinical decisions can be reached through attribute selection, which is important for identifying the influential factors or parameters of healthcare services. Integrated clustering and nearest neighbor techniques thus give a more acceptable search for outliers in such complex data, which could be fundamental to further analysis of healthcare and patient safety situations.