Search results

1 – 10 of over 9000
Article
Publication date: 19 June 2009

Imam Machdi, Toshiyuki Amagasa and Hiroyuki Kitagawa

Abstract

Purpose

The purpose of this paper is to propose Extensible Markup Language (XML) data partitioning schemes that can cope with static and dynamic allocation for parallel holistic twig joins: grid metadata model for XML (GMX) and streams‐based partitioning method for XML (SPX).

Design/methodology/approach

GMX exploits the relationships between XML documents and query patterns to perform workload-aware partitioning of XML data. Specifically, the paper constructs a two-dimensional model with a document dimension and a query dimension, in which each object in a dimension is composed of XML metadata related to that dimension. GMX provides a set of XML data partitioning methods that include document clustering, query clustering, document-based refinement, query-based refinement, and query-path refinement, thereby enabling XML data partitioning based on the static information of XML metadata. In contrast, SPX explores the structural relationships of query elements and a range-containment property of XML streams to generate partitions and allocate them to cluster nodes on-the-fly.
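As a rough illustration of the kind of static, workload-aware assignment GMX aims at (this is not the paper's actual algorithm), the following Python sketch greedily assigns documents with estimated query-processing costs to the least-loaded cluster node; the cost values and function names are hypothetical.

    import heapq

    def partition_documents(doc_costs, n_nodes):
        """Greedy longest-processing-time assignment.
        doc_costs maps document id -> estimated query-processing cost."""
        heap = [(0.0, node) for node in range(n_nodes)]  # (accumulated cost, node)
        heapq.heapify(heap)
        assignment = {node: [] for node in range(n_nodes)}
        for doc, cost in sorted(doc_costs.items(), key=lambda kv: -kv[1]):
            load, node = heapq.heappop(heap)             # least-loaded node so far
            assignment[node].append(doc)
            heapq.heappush(heap, (load + cost, node))
        return assignment

    # Example: four documents balanced across two cluster nodes.
    print(partition_documents({"d1": 5.0, "d2": 3.0, "d3": 2.0, "d4": 4.0}, 2))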

Findings

GMX provides several salient features: a set of partition granularities that statically balance query processing workloads among cluster nodes; inter-query as well as intra-query parallelism at multiple extents; and better parallel query performance when all estimated queries are executed simultaneously in proportion to their probability of occurrence in the system. SPX also offers the following features: minimal computation time to generate partitions; dynamic balancing of skewed workloads on the system; higher intra-query parallelism; and better parallel query performance.

Research limitations/implications

The proposed XML data partitioning schemes do not currently take XML data updates into account, e.g. new XML documents and query pattern changes submitted by users of the system.

Practical implications

Note that the effectiveness of the XML data partitioning schemes relies mainly on the accuracy of the cost model used to estimate query processing costs. The cost model must be adjusted to reflect the characteristics of the system platform used in the implementation.

Originality/value

This paper proposes novel schemes of conducting XML data partitioning to achieve both static and dynamic workload balance.

Details

International Journal of Web Information Systems, vol. 5 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 1 August 1997

A. Macfarlane, S.E. Robertson and J.A. McCann

Abstract

The progress of parallel computing in Information Retrieval (IR) is reviewed. In particular we stress the importance of the motivation in using parallel computing for text retrieval. We analyse parallel IR systems using a classification defined by Rasmussen and describe some parallel IR systems. We give a description of the retrieval models used in parallel information processing. We describe areas of research which we believe are needed.

Details

Journal of Documentation, vol. 53 no. 3
Type: Research Article
ISSN: 0022-0418

Open Access
Article
Publication date: 7 July 2022

Sirilak Ketchaya and Apisit Rattanatranurak

Abstract

Purpose

Sorting is a very important operation for solving problems in computer science. The most well-known divide-and-conquer sorting algorithm is quicksort, which starts by dividing the data into subarrays and finally sorts them.

Design/methodology/approach

In this paper, the algorithm named Dual Parallel Partition Sorting (DPPSort) is analyzed and optimized. It is built around a partitioning algorithm named Dual Parallel Partition (DPPartition), which is analyzed and optimized in this paper and combined with the standard sorting functions qsort and STLSort, implementations of the quicksort and introsort algorithms, respectively. The algorithm runs on any shared-memory/multicore system. The OpenMP library, which supports multiprocessing programming and is compatible with the C/C++ standard library functions, is used. The authors’ algorithm recursively divides an unsorted array into two equal halves in parallel using Lomuto's partitioning and merges them without compare-and-swap instructions. Then, qsort/STLSort is executed in parallel once each subarray is smaller than the sorting cutoff.
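For orientation, the following serial Python sketch shows the general divide-at-a-pivot / sort-below-a-cutoff structure that DPPSort builds on; it uses Lomuto's partitioning and a hypothetical cutoff value, and it does not reproduce the authors' OpenMP-parallel DPPartition or its merge without compare-and-swap instructions.

    CUTOFF = 32  # hypothetical sorting cutoff

    def lomuto_partition(a, lo, hi):
        """Lomuto partitioning: move a[hi] to its final position and return that index."""
        pivot, i = a[hi], lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        return i

    def partition_sort(a, lo=0, hi=None):
        if hi is None:
            hi = len(a) - 1
        if hi - lo + 1 <= CUTOFF:
            a[lo:hi + 1] = sorted(a[lo:hi + 1])  # stand-in for qsort/STLSort
            return
        p = lomuto_partition(a, lo, hi)
        partition_sort(a, lo, p - 1)   # in DPPSort the two halves are handled in parallel
        partition_sort(a, p + 1, hi)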

Findings

In the authors’ experiments, a 4-core Intel i7-6770 system running Ubuntu Linux is used. DPPSort is up to 6.82× faster than qsort and up to 5.88× faster than STLSort on Uint64 random distributions.

Originality/value

The authors improve the performance of the parallel sorting algorithm by reducing the compare-and-swap instructions in the algorithm. This concept can be applied to related problems to increase the speedup of other algorithms.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 12 June 2017

Shabia Shabir Khan and S.M.K. Quadri

Abstract

Purpose

As far as the treatment of the most complex design issues is concerned, approaches based on classical artificial intelligence are inferior to those based on computational intelligence, particularly when dealing with vagueness, multi-objectivity and a large number of possible solutions. In practical applications, computational techniques have given the best results, and research in this field is continuously growing. The purpose of this paper is to search for a general and effective intelligent tool for the prediction of patient survival after surgery. The present study constructs such intelligent computational models using different configurations, including data partitioning techniques, and evaluates them experimentally on a realistic medical data set for the prediction of survival in pancreatic cancer patients.

Design/methodology/approach

On the basis of experiments and research performed on data from various fields using different intelligent tools, the authors infer that combining or integrating the qualitative aspects of a fuzzy inference system with the quantitative aspects of an artificial neural network can yield an efficient and better prediction model. The authors constructed three soft computing-based adaptive neuro-fuzzy inference system (ANFIS) models with different configurations and data partitioning techniques, with the aim of finding capable predictive tools that can deal with nonlinear and complex data. After evaluating the models over three shuffles of data (training set, test set and full set), the performances were compared in order to find the best design for the prediction of patient survival after surgery. The models were constructed and implemented using the MATLAB simulator.
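As background for the FCM-partitioned configuration mentioned in the Findings below, this is a minimal NumPy sketch of the standard fuzzy C-means updates used to partition a data set into overlapping clusters; it is not the authors' MATLAB/ANFIS pipeline, and the cluster count and fuzzifier are placeholder values.

    import numpy as np

    def fuzzy_c_means(X, n_clusters=3, m=2.0, n_iter=100, seed=0):
        """Standard FCM: returns cluster centers and a membership matrix U
        (n_clusters x n_samples) whose columns sum to 1."""
        rng = np.random.default_rng(seed)
        U = rng.random((n_clusters, X.shape[0]))
        U /= U.sum(axis=0, keepdims=True)                 # normalize memberships
        for _ in range(n_iter):
            Um = U ** m
            centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
            dist = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2) + 1e-12
            inv = dist ** (-2.0 / (m - 1.0))
            U = inv / inv.sum(axis=0, keepdims=True)      # membership update
        return centers, U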

Findings

On applying the hybrid intelligent neuro-fuzzy models with different configurations, the authors found an advantage in predicting the survival of patients with pancreatic cancer. Experimental results and a comparison between the constructed models show that the ANFIS model with Fuzzy C-means (FCM) partitioning provides the best accuracy in predicting the class, with the lowest mean square error (MSE). Apart from the MSE, the other evaluation measures for FCM partitioning also prove better than those of the rest of the models. The results therefore demonstrate that the model can be applied in other biomedicine and engineering fields dealing with complex issues related to imprecision and uncertainty.

Originality/value

The originality of the paper includes a framework showing a two-way flow for fuzzy system construction, which the authors further use in designing the three simulation models with different configurations, including the partitioning methods, for the prediction of patient survival after surgery. Several experiments were carried out using different shuffles of the data to validate the parameters of the models. The performances of the models were compared using various evaluation measures such as the MSE.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 10 no. 2
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 22 June 2010

Imam Machdi, Toshiyuki Amagasa and Hiroyuki Kitagawa

Abstract

Purpose

The purpose of this paper is to propose general parallelism techniques for holistic twig join algorithms to process queries against Extensible Markup Language (XML) databases on a multi‐core system.

Design/methodology/approach

The parallelism techniques comprised data and task parallelism. As for data parallelism, the paper adopted stream-based partitioning for XML to partition XML data as the basis of parallelism on multiple CPU cores. The XML data partitioning was performed at two levels. The first level created buckets to establish data independence and balance loads among CPU cores; each bucket was assigned to a CPU core. Within each bucket, the second level of XML data partitioning created finer partitions to provide finer parallelism. Each CPU core performed the holistic twig join algorithm on each of its finer partitions in parallel with the other CPU cores. In task parallelism, the holistic twig join algorithm was decomposed into two main tasks, which were pipelined to create parallelism. The first task adopted the data parallelism technique and its outputs were transferred to the second task periodically. Since data transfers incur overheads, the size of each data transfer needed to be estimated cautiously to achieve optimal performance.
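A loose structural sketch of this two-level partitioning in Python might look as follows; twig_join is a hypothetical placeholder for the per-partition holistic twig join, the bucket and partition sizes are arbitrary, and process-based workers stand in for the paper's multi-core threads.

    from concurrent.futures import ProcessPoolExecutor
    from itertools import chain

    def make_buckets(stream_nodes, n_cores):
        """Level 1: distribute stream nodes into one bucket per CPU core."""
        buckets = [[] for _ in range(n_cores)]
        for i, node in enumerate(stream_nodes):
            buckets[i % n_cores].append(node)
        return buckets

    def finer_partitions(bucket, size=64):
        """Level 2: split each bucket into finer partitions."""
        return [bucket[i:i + size] for i in range(0, len(bucket), size)]

    def twig_join(partition):
        return partition  # placeholder for the holistic twig join on one partition

    def process_bucket(bucket):
        return list(chain.from_iterable(twig_join(p) for p in finer_partitions(bucket)))

    if __name__ == "__main__":
        data = list(range(1000))
        with ProcessPoolExecutor(max_workers=4) as pool:
            results = list(pool.map(process_bucket, make_buckets(data, 4)))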

Findings

The data and task parallelism techniques contribute to good performance, especially for queries having complex structures and/or higher query selectivity. The performance of data parallelism can be further improved by task parallelism. Significant performance improvement is attained for queries with higher selectivity because more of the outputs computed by the second task are produced in parallel with the first task.

Research limitations/implications

The proposed parallelism techniques primarily deal with executing a single long-running query for intra-query parallelism, partitioning XML data on-the-fly, and allocating partitions to CPU cores statically. It is assumed that no dynamic XML data updates occur during parallel execution.

Practical implications

The effectiveness of the proposed parallel holistic twig joins relies fundamentally on some system parameter values that can be obtained from a benchmark of the system platform.

Originality/value

The paper proposes novel techniques that increase parallelism by combining data and task parallelism to achieve high performance. To the best of the authors' knowledge, this is the first work to parallelize holistic twig join algorithms on a multi-core system.

Details

International Journal of Web Information Systems, vol. 6 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 5 October 2012

Burcu Tunga and Metin Demiralp

Abstract

Purpose

The plain High Dimensional Model Representation (HDMR) method needs Dirac delta type weights to partition a given multivariate data set for modelling an interpolation problem. A Dirac delta type weight imposes a different importance level on each node of the set during the partitioning procedure, which directly affects the performance of HDMR. The purpose of this paper is to develop a new method, based on the fluctuation free integration and HDMR methods, to obtain the optimized weight factors needed for identifying these importance levels in the multivariate data partitioning and modelling procedure.

Design/methodology/approach

A common problem in multivariate interpolation, where the sought function values are given at the nodes of a rectangular prismatic grid, is to determine an analytical structure for the function under consideration. As the multivariance of an interpolation problem increases, standard numerical methods become incomplete and computer-based applications run into memory limitations. To overcome these multivariance problems, it is better to deal with less-variate structures. HDMR methods, which are based on a divide-and-conquer philosophy, can be used for this purpose. This corresponds to multivariate data partitioning in which at most the univariate components of the plain HDMR are taken into consideration. Obtaining these components requires a number of integrals to be evaluated, and the Fluctuation Free Integration method is used to obtain the results of these integrals. This new form of HDMR, integrated with Fluctuation Free Integration, also allows the Dirac delta type weights used in multivariate data partitioning to be discarded and the weight factors corresponding to the importance level of each node of the given set to be optimized.
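In the standard notation (not reproduced from the paper itself), the HDMR expansion of a multivariate function and its truncation at the univariate components referred to above are

    f(x_1,\ldots,x_N) = f_0 + \sum_{i=1}^{N} f_i(x_i)
        + \sum_{1 \le i < j \le N} f_{ij}(x_i, x_j) + \cdots + f_{12\ldots N}(x_1,\ldots,x_N),

    f(x_1,\ldots,x_N) \approx f_0 + \sum_{i=1}^{N} f_i(x_i),

where f_0 is a constant term, the f_i are univariate components, and the higher-order terms are discarded in the multivariate data partitioning step.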

Findings

The method developed in this study is applied to six numerical examples with different structures, and very encouraging results were obtained. In addition, the new method is compared with other methods that use a Dirac delta type weight function, and the obtained results are given in the numerical implementations section.

Originality/value

The authors' new method allows an optimized weight structure to be determined for the given modelling problem, instead of imposing a certain weight function such as the Dirac delta type weight. This gives the HDMR philosophy the flexibility of weight utilization in multivariate data modelling problems.

Details

Engineering Computations, vol. 29 no. 7
Type: Research Article
ISSN: 0264-4401

Book part
Publication date: 17 November 2010

Gregory E. Smith and Cliff T. Ragsdale

Abstract

Several prominent data-mining studies have evaluated the performance of neural networks (NNs) against traditional statistical methods on the two-group classification problem in discriminant analysis. Although NNs often outperform traditional statistical methods, their performance can be hindered because of failings in the use of training data. This problem is particularly acute when using NNs on smaller data sets. A heuristic is presented that utilizes Mahalanobis distance measures (MDM) to deterministically partition training data so that the resulting NN models are less prone to overfitting. We show this heuristic produces classification results that are more accurate, on average, than traditional NNs and MDM.
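As a hedged illustration of how Mahalanobis distance measures can be used to partition training data deterministically (the chapter's actual heuristic is not reproduced here), the following NumPy sketch keeps, for each class, the fraction of points closest to that class's centroid; the fraction and function names are illustrative.

    import numpy as np

    def mahalanobis_distances(X, group):
        """Distance of every row of X from the centroid of the `group` rows."""
        mu = group.mean(axis=0)
        cov_inv = np.linalg.pinv(np.cov(group, rowvar=False))
        diff = X - mu
        return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

    def split_by_distance(X, y, frac=0.8):
        """Deterministic split: keep the `frac` closest points per class for training."""
        train_idx = []
        for label in np.unique(y):
            idx = np.where(y == label)[0]
            d = mahalanobis_distances(X[idx], X[idx])
            train_idx.extend(idx[np.argsort(d)[: int(frac * len(idx))]].tolist())
        train_idx = np.array(sorted(train_idx))
        val_idx = np.setdiff1d(np.arange(len(y)), train_idx)
        return train_idx, val_idx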

Details

Advances in Business and Management Forecasting
Type: Book
ISBN: 978-0-85724-201-3

Article
Publication date: 27 September 2019

Giuseppe Orlando, Rosa Maria Mininni and Michele Bufalo

Abstract

Purpose

The purpose of this study is to suggest a new framework, called the CIR#, which allows forecasting interest rates from observed financial market data even when rates are negative. In doing so, the objective is to maintain the market volatility structure as well as the analytical tractability of the original CIR model.

Design/methodology/approach

The novelty of the proposed methodology consists in using the CIR model to forecast the evolution of interest rates through an appropriate partitioning of the data sample and calibration. The latter is performed by replacing the standard Brownian motion process in the random term of the model with the normally distributed standardized residuals of the “optimal” autoregressive integrated moving average (ARIMA) model.
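A rough Python sketch of this idea, under stated assumptions (placeholder CIR parameters, a placeholder ARIMA order, a simple positivity shift and a basic Euler-type discretization rather than the paper's calibration), could look like this:

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    def cir_path_from_residuals(rates, k=0.5, theta=None, sigma=0.05, dt=1/12,
                                shift=0.02, order=(1, 0, 1)):
        """Drive a discretized CIR step with standardized ARIMA residuals
        instead of Gaussian shocks; all parameter values are illustrative."""
        shifted = np.asarray(rates, dtype=float) + shift   # translate away from zero/negative
        theta = shifted.mean() if theta is None else theta
        resid = ARIMA(shifted, order=order).fit().resid
        eps = (resid - resid.mean()) / resid.std()         # standardized residuals as shocks
        r = np.empty(len(eps) + 1)
        r[0] = shifted[0]
        for t, e in enumerate(eps):
            r[t + 1] = max(r[t] + k * (theta - r[t]) * dt
                           + sigma * np.sqrt(max(r[t], 0.0) * dt) * e, 0.0)
        return r - shift                                   # undo the positivity shift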

Findings

The suggested model is quite powerful for the following reasons. First, the historical market data sample is partitioned into sub-groups to capture all the statistically significant changes of variance in the interest rates. An appropriate translation of market rates to positive values is included in the procedure to overcome the issue of negative/near-zero values. Second, this study introduces a new way of calibrating the CIR model parameters to each sub-group partitioning the actual historical data. The standard Brownian motion process in the random part of the model is replaced with the normally distributed standardized residuals of the “optimal” ARIMA model suitably chosen for each sub-group. As a result, exact CIR fitted values to the observed market data are calculated and the computational cost of the numerical procedure is considerably reduced. Third, this work shows that the CIR# model is efficient and able to follow very closely the structure of market interest rates (especially for short maturities which, notoriously, are very difficult to handle) and to predict future interest rates better than the original CIR model. As a measure of goodness of fit, this study obtained high values of the R2 statistic and small values of the root mean square error for each sub-group and for the entire data sample.

Research limitations/implications

A limitation is related to the specific dataset as we are examining the period around the 2008 financial crisis for about 5 years and by using monthly data. Future research will show the predictive power of the model by extending the dataset in terms of frequency and size.

Practical implications

Improved ability to model/forecast interest rates.

Originality/value

The original value consists in turning the CIR from modeling instantaneous spot rates to forecasting any rate of the yield curve.

Details

Studies in Economics and Finance, vol. 37 no. 2
Type: Research Article
ISSN: 1086-7376

Book part
Publication date: 18 July 2016

Alan D. Olinsky, Kristin Kennedy and Michael Salzillo

Abstract

Forecasting the number of bed days (NBD) needed within a large hospital network is extremely challenging, but it is imperative that management find a predictive model that best estimates the calculation. This estimate is used by operational managers for logistical planning purposes. Furthermore, the finance staff of a hospital would require an expected NBD as input for estimating future expenses. Some hospital reimbursement contracts are on a per diem schedule, and expected NBD is useful in forecasting future revenue.

This chapter examines two ways of estimating the NBD for a large hospital system, and it builds on previous work comparing time regression and an autoregressive integrated moving average (ARIMA). The two approaches discussed in this chapter examine whether the total or combined NBD for all the data is a better predictor than partitioning the data by type of service. The four partitions are medical, maternity, surgery, and psychology. The partitioned time series would then be used to forecast future NBD by each type of service, but one could also sum the partitioned predictions to obtain an alternative forecast of the total. The question is whether one of these two approaches provides a better fit for forecasting the NBD. The approaches presented in this chapter can be applied to a variety of time series data for business forecasting when a large database of information can be partitioned into smaller segments.
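A minimal Python sketch of the comparison, assuming a monthly NBD table with one column per service type and an illustrative ARIMA specification (not the chapter's fitted models), might look like this:

    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    def forecast(series, steps=12, order=(1, 1, 1)):
        """Fit a simple ARIMA model and forecast `steps` periods ahead."""
        return ARIMA(series, order=order).fit().forecast(steps)

    def compare_forecasts(df, steps=12):
        """df: monthly NBD with one column per service type
        (e.g. medical, maternity, surgery, psychology)."""
        total_direct = forecast(df.sum(axis=1), steps)                       # model the combined series
        total_summed = sum(forecast(df[col], steps) for col in df.columns)   # sum per-service forecasts
        return total_direct, total_summed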

Details

Advances in Business and Management Forecasting
Type: Book
ISBN: 978-1-78635-534-8

Article
Publication date: 1 June 2004

K.L. Lo and Haji Izham Haji Zainal Abidin

Abstract

This paper describes voltage collapse in power system networks and how it could lead to a collapse of the whole system. It discusses the effect of machine learning and artificial intelligence, which has led to new methods, and spotlights the fuzzy decision tree (FDT) method and its application to voltage collapse assessment. It concludes that FDT can identify and group data sets, giving a new understanding of its application in voltage collapse analysis.

Details

COMPEL - The international journal for computation and mathematics in electrical and electronic engineering, vol. 23 no. 2
Type: Research Article
ISSN: 0332-1649
