Search results

1 – 10 of over 10000
Article
Publication date: 1 January 1989

EDIE M. RASMUSSEN and PETER WILLETT

Abstract

The implementation of hierarchic agglomerative methods of cluster analysis for large datasets is very demanding of computational resources when implemented on conventional computers. The ICL Distributed Array Processor (DAP) allows many of the scanning and matching operations required in clustering to be carried out in parallel. Experiments are described using the single linkage and Ward's hierarchical agglomerative clustering methods on both real and simulated datasets. Clustering runs on the DAP are compared with the most efficient algorithms currently available implemented on an IBM 3083 BX. The DAP is found to be 2.9–7.9 times as fast as the IBM, the exact degree of speed‐up depending on the size of the dataset, the clustering method, and the serial clustering algorithm that is used. An analysis of the cycle times of the two machines is presented which suggests that further, very substantial speed‐ups could be obtained from array processors of this type if they were to be based on more powerful processing elements.

Details

Journal of Documentation, vol. 45 no. 1
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 June 2003

Jaroslav Mackerle

Abstract

This paper gives a bibliographical review of the finite element and boundary element parallel processing techniques from the theoretical and application points of view. Topics include: theory – domain decomposition/partitioning, load balancing, parallel solvers/algorithms, parallel mesh generation, adaptive methods, and visualization/graphics; applications – structural mechanics problems, dynamic problems, material/geometrical non‐linear problems, contact problems, fracture mechanics, field problems, coupled problems, sensitivity and optimization, and other problems; hardware and software environments – hardware environments, programming techniques, and software development and presentations. The bibliography at the end of this paper contains 850 references to papers, conference proceedings and theses/dissertations dealing with presented subjects that were published between 1996 and 2002.

Details

Engineering Computations, vol. 20 no. 4
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 1 February 1996

Jaroslav Mackerle

Abstract

Presents a review on implementing finite element methods on supercomputers, workstations and PCs and gives main trends in hardware and software developments. An appendix at the end of the paper presents a bibliography on these subjects going back to 1985, listing approximately 1,100 references.

Details

Engineering Computations, vol. 13 no. 1
Type: Research Article
ISSN: 0264-4401

Open Access
Article
Publication date: 7 July 2022

Sirilak Ketchaya and Apisit Rattanatranurak

Abstract

Purpose

Sorting is a very important algorithm to solve problems in computer science. The most well-known divide and conquer sorting algorithm is quicksort. It starts with dividing the data into subarrays and finally sorting them.

Design/methodology/approach

In this paper, the algorithm named Dual Parallel Partition Sorting (DPPSort) is analyzed and optimized. It is built on a partitioning algorithm named Dual Parallel Partition (DPPartition), which is also analyzed and optimized here; the resulting subarrays are sorted with the standard sorting functions qsort and STLSort, which are quicksort and introsort algorithms, respectively. The algorithm runs on any shared memory/multicore system. It is developed with the OpenMP library, which supports multiprocessing programming, to be compatible with the C/C++ standard library functions. The authors’ algorithm recursively divides an unsorted array into two equal halves, in parallel, using Lomuto's partitioning and a merge step without compare-and-swap instructions. Then, qsort/STLSort is executed in parallel once the subarray is smaller than the sorting cutoff.

Findings

In the authors’ experiments, a 4-core Intel i7-6770 system running Ubuntu Linux is used. DPPSort is up to 6.82× and 5.88× faster than qsort and STLSort, respectively, on Uint64 random distributions.

Originality/value

The authors improve the performance of the parallel sorting algorithm by reducing the compare-and-swap instructions in the algorithm. This concept can be applied to related problems to increase the speedup of algorithms.
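As a rough illustration of two ingredients named in this abstract, Lomuto's partitioning and a sorting cutoff, the following is a minimal serial sketch in Python. It is not the authors' DPPSort or DPPartition: there is no parallelism and no merge step, the value of `CUTOFF` and the function names are assumptions made for illustration, and Python's built-in `sorted` stands in for qsort/STLSort.

```python
import random

CUTOFF = 16  # hypothetical sorting cutoff, chosen only for illustration


def lomuto_partition(a, lo, hi):
    """Partition a[lo..hi] around the pivot a[hi]; return the pivot's final index."""
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i


def cutoff_quicksort(a, lo=0, hi=None):
    """Sort a[lo..hi] in place: partition while large, library sort when small."""
    if hi is None:
        hi = len(a) - 1
    while hi - lo + 1 > CUTOFF:
        p = lomuto_partition(a, lo, hi)
        # Recurse into the smaller side and loop on the larger one,
        # which keeps the recursion depth logarithmic.
        if p - lo < hi - p:
            cutoff_quicksort(a, lo, p - 1)
            lo = p + 1
        else:
            cutoff_quicksort(a, p + 1, hi)
            hi = p - 1
    # Below the cutoff, hand the subarray to the library sort
    # (a stand-in for qsort/STLSort in the abstract).
    a[lo:hi + 1] = sorted(a[lo:hi + 1])


if __name__ == "__main__":
    data = [random.randrange(1000) for _ in range(200)]
    cutoff_quicksort(data)
    print(data[:10])
```

In a parallel variant, the two sides produced by each partition step are independent and could be handed to separate threads; the sketch keeps them sequential for clarity.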

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 1 March 1988

D.J. Evans and S. Ghanemi

Abstract

The string searching problem is central to many information retrieval and text editing applications. The Brute Force algorithm is inefficient in some cases and in this article four other algorithms are discussed, of which the Boyer‐Moore and the Improved Boyer‐Moore are found to be the fastest. A parallel implementation using the divide and conquer method is examined. Comparisons using the MIMD‐type parallel computer systems are presented.
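The divide and conquer idea mentioned in this abstract can be sketched as follows, under stated assumptions. The example uses the Horspool simplification of Boyer‐Moore (bad-character shifts only), not the Improved Boyer‐Moore variant the article evaluates, and it splits the text into overlapping windows that are searched independently. The function names and the `workers` parameter are hypothetical, and in CPython the thread pool shows the structure of the decomposition rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def horspool_search(text, pattern, start=0, end=None):
    """Return the start indices of all occurrences of pattern in text[start:end]."""
    if end is None:
        end = len(text)
    m = len(pattern)
    if m == 0 or end - start < m:
        return []
    # Bad-character table: distance from a character's last occurrence
    # (excluding the final pattern position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits = []
    pos = start
    while pos + m <= end:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Shift by the table entry for the text character under the
        # pattern's last position; unseen characters allow a full shift.
        pos += shift.get(text[pos + m - 1], m)
    return hits


def chunked_search(text, pattern, workers=4):
    """Divide the text into windows and search each window independently."""
    n, m = len(text), len(pattern)
    if m == 0 or n < m:
        return []
    chunk = max(m, -(-n // workers))  # ceiling division, at least one pattern long
    # Each window is extended by m - 1 characters so that matches straddling
    # a boundary are found exactly once, by the window in which they start.
    ranges = [(s, min(n, s + chunk + m - 1)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: horspool_search(text, pattern, *r), ranges)
        return [i for part in parts for i in part]


if __name__ == "__main__":
    print(chunked_search("abracadabra" * 3, "abra", workers=3))
```

The overlap of `m - 1` characters is the design point worth noting: without it, an occurrence that crosses a window boundary would be missed, and with a longer overlap it could be reported twice.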

Details

Kybernetes, vol. 17 no. 3
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 22 June 2010

Imam Machdi, Toshiyuki Amagasa and Hiroyuki Kitagawa

Abstract

Purpose

The purpose of this paper is to propose general parallelism techniques for holistic twig join algorithms to process queries against Extensible Markup Language (XML) databases on a multi‐core system.

Design/methodology/approach

The parallelism techniques comprised data and task parallelism. As for data parallelism, the paper adopted stream‐based partitioning for XML to partition XML data as the basis of parallelism on multiple CPU cores. The XML data partitioning was performed in two levels. The first level was to create buckets for creating data independence and balancing loads among CPU cores; each bucket was assigned to a CPU core. Within each bucket, the second level of XML data partitioning was performed to create finer partitions for providing finer parallelism. Each CPU core performed the holistic twig join algorithm on each of its own finer partitions in parallel with other CPU cores. In task parallelism, the holistic twig join algorithm was decomposed into two main tasks, which were pipelined to create parallelism. The first task adopted the data parallelism technique and its outputs were transferred to the second task periodically. Since data transfers incurred overheads, the size of each data transfer needed to be estimated carefully to achieve optimal performance.

Findings

The data and task parallelism techniques contribute to good performance, especially for queries having complex structures and/or higher values of query selectivity. The performance of data parallelism can be further improved by task parallelism. Significant performance improvement is attained by queries having higher selectivity because more of the output computation in the second task is performed in parallel with the first task.

Research limitations/implications

The proposed parallelism techniques primarily deal with executing a single long‐running query for intra‐query parallelism, partitioning XML data on‐the‐fly, and allocating partitions on CPU cores statically. During the parallel execution, it is presumed that there are no dynamic XML data updates.

Practical implications

The effectiveness of the proposed parallel holistic twig joins relies fundamentally on some system parameter values that can be obtained from a benchmark of the system platform.

Originality/value

The paper proposes novel techniques to increase parallelism by combining techniques of data and task parallelism for achieving high performance. To the best of the authors' knowledge, this is the first paper on parallelizing the holistic twig join algorithms on a multi‐core system.

Details

International Journal of Web Information Systems, vol. 6 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 16 March 2010

Ivan Hanuliak and Peter Hanuliak

Abstract

Purpose

With the availability of powerful personal computers (PCs), workstations and networking devices, the recent trend in parallel computing is to connect a number of individual workstations (PCs and PC symmetric multiprocessor systems (SMP)) to solve computation‐intensive tasks in parallel on such clusters (networks of workstations (NOW), SMP and Grid). In this sense, it is no longer accurate to consider traditionally evolved parallel computing and distributed computing as two separate research disciplines. Current trends in high performance computing are to use NOW (and SMP) as a cheaper alternative to traditionally used massively parallel multiprocessors or supercomputers and to profit from unifying the two disciplines. The purpose of this paper is to consider that the individual workstations could be either single PCs or parallel computers based on modern SMP implemented within a workstation.

Design/methodology/approach

Such parallel systems (NOW and SMP) are connected through widely used standard communication networks and co‐operate to solve one large problem. Each workstation is treated similarly to a processing element in a conventional multiprocessor system. However, personal processors or multiprocessors used as workstations are far more powerful and flexible than the processing elements in conventional multiprocessors. To make the whole system appear to applications as a single parallel computing engine (a virtual parallel system), run‐time environments such as OpenMP and Java (for SMP) and message passing interface and Java (for NOW) are used to provide an extra layer of abstraction.

Findings

To exploit the parallel processing capability of such a cluster, the application program must be parallelised. Choosing an effective way to do this (the parallelisation strategy) is one of the most important steps in developing an effective parallel algorithm (optimisation). For behaviour analysis, all overheads that influence the performance of parallel algorithms (architecture, computation, communication, etc.) have to be taken into account. In this paper, such complex performance evaluation of iterative parallel algorithms (IPA) and their practical implementations is discussed (Jacobi and Gauss‐Seidel iteration). Using a real application example, the various influences in the process of modelling and performance evaluation, and the consequences of distributed parallel implementation, are demonstrated.

Originality/value

The paper usefully shows that better load balancing can be achieved among the network nodes used (performance optimisation of the parallel algorithm). Generally, it claims that parallel algorithms or their parts (processes) with more communication (similar to the analysed Gauss‐Seidel parallel algorithm) will have better speed‐up values on a modern SMP parallel system than with a parallel implementation in NOW. For algorithms or processes with small communication overheads (similar to the analysed Jacobi parallel algorithm), other network nodes based on single processors can be used.
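The contrast this abstract draws between Jacobi and Gauss‐Seidel iteration can be seen in a minimal serial sketch. It is not the authors' implementation: the function names, tolerance and iteration limit are assumptions, and no communication is modelled. In Jacobi, every component of the new iterate depends only on the previous iterate, so the components of one sweep are independent of each other. In Gauss‐Seidel, each component uses values already updated in the same sweep, which introduces dependencies between components (and, in a distributed setting, between the nodes that hold them).

```python
def jacobi(A, b, tol=1e-10, max_iter=500):
    """Jacobi iteration: each new component uses only the previous iterate."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        # All n updates read the old x, so they are mutually independent.
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new
        x = x_new
    return x


def gauss_seidel(A, b, tol=1e-10, max_iter=500):
    """Gauss-Seidel iteration: each update reuses values from the current sweep."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        delta = 0.0
        for i in range(n):
            # x[j] for j < i has already been overwritten in this sweep.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new = (b[i] - s) / A[i][i]
            delta = max(delta, abs(new - x[i]))
            x[i] = new
        if delta < tol:
            break
    return x


if __name__ == "__main__":
    # Small diagonally dominant system whose exact solution is [1, 2, 3].
    A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
    b = [2.0, 4.0, 10.0]
    print(jacobi(A, b))
    print(gauss_seidel(A, b))
```

Both routines assume a matrix for which the iteration converges, such as the strictly diagonally dominant example used here.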

Article
Publication date: 17 June 2022

Mümin Emre Şenol and Adil Baykasoğlu

Abstract

Purpose

The purpose of this study is to develop a new parallel metaheuristic algorithm for solving unconstrained continuous optimization problems.

Design/methodology/approach

The proposed method brings several metaheuristic algorithms together to form a coalition under Weighted Superposition Attraction-Repulsion Algorithm (WSAR) in a parallel computing environment. The proposed approach runs different single solution based metaheuristic algorithms in parallel and employs WSAR (which is a recently developed and proposed swarm intelligence based optimizer) as controller.

Findings

The proposed approach is tested against the latest well-known unconstrained continuous optimization problems (CEC2020). The obtained results are compared with some other optimization algorithms. The results of the comparison prove the efficiency of the proposed method.

Originality/value

This study aims to combine different metaheuristic algorithms in order to provide satisfactory performance in solving optimization problems by benefiting from their diverse characteristics. In addition, the run time is shortened by parallel execution. The proposed approach can be applied to any type of optimization problem owing to its problem-independent structure.

Details

Engineering Computations, vol. 39 no. 8
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 5 April 2024

Fangqi Hong, Pengfei Wei and Michael Beer

Abstract

Purpose

Bayesian cubature (BC) has emerged as one of the most competitive approaches for estimating multi-dimensional integrals, especially when the integrand is expensive to evaluate, and alternative acquisition functions, such as the Posterior Variance Contribution (PVC) function, have been developed for adaptive experiment design of the integration points. However, those sequential design strategies also prevent BC from being implemented in a parallel scheme. Therefore, this paper aims at developing a parallelized adaptive BC method to further improve the computational efficiency.

Design/methodology/approach

By theoretically examining the multimodal behavior of the PVC function, it is concluded that the multiple local maxima all contribute importantly to the integration accuracy and can be selected as design points, providing a practical way to parallelize the adaptive BC. Inspired by this finding, four multimodal optimization algorithms, including one newly developed in this work, are then introduced for finding multiple local maxima of the PVC function in one run, and further for parallel implementation of the adaptive BC.

Findings

The superiority of the parallel schemes and the performance of the four multimodal optimization algorithms are then demonstrated and compared with the k-means clustering method by using two numerical benchmarks and two engineering examples.

Originality/value

Multimodal behavior of acquisition function for BC is comprehensively investigated. All the local maxima of the acquisition function contribute to adaptive BC accuracy. Parallelization of adaptive BC is realized with four multimodal optimization methods.

Details

Engineering Computations, vol. 41 no. 2
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 1 October 2005

Juraj Hanuliak and Ivan Hanuliak

Abstract

Purpose

To address the problems of high performance computing by using the networks of workstations (NOW) and to discuss the complex performance evaluation of centralised and distributed parallel algorithms.

Design/methodology/approach

Defines the role of performance and performance evaluation methods using a theoretical approach. Presents concrete parallel algorithms and tabulates the results of their performance.

Findings

Sees networks of workstations based on powerful personal computers as the future: very cheap, flexible and promising asynchronous parallel systems. Argues that this trend will produce dynamic growth in the parallel architectures based on the networks of workstations.

Research limitations/implications

We would like to continue these experiments in order to derive more precise and general formulae for typically used parallel algorithms from linear algebra and other application‐oriented parallel algorithms.

Practical implications

Describes how the use of NOW can provide a cheaper alternative to traditionally used massively parallel multiprocessors or supercomputers and shows the advantages of unifying the two disciplines that are involved.

Originality/value

Produces a new approach and exploits the parallel processing capability of NOW. Gives concrete practical examples of the method, which has been developed using experimental measurement.

Details

Kybernetes, vol. 34 no. 9/10
Type: Research Article
ISSN: 0368-492X
