Search results (1–10 of over 16,000)

A. Macfarlane, S.E. Robertson and J.A. McCann
Abstract
The progress of parallel computing in Information Retrieval (IR) is reviewed. In particular, we stress the importance of the motivation for using parallel computing in text retrieval. We analyse parallel IR systems using a classification defined by Rasmussen and describe some parallel IR systems. We give a description of the retrieval models used in parallel information processing and describe areas of research that we believe are needed.
Abstract
Purpose
A cross-platform paradigm (computing model), which combines the graphical user interface of MATLAB with parallel Fortran programming, is proposed for fluid-film lubrication analysis. The purpose of this paper is to combine the advantages of OpenMP's effective multithreaded computing with MATLAB's user-friendly interface and real-time display capability.
Design/methodology/approach
A validation of the computing performance of MATLAB and Fortran coding is conducted by solving two simple sliders with iterative solution methods. The online display of the particles' search process is incorporated in the MATLAB coding, while the execution of the air foil bearing optimum design is carried out with OpenMP multithreaded computing in the background. The optimization analysis is conducted by the particle swarm optimization method for an air foil bearing design.
Findings
It is found that the MATLAB programs require longer execution times than their Fortran counterparts in iterative methods. The execution time of the air foil bearing optimum design is significantly reduced by using OpenMP computing. As a result, the cross-platform paradigm can provide a useful graphical user interface, and very little rewriting of the original numerical code, which is usually optimized for either serial or parallel computing, is required.
Research limitations/implications
Iterative methods are commonly applied in fluid-film lubrication analyses. In this study, iterative methods are used as the solution methods, which may not be an effective way to compute in MATLAB's setting.
Originality/value
In this study, a cross-platform paradigm consisting of standalone MATLAB and Fortran codes is proposed. The approach combines the best of the two paradigms, and each code base can be modified or maintained independently for different applications.
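The particle swarm optimization step described in this abstract can be sketched generically. The snippet below is a minimal PSO in Python, not the paper's MATLAB/Fortran implementation; the objective function, bounds and coefficient values are illustrative assumptions.

```python
import random

def pso(objective, bounds, n_particles=20, n_iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimization: each particle tracks its own
    best position, while the swarm shares a global best that steers the search."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Position update, clamped to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], bounds[d][0]), bounds[d][1])
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy usage: minimize the sphere function x^2 + y^2 on [-5, 5]^2.
best, best_val = pso(lambda p: sum(x * x for x in p), [(-5, 5), (-5, 5)])
```

In the paper's setting, the objective evaluation would be the expensive part, which is why it is dispatched to the Fortran/OpenMP backend while MATLAB displays the particles' search process.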
Abstract
This paper gives a bibliographical review of the finite element and boundary element parallel processing techniques from the theoretical and application points of view. Topics include: theory – domain decomposition/partitioning, load balancing, parallel solvers/algorithms, parallel mesh generation, adaptive methods, and visualization/graphics; applications – structural mechanics problems, dynamic problems, material/geometrical non‐linear problems, contact problems, fracture mechanics, field problems, coupled problems, sensitivity and optimization, and other problems; hardware and software environments – hardware environments, programming techniques, and software development and presentations. The bibliography at the end of this paper contains 850 references to papers, conference proceedings and theses/dissertations dealing with presented subjects that were published between 1996 and 2002.
Gibran Agundis-Tinajero, Rafael Peña Gallardo, Juan Segundo-Ramírez, Nancy Visairo-Cruz and Josep M. Guerrero
Abstract
Purpose
The purpose of this study is to present a performance evaluation of three shooting methods typically applied to obtain the periodic steady state of electric power systems, with the aim of assessing the benefits of cloud computing in terms of relative efficiency and computation time.
Design/methodology/approach
The mathematical formulation of the methods is presented, and their parallelization potential is explained. Two case studies are addressed, and the solution is computed with the shooting methods using multiple computer cores through cloud computing.
Findings
The results show that applying these methods with parallel cloud computing reduces computation time and increases relative efficiency in obtaining the periodic steady state of electric power systems. Additionally, the characteristics of the methods under parallel cloud computing are shown and comparisons among them are presented.
Originality/value
The main advantage of employing parallel cloud computing is a significant reduction of the computation time required by the heavy computational load of the shooting methods.
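The shooting idea evaluated in this study can be illustrated on a toy problem: a periodic steady state is an initial condition x0 satisfying x(T; x0) = x0, found here by Newton's method on the return map. The Python sketch below uses a scalar forced ODE, not a power system model; the function names, tolerances and the test equation are illustrative assumptions.

```python
import math

def rk4_flow(f, x0, t0, t1, steps=2000):
    """Integrate dx/dt = f(t, x) from t0 to t1 with classical RK4."""
    h = (t1 - t0) / steps
    t, x = t0, x0
    for _ in range(steps):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h * k1 / 2)
        k3 = f(t + h / 2, x + h * k2 / 2)
        k4 = f(t + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x

def shooting_periodic(f, period, x0=0.0, tol=1e-10, max_iter=50):
    """Shooting method: solve g(x0) = x(T; x0) - x0 = 0 with Newton's
    method, the derivative being estimated by a finite difference."""
    for _ in range(max_iter):
        g = rk4_flow(f, x0, 0.0, period) - x0
        if abs(g) < tol:
            break
        eps = 1e-6
        dg = (rk4_flow(f, x0 + eps, 0.0, period) - (x0 + eps) - g) / eps
        x0 -= g / dg
    return x0

# Toy usage: dx/dt = -x + cos(t) has the 2*pi-periodic steady state
# x(t) = (cos t + sin t) / 2, so the periodic initial condition is x0 = 0.5.
T = 2 * math.pi
x0 = shooting_periodic(lambda t, x: -x + math.cos(t), T)
```

The parallelization potential mentioned in the abstract comes from the fact that the trajectory evaluations (and the columns of the sensitivity matrix in the multivariable case) are independent and can be distributed across cloud cores.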
Beichuan Yan and Richard Regueiro
Abstract
Purpose
This paper aims to present a performance comparison between O(n²) and O(n) neighbor search algorithms, to study their effects for different particle shape complexities and computational granularities (CG) and to investigate their influence on the superlinear speedup of the 3D discrete element method (DEM) for complex-shaped particles. In particular, it aims to answer the question: which neighbor search algorithm, O(n²) or O(n), performs better in parallel 3D DEM computational practice?
Design/methodology/approach
The O(n²) and O(n) neighbor search algorithms are carefully implemented in the code paraEllip3d, which is executed on Department of Defense supercomputers across five orders of magnitude of simulation scale (2,500; 12,000; 150,000; 1 million and 10 million particles) to evaluate and compare their performance, using both strong and weak scaling measurements.
Findings
The more complex the particle shapes (from sphere to ellipsoid to poly-ellipsoid), the smaller the neighbor search fraction (NSF); likewise, the lower the CG, the smaller the NSF. In both serial and parallel computing of complex-shaped 3D DEM, the O(n²) algorithm is inefficient at coarse CG; however, it executes faster than the O(n) algorithm at the fine CGs that are mostly used in computational practice to achieve the best performance. This means that the O(n²) algorithm generally outperforms O(n) in parallel 3D DEM.
Practical implications
Taking for granted that O(n) unconditionally outperforms O(n²) in complex-shaped 3D DEM is a misconception commonly encountered in the computational engineering and science literature.
Originality/value
The paper clarifies that the performance of the O(n²) and O(n) neighbor search algorithms for complex-shaped 3D DEM is affected by particle shape complexity and CG. In particular, the O(n²) algorithm generally outperforms the O(n) algorithm in large-scale parallel 3D DEM simulations, even though this outperformance is counterintuitive.
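For readers unfamiliar with the two algorithm classes being compared, the sketch below contrasts a brute-force O(n²) pair test with an O(n) cell-based search in Python. It is an illustrative toy, not paraEllip3d's implementation, and the paper's point is precisely that these asymptotic labels alone do not settle which is faster in practice.

```python
import random
from collections import defaultdict

def neighbors_bruteforce(points, cutoff):
    """O(n^2): test every pair of particles against the interaction cutoff."""
    c2 = cutoff * cutoff
    pairs = set()
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            if (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2 <= c2:
                pairs.add((i, j))
    return pairs

def neighbors_cell_list(points, cutoff):
    """O(n): bin particles into cells of edge `cutoff`, then test only
    pairs lying in the same or adjacent cells (27 cells in 3D)."""
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        cells[(int(x // cutoff), int(y // cutoff), int(z // cutoff))].append(idx)
    c2 = cutoff * cutoff
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for i in members:
                        for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                            if i < j:
                                xi, yi, zi = points[i]
                                xj, yj, zj = points[j]
                                if (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2 <= c2:
                                    pairs.add((i, j))
    return pairs

# Both algorithms must report exactly the same neighbor pairs.
rng = random.Random(42)
pts = [(rng.random(), rng.random(), rng.random()) for _ in range(300)]
same = neighbors_bruteforce(pts, 0.1) == neighbors_cell_list(pts, 0.1)
```

The cell-based search wins asymptotically, but it pays bookkeeping overhead per cell; at fine computational granularity each subdomain holds few particles, which is where, per the paper's findings, the simple quadratic loop can come out ahead.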
Beichuan Yan and Richard Regueiro
Abstract
Purpose
The purpose of this paper is to extend complex-shaped discrete element method simulations from a few thousand particles to millions of particles by using parallel computing on Department of Defense (DoD) supercomputers and to study the mechanical response of particle assemblies composed of a large number of particles in engineering practice and laboratory tests.
Design/methodology/approach
A parallel algorithm is designed and implemented with advanced features such as link-block, border layer and migration layer, an adaptive compute gridding technique and message passing interface (MPI) transmission of C++ objects and pointers for high-performance optimization; performance analyses are conducted across five orders of magnitude of simulation scale on multiple DoD supercomputers; and three full-scale simulations of sand pluviation, constrained collapse and particle shape effect are carried out to study the mechanical response of particle assemblies.
Findings
The parallel algorithm and implementation exhibit high speedup and excellent scalability; communication time is a decreasing function of the number of compute nodes, and the optimal computational granularity for each simulation scale is given. Nearly 50 per cent of wall clock time is spent on the rebound phenomenon at the top of the particle assembly in the dynamic simulation of sand gravitational pluviation. Numerous particles are necessary to capture the pattern and shape of the particle assembly in collapse tests; a preliminary comparison between a sphere assembly and an ellipsoid assembly indicates a significant influence of particle shape on the kinematic, kinetic and static behavior of particle assemblies.
Originality/value
The high-performance parallel code enables the simulation of a wide range of dynamic and static laboratory and field tests in engineering applications that involve a large number of granular and geotechnical material grains, such as sand pluviation process, buried explosion in various soils, earth penetrator interaction with soil, influence of grain size, shape and gradation on packing density and shear strength and mechanical behavior under different gravity environments such as on the Moon and Mars.
Abstract
Purpose
This work can be used as a building block in other settings such as GPU, Map-Reduce, Spark or any other. Also, DDPML can be deployed on other distributed systems such as P2P networks, clusters, cloud computing or other technologies.
Design/methodology/approach
In the age of Big Data, all companies want to benefit from large amounts of data. These data can help them understand their internal and external environment and anticipate associated phenomena, as the data turn into knowledge that can be used for prediction later. This knowledge thus becomes a great asset in companies' hands, and exploiting it is precisely the objective of data mining. With data and knowledge now produced in large amounts and at a faster pace, the authors speak of Big Data mining. For this reason, the proposed work mainly aims at solving the problems of volume, veracity, validity and velocity when classifying Big Data using distributed and parallel processing techniques. The problem the authors raise in this work is how to make machine learning algorithms work in a distributed and parallel way at the same time without losing the accuracy of the classification results.

To solve this problem, the authors propose a system called Dynamic Distributed and Parallel Machine Learning (DDPML). The work is divided into two parts. In the first, the authors propose a distributed architecture controlled by a Map-Reduce algorithm, which in turn depends on a random sampling technique. This distributed architecture is specifically designed to handle big data processing in a coherent and efficient manner together with the sampling strategy proposed in this work; it also allows the authors to verify the classification results obtained using the representative learning base (RLB). In the second part, the authors extract the representative learning base by sampling at two levels using the stratified random sampling method. This sampling method is also applied to extract the shared learning base (SLB) and the partial learning bases for the first level (PLBL1) and the second level (PLBL2).

The experimental results show the efficiency of the proposed solution without significant loss of classification accuracy. Thus, in practical terms, the DDPML system is generally dedicated to big data mining processing and works effectively in distributed systems with a simple structure, such as client-server networks.
Findings
The authors obtained very satisfactory classification results.
Originality/value
The DDPML system is specially designed to handle big data mining classification smoothly.
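The stratified random sampling underlying the RLB/SLB extraction can be sketched at a single level as follows. The function and names below are illustrative assumptions, not DDPML's actual code, which applies the sampling at two levels over a distributed Map-Reduce architecture.

```python
import random
from collections import defaultdict

def stratified_sample(records, label_of, fraction, seed=0):
    """Stratified random sampling: draw the same fraction from every
    class stratum so the sample preserves the original label proportions."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[label_of(rec)].append(rec)
    sample = []
    for label, members in strata.items():
        # At least one record per stratum so rare classes are never lost.
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample

# Toy usage: 900 'a' records and 100 'b' records; a 10% stratified sample
# keeps the 9:1 class ratio instead of risking missing class 'b' entirely.
data = [("a", i) for i in range(900)] + [("b", i) for i in range(100)]
sample = stratified_sample(data, label_of=lambda r: r[0], fraction=0.1)
counts = {"a": sum(1 for r in sample if r[0] == "a"),
          "b": sum(1 for r in sample if r[0] == "b")}
```

Preserving class proportions in the reduced learning base is what lets a sampled classifier approach the accuracy of one trained on the full data, which is the accuracy claim the abstract makes.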
Dimitris Kehagias, Michael Grivas, Basilis Mamalis and Grammati Pantziou
Abstract
Purpose
The purpose of this paper is to evaluate the use of a non‐expensive dynamic computing resource, consisting of a Beowulf class cluster and a NoW, as an educational and research infrastructure.
Design/methodology/approach
Clusters, built using commodity‐off‐the‐shelf (COTS) hardware components and free, or commonly used, software, provide an inexpensive computing resource to educational institutions. The Department of Informatics of TEI, Athens, has built a dynamic clustering system consisting of a Beowulf‐class cluster and a NoW called DYNER (DYNamic clustER). This paper evaluates the use of the DYNER system as a platform for running the laboratory work of various courses (parallel computing, operating systems, distributed computing), as well as various parallel applications in the framework of research in progress under ongoing research projects. Three distinct groups from the academic community of the TEI of Athens can benefit directly from the DYNER platform: the students of the Department of Informatics, the faculty members and researchers of the department, and researchers from other departments of the institution.
Findings
The results obtained were positive and satisfactory. The use of the dynamic cluster offers students new capabilities in high-performance computing, which will improve their potential for professional excellence.
Research limitations/implications
The implications of this research study are that the students clarified issues, such as “doubling the number of processors does not mean doubling execution speed”, and learned how to build and configure a cluster without going deeply into the complexity of the software set‐up.
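The lesson that "doubling the number of processors does not mean doubling execution speed" is captured by Amdahl's law, which the students' cluster experiments illustrate. A minimal sketch:

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Amdahl's law: the serial fraction (1 - p) bounds the achievable
    speedup no matter how many processors are added."""
    p = parallel_fraction
    return 1.0 / ((1.0 - p) + p / n_processors)

# With 90% of the work parallelizable, doubling 4 -> 8 processors
# improves speedup by well under 2x, and 10x is the hard ceiling.
s4 = amdahl_speedup(0.9, 4)   # ~3.08
s8 = amdahl_speedup(0.9, 8)   # ~4.71
```

The parallel fraction of 0.9 here is an arbitrary illustrative value; a lab exercise on the cluster would estimate it by profiling the serial portion of each student program.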
Practical implications
This research provides students with the opportunity to gain hands-on experience on a platform that is unfamiliar to most of them but useful, and gives faculty members – from a variety of disciplines – access to more computing power for their research.
Originality/value
This paper presents a dynamic clustering system whose versatility and flexibility with respect to configuration and functionality, together with its dynamic, strong computational power, render it a very helpful tool for educational and research purposes.
Emmanuel Imuetinyan Aghimien, Lerato Millicent Aghimien, Olutomilayo Olayemi Petinrin and Douglas Omoregie Aghimien
Abstract
Purpose
This paper aims to present the result of a scientometric analysis conducted using studies on high-performance computing in computational modelling. This was done with a view to showcasing the need for high-performance computers (HPC) within the architecture, engineering and construction (AEC) industry in developing countries, particularly in Africa, where the use of HPC in developing computational models (CMs) for effective problem solving is still low.
Design/methodology/approach
An interpretivism philosophical stance was adopted for the study, which informed a scientometric review of existing studies gathered from the Scopus database. Keywords such as "high-performance computing" and "computational modelling" were used to extract papers from the database. The Visualisation of Similarities viewer (VOSviewer) was used to prepare co-occurrence maps based on the bibliographic data gathered.
Findings
Findings revealed the scarcity of research emanating from Africa in this area of study. Furthermore, past studies have focused on high-performance computing in the development of computational modelling and theory, parallel computing and improved visualisation, large-scale application software, computer simulations and computational mathematical modelling. Future studies can explore areas such as cloud computing, optimisation, high-level programming languages, natural science computing, computer graphics equipment and graphics processing units as they relate to the AEC industry.
Research limitations/implications
The study assessed a single database for the search of related studies.
Originality/value
The findings of this study serve as an excellent theoretical background for AEC researchers seeking to explore the use of HPC for CMs development in the quest for solving complex problems in the industry.
Mohammad Mortezazadeh and Liangzhu (Leon) Wang
Abstract
Purpose
The purpose of this paper is the development of a new density-based (DB) semi-Lagrangian method to speed up the conventional pressure-based (PB) semi-Lagrangian methods.
Design/methodology/approach
The semi-Lagrangian-based solvers are typically PB, i.e. semi-Lagrangian pressure-based (SLPB) solvers, where a Poisson equation is solved for obtaining the pressure field and ensuring a divergence-free flow field. As an elliptic-type equation, the Poisson equation often relies on an iterative solution, so it can create a challenge of parallel computing and a bottleneck of computing speed. This study proposes a new DB semi-Lagrangian method, i.e. the semi-Lagrangian artificial compressibility (SLAC), which replaces the Poisson equation by a hyperbolic continuity equation with an added artificial compressibility (AC) term, so a time-marching solution is possible. Without the Poisson equation, the proposed SLAC solver is faster, particularly for the cases with more computational cells, and better suited for parallel computing.
Findings
The study compares the accuracy and computing speed of the SLPB and SLAC solvers for the lid-driven cavity flow and step-flow problems. It shows that the proposed SLAC solver achieves the same results as the SLPB while delivering a 3.03-times speed-up before OpenMP parallelization and a 3.35-times speed-up for the large grid case (512 × 512) after parallelization. The speed-up can improve further for larger cases, because the condition number of the coefficient matrices of the Poisson equation increases with problem size.
Originality/value
This paper proposes a method of avoiding solving the Poisson equation, a typical computing bottleneck for semi-Lagrangian-based fluid solvers, by converting the conventional PB solver (SLPB) into the DB solver (SLAC) through the addition of the AC term. The method simplifies and facilitates the parallelization of semi-Lagrangian-based fluid solvers for modern HPC infrastructures, such as OpenMP and GPU computing.
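The artificial compressibility idea can be shown on a 1D periodic toy problem: instead of solving a Poisson equation, pressure is marched in pseudo-time alongside velocity until the divergence vanishes. The Python sketch below adds a viscous term for damping and only illustrates the time-marching principle, not the paper's semi-Lagrangian SLAC solver; every parameter value is an illustrative assumption.

```python
import math

def ac_project(u, beta=1.0, nu=0.05, dtau=0.02, n_iters=2000):
    """Artificial-compressibility pseudo-time marching on a 1D periodic
    grid over [0, 1): instead of solving a Poisson equation for pressure,
    march
        dp/dtau = -beta * div(u)
        du/dtau = -grad(p) + nu * laplacian(u)
    until div(u) is driven to (numerical) zero."""
    n = len(u)
    dx = 1.0 / n
    p = [0.0] * n
    for _ in range(n_iters):
        # Continuity with an artificial compressibility term.
        div = [(u[(i + 1) % n] - u[i - 1]) / (2 * dx) for i in range(n)]
        p = [p[i] - dtau * beta * div[i] for i in range(n)]
        # Momentum: pressure gradient plus viscous damping of the waves.
        grad = [(p[(i + 1) % n] - p[i - 1]) / (2 * dx) for i in range(n)]
        lap = [(u[(i + 1) % n] - 2 * u[i] + u[i - 1]) / dx ** 2 for i in range(n)]
        u = [u[i] + dtau * (nu * lap[i] - grad[i]) for i in range(n)]
    div = [(u[(i + 1) % n] - u[i - 1]) / (2 * dx) for i in range(n)]
    return u, max(abs(d) for d in div)

# Toy usage: a divergent field u = sin(2*pi*x) on a 16-cell periodic grid;
# the pseudo-time marching drives the discrete divergence toward zero
# without ever assembling or solving a Poisson system.
n = 16
u0 = [math.sin(2 * math.pi * i / n) for i in range(n)]
u_final, div_residual = ac_project(u0)
```

Every update above is a local stencil operation, which is exactly why the DB formulation parallelizes so naturally compared with the globally coupled Poisson solve of a PB solver.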