Search results

1 – 10 of over 75000
Article
Publication date: 19 February 2021

C. Lakshmi and K. Usha Rani

Resilient distributed processing technique (RDPT), in which mapper and reducer are simplified with the Spark contexts and support distributed parallel query processing.

Abstract

Purpose

Resilient distributed processing technique (RDPT), in which mapper and reducer are simplified with the Spark contexts and support distributed parallel query processing.

Design/methodology/approach

The proposed work is implemented with Pig Latin with Spark contexts to develop query processing in a distributed environment.

Findings

Query processing in Hadoop influences the distributed processing with the MapReduce model. MapReduce caters to the works on different nodes with the implementation of complex mappers and reducers. Its results are valid for some extent size of the data.

Originality/value

Pig supports the required parallel processing framework with the following constructs during the processing of queries: FOREACH; FLATTEN; COGROUP.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 14 no. 2
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 28 September 2007

Alasdair J.G. Gray, Werner Nutt and M. Howard Williams

Distributed data streams are an important topic of current research. In such a setting, data values will be missed, e.g. due to network errors. This paper aims to allow this…

Abstract

Purpose

Distributed data streams are an important topic of current research. In such a setting, data values will be missed, e.g. due to network errors. This paper aims to allow this incompleteness to be detected and overcome with either the user not being affected or the effects of the incompleteness being reported to the user.

Design/methodology/approach

A model for representing the incomplete information has been developed that captures the information that is known about the missing data. Techniques for query answering involving certain and possible answer sets have been extended so that queries over incomplete data stream histories can be answered.

Findings

It is possible to detect when a distributed data stream is missing one or more values. When such data values are missing there will be some information that is known about the data and this is stored in an appropriate format. Even when the available data are incomplete, it is possible in some circumstances to answer a query completely. When this is not possible, additional meta‐data can be returned to inform the user of the effects of the incompleteness.

Research limitations/implications

The techniques and models proposed in this paper have only been partially implemented.

Practical implications

The proposed system is general and can be applied wherever there is a need to query the history of distributed data streams. The work in this paper enables the system to answer queries when there are missing values in the data.

Originality/value

This paper presents a general model of how to detect, represent, and answer historical queries over incomplete distributed data streams.

Details

International Journal of Web Information Systems, vol. 3 no. 1/2
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 1 August 2005

Jinbao Li, Yingshu Li, My T. Thai and Jianzhong Li

This paper investigates query processing in MANETs. Cache techniques and multi‐join database operations are studied. For data caching, a group‐caching strategy is proposed. Using…

Abstract

This paper investigates query processing in MANETs. Cache techniques and multi‐join database operations are studied. For data caching, a group‐caching strategy is proposed. Using the cache and the index of the cached data, queries can be processed at a single node or within the group containing this single node. For multi‐join, a cost evaluation model and a query plan generation algorithm are presented. Query cost is evaluated based on the parameters including the size of the transmitted data, the transmission distance and the query cost at each single node. According to the evaluations, the nodes on which the query should be executed and the join order are determined. Theoretical analysis and experiment results show that the proposed group‐caching based query processing and the cost based join strategy are efficient in MANETs. It is suitable for the mobility, the disconnection and the multi‐hop features of MANETs. The communication cost between nodes is reduced and the efficiency of the query is improved greatly.

Details

International Journal of Pervasive Computing and Communications, vol. 1 no. 3
Type: Research Article
ISSN: 1742-7371

Keywords

Article
Publication date: 15 May 2019

Usha Manasi Mohapatra, Babita Majhi and Alok Kumar Jagadev

The purpose of this paper is to propose distributed learning-based three different metaheuristic algorithms for the identification of nonlinear systems. The proposed algorithms…

Abstract

Purpose

The purpose of this paper is to propose distributed learning-based three different metaheuristic algorithms for the identification of nonlinear systems. The proposed algorithms are experimented in this study to address problems for which input data are available at different geographic locations. In addition, the models are tested for nonlinear systems with different noise conditions. In a nutshell, the suggested model aims to handle voluminous data with low communication overhead compared to traditional centralized processing methodologies.

Design/methodology/approach

Population-based evolutionary algorithms such as genetic algorithm (GA), particle swarm optimization (PSO) and cat swarm optimization (CSO) are implemented in a distributed form to address the system identification problem having distributed input data. Out of different distributed approaches mentioned in the literature, the study has considered incremental and diffusion strategies.

Findings

Performances of the proposed distributed learning-based algorithms are compared for different noise conditions. The experimental results indicate that CSO performs better compared to GA and PSO at all noise strengths with respect to accuracy and error convergence rate, but incremental CSO is slightly superior to diffusion CSO.

Originality/value

This paper employs evolutionary algorithms using distributed learning strategies and applies these algorithms for the identification of unknown systems. Very few existing studies have been reported in which these distributed learning strategies are experimented for the parameter estimation task.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 12 no. 2
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 1 March 1990

EFTHIMIS N. EFTHIMIADIS

This review reports on the current state and the potential of tools and systems designed to aid online searching, referred to here as online searching aids. Intermediary…

239

Abstract

This review reports on the current state and the potential of tools and systems designed to aid online searching, referred to here as online searching aids. Intermediary mechanisms are examined in terms of the two stage model, i.e. end‐user, intermediary, ‘raw database’, and different forms of user — system interaction are discussed. The evolution of the terminology of online searching aids is presented with special emphasis on the expert/non‐expert division. Terms defined include gateways, front‐end systems, intermediary systems and post‐processing. The alternative configurations that such systems can have and the approaches to the design of the user interface are discussed. The review then analyses the functions of online searching aids, i.e. logon procedures, access to hosts, help features, search formulation, query reformulation, database selection, uploading, downloading and post‐processing. Costs are then briefly examined. The review concludes by looking at future trends following recent developments in computer science and elsewhere. Distributed expert based information systems (debis), the standard generalised mark‐up language (SGML), the client‐server model, object‐orientation and parallel processing are expected to influence, if they have not done so already, the design and implementation of future online searching aids.

Details

Journal of Documentation, vol. 46 no. 3
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 December 1994

Gerti Kappel and Stefan Vieweg

Changes in market and production profiles require a more flexibleconcept in manufacturing. Computer integrated manufacturing (CIM)describes an integrative concept for joining…

1393

Abstract

Changes in market and production profiles require a more flexible concept in manufacturing. Computer integrated manufacturing (CIM) describes an integrative concept for joining business and manufacturing islands. In this context, database technology is the key technology for implementing the CIM philosophy. However, CIM applications are more complex and thus more demanding than traditional database applications such as business and administrative applications. Systematically analyses the database requirements for CIM applications including business and manufacturing tasks. Special emphasis is given on integration requirements due to the distributed, partly isolated nature of CIM applications developed over the years. An illustrative sampling of current efforts in the database community to meet the challenge of non‐standard applications such as CIM is presented.

Details

Integrated Manufacturing Systems, vol. 5 no. 4/5
Type: Research Article
ISSN: 0957-6061

Keywords

Article
Publication date: 1 December 1999

Michael J. Frasciello and John Richardson

Library consortia require automation systems that adequately address the following questions: Can the system support centralized and decentralized server configurations? Does the…

813

Abstract

Library consortia require automation systems that adequately address the following questions: Can the system support centralized and decentralized server configurations? Does the software’s architecture accommodate changing requirements? Does the system provide seamless behavior? Contends that the evolution of distributed enterprise computing technology has brought the library automation industry to a new realization that automation systems engineered with an n‐tiered client/server architecture will best meet the needs of library consortia. Standards‐based distributed processing is the key to the n‐tier client/server paradigm. While some technologies (i.e. UNIX) provide for a single standard on which to define distributed processing, only Microsoft’s Windows NT supports multiple standards. From Microsoft’s perspective, the Windows NT operating system is the middle tier of the n‐tier client/server environment. To truly exploit the middle tier, an application must utilize Microsoft Transaction Server (MTS). Native Windows NT automation systems utilizing MTS are best positioned for the future because MTS assumes an n‐tier architecture with the middle tier (or tiers) deployed on Windows NT Server. “Native” NT applications are built in and for Microsoft Windows NT. Library consortia considering a native Windows NT automation system should evaluate the system’s distributed processing capabilities to determine its applicability to their needs. Library consortia can test a vendor’s claim to scalable distributed processing by asking three questions: Is the software dependent on the type of data being used? Does the software support logical and physical separation (distribution)? Does the software require a systems‐shut down to perform database or application updates?

Details

Library Consortium Management: An International Journal, vol. 1 no. 3/4
Type: Research Article
ISSN: 1466-2760

Keywords

Article
Publication date: 1 November 1986

R.A. Hamilton

The computer systems developed during the 1960s and 1970s made very little impact on management decision. Management Information System design was constrained by three factors …

Abstract

The computer systems developed during the 1960s and 1970s made very little impact on management decision. Management Information System design was constrained by three factors — the technology was large‐scale and inevitably centralised and controlled by data processing staff; the systems were designed by specialist staff who rarely understood the business requirements; and managers themselves had little knowledge or “hands‐on” experience of computers. In the 1980s a greater awareness of the need for planning and better use of personnel information, coupled with the development of distributed processing systems, has presented personnel management with opportunities to use computing technology as a means of increasing the professionalism of practising personnel managers. Effective use will only occur if the implementation of technology is matched by appraisal of skills and organisation within personnel departments. Staff will need a minimum level of computing expertise and some managers will need skills in modelling, particularly financial modelling. The relationship between personnel and data processing needs careful redefining to build a link between the two and data processing staff need to design and communicate an end‐user strategy.

Details

Industrial Management & Data Systems, vol. 86 no. 11/12
Type: Research Article
ISSN: 0263-5577

Keywords

Article
Publication date: 6 January 2022

Ahmad Latifian

Big data has posed problems for businesses, the Information Technology (IT) sector and the science community. The problems posed by big data can be effectively addressed using…

Abstract

Purpose

Big data has posed problems for businesses, the Information Technology (IT) sector and the science community. The problems posed by big data can be effectively addressed using cloud computing and associated distributed computing technology. Cloud computing and big data are two significant past-year problems that allow high-efficiency and competitive computing tools to be delivered as IT services. The paper aims to examine the role of the cloud as a tool for managing big data in various aspects to help businesses.

Design/methodology/approach

This paper delivers solutions in the cloud for storing, compressing, analyzing and processing big data. Hence, articles were divided into four categories: articles on big data storage, articles on big data processing, articles on analyzing and finally, articles on data compression in cloud computing. This article is based on a systematic literature review. Also, it is based on a review of 19 published papers on big data.

Findings

From the results, it can be inferred that cloud computing technology has features that can be useful for big data management. Challenging issues are raised in each section. For example, in storing big data, privacy and security issues are challenging.

Research limitations/implications

There were limitations to this systematic review. The first limitation is that only English articles were reviewed. Also, articles that matched the keywords were used. Finally, in this review, authoritative articles were reviewed, and slides and tutorials were avoided.

Practical implications

The research presents new insight into the business value of cloud computing in interfirm collaborations.

Originality/value

Previous research has often examined other aspects of big data in the cloud. This article takes a new approach to the subject. It allows big data researchers to comprehend the various aspects of big data management in the cloud. In addition, setting an agenda for future research saves time and effort for readers searching for topics within big data.

Details

Kybernetes, vol. 51 no. 6
Type: Research Article
ISSN: 0368-492X

Keywords

Article
Publication date: 1 August 2016

Bao-Rong Chang, Hsiu-Fen Tsai, Yun-Che Tsai, Chin-Fu Kuo and Chi-Chung Chen

The purpose of this paper is to integrate and optimize a multiple big data processing platform with the features of high performance, high availability and high scalability in big…

Abstract

Purpose

The purpose of this paper is to integrate and optimize a multiple big data processing platform with the features of high performance, high availability and high scalability in big data environment.

Design/methodology/approach

First, the integration of Apache Hive, Cloudera Impala and BDAS Shark make the platform support SQL-like query. Next, users can access a single interface and select the best performance of big data warehouse platform automatically by the proposed optimizer. Finally, the distributed memory storage system Memcached incorporated into the distributed file system, Apache HDFS, is employed for fast caching query results. Therefore, if users query the same SQL command, the same result responds rapidly from the cache system instead of suffering the repeated searches in a big data warehouse and taking a longer time to retrieve.

Findings

As a result the proposed approach significantly improves the overall performance and dramatically reduces the search time as querying a database, especially applying for the high-repeatable SQL commands under multi-user mode.

Research limitations/implications

Currently, Shark’s latest stable version 0.9.1 does not support the latest versions of Spark and Hive. In addition, this series of software only supports Oracle JDK7. Using Oracle JDK8 or Open JDK will cause serious errors, and some software will be unable to run.

Practical implications

The problem with this system is that some blocks are missing when too many blocks are stored in one result (about 100,000 records). Another problem is that the sequential writing into In-memory cache wastes time.

Originality/value

When the remaining memory capacity is 2 GB or less on each server, Impala and Shark will have a lot of page swapping, causing extremely low performance. When the data scale is larger, it may cause the JVM I/O exception and make the program crash. However, when the remaining memory capacity is sufficient, Shark is faster than Hive and Impala. Impala’s consumption of memory resources is between those of Shark and Hive. This amount of remaining memory is sufficient for Impala’s maximum performance. In this study, each server allocates 20 GB of memory for cluster computing and sets the amount of remaining memory as Level 1: 3 percent (0.6 GB), Level 2: 15 percent (3 GB) and Level 3: 75 percent (15 GB) as the critical points. The program automatically selects Hive when memory is less than 15 percent, Impala at 15 to 75 percent and Shark at more than 75 percent.

1 – 10 of over 75000