Search results

1 – 10 of 36
Article
Publication date: 1 August 2016

Bao-Rong Chang, Hsiu-Fen Tsai, Yun-Che Tsai, Chin-Fu Kuo and Chi-Chung Chen

Abstract

Purpose

The purpose of this paper is to integrate and optimize multiple big data processing platforms with the features of high performance, high availability and high scalability in a big data environment.

Design/methodology/approach

First, the integration of Apache Hive, Cloudera Impala and BDAS Shark makes the platform support SQL-like queries. Next, users access a single interface, and the proposed optimizer automatically selects the best-performing big data warehouse platform. Finally, the distributed memory storage system Memcached, incorporated into the distributed file system Apache HDFS, is employed to cache query results. Therefore, when users issue the same SQL command, the result is returned rapidly from the cache system instead of repeating the search in the big data warehouse, which takes far longer.
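
The caching step lends itself to a short illustration. Below is a minimal sketch of SQL-result caching keyed by a digest of the query string, using memcached through the pymemcache client; the endpoint and the run_warehouse_query function are illustrative assumptions, not part of the paper's system.

```python
import hashlib
import json

from pymemcache.client.base import Client  # pip install pymemcache

cache = Client(("localhost", 11211))  # assumed memcached endpoint

def run_warehouse_query(sql):
    # Placeholder for the slow path: dispatching the SQL command to
    # Hive, Impala or Shark and collecting the result rows.
    raise NotImplementedError

def cached_query(sql, ttl=3600):
    # Key the cache on a digest of the normalized SQL text, so the same
    # command issued by any user hits the same cached result.
    key = hashlib.sha1(sql.strip().lower().encode()).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)          # fast path: answer from cache
    rows = run_warehouse_query(sql)     # slow path: big data warehouse
    cache.set(key, json.dumps(rows), expire=ttl)
    return rows
```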

Findings

As a result, the proposed approach significantly improves overall performance and dramatically reduces search time when querying a database, especially for highly repeated SQL commands under multi-user mode.

Research limitations/implications

Currently, Shark's latest stable version, 0.9.1, does not support the latest versions of Spark and Hive. In addition, this software stack only supports Oracle JDK 7; using Oracle JDK 8 or OpenJDK causes serious errors, and some of the software will not run.

Practical implications

One problem with this system is that some blocks go missing when too many blocks are stored in one result (about 100,000 records). Another problem is that sequential writing into the in-memory cache wastes time.

Originality/value

When the remaining memory capacity on each server is 2 GB or less, Impala and Shark incur heavy page swapping, causing extremely low performance. When the data scale grows larger, this may cause JVM I/O exceptions and make the program crash. However, when the remaining memory capacity is sufficient, Shark is faster than Hive and Impala. Impala's consumption of memory resources lies between those of Shark and Hive, and that amount of remaining memory is sufficient for Impala's maximum performance. In this study, each server allocates 20 GB of memory for cluster computing and divides the remaining memory at three critical points: Level 1 at 3 percent (0.6 GB), Level 2 at 15 percent (3 GB) and Level 3 at 75 percent (15 GB). The program automatically selects Hive when remaining memory is below 15 percent, Impala at 15 to 75 percent and Shark above 75 percent.
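
The selection rule above is simple enough to state in code. The following is a sketch of the three-level policy under stated assumptions: the free-memory probe via psutil and the engine names as plain strings are illustrative stand-ins, not the authors' implementation.

```python
import psutil  # pip install psutil

PER_SERVER_MEM_GB = 20.0  # per-server allocation used in the study

def pick_engine(remaining_fraction):
    # Thresholds from the paper: Hive below 15 percent, Impala between
    # 15 and 75 percent, Shark above 75 percent remaining memory.
    if remaining_fraction < 0.15:
        return "hive"
    if remaining_fraction <= 0.75:
        return "impala"
    return "shark"

remaining = psutil.virtual_memory().available / (PER_SERVER_MEM_GB * 2**30)
print(pick_engine(min(remaining, 1.0)))
```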

Article
Publication date: 21 November 2008

Rabab Hayek, Guillaume Raschia, Patrick Valduriez and Noureddine Mouaddib

Abstract

Purpose

The goal of this paper is to contribute to the development of both data localization and description techniques in P2P systems.

Design/methodology/approach

The approach introduces a novel indexing technique, based on linguistic data summarization, into the context of P2P systems.
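
To suggest how summaries can prune routing, here is a minimal sketch in which each peer's linguistic summary is reduced to a plain set of descriptor terms; real summaries are far richer, so treat the matching rule as a deliberately simplified assumption.

```python
# Each peer advertises a summary: a set of descriptors covering its data.
peer_summaries = {
    "peer-a": {"temperature", "paris", "2008"},
    "peer-b": {"rainfall", "nantes"},
    "peer-c": {"temperature", "nantes"},
}

def route(query_terms):
    # Forward the query only to peers whose summary intersects it,
    # instead of flooding every neighbor.
    return [p for p, s in peer_summaries.items() if s & set(query_terms)]

print(route({"temperature", "nantes"}))  # -> ['peer-a', 'peer-c']
```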

Findings

The cost model of the approach, as well as the simulation results, shows that the approach allows efficient maintenance of data summaries without incurring high traffic overhead. In addition, the cost of query routing is significantly reduced when summaries are used.

Research limitations/implications

The paper considers a summary service defined on the APPA architecture. Future work should extend this approach so that it is generally applicable to any P2P data management system.

Practical implications

This paper has mainly studied the quantitative gain that can be obtained in query processing from exploiting data summaries. Future work aims to apply this technique to real (not synthetic) data in order to study the qualitative gain that can be obtained from approximately answering a query.

Originality/value

The novelty of the approach lies in the double exploitation of summaries in P2P systems: data summaries allow for semantic-based query routing and also for approximate query answering, using their intensional descriptions.

Details

International Journal of Pervasive Computing and Communications, vol. 4 no. 4
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 4 June 2018

Kento Goto, Misato Kotani and Motomichi Toyama

Abstract

Purpose

Currently, the results retrieved from a database are presented in various ways, but users' understanding is likely to improve when some search results, such as product images on shopping sites, are presented in three dimensions rather than two. Therefore, this paper aims to propose a system that automatically generates a 3D virtual museum, arranging 3D objects in various layouts from the results retrieved from a relational database by a SuperSQL query.

Design/methodology/approach

The study extended SuperSQL to generate a 3D virtual reality museum using declarative queries on relational data stored in a database.
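
To make the idea concrete, here is a small sketch that is not SuperSQL itself: it pulls exhibit rows out of SQLite with plain SQL and lays them out as entries of a toy scene description, one pedestal per artifact. The table and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exhibit (name TEXT, image TEXT)")
conn.executemany("INSERT INTO exhibit VALUES (?, ?)",
                 [("Vase", "vase.png"), ("Coin", "coin.png"), ("Mask", "mask.png")])

# Lay the query result out as a row of pedestals, 2 units apart,
# mimicking how a declarative query could drive a 3D layout.
scene = []
for i, (name, image) in enumerate(conn.execute("SELECT name, image FROM exhibit")):
    scene.append({"label": name, "texture": image, "position": (2.0 * i, 0.0, 0.0)})

for node in scene:
    print(node)
```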

Findings

This system made it possible to generate various three-dimensional virtual spaces with different layouts through simple queries.

Originality/value

This system is useful in that a complicated three-dimensional virtual space can be generated from a simple query, and a different three-dimensional virtual space can be generated by slightly changing the query or the database content. When creating a virtual museum with many exhibits, or when changing the layout, the burden on the user would otherwise be high; this system makes it possible to generate various virtual museums automatically and easily, reducing that burden.

Details

International Journal of Pervasive Computing and Communications, vol. 14 no. 2
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 12 June 2017

Aymen Gammoudi, Allel Hadjali and Boutheina Ben Yaghlane

Abstract

Purpose

Time modeling is a crucial feature in many application domains. However, temporal information is often not crisp, but subjective and fuzzy. The purpose of this paper is to address the issues related to the modeling and handling of the imperfection inherent in both temporal relations and intervals.

Design/methodology/approach

On the one hand, fuzzy extensions of Allen temporal relations are investigated and, on the other hand, extended temporal relations to define the positions of two fuzzy time intervals are introduced. Then, a database system, called Fuzzy Temporal Information Management and Exploitation (Fuzz-TIME), is developed for the purpose of processing fuzzy temporal queries.
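
One standard way to fuzzify Allen's "before" is to ask for the possibility that the fuzzy end of interval A precedes the fuzzy start of interval B. The sketch below discretizes trapezoidal memberships on a grid and computes that possibility degree; it is an illustrative reconstruction, not the Fuzz-TIME implementation.

```python
import numpy as np

def trapezoid(x, a, b, c, d):
    # Trapezoidal membership with support [a, d] and core [b, c].
    return np.clip(np.minimum((x - a) / max(b - a, 1e-9),
                              (d - x) / max(d - c, 1e-9)), 0.0, 1.0)

def poss_before(end_a, start_b, grid):
    # Poss(end_a <= start_b) = sup over x <= y of min(mu_A(x), mu_B(y)).
    mu_a = trapezoid(grid, *end_a)
    mu_b = trapezoid(grid, *start_b)
    # A suffix max gives sup over y >= x of mu_B(y) in one pass.
    mu_b_tail = np.maximum.accumulate(mu_b[::-1])[::-1]
    return float(np.max(np.minimum(mu_a, mu_b_tail)))

grid = np.linspace(0, 100, 2001)
# "Ends around 20-30" fully precedes "starts around 40-50": degree 1.0.
print(poss_before((18, 20, 30, 32), (38, 40, 50, 52), grid))
```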

Findings

To evaluate the proposal, the authors have implemented the Fuzz-TIME system and created a fuzzy historical database for querying purposes. Some demonstrative scenarios from the history domain are proposed and discussed.

Research limitations/implications

The authors have conducted some experiments on archaeological data to show the effectiveness of the Fuzz-TIME system. However, thorough experiments on large-scale databases are highly desirable to show the behavior of the tool with respect to performance and execution-time criteria.

Practical implications

The tool developed (Fuzz-TIME) can have many practical applications wherever temporal information has to be dealt with, in particular in real-world domains such as history, medicine, criminal investigation and finance, where time is often perceived or expressed in an imprecise/fuzzy manner.

Social implications

Social implications of this work can be expected in two domains in particular: in museums, to manage, exploit and analyse information related to archives and historical data; and in hospitals and medical organizations, to deal with temporal information inherent in data about patients and diseases.

Originality/value

This paper presents the design and characterization of a novel and intelligent database system to process and manage the imperfection inherent to both temporal relations and intervals.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 10 no. 2
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 1 May 2005

Wei Xing, Marios D. Dikaiakos, Hua Yang, Angelos Sphyris and George Eftichidis

Abstract

Purpose

This paper aims to describe the main challenges of identifying and accessing useful information and knowledge about natural hazards and disaster research results. The paper presents a grid‐based digital library system designed to address these challenges.

Design/methodology/approach

The need to organize and publish metadata about European research results in the field of natural disasters has been met with the help of two innovative technologies: the Open Grid Service Architecture (OGSA) and the Resource Description Framework (RDF). OGSA provides a common platform for sharing distributed metadata securely. RDF facilitates the creation and exchange of metadata.
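
RDF's role here is easy to show. The snippet below builds a tiny metadata record with rdflib and serializes it to Turtle; the namespace URI and the record's fields are invented placeholders rather than the system's actual schema.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DC, RDF

EX = Namespace("http://example.org/hazards/")  # placeholder namespace

g = Graph()
result = URIRef(EX["project-042"])
g.add((result, RDF.type, EX.ResearchResult))
g.add((result, DC.title, Literal("Flood risk model for the Danube basin")))
g.add((result, DC.subject, Literal("natural hazards")))

print(g.serialize(format="turtle"))
```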

Findings

Using grid technology allows the RDF metadata of European research results in the field of natural disasters to be shared securely and effectively in a heterogeneous network environment.

Originality/value

A metadata approach is proposed whereby metadata can be extracted, distributed to third parties in batch, and shared with other applications quickly. Furthermore, a method is set out to describe metadata in a common and open format, which can become a widely accepted standard; the existence of a common standard enables metadata storage on different platforms while supporting distributed queries across different metadata databases, the integration of metadata extracted from different sources, and use by general‐purpose search engines.

Details

Library Management, vol. 26 no. 4/5
Type: Research Article
ISSN: 0143-5124

Article
Publication date: 1 January 1990

Alan F. Smeaton

Abstract

Database management systems (DBMS) and information retrieval (IR) systems can both be used as online information systems, but they differ in the type of data and the types of retrieval they provide for users. Many previous attempts have been made to couple DBMS and IR systems together, either by integrating the two into a unified framework or by using a DBMS as an implementation tool for information retrieval functionality. This paper reports on some of these previous attempts and describes a system, retriev, which uses a DBMS to implement an IR system for teaching and research purposes. The implementation of retriev is described in detail, and the effects that current trends in database research will have on the relationship between DBMS and IR systems are discussed.
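
The core trick of implementing IR on a DBMS, an inverted index held in relational tables and ranked with SQL aggregation, fits in a few lines. This sketch uses SQLite and raw term-frequency scoring as a stand-in; retriev's actual schema and ranking function are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE docs(id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE postings(term TEXT, doc_id INTEGER, tf INTEGER);
""")

def add_doc(doc_id, body):
    # Store the document and one posting row per distinct term.
    conn.execute("INSERT INTO docs VALUES (?, ?)", (doc_id, body))
    words = body.lower().split()
    for term in set(words):
        conn.execute("INSERT INTO postings VALUES (?, ?, ?)",
                     (term, doc_id, words.count(term)))

add_doc(1, "database systems for information retrieval")
add_doc(2, "retrieval of information from text collections")

# Rank documents by summed term frequency across the query terms.
rows = conn.execute("""
SELECT doc_id, SUM(tf) AS score FROM postings
WHERE term IN ('information', 'retrieval')
GROUP BY doc_id ORDER BY score DESC
""").fetchall()
print(rows)
```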

Details

Program, vol. 24 no. 1
Type: Research Article
ISSN: 0033-0337

Article
Publication date: 1 May 2003

M.P. Evans and S.M. Furnell

Abstract

Web resource usage statistics enable server owners to monitor how their users use their Web sites. However, such statistics are only compiled for individual servers. If resource usage were monitored across the whole Web, the changing interests of society would be revealed and deep insights gained into the changing nature of the Web. However, capturing the information required for such a service, while providing acceptable system performance, presents significant challenges. We have therefore developed a model, called WebRUM, which offers a scalable system‐wide solution through the extension of a resource migration mechanism that we have previously designed. The paper describes the mechanism and shows how it can be extended to monitor Web‐wide resource usage. The information stored by the model is defined, and the performance of a prototype mechanism is presented to demonstrate the effectiveness of the design.

Details

Campus-Wide Information Systems, vol. 20 no. 2
Type: Research Article
ISSN: 1065-0741

Article
Publication date: 11 May 2015

Alejandro Vera-Baquero, Ricardo Colomo Palacios, Vladimir Stantchev and Owen Molloy

Abstract

Purpose

This paper aims to present a solution that enables organizations to monitor and analyse the performance of their business processes by means of Big Data technology. Business process improvement can drastically influence the profit of corporations and helps them remain viable. However, traditional Business Intelligence systems are not sufficient to meet today's business needs. They are normally business-domain-specific and have not been sufficiently process-aware to support process improvement-type activities, especially on large and complex supply chains, where improvement entails integrating, monitoring and analysing a vast amount of dispersed, unstructured event logs produced in a variety of heterogeneous environments. This paper tackles this variability by devising different Big-Data-based approaches that aim to gain visibility into process performance.

Design/methodology/approach

The authors present a cloud-based solution that leverages Big Data technology to provide essential insights into business process improvement. The proposed solution is aimed at measuring and improving overall business performance, especially in very large and complex cross-organisational business processes, where this type of visibility is hard to achieve across heterogeneous systems.

Findings

Three different Big Data approaches have been undertaken based on Hadoop and HBase. First, a map-reduce approach is introduced that is suitable for batch processing and offers very high scalability. Second, an alternative solution integrates the proposed system with Impala; this approach improves significantly on map-reduce, as it focuses on performing real-time queries over HBase. Finally, the use of secondary indexes is proposed with the aim of enabling immediate access to event instances for correlation, at the cost of high storage duplication and synchronization issues. This approach has produced remarkable results in the two real functional environments presented in the paper.
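
The secondary-index approach can be sketched as a two-write pattern against HBase. The example below uses the happybase Thrift client; the table names, column families and key layout are assumptions made for illustration, not the system's actual schema.

```python
import happybase  # pip install happybase; needs an HBase Thrift server

conn = happybase.Connection("localhost")  # assumed Thrift endpoint
events = conn.table("events")
index = conn.table("events_by_correlation")

def store_event(event_id, correlation_key, ts, payload):
    # Primary write: the full event, keyed by its id.
    events.put(event_id.encode(), {b"e:payload": payload.encode()})
    # Secondary write: an index row keyed by correlation key plus
    # timestamp, pointing back at the event id. This duplicates storage
    # but makes correlation lookups immediate, as described above.
    index.put(f"{correlation_key}:{ts:020d}".encode(),
              {b"i:event_id": event_id.encode()})

def events_for(correlation_key):
    # Prefix scan over the index yields all correlated event ids in
    # timestamp order, without scanning the primary table.
    prefix = f"{correlation_key}:".encode()
    return [v[b"i:event_id"] for _, v in index.scan(row_prefix=prefix)]
```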

Originality/value

The value of the contribution lies in the comparison and integration of software packages towards an integrated solution intended for adoption by industry. In addition, the authors illustrate the deployment of the architecture in two different settings.

Details

The Learning Organization, vol. 22 no. 4
Type: Research Article
ISSN: 0969-6474

Article
Publication date: 18 October 2021

Sujan Saha and Sukumar Mandal

Abstract

Purpose

This project aims to improve library services for users by combining Linked Open Data (LOD) technology with data visualization, displaying and analysing search results in an intuitive manner. These services are enhanced by integrating various LOD technologies into the authority control system.

Design/methodology/approach

The technology known as LOD is used to access, recycle, share, exchange and disseminate information, among other things. The applicability of Linked Data technologies for the development of library information services is evaluated in this study.

Findings

Apache Hadoop is used to store and process massive Linked Data datasets rapidly. Apache Spark is a free and open-source data processing engine. Hive is a SQL-based data warehouse that enables data scientists to read, write and manage petabytes of data.
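
The Hadoop/Spark/Hive pipeline described here can be exercised with a few lines of PySpark. The database, table and column names below are hypothetical; the snippet only shows Spark reading a Hive-managed dataset of linked records.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("lod-library")
         .enableHiveSupport()  # read tables managed by the Hive metastore
         .getOrCreate())

# Hypothetical Hive table holding harvested linked-data triples.
links = spark.sql("""
    SELECT subject_uri, predicate, object_uri
    FROM library_lod.triples
    WHERE predicate = 'owl:sameAs'
""")
links.show(10, truncate=False)
```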

Originality/value

Apache HBase, a distributed big data storage system, does not use SQL. This study's goal is to search geographic, authority and bibliographic databases for relevant links found on various websites. When data items are linked together, all of the related data bits are linked as well. The study observed and evaluated the tools and processes and recorded each data item's URL. As a result, data can be combined across silos, enriched by third-party data sources and contextualized.

Details

Library Hi Tech News, vol. 38 no. 6
Type: Research Article
ISSN: 0741-9058

Article
Publication date: 31 December 2006

Terry D. May, Shaun H. Dunning, George A. Dowding and Jason O. Hallstrom

Abstract

Wireless sensor networks (WSNs) will profoundly influence the ubiquitous computing landscape. Their utility derives not from the computational capabilities of any single sensor node, but from the emergent capabilities of many communicating sensor nodes. Consequently, the details of communication within and across single‐hop neighborhoods are a fundamental component of most WSN applications. But these details are often complex, and popular embedded languages for WSNs provide only low‐level communication primitives. We propose that the absence of suitable communication abstractions contributes to the difficulty of developing large‐scale WSN applications. To address this issue, we present the design and implementation of a Remote Procedure Call (RPC) abstraction for nesC and TinyOS, the emerging standard for developing WSN applications. We present the key language extensions, operating system services and automation tools that enable the proposed abstraction. We illustrate these contributions in the context of a representative case study and analyze the overhead introduced when using our approach. We use these results to draw conclusions regarding the suitability of our work for resource‐constrained sensor nodes.
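
The gap the paper addresses, low-level message primitives versus a procedure-call abstraction, can be suggested in miniature. The following sketch marshals a call name and argument into a compact fixed-layout packet and dispatches it on the receiving side; it mimics only the shape of the idea and is unrelated to the actual nesC/TinyOS implementation.

```python
import struct

HANDLERS = {}

def rpc(fn):
    # Register a handler so incoming packets can be dispatched to it.
    HANDLERS[fn.__name__.encode()[:8].ljust(8, b"\x00")] = fn
    return fn

def marshal(name, arg):
    # 8-byte null-padded name plus one 16-bit argument: the kind of
    # tiny, fixed-layout frame a sensor radio message could carry.
    return struct.pack("!8sH", name.encode()[:8], arg)

def dispatch(packet):
    name, arg = struct.unpack("!8sH", packet)
    return HANDLERS[name](arg)

@rpc
def read_adc(channel):
    return f"sample from channel {channel}"

print(dispatch(marshal("read_adc", 3)))
```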

Details

International Journal of Pervasive Computing and Communications, vol. 2 no. 4
Type: Research Article
ISSN: 1742-7371
