Search results
1 – 10 of over 24,000 results
Abstract
Recent growth of the Internet has greatly increased the amount of information that is accessible and the number of resources that are available to users. To exploit this growth, it must be possible for users to find the information and resources they need. Existing techniques for organizing systems have evolved from those used on centralized systems, but these techniques are inadequate for organizing information on a global scale. This article describes Prospero, a distributed file system based on the Virtual System Model. Prospero provides tools to help users organize Internet resources. These tools allow users to construct customized views of available resources, while taking advantage of the structure imposed by others. Prospero provides a framework that can tie together various indexing services producing the fabric on which resource discovery techniques can be applied.
Seokmo Gu, Aria Seo and Yei-chang Kim
Abstract
Purpose
The purpose of this paper is to propose a transcoding system based on a virtual machine in a cloud computing environment. There are many studies about transmitting realistic media through a network. As the size of realistic media data is very large, it is difficult to transmit them over current network bandwidth. Thus, a method of encoding by compressing the data using a new encoding technique is necessary. The next-generation encoding technique, high-efficiency video coding (HEVC), can encode video at a higher compression rate than the existing encoding techniques MPEG-2 and H.264. Yet, HEVC encoding takes at least ten times longer than the existing techniques.
Design/methodology/approach
This paper attempts to solve this time problem using a virtual machine in a cloud computing environment.
Findings
By calculating the transcoding time of the proposed technique, it was found that the time was reduced compared to existing techniques.
Originality/value
To this end, this paper proposes a transcoding method appropriate for the transmission of realistic media by dynamically allocating the resources of the virtual machine.
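The approach above can be sketched as fanning video segments out to a pool of workers, with a stand-in for the actual HEVC encode; `transcode_segment`, the segment names and the worker count are hypothetical placeholders, not the paper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def transcode_segment(seg):
    # Stand-in for an HEVC encode of one video segment; the real call
    # would run on a dynamically allocated VM in the cloud.
    time.sleep(0.01)
    return f"{seg}.hevc"

segments = [f"seg{i:03d}" for i in range(8)]

# Fan the segments out across workers, mimicking dynamic VM allocation.
with ThreadPoolExecutor(max_workers=4) as pool:
    encoded = list(pool.map(transcode_segment, segments))
```

In the paper's setting, each worker would map to a dynamically allocated virtual machine rather than a local thread.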
Abstract
Purpose
The purpose of this paper is to look at the recent growth of the Internet, and how it has greatly increased the amount of information that is accessible and the number of resources that are available to users. To exploit this growth, it must be possible for users to find the information and resources they need. Existing techniques for organizing systems have evolved from those used on centralized systems, but these techniques are inadequate for organizing information on a global scale.
Design/methodology/approach
The paper describes Prospero, a distributed file system based on the Virtual System Model. Prospero provides tools to help users organize Internet resources.
Findings
These tools allow users to construct customized views of available resources, while taking advantage of the structure imposed by others.
Originality/value
Prospero provides a framework that can tie together various indexing services, producing the fabric on which resource discovery techniques can be applied.
Abstract
Purpose
This work can be used as a building block in other settings such as GPU, Map-Reduce, Spark or any other. Also, DDPML can be deployed on other distributed systems such as P2P networks, clusters, cloud computing or other technologies.
Design/methodology/approach
In the age of Big Data, all companies want to benefit from large amounts of data. These data can help them understand their internal and external environment and anticipate associated phenomena, as the data turn into knowledge that can be used for prediction later. This knowledge thus becomes a great asset in companies' hands, which is precisely the objective of data mining. With large amounts of data and knowledge now produced at a faster pace, the authors speak of Big Data mining. For this reason, the authors' proposed work mainly aims at solving the problems of volume, veracity, validity and velocity when classifying Big Data using distributed and parallel processing techniques.

The problem raised in this work is how machine learning algorithms can be made to work in a distributed and parallel way at the same time without losing the accuracy of classification results. To solve this problem, the authors propose a system called Dynamic Distributed and Parallel Machine Learning (DDPML). The work is divided into two parts. In the first, the authors propose a distributed architecture controlled by a Map-Reduce algorithm that in turn depends on a random sampling technique. The distributed architecture the authors designed is specially directed at big data processing and operates coherently and efficiently with the sampling strategy proposed in this work. This architecture also helps the authors verify the classification results obtained using the representative learning base (RLB). In the second part, the authors extract the representative learning base by sampling at two levels using the stratified random sampling method. The same sampling method is applied to extract the shared learning base (SLB) and the partial learning bases for the first level (PLBL1) and the second level (PLBL2).
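The two-level stratified sampling step can be sketched as follows; the data set, labels and sampling fractions are invented for illustration, and `stratified_sample` is a generic routine, not the authors' DDPML code.

```python
import random
from collections import defaultdict

def stratified_sample(records, label_of, fraction, seed=0):
    """Draw a stratified random sample: the same fraction from each class,
    so class proportions in the sample mirror those in the source."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r in records:
        strata[label_of(r)].append(r)
    sample = []
    for label, items in strata.items():
        k = max(1, round(len(items) * fraction))
        sample.extend(rng.sample(items, k))
    return sample

# Two-level sampling: first extract a shared base, then a partial base from it.
data = [(i, "pos" if i % 3 else "neg") for i in range(300)]   # toy labelled data
slb = stratified_sample(data, lambda r: r[1], 0.50)           # shared learning base
plbl1 = stratified_sample(slb, lambda r: r[1], 0.40)          # first-level partial base
```

Sampling the second level from the first, as above, is what makes the extraction "two-level": each partial base preserves the class balance of the base it was drawn from.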
The experimental results show the efficiency of the proposed solution without significant loss of classification accuracy. Thus, in practical terms, the DDPML system is generally dedicated to big data mining processing and works effectively in distributed systems with a simple structure, such as client-server networks.
Findings
The authors got very satisfactory classification results.
Originality/value
DDPML system is specially designed to smoothly handle big data mining classification.
Alexander Döschl, Max-Emanuel Keller and Peter Mandl
Abstract
Purpose
This paper aims to evaluate different approaches for the parallelization of compute-intensive tasks. The study compares a Java multi-threaded algorithm, distributed computing solutions with MapReduce (Apache Hadoop) and resilient distributed data set (RDD) (Apache Spark) paradigms and a graphics processing unit (GPU) approach with Numba for compute unified device architecture (CUDA).
Design/methodology/approach
The paper uses a simple but computationally intensive puzzle as a case study for experiments. To find all solutions using brute force search, 15! permutations had to be computed and tested against the solution rules. The experimental application comprises a Java multi-threaded algorithm, distributed computing solutions with MapReduce (Apache Hadoop) and RDD (Apache Spark) paradigms and a GPU approach with Numba for CUDA. The implementations were benchmarked on Amazon EC2 instances for performance and scalability measurements.
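On a smaller toy instance, the prefix-partitioned brute-force search common to all four implementations might look like this; the puzzle rule and the size N = 6 are stand-ins for the paper's 15-element puzzle, not its actual rules.

```python
from itertools import permutations
from concurrent.futures import ProcessPoolExecutor

N = 6  # toy size; the paper brute-forces 15! permutations the same way

def satisfies(p):
    # Toy rule standing in for the puzzle's solution rules:
    # adjacent elements must alternate in parity.
    return all((a + b) % 2 == 1 for a, b in zip(p, p[1:]))

def count_with_prefix(first):
    # Each worker owns the (N-1)! permutations starting with `first`,
    # mirroring how MapReduce/Spark tasks would partition the search space.
    rest = [x for x in range(N) if x != first]
    return sum(satisfies((first,) + p) for p in permutations(rest))

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        total = sum(pool.map(count_with_prefix, range(N)))
    print(total)
```

Fixing the first element gives N independent, equally sized sub-searches, which is why the processing times of the distributed solutions scale approximately linearly with the number of workers.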
Findings
The comparison of the solutions with Apache Hadoop and Apache Spark under Amazon EMR showed that the processing time measured in CPU minutes with Spark was up to 30% lower, while the performance of Spark especially benefits from an increasing number of tasks. With the CUDA implementation, more than 16 times faster execution is achievable for the same price compared to the Spark solution. Apart from the multi-threaded implementation, the processing times of all solutions scale approximately linearly. Finally, several application suggestions for the different parallelization approaches are derived from the insights of this study.
Originality/value
There are numerous studies that have examined the performance of parallelization approaches. Most of these studies deal with processing large amounts of data or mathematical problems. This work, in contrast, compares these technologies on their ability to implement computationally intensive distributed algorithms.
Abstract
By formulating a vision that provides for a solid foundation for the virtual library, we can dramatically improve existing library services and create new ones with added value. The new library paradigm will be built on software and hardware information technology. Related requirements include distributed computing and networking; open architectures and standards; authentication, authorization, and encryption; and billing and royalty tracking. The “virtual library tool kit” will include reduced dependence on word indexing and keyword/Boolean retrieval; development and application of natural language processing; and effective tools for navigation of networks. Carnegie Mellon University offers some helpful examples of how information technology and information retrieval may be used to build the virtual library.
Abstract
Provides a new answer to the resource discovery problem, which arises because although the Internet makes it possible for users to retrieve enormous amounts of information, it provides insufficient support for locating the specific information that is needed. ALIBI (Adaptive Location of Internetworked Bases of Information) is a new tool that succeeds in locating information without the use of centralized resource catalogs, navigation, or costly searching. Its powerful query‐based interface eliminates the need for the user to connect to one network site after another to find information or to wrestle with overloaded centralized catalogs and archives. This functionality was made possible by an assortment of significant new algorithms and techniques, including classification‐based query routing, fully distributed cooperative caching, and a query language that combines the practicality of Boolean logic with the expressive power of text retrieval. The resulting information system is capable of providing fully automatic resource discovery and retrieval access to a limitless variety of information bases.
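A minimal sketch of what classification-based query routing might look like; the category names, server hostnames and keyword classifier here are invented placeholders, not ALIBI's actual scheme.

```python
# Hypothetical routing table: information bases grouped by subject class.
ROUTES = {
    "networking": ["archive-a.example.org"],
    "databases":  ["archive-b.example.org", "archive-c.example.org"],
}

def classify(query):
    # Stand-in classifier: naive keyword lookup instead of ALIBI's scheme.
    q = query.lower()
    return [c for c in ROUTES if c in q]

def route(query):
    # Forward the query only to servers whose subject classes match it,
    # avoiding both centralized catalogs and exhaustive searching.
    return sorted({server for c in classify(query) for server in ROUTES[c]})

print(route("distributed databases AND caching"))
```

The point of the design is that a query touches only the servers whose classes match it, so no centralized catalog has to be consulted.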
Bao-Rong Chang, Hsiu-Fen Tsai, Yun-Che Tsai, Chin-Fu Kuo and Chi-Chung Chen
Abstract
Purpose
The purpose of this paper is to integrate and optimize a multiple big data processing platform with the features of high performance, high availability and high scalability in big data environment.
Design/methodology/approach
First, the integration of Apache Hive, Cloudera Impala and BDAS Shark makes the platform support SQL-like queries. Next, users can access a single interface, and the proposed optimizer automatically selects the best-performing big data warehouse platform. Finally, the distributed memory storage system Memcached, incorporated into the distributed file system Apache HDFS, is employed for fast caching of query results. Therefore, if users issue the same SQL command, the same result is returned rapidly from the cache system instead of repeating the search in the big data warehouse and taking longer to retrieve.
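The cache-aside flow described above can be sketched as follows; a plain dictionary stands in for the Memcached cluster, and the `backend` callable stands in for the Hive/Impala/Shark warehouse — neither is the paper's actual implementation.

```python
import hashlib

class QueryCache:
    """Cache-aside sketch: serve repeated SQL from cache, else hit the warehouse."""
    def __init__(self, backend):
        self.backend = backend   # callable: SQL string -> result rows
        self.store = {}          # stand-in for the Memcached cluster
        self.hits = 0

    def query(self, sql):
        key = hashlib.sha256(sql.encode()).hexdigest()
        if key in self.store:            # repeated SQL: serve cached result
            self.hits += 1
            return self.store[key]
        result = self.backend(sql)       # miss: run it on the warehouse
        self.store[key] = result
        return result

warehouse = QueryCache(lambda sql: [("row", sql)])
warehouse.query("SELECT * FROM t")
warehouse.query("SELECT * FROM t")   # same SQL -> answered from cache
```

Keying the cache on a hash of the SQL text is what makes the speedup specific to highly repeated commands under multi-user mode.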
Findings
As a result, the proposed approach significantly improves overall performance and dramatically reduces search time when querying a database, especially for highly repeated SQL commands under multi-user mode.
Research limitations/implications
Currently, Shark’s latest stable version 0.9.1 does not support the latest versions of Spark and Hive. In addition, this series of software only supports Oracle JDK7. Using Oracle JDK8 or Open JDK will cause serious errors, and some software will be unable to run.
Practical implications
The problem with this system is that some blocks are missing when too many blocks are stored in one result (about 100,000 records). Another problem is that sequential writing into the in-memory cache wastes time.
Originality/value
When the remaining memory capacity is 2 GB or less on each server, Impala and Shark will have a lot of page swapping, causing extremely low performance. When the data scale is larger, it may cause the JVM I/O exception and make the program crash. However, when the remaining memory capacity is sufficient, Shark is faster than Hive and Impala. Impala’s consumption of memory resources is between those of Shark and Hive. This amount of remaining memory is sufficient for Impala’s maximum performance. In this study, each server allocates 20 GB of memory for cluster computing and sets the amount of remaining memory as Level 1: 3 percent (0.6 GB), Level 2: 15 percent (3 GB) and Level 3: 75 percent (15 GB) as the critical points. The program automatically selects Hive when memory is less than 15 percent, Impala at 15 to 75 percent and Shark at more than 75 percent.
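The memory-level selection rule reported above can be sketched as a simple threshold function; the function name and the expression of free memory as a fraction are assumptions for illustration, not the study's code.

```python
def choose_engine(free_fraction):
    """Pick a warehouse engine from the remaining-memory fraction,
    following the thresholds reported in the abstract."""
    if free_fraction < 0.15:
        return "Hive"    # low memory: Hive degrades most gracefully
    if free_fraction <= 0.75:
        return "Impala"  # mid range: Impala's footprint fits here
    return "Shark"       # ample memory: Shark is fastest

TOTAL_GB = 20  # per-server cluster memory allocated in the study
print(choose_engine(15 / TOTAL_GB))   # 75% boundary, Level 3
```

Mapping the study's levels: 0.6 GB of 20 GB is 3 percent (well under the 15 percent cut, so Hive), 3 GB is exactly 15 percent (Impala), and 15 GB is 75 percent (the Shark boundary).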
Priyadarshini R., Latha Tamilselvan and Rajendran N.
Abstract
Purpose
The purpose of this paper is to propose a fourfold semantic similarity that results in more accuracy compared to the existing literature. Change detection in the URL and the recommendation of source documents are facilitated by means of a framework in which the fourfold semantic similarity is applied. The latest trends in technology emerge with the continuous growth of resources on the collaborative web. This interactive and collaborative web poses big challenges for recent technologies like cloud and big data.
Design/methodology/approach
The enormously growing resources should be accessed in a more efficient manner, and this requires clustering and classification techniques. The resources on the web are described in a more meaningful manner.
Findings
They can be described in the form of metadata constituted by the resource description framework (RDF). A fourfold similarity is proposed, compared to the threefold similarity proposed in the existing literature. The fourfold similarity includes semantic annotation based on named entity recognition in the user interface, domain-based concept matching with improvised score-based classification of domain-based concept matching based on ontology, a sequence-based word sensing algorithm and RDF-based updating of triples. All these similarity measures are aggregated across components such as the semantic user interface, semantic clustering, sequence-based classification and the semantic recommendation system with RDF updating in change detection.
Research limitations/implications
The existing work suggests that linking resources semantically increases the retrieving and searching ability. Previous literature shows that keywords can be used to retrieve linked information from the article to determine the similarity between the documents using semantic analysis.
Practical implications
These traditional systems also suffer from scalability and efficiency issues. The proposed study designs a model that pulls and prioritizes knowledge-based content from the Hadoop distributed framework. This study also proposes a Hadoop-based pruning system and recommendation system.
Social implications
The pruning system gives an alert about the dynamic changes in the article (virtual document). The changes in the document are automatically updated in the RDF document. This helps in semantic matching and retrieval of the most relevant source with the virtual document.
Originality/value
The recommendation and detection of changes in the blogs are performed semantically using n-triples and automated data structures. The user-focussed and choice-based crawling proposed in this system also assists collaborative filtering, which in turn recommends the user-focussed source documents. The entire clustering and retrieval system is deployed on multi-node Hadoop in the Amazon AWS environment, and graphs are plotted and analyzed.
Zhihua Li, Zianfei Tang and Yihua Yang
Abstract
Purpose
Highly efficient processing of mass data is a primary issue in building and maintaining a security video surveillance system. This paper aims to focus on the architecture of a security video surveillance system based on Hadoop parallel processing technology in a big data environment.
Design/methodology/approach
A hardware framework of security video surveillance network cascaded system (SVSNCS) was constructed on the basis of Internet of Things, network cascade technology and Hadoop platform. Then, the architecture model of SVSNCS was proposed using the Hadoop and big data processing platform.
Findings
Finally, we suggested a video processing procedure in accordance with the characteristics of the cascade network.
Originality/value
Our paper, which focused on the architecture of a security video surveillance system in a big data environment on the basis of Hadoop parallel processing technology, provides high-quality video surveillance services for the security domain.