Search results

1 – 10 of over 24000
Article
Publication date: 1 January 1992

B. Clifford Neuman

Recent growth of the Internet has greatly increased the amount of information that is accessible and the number of resources that are available to users. To exploit this growth…

Abstract

Recent growth of the Internet has greatly increased the amount of information that is accessible and the number of resources that are available to users. To exploit this growth, it must be possible for users to find the information and resources they need. Existing techniques for organizing systems have evolved from those used on centralized systems, but these techniques are inadequate for organizing information on a global scale. This article describes Prospero, a distributed file system based on the Virtual System Model. Prospero provides tools to help users organize Internet resources. These tools allow users to construct customized views of available resources, while taking advantage of the structure imposed by others. Prospero provides a framework that can tie together various indexing services, producing the fabric on which resource discovery techniques can be applied.
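The abstract's notion of user-constructed views that link to structure built by others can be pictured with a small sketch. The code below illustrates the Virtual System Model idea only; it is not Prospero's actual protocol or data structures, and the host and path names are hypothetical.

```python
# A minimal sketch of the "customized view" idea: each user maps virtual names
# onto resources that live elsewhere and can graft in structure built by other
# users. Illustration only, not Prospero's protocol.

class VirtualView:
    def __init__(self):
        self.links = {}  # virtual name -> (host, remote_path) or another view

    def add_link(self, name, host, remote_path):
        """Bind a virtual name to a resource stored on a remote host."""
        self.links[name] = (host, remote_path)

    def mount(self, name, other_view):
        """Reuse structure imposed by another user by linking their view."""
        self.links[name] = other_view

    def resolve(self, path):
        """Resolve 'a/b/c' to the (host, remote_path) it ultimately names."""
        head, _, rest = path.partition("/")
        target = self.links[head]
        if isinstance(target, VirtualView):
            return target.resolve(rest)
        return target

# Example: my view reuses a colleague's index of technical reports
# (hypothetical host and path).
colleague = VirtualView()
colleague.add_link("kerberos-paper", "ftp.example.edu", "/pub/papers/kerberos.ps")
mine = VirtualView()
mine.mount("papers", colleague)
print(mine.resolve("papers/kerberos-paper"))
```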

Details

Internet Research, vol. 2 no. 1
Type: Research Article
ISSN: 1066-2243

Article
Publication date: 11 May 2015

Seokmo Gu, Aria Seo and Yei-chang Kim

The purpose of this paper is to propose a transcoding system based on a virtual machine in a cloud computing environment. There are many studies about transmitting realistic media through a…

Abstract

Purpose

The purpose of this paper is to propose a transcoding system based on a virtual machine in a cloud computing environment. There are many studies about transmitting realistic media through a network. As the size of realistic media data is very large, it is difficult to transmit them using current network bandwidth. Thus, a method of encoding that compresses the data using a new encoding technique is necessary. The next-generation encoding technique, high-efficiency video coding (HEVC), can encode video at a higher compression rate than the existing encoding techniques MPEG-2 and H.264. Yet, encoding with HEVC takes at least ten times longer than with existing encoding techniques.

Design/methodology/approach

This paper attempts to solve the time problem using a virtual machine in a cloud computing environment.

Findings

By calculating the transcoding time of the proposed technique, the authors found that the time was reduced compared with existing techniques.

Originality/value

To this end, this paper proposes a transcoding scheme appropriate for the transmission of realistic media by dynamically allocating the resources of the virtual machine.
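A rough sense of the approach, spreading HEVC encoding work across parallel workers that stand in for dynamically allocated virtual machines, can be sketched as follows. The segment file names and the ffmpeg/libx265 invocation are illustrative assumptions, not the authors' system.

```python
# A minimal sketch: cut transcoding time by encoding pre-split video segments
# in parallel, with each worker process standing in for a VM. Assumes ffmpeg
# with libx265 is installed and the segment files exist.

import subprocess
from concurrent.futures import ProcessPoolExecutor

def transcode_segment(segment):
    """Encode one pre-split segment to HEVC; returns the output path."""
    out = segment.replace(".mp4", "_hevc.mp4")
    subprocess.run(
        ["ffmpeg", "-y", "-i", segment, "-c:v", "libx265", out],
        check=True,
    )
    return out

if __name__ == "__main__":
    segments = ["seg_000.mp4", "seg_001.mp4", "seg_002.mp4"]  # hypothetical
    # More workers would correspond to allocating more VM resources.
    with ProcessPoolExecutor(max_workers=3) as pool:
        outputs = list(pool.map(transcode_segment, segments))
    print(outputs)
```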

Details

Journal of Systems and Information Technology, vol. 17 no. 2
Type: Research Article
ISSN: 1328-7265

Keywords

Article
Publication date: 17 August 2010

B. Clifford Neuman

The purpose of this paper is to look at the recent growth of the Internet, and how it has greatly increased the amount of information that is accessible and the number of…


Abstract

Purpose

The purpose of this paper is to look at the recent growth of the Internet, and how it has greatly increased the amount of information that is accessible and the number of resources that are available to users. To exploit this growth, it must be possible for users to find the information and resources they need. Existing techniques for organizing systems have evolved from those used on centralized systems, but these techniques are inadequate for organizing information on a global scale.

Design/methodology/approach

The paper describes Prospero, a distributed file system based on the Virtual System Model. Prospero provides tools to help users organize Internet resources.

Findings

These tools allow users to construct customized views of available resources, while taking advantage of the structure imposed by others.

Originality/value

Prospero provides a framework that can tie together various indexing services, producing the fabric on which resource discovery techniques can be applied.

Details

Internet Research, vol. 20 no. 4
Type: Research Article
ISSN: 1066-2243

Keywords

Article
Publication date: 21 December 2021

Laouni Djafri

This work can be used as a building block in other settings such as GPU, Map-Reduce, Spark or any other. Also, DDPML can be deployed on other distributed systems such as P2P…


Abstract

Purpose

This work can be used as a building block in other settings such as GPU, Map-Reduce, Spark or any other framework. Also, DDPML can be deployed on other distributed systems such as P2P networks, clusters, cloud computing or other technologies.

Design/methodology/approach

In the age of Big Data, all companies want to benefit from large amounts of data. These data can help them understand their internal and external environment and anticipate associated phenomena, as the data turn into knowledge that can later be used for prediction. This knowledge thus becomes a great asset in companies' hands, which is precisely the objective of data mining. But with data and knowledge being produced at an ever faster pace, one now speaks of Big Data mining. For this reason, the proposed work mainly aims at solving the problems of volume, veracity, validity and velocity when classifying Big Data using distributed and parallel processing techniques. The problem raised in this work is therefore how to make machine learning algorithms work in a distributed and parallel way at the same time without losing the accuracy of the classification results. To solve this problem, the authors propose a system called Dynamic Distributed and Parallel Machine Learning (DDPML). The work is divided into two parts. In the first, the authors propose a distributed architecture controlled by a Map-Reduce algorithm, which in turn depends on a random sampling technique. This architecture is designed to handle Big Data processing coherently and efficiently together with the sampling strategy proposed in this work, and it also allows the classification results obtained with the representative learning base (RLB) to be verified. In the second part, the authors extract the representative learning base by sampling at two levels using the stratified random sampling method. The same sampling method is also applied to extract the shared learning base (SLB) and the partial learning bases for the first level (PLBL1) and the second level (PLBL2). The experimental results show the efficiency of the proposed solution without significant loss in the classification results. In practical terms, the DDPML system is dedicated to Big Data mining and works effectively in distributed systems with a simple structure, such as client-server networks.
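The stratified sampling step at the heart of building the representative learning base can be sketched as follows. The record layout and the single sampling level shown are simplifying assumptions; the paper samples at two levels and also derives the SLB, PLBL1 and PLBL2.

```python
# A minimal sketch of stratified random sampling by class label, the kind of
# step DDPML uses to build a representative learning base (RLB).

import random
from collections import defaultdict

def stratified_sample(records, label_of, fraction, seed=42):
    """Draw the same fraction from every class so class proportions survive."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[label_of(rec)].append(rec)
    sample = []
    for label, items in strata.items():
        k = max(1, round(fraction * len(items)))
        sample.extend(rng.sample(items, k))
    return sample

# Example: a 1% representative sample of labelled records (hypothetical data).
data = [{"features": [i], "label": i % 3} for i in range(10_000)]
rlb = stratified_sample(data, lambda r: r["label"], fraction=0.01)
print(len(rlb))
```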

Findings

The authors obtained very satisfactory classification results.

Originality/value

The DDPML system is specially designed to handle Big Data mining classification smoothly.

Details

Data Technologies and Applications, vol. 56 no. 4
Type: Research Article
ISSN: 2514-9288

Keywords

Article
Publication date: 6 August 2021

Alexander Döschl, Max-Emanuel Keller and Peter Mandl

This paper aims to evaluate different approaches for the parallelization of compute-intensive tasks. The study compares a Java multi-threaded algorithm, distributed computing…

Abstract

Purpose

This paper aims to evaluate different approaches for the parallelization of compute-intensive tasks. The study compares a Java multi-threaded algorithm, distributed computing solutions with MapReduce (Apache Hadoop) and resilient distributed data set (RDD) (Apache Spark) paradigms and a graphics processing unit (GPU) approach with Numba for compute unified device architecture (CUDA).

Design/methodology/approach

The paper uses a simple but computationally intensive puzzle as a case study for experiments. To find all solutions using brute force search, 15! permutations had to be computed and tested against the solution rules. The experimental application comprises a Java multi-threaded algorithm, distributed computing solutions with MapReduce (Apache Hadoop) and RDD (Apache Spark) paradigms and a GPU approach with Numba for CUDA. The implementations were benchmarked on Amazon-EC2 instances for performance and scalability measurements.
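The brute-force pattern being benchmarked, partitioning the permutation space and testing each partition in parallel, can be sketched in a few lines. The four-element size and the toy solution rule are placeholders; the paper's puzzle searches all 15! permutations across Hadoop, Spark and CUDA implementations.

```python
# A minimal sketch: split the permutation space by first element and test each
# partition in parallel. Toy problem size and rule, not the paper's puzzle.

from itertools import permutations
from concurrent.futures import ProcessPoolExecutor

N = 4

def satisfies_rules(perm):
    """Toy stand-in for the puzzle's solution rules."""
    return all(abs(a - b) != 1 for a, b in zip(perm, perm[1:]))

def search_partition(first):
    """Enumerate permutations starting with `first` and keep the solutions."""
    rest = [x for x in range(N) if x != first]
    return [(first,) + p for p in permutations(rest) if satisfies_rules((first,) + p)]

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        solutions = [s for part in pool.map(search_partition, range(N)) for s in part]
    print(len(solutions), "solutions for N =", N)
```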

Findings

The comparison of the solutions with Apache Hadoop and Apache Spark under Amazon EMR showed that the processing time measured in CPU minutes with Spark was up to 30% lower, while the performance of Spark especially benefits from an increasing number of tasks. With the CUDA implementation, more than 16 times faster execution is achievable for the same price compared to the Spark solution. Apart from the multi-threaded implementation, the processing times of all solutions scale approximately linearly. Finally, several application suggestions for the different parallelization approaches are derived from the insights of this study.

Originality/value

There are numerous studies that have examined the performance of parallelization approaches. Most of these studies deal with processing large amounts of data or mathematical problems. This work, in contrast, compares these technologies on their ability to implement computationally intensive distributed algorithms.

Details

International Journal of Web Information Systems, vol. 17 no. 4
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 1 March 1995

Charles B. Lowry

By formulating a vision that provides for a solid foundation for the virtual library, we can dramatically improve existing library services and create new ones with added value…

Abstract

By formulating a vision that provides for a solid foundation for the virtual library, we can dramatically improve existing library services and create new ones with added value. The new library paradigm will be built on software and hardware information technology. Related requirements include distributed computing and networking; open architectures and standards; authentication, authorization, and encryption; and billing and royalty tracking. The “virtual library tool kit” will include reduced dependence on word indexing and keyword/Boolean retrieval; development and application of natural language processing; and effective tools for navigation of networks. Carnegie Mellon University offers some helpful examples of how information technology and information retrieval may be used to build the virtual library.

Details

Library Hi Tech, vol. 13 no. 3
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 1 December 1995

David Flater and Yelena Yesha

Provides a new answer to the resource discovery problem, which arises because although the Internet makes it possible for users to retrieve enormous amounts of information, it…

Abstract

Provides a new answer to the resource discovery problem, which arises because although the Internet makes it possible for users to retrieve enormous amounts of information, it provides insufficient support for locating the specific information that is needed. ALIBI (Adaptive Location of Internetworked Bases of Information) is a new tool that succeeds in locating information without the use of centralized resource catalogs, navigation, or costly searching. Its powerful query‐based interface eliminates the need for the user to connect to one network site after another to find information or to wrestle with overloaded centralized catalogs and archives. This functionality was made possible by an assortment of significant new algorithms and techniques, including classification‐based query routing, fully distributed cooperative caching, and a query language that combines the practicality of Boolean logic with the expressive power of text retrieval. The resulting information system is capable of providing fully automatic resource discovery and retrieval access to a limitless variety of information bases.
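The classification-based query routing idea can be pictured with a drastically simplified sketch: a node knows only its neighbours' subject classifications and forwards a query toward bases classified under the query's topic. ALIBI itself is fully distributed and adds cooperative caching and a richer query language; the node names and topics below are hypothetical.

```python
# A simplified sketch of classification-based query routing between
# information bases. Not ALIBI's algorithms; illustration only.

class Node:
    def __init__(self, name, topics, documents=None):
        self.name = name
        self.topics = set(topics)          # subject classification of this base
        self.documents = documents or {}   # topic -> list of documents
        self.neighbours = []

    def query(self, topic, visited=None):
        visited = visited or set()
        visited.add(self.name)
        if topic in self.topics:
            return self.documents.get(topic, [])
        # Forward only to neighbours classified under the requested topic.
        for nb in self.neighbours:
            if nb.name not in visited and topic in nb.topics:
                return nb.query(topic, visited)
        return []

# Example: a physics query entering at a general-purpose node.
gateway = Node("gateway", ["general"])
physics = Node("physics-base", ["physics"], {"physics": ["preprint-123"]})
gateway.neighbours.append(physics)
print(gateway.query("physics"))
```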

Details

Internet Research, vol. 5 no. 4
Type: Research Article
ISSN: 1066-2243

Keywords

Article
Publication date: 1 August 2016

Bao-Rong Chang, Hsiu-Fen Tsai, Yun-Che Tsai, Chin-Fu Kuo and Chi-Chung Chen

The purpose of this paper is to integrate and optimize a multiple big data processing platform with the features of high performance, high availability and high scalability in big…

Abstract

Purpose

The purpose of this paper is to integrate and optimize a multiple big data processing platform with the features of high performance, high availability and high scalability in a big data environment.

Design/methodology/approach

First, the integration of Apache Hive, Cloudera Impala and BDAS Shark makes the platform support SQL-like queries. Next, users access a single interface, and the proposed optimizer automatically selects the best-performing big data warehouse platform. Finally, the distributed memory storage system Memcached, incorporated into the distributed file system Apache HDFS, is employed to cache query results quickly. Therefore, if users issue the same SQL command, the result is returned rapidly from the cache instead of repeating the search in the big data warehouse and taking longer to retrieve.
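The caching step amounts to a cache-aside pattern keyed on the SQL text, sketched below. A plain dictionary stands in for Memcached, and run_on_warehouse() is a hypothetical placeholder for dispatching the query to Hive, Impala or Shark.

```python
# A minimal sketch of cache-aside query caching: hash the SQL, check the
# cache, and only fall through to the warehouse on a miss.

import hashlib

cache = {}  # stand-in for a Memcached client

def run_on_warehouse(sql):
    """Placeholder for executing the query on the selected warehouse engine."""
    return [("row", 1), ("row", 2)]

def cached_query(sql):
    key = hashlib.sha1(sql.encode("utf-8")).hexdigest()
    if key in cache:                 # hit: repeated SQL answered from memory
        return cache[key]
    result = run_on_warehouse(sql)   # miss: pay the full warehouse search once
    cache[key] = result
    return result

print(cached_query("SELECT count(*) FROM logs"))  # miss
print(cached_query("SELECT count(*) FROM logs"))  # hit, served from cache
```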

Findings

As a result, the proposed approach significantly improves overall performance and dramatically reduces search time when querying a database, especially for highly repeated SQL commands in multi-user mode.

Research limitations/implications

Currently, Shark’s latest stable version 0.9.1 does not support the latest versions of Spark and Hive. In addition, this series of software only supports Oracle JDK7. Using Oracle JDK8 or Open JDK will cause serious errors, and some software will be unable to run.

Practical implications

The problem with this system is that some blocks are missing when too many blocks are stored in one result (about 100,000 records). Another problem is that sequential writing into the in-memory cache wastes time.

Originality/value

When the remaining memory capacity is 2 GB or less on each server, Impala and Shark will have a lot of page swapping, causing extremely low performance. When the data scale is larger, it may cause a JVM I/O exception and make the program crash. However, when the remaining memory capacity is sufficient, Shark is faster than Hive and Impala. Impala's consumption of memory resources is between those of Shark and Hive, and this amount of remaining memory is sufficient for Impala's maximum performance. In this study, each server allocates 20 GB of memory for cluster computing and sets the amount of remaining memory at Level 1: 3 percent (0.6 GB), Level 2: 15 percent (3 GB) and Level 3: 75 percent (15 GB) as the critical points. The program automatically selects Hive when remaining memory is less than 15 percent, Impala at 15 to 75 percent and Shark at more than 75 percent.
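The selection rule described above can be sketched directly from the stated thresholds. The memory-measurement helper is a hypothetical placeholder; only the 20 GB allocation and the 15/75 percent cut-offs come from the abstract.

```python
# A minimal sketch of the described optimizer rule: pick Hive, Impala or Shark
# from the fraction of allocated memory that is still free.

TOTAL_MEMORY_GB = 20.0

def remaining_memory_gb():
    """Placeholder: would query each server for its free memory."""
    return 4.0

def select_engine(free_gb, total_gb=TOTAL_MEMORY_GB):
    free_ratio = free_gb / total_gb
    if free_ratio < 0.15:      # scarce memory: fall back to Hive
        return "Hive"
    if free_ratio <= 0.75:     # mid range: Impala
        return "Impala"
    return "Shark"             # plenty of memory: Shark is fastest

print(select_engine(remaining_memory_gb()))  # -> "Impala" at 4 GB free (20%)
```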

Article
Publication date: 21 October 2019

Priyadarshini R., Latha Tamilselvan and Rajendran N.

The purpose of this paper is to propose a fourfold semantic similarity that results in more accuracy compared to the existing literature. The change detection in the URL and the…

Abstract

Purpose

The purpose of this paper is to propose a fourfold semantic similarity that results in higher accuracy compared with the existing literature. Change detection in the URL and the recommendation of source documents are facilitated by means of a framework in which the fourfold semantic similarity is applied. The latest trends in technology emerge with the continuous growth of resources on the collaborative web. This interactive and collaborative web presents big challenges for recent technologies like cloud and big data.

Design/methodology/approach

The enormous growth of resources requires that they be accessed in a more efficient manner, and this calls for clustering and classification techniques. The resources on the web are described in a more meaningful manner.

Findings

They can be described in the form of metadata constituted by the resource description framework (RDF). A fourfold similarity is proposed, compared with the threefold similarity proposed in the existing literature. The fourfold similarity includes semantic annotation based on named entity recognition in the user interface, ontology-based domain concept matching with improvised score-based classification, a sequence-based word sensing algorithm and RDF-based updating of triples. All these similarity measures are aggregated across the components, including the semantic user interface, semantic clustering, sequence-based classification and a semantic recommendation system with RDF updating for change detection.
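The aggregation of the four similarity components can be sketched as a weighted combination. The component scorers below are hypothetical placeholders, and the equal weights are an assumption, not the paper's values.

```python
# A minimal sketch of combining four component similarity scores into one
# fourfold score. Scorers and weights are illustrative placeholders.

def fourfold_similarity(doc, source, scorers, weights=None):
    """Weighted aggregate of the four component similarity scores in [0, 1]."""
    weights = weights or [0.25] * len(scorers)
    return sum(w * score(doc, source) for w, score in zip(weights, scorers))

# Placeholder component scorers (each would return a value in [0, 1]).
annotation_sim = lambda d, s: 0.8   # named-entity-based semantic annotation
concept_sim    = lambda d, s: 0.6   # ontology/domain concept matching
sequence_sim   = lambda d, s: 0.7   # sequence-based word sensing
rdf_sim        = lambda d, s: 0.9   # RDF triple overlap after updating

score = fourfold_similarity("virtual doc", "source doc",
                            [annotation_sim, concept_sim, sequence_sim, rdf_sim])
print(round(score, 2))  # 0.75
```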

Research limitations/implications

The existing work suggests that linking resources semantically increases the retrieving and searching ability. Previous literature shows that keywords can be used to retrieve linked information from the article to determine the similarity between the documents using semantic analysis.

Practical implications

These traditional systems also suffer from scalability and efficiency issues. The proposed study designs a model that pulls and prioritizes knowledge-based content from the Hadoop distributed framework. This study also proposes a Hadoop-based pruning system and recommendation system.

Social implications

The pruning system gives an alert about the dynamic changes in the article (virtual document). The changes in the document are automatically updated in the RDF document. This helps in semantic matching and retrieval of the most relevant source with the virtual document.

Originality/value

The recommendation and detection of changes in the blogs are performed semantically using n-triples and automated data structures. The user-focussed and choice-based crawling proposed in this system also assists collaborative filtering, which in turn recommends user-focussed source documents. The entire clustering and retrieval system is deployed on multi-node Hadoop in the Amazon AWS environment, and graphs are plotted and analyzed.

Details

International Journal of Intelligent Unmanned Systems, vol. 7 no. 4
Type: Research Article
ISSN: 2049-6427

Keywords

Article
Publication date: 8 February 2016

Zhihua Li, Zianfei Tang and Yihua Yang

The highly efficient processing of mass data is a primary issue in building and maintaining a security video surveillance system. This paper aims to focus on the architecture of…

Abstract

Purpose

The highly efficient processing of mass data is a primary issue in building and maintaining a security video surveillance system. This paper focuses on the architecture of a security video surveillance system based on Hadoop parallel processing technology in a big data environment.

Design/methodology/approach

A hardware framework for a security video surveillance network cascaded system (SVSNCS) was constructed on the basis of the Internet of Things, network cascade technology and the Hadoop platform. Then, the architecture model of the SVSNCS was proposed using Hadoop and a big data processing platform.

Findings

Finally, a video processing procedure was suggested according to the characteristics of the cascade network.

Originality/value

This paper, which focuses on the architecture of a security video surveillance system in a big data environment on the basis of Hadoop parallel processing technology, provides high-quality video surveillance services for the security field.

Details

World Journal of Engineering, vol. 13 no. 1
Type: Research Article
ISSN: 1708-5284

Keywords
