Search results
1–10 of over 22,000
Bao-Rong Chang, Hsiu-Fen Tsai, Yun-Che Tsai, Chin-Fu Kuo and Chi-Chung Chen
Abstract
Purpose
The purpose of this paper is to integrate and optimize a multiple big data processing platform with the features of high performance, high availability and high scalability in big data environment.
Design/methodology/approach
First, the integration of Apache Hive, Cloudera Impala and BDAS Shark makes the platform support SQL-like queries. Next, users access a single interface, and the proposed optimizer automatically selects the best-performing big data warehouse platform. Finally, the distributed memory storage system Memcached, incorporated into the distributed file system Apache HDFS, is employed for fast caching of query results. Therefore, if users issue the same SQL command, the result is returned rapidly from the cache system instead of repeating the search in the big data warehouse, which would take far longer.
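The caching mechanism can be sketched as follows; a dictionary stands in for Memcached, and `run_warehouse_query` stands in for Hive/Impala/Shark execution (both stand-ins are illustrative, not taken from the paper):

```python
import hashlib

# In the real system the cache is Memcached layered over HDFS; a dict
# stands in for it here so the sketch is self-contained.
cache = {}

def run_warehouse_query(sql):
    # Placeholder for a slow scan of the big data warehouse.
    return [0, 1, 2]

def cached_query(sql):
    # Normalize whitespace and case so trivially different spellings
    # of the same SQL command share one cache key.
    key = hashlib.sha1(" ".join(sql.lower().split()).encode()).hexdigest()
    if key in cache:                    # repeated command: answer from cache
        return cache[key]
    result = run_warehouse_query(sql)   # first time: query the warehouse
    cache[key] = result                 # then store the result for reuse
    return result
```

Keying on a hash of the normalized SQL text is what makes highly repeated commands cheap: only the first issuance pays the warehouse cost.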
Findings
As a result, the proposed approach significantly improves overall performance and dramatically reduces search time when querying a database, especially for highly repeated SQL commands in multi-user mode.
Research limitations/implications
Currently, Shark’s latest stable version, 0.9.1, does not support the latest versions of Spark and Hive. In addition, this series of software only supports Oracle JDK7; using Oracle JDK8 or OpenJDK causes serious errors, and some of the software will not run.
Practical implications
The problem with this system is that some blocks go missing when too many blocks are stored in one result (about 100,000 records). Another problem is that sequential writing into the in-memory cache wastes time.
Originality/value
When the remaining memory capacity is 2 GB or less on each server, Impala and Shark perform heavy page swapping, causing extremely low performance. When the data scale grows larger, this may cause a JVM I/O exception and crash the program. However, when the remaining memory capacity is sufficient, Shark is faster than Hive and Impala. Impala’s consumption of memory resources falls between those of Shark and Hive, and a moderate amount of remaining memory is sufficient for Impala’s maximum performance. In this study, each server allocates 20 GB of memory for cluster computing and sets the remaining-memory critical points at Level 1: 3 percent (0.6 GB), Level 2: 15 percent (3 GB) and Level 3: 75 percent (15 GB). The program automatically selects Hive when remaining memory is below 15 percent, Impala at 15 to 75 percent and Shark above 75 percent.
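The three-way selection rule reported above can be sketched directly from the stated thresholds (the function name and the fraction-based interface are illustrative, not the authors' code):

```python
def select_engine(free_fraction):
    """Pick a query engine from the fraction of memory remaining,
    following the thresholds in the abstract: Hive below 15 percent,
    Impala from 15 to 75 percent, Shark above 75 percent."""
    if free_fraction < 0.15:
        return "Hive"    # low memory: disk-oriented Hive avoids swapping
    if free_fraction <= 0.75:
        return "Impala"  # moderate memory: Impala's sweet spot
    return "Shark"       # ample memory: in-memory Shark is fastest
```

For the 20 GB allocation described, a reading of 3 GB free corresponds to `free_fraction = 0.15`, the Hive/Impala boundary.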
Abstract
Introduction: With the proliferation and amalgamation of technology and the emergence of artificial intelligence and the internet of things, society is now facing a rapid explosion in big data. However, this explosion needs to be handled with care, and ethically managing big data is of great importance. If left unmanaged, it can create a bubble of data waste and fail to help society achieve human well-being, sustainable economic growth, and development.
Purpose: This chapter aims to understand different perspectives on big data. One philosophy of big data defines it by its volume and versatility, growing by roughly 40% per annum. The other view represents its capability in dealing with multiple global issues, fuelling innovation. This chapter also offers insight into various ways to deal with societal problems, provide solutions to achieve economic growth, and aid vulnerable sections via the sustainable development goals (SDGs).
Methodology: This chapter lays out a review of the literature related to big data. It examines how the big data pool potentially influences ideas and policies for achieving SDGs. Different techniques for collecting big data and an assortment of significant data sources are also analysed in the context of achieving sustainable economic development and growth.
Findings: This chapter presents a list of challenges linked with big data analytics in governance and the achievement of SDGs. Different ways to deal with these challenges in using big data are also addressed.
Abstract
Purpose
This work can be used as a building block in other settings such as GPU, Map-Reduce, Spark or any other. DDPML can also be deployed on other distributed systems such as P2P networks, clusters, cloud computing or other technologies.
Design/methodology/approach
In the age of Big Data, all companies want to benefit from large amounts of data. These data can help them understand their internal and external environment and anticipate associated phenomena, as the data turn into knowledge that can later be used for prediction. This knowledge thus becomes a great asset in companies' hands, which is precisely the objective of data mining. But with data and knowledge being produced at an ever faster pace, the field is now called Big Data mining. For this reason, the proposed work mainly aims at solving the problems of volume, veracity, validity and velocity when classifying Big Data using distributed and parallel processing techniques. The problem the authors raise is how to make machine learning algorithms work in a distributed and parallel way at the same time without losing classification accuracy. To solve this problem, the authors propose a system called Dynamic Distributed and Parallel Machine Learning (DDPML). To build it, they divided the work into two parts. In the first, the authors propose a distributed architecture controlled by a Map-Reduce algorithm that in turn depends on a random sampling technique; this architecture is specifically designed to handle big data processing in a manner coherent and efficient with the sampling strategy proposed in this work, and it also allows the authors to verify the classification results obtained using the representative learning base (RLB). In the second part, the authors extract the representative learning base by sampling at two levels using the stratified random sampling method. This sampling method is also applied to extract the shared learning base (SLB) and the partial learning bases for the first level (PLBL1) and the second level (PLBL2).
The experimental results show the efficiency of the proposed solution without significant loss in classification results. In practical terms, the DDPML system is generally dedicated to big data mining processing and works effectively in distributed systems with a simple structure, such as client-server networks.
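One level of the stratified random sampling step can be sketched as follows; the function and its interface are illustrative (the paper applies the method at two levels to build the PLBL1 and PLBL2 bases, which a second call over the first sample would model):

```python
import random

def stratified_sample(records, label_of, fraction, seed=0):
    """Group records by class label and draw the same fraction from
    every stratum, so class proportions survive in the sample."""
    rng = random.Random(seed)
    strata = {}
    for record in records:
        strata.setdefault(label_of(record), []).append(record)
    sample = []
    for members in strata.values():
        # Sample without replacement; keep at least one record per class.
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample
```

Sampling per stratum rather than over the whole pool is what lets a small learning base remain representative of minority classes.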
Findings
The authors obtained very satisfactory classification results.
Originality/value
The DDPML system is specially designed to handle big data mining classification smoothly.
Abstract
Purpose
With the exponential growth of the amount of data, the most sophisticated systems of traditional libraries are not able to fulfill the demands of modern business and user needs. The purpose of this paper is to present the possibility of creating a Big Data smart library as an integral and enhanced part of the educational system that will improve user service and increase motivation in the continuous learning process through content-aware recommendations.
Design/methodology/approach
This paper presents an approach to the design of a Big Data system for collecting, analyzing, processing and visualizing data from different sources in a smart library specifically suitable for application in educational institutions.
Findings
As an integrated recommender system for an educational institution, the practical application of the Big Data smart library meets user needs and assists in finding personalized content from several sources, resulting in economic benefits for the institution and long-term user satisfaction.
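The paper does not publish its recommendation algorithm, so the following is only a hypothetical sketch of a content-aware step: catalog items are ranked by cosine similarity between word counts of their descriptions and the descriptions of items a user has already borrowed.

```python
import math
from collections import Counter

def word_vector(text):
    # Bag-of-words counts; a production system would use richer features.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(user_history, catalog, top_n=2):
    # Build one profile vector from everything the user has consumed,
    # then rank catalog items by similarity to that profile.
    profile = word_vector(" ".join(user_history))
    ranked = sorted(catalog,
                    key=lambda item: cosine(profile, word_vector(item)),
                    reverse=True)
    return ranked[:top_n]
```

All names here are illustrative; the point is only that content-aware recommendation compares item descriptions to a usage profile rather than relying on other users' ratings.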
Social implications
The need for continuous education alters business processes in libraries with requirements to adopt new technologies, business demands, and interactions with users. To be able to engage in a new era of business in the Big Data environment, librarians need to modernize their infrastructure for data collection, data analysis, and data visualization.
Originality/value
A unique value of this paper is its perspective on the implementation of a Big Data solution for smart libraries as part of a continuous learning process, with the aim of improving the results of library operations by integrating traditional systems with Big Data technology. The paper presents a Big Data smart library system that has the potential to create new value and support data-driven decisions by incorporating multiple sources of differential data.
Mohd Naz’ri Mahrin, Anusuyah Subbarao, Suriayati Chuprat and Nur Azaliah Abu Bakar
Abstract
Purpose
Cloud computing promises dependable services offered through next-generation data centres based on virtualization technologies for computation, network and storage. Big Data applications have been made viable by cloud computing technologies due to the tremendous expansion of data. Disaster management is one of the areas where big data applications are being rapidly deployed. This study looks at how big data is used in conjunction with cloud computing to advance disaster risk reduction (DRR). This paper aims to explore and review existing frameworks for big data in disaster management and to provide an insightful view of how cloud-based big data platforms are applied to DRR.
Design/methodology/approach
A systematic mapping study was conducted to answer four research questions using papers related to Big Data Analytics, cloud computing and disaster management published from 2013 to 2019. A total of 26 papers were finalised after going through the five steps of systematic mapping.
Findings
The findings are reported for each of the four research questions.
Research limitations/implications
Specific studies of big data platforms applied to disaster management remain limited in general; this lack of study leaves the field open for further research.
Practical implications
In terms of technology, DRR research that leverages existing big data platforms is still lacking. In terms of data, much disaster data is available, but scientists still struggle to learn from and listen to the data and to adopt more proactive disaster preparedness.
Originality/value
This study shows that the platform most frequently selected by researchers is CPU-based processing, namely Apache Hadoop. Apache Spark, which uses in-memory processing, requires a large amount of memory and is therefore less preferred in research settings.
Beatrice Amonoo Nkrumah, Wei Qian, Amanpreet Kaur and Carol Tilt
Abstract
Purpose
This paper aims to examine the nature and extent of disclosure on the use of big data by online platform companies and how these disclosures address and discharge stakeholder accountability.
Design/methodology/approach
Content analysis of the annual reports and data policy documents of 100 online platform companies was used for this study. More specifically, the study develops a comprehensive big data disclosure framework to assess the nature and extent of disclosures provided in corporate reports. This framework also assists in evaluating the effect of company size, industry and country of operation on disclosures.
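Applying such a framework in content analysis can be sketched as a per-company checklist score. The item names below echo the issues listed in the findings, but the binary scoring scheme itself is an assumption, not the authors' actual index.

```python
# Item names echo issues mentioned in the abstract; the binary scoring
# is an illustrative assumption, not the authors' disclosure index.
FRAMEWORK_ITEMS = [
    "data regulation compliance",
    "privacy protection",
    "stakeholder engagement",
    "breach reporting",
    "data controlling mechanisms",
]

def disclosure_extent(items_disclosed):
    """Share of framework items a company's reports address."""
    addressed = sum(1 for item in FRAMEWORK_ITEMS if item in items_disclosed)
    return addressed / len(FRAMEWORK_ITEMS)
```

A company disclosing only compliance and privacy items, the pattern the findings describe, would score 0.4 under this sketch, with the accountability-oriented items unaddressed.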
Findings
The analysis reveals that most companies made limited disclosure on how they manage big data. Only two of the 100 online platform companies provided moderate disclosures on big data-related issues. The focus of disclosure by the online platform companies is more on data regulation compliance and privacy protection, but significantly less on the accountability and ethical issues of big data use. More specifically, critical issues such as stakeholder engagement, breaches of customer information, and data reporting and controlling mechanisms are largely overlooked in current disclosures. The analysis confirms that attention has been given predominantly to powerful stakeholders such as regulators as a result of compliance pressure, while accountability pressure has yet to keep pace.
Research limitations/implications
The study findings may be limited by the use of a new accountability disclosure index and the specific focus on online platform companies.
Practical implications
Although big data permeates society, its users and uses continue to grow, and its use has become ingrained in everyday life, this study provides evidence that ethical and accountability issues persist, even among the largest online companies. The findings improve understanding of the current state of online companies' reporting practices on big data use, particularly the issues and gaps in the reporting process, which will help policymakers and standard setters develop future data disclosure policies.
Social implications
From these findings, the study improves understanding of the current state of online companies' reporting practices on big data use, particularly the issues and gaps in the reporting process, which are helpful for policymakers and standard setters in developing data disclosure policies.
Originality/value
This study provides an analysis of ethical and social issues surrounding big data accountability, an emerging but increasingly important area that needs urgent attention and more research. It also adds a new disclosure dimension to the existing accountability literature and provides practical suggestions to balance the interaction between online platform companies and their stakeholders to promote the responsible use of big data.
Abstract
Purpose
To be more effective, artificial intelligence (AI) requires a broad overall view of the design and transformation of enterprise architecture and capabilities. Maturity models (MMs) are recognized tools for identifying the strengths and weaknesses of certain domains of an organization. They consist of multiple archetypal levels of maturity of a certain domain and can be used for organizational assessment and development. In the case of AI, quite a number of MMs have been proposed, yet the links between AI technology, AI usage and organizational performance generally remain unclear. To address these gaps, this paper aims to introduce the complete details of the AI maturity model (AIMM) for AI-driven platform companies. The associated AI-Driven Platform Enterprise Maturity framework proposed here can help achieve most AI-driven platform companies' objectives.
Design/methodology/approach
Qualitative research is performed in two stages. In the first stage, a review of the existing literature is performed to identify the types, barriers, drivers, challenges and opportunities of MMs in the AI, Advanced Analytics and Big Data domains. In the second stage, a research framework is proposed to align the company value chain with AI technologies and the levels of platform enterprise maturity.
Findings
The paper proposes a new five-level AI-Driven Platform Enterprise Maturity framework by constructing a formal organizational value chain taxonomy model that explains a vast group of MM phenomena related to AI-driven platform enterprises. In addition, this study proposes a clear and precise description and structuring of the information in the multidimensional Platform, AI, Advanced Analytics and Big Data domains. The framework assists in the identification, creation, assessment and disclosure research of AI-driven platform business organizations.
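A maturity assessment over such a framework might be sketched as follows. The five level names and the weakest-dimension rule are assumptions, since the abstract specifies neither; the sketch only illustrates how archetypal levels turn per-dimension evidence into a single rating.

```python
# Level names and the weakest-link rule are illustrative assumptions.
LEVELS = ["Initial", "Managed", "Defined", "Measured", "Optimized"]

def maturity_level(dimension_scores):
    """Map per-dimension scores in [0, 1] to an archetypal level;
    the weakest dimension caps overall maturity."""
    floor = min(dimension_scores.values())
    index = min(int(floor * len(LEVELS)), len(LEVELS) - 1)
    return LEVELS[index]
```

Capping on the weakest dimension reflects a common maturity-model convention: an enterprise strong in data but weak in talent is only as mature as its talent dimension allows.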
Research limitations/implications
This research is focused on the basic dimensions of the AI value chain; the full reference model of AI consists of many more concepts. In the last few years, AI has achieved notable momentum that, if harnessed appropriately, may deliver on expectations across many application sectors. For this to occur in machine learning, especially in deep neural networks, the entire community must overcome the barrier of explainability. Paradigms underlying this problem fall within the so-called eXplainable AI (XAI) field, which is widely acknowledged as a crucial feature for the practical deployment of AI models in industry. The authors' prospects lead toward a methodology for the large-scale implementation of AI methods in platform organizations with fairness, model explainability and accountability at its core.
Practical implications
The AI-driven platform enterprise maturity framework can be used to better communicate to clients the value of AI capabilities through the lens of changing human-machine interactions and in the context of legal, ethical and societal norms.
Social implications
The authors discuss AI in the enterprise platform stack including talent platform, human capital management and recruiting.
Originality/value
The AI value chain and the AI-Driven Platform Enterprise Maturity framework are original and represent effective tools for assessing AI-driven platform enterprises.
Elham Ali Shammar and Ammar Thabit Zahary
Abstract
Purpose
The internet has radically changed the way people interact in the virtual world, in their careers and in their social relationships. IoT technology has added a new vision to this process by enabling connections between smart objects and humans, and also between smart objects themselves, leading to anything, anytime, anywhere, any-media communications. IoT allows objects to physically see, hear, think and perform tasks by making them talk to each other, share information and coordinate decisions. To enable this vision, IoT utilizes technologies such as ubiquitous computing, context awareness, RFID, WSNs, embedded devices, CPS, communication technologies and internet protocols. IoT is considered to be the future internet, significantly different from the internet we use today. The purpose of this paper is to provide up-to-date literature on trends in IoT research, which is driven by the need for convergence of several interdisciplinary technologies and new applications.
Design/methodology/approach
A comprehensive IoT literature review has been performed in this paper as a survey. The survey starts by providing an overview of IoT concepts, visions and evolutions. IoT architectures are also explored. Then, the most important components of IoT are discussed, including a thorough discussion of IoT operating systems such as TinyOS, Contiki OS, FreeRTOS and RIOT. A review of IoT applications is also presented and, finally, IoT challenges recently encountered by researchers are introduced.
Findings
Studies of the IoT literature and projects show the disproportionate importance of technology in IoT projects, which are often driven by technological interventions rather than innovation in the business model. There are a number of serious concerns about the dangers of IoT growth, particularly in the areas of privacy and security; hence, industry and government have begun addressing these concerns. In the end, what makes IoT exciting is that we do not yet know the exact use cases that will significantly influence our lives.
Originality/value
This survey provides a comprehensive literature review on IoT techniques, operating systems and trends.
Abstract
Purpose
The present paper constructs a new framework for government data governance based on the concept of a data middle platform, eliciting the detailed requirements and functionalities of such a framework.
Design/methodology/approach
Following a three-cycle activity, the design science research (DSR) paradigm was used to develop design propositions. The design propositions are obtained based on a systematic literature review of government data governance and data governance frameworks. Cases and experts further assessed the effectiveness of the implementation of the artifacts.
Findings
The study developed an effective framework for government data governance that supports the digital service needs of government. The results demonstrate the advantages of the framework in adapting to organizational operations and data: it realizes the value of data assets, improves data auditing and oversight and facilitates communication. From data collection to the output of government services, the framework adapts to the new characteristics of digital government.
Originality/value
The knowledge of "data middle platforms" generated in this study provides new insight for the design of government data governance frameworks and helps translate design propositions into concrete capabilities. By reviewing the earlier literature, the article identifies the core needs and challenges of government data governance to help practitioners approach it in a structured manner.