Search results

1 – 10 of 114

Open Access

Article

Publication date: 3 August 2020

DNA short read alignment on Apache Spark

Downloads: 1157

Abstract

The evolution of technologies has unleashed a wealth of challenges by generating massive amounts of data. Recently, biological data has increased exponentially, which has introduced several computational challenges. DNA short read alignment is an important problem in bioinformatics. The exponential growth in the number of short reads has increased the need for an ideal platform to accelerate the alignment process. Apache Spark is a cluster-computing framework that provides data parallelism and fault tolerance. In this article, we propose a Spark-based algorithm, called Spark-DNAligning, to accelerate DNA short read alignment. Spark-DNAligning exploits Apache Spark’s performance optimizations such as broadcast variables, join after partitioning, caching, and in-memory computation. Spark-DNAligning is evaluated in terms of performance by comparing it with the SparkBWA tool and a MapReduce-based algorithm called CloudBurst. All the experiments are conducted on Amazon Web Services (AWS). Results demonstrate that Spark-DNAligning outperforms both tools by providing a speedup in the range of 101–702 in aligning gigabytes of short reads to the human genome. Empirical evaluation reveals that Apache Spark offers promising solutions to the DNA short read alignment problem.

Details

Applied Computing and Informatics, vol. 19 no. 1/2

Type: Research Article

DOI:

ISSN: 2634-1964

Open Access

Article

Publication date: 7 June 2018

Application of biclustering algorithm to extract rules from labeled data

Zhang Yanjie and Sun Hongbo

Downloads: 803

Abstract

Purpose

For many pattern recognition problems, the relation between the sample vectors and the class labels is known during the data acquisition procedure. However, finding the useful rules or knowledge hidden in the data is both important and challenging. Rule extraction methods are very useful in mining the important and heuristic knowledge hidden in the original high-dimensional data. They can help us to construct predictive models with few attributes of the data, providing valuable model interpretability and shorter training times.

Design/methodology/approach

In this paper, a novel rule extraction method with the application of biclustering algorithm is proposed.

Findings

To choose the most significant biclusters from the huge number of detected biclusters, a specially modified information entropy calculation method is also provided. It will be shown that all of the important knowledge is in practice hidden in these biclusters.

Originality/value

The novelty of the new method lies in the fact that the detected biclusters can be conveniently translated into if-then rules. It provides an intuitively explainable and comprehensive approach to extracting rules from high-dimensional data while maintaining high classification accuracy.
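As a rough illustration of entropy-based bicluster selection, the sketch below ranks candidate biclusters by the Shannon entropy of the class labels they cover, so that purer biclusters come first. This is plain Shannon entropy, not the specially modified calculation the abstract refers to (which is not described here), and the names `label_entropy`, `rank_biclusters` and the sample data are hypothetical.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the class labels covered by one bicluster."""
    counts = Counter(labels)
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def rank_biclusters(biclusters):
    """Sort (name, labels) pairs from purest (lowest entropy) to most mixed."""
    return sorted(biclusters, key=lambda item: label_entropy(item[1]))


# Hypothetical biclusters: each lists the class labels of the samples it covers.
candidates = [
    ("b1", ["pos", "neg", "pos", "neg"]),
    ("b2", ["pos", "pos", "pos", "pos"]),
    ("b3", ["pos", "pos", "pos", "neg"]),
]
ranked = [name for name, _ in rank_biclusters(candidates)]
```

A bicluster whose samples all share one label has zero entropy, which is what makes it a natural candidate for a single if-then rule.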

Details

International Journal of Crowd Science, vol. 2 no. 2

Type: Research Article

DOI:

ISSN: 2398-7294

Open Access

Article

Publication date: 7 October 2021

Using transfer learning for diabetic retinopathy stage classification

Enas M.F. El Houby

Downloads: 2582

Abstract

Purpose

Diabetic retinopathy (DR) is one of the dangerous complications of diabetes. Its grade level must be tracked to manage its progress and to make appropriate treatment decisions in time. Effective automated methods for the detection of DR and the classification of its severity stage are necessary to reduce the burden on ophthalmologists and diagnostic contradictions among manual readers.

Design/methodology/approach

In this research, a convolutional neural network (CNN) was applied to colored retinal fundus images for the detection of DR and classification of its stages. A CNN can recognize sophisticated features on the retina and provide an automatic diagnosis. The pre-trained VGG-16 CNN model was applied using a transfer learning (TL) approach to utilize the already learned parameters in the detection.

Findings

Experiments with different severity groupings yielded promising results. The best-achieved accuracies for 2-class, 3-class, 4-class and 5-class classifications are 86.5, 80.5, 63.5 and 73.7, respectively.

Originality/value

In this research, VGG-16 was used to detect and classify DR stages using the TL approach. Different combinations of classes were used in the classification of DR severity stages to illustrate the ability of the model to differentiate between the classes and verify the effect of these changes on the performance of the model.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

DOI:

ISSN: 2634-1964

Open Access

Article

Publication date: 2 January 2024

Technology-mediated lesson study: a step-by-step guide

Michelle Hudson, Heather Leary, Max Longhurst, Joshua Stowers, Tracy Poulsen, Clara Smith and Rebecca L. Sansom

Downloads: 643

Abstract

Purpose

The authors are developing a model for rural science teacher professional development, building teacher expertise and collaboration and creating high-quality science lessons: technology-mediated lesson study (TMLS).

Design/methodology/approach

TMLS provides the means for geographically distributed teachers to collaborate, develop, implement and improve lessons. TMLS uses technology to capture lesson implementation and collaborate on lesson iterations.

Findings

This paper describes the seven steps of the TMLS process with examples, showing how teachers develop their content and pedagogical knowledge while building relationships.

Originality/value

The TMLS approach provides an innovative option for teachers to collaborate across distances and form strong, lasting relationships with others.

Details

International Journal for Lesson & Learning Studies, vol. 13 no. 5

Type: Research Article

DOI:

ISSN: 2046-8253

Open Access

Article

Publication date: 29 September 2022

Drivers of climate variability and increasing water salinity impacts on the farmer’s income risk with future outlook mitigation

Arshad Ahmad Khan, Sufyan Ullah Khan, Muhammad Abu Sufyan Ali, Aftab Khan, Yousaf Hayat and Jianchao Luo

Downloads: 647

Abstract

Purpose

The main aim of this study is to investigate the impact of climate change and water salinity on farmers’ income risk with future outlook mitigation. Salinity and climate change are a threat to agricultural productivity worldwide. However, the combined effects of climate change and salinity on farmers’ income are not well understood, particularly in developing countries.

Design/methodology/approach

The response-yield function and general maximum entropy methods were used to predict the impact of temperature, precipitation and salinity on crop yield. The target minimization of total absolute deviations (MOTAD)-positive mathematical programming model was used to simulate the impact of climate change and salinity on socioeconomic and environmental indicators. In the end, a multicriteria decision-making model was used, aiming at the selection of suitable climate scenarios.

Findings

The results revealed that precipitation shows a significantly decreasing trend, while temperature and groundwater salinity (EC) show a significantly increasing trend. Climate change and EC negatively impact farmers’ income and water shadow prices. The maximum reduction in income and water shadow prices was observed for the A2 scenario (−12.4% and 19.4%) in 2050. The environmental index was the most important, with a priority of 43.4% compared to socioeconomic indicators. The subindex for the amount of water used was also significant in the study area, with a priority of 28.1%. The technique for order preference by similarity to ideal solution (TOPSIS) ranking system found that B1 was the best climatic scenario for adopting climate change adaptation in the research region.
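For readers unfamiliar with the technique for order preference by similarity to ideal solution (TOPSIS), the following is a minimal sketch of the standard method using vector normalization. The alternatives, criteria, weights and numbers are invented for illustration and are not taken from the study; the function name `topsis` is likewise an assumption.

```python
import math


def topsis(matrix, weights, benefit):
    """Return closeness scores in [0, 1]; higher means closer to the ideal.

    matrix  -- one row per alternative, one column per criterion
    weights -- one weight per criterion
    benefit -- True where larger is better, False where smaller is better
    """
    n_criteria = len(weights)
    # Vector-normalize each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_criteria)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_criteria)] for row in matrix
    ]
    # Ideal and anti-ideal points, criterion by criterion.
    columns = list(zip(*weighted))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    worst = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Illustrative data only: columns are income (benefit) and water use (cost).
names = ["S1", "S2", "S3"]
matrix = [[100, 50], [80, 30], [60, 60]]
scores = topsis(matrix, weights=[0.5, 0.5], benefit=[True, False])
best = names[scores.index(max(scores))]
```

The sketch assumes the alternatives are not all identical; otherwise both distances are zero and the closeness score is undefined.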

Originality/value

In this study, threats to farmers’ income were assessed under different climate scenarios (A1, A1B and B1) over the horizons of 2030, 2040 and 2050 and with three different indicators (economic, social and environmental) in the northwestern region of Pakistan. Only in arid and semiarid regions has climate change raised temperature and reduced rainfall, which are preliminary symptoms of growing salinity.

Details

International Journal of Climate Change Strategies and Management, vol. 14 no. 5

Type: Research Article

DOI:

ISSN: 1756-8692

Open Access

Article

Publication date: 21 May 2021

Identification of data mining research frontier based on conference papers

Yue Huang, Hu Liu and Jing Pan

Downloads: 1108

Abstract

Purpose

Identifying the frontiers of a specific research field is one of the most basic tasks in bibliometrics, and research published in leading conferences is crucial to the data mining research community, yet few studies have focused on it. The purpose of this study is to detect the intellectual structure of data mining based on conference papers.

Design/methodology/approach

This study takes as its sample the papers of the top nine conferences in the data mining field as ranked by Google Scholar Metrics. Based on paper counts, it first examines the annual number of published papers and their distribution across conferences. Then, from the perspective of keywords, CiteSpace was used to analyze the conference papers and identify the frontiers of data mining, focusing on keyword term frequency, keyword betweenness centrality, keyword clustering and burst keywords.

Findings

The results showed that research interest in data mining followed a linear upward trend between 2007 and 2016. The frontier identification based on the conference papers showed that there were five research hotspots in data mining: clustering, classification, recommendation, social network analysis and community detection. The research contents embodied in the conference papers were also very rich.

Originality/value

This study detected the research frontier from leading data mining conference papers. Based on the keyword co-occurrence network, and along four dimensions (keyword term frequency, betweenness centrality, clustering analysis and burst analysis), this paper identified and analyzed the research frontiers of the data mining discipline from 2007 to 2016.

Details

International Journal of Crowd Science, vol. 5 no. 2

Type: Research Article

DOI:

ISSN: 2398-7294

Open Access

Article

Publication date: 11 October 2018

Towards data-driven software engineering skills assessment

Jun Lin, Han Yu, Zhengxiang Pan, Zhiqi Shen and Lizhen Cui

Downloads: 1806

Abstract

Purpose

Today’s software engineers often work in teams to develop complex software systems. Therefore, successful software engineering in practice requires team members to possess not only sound programming skills such as analysis, design, coding and testing but also soft skills such as communication, collaboration and self-management. However, existing examination-based assessments are often inadequate for quantifying students’ soft skill development. The purpose of this paper is to explore alternative ways of assessing software engineering students’ skills through a data-driven approach.

Design/methodology/approach

In this paper, the exploratory data analysis approach is adopted. Leveraging the proposed online agile project management tool – Human-centred Agile Software Engineering (HASE), a study was conducted involving 21 Scrum teams consisting of over 100 undergraduate software engineering students in multi-week coursework projects in 2014.

Findings

During this study, students performed close to 170,000 software engineering activities logged by HASE. By analysing the collected activity trajectory data set, the authors demonstrate the potential for this new research direction to enable software engineering educators to have a quantifiable way of understanding their students’ skill development, and take a proactive approach in helping them improve their programming and soft skills.

Originality/value

To the best of the authors’ knowledge, no previous studies have been published that use software engineering activity data to assess software engineers’ skills.

Details

International Journal of Crowd Science, vol. 2 no. 2

Type: Research Article

DOI:

ISSN: 2398-7294

Open Access

Article

Publication date: 4 September 2017

Using blockchain to build trusted LoRaWAN sharing server

Jun Lin, Zhiqi Shen, Chunyan Miao and Siyuan Liu

Downloads: 10761

Abstract

Purpose

With the rapid growth of the Internet of Things (IoT) market and its requirements, low power wide area (LPWA) technologies have become popular. Among the various LPWA technologies, Narrow Band IoT (NB-IoT) and long range (LoRa) are the two leading competing technologies. Compared with NB-IoT networks, which are mainly built and managed by mobile network operators, LoRa wide area networks (LoRaWAN) are mainly operated by private companies or organizations, which raises two issues: trust in the private network operators and lack of network coverage. This study aims to propose a conceptual architecture design of a blockchain built-in solution for LoRaWAN network servers to solve these two issues for LoRaWAN IoT solutions.

Design/methodology/approach

The study comprises modeling, model analysis and architecture design.

Findings

The proposed solution uses blockchain technology to build an open, trusted, decentralized and tamper-proof system, which provides an indisputable mechanism to verify that the data of a transaction existed at a specific time in the network.
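The tamper-evidence idea behind such a system can be sketched with a simple hash chain: each block records a timestamp, a digest of the data and the hash of the previous block, so altering an earlier block invalidates the link stored in its successor. This is a generic single-process illustration, not the architecture proposed in the paper; the block fields, function names and sample messages are assumptions.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 over a canonical JSON encoding of the block."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, data_digest, timestamp):
    """Append a block that commits to the hash of its predecessor."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append(
        {
            "index": len(chain),
            "timestamp": timestamp,
            "data_digest": data_digest,
            "prev_hash": prev_hash,
        }
    )


def is_valid(chain):
    """Check that every block still matches the hash stored by its successor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain))
    )


chain = []
# Hypothetical payloads; only their digests are stored on the chain.
for ts, message in [(1000, b"uplink-1"), (1060, b"uplink-2"), (1120, b"uplink-3")]:
    append_block(chain, hashlib.sha256(message).hexdigest(), ts)

valid_before = is_valid(chain)
chain[1]["timestamp"] = 900  # simulate someone backdating a record
valid_after = is_valid(chain)
```

In this sketch the most recent block is protected only once another block is appended after it; a real blockchain adds consensus among many nodes on top of this linking.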

Originality/value

To the best of our knowledge, this is the first work that integrates blockchain technology and LoRaWAN IoT technology.

Details

International Journal of Crowd Science, vol. 1 no. 3

Type: Research Article

DOI:

ISSN: 2398-7294

Open Access

Article

Publication date: 4 May 2021

Robust ensemble of handcrafted and learned approaches for DNA-binding proteins

Loris Nanni and Sheryl Brahnam

Downloads: 1352

Abstract

Purpose

Automatic DNA-binding protein (DNA-BP) classification is now an essential proteomic technology. Unfortunately, many systems reported in the literature are tested on only one or two datasets/tasks. The purpose of this study is to create an optimal and universal system for DNA-BP classification, one that performs competitively across several DNA-BP classification tasks.

Design/methodology/approach

Efficient DNA-BP classifier systems require the discovery of powerful protein representations and feature extraction methods. Experiments were performed that combined and compared descriptors extracted from state-of-the-art matrix/image protein representations. These descriptors were trained on separate support vector machines (SVMs) and evaluated. Convolutional neural networks with different parameter settings were fine-tuned on two matrix representations of proteins. Decisions were fused with the SVMs using the weighted sum rule and evaluated to experimentally derive the most powerful general-purpose DNA-BP classifier system.
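The weighted sum rule mentioned above can be sketched as follows: each classifier produces a score per sample, and the fused score is the weighted sum of those scores. The scores, the equal weights and the 0.5 decision threshold are hypothetical, and the sketch assumes the classifier outputs have already been normalized to a common scale.

```python
def weighted_sum_fusion(score_lists, weights):
    """Fuse per-classifier scores for the same samples with the weighted sum rule."""
    if len(score_lists) != len(weights):
        raise ValueError("one weight per classifier is required")
    n_samples = len(score_lists[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, score_lists))
        for i in range(n_samples)
    ]


# Hypothetical normalized scores for three samples from two classifiers.
svm_scores = [0.9, 0.2, 0.6]
cnn_scores = [0.7, 0.4, 0.3]
fused = weighted_sum_fusion([svm_scores, cnn_scores], weights=[0.5, 0.5])
predictions = [score >= 0.5 for score in fused]
```

The weights would normally be chosen on validation data rather than fixed by hand.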

Findings

The best ensemble proposed here produced comparable, if not superior, classification results on a broad and fair comparison with the literature across four different datasets representing a variety of DNA-BP classification tasks, thereby demonstrating both the power and generalizability of the proposed system.

Originality/value

Most DNA-BP methods proposed in the literature are only validated on one (rarely two) datasets/tasks. In this work, the authors report the performance of their general-purpose DNA-BP system on four datasets representing different DNA-BP classification tasks. The excellent results of the proposed best classifier system demonstrate the power of the proposed approach. These results can now be used for baseline comparisons by other researchers in the field.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

DOI:

ISSN: 2634-1964

Open Access

Book part

Publication date: 4 May 2018

Eco-informatics: The Encouragement of Ecological Data Management

Muhammad Arhami, Anita Desiani, Munawar and Raisah Hayati

Abstract

Purpose – The purpose of this research is to study the ecological developments that are growing rapidly and are complemented by technological developments that make ecology a discipline which is able to collaborate, integrate, and use data for the development of science.

Design/Methodology/Approach – The method involves integrating and analyzing heterogeneous ecological data, drawing conclusions and disseminating knowledge. This heterogeneity makes ecological research complex and calls for an approach that simplifies the problem.

Findings – The data involved in ecology are very complex and diverse and are spread across various sources that are not mutually integrated, so a structured arrangement through computer-based data management is required.

Research Limitations/Implications – Eco-informatics is one option for managing and organizing the data and transforming it into information and knowledge.

Details

Proceedings of MICoMS 2017

Type: Book

DOI:

ISBN:
