Search results

1 – 10 of 179
Article
Publication date: 22 August 2008

Fabrice Coutier and Giovanni Sebastiani

Abstract

Purpose

The purpose of this paper is to describe a fast and easy method for both clustering samples and identifying active genes in cDNA microarray data.

Design/methodology/approach

The method alternates between identifying the active genes using a mixture model and clustering the samples using Ward hierarchical clustering. The initial point of the procedure is obtained by means of a χ² test. The method attempts to locally minimize the sum of the within-cluster sample variances under a suitable Gaussian assumption on the distribution of the data.
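As an illustration of the clustering half of this procedure only, the following is a minimal sketch of greedy agglomerative clustering with the Ward merge criterion, assuming samples are plain lists of numbers. It is not the authors' implementation: the mixture model, the χ² initialization and the alternation are omitted, and the function names are hypothetical.

```python
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def within_ss(clusters):
    """Total within-cluster sum of squared deviations from each centroid."""
    total = 0.0
    for c in clusters:
        m = centroid(c)
        total += sum((x - y) ** 2 for p in c for x, y in zip(p, m))
    return total


def ward_clusters(samples, k):
    """Merge the cheapest pair of clusters until only k clusters remain."""
    clusters = [[s] for s in samples]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical toy data: four samples measured on two genes.
samples = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
clusters = ward_clusters(samples, 2)
print(clusters, within_ss(clusters))
```

Each merge is chosen to add the least possible within-cluster sum of squares, which is why Ward clustering is a natural fit for a criterion based on within-cluster variance.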

Findings

This paper illustrates the proposed methodology and its success by means of results from both simulated and real cDNA microarray data. The comparison of the results with those from a related known method demonstrates the superiority of the proposed approach.

Research limitations/implications

Only empirical evidence of algorithm convergence is provided. Theoretical proof of algorithm convergence is an open issue.

Practical implications

The proposed methodology can be applied to perform cDNA microarray data analysis.

Originality/value

This paper provides a contribution to the development of successful statistical methods for cDNA microarray data analysis.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 1 no. 3
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 15 February 2008

Richard S. Segall, Gauri S. Guha and Sarath A. Nonis

Abstract

Purpose

This paper seeks to present a complete set of graphical and numerical outputs of data mining performed for microarray databases of plant data as described in earlier research by the authors. A brief description of data mining is also presented, as well as a brief background of previous research.

Design/methodology/approach

The paper applies data mining, using SAS Enterprise Miner Version 4, to plant data from the Osmotic Stress Microarray Information Database (OSMID), which is available on the web as both normalized and log2-transformed data.

Findings

This paper illustrates that useful information about the effects of environmental stress tolerances (ESTs) on plants can be obtained by using data mining.

Research limitations/implications

Use of SAS Enterprise Miner was very effective for performing data mining of microarray databases with its modules of cluster analysis, decision trees, and descriptive and visual statistics.

Practical implications

The data used from the OSMID database are considered to be representative of those that could be used for biotech applications such as the manufacture of plant‐made pharmaceuticals and genetically modified foods.

Originality/value

This paper contributes to the discussion on the use of data mining for microarray databases and specifically for studying the effects of ESTs on plants.

Details

Kybernetes, vol. 37 no. 1
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 1 December 2006

Richard S. Segall and Qingyu Zhang

Abstract

Purpose

To present research in the area of the applications of modern heuristics and data mining techniques in knowledge discovery.

Design/methodology/approach

Applications of data mining for neural networks using NeuralWare Predict® software, genetic algorithms using Biodiscovery GeneSight® (2005) software, and regression and discriminant analysis using SPSS® were selected for bioscience data sets of continuous numerical‐valued Abalone fish data and discrete nominal‐valued mushroom data.

Findings

This paper illustrates the useful information that can be obtained using data mining for evolutionary algorithms, specifically those for neural networks, genetic algorithms, regression analysis, and discriminant analysis.

Research limitations/implications

The use of NeuralWare Predict® was a very effective method of implementing training rules for neural networks to identify the important attributes of numerical and nominal valued data.

Practical implications

The software and algorithms discussed in the paper can be used to visualize and mine microarray data.

Originality/value

The paper contributes to the discussion on data visualization and data mining of microarray databases for bioinformatics and emphasizes the new applicability of modern heuristics and software.

Details

Kybernetes, vol. 35 no. 10
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 28 September 2021

Nageswara Rao Eluri, Gangadhara Rao Kancharla, Suresh Dara and Venkatesulu Dondeti

Abstract

Purpose

Gene selection is considered a fundamental process in bioinformatics. Existing methodologies for cancer classification are mostly clinically based, and their diagnostic capability is limited. Nowadays, significant problems in cancer diagnosis are being solved using gene expression data, and researchers have introduced many approaches to diagnose cancer appropriately and effectively. This paper aims to develop a cancer data classification model using gene expression data.

Design/methodology/approach

The proposed classification model involves three main phases: (1) feature extraction, (2) optimal feature selection and (3) classification. Initially, five benchmark gene expression datasets are collected, and feature extraction is performed on the collected data. To reduce the length of the feature vectors, optimal feature selection is performed using a new meta-heuristic algorithm termed the quantum-inspired immune clone optimization algorithm (QICO). Once the relevant features are selected, classification is performed by a deep learning model, the recurrent neural network (RNN). Finally, the experimental analysis reveals that the proposed QICO-based feature selection model outperforms other heuristic-based feature selection methods and that the optimized RNN outperforms other machine learning methods.

Findings

The proposed QICO-RNN achieves the best outcomes at every learning percentage. At a learning percentage of 85, the accuracy of the proposed QICO-RNN was 3.2% higher than RNN, 4.3% higher than RF, 3.8% higher than NB and 2.1% higher than KNN for Dataset 1. For Dataset 2, at a learning percentage of 35, the accuracy of the proposed QICO-RNN was 13.3% higher than RNN, 8.9% higher than RF and 14.8% higher than NB and KNN. Hence, the developed QICO algorithm performs well in accurately classifying cancer data using gene expression data.

Originality/value

This paper introduces a new optimal feature selection model using QICO and a QICO-based RNN for effective classification of cancer data using gene expression data. It is the first work to use this combination for that purpose.

Book part
Publication date: 1 November 2007

Irina Farquhar, Michael Kane, Alan Sorkin and Kent H. Summers

Abstract

This chapter proposes an optimized innovative information technology as a means for achieving operational functionalities of real-time portable electronic health records, system interoperability, longitudinal health-risks research cohort and surveillance of adverse events infrastructure, and clinical, genome regions – disease and interventional prevention infrastructure. In application to the DoD-VA (Department of Defense and Veterans Administration) health information systems, the proposed modernization can be carried out as an “add-on” expansion (estimated at $288 million in constant dollars) or as a “stand-alone” innovative information technology system (estimated at $489.7 million). Either solution will prototype an infrastructure for nation-wide health information systems interoperability, portable real-time electronic health records (EHRs), adverse events surveillance, and interventional prevention based on targeted single nucleotide polymorphisms (SNPs) discovery.

Details

The Value of Innovation: Impact on Health, Life Quality, Safety, and Regulatory Research
Type: Book
ISBN: 978-1-84950-551-2

Article
Publication date: 4 May 2010

Qingyu Zhang and Richard S. Segall

Abstract

Purpose

The purpose of this paper is to review and compare selected software for data mining, text mining (TM), and web mining that are not available as free open‐source software.

Design/methodology/approach

Selected software packages are compared on their common and unique features. The data mining packages are SAS® Enterprise Miner™, Megaputer PolyAnalyst® 5.0, NeuralWare Predict®, and BioDiscovery GeneSight®. The TM packages are CompareSuite, SAS® Text Miner, TextAnalyst, VisualText, Megaputer PolyAnalyst® 5.0, and WordStat. The web mining packages are Megaputer PolyAnalyst®, SPSS Clementine®, ClickTracks, and QL2.

Findings

This paper discusses and compares the existing features, characteristics, and algorithms of selected software for data mining, TM, and web mining, respectively. These packages are also applied to available data sets.

Research limitations/implications

The limitations are the inclusion of selected software and datasets rather than considering the entire realm of these. This review could be used as a framework for comparing other data, text, and web mining software.

Practical implications

This paper can be helpful for an organization or individual when choosing proper software to meet their mining needs.

Originality/value

Each software package selected for this research has its own unique characteristics, properties, and algorithms. No other paper compares these selected packages both visually and descriptively for all three types of mining: data, text, and web.

Details

Kybernetes, vol. 39 no. 4
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 7 January 2014

Charlie Mayor and Lyn Robinson

Abstract

Purpose

The purpose of this article is to evaluate the development and use of the gene ontology (GO), a scientific vocabulary widely used in molecular biology databases, with particular reference to the relation between the theoretical basis of the GO, and the pragmatics of its application.

Design/methodology/approach

The study uses a combination of bibliometric analysis, content analysis and discourse analysis. These analyses focus on details of the ways in which the terms of the ontology are amended and deleted, and in which they are applied by users.

Findings

Although the GO is explicitly based on an objective realist epistemology, considerable subjectivity and social factors are evident in its development and use. It is concluded that bio-ontologies could beneficially be extended to be pluralist, while remaining objective, taking a view of concepts closer to that of more traditional controlled vocabularies.

Originality/value

This is one of very few studies which evaluate the development of a formal ontology in relation to its conceptual foundations, and the first to consider the GO in this way.

Details

Journal of Documentation, vol. 70 no. 1
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 6 March 2017

Tai-Wei Chiang and Ta-Cheng Chen

Abstract

Purpose

Building categorization response models from gene expression patterns has become one of the most promising uses of microarray technology. In this study, the aim is to propose a grid computing-based meta-evolutionary mining approach as a categorization response model for gene selection and cancer classification.

Design/methodology/approach

The proposed approach uses a grid computing infrastructure to establish the best attribute set selected from large microarray data. A novel discriminant analysis, based on the vector distant of median method, serves as the evaluation function of the meta-evolutionary mining approach. The proposed approach emphasizes finding the best attribute set for constructing a categorization response model with the highest categorization accuracy.

Findings

Several benchmark cancer microarray data sets were used to evaluate the proposed approach, and the results are compared with those of other approaches in the literature. Experimental results from four benchmark problems indicate that the proposed approach works effectively and efficiently, and that its results are superior to, or as good as, those of other existing methods in the literature.

Originality/value

A novel discriminant analysis, based on the vector distant of median method, serves as the evaluation function of the meta-evolutionary mining approach and discovers the best feature subset automatically from the microarray tumor database. The proposed approach emphasizes finding the best attribute set for constructing a categorization response model with the highest categorization accuracy.

Details

Engineering Computations, vol. 34 no. 1
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 5 June 2009

Bruno Feres de Souza, Carlos Soares and André C.P.L.F. de Carvalho

Abstract

Purpose

The purpose of this paper is to investigate the applicability of meta‐learning to the problem of algorithm recommendation for gene expression data classification.

Design/methodology/approach

Meta‐learning was used to provide a preference order of machine learning algorithms, based on their expected performances. Two approaches were considered: k‐nearest neighbors and support vector machine‐based ranking methods. They were applied to a set of 49 publicly available microarray datasets. The evaluation of the methods followed standard procedures suggested in the meta‐learning literature.
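To make the k‐nearest neighbors ranking idea concrete, here is a minimal sketch under stated assumptions: each known dataset is described by a vector of meta-features and has an observed rank for every candidate algorithm, and a new dataset receives the average ranks of its k nearest neighbors. The function name, the meta-features and all values below are hypothetical and are not taken from the paper.

```python
import math


def knn_rank(meta_features, rankings, query, k=1):
    """Order algorithms for `query` by their mean rank on the k datasets
    closest to it in meta-feature space (Euclidean distance)."""
    nearest = sorted(
        meta_features,
        key=lambda name: math.dist(meta_features[name], query),
    )[:k]
    algorithms = list(rankings[nearest[0]])
    mean_rank = {
        a: sum(rankings[d][a] for d in nearest) / len(nearest)
        for a in algorithms
    }
    # Lower mean rank means the algorithm is recommended earlier.
    return sorted(algorithms, key=mean_rank.get)


# Hypothetical meta-features (two per dataset) and observed ranks (1 = best).
meta_features = {"d1": [0.1, 0.9], "d2": [0.2, 0.8], "d3": [0.9, 0.1]}
rankings = {
    "d1": {"svm": 1, "knn": 2, "tree": 3},
    "d2": {"knn": 1, "svm": 2, "tree": 3},
    "d3": {"tree": 1, "svm": 2, "knn": 3},
}

print(knn_rank(meta_features, rankings, [0.85, 0.15], k=1))
```

Ties in mean rank are broken by the order in which algorithms appear in the nearest dataset's ranking, which is an arbitrary choice made for this sketch.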

Findings

Empirical evidence shows that both ranking methods produce more interesting suggestions for gene expression data classification than the baseline method. Although the rankings are more accurate, a significant difference in the performances of the top classifiers was not observed.

Practical implications

As the experiments conducted in this paper suggest, the use of meta‐learning approaches can provide an efficient data driven way to select algorithms for gene expression data classification.

Originality/value

This paper reports contributions to the areas of meta‐learning and gene expression data analysis. Regarding the former, it supports the claim that meta‐learning can be suitably applied to problems of a specific domain, expanding its current practice. To the latter, it introduces a cost effective approach to better deal with classification tasks.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 2 no. 2
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 21 August 2009

Beatriz Pontes, Federico Divina, Raúl Giráldez and Jesús S. Aguilar‐Ruiz

Abstract

Purpose

The purpose of this paper is to present a novel control mechanism for avoiding overlapping among biclusters in expression data.

Design/methodology/approach

Biclustering is a technique used in the analysis of microarray data. One of the most popular biclustering algorithms was introduced by Cheng and Church (2000) (Ch&Ch). Although this heuristic is successful at finding interesting biclusters, it has several drawbacks. The main shortcoming is that it introduces random values into the expression matrix to control overlapping. The overlapping control method presented in this paper is based on a matrix of weights that is used to estimate the overlap of a bicluster with those already found. In this way, the algorithm always works on real data, and so the biclusters it discovers contain only original data.
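For orientation, the sketch below computes the mean squared residue, the bicluster quality score used by Cheng and Church, and pairs it with one simple way a weight matrix could estimate overlap without altering the data. The weight scheme shown (counting how many earlier biclusters cover each cell) is an assumption made for illustration and may differ from the scheme in the paper; the function names are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng and Church score: mean of (a_ij - rowmean_i - colmean_j + mean)^2
    over the submatrix selected by `rows` and `cols`. Lower is more coherent."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    row_means = [sum(r) / len(r) for r in sub]
    col_means = [sum(c) / len(c) for c in zip(*sub)]
    overall = sum(row_means) / len(row_means)
    total = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(len(rows))
        for j in range(len(cols))
    )
    return total / (len(rows) * len(cols))


def register_bicluster(weights, rows, cols):
    """Assumed scheme: add 1 to the weight of every cell a found bicluster covers."""
    for i in rows:
        for j in cols:
            weights[i][j] += 1


def overlap_estimate(weights, rows, cols):
    """Assumed scheme: mean weight over a candidate bicluster's cells."""
    return sum(weights[i][j] for i in rows for j in cols) / (len(rows) * len(cols))


# Hypothetical 3x3 expression matrix with a perfectly additive pattern.
expression = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
weights = [[0] * 3 for _ in range(3)]

print(mean_squared_residue(expression, [0, 1, 2], [0, 1, 2]))
register_bicluster(weights, [0, 1], [0, 1])
print(overlap_estimate(weights, [1, 2], [1, 2]))
```

Because the overlap information lives in a separate weight matrix, the expression values themselves are never replaced, so every residue is computed on original data.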

Findings

The paper shows that the original algorithm wrongly estimates the quality of the biclusters after some iterations, due to the random values that it introduces. The empirical results show that the proposed approach is effective in improving the heuristic. It is also important to highlight that many interesting biclusters found using the proposed approach would not have been obtained using the original algorithm.

Originality/value

The original algorithm proposed by Ch&Ch is one of the most successful algorithms for discovering biclusters in microarray data. However, it presents some limitations, the most relevant being the substitution phase adopted in order to avoid overlapping among biclusters. The modified version of the algorithm proposed in this paper improves the original one, as proven in the experimentation.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 2 no. 3
Type: Research Article
ISSN: 1756-378X
