Search results

1 – 10 of 185
Open Access
Article
Publication date: 28 July 2020

Harleen Kaur and Vinita Kumari

Diabetes is a major metabolic disorder which can adversely affect the entire body system. Undiagnosed diabetes can increase the risk of cardiac stroke, diabetic nephropathy and other…

Abstract

Diabetes is a major metabolic disorder which can adversely affect the entire body system. Undiagnosed diabetes can increase the risk of cardiac stroke, diabetic nephropathy and other disorders. Millions of people around the world are affected by this disease, so early detection of diabetes is very important for maintaining a healthy life. The disease is a cause of global concern as cases of diabetes are rising rapidly. Machine learning (ML) is a computational method for automatic learning from experience that improves performance to make more accurate predictions. In the current research we applied machine learning techniques to the Pima Indian diabetes dataset to develop trends and detect patterns associated with risk factors using the R data manipulation tool. To classify patients into diabetic and non-diabetic groups, we developed and analyzed five different predictive models using the R data manipulation tool. For this purpose we used the supervised machine learning algorithms linear kernel support vector machine (SVM-linear), radial basis function (RBF) kernel support vector machine, k-nearest neighbour (k-NN), artificial neural network (ANN) and multifactor dimensionality reduction (MDR).
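The authors built their five classifiers in R; as a rough, language-agnostic illustration of the same comparison, the sketch below fits four of the five models with scikit-learn and 10-fold cross-validation (MDR has no standard scikit-learn implementation and is omitted). The file name pima-indians-diabetes.csv and the Outcome column are assumptions about how a local copy of the dataset is stored.

```python
# Minimal sketch (not the authors' R code): compare supervised classifiers on
# the Pima Indians Diabetes dataset with 10-fold cross-validation.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

data = pd.read_csv("pima-indians-diabetes.csv")            # assumed local copy of the dataset
X, y = data.drop(columns="Outcome"), data["Outcome"]       # "Outcome": 1 = diabetic, 0 = non-diabetic

models = {
    "SVM-linear": SVC(kernel="linear"),
    "SVM-RBF": SVC(kernel="rbf"),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "ANN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
}

for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(), clf)            # scale features before margin/distance-based models
    scores = cross_val_score(pipe, X, y, cv=10, scoring="accuracy")
    print(f"{name}: mean 10-fold accuracy = {scores.mean():.3f}")
```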

Open Access
Article
Publication date: 28 July 2020

Noura AlNuaimi, Mohammad Mehedy Masud, Mohamed Adel Serhani and Nazar Zaki

Organizations in many domains generate a considerable amount of heterogeneous data every day. Such data can be processed to enhance these organizations’ decisions in real time…

Abstract

Organizations in many domains generate a considerable amount of heterogeneous data every day. Such data can be processed to enhance these organizations’ decisions in real time. However, storing and processing large and varied datasets (known as big data) is challenging to do in real time. In machine learning, streaming feature selection has always been considered a superior technique for selecting a relevant subset of features from highly dimensional data and thus reducing learning complexity. In the relevant literature, streaming feature selection refers to features that arrive consecutively over time; although there is no exact figure for the number of features, the number of instances is well established. Many scholars in the field have proposed streaming-feature-selection algorithms in attempts to find the proper solution to this problem. This paper presents an exhaustive and methodical introduction to these techniques. This study provides a review of the traditional feature-selection algorithms and then scrutinizes the current algorithms that use streaming feature selection to determine their strengths and weaknesses. The survey also sheds light on the ongoing challenges in big-data research.
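As a hedged illustration of the streaming setting this survey covers (features arrive one at a time while the number of instances is fixed), the sketch below uses a simple, generic accept-or-discard rule based on correlation thresholds; it is not one of the specific algorithms reviewed in the paper, and the thresholds and simulated data are illustrative assumptions.

```python
# Generic sketch of streaming feature selection: features arrive one at a time
# and are kept only if relevant to the target and not redundant with kept ones.
import numpy as np

def stream_select(feature_stream, y, rel_thresh=0.15, red_thresh=0.9):
    """Accept or discard each arriving feature; return the names kept."""
    selected = []                                          # list of (name, values)
    for name, x in feature_stream:
        relevance = abs(np.corrcoef(x, y)[0, 1])
        if relevance < rel_thresh:
            continue                                       # discard weakly relevant feature
        redundant = any(abs(np.corrcoef(x, s)[0, 1]) > red_thresh for _, s in selected)
        if not redundant:
            selected.append((name, x))
    return [name for name, _ in selected]

# Usage: 1,000 fixed instances whose feature columns "arrive" one by one.
rng = np.random.default_rng(0)
y = rng.normal(size=1000)
stream = ((f"f{i}", y * (i % 3 == 0) + rng.normal(size=1000)) for i in range(20))
print(stream_select(stream, y))                            # keeps only the informative features
```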

Details

Applied Computing and Informatics, vol. 18 no. 1/2
Type: Research Article
ISSN: 2634-1964

Open Access
Article
Publication date: 19 December 2023

Qinxu Ding, Ding Ding, Yue Wang, Chong Guan and Bosheng Ding

The rapid rise of large language models (LLMs) has propelled them to the forefront of applications in natural language processing (NLP). This paper aims to present a comprehensive…

Abstract

Purpose

The rapid rise of large language models (LLMs) has propelled them to the forefront of applications in natural language processing (NLP). This paper aims to present a comprehensive examination of the research landscape in LLMs, providing an overview of the prevailing themes and topics within this dynamic domain.

Design/methodology/approach

Drawing from an extensive corpus of 198 records published between 1996 and 2023 from the relevant academic database, encompassing journal articles, books, book chapters, conference papers and selected working papers, this study delves deep into the multifaceted world of LLM research. The authors employed the BERTopic algorithm, a recent advancement in topic modeling, to conduct a comprehensive analysis of the data after it had been meticulously cleaned and preprocessed. BERTopic leverages the power of transformer-based language models like bidirectional encoder representations from transformers (BERT) to generate more meaningful and coherent topics. This approach facilitates the identification of hidden patterns within the data, enabling the authors to uncover valuable insights that might otherwise have remained obscure.
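For readers unfamiliar with the algorithm, a minimal BERTopic run looks like the sketch below. The 20 Newsgroups corpus is used only as a publicly available stand-in; the paper's own 198-record corpus and preprocessing are not reproduced, and min_topic_size is an illustrative setting.

```python
# Minimal BERTopic sketch: embed documents with a transformer model, cluster
# them into topics, and inspect the resulting topic descriptions.
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

# Stand-in corpus; replace with the cleaned records of the study's own database.
docs = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes")).data[:2000]

topic_model = BERTopic(language="english", min_topic_size=5)
topics, probs = topic_model.fit_transform(docs)

print(topic_model.get_topic_info())    # one row per discovered topic
print(topic_model.get_topic(0))        # top terms of the largest topic
```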

Findings

The analysis revealed four distinct clusters of topics in LLM research: “language and NLP”, “education and teaching”, “clinical and medical applications” and “speech and recognition techniques”. Each cluster embodies a unique aspect of LLM application and showcases the breadth of possibilities that LLM technology has to offer. In addition to presenting the research findings, this paper identifies key challenges and opportunities in the realm of LLMs. It underscores the necessity for further investigation in specific areas, including the paramount importance of addressing potential biases, transparency and explainability, data privacy and security, and responsible deployment of LLM technology.

Practical implications

This classification offers practical guidance for researchers, developers, educators, and policymakers to focus efforts and resources. The study underscores the importance of addressing challenges in LLMs, including potential biases, transparency, data privacy, and responsible deployment. Policymakers can utilize this information to shape regulations, while developers can tailor technology development based on the diverse applications identified. The findings also emphasize the need for interdisciplinary collaboration and highlight ethical considerations, providing a roadmap for navigating the complex landscape of LLM research and applications.

Originality/value

This study stands out as the first to examine the evolution of LLMs across such a long time frame and across such diversified disciplines. It provides a unique perspective on the key areas of LLM research, highlighting the breadth and depth of LLM’s evolution.

Details

Journal of Electronic Business & Digital Economics, vol. 3 no. 1
Type: Research Article
ISSN: 2754-4214

Open Access
Article
Publication date: 8 March 2021

Mamdouh Abdel Alim Saad Mowafy and Walaa Mohamed Elaraby Mohamed Shallan

Heart diseases have become one of the leading causes of death among Egyptians. With 500 deaths per 100,000 occurring annually in Egypt, it has been noticed that medical data faces a…

Abstract

Purpose

Heart diseases have become one of the leading causes of death among Egyptians. With 500 deaths per 100,000 occurring annually in Egypt, it has been noticed that medical data faces a high-dimensionality problem that leads to a decrease in the classification accuracy of heart data. The purpose of this study is therefore to improve the classification accuracy of heart disease data, helping doctors diagnose heart disease efficiently by using a hybrid classification technique.

Design/methodology/approach

This paper uses a new approach based on the integration of dimensionality reduction techniques, namely multiple correspondence analysis (MCA) and principal component analysis (PCA), with fuzzy c-means (FCM) and then with both multilayer perceptron (MLP) and radial basis function networks (RBFN), which separate patients into different categories based on their diagnosis results. A comparative study of performance was conducted across six structures, MLP, RBFN, MLP via FCM–MCA, MLP via FCM–PCA, RBFN via FCM–MCA and RBFN via FCM–PCA, to reach the best classifier.
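A compact sketch of one of the six structures (MLP via FCM–PCA) is given below, with a plain NumPy fuzzy c-means and scikit-learn PCA/MLP as stand-ins; the MCA and RBFN variants are omitted, and the heart_disease.csv file and its target column are assumptions.

```python
# Sketch of an "MLP via FCM–PCA" pipeline: PCA for dimensionality reduction,
# fuzzy c-means to group patients, and an MLP trained on PCA scores plus
# FCM membership degrees.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

def fuzzy_c_means(X, c=3, m=2.0, iters=100, seed=0):
    """Plain NumPy FCM; returns an (n_samples, c) membership matrix."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        centers = (U ** m).T @ X / (U ** m).sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = 1.0 / d ** (2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U

df = pd.read_csv("heart_disease.csv")                      # assumed dataset and layout
X = StandardScaler().fit_transform(df.drop(columns="target"))
y = df["target"]

X_pca = PCA(n_components=0.9).fit_transform(X)             # keep 90% of the variance
features = np.hstack([X_pca, fuzzy_c_means(X_pca)])        # PCA scores + FCM memberships

X_tr, X_te, y_tr, y_te = train_test_split(features, y, test_size=0.3, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(X_tr, y_tr)
print("MLP via FCM-PCA test accuracy:", clf.score(X_te, y_te))
```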

Findings

The results show that the MLP via FCM–MCA classifier structure has the highest classification accuracy and the best performance, superior to the other methods, and that smoking was the most significant factor causing heart disease.

Originality/value

This paper shows the importance of integrating statistical methods in increasing the classification accuracy of heart disease data.

Details

Review of Economics and Political Science, vol. 6 no. 3
Type: Research Article
ISSN: 2356-9980

Open Access
Article
Publication date: 15 December 2023

Nicola Castellano, Roberto Del Gobbo and Lorenzo Leto

The concept of productivity is central to performance management and decision-making, although it is complex and multifaceted. This paper aims to describe a methodology based on…

Abstract

Purpose

The concept of productivity is central to performance management and decision-making, although it is complex and multifaceted. This paper aims to describe a methodology based on the use of Big Data in a cluster analysis combined with a data envelopment analysis (DEA) that provides accurate and reliable productivity measures in a large network of retailers.

Design/methodology/approach

The methodology is described using a case study of a leading kitchen furniture producer. More specifically, Big Data is used in a two-step analysis prior to the DEA to automatically cluster a large number of retailers into groups that are homogeneous in terms of structural and environmental factors and assess a within-the-group level of productivity of the retailers.
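The sketch below illustrates the two-step logic on synthetic data: k-means clusters retailers on structural/environmental descriptors, then an input-oriented CCR DEA (solved with scipy's linprog) scores productivity within each cluster. The variables, cluster count and data are illustrative assumptions, not the authors' Big Data pipeline.

```python
# Two-step sketch: cluster retailers into homogeneous groups, then run an
# input-oriented CCR DEA within each group.
import numpy as np
from scipy.optimize import linprog
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def dea_ccr_input(X_in, Y_out):
    """CCR input-oriented efficiency scores for units with inputs X_in, outputs Y_out."""
    n, scores = len(X_in), []
    for o in range(n):
        # decision variables: theta, lambda_1..lambda_n; objective: minimize theta
        c = np.r_[1.0, np.zeros(n)]
        # inputs:  sum_j lambda_j * x_ij - theta * x_io <= 0
        A_in = np.hstack([-X_in[o][:, None], X_in.T])
        # outputs: -sum_j lambda_j * y_rj <= -y_ro
        A_out = np.hstack([np.zeros((Y_out.shape[1], 1)), -Y_out.T])
        A_ub = np.vstack([A_in, A_out])
        b_ub = np.r_[np.zeros(X_in.shape[1]), -Y_out[o]]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] + [(0, None)] * n)
        scores.append(res.x[0])
    return np.array(scores)

rng = np.random.default_rng(0)
env = rng.normal(size=(60, 4))                 # structural/environmental descriptors (assumed)
inputs = rng.uniform(1, 10, size=(60, 2))      # e.g. floor space, staff (assumed)
outputs = rng.uniform(1, 10, size=(60, 1))     # e.g. sales (assumed)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(env))
for g in np.unique(labels):
    idx = labels == g
    eff = dea_ccr_input(inputs[idx], outputs[idx])
    print(f"cluster {g}: mean within-group efficiency = {eff.mean():.3f}")
```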

Findings

The proposed methodology helps reduce the heterogeneity among the units analysed, which is a major concern in DEA applications. The data-driven factorial and clustering technique allows for maximum within-group homogeneity and between-group heterogeneity by reducing subjective bias and dimensionality, which is embedded with the use of Big Data.

Practical implications

The use of Big Data in clustering applied to productivity analysis can provide managers with data-driven information about the structural and socio-economic characteristics of retailers' catchment areas, which is important in establishing potential productivity performance and optimizing resource allocation. The improved productivity indexes enable the setting of targets that are coherent with retailers' potential, which increases motivation and commitment.

Originality/value

This article proposes an innovative technique to enhance the accuracy of productivity measures through the use of Big Data clustering and DEA. To the best of the authors’ knowledge, no attempts have been made to benefit from the use of Big Data in the literature on retail store productivity.

Details

International Journal of Productivity and Performance Management, vol. 73 no. 11
Type: Research Article
ISSN: 1741-0401

Open Access
Article
Publication date: 16 December 2021

Heba M. Ezzat

Since the beginning of 2020, economies have faced many changes as a result of the coronavirus disease 2019 (COVID-19) pandemic. The effect of COVID-19 on the Egyptian Exchange (EGX) is…

Abstract

Purpose

Since the beginning of 2020, economies have faced many changes as a result of the coronavirus disease 2019 (COVID-19) pandemic. The effect of COVID-19 on the Egyptian Exchange (EGX) is investigated in this research.

Design/methodology/approach

To explore the impact of COVID-19, three periods were considered: (1) the 17 months before the spread of COVID-19 and the start of the lockdown, (2) the 17 months after the spread of COVID-19 and during the lockdown and (3) the 34 months comprising the whole period (before and during COVID-19). Due to the large number of variables that could be considered, a dimensionality reduction method, principal component analysis (PCA), is followed. This method helps in determining the individual stocks contributing most to the main EGX index (EGX 30). The PCA also addresses the multicollinearity between the variables under investigation. Additionally, a principal component regression (PCR) model is developed to predict the future behavior of the EGX 30.
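A minimal principal component regression of the kind described here can be sketched as below; the egx_daily.csv file, its EGX30 column and the choice of three components are assumptions used only to show the mechanics.

```python
# Sketch of a principal component regression (PCR): standardize the constituent
# stocks, extract leading principal components, regress the index on them.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

prices = pd.read_csv("egx_daily.csv", index_col=0, parse_dates=True)  # assumed file layout
y = prices["EGX30"]                        # main index
X = prices.drop(columns="EGX30")           # individual constituent stocks

pcr = make_pipeline(StandardScaler(), PCA(n_components=3), LinearRegression())
pcr.fit(X, y)
print("explained variance ratios:", pcr.named_steps["pca"].explained_variance_ratio_)
print("in-sample R^2:", pcr.score(X, y))
```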

Findings

The results demonstrate that the first three principal components (PCs) explain 89%, 85% and 88% of the data variability for (1) before COVID-19, (2) during COVID-19 and (3) the whole period, respectively. Furthermore, the food and beverage, basic resources and real estate sectors were not affected by COVID-19. The resulting principal component regression (PCR) model performs very well, as can be concluded by comparing the observed values of EGX 30 with the predicted ones (R-squared estimated at 0.99).

Originality/value

To the best of our knowledge, no research has been conducted to investigate the effect of COVID-19 on the EGX following an unsupervised machine learning method.

Details

Journal of Humanities and Applied Social Sciences, vol. 5 no. 5
Type: Research Article
ISSN: 2632-279X

Open Access
Article
Publication date: 1 June 2005

P. J. Hassall and S. Ganesh

This paper provides a further investigation into the application of Correspondence Analysis (CA) as outlined by Greenacre (1984, 1993), which is one technique for “quantifying…

Abstract

This paper provides a further investigation into the application of Correspondence Analysis (CA) as outlined by Greenacre (1984, 1993), which is one technique for “quantifying qualitative data” in research on learning and teaching. It also builds on the utilisation of CA in the development of the emerging discipline of English as an International Language provided by Hassall and Ganesh (1996, 1999). This is accomplished by considering its application to the analysis of attitudinal data that positions the developing pedagogy of Teaching English as an International Language (TEIL) (see Hassall, 1996a & ff.) within the more established discipline of World Englishes (cf. Kachru, 1985, 1990). The multidimensional statistical technique Correspondence Analysis is used to provide an assessment of the interdependence of the rows and columns of a data matrix (primarily, a two-way contingency table). In this case, attitudinal data, produced at a number of international workshops which focused on the development of a justifiable pedagogy for Teaching English as an International Language (TEIL), are examined to provide a more complete picture of how these venues differed from each other with respect to the collective responses of the respondents. CA facilitates dimensionality reduction and provides graphical displays in low-dimensional spaces. In other words, it converts the rows and columns of a data matrix or contingency table into a series of points on a graph. The current study presents analyses of two different interpretations of this data.
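The core computation behind CA can be sketched in a few lines of NumPy, as below: form the correspondence matrix, take the SVD of the standardized residuals, and read off row and column coordinates and the inertia per dimension. The small workshop-by-response table is invented for illustration and is not the paper's attitudinal data.

```python
# Correspondence analysis sketch: decompose a two-way contingency table into
# low-dimensional row and column coordinates.
import numpy as np

N = np.array([[20, 35, 10],   # rows: workshop venues (invented counts)
              [15, 25, 30],   # columns: response categories (invented)
              [40, 10, 15]])

P = N / N.sum()                               # correspondence matrix
r, c = P.sum(axis=1), P.sum(axis=0)           # row and column masses
S = np.diag(r ** -0.5) @ (P - np.outer(r, c)) @ np.diag(c ** -0.5)  # standardized residuals
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

row_coords = np.diag(r ** -0.5) @ U * sv      # principal coordinates of rows
col_coords = np.diag(c ** -0.5) @ Vt.T * sv   # principal coordinates of columns
inertia = sv ** 2
print("share of inertia per dimension:", inertia / inertia.sum())
print("row coordinates (first 2 dims):\n", row_coords[:, :2])
print("column coordinates (first 2 dims):\n", col_coords[:, :2])
```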

Details

Learning and Teaching in Higher Education: Gulf Perspectives, vol. 2 no. 1
Type: Research Article
ISSN: 2077-5504

Open Access
Article
Publication date: 29 June 2022

Ibtissam Touahri

This paper proposes a multi-facet sentiment analysis system.

Abstract

Purpose

This paper proposes a multi-facet sentiment analysis system.

Design/methodology/approach

This paper uses multidomain resources to build a sentiment analysis system. The manual lexicon-based features extracted from these resources are fed into a machine learning classifier so that their performance can be compared afterward. The manual lexicon is then replaced with a custom bag of words (BOW) to avoid its time-consuming construction. To help the system run faster and make the model interpretable, feature reduction is performed by employing different existing and custom approaches such as term occurrence, information gain, principal component analysis, semantic clustering and POS tagging filters.
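The sketch below illustrates the general BOW-plus-feature-reduction idea with scikit-learn stand-ins (a chi-squared filter in place of term occurrence/information gain and a truncated SVD in place of PCA on the sparse matrix); the four toy documents are placeholders, not the paper's multidomain resources.

```python
# Sketch of a bag-of-words sentiment pipeline with feature filtering and
# dimensionality reduction before a linear classifier.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["great product, loved it", "terrible service, never again",
         "excellent quality", "worst experience ever"]     # placeholder documents
labels = [1, 0, 1, 0]                                       # 1 = positive, 0 = negative

pipe = make_pipeline(
    CountVectorizer(),                  # custom BOW built from the corpus
    SelectKBest(chi2, k=5),             # keep the most class-discriminative terms
    TruncatedSVD(n_components=2),       # dense low-dimensional representation
    LinearSVC(),
)
pipe.fit(texts, labels)
print(pipe.predict(["awful product", "really great"]))
```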

Findings

The proposed system, featuring automated lexicon extraction and feature-set size optimization, proved its efficiency when applied to multidomain and benchmark datasets, reaching 93.59% accuracy, which makes it competitive with state-of-the-art systems.

Originality/value

The construction of a custom BOW and the optimization of features based on existing and custom feature selection and clustering approaches.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Open Access
Article
Publication date: 28 January 2019

Bothaina A. Al-Sheeb, A.M. Hamouda and Galal M. Abdella

The retention and success of engineering undergraduates are of increasing concern for higher-education institutions. The study of success determinants is an initial step in any…

Abstract

Purpose

The retention and success of engineering undergraduates are of increasing concern for higher-education institutions. The study of success determinants is an initial step in any remedial initiative targeted at enhancing student success and preventing premature withdrawals. This study provides a comprehensive approach to the prediction of student academic performance through the lens of the knowledge, attitudes and behavioral skills (KAB) model. The purpose of this paper is to improve the modeling accuracy of students’ performance by introducing two methodologies based on variable selection and dimensionality reduction.

Design/methodology/approach

The performance of the proposed methodologies was evaluated using a real data set of ten critical-to-success factors on both attitude and skill-related behaviors of 320 first-year students. The study used two models. In the first model, exploratory factor analysis is used. The second model uses regression model selection. Ridge regression is used as a second step in each model. The efficiency of each model is discussed in the Results section of this paper.
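The two modelling routes can be sketched with scikit-learn stand-ins as below: exploratory factor analysis followed by ridge regression, and a univariate variable-selection step (used here in place of the paper's regression model selection) followed by ridge. The survey file, its column names and the GPA outcome are assumptions.

```python
# Sketch of the two models: (1) factor analysis + ridge, (2) variable selection + ridge,
# each evaluated by cross-validated mean squared error.
import pandas as pd
from sklearn.decomposition import FactorAnalysis
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("first_year_survey.csv")          # assumed: 320 students x 10 KAB factors + GPA
X, y = df.drop(columns="GPA"), df["GPA"]

model_fa = make_pipeline(StandardScaler(), FactorAnalysis(n_components=3), Ridge(alpha=1.0))
model_sel = make_pipeline(StandardScaler(), SelectKBest(f_regression, k=5), Ridge(alpha=1.0))

for name, model in [("FA + ridge", model_fa), ("selection + ridge", model_sel)]:
    mse = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    print(f"{name}: mean CV MSE = {mse:.3f}")
```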

Findings

The two methods were powerful in providing small mean-squared errors and hence, in improving the prediction of student performance. The results show that the quality of both methods is sensitive to the size of the reduced model and to the magnitude of the penalization parameter.

Research limitations/implications

First, the survey could have been conducted in two parts; students needed more time than expected to complete it. Second, if the study is to be carried out for second-year students, grades of general engineering courses can be included in the model for better estimation of students’ grade point averages. Third, the study only applies to first-year and second-year students because factors covered are those that are essential for students’ survival through the first few years of study.

Practical implications

The study proposes that vulnerable students be identified as early as possible in the academic year. These students could be encouraged to engage more in their learning process. Carrying out such measurements at the beginning of the college year can provide professional and college administration with valuable insight into students’ perception of their own skills and attitudes toward engineering.

Originality/value

This study employs the KAB model as a comprehensive approach to the study of success predictors and implements two new methodologies to improve the prediction accuracy of student success.

Details

Journal of Applied Research in Higher Education, vol. 11 no. 2
Type: Research Article
ISSN: 2050-7003

Open Access
Article
Publication date: 5 March 2021

Xuan Ji, Jiachen Wang and Zhijun Yan

Stock price prediction is a hot topic and traditional prediction methods are usually based on statistical and econometric models. However, these models have difficulty dealing with…

Abstract

Purpose

Stock price prediction is a hot topic, and traditional prediction methods are usually based on statistical and econometric models. However, these models have difficulty dealing with nonstationary time series data. With the rapid development of the internet and the increasing popularity of social media, online news and comments often reflect investors’ emotions and attitudes toward stocks, and they contain a lot of important information for predicting stock prices. This paper aims to develop a stock price prediction method that takes full advantage of social media data.

Design/methodology/approach

This study proposes a new prediction method based on deep learning technology, which integrates traditional stock financial index variables and social media text features as inputs of the prediction model. This study uses Doc2Vec to build long text feature vectors from social media and then reduces the dimensions of the text feature vectors with a stacked auto-encoder to balance the dimensions between the text feature variables and the stock financial index variables. Meanwhile, based on the wavelet transform, the time series data of the stock price is decomposed to eliminate the random noise caused by stock market fluctuation. Finally, this study uses a long short-term memory (LSTM) model to predict the stock price.
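Two of the numeric steps, wavelet denoising of the price series and an LSTM on sliding windows, are sketched below on a synthetic series; the Doc2Vec text features and the stacked auto-encoder are omitted, so this is only an illustration of the techniques, not the authors' model.

```python
# Sketch: wavelet-denoise a price series, then fit an LSTM on sliding windows
# to predict the next value.
import numpy as np
import pywt
from tensorflow import keras

rng = np.random.default_rng(0)
price = np.cumsum(rng.normal(size=500)) + 100          # stand-in daily closing prices

# Wavelet denoising: soft-threshold the detail coefficients and reconstruct.
coeffs = pywt.wavedec(price, "db4", level=3)
coeffs[1:] = [pywt.threshold(c, value=np.std(c), mode="soft") for c in coeffs[1:]]
smooth = pywt.waverec(coeffs, "db4")[:len(price)]

# Build supervised windows: 20 past values -> next value.
win = 20
X = np.array([smooth[i:i + win] for i in range(len(smooth) - win)])[..., None]
y = smooth[win:]

model = keras.Sequential([
    keras.layers.Input(shape=(win, 1)),
    keras.layers.LSTM(32),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
print("last-window next-step prediction:", float(model.predict(X[-1:], verbose=0)[0, 0]))
```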

Findings

The experiment results show that the method performs better than all three benchmark models on all evaluation indicators and can effectively predict stock prices.

Originality/value

This paper proposes a new stock price prediction model based on deep learning technology that incorporates traditional financial features and text features derived from social media.

Details

International Journal of Crowd Science, vol. 5 no. 1
Type: Research Article
ISSN: 2398-7294

1 – 10 of 185