Search results
Abstract
Purpose – This study aims to introduce an application of web‐based data mining that integrates online data collection and data mining into selling strategies for online auctions. It illustrates the process of spider-based online data collection from eBay and the application of the classification and regression tree (CART) to constructing effective selling strategies. Design/methodology/approach – After a prototype of web‐based data mining is developed, the four steps of spider online data collection and CART data mining are shown. A business dataset is collected from eBay and used to derive effective selling strategies for online auctions. Findings – In the web‐based data‐mining application, the spiders can effectively and efficiently collect online auction data from the internet, and the CART model provides sellers with effective selling strategies. By combining expected auction prices with classification and regression trees, sellers can integrate their two primary goals, i.e. auction success and anticipated prices, into their selling strategies for online auctions. Practical implications – This study provides sellers with a useful tool for constructing effective selling strategies by taking advantage of web‐based data mining. These strategies will help improve their online auction performance. Originality/value – This study contributes to the literature by providing an innovative tool for collecting online data and constructing effective selling strategies, which are important for the growth of electronic marketplaces.
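For readers unfamiliar with CART, the sketch below illustrates the kind of model this abstract describes: a regression tree fitted to toy auction records to estimate an expected closing price. All feature names and data here are invented for illustration; the paper's actual eBay fields and dataset are not reproduced.

```python
# Minimal CART sketch for estimating an expected auction price.
# Features and data are hypothetical stand-ins, not the paper's eBay fields.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 200
# Hypothetical listing features: starting price, auction length (days),
# seller feedback score, and whether a "Buy It Now" option was offered.
start_price = rng.uniform(1, 50, n)
duration = rng.integers(1, 10, n)
feedback = rng.integers(0, 1000, n)
buy_it_now = rng.integers(0, 2, n)
X = np.column_stack([start_price, duration, feedback, buy_it_now])
# Toy closing price: driven mostly by start price and seller feedback.
y = start_price * 1.5 + 0.01 * feedback + rng.normal(0, 2, n)

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
expected_price = tree.predict([[10.0, 7, 500, 1]])[0]
print(round(float(expected_price), 2))
```

A seller could read the fitted tree's splits directly as if-then selling rules, which is the practical appeal of CART over black-box predictors.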
Richard G. Mathieu and Alan E. Turovlin
Abstract
Purpose
Cyber risk has increased significantly over the past twenty years. In many organizations, data and operations are managed through a complex technology stack underpinned by an Enterprise Resource Planning (ERP) system such as SAP (Systemanalyse Programmentwicklung). The ERP environment alone can be overwhelming for a typical ERP Manager; the cybersecurity issues that arise on top of it create periods of intense time pressure, stress and workload, increasing risk to the organization. This paper aims to identify a pragmatic approach by which the ERP Manager can prioritize vulnerabilities.
Design/methodology/approach
Applying attention-based theory, a pragmatic approach is developed to prioritize an organization’s response to the National Institute of Standards and Technology (NIST) National Vulnerability Database (NVD) vulnerabilities using a Classification and Regression Tree (CART).
Findings
Applying a classification and regression tree (CART) to the National Institute of Standards and Technology's National Vulnerability Database yields a prioritization of vulnerabilities that NIST's own categorization does not provide.
Practical implications
The ERP Manager sits at the intersection of technology, functionality, centralized control and organizational data. Without CART, vulnerability handling remains reactive, leaving the role exposed to overwhelming situations of intense time pressure, stress and workload.
Originality/value
To the best of the authors' knowledge, this work is original and has not been published elsewhere, nor is it currently under consideration for publication elsewhere. CART has not previously been applied to prioritizing cybersecurity vulnerabilities.
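The approach this abstract describes can be sketched as a CART classifier over vulnerability attributes. The feature encoding and the "patch first" labeling rule below are illustrative assumptions, not the paper's actual NVD preprocessing; the point is only that the learned splits are human-readable rules a manager can act on.

```python
# Sketch: a CART classifier over CVSS-style vulnerability features that
# derives a priority label. Features and labels are toy assumptions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
n = 300
# Hypothetical features: base severity score, attack complexity (0=low,
# 1=high), privileges required (0/1), exploit code maturity (0-2).
base_score = rng.uniform(0, 10, n)
complexity = rng.integers(0, 2, n)
privileges = rng.integers(0, 2, n)
maturity = rng.integers(0, 3, n)
X = np.column_stack([base_score, complexity, privileges, maturity])
# Toy labeling rule for "patch first": high severity and low complexity.
y = ((base_score > 7) & (complexity == 0)).astype(int)

cart = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
# The fitted tree is a readable prioritization policy, not a black box.
print(export_text(cart, feature_names=["score", "complexity", "priv", "maturity"]))
```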
Abstract
Purpose
The purpose of this paper is to analyze empirically major factors that affect housing prices in Istanbul, Turkey using the classification and regression tree (CART) approach.
Design/methodology/approach
The data set was collected from the web pages of various real estate agencies during June 2007. The CART approach was then applied to derive the main results and to draw implications for the housing market in Istanbul, Turkey.
Findings
The CART results indicate that size, elevators, and the existence of security, central heating and a view are the most important variables affecting housing prices in Istanbul. The average price of houses in Istanbul was found to be 373,372.36 New Turkish Liras. The average size of a house was 138.37 m2, the average age of houses 15.07 years, and the average number of rooms 3.11. The average number of baths is 1.43 and the average number of toilets is 1.22. Only 5 percent of homes have storage space, 45 percent of homes have parking space, and 64 percent of homes are heated with a furnace, whereas only 29 percent use a central heating system. Among the 31 variables employed in this study, size, elevator, security, central heating and view were concluded to be the most important factors affecting prices in the Istanbul housing market.
Practical implications
Future research on the housing markets of Istanbul and Turkey can build on the method and findings of this study to develop more general models, covering more houses across a wider range of regions, for analyzing the determinants of housing prices in Turkey.
Originality/value
Examining housing prices using the CART model is relatively new in the field of housing economics. Additionally, this study is the first to use the CART model to analyze the housing market in Istanbul and in Turkey and to derive housing policies of value to the authorities.
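The kind of analysis this abstract reports can be sketched with a regression tree whose feature importances rank the price drivers. The data below are simulated and the variables are a hypothetical subset inspired by the study's list; only the technique of reading variable importance off a CART is illustrated.

```python
# Toy regression-tree sketch: which listing attributes drive price?
# Simulated data; not the Istanbul dataset.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
n = 500
size_m2 = rng.uniform(60, 250, n)   # house size in square meters
elevator = rng.integers(0, 2, n)    # building has an elevator
security = rng.integers(0, 2, n)    # on-site security
age = rng.integers(0, 40, n)        # building age (pure noise here)
X = np.column_stack([size_m2, elevator, security, age])
# Simulated price: size dominates, amenities add fixed premiums.
y = 2000 * size_m2 + 30000 * elevator + 20000 * security + rng.normal(0, 5000, n)

tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
importance = dict(zip(["size", "elevator", "security", "age"],
                      tree.feature_importances_))
print(max(importance, key=importance.get))
```

Because the simulated price is driven mainly by size, the tree's importance ranking recovers it as the top variable, mirroring how the study identifies its dominant factors.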
Kedong Yin, Yun Cao, Shiwei Zhou and Xinman Lv
Abstract
Purpose
The purposes of this research are to study the theory and method of multi-attribute index system design and establish a set of systematic, standardized, scientific index systems for the design optimization and inspection process. The research may form the basis for a rational, comprehensive evaluation and provide the most effective way of improving the quality of management decision-making. It is of practical significance to improve the rationality and reliability of the index system and provide standardized, scientific reference standards and theoretical guidance for the design and construction of the index system.
Design/methodology/approach
Using modern methods such as complex networks and machine learning, a system for the quality diagnosis of index data and the classification and stratification of index systems is designed. This guarantees the quality of the index data, realizes the scientific classification and stratification of the index system, reduces the subjectivity and randomness of the design of the index system, enhances its objectivity and rationality and lays a solid foundation for the optimal design of the index system.
Findings
Based on ideas from statistics, system theory, machine learning and data mining, the present research focuses on "data quality diagnosis" and "index classification and stratification," clarifying the classification standards and data quality characteristics of index data. A data-quality diagnosis system of "data review – data cleaning – data conversion – data inspection" is established. Using decision trees, interpretive structural models, cluster analysis, k-means clustering and other methods, a classification and stratification scheme for indicators is designed to reduce the redundancy of indicator data and improve the quality of the data used. Finally, a scientific, standardized classification and stratification design of the index system can be realized.
Originality/value
The innovative contributions and research value of the paper are reflected in three aspects. First, a method system for index data quality diagnosis is designed, and multi-source data fusion technology is adopted to ensure the quality of multi-source, heterogeneous and mixed-frequency data of the index system. The second is to design a systematic quality-inspection process for missing data based on the systematic thinking of the whole and the individual. Aiming at the accuracy, reliability, and feasibility of the patched data, a quality-inspection method of patched data based on inversion thought and a unified representation method of data fusion based on a tensor model are proposed. The third is to use the modern method of unsupervised learning to classify and stratify the index system, which reduces the subjectivity and randomness of the design of the index system and enhances its objectivity and rationality.
Abdel Latef M. Anouze and Imad Bou-Hamad
Abstract
Purpose
This paper aims to assess the application of seven statistical and data mining techniques to second-stage data envelopment analysis (DEA) for bank performance.
Design/methodology/approach
Different statistical and data mining techniques are applied to second-stage DEA for bank performance as part of an attempt to produce a powerful model of bank performance with effective predictive ability. The data mining tools considered are classification and regression trees (CART), conditional inference trees (CIT), random forests based on CART and CIT, bagging, artificial neural networks and their statistical counterpart, logistic regression.
Findings
The results showed that random forests and bagging outperform other methods in terms of predictive power.
Originality/value
This is the first study to assess the impact of environmental factors on banking performance in Middle East and North Africa countries.
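The model comparison this abstract reports can be sketched as follows: a single CART, a bagged ensemble and a random forest are fitted to the same task and scored on held-out data. The data here are synthetic, not the DEA efficiency scores used in the paper; the sketch only shows the comparison pattern under which ensembles tend to outperform a single tree.

```python
# Sketch: single tree vs. bagging vs. random forest on one toy task.
# Synthetic data stand in for the paper's second-stage DEA inputs.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)

models = {
    "cart": DecisionTreeClassifier(random_state=0),
    "bagging": BaggingClassifier(n_estimators=50, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
# Fit each model and record its held-out accuracy.
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
print(scores)
```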
Abstract
Purpose
The purpose of this paper is to discuss and assess the structural characteristics (conceptual utility) of the most popular classification and predictive techniques employed in customer relationship management and customer scoring and to evaluate their classification and predictive precision.
Design/methodology/approach
A sample of customers' credit rating and socio‐demographic profiles are employed to evaluate the analytic and classification properties of discriminant analysis, binary logistic regression, artificial neural networks, C5 algorithm, and regression trees employing Chi‐squared Automatic Interaction Detector (CHAID).
Findings
With regard to interpretability and the conceptual utility of the parameters generated by the five techniques, logistic regression provides easily interpretable parameters through its logits, which can be read in the same way as regression slopes. In addition, the logits can be converted to odds, providing a common-sense evaluation of the relative importance of each independent variable, and the technique provides robust statistical tests for evaluating the model parameters. For their part, both CHAID and the C5 algorithm provide visual tools (regression trees) and semantic rules (rule sets for classification) that facilitate the interpretation of the model parameters. These can be highly desirable properties when the researcher attempts to explain the conceptual and operational foundations of the model.
Originality/value
Most treatments of complex classification procedures have been undertaken idiosyncratically, that is, evaluating only one technique. This paper evaluates and compares the conceptual utility and predictive precision of five different classification techniques on a moderate sample size and provides clear guidelines in technique selection when undertaking customer scoring and classification.
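The logit-to-odds reading described in the findings can be shown in a few lines: exponentiating a fitted logistic-regression coefficient gives the multiplicative change in the odds for a one-unit increase in that predictor. The data below are simulated (a standardized predictor with a true logit slope of 1, plus a pure-noise variable), not the credit-rating sample from the paper.

```python
# Sketch: converting logistic-regression logits to odds ratios.
# Simulated data; true odds ratio for the informative predictor is e ~ 2.72.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 1000
income = rng.normal(0, 1, n)   # hypothetical standardized predictor
noise = rng.normal(0, 1, n)    # uninformative predictor
# True model: logit = 1.0 * income + 0.0 * noise.
p = 1 / (1 + np.exp(-income))
y = (rng.uniform(size=n) < p).astype(int)

model = LogisticRegression().fit(np.column_stack([income, noise]), y)
odds_ratios = np.exp(model.coef_[0])  # e^logit-slope per predictor
print(odds_ratios)
```

The first odds ratio lands near e (each one-standard-deviation increase in the predictor roughly triples the odds), while the noise variable's ratio stays near 1, which is the common-sense comparison of variable importance the abstract refers to.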
Meltem Aksoy, Seda Yanık and Mehmet Fatih Amasyali
Abstract
Purpose
When a large number of project proposals are evaluated to allocate available funds, grouping them based on their similarities is beneficial. Current approaches to group proposals are primarily based on manual matching of similar topics, discipline areas and keywords declared by project applicants. When the number of proposals increases, this task becomes complex and requires excessive time. This paper aims to demonstrate how to effectively use the rich information in the titles and abstracts of Turkish project proposals to group them automatically.
Design/methodology/approach
This study proposes a model that effectively groups Turkish project proposals by combining word embedding, clustering and classification techniques. The proposed model uses FastText, BERT and term frequency/inverse document frequency (TF/IDF) word-embedding techniques to extract terms from the titles and abstracts of project proposals in Turkish. The extracted terms were grouped using both the clustering and classification techniques. Natural groups contained within the corpus were discovered using k-means, k-means++, k-medoids and agglomerative clustering algorithms. Additionally, this study employs classification approaches to predict the target class for each document in the corpus. To classify project proposals, various classifiers, including k-nearest neighbors (KNN), support vector machines (SVM), artificial neural networks (ANN), classification and regression trees (CART) and random forest (RF), are used. Empirical experiments were conducted to validate the effectiveness of the proposed method by using real data from the Istanbul Development Agency.
Findings
The results show that the generated word embeddings can effectively represent proposal texts as vectors, and can be used as inputs for clustering or classification algorithms. Using clustering algorithms, the document corpus is divided into five groups. In addition, the results demonstrate that the proposals can easily be categorized into predefined categories using classification algorithms. SVM-Linear achieved the highest prediction accuracy (89.2%) with the FastText word embedding method. A comparison of manual grouping with automatic classification and clustering results revealed that both classification and clustering techniques have a high success rate.
Research limitations/implications
The proposed model automatically benefits from the rich information in project proposals and significantly reduces numerous time-consuming tasks that managers must perform manually. Thus, it eliminates the drawbacks of the current manual methods and yields significantly more accurate results. In the future, additional experiments should be conducted to validate the proposed method using data from other funding organizations.
Originality/value
This study presents the application of word embedding methods to effectively use the rich information in the titles and abstracts of Turkish project proposals. Existing research studies focus on the automatic grouping of proposals; traditional frequency-based word embedding methods are used for feature extraction methods to represent project proposals. Unlike previous research, this study employs two outperforming neural network-based textual feature extraction techniques to obtain terms representing the proposals: BERT as a contextual word embedding method and FastText as a static word embedding method. Moreover, to the best of our knowledge, there has been no research conducted on the grouping of project proposals in Turkish.
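The two-track pipeline this abstract describes (unsupervised clustering to discover groups, supervised classification to hit predefined categories) can be sketched on a toy corpus. The documents below are English stand-ins, not the Turkish proposal data, and TF-IDF stands in for the FastText/BERT embeddings the study uses.

```python
# Sketch of the proposal-grouping pipeline: vectorize short texts, cluster
# them, and train a linear SVM on labeled examples. Toy corpus only.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

docs = [
    "road construction and bridge repair funding",
    "highway maintenance and road safety project",
    "machine learning for medical image diagnosis",
    "deep learning models for hospital records",
]
labels = ["infrastructure", "infrastructure", "health-ai", "health-ai"]

X = TfidfVectorizer().fit_transform(docs)  # term-based document vectors

# Unsupervised track: discover natural groups in the corpus.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Supervised track: predict predefined categories for each document.
clf = LinearSVC().fit(X, labels)
print(clusters, clf.predict(X))
```

Swapping the TF-IDF matrix for dense FastText or BERT sentence vectors leaves the rest of the pipeline unchanged, which is the modularity the proposed model relies on.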
Abstract
Purpose
Financial statement fraud (FSF) committed by a company implies that its current status may not be healthy. It is therefore important to detect FSF, since such companies tend to conceal bad information, causing great losses to various stakeholders. The objective of the paper is thus to propose a novel approach to building a classification model that identifies FSF, shows high classification performance, and yields human-readable rules explaining why a company is likely to commit FSF.
Design/methodology/approach
Having prepared multiple sub-datasets to cope with the class imbalance problem, we build a set of decision trees for each sub-dataset; form a model for the sub-dataset by removing every tree whose performance falls below the average accuracy of all trees in the set; and then select the model that shows the best accuracy among these models. We call the resulting model MRF (Modified Random Forest). Given a new instance, we extract rules from the MRF model to explain whether the company corresponding to the new instance is likely to commit FSF or not.
Findings
Experimental results show that the MRF classifier outperformed the benchmark models. The results also revealed that all the variables related to profit belong to the set of indicators most important to FSF, and identified two new variables related to gross profit that had gone unexamined in previous studies on FSF.
Originality/value
This study proposed a method of building a classification model that shows outstanding performance and provides decision rules that can be used to explain the classification results. In addition, a new way to resolve the class imbalance problem was suggested in this paper.
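The tree-pruning step at the heart of the MRF idea (drop every tree whose individual accuracy falls below the set's average, then vote with the survivors) can be sketched as follows. Sub-dataset construction and rule extraction are simplified away, and the data are synthetic; this is only the pruning-and-voting skeleton, not the paper's full method.

```python
# Sketch of the MRF pruning idea: train trees on bootstrap samples, keep
# only trees at or above the average accuracy, then majority-vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
rng = np.random.default_rng(0)

trees, accs = [], []
for i in range(25):
    idx = rng.integers(0, len(X), len(X))  # bootstrap sample with replacement
    t = DecisionTreeClassifier(max_depth=4, random_state=i).fit(X[idx], y[idx])
    trees.append(t)
    accs.append(t.score(X, y))             # each tree's individual accuracy

avg = np.mean(accs)
kept = [t for t, a in zip(trees, accs) if a >= avg]  # prune weak trees

# Majority vote over the retained trees.
votes = np.mean([t.predict(X) for t in kept], axis=0)
pred = (votes >= 0.5).astype(int)
print(len(kept), (pred == y).mean())
```

Because each surviving tree is shallow, its root-to-leaf paths can be read off as the human-readable rules the abstract emphasizes.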
Ina S. Markham, Richard G. Mathieu and Barry A. Wray
Abstract
Determining the number of circulating kanban cards is important for operating a just‐in‐time kanban production system effectively. While a number of techniques exist for setting the number of kanbans, artificial neural networks (ANNs) and classification and regression trees (CARTs) represent two practical approaches with special capabilities for operationalizing the kanban-setting problem. This paper compares ANNs with CART for setting the number of kanbans in a dynamically varying production environment. Our results show that the two methods are comparable in accuracy and response speed, but that CARTs have advantages in explainability and development speed. The paper concludes with a discussion of the implications of using these techniques in an operational setting.