Search results
Abstract
Purpose – This study aims to introduce an application of web‐based data mining that integrates online data collection and data mining into selling strategies for online auctions. It illustrates the process of spider-based online data collection from eBay and the application of the classification and regression tree (CART) to constructing effective selling strategies. Design/methodology/approach – After a prototype of web‐based data mining is developed, the four steps of spider online data collection and CART data mining are shown. A business dataset is collected from eBay and used to derive effective selling strategies for online auctions. Findings – In the web‐based data‐mining application, the spiders can effectively and efficiently collect online auction data from the internet, and the CART model provides sellers with effective selling strategies. By combining expected auction prices with classification and regression trees, sellers can integrate their two primary goals, i.e. auction success and anticipated prices, into their selling strategies for online auctions. Practical implications – This study provides sellers with a useful tool for constructing effective selling strategies by taking advantage of web‐based data mining. These strategies will help improve their online auction performance. Originality/value – This study contributes to the literature by providing an innovative tool for collecting online data and constructing effective selling strategies, which are important for the growth of electronic marketplaces.
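For readers unfamiliar with CART, the sketch below illustrates the kind of model this abstract describes: a regression tree fitted to toy auction records to estimate an expected closing price. All feature names and data here are invented for illustration; the paper's actual eBay fields and dataset are not reproduced.

```python
# Minimal CART sketch for estimating an expected auction price.
# Features and data are hypothetical stand-ins, not the paper's eBay fields.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 200
# Hypothetical listing features: starting price, auction length (days),
# seller feedback score, and whether a "Buy It Now" option was offered.
start_price = rng.uniform(1, 50, n)
duration = rng.integers(1, 10, n)
feedback = rng.integers(0, 1000, n)
buy_it_now = rng.integers(0, 2, n)
X = np.column_stack([start_price, duration, feedback, buy_it_now])
# Toy closing price: driven mostly by start price and seller feedback.
y = start_price * 1.5 + 0.01 * feedback + rng.normal(0, 2, n)

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
expected_price = tree.predict([[10.0, 7, 500, 1]])[0]
print(round(float(expected_price), 2))
```

A seller could read the fitted tree's splits directly as if-then selling rules, which is the practical appeal of CART over black-box predictors.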
Richard G. Mathieu and Alan E. Turovlin
Abstract
Purpose
Cyber risk has increased significantly over the past twenty years. In many organizations, data and operations are managed through a complex technology stack underpinned by an Enterprise Resource Planning (ERP) system such as SAP (Systemanalyse Programmentwicklung). The ERP environment alone can be overwhelming for a typical ERP Manager; the cybersecurity issues that arise on top of it create periods of intense time pressure, stress and workload, increasing risk to the organization. This paper aims to identify a pragmatic approach by which the ERP Manager can prioritize vulnerabilities.
Design/methodology/approach
Applying attention-based theory, a pragmatic approach is developed to prioritize an organization’s response to the National Institute of Standards and Technology (NIST) National Vulnerability Database (NVD) vulnerabilities using a Classification and Regression Tree (CART).
Findings
Applying a classification and regression tree (CART) to the National Institute of Standards and Technology's National Vulnerability Database yields a prioritization of vulnerabilities that NIST's own categorization does not provide.
Practical implications
The ERP Manager sits at the intersection of technology, functionality, centralized control and organizational data. Without CART, vulnerability handling remains reactive, leaving the role exposed to overwhelming situations of intense time pressure, stress and workload.
Originality/value
To the best of the authors' knowledge, this work is original and has not been published elsewhere, nor is it currently under consideration for publication elsewhere. CART has not previously been applied to prioritizing cybersecurity vulnerabilities.
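The approach this abstract describes can be sketched as a CART classifier over vulnerability attributes. The feature encoding and the "patch first" labeling rule below are illustrative assumptions, not the paper's actual NVD preprocessing; the point is only that the learned splits are human-readable rules a manager can act on.

```python
# Sketch: a CART classifier over CVSS-style vulnerability features that
# derives a priority label. Features and labels are toy assumptions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
n = 300
# Hypothetical features: base severity score, attack complexity (0=low,
# 1=high), privileges required (0/1), exploit code maturity (0-2).
base_score = rng.uniform(0, 10, n)
complexity = rng.integers(0, 2, n)
privileges = rng.integers(0, 2, n)
maturity = rng.integers(0, 3, n)
X = np.column_stack([base_score, complexity, privileges, maturity])
# Toy labeling rule for "patch first": high severity and low complexity.
y = ((base_score > 7) & (complexity == 0)).astype(int)

cart = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
# The fitted tree is a readable prioritization policy, not a black box.
print(export_text(cart, feature_names=["score", "complexity", "priv", "maturity"]))
```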
Abstract
Purpose
The purpose of this paper is to analyze empirically major factors that affect housing prices in Istanbul, Turkey using the classification and regression tree (CART) approach.
Design/methodology/approach
The data set was collected from the web pages of various real estate agencies during June 2007. The CART approach was then applied to derive the main results and to draw implications for the housing market in Istanbul, Turkey.
Findings
The CART results indicate that size, elevators, and the existence of security, central heating and a view are the most important variables affecting housing prices in Istanbul. The average price of houses in Istanbul was found to be 373,372.36 New Turkish Liras. The average size of a house was 138.37 m2, the average age of houses 15.07 years, and the average number of rooms 3.11. The average number of baths is 1.43 and the average number of toilets is 1.22. Only 5 percent of homes have storage space, 45 percent of homes have parking space, and 64 percent of homes are heated with a furnace, whereas only 29 percent use a central heating system. Among the 31 variables employed in this study, size, elevator, security, central heating and view were concluded to be the most important factors affecting prices in the Istanbul housing market.
Practical implications
Future research on the housing markets of Istanbul and Turkey can build on the method and findings of this study to develop more general models, covering more houses across a wider range of regions, for analyzing the determinants of housing prices in Turkey.
Originality/value
Examining housing prices using the CART model is relatively new in the field of housing economics. Additionally, this study is the first to use the CART model to analyze the housing market in Istanbul and in Turkey and to derive housing policies of value to the authorities.
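The kind of analysis this abstract reports can be sketched with a regression tree whose feature importances rank the price drivers. The data below are simulated and the variables are a hypothetical subset inspired by the study's list; only the technique of reading variable importance off a CART is illustrated.

```python
# Toy regression-tree sketch: which listing attributes drive price?
# Simulated data; not the Istanbul dataset.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
n = 500
size_m2 = rng.uniform(60, 250, n)   # house size in square meters
elevator = rng.integers(0, 2, n)    # building has an elevator
security = rng.integers(0, 2, n)    # on-site security
age = rng.integers(0, 40, n)        # building age (pure noise here)
X = np.column_stack([size_m2, elevator, security, age])
# Simulated price: size dominates, amenities add fixed premiums.
y = 2000 * size_m2 + 30000 * elevator + 20000 * security + rng.normal(0, 5000, n)

tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
importance = dict(zip(["size", "elevator", "security", "age"],
                      tree.feature_importances_))
print(max(importance, key=importance.get))
```

Because the simulated price is driven mainly by size, the tree's importance ranking recovers it as the top variable, mirroring how the study identifies its dominant factors.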
Kedong Yin, Yun Cao, Shiwei Zhou and Xinman Lv
Abstract
Purpose
The purposes of this research are to study the theory and method of multi-attribute index system design and establish a set of systematic, standardized, scientific index systems for the design optimization and inspection process. The research may form the basis for a rational, comprehensive evaluation and provide the most effective way of improving the quality of management decision-making. It is of practical significance to improve the rationality and reliability of the index system and provide standardized, scientific reference standards and theoretical guidance for the design and construction of the index system.
Design/methodology/approach
Using modern methods such as complex networks and machine learning, a system for the quality diagnosis of index data and the classification and stratification of index systems is designed. This guarantees the quality of the index data, realizes the scientific classification and stratification of the index system, reduces the subjectivity and randomness of the design of the index system, enhances its objectivity and rationality and lays a solid foundation for the optimal design of the index system.
Findings
Based on ideas from statistics, system theory, machine learning and data mining, the present research focuses on "data quality diagnosis" and "index classification and stratification," clarifying the classification standards and data quality characteristics of index data. A data-quality diagnosis system of "data review – data cleaning – data conversion – data inspection" is established. Using decision trees, interpretive structural models, cluster analysis, k-means clustering and other methods, a classification and stratification scheme for indicators is designed to reduce the redundancy of indicator data and improve the quality of the data used. Finally, a scientific, standardized classification and stratification design of the index system can be realized.
Originality/value
The innovative contributions and research value of the paper are reflected in three aspects. First, a method system for index data quality diagnosis is designed, and multi-source data fusion technology is adopted to ensure the quality of multi-source, heterogeneous and mixed-frequency data of the index system. The second is to design a systematic quality-inspection process for missing data based on the systematic thinking of the whole and the individual. Aiming at the accuracy, reliability, and feasibility of the patched data, a quality-inspection method of patched data based on inversion thought and a unified representation method of data fusion based on a tensor model are proposed. The third is to use the modern method of unsupervised learning to classify and stratify the index system, which reduces the subjectivity and randomness of the design of the index system and enhances its objectivity and rationality.
Abdel Latef M. Anouze and Imad Bou-Hamad
Abstract
Purpose
This paper aims to assess the application of seven statistical and data mining techniques to second-stage data envelopment analysis (DEA) for bank performance.
Design/methodology/approach
Different statistical and data mining techniques are applied to second-stage DEA for bank performance as part of an attempt to produce a powerful model of bank performance with effective predictive ability. The data mining tools considered are classification and regression trees (CART), conditional inference trees (CIT), random forests based on CART and CIT, bagging, artificial neural networks and their statistical counterpart, logistic regression.
Findings
The results showed that random forests and bagging outperform other methods in terms of predictive power.
Originality/value
This is the first study to assess the impact of environmental factors on banking performance in Middle East and North Africa countries.
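The model comparison this abstract reports can be sketched as follows: a single CART, a bagged ensemble and a random forest are fitted to the same task and scored on held-out data. The data here are synthetic, not the DEA efficiency scores used in the paper; the sketch only shows the comparison pattern under which ensembles tend to outperform a single tree.

```python
# Sketch: single tree vs. bagging vs. random forest on one toy task.
# Synthetic data stand in for the paper's second-stage DEA inputs.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)

models = {
    "cart": DecisionTreeClassifier(random_state=0),
    "bagging": BaggingClassifier(n_estimators=50, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
# Fit each model and record its held-out accuracy.
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
print(scores)
```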
Abstract
Purpose
The purpose of this paper is to discuss and assess the structural characteristics (conceptual utility) of the most popular classification and predictive techniques employed in customer relationship management and customer scoring and to evaluate their classification and predictive precision.
Design/methodology/approach
A sample of customers' credit rating and socio‐demographic profiles are employed to evaluate the analytic and classification properties of discriminant analysis, binary logistic regression, artificial neural networks, C5 algorithm, and regression trees employing Chi‐squared Automatic Interaction Detector (CHAID).
Findings
With regard to interpretability and the conceptual utility of the parameters generated by the five techniques, logistic regression provides easily interpretable parameters through its logits, which can be read in the same way as regression slopes. In addition, the logits can be converted to odds, providing a common-sense evaluation of the relative importance of each independent variable, and the technique provides robust statistical tests for evaluating the model parameters. For their part, both CHAID and the C5 algorithm provide visual tools (regression trees) and semantic rules (rule sets for classification) that facilitate the interpretation of the model parameters. These can be highly desirable properties when the researcher attempts to explain the conceptual and operational foundations of the model.
Originality/value
Most treatments of complex classification procedures have been undertaken idiosyncratically, that is, evaluating only one technique. This paper evaluates and compares the conceptual utility and predictive precision of five different classification techniques on a moderate sample size and provides clear guidelines in technique selection when undertaking customer scoring and classification.
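The logit-to-odds reading described in the findings can be shown in a few lines: exponentiating a fitted logistic-regression coefficient gives the multiplicative change in the odds for a one-unit increase in that predictor. The data below are simulated (a standardized predictor with a true logit slope of 1, plus a pure-noise variable), not the credit-rating sample from the paper.

```python
# Sketch: converting logistic-regression logits to odds ratios.
# Simulated data; true odds ratio for the informative predictor is e ~ 2.72.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 1000
income = rng.normal(0, 1, n)   # hypothetical standardized predictor
noise = rng.normal(0, 1, n)    # uninformative predictor
# True model: logit = 1.0 * income + 0.0 * noise.
p = 1 / (1 + np.exp(-income))
y = (rng.uniform(size=n) < p).astype(int)

model = LogisticRegression().fit(np.column_stack([income, noise]), y)
odds_ratios = np.exp(model.coef_[0])  # e^logit-slope per predictor
print(odds_ratios)
```

The first odds ratio lands near e (each one-standard-deviation increase in the predictor roughly triples the odds), while the noise variable's ratio stays near 1, which is the common-sense comparison of variable importance the abstract refers to.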
Meltem Aksoy, Seda Yanık and Mehmet Fatih Amasyali
Abstract
Purpose
When a large number of project proposals are evaluated to allocate available funds, grouping them based on their similarities is beneficial. Current approaches to group proposals are primarily based on manual matching of similar topics, discipline areas and keywords declared by project applicants. When the number of proposals increases, this task becomes complex and requires excessive time. This paper aims to demonstrate how to effectively use the rich information in the titles and abstracts of Turkish project proposals to group them automatically.
Design/methodology/approach
This study proposes a model that effectively groups Turkish project proposals by combining word embedding, clustering and classification techniques. The proposed model uses FastText, BERT and term frequency/inverse document frequency (TF/IDF) word-embedding techniques to extract terms from the titles and abstracts of project proposals in Turkish. The extracted terms were grouped using both the clustering and classification techniques. Natural groups contained within the corpus were discovered using k-means, k-means++, k-medoids and agglomerative clustering algorithms. Additionally, this study employs classification approaches to predict the target class for each document in the corpus. To classify project proposals, various classifiers, including k-nearest neighbors (KNN), support vector machines (SVM), artificial neural networks (ANN), classification and regression trees (CART) and random forest (RF), are used. Empirical experiments were conducted to validate the effectiveness of the proposed method by using real data from the Istanbul Development Agency.
Findings
The results show that the generated word embeddings can effectively represent proposal texts as vectors, and can be used as inputs for clustering or classification algorithms. Using clustering algorithms, the document corpus is divided into five groups. In addition, the results demonstrate that the proposals can easily be categorized into predefined categories using classification algorithms. SVM-Linear achieved the highest prediction accuracy (89.2%) with the FastText word embedding method. A comparison of manual grouping with automatic classification and clustering results revealed that both classification and clustering techniques have a high success rate.
Research limitations/implications
The proposed model automatically benefits from the rich information in project proposals and significantly reduces numerous time-consuming tasks that managers must perform manually. Thus, it eliminates the drawbacks of the current manual methods and yields significantly more accurate results. In the future, additional experiments should be conducted to validate the proposed method using data from other funding organizations.
Originality/value
This study presents the application of word embedding methods to effectively use the rich information in the titles and abstracts of Turkish project proposals. Existing research studies focus on the automatic grouping of proposals; traditional frequency-based word embedding methods are used for feature extraction methods to represent project proposals. Unlike previous research, this study employs two outperforming neural network-based textual feature extraction techniques to obtain terms representing the proposals: BERT as a contextual word embedding method and FastText as a static word embedding method. Moreover, to the best of our knowledge, there has been no research conducted on the grouping of project proposals in Turkish.
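The two-track pipeline this abstract describes (unsupervised clustering to discover groups, supervised classification to hit predefined categories) can be sketched on a toy corpus. The documents below are English stand-ins, not the Turkish proposal data, and TF-IDF stands in for the FastText/BERT embeddings the study uses.

```python
# Sketch of the proposal-grouping pipeline: vectorize short texts, cluster
# them, and train a linear SVM on labeled examples. Toy corpus only.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

docs = [
    "road construction and bridge repair funding",
    "highway maintenance and road safety project",
    "machine learning for medical image diagnosis",
    "deep learning models for hospital records",
]
labels = ["infrastructure", "infrastructure", "health-ai", "health-ai"]

X = TfidfVectorizer().fit_transform(docs)  # term-based document vectors

# Unsupervised track: discover natural groups in the corpus.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Supervised track: predict predefined categories for each document.
clf = LinearSVC().fit(X, labels)
print(clusters, clf.predict(X))
```

Swapping the TF-IDF matrix for dense FastText or BERT sentence vectors leaves the rest of the pipeline unchanged, which is the modularity the proposed model relies on.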
Abstract
Purpose
Financial statement fraud (FSF) committed by a company implies that its current status may not be healthy. It is therefore important to detect FSF, since such companies tend to conceal bad information, causing great losses to various stakeholders. The objective of the paper is thus to propose a novel approach to building a classification model that identifies FSF, shows high classification performance, and yields human-readable rules explaining why a company is likely to commit FSF.
Design/methodology/approach
Having prepared multiple sub-datasets to cope with the class imbalance problem, we build a set of decision trees for each sub-dataset; form a model for the sub-dataset by removing every tree whose performance falls below the average accuracy of all trees in the set; and then select the model that shows the best accuracy among these models. We call the resulting model MRF (Modified Random Forest). Given a new instance, we extract rules from the MRF model to explain whether the company corresponding to the new instance is likely to commit FSF or not.
Findings
Experimental results show that the MRF classifier outperformed the benchmark models. The results also revealed that all the variables related to profit belong to the set of indicators most important to FSF, and identified two new variables related to gross profit that had gone unexamined in previous studies on FSF.
Originality/value
This study proposed a method of building a classification model that shows outstanding performance and provides decision rules that can be used to explain the classification results. In addition, a new way to resolve the class imbalance problem was suggested in this paper.
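The tree-pruning step at the heart of the MRF idea (drop every tree whose individual accuracy falls below the set's average, then vote with the survivors) can be sketched as follows. Sub-dataset construction and rule extraction are simplified away, and the data are synthetic; this is only the pruning-and-voting skeleton, not the paper's full method.

```python
# Sketch of the MRF pruning idea: train trees on bootstrap samples, keep
# only trees at or above the average accuracy, then majority-vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
rng = np.random.default_rng(0)

trees, accs = [], []
for i in range(25):
    idx = rng.integers(0, len(X), len(X))  # bootstrap sample with replacement
    t = DecisionTreeClassifier(max_depth=4, random_state=i).fit(X[idx], y[idx])
    trees.append(t)
    accs.append(t.score(X, y))             # each tree's individual accuracy

avg = np.mean(accs)
kept = [t for t, a in zip(trees, accs) if a >= avg]  # prune weak trees

# Majority vote over the retained trees.
votes = np.mean([t.predict(X) for t in kept], axis=0)
pred = (votes >= 0.5).astype(int)
print(len(kept), (pred == y).mean())
```

Because each surviving tree is shallow, its root-to-leaf paths can be read off as the human-readable rules the abstract emphasizes.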
Ina S. Markham, Richard G. Mathieu and Barry A. Wray
Abstract
Determining the number of circulating kanban cards is important for operating a just‐in‐time kanban production system effectively. While a number of techniques exist for setting the number of kanbans, artificial neural networks (ANNs) and classification and regression trees (CARTs) represent two practical approaches with special capabilities for operationalizing the kanban-setting problem. This paper compares ANNs with CART for setting the number of kanbans in a dynamically varying production environment. Our results show that the two methods are comparable in accuracy and response speed, but that CARTs have advantages in explainability and development speed. The paper concludes with a discussion of the implications of using these techniques in an operational setting.